SIMD in WebAssembly →

11th Feb, 2020

WebAssembly SIMD (Single Instruction, Multiple Data) currently in development and soon will be added.

What is SIMD

SIMD is an interesting concept of parallel computing and can make huge performance gains in some cases. What essentially SIMD is instead of performing a single calculation on a single element of an array or vector on single instruction it can perform calculations on multiple or all elements (depending on type and size) in a single SIMD instruction.

Areas which will have major performance gains

Audio Processing
Video, Image Processing
Anything involving vector/tensor operations

It is commonly required in image processing to apply the same operation/calculation to all the pixels. Example might include changing hue of an image where it is needed to go through all the pixels and manipulate their rgb value.

Without SIMD

(Source: SIMD Wikipedia article)

With SIMD

(Source: SIMD Wikipedia article)

Similarly in machine learning and artificial intelligence vector & tensor product are needed to be calculated and SIMD can really fasten it up and improve performance by large factor.

Using SIMD with emscripten

SIMD can be enabled using -msimd128 flag when compiling using emscripten.

emcc -O3 -msimd128 in.c -o out.wasm

C function of example in above image:

void tripleArr(int *arr, int n) {
    for (int i=0; i<n; i++) {
        arr[i] *= 3;
    }
}

Compiled without SIMD

(func $1 (; 1 ;) (param $0 i32) (param $1 i32)
  (local $2 i32)
  (local $3 i32)
  (if
   (i32.gt_s
    (local.get $1)
    (i32.const 0)
   )
   (loop $label$2
    (i32.store
     (local.tee $3
      (i32.add
       (local.get $0)
       (i32.shl
        (local.get $2)
        (i32.const 2)
       )
      )
     )
     (i32.mul
      (i32.load
       (local.get $3)
      )
      (i32.const 3)
     )
    )
    (br_if $label$2
     (i32.ne
      (local.tee $2
       (i32.add
        (local.get $2)
        (i32.const 1)
       )
      )
      (local.get $1)
     )
    )
   )
  )
 )
)

Compiled with SIMD

   (loop $label$3
      (v128.store align=4
       (local.tee $4
        (i32.add
         (local.get $0)
         (i32.shl
          (local.get $3)
          (i32.const 2)
         )
        )
       )
       (i32x4.mul
        (v128.load align=4
         (local.get $4)
        )
        (i32x4.splat
         (i32.const 3)
        )
       )
    )

Wat in SIMD is omitted to show only main part i.e v128 align=4 instructions

WASM binary disassembled using Binaryen.

References & Interesting links

SIMD Wikipedia article
Oh the things you’ll compile - modern webassembly (chrome dev summit 2019)
Porting simd code targeting webassembly — Emscripten docs
Fast, parallel applications with WebAssembly simd — v8 official blog
Simd proposal for webassembly — Github