Pranjal Saxena
ProjectsBlogAbout Me

SIMD in WebAssembly

11th Feb, 2020

WebAssembly SIMD (Single Instruction, Multiple Data) currently in development and soon will be added.

What is SIMD

SIMD is an interesting concept of parallel computing and can make huge performance gains in some cases. What essentially SIMD is instead of performing a single calculation on a single element of an array or vector on single instruction it can perform calculations on multiple or all elements (depending on type and size) in a single SIMD instruction.

Areas which will have major performance gains

  • Audio Processing
  • Video, Image Processing
  • Anything involving vector/tensor operations

It is commonly required in image processing to apply the same operation/calculation to all the pixels. Example might include changing hue of an image where it is needed to go through all the pixels and manipulate their rgb value.

Without SIMD


Without SIMD
(Source: SIMD Wikipedia article)


With SIMD


With SIMD
(Source: SIMD Wikipedia article)

Similarly in machine learning and artificial intelligence vector & tensor product are needed to be calculated and SIMD can really fasten it up and improve performance by large factor.

Using SIMD with emscripten

SIMD can be enabled using -msimd128 flag when compiling using emscripten.

emcc -O3 -msimd128 in.c -o out.wasm

C function of example in above image:

void tripleArr(int *arr, int n) {
    for (int i=0; i<n; i++) {
        arr[i] *= 3;
    }
}

Compiled without SIMD

(func $1 (; 1 ;) (param $0 i32) (param $1 i32)
  (local $2 i32)
  (local $3 i32)
  (if
   (i32.gt_s
    (local.get $1)
    (i32.const 0)
   )
   (loop $label$2
    (i32.store
     (local.tee $3
      (i32.add
       (local.get $0)
       (i32.shl
        (local.get $2)
        (i32.const 2)
       )
      )
     )
     (i32.mul
      (i32.load
       (local.get $3)
      )
      (i32.const 3)
     )
    )
    (br_if $label$2
     (i32.ne
      (local.tee $2
       (i32.add
        (local.get $2)
        (i32.const 1)
       )
      )
      (local.get $1)
     )
    )
   )
  )
 )
)

Compiled with SIMD

   (loop $label$3
      (v128.store align=4
       (local.tee $4
        (i32.add
         (local.get $0)
         (i32.shl
          (local.get $3)
          (i32.const 2)
         )
        )
       )
       (i32x4.mul
        (v128.load align=4
         (local.get $4)
        )
        (i32x4.splat
         (i32.const 3)
        )
       )
    )

Wat in SIMD is omitted to show only main part i.e v128 align=4 instructions

WASM binary disassembled using Binaryen.


References & Interesting links