webassembly - Why is my WebAssembly function slower than the JavaScript equivalent?

Apologies for the broad question! I'm learning WASM and have written a Mandelbrot algorithm in C:

int iterateEquation(float x0, float y0, int maxiterations) {
  float a = 0, b = 0, rx = 0, ry = 0;
  int iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4.0)) {
    rx = a * a - b * b + x0;
    ry = 2.0 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

void mandelbrot(int *buf, float width, float height) {
  for(float x = 0.0; x < width; x++) {
    for(float y = 0.0; y < height; y++) {
      // map to mandelbrot coordinates
      float cx = (x - 150.0) / 100.0;
      float cy = (y - 75.0) / 100.0;
      int iterations = iterateEquation(cx, cy, 1000);
      int loc = ((x + y * width) * 4);
      // set the red and alpha components
      *(buf + loc) = iterations > 100 ? 255 : 0;
      *(buf + (loc+3)) = 255;
    }
  }
}

I compile to WASM as follows (input/output file names omitted for clarity):

clang -emit-llvm  -O3 --target=wasm32 ...
llc -march=wasm32 -filetype=asm ...
s2wasm --initial-memory 6553600 ...
wat2wasm ...

I load and compile the module in JavaScript, then invoke it like this:

instance.exports.mandelbrot(0, 300, 150)

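The output is copied to a canvas, which lets me verify that it executes correctly. On my computer the above function takes about 120 ms to execute.

However, here is a JavaScript equivalent: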

const iterateEquation = (x0, y0, maxiterations) => {
  let a = 0, b = 0, rx = 0, ry = 0;
  let iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4)) {
    rx = a * a - b * b + x0;
    ry = 2 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

const mandelbrot = (data) => {
  for (var x = 0; x < 300; x++) {
    for (var y = 0; y < 150; y++) {
      const cx = (x - 150) / 100;
      const cy = (y - 75) / 100;
      const res = iterateEquation(cx, cy, 1000);
      const idx = (x + y * 300) * 4;
      data[idx] = res > 100 ? 255 : 0;
      data[idx+3] = 255;
    }
  }
}

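This takes only about 62 ms to execute.

Now, I know WebAssembly is very new and not heavily optimized yet. But I can't help feeling it should be faster than this!

Can anyone spot something obvious I might have missed?

Also, my C code writes directly to memory starting at '0'. Is this safe? Where is the stack stored in the paged linear memory, and do I risk overwriting it?

Here is a fiddle to illustrate: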

https://wasdk.github.io/WasmFiddle/?jvoh5

When run, it logs the timings of the two equivalent implementations (WASM, then JavaScript).

Best answer

General

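Usually you can hope for roughly a 10% boost on heavy math compared with optimized JS. That figure is the net of two things:

the gain from wasm itself

the cost of copying memory in and out.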

Note that a Uint8Array copy is particularly slow in Chrome (it is fine in Firefox). When you work with RGBA data, it is better to recast the underlying buffer to a Uint32Array and use .set() on it.
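
A minimal sketch of the Uint32Array recast described above. The function name copyPixels and the array sizes are hypothetical; source stands in for a view over wasm memory and target for ImageData.data. It assumes both views start at a byte offset that is a multiple of 4.

```javascript
// Copy RGBA pixels as 32-bit words instead of individual bytes.
// Assumption: both views begin at a 4-byte-aligned offset in their buffers.
function copyPixels(srcBytes, dstBytes) {
  const words = srcBytes.byteLength >>> 2; // 4 bytes per RGBA pixel
  const src32 = new Uint32Array(srcBytes.buffer, srcBytes.byteOffset, words);
  const dst32 = new Uint32Array(dstBytes.buffer, dstBytes.byteOffset, words);
  dst32.set(src32); // one bulk copy over the shared underlying bytes
  return dstBytes;
}

// Stand-ins: in the question these would be wasm memory and ImageData.data.
const width = 300, height = 150;
const source = new Uint8Array(width * height * 4);
source[0] = 255; // red component of the first pixel
source[3] = 255; // alpha component of the first pixel
const target = new Uint8ClampedArray(width * height * 4);
copyPixels(source, target);
console.log(target[0], target[3]); // 255 255
```

Whether this is actually faster depends on the browser and version, so it is worth measuring both variants.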

Reading and writing pixels by word (rgba) in wasm runs at the same speed as reading and writing individual bytes (r, g, b, a). I found no difference.

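When developing with node.js (as I do), it is worth staying on 8.2.1 for JS benchmarks. Later versions upgraded V8 to v6.0 and introduced severe speed regressions for this kind of math. On 8.2.1, don't use modern ES6 features such as const, => and so on; use ES5 instead. A later release with V8 v6.2 may fix these issues.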

Comments on your sample

Use wasm-opt -O3; it can sometimes help after clang -O3.

Use s2wasm --import-memory instead of hardcoding a fixed memory size.
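
With an imported memory, the JavaScript side creates the memory and chooses its size. Here is a rough sketch of that side only. The tiny hand-assembled module merely stands in for the real compiled output, and the import names env and memory are an assumption, so check what your toolchain actually emits.

```javascript
// Stand-in module that does nothing except import a memory as env.memory.
// Assumption: the real module produced by the toolchain uses the same names.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x02, 0x0f, 0x01,                               // import section, 15 bytes, 1 entry
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x00, 0x01,                               // kind: memory, no maximum, min 1 page
]);

// JavaScript owns the memory; size it for a 300x150 RGBA image (64 KiB pages).
const pixelBytes = 300 * 150 * 4;
const pages = Math.ceil(pixelBytes / 65536);
const memory = new WebAssembly.Memory({ initial: pages });

const wasmModule = new WebAssembly.Module(moduleBytes);
const instance = new WebAssembly.Instance(wasmModule, { env: { memory } });

// The same buffer is visible from JS, with no size hardcoded in the module.
const pixels = new Uint8ClampedArray(memory.buffer, 0, pixelBytes);
console.log(pages, pixels.length);
```

The byte offset 0 used for the pixels view mirrors the question; if the module keeps globals or a stack at the start of memory, a different offset would be needed.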

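In your code on the wasdk site, don't use global variables. When they exist, the compiler allocates some unknown block for them at the beginning of memory, and you can overwrite them by mistake.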

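Correct code should probably copy the memory out from the proper location, and that copy should be included in the benchmark. Your sample is incomplete, and IMHO the code from wasdk should not work properly.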

Use benchmark.js; it is more precise.
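
To illustrate why a single timed run is imprecise, here is a hand-rolled stand-in. It is not the benchmark.js API. It warms the function up, takes several samples, and reports the median. The names medianTime and work are hypothetical.

```javascript
// Use a high-resolution clock when available, otherwise fall back to Date.now.
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Run fn several times and return the median duration in milliseconds.
function medianTime(fn, samples = 15, warmup = 5) {
  for (let i = 0; i < warmup; i++) fn(); // give the JIT a chance to optimize
  const times = [];
  for (let i = 0; i < samples; i++) {
    const start = now();
    fn();
    times.push(now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

// Hypothetical workload; substitute the wasm call plus its memory copy.
const work = () => {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += Math.sqrt(i);
  return sum;
};
console.log(`median: ${medianTime(work).toFixed(3)} ms`);
```

A real benchmarking library does considerably more (statistical analysis, adaptive sample counts), so treat this only as a sanity check.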

In short: before going any further, it is worth cleaning things up.

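You may find it useful to dig into the https://github.com/nodeca/multimath sources or use it in your experiments. I created it specifically for small CPU-intensive tasks, to simplify things with proper module initialization, memory management, JS fallback and so on. It contains an "unsharp mask" implementation as an example and benchmark. Adapting your code to it should not be difficult.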

The original question is on Stack Overflow: https://stackoverflow.com/questions/46331830/