# Performance notes

from Red Blob Games
12 Jan 2021

I normally don’t want to worry about performance but I did take a detour to look at this problem. The main reason is that the performance affects the set of questions I will ask. If I can simulate something in 1 minute, I might be willing to simulate hundreds of scenarios, but non-interactively. If i can simulate something in 1 second, I could either simulate tens of thousands of scenarios non-interactively, or maybe one scenario interactively. If I can simulate something in 1 millisecond, I might simulate hundreds or maybe thousands of scenarios interactively. It’s nice to be able to set the parameters interactively and see the results, but that shows one at a time. To understand the space of possible solutions, I want to see lots of results simultaneously. See Bret Victor’s page about the Ladder of Abstraction[1]. So although I’m not optimizing, I am wanting to get a sense of what is possible to simulate in a reasonable amount of time.

I started with a baseline: Odex.js[2] is a differential equation solving library. Mike Bostock’s predator-prey uses it. I assumed it is accurate (see the accuracy page) and then wanted to compare different approximations to get a feel for the accuracy vs speed tradeoff.

## 1  Algorithms#

The first code I wrote was the naive straightforward “add up the values” integration: x += dx. I think everyone who does numerical integration knows this doesn’t work well, but it’s been so long that I had forgotten how bad Euler’s method was.

But it’s fast.

On the accuracy page you can see that it’s not stable at all.

The next step up is to use Heun’s method. This is fairly stable at least on this problem. And … it’s fast. I tried on my old (m0) and new (m1) machines in Chrome:

AlgorithmFirst run, m140 runs, m1First run, m040 runs, m0
Odex112ms3014ms308ms6726ms
Euler5.7ms88ms8.2ms117ms
Heun9.3ms211ms15.8ms281ms

So the dumb approach is 20-30X the speed of Odex, and even switching to a more accurate method, it’s still 10-15X the speed of Odex on this problem.

## 2  Implementations#

I also wrote the Heun code in C++ and Rust, and compiled both natively and to wasm. The Rust wasm results were very similar to the C++ ones. I ran these both on my old Macbook Pro (“m0”) and my new Macbook Pro (“m1”).

LanguageEnvironmentTime, m1Time, m0
C++ -O3native210ms250ms
Rust -Onative210ms250ms
JSNode210ms250ms
wasmNode210ms240ms
JSChrome210ms250ms
wasmChrome210ms240ms
JSSafari210ms250ms
wasmSafari210ms250ms
JSFirefox260ms260ms
wasmFirefox330ms330ms

If you want to try yourself:

Notes:

• Comparing browser JS to native C++: same speed, mostly.
• Comparing browser JS to browser Wasm: same speed, mostly.
• Comparing Chrome to Safari: same speed, mostly.
• Comparing Firefox to Chrome/Safari: slower, sadly.

In Chrome 87 and Firefox 84, having web developer tools open slows down wasm a lot but doesn’t slow down js. In Chrome, it slowed from 211ms to 255ms. In Firefox, it slowed from 330ms to 807ms(!). This inconsistency puzzled me greatly until I found a reddit thread[3] that pointed to the developer tools as the culprit. One comment says it might be due to “wat” files.

Wasm doesn’t seem much faster than Javascript for this type of code. I found the same was true in Mapgen4. I think the Javascript optimizers seem to do reasonably well with numerical code. I’m going to stick to Javascript for this project.

I didn’t show it on this page but there’s a significant speedup as it runs the code the first ~10 times. Part of it is that browsers optimize code if it’s run many times, but on my ARM machine I see the same effect in C++ and Rust!

I thought it might be limited by the memory access, so I turned off writing of the data to an array, but it didn’t speed it up.

### 2.1 Stupid quirks

• On my old machine, `gamma` is a C++ global, and I can’t make my own global variable named gamma.
• On my new machine, `y0` is a C++ global, and I can’t make my own global variable named y0.

Email me , or tweet @redblobgames, or comment: