Trying (and Failing) to Speed Up String.startsWith

Brian Hicks, March 1, 2021

When I was optimizing elm-csv, I noticed that Elm's String.startsWith is implemented using JavaScript's indexOf method. It looks something like this:

function startsWith(haystack, needle) {
  return haystack.indexOf(needle) === 0;
}

This strikes me as inefficient! Doesn't that mean that if haystack doesn't begin with needle, it'll still look through the whole string? That seems to waste a lot of cycles on work after we already know the result.

After thinking about it for a little bit, I imagined a faster way:

function startsWith(haystack, needle) {
  return haystack.slice(0, needle.length) === needle;
}

But, I didn't know for sure if it would be faster, so I started measuring things. After a few benchmarks things were looking promising! I found that String.prototype.slice and String.prototype.length operate in more-or-less constant time, and that ===, while not constant-time, is heavily optimized.

Hypothetically, I thought, this means that a slice-based implementation of startsWith could run much quicker than scanning through the whole string with indexOf.

Chrome Uses Actual Magic in Their Strings

But just to make sure my intuition about indexOf being slow was correct, I decided to benchmark that as well. Specifically, I decided to benchmark how long browsers took to find a single character in a string that doesn't contain the character:

"abc".repeat(size).indexOf("d");

I expected that every time I multiplied size by 10, I should see roughly a 10x slowdown in operations per second. But then…

Browser	1 "abc"	10 "abc"s	100 "abc"s	1,000 "abc"s
Chrome 88.0.4324.182	866,026,834	869,256,882	858,563,862	865,370,315
Firefox 85.0	43,557,275	44,947,174	41,059,919 (-8.65%)	16,831,928 (-59.01%)
Safari 14.0.1	49,240,581	34,661,639 (-29.61%)	7,162,615 (-79.01%)	877,449 (-87.75%)

Chrome has… NO REAL SLOWDOWN? WHAT? This data suggests that Chrome can find a substring in any length string in constant time. How is that even possible?

At this point, I felt pretty out of my depth, and decided I needed to ask my friend Luke Westby, Real-Life Computer Genius™, what was going on. He knew right where to look and figured out that Chrome has a special optimization for single-character search strings in indexOf. In other words, people have hand-tuned V8 over the years to go ridiculously fast in all sorts of situations, and this is one of them.

Well, OK, let's try finding some multi-character string instead. I used "nope", which doesn't trigger this optimization, but still fails to match "abc". Does that give us a result that matches my intuition?

Browser	1 "abc"	10 "abc"s	100 "abc"s	1,000 "abc"s
Chrome 88.0.4324.182	885,211,831	863,845,523 (-2.41%)	849,821,335 (-1.62%)	837,391,727 (-1.46%)
Firefox 85.0	58,648,716	44,107,472 (-24.79%)	40,529,513 (-8.11%)	16,768,019 (-58.63%)
Safari 14.0.1	102,071,445	26,242,114 (-74.29%)	4,212,339 (-83.95%)	467,329 (-88.91%)

That's kinda like what I expected, but Chrome is still incredibly fast at this! Well, OK, that's fine. Good for them. But at this point I got curious (and a little devious): how can I defeat Chrome's optimizations? What is the worst-case scenario that they can't possibly have handled already?

I tried to imagine the algorithm that you'd have to write to implement indexOf and came up with something like this:

Look at the first character in haystack.
If it matches the first character in needle, look at the next pairs of characters in the two strings one by one.
If you get to the end of needle and they all match, you've found the index.
But if one pair doesn't match, you've got to start over at the next character in haystack.

With that in mind, what's combination of needle and haystack would perform the worst? I think it's any pair of strings where haystack is all but the last character of needle repeated over and over. So, maybe a benchmark like this?

"abc".repeat(size).indexOf("abcd");

That can never succeed, but it'll make the browser do a lot of extra work reading the initially-matching "abc" over and over. Let's see the results:

Browser	1 "abc"	10 "abc"s	100 "abc"s	1,000 "abc"s
Chrome 88.0.4324.182	877,221,230	863,929,216 (-1.52%)	688,398,676 (-20.32%)	125,023 (-99.98%)
Firefox 85.0	61,384,762	12,874,505 (-79.03%)	1,366,202 (-89.39%)	137,245 (-89.95%)
Safari 14.0.1	100,546,438	25,948,095 (-74.19%)	4,170,375 (-83.93%)	467,753 (-88.78%)

Finally, Chrome has similar performance to other browsers. I don't think this is representative of real workloads, but even so Chrome performs admirably. Hats off to the V8 team for this implementation, but honestly, it's good to know they're still mortal!

Comparing the Implementations

So now that we know more about the component pieces involved in this optimization (for example, knowing to avoid single-character test strings), we can check if slice actually gives us a real-world speedup. In setting this up, I decided to go for the middle of the road to try and benchmark the browsers under something that you can kind of squint at and think "yeah, that will probably happen reasonably often in real life." In this case, that meant setup looked like this:

haystack = "abc".repeat(100);
needle = "abc";

Then the indexOf and slice benchmarks looked like the code at the top of this post:

haystack.indexOf(needle) === 0;

haystack.slice(0, needle.length) === needle;

That gave me these results:

Browser	indexOf	slice	% Diff
Chrome 88.0.4324.192	871,312,454	870,810,291	-0.06%
Firefox 86.0	32,428,394	46,845,953	+44.46%
Safari 14.0.1	430,854,455	1,352,674,261	+213.95%

Note that Chrome and Firefox had released new versions between me testing indexOf and this test. I kept the old data here because it doesn't seem to have changed the performance, but I just want to be up front that the versions are different!

Anyway, that said: looks like a result within the margin of error for Chrome (around 0.4%), but a reasonable speedup for Firefox and a big win for Safari.

What does the failure case look like? (Same haystack, but needle = "nope"):

Browser	indexOf	slice	% Diff
Chrome 88.0.4324.192	874,951,212	854,116,367	-2.44%
Firefox 85.0	31,026,729	22,180,114	-28.51%
Safari 14.0.1	4,189,768	1,508,770,435	+35,910.83%

This time, Chrome has a slowdown greater than the measurement error, which surprised me! Same with Firefox, but to a larger degree. But the real story here is Safari, which can now do this operation over 1.5 billion times per second. Billion! What even!

Is It Worth It?

So, with all that measurement done, is this optimization worth it? I'm actually not really sure. This is basically nothing in Chrome but means making some real performance tradeoffs in Firefox. And, even though it looks like a huge win in Safari, my claims can only be as strong as the underlying assumptions I'm making in choosing test data. So, just to be explicit about that:

These benchmarks assume that haystack is a long-ish string (300 characters.) Many of the gains we see here are erased when the strings are short. Safari in particular, drops to only about a 100% gain in the failure case with a shorter string (I tested at 3, 30, 150 characters, all with similar results.)
The needle strings are neither a single character nor a worst-case scenario for indexOf performance.
Success and failure cases are equally likely. We can't make any assumption other than that at the runtime level, although we could make a good argument to switch this for individual applications. For example, we could choose to use slice in an application where we expected most checks on startsWith would succeed, since that's basically a wash in Chrome and much faster in Firefox and Safari.
We care about browsers based on global total share instead of splitting out into desktop or mobile. This is another area where you could make different decisions at the application level: if the users of your app skew more towards using Chrome or more towards Safari, this payoff for this optimization gets better or worse.

Of course, if you can measure that your application would benefit from this optimization, go ahead and do it yourself! Calling String.slice 0 (String.length haystack) === needle has some overhead in Elm (due to equality having different semantics) but you may still able to get the majority of the benefits if your users are mostly using Safari or you're running iOS webkit views or something like that. Just make sure to actually measure it! Optimizations at this level can be helpful, but they're more often dwarfed by DOM stuff in the runtime.

As for me: at this point, I'm not going to make a PR to elm/core with this improvement, even though I intended to when I set out. However, I found it interesting to look into these things, and I learned a lot about how strings are optimized in various browsers, so I'm gonna call it a win overall.