Getting rid of cons strings
By Erik Corry (ecorry@cloudflare.com, erikcorry@chromium.org), November 2025.
Cons strings are used to implement string addition in V8, and they are remarkably efficient. The creation of a single object is very rapid, and the scavenge-based new space is very efficient at allocating objects and collecting the dead ones. On first use, the string is lazily flattened (see below).
However, the complexity of handling cons strings pervades V8.
Before a cons string can be used for almost any purpose it must be flattened. This means the left-hand side is replaced by a new string containing all the characters, while the right-hand side is replaced by the empty string. Later, the GC may short-circuit references to the flattened cons string, replacing them with references to the underlying flat string. This trick has itself caused issues.
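For concreteness, here is a minimal sketch of that flattening step, using invented C++ types rather than the real V8 classes:
```
#include <memory>
#include <string>

struct JSString {
  virtual ~JSString() = default;
  virtual std::string Flatten() = 0;  // Returns the full character sequence.
};

struct FlatString : JSString {
  std::string chars;
  explicit FlatString(std::string c) : chars(std::move(c)) {}
  std::string Flatten() override { return chars; }
};

struct ConsString : JSString {
  std::shared_ptr<JSString> first;   // Left-hand side.
  std::shared_ptr<JSString> second;  // Right-hand side.
  std::string Flatten() override {
    std::string all = first->Flatten() + second->Flatten();
    first = std::make_shared<FlatString>(all);   // New flat string on the left.
    second = std::make_shared<FlatString>("");   // Empty string on the right.
    return all;
  }
};
```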
Over the years there have been many attempts to fix string concatenation. Some of the more recent are here:
This last one looks the most promising. The TL;DR is that over-allocated buffers are used to implement a string builder dynamically, with little need for static analysis (which JS mostly resists anyway). Each buffer can only be owned by one string object, a slice-like string that acts as the builder. Adding a string to the end of an existing builder (slice) that owns its buffer consists of writing the new characters into the spare space of the buffer and creating a new (immutable) builder object that becomes the new owner of the buffer.
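A rough sketch of that builder idea, with invented names and none of V8's real machinery: the character buffer is over-allocated, and only the slice that currently owns it may append in place.
```
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

struct CharBuffer {
  size_t capacity;
  size_t used;                        // Characters written so far.
  std::unique_ptr<char[]> data;
};

struct Slice {                        // Immutable string acting as a builder.
  std::shared_ptr<CharBuffer> buffer;
  size_t length;                      // Covers buffer->data[0..length).
  bool owns_buffer;                   // Only the newest slice may append.
};

// Appending hands ownership of the buffer to a new slice; the old slice stays
// valid as a string but can no longer append without copying.
Slice Append(Slice& s, const std::string& tail) {
  if (s.owns_buffer && s.buffer->used + tail.size() <= s.buffer->capacity) {
    std::memcpy(s.buffer->data.get() + s.buffer->used, tail.data(), tail.size());
    s.buffer->used += tail.size();
    s.owns_buffer = false;            // The old owner gives up the buffer.
    return Slice{s.buffer, s.buffer->used, true};
  }
  // Otherwise allocate a bigger buffer and copy both parts (the slow path).
  auto grown = std::make_shared<CharBuffer>();
  grown->capacity = 2 * (s.length + tail.size());
  grown->data = std::make_unique<char[]>(grown->capacity);
  std::memcpy(grown->data.get(), s.buffer->data.get(), s.length);
  std::memcpy(grown->data.get() + s.length, tail.data(), tail.size());
  grown->used = s.length + tail.size();
  return Slice{grown, grown->used, true};
}
```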
Toon’s proposal was implemented by Jakob as part of his investigation described above, but it didn’t pay off. In my opinion this was for two main reasons:[a]
The solution to the prepend issue was to make the buffers circular, but this added a lot of complexity. Because of this (and because it didn’t go far enough by eliminating cons strings)[b][c], V8 still had indirect strings and sometimes needed flattening. Eliminating flattening completely would unlock other optimizations and simplifications, e.g. by making it possible to generate code that handles all string types without bailouts (though uncached external strings may ruin this).
My proposal is similar to Toon’s token-holding slices, but leaves some space at the start of the buffer to avoid having to make the buffers circular[d][e][f][g][h][i][j]. This means that all strings are flat. The space at the start must be a fixed proportion of the buffer so that patterns that only prepend still run in amortized O(n log n) time.
Like Toon’s original proposal, this is fundamentally a dynamic approach, which can work in a language like JS that resists static analysis (though some simple analysis can improve performance).
The buffer must grow geometrically to preserve O(n log n) performance. The obvious choice is to double it. The space at both ends must also grow geometrically to be sure that prepending does not destroy performance completely. Prepending is rarer than appending but far from unknown. One proposal, when allocating a new buffer for a string of size n, is to make the free space at the start n/4 and at the end 3n/4, giving a buffer of total size 2n.[l][m]
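As a concrete illustration of that heuristic (the n/4 and 3n/4 split is from the text above; the struct and function names are invented):
```
#include <cstddef>

struct BufferLayout {
  size_t buffer_size;   // Total allocation: n + n/4 + 3n/4, i.e. about 2n.
  size_t start_offset;  // Free space at the front, reserved for prepends.
};

// Layout for a freshly allocated buffer holding a string of size n.
BufferLayout LayoutForNewString(size_t n) {
  size_t front_gap = n / 4;
  size_t back_gap = 3 * n / 4;
  return BufferLayout{n + front_gap + back_gap, front_gap};
}
```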
Of course there is some waste involved here because buffers are over-allocated. There are lots of opportunities to adjust heuristics here, even with feedback like V8’s allocation sites for objects. But we can also get a little help from the GC if we modify the string buffer objects slightly by adding a high water mark field:
class StringBuffer {
  Map* map;
  int32 length;  // Actual size in memory; used by the GC, not by most generated code.
  int32 hwm;     // High water mark.[n][o][p][q][r][s][t][u][v]
  char data[1];  // Actually variable length, appended to the object.
};
The hwm (high water mark) is used by most operations, while the length field determines the actual size in memory, and is used by the GC.
The hwm is normally a few bytes beyond the highest character position in use. When appending via an owning string object, the character positions in use are recorded (approximately) in the owning object. If there is space between the last used character and the hwm, the new characters can be written into the buffer with no other changes to the buffer. This is the fast path.
If there is not enough space for the new characters we take the slow path, which can bump the hwm up toward the actual length, again leaving some slack for fast paths to eat. (If the buffer is not big enough, a new one is allocated.)
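In code, the fast and slow paths could look roughly like this. This is a sketch with assumed field and function names (the map field is omitted); the slice is taken to record which character positions it covers, as described above.
```
#include <cstdint>
#include <cstring>

struct StringBuffer {
  int32_t length;  // Real size in memory; only the GC and slow path read it.
  int32_t hwm;     // High water mark: a little past the highest position used.
  char data[1];    // Variable length in reality.
};

struct BuilderSlice {
  StringBuffer* buffer;
  int32_t start;
  int32_t end;     // One past the last character this slice covers.
};

// Fast path: the new characters fit below the hwm, so only the buffer contents
// and a new slice are written; the buffer's fields are untouched.
bool TryFastAppend(const BuilderSlice& s, const char* chars, int32_t n,
                   BuilderSlice* result) {
  if (s.end + n > s.buffer->hwm) return false;  // Not enough room: slow path.
  std::memcpy(s.buffer->data + s.end, chars, n);
  *result = BuilderSlice{s.buffer, s.start, s.end + n};
  return true;
}

// Slow path: bump the hwm toward the real length, again leaving some slack for
// future fast appends. If even the full length is too small, the caller must
// allocate a new, larger buffer.
bool TryBumpHwm(BuilderSlice& s, int32_t n) {
  int32_t needed = s.end + n;
  if (needed > s.buffer->length) return false;  // Caller reallocates.
  int32_t slack = (s.buffer->length - needed) / 2;
  s.buffer->hwm = needed + slack;
  return true;
}
```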
The benefit of the hwm is that the GC can see how much of the buffer is in use and can trim the object while moving it. Scavenges are still stop-the-world (parallel, not concurrent), so this can be done safely. Most generated code will not inspect the length field, only the hwm field, so a change to the length will go unnoticed.
As an extra improvement, a bit can be hidden in the length field that tracks whether the hwm has been modified since the last GC; this can be used to trim more aggressively for objects that are no longer being built on.
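A sketch of how the scavenger could use this when evacuating a buffer. The constants and the slack heuristic below are assumptions; only the idea of trimming to the hwm when the buffer has not been built on since the last GC comes from the text.
```
#include <algorithm>
#include <cstdint>

constexpr int32_t kHeaderSize = 12;        // map + length + hwm, illustrative.
constexpr int32_t kHwmDirtyBit = 1 << 30;  // Hidden in the length field.

// Returns how many bytes of the StringBuffer to copy when the GC moves it.
int32_t SizeToCopyWhenMoving(int32_t length_field, int32_t hwm) {
  bool built_on_since_last_gc = (length_field & kHwmDirtyBit) != 0;
  int32_t length = length_field & ~kHwmDirtyBit;
  if (!built_on_since_last_gc) {
    // Nobody has appended since the last GC: trim right down to the hwm.
    return kHeaderSize + hwm;
  }
  // Still being built on: keep some slack beyond the hwm (heuristic).
  int32_t slack = hwm / 4;
  return kHeaderSize + std::min(length, hwm + slack);
}
```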
This strategy can’t reclaim the smaller spare buffer space at the beginning.
Instead of the high-water-mark field we could add two fields: the biggest length of any slice referring to this buffer, and the lowest offset in use. The biggest-length field can replace the ownership bit in the slices: the ownership test becomes a check that the length in the slice matches the biggest-length field in the buffer. Now stealing a buffer from another slice merely involves writing to the buffer, not to the old owner, which may be slightly easier.
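Sketched in code (again with invented names), the ownership test and the steal become:
```
#include <cstdint>

struct TrimBuffer {
  int32_t length;          // Real allocated size.
  int32_t biggest_length;  // Largest length of any slice sharing this buffer.
  int32_t lowest_offset;   // Lowest character offset in use (for prepends).
  char data[1];            // Variable length in reality.
};

struct TrimSlice {
  TrimBuffer* buffer;
  int32_t offset;
  int32_t length;
};

// A slice may extend the buffer in place only if it is the current owner,
// i.e. no slice with a larger length has been carved out of the buffer since
// this one was created.
bool MayAppendInPlace(const TrimSlice& s) {
  return s.length == s.buffer->biggest_length;
}

// Claiming ownership for the freshly appended slice is a single store into the
// buffer; the previous owner's slice object is never written to.
void ClaimOwnership(TrimBuffer* buffer, int32_t new_length) {
  buffer->biggest_length = new_length;
}
```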
Benefits relative to the original trimming proposal:
Disadvantage:
Most attempts to fix string building in V8 fail, so it is likely this one would also fail:
new_string2 = big_string + "y"; // Oh no, memcpy big_string!
return foo() ? new_string1 : new_string2;
I estimate the chances that this change would actually land in V8 at less than 50%.
[a]Like I said in my doc, I think the reasons are a bit more complex. The simplicity of ConsString ops is hard to beat. They behave pretty well on all use patterns, unlike our more specialized design ideas that do a few things great but have slow slow-cases. Laziness also helps.
[b]The complexity for circularity wasn't too bad, just some index masking and rarely splitting one concat into two (when writing across the buffer end).
ConsStrings were completely replaced in my prototype.
Flattening was still needed for getting rid of circularity, but it was cheap (just 2 memcpys) and rare (most strings just remained flat).
[c]I think it's hard to know the downstream effects of simplifying because it affects inlining, both manual and automatic.
I recognize that often the only way to really know how complex or simple something becomes is to implement it, and I haven't done that. :-/
Sorry about the cons string error.
[d]Unfortunately it's easy and common to write JS code that intersperses prepends into otherwise mostly-append chains due to operator precedence.
`s += a + b + c`
Also other common patterns like
```
s0 = a + b;
s1 = c + s0 + d;
```
I'm not sure leaving some space on the left will cover all common cases.
[e]In the first example, operator precedence makes:
s += a + b + c
the same as
s = s + ((a+b)+c)
which are likely all append operations, assuming that s is large and a, b, c are approximately the same size.
In the second example the prepend is the (c + s0) operation.
It's possible that the prepend doesn't fit in the gap. In this case the solution is the same as when the append doesn't fit in the gap. It doesn't change the big-O time complexity. Of course it is preferable to avoid it if possible with heuristics.
[f]In `s = s + ((a+b)+c)`, the `((a+b)+c)` part will append to the builder `a`, and then append `a` to `s`. For my StringBufferBuilder prototype, that's turned into a prepend to `s` to avoid creating new builders.
[g]I don't understand this. There is no prepend to s, the characters have to go to the end of s?
The way I see this example, if a,b,c are <13 they just become sequential as they do now. If a+b+c is big enough then a builder is created and that builder dies when its contents are copied into s's buffer.
In the future, the compiler can trivially determine that the result of a+b is either a sequential string or a dead builder, so it gives that information to the "+ c", which can reuse the slice object and avoid an allocation on that operation. I'm assuming here that the compiler already deopts if they are not strings.
[h]I mean they are prepended to the underlying char buffer. For s = s + ((a+b)+c) to be efficient, we want all concats to go into the same buffer, right? Assuming all vars are initially flat strings: `a+b` will create the overallocated buffer, `..+c` will append to it, and `s+=..` will prepend to it as an optimization to avoid having to create a new buffer. The wrapper objects (which provide immutability) just point at different ranges in this buffer.
[i]I see. I misunderstood. Because s is used with the += pattern I was assuming it was itself already a large string that would not fit in the rhs builder or s was itself a builder when we entered this snippet.
[j]I think it's unrealistic for all concats to go into the same buffer when we have operations like s += foo();
In this case we must assume that the return value from foo() is something that knows nothing of the builder and is probably a string of some sort that will be garbage after the + operation.
Expecting to know more than that is to overestimate the static analyzability of JS.
[k]An alternative to the owner bit in the slice is to have an owner pointer in the buffer. This way, stealing a buffer from an earlier slice doesn't involve writing to the old slice. Unfortunately, updating this pointer would require a write barrier, which makes it a lot heavier.
Another possibility is to write the slice-length into the buffer and compare against that. Since each new slice created has a strictly higher length than the previous one this serves as a monotonically increasing ownership token and also means the left hand side of the + (the old owner) does not need to be written to.
This possibility is now described below in the "Alternative trimming proposal" section.
[l]You could even make the size of the string n+m, with n being how many chars have been prepended so far and m how many have been appended so far, and only increase n or m when you run out of space on that side
[m]Nice
[n]I guess you need lwm too?
[o]Can't trim on the left, so not useful.
[p]_Marked as resolved_
[q]_Re-opened_
We'll see about that!
[r]mlippautz' favorite code in v8 https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/heap.cc;l=3532?q=LeftTrimFixed&sq=&ss=chromium
[s]Without joking though: You can move the characters assuming that the slice only contains the length. The start pointer would be read from the buffer when we prepend or read in general. So we can move the start pointer in the buffer.
[t]We were young and needed the money.
Also, the fixedarray only has one object referring to it, while the buffer has potentially many. If the GC can determine there's only one object pointing at it then everything is possible, but perhaps not desirable.
[u]The start pointer (offset) must be in the slice, not the buffer, because a prepend operation creates different strings that share the buffer, but not the start.
[v]True.
[w]Not an expert on the JavaScript workloads in JetStream 3 (CC @cbruni@chromium.org), but do you know of line items in there that build strings without using them? There's still time to fix those before the release of JetStream 3, but it should happen in the next 1-2 months.
[x]This was mentioned by Jakob in https://docs.google.com/document/d/1AX5-nR_Pk4NqZewGZYNFZE3n6hnFjeWPx7fpbKsihL8