Getting rid of cons strings
By Erik Corry (ecorry@cloudflare.com, erikcorry@chromium.org), November 2025.
Cons strings are used to implement string addition in V8, and they are remarkably efficient. The creation of a single object is very rapid, and the scavenge-based new space is very efficient at allocating objects and collecting the dead ones. On first use, the string is lazily flattened (see below).
However, the complexity of handling cons strings pervades V8.
Before a cons string can be used for almost any purpose it must be flattened. This means the left-hand side is replaced by a new string containing all the characters, while the right-hand side is replaced by the empty string. Later, the GC may short-circuit references to the flattened cons string, replacing them with references to the underlying flat string. This trick has itself caused issues.
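For concreteness, here is a minimal sketch of that flattening step, using invented C++ types rather than the real V8 classes:
```
#include <memory>
#include <string>

struct JSString {
  virtual ~JSString() = default;
  virtual std::string Flatten() = 0;  // Returns the full character sequence.
};

struct FlatString : JSString {
  std::string chars;
  explicit FlatString(std::string c) : chars(std::move(c)) {}
  std::string Flatten() override { return chars; }
};

struct ConsString : JSString {
  std::shared_ptr<JSString> first;   // Left-hand side.
  std::shared_ptr<JSString> second;  // Right-hand side.
  std::string Flatten() override {
    std::string all = first->Flatten() + second->Flatten();
    first = std::make_shared<FlatString>(all);   // New flat string on the left.
    second = std::make_shared<FlatString>("");   // Empty string on the right.
    return all;
  }
};
```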
Over the years there have been many attempts to fix string concatenation. Some of the more recent are here:
This last one looks the most promising. The TL;DR is that over-allocated buffers are used to implement a string builder dynamically, with little need for static analysis (which JS mostly resists anyway). Each buffer can only be owned by one string object, a slice-like string that acts as the builder. Adding a string to the end of an existing builder (slice) that owns its buffer consists of writing the new characters into the spare space of the buffer and creating a new (immutable) builder object that becomes the new owner of the buffer.
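A rough sketch of that builder idea, with invented names and none of V8's real machinery: the character buffer is over-allocated, and only the slice that currently owns it may append in place.
```
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

struct CharBuffer {
  size_t capacity;
  size_t used;                        // Characters written so far.
  std::unique_ptr<char[]> data;
};

struct Slice {                        // Immutable string acting as a builder.
  std::shared_ptr<CharBuffer> buffer;
  size_t length;                      // Covers buffer->data[0..length).
  bool owns_buffer;                   // Only the newest slice may append.
};

// Appending hands ownership of the buffer to a new slice; the old slice stays
// valid as a string but can no longer append without copying.
Slice Append(Slice& s, const std::string& tail) {
  if (s.owns_buffer && s.buffer->used + tail.size() <= s.buffer->capacity) {
    std::memcpy(s.buffer->data.get() + s.buffer->used, tail.data(), tail.size());
    s.buffer->used += tail.size();
    s.owns_buffer = false;            // The old owner gives up the buffer.
    return Slice{s.buffer, s.buffer->used, true};
  }
  // Otherwise allocate a bigger buffer and copy both parts (the slow path).
  auto grown = std::make_shared<CharBuffer>();
  grown->capacity = 2 * (s.length + tail.size());
  grown->data = std::make_unique<char[]>(grown->capacity);
  std::memcpy(grown->data.get(), s.buffer->data.get(), s.length);
  std::memcpy(grown->data.get() + s.length, tail.data(), tail.size());
  grown->used = s.length + tail.size();
  return Slice{grown, grown->used, true};
}
```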
Toon’s proposal was implemented by Jakob as part of his investigation described above, but it didn’t pay off. In my opinion this was for two main reasons:[a]
The solution to the prepend issue was to make the buffers circular, but this added a lot of complexity. Because of this (and because it didn’t go far enough by eliminating cons strings)[b][c], V8 still had indirect strings and sometimes needed flattening. Eliminating flattening completely would unlock other optimizations and simplifications, e.g. by making it possible to generate code that handles all string types without bailouts (though uncached external strings may ruin this).
My proposal is similar to Toon’s token-holding slices, but leaves some space at the start of the buffer to avoid having to make the buffers circular[d][e][f][g][h][i][j]. This means that all strings are flat. The space at the start must be a fixed proportion of the buffer so that patterns that only prepend still run in amortized O(n log n) time.
Like Toon’s original proposal, this is fundamentally a dynamic approach, which can work in a language like JS that resists static analysis (though some simple analysis can improve performance).
The buffer must grow geometrically to preserve O(n log n) performance. The obvious choice is to double it. The space at both ends must also grow geometrically to be sure that prepending does not destroy performance completely. Prepending is rarer than appending but far from unknown. One proposal, when allocating a new buffer for a string of size n, is to make the free space at the start n/4 and at the end 3n/4, giving a buffer of total size 2n.[l][m]
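As a concrete illustration of that heuristic (the n/4 and 3n/4 split is from the text above; the struct and function names are invented):
```
#include <cstddef>

struct BufferLayout {
  size_t buffer_size;   // Total allocation: n + n/4 + 3n/4, i.e. about 2n.
  size_t start_offset;  // Free space at the front, reserved for prepends.
};

// Layout for a freshly allocated buffer holding a string of size n.
BufferLayout LayoutForNewString(size_t n) {
  size_t front_gap = n / 4;
  size_t back_gap = 3 * n / 4;
  return BufferLayout{n + front_gap + back_gap, front_gap};
}
```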
Of course there is some waste involved here because buffers are over-allocated. There are lots of opportunities to adjust heuristics here, even with feedback like V8’s allocation sites for objects. But we can also get a little help from the GC if we modify the string buffer objects slightly by adding a high water mark field:
class StringBuffer {
  Map* map;
  int32 length;  // Actual size in memory; used by the GC, not by most generated code.
  int32 hwm;     // High water mark.[n][o][p][q][r][s][t][u][v]
  char data[1];  // Actually variable length, appended to the object.
};
The hwm (high water mark) is used by most operations, while the length field determines the actual size in memory, and is used by the GC.
The hwm is normally a few bytes beyond the highest character position in use. When appending via an owning string object, the character positions in use are recorded (approximately) in the owning object. If there is space between the last used character and the hwm, the new characters can be written into the buffer with no other changes to the buffer. This is the fast path.
If there is not enough space for the new characters we take the slow path, which can bump the hwm up toward the actual length, again leaving some slack for fast paths to eat. (If the buffer is not big enough, a new one is allocated.)
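In code, the fast and slow paths could look roughly like this. This is a sketch with assumed field and function names (the map field is omitted); the slice is taken to record which character positions it covers, as described above.
```
#include <cstdint>
#include <cstring>

struct StringBuffer {
  int32_t length;  // Real size in memory; only the GC and slow path read it.
  int32_t hwm;     // High water mark: a little past the highest position used.
  char data[1];    // Variable length in reality.
};

struct BuilderSlice {
  StringBuffer* buffer;
  int32_t start;
  int32_t end;     // One past the last character this slice covers.
};

// Fast path: the new characters fit below the hwm, so only the buffer contents
// and a new slice are written; the buffer's fields are untouched.
bool TryFastAppend(const BuilderSlice& s, const char* chars, int32_t n,
                   BuilderSlice* result) {
  if (s.end + n > s.buffer->hwm) return false;  // Not enough room: slow path.
  std::memcpy(s.buffer->data + s.end, chars, n);
  *result = BuilderSlice{s.buffer, s.start, s.end + n};
  return true;
}

// Slow path: bump the hwm toward the real length, again leaving some slack for
// future fast appends. If even the full length is too small, the caller must
// allocate a new, larger buffer.
bool TryBumpHwm(BuilderSlice& s, int32_t n) {
  int32_t needed = s.end + n;
  if (needed > s.buffer->length) return false;  // Caller reallocates.
  int32_t slack = (s.buffer->length - needed) / 2;
  s.buffer->hwm = needed + slack;
  return true;
}
```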
The benefit of the hwm is that the GC can see how much of the buffer is in use and can trim the object while moving it. Scavenges are still stop-the-world (parallel, not concurrent), so this can be done safely. Most generated code will not inspect the length field, only the hwm field, so a change to the length will go unnoticed.
As an extra improvement, a bit can be hidden in the length field that tracks whether the hwm has been modified since the last GC; this can be used to trim more aggressively for objects that are no longer being built on.
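A sketch of how the scavenger could use this when evacuating a buffer. The constants and the slack heuristic below are assumptions; only the idea of trimming to the hwm when the buffer has not been built on since the last GC comes from the text.
```
#include <algorithm>
#include <cstdint>

constexpr int32_t kHeaderSize = 12;        // map + length + hwm, illustrative.
constexpr int32_t kHwmDirtyBit = 1 << 30;  // Hidden in the length field.

// Returns how many bytes of the StringBuffer to copy when the GC moves it.
int32_t SizeToCopyWhenMoving(int32_t length_field, int32_t hwm) {
  bool built_on_since_last_gc = (length_field & kHwmDirtyBit) != 0;
  int32_t length = length_field & ~kHwmDirtyBit;
  if (!built_on_since_last_gc) {
    // Nobody has appended since the last GC: trim right down to the hwm.
    return kHeaderSize + hwm;
  }
  // Still being built on: keep some slack beyond the hwm (heuristic).
  int32_t slack = hwm / 4;
  return kHeaderSize + std::min(length, hwm + slack);
}
```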
This strategy can’t reclaim the smaller spare buffer space at the beginning.
Instead of the high-water-mark field we could add two fields: the biggest length of any slice referring to this buffer, and the lowest offset in use. The biggest-length field can replace the ownership bit in the slices: the ownership test becomes a check that the length in the slice matches the biggest-length field in the buffer. Now stealing a buffer from another slice merely involves writing to the buffer, not to the old owner, which may be slightly easier.
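Sketched in code (again with invented names), the ownership test and the steal become:
```
#include <cstdint>

struct TrimBuffer {
  int32_t length;          // Real allocated size.
  int32_t biggest_length;  // Largest length of any slice sharing this buffer.
  int32_t lowest_offset;   // Lowest character offset in use (for prepends).
  char data[1];            // Variable length in reality.
};

struct TrimSlice {
  TrimBuffer* buffer;
  int32_t offset;
  int32_t length;
};

// A slice may extend the buffer in place only if it is the current owner,
// i.e. no slice with a larger length has been carved out of the buffer since
// this one was created.
bool MayAppendInPlace(const TrimSlice& s) {
  return s.length == s.buffer->biggest_length;
}

// Claiming ownership for the freshly appended slice is a single store into the
// buffer; the previous owner's slice object is never written to.
void ClaimOwnership(TrimBuffer* buffer, int32_t new_length) {
  buffer->biggest_length = new_length;
}
```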
Benefits relative to the original trimming proposal:
Disadvantage:
Most attempts to fix string building in V8 fail, so it is likely this one would also fail:
new_string2 = big_string + "y"; // Oh no, memcpy big_string!
return foo() ? new_string1 : new_string2;
I estimate the chances that this change would actually land in V8 at less than 50%.
[a]Like I said in my doc, I think the reasons are a bit more complex. The simplicity of ConsString ops is hard to beat. They behave pretty well on all use patterns, unlike our more specialized design ideas that do a few things great but have slow slow-cases. Laziness also helps.
[b]The complexity for circularity wasn't too bad, just some index masking and rarely splitting one concat into two (when writing across the buffer end).
ConsStrings were completely replaced in my prototype.
Flattening was still needed for getting rid of circularity, but it was cheap (just 2 memcpys) and rare (most strings just remained flat).
[c]I think it's hard to know the downstream effects of simplifying because it affects inlining, both manual and automatic.
I recognize that often the only way to really know how complex or simple something becomes is to implement it, and I haven't done that. :-/
Sorry about the cons string error.
[d]Unfortunately it's easy and common to write JS code that intersperses prepends into otherwise mostly-append chains due to operator precedence.
`s += a + b + c`
Also other common patterns like
```
s0 = a + b;
s1 = c + s0 + d;
```
I'm not sure leaving some space on the left will cover all common cases.
[e]In the first example, operator precedence makes:
s += a + b + c
the same as
s = s + ((a+b)+c)
which are likely all append operations, assuming that s is large and a, b, c are approximately the same size.
In the second example the prepend is the (c + s0) operation.
It's possible that the prepend doesn't fit in the gap. In this case the solution is the same as when the append doesn't fit in the gap. It doesn't change the big-O time complexity. Of course it is preferable to avoid it if possible with heuristics.
[f]In `s = s + ((a+b)+c)`, the `((a+b)+c)` part will append to the builder `a`, and then append `a` to `s`. For my StringBufferBuilder prototype, that's turned into a prepend to `s` to avoid creating new builders.
[g]I don't understand this. There is no prepend to s, the characters have to go to the end of s?
The way I see this example, if a,b,c are <13 they just become sequential as they do now. If a+b+c is big enough then a builder is created and that builder dies when its contents are copied into s's buffer.
In the future, the compiler can trivially determine that the result of a+b is either a sequential string or a dead builder, so it gives that information to the "+ c", which can reuse the slice object and avoid an allocation on that operation. I'm assuming here that the compiler already deopts if they are not strings.
[h]I mean they are prepended to the underlying char buffer. For s = s + ((a+b)+c) to be efficient, we want all concats to go into the same buffer, right? Assuming all vars are initially flat strings: `a+b` will create the overallocated buffer, `..+c` will append to it, and `s+=..` will prepend to it as an optimization to avoid having to create a new buffer. The wrapper objects (which provide immutability) just point at different ranges in this buffer.
[i]I see. I misunderstood. Because s is used with the += pattern I was assuming it was itself already a large string that would not fit in the rhs builder or s was itself a builder when we entered this snippet.
[j]I think it's unrealistic for all concats to go into the same buffer when we have operations like s += foo();
In this case we must assume that the return value from foo() is something that knows nothing of the builder and is probably a string of some sort that will be garbage after the + operation.
Expecting to know more than that is to overestimate the static analyzability of JS.
[k]An alternative to the owner bit in the slice is to have an owner pointer in the buffer. This way, stealing a buffer from an earlier slice doesn't involve writing to the old slice. Unfortunately, updating this pointer would require a write barrier, which makes it a lot heavier.
Another possibility is to write the slice-length into the buffer and compare against that. Since each new slice created has a strictly higher length than the previous one this serves as a monotonically increasing ownership token and also means the left hand side of the + (the old owner) does not need to be written to.
This possibility is now described below in the "Alternative trimming proposal" section.
[l]You could even make the size of the string n+m, with n being how many chars have been prepended so far and m how many have been appended so far, and only increase n or m when you run out of space on that side
[m]Nice
[n]I guess you need lwm too?
[o]Can't trim on the left, so not useful.
[p]_Marked as resolved_
[q]_Re-opened_
We'll see about that!
[r]mlippautz' favorite code in v8 https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/heap.cc;l=3532?q=LeftTrimFixed&sq=&ss=chromium
[s]Without joking though: You can move the characters assuming that the slice only contains the length. The start pointer would be read from the buffer when we prepend or read in general. So we can move the start pointer in the buffer.
[t]We were young and needed the money.
Also, the fixedarray only has one object referring to it, while the buffer has potentially many. If the GC can determine there's only one object pointing at it then everything is possible, but perhaps not desirable.
[u]The start pointer (offset) must be in the slice, not the buffer, because a prepend operation creates different strings that share the buffer, but not the start.
[v]True.
[w]Not an expert on the JavaScript workloads in JetStream 3 (CC @cbruni@chromium.org), but do you know of line items in there that build strings without using them? There's still time to fix those before the release of JetStream 3, but it should happen in the next 1-2 months.
[x]This was mentioned by Jakob in https://docs.google.com/document/d/1AX5-nR_Pk4NqZewGZYNFZE3n6hnFjeWPx7fpbKsihL8