Array destructuring for multi-value returns (in light of React hooks)
Attention: Shared Google-externally
Authors: bmeurer@chromium.org, jasonmiller@chromium.org, rmcilroy@chromium.org
Last Updated: 2018/11/09
TL;DR This document presents an analysis of the performance impact when using array destructuring patterns to handle multi-value returns, especially in light of the recently announced React hooks.
Short link bit.ly/array-destructuring-for-multi-value-returns
Object destructuring alternative
Object destructuring the array
Special casing array destructuring
Generally cheaper iteration protocol (for arrays)
Stretch Goal: Swapping variables
At React Conf 2018, Dan Abramov introduced a new concept called React hooks (video), which provides an elegant way to handle stateful components (aka hooking into React's state system, and specifically the lifecycle hooks) without having to use component classes. A simple example of how this looks is:
import { useState } from 'react';

function Example() {
  const [count, setCount] = useState(0);
  return (
    <div>
      <p>You clicked {count} times</p>
      <button onClick={() => setCount(count + 1)}>Click me</button>
    </div>
  );
}
This looks like a stateless component in traditional React, but it is stateful (the count state). Like a stateless component, it is executed every time the component Example is rendered. React internally memoizes the state for the count, so subsequent runs for the same instance of the component will see the same state.
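To make the "memoized by call order" point concrete, here is a minimal sketch of how a useState-style hook can store state purely by the order in which hooks are called during a render. This is not React's actual implementation; hookStates, hookIndex, beginRender and the shape of Example's return value are all made up for illustration:

```javascript
// Per-component-instance storage; each useState call claims the next slot.
const hookStates = [];
let hookIndex = 0;

// The "renderer" resets the index before every render of the component,
// so the hooks replay in the same order and find the same slots.
function beginRender() { hookIndex = 0; }

function useState(initialValue) {
  const index = hookIndex++;
  if (hookStates.length <= index) hookStates.push(initialValue);
  const setState = (value) => { hookStates[index] = value; };
  return [hookStates[index], setState];
}

function Example() {
  const [count, setCount] = useState(0);
  return { html: '<div>' + count + '</div>', setCount };
}

beginRender();
const first = Example();   // renders '<div>0</div>'
first.setCount(1);
beginRender();
const second = Example();  // renders '<div>1</div>', the 1 was memoized
```

Note that this is exactly why the array destructuring runs on every render: the component function body, including the `const [count, setCount] = ...` line, is re-executed each time.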
Developers can use any number of these useState hooks (and other hooks) in a single component, for example:
function ExampleWithManyStates(props) {
  const [age, setAge] = useState(42);
  const [fruit, setFruit] = useState('banana');
  const [todos, setTodos] = useState([{ text: 'Learn Hooks' }]);
  // ...render using age, fruit and todos...
}
Again, this will be executed every time the ExampleWithManyStates component is rendered (and useState will do its magic behind the scenes). Specifically the array destructuring runs on every render!
I discussed the problem of potentially expensive array destructuring earlier this year at JSConfEU with Dan Abramov. We talked about using object destructuring instead, but the ergonomics for the developers would be worse, i.e. you'd need to agree on names for the pairs produced from the "use" functions, and you'd need more verbose syntax to assign it to local names.
function ExampleWithManyStates(props) {
  const { value: age, setValue: setAge } = useState(42);
  const { value: fruit, setValue: setFruit } = useState('banana');
  const { value: todos, setValue: setTodos } = useState([{ text: 'Learn Hooks' }]);
  // ...render using age, fruit and todos...
}
This is harder to parse and probably leads to more subtle bugs than the array destructuring syntax above. Also array destructuring is not only interesting in the case of React hooks, but it's a general pattern that developers use often.
Dan's main argument back then was that React developers generally don't ship ES2015 code, but transpile their code via Babel first anyway, usually in loose mode. And Babel loose mode completely eliminates the array iteration:
function ExampleWithManyStates(props) {
  var _useState = useState(42),
      age = _useState[0],
      setAge = _useState[1];

  var _useState2 = useState('banana'),
      fruit = _useState2[0],
      setFruit = _useState2[1];

  var _useState3 = useState([{ text: 'Learn Hooks' }]),
      todos = _useState3[0],
      setTodos = _useState3[1];
  // ...render using age, fruit and todos...
}
But for the success of the web platform we don't want to incentivize transpilation as a long-term solution. Just looking at this simple snippet of code, it's already immediately obvious that transpilation is not a silver bullet, since it increases the size of the bundled JavaScript.
In order to make it attractive to ship untranspiled code (to modern browsers), we'll probably need to improve the way we deal with these special array destructurings (for multi-value returns).
Let's consider this example (definitely very oversimplified, but necessary for the analysis):
function useState(x) {
  return [x, function (y) { x = y; }];
}

function render() {
  const state = useState(0);
  const [x, setX] = state;
  return '<div>' + x + '</div>';
}
It's specifically interesting to look at the generated bytecode (and any implicit overhead) here, because such render functions will likely run as bytecode quite often, especially during page load, before they eventually tier up to optimized code.
In V8 7.2 (as of 2018/10/31) this render function turns into the following Ignition bytecode:
  0 : StackCheck
  1 : LdaGlobal [0], [0]
  4 : Star r11
  6 : LdaZero
  7 : Star r12
  9 : CallUndefinedReceiver1 r11, r12, [2]
 13 : Star r0
 15 : LdaNamedProperty r0, [1], [4]
 19 : Star r12
 21 : CallProperty0 r12, r0, [6]
 25 : Mov r0, r11
 28 : JumpIfJSReceiver [7] (0x2206f882056d @ 35)
 30 : CallRuntime [ThrowSymbolIteratorInvalid], r0-r0
 35 : Star r1
 37 : LdaNamedProperty r1, [2], [8]
 41 : Star r2
 43 : LdaFalse
 44 : Star r3
 46 : LdaZero
 47 : Star r6
 49 : Mov <context>, r13
 52 : Mov <context>, r14
 55 : Ldar r3
 57 : JumpIfToBooleanTrue [43] (0x2206f88205ae @ 100)
 59 : LdaTrue
 60 : Star r3
 62 : CallProperty0 r2, r1, [10]
 66 : Star r4
 68 : InvokeIntrinsic [_IsJSReceiver], r4-r4
 72 : ToBooleanLogicalNot
 73 : JumpIfFalse [7] (0x2206f882059a @ 80)
 75 : CallRuntime [ThrowIteratorResultNotAnObject], r4-r4
 80 : LdaNamedProperty r4, [3], [12]
 84 : JumpIfToBooleanFalse [7] (0x2206f88205a5 @ 91)
 86 : LdaUndefined
 87 : Star r5
 89 : Jump [11] (0x2206f88205ae @ 100)
 91 : LdaNamedProperty r4, [4], [14]
 95 : Star r5
 97 : LdaFalse
 98 : Star r3
100 : LdaSmi [2]
102 : Star r6
104 : Mov r5, r7
107 : LdaZero
108 : Star r6
110 : Ldar r3
112 : JumpIfToBooleanTrue [43] (0x2206f88205e5 @ 155)
114 : LdaTrue
115 : Star r3
117 : CallProperty0 r2, r1, [16]
121 : Star r4
123 : InvokeIntrinsic [_IsJSReceiver], r4-r4
127 : ToBooleanLogicalNot
128 : JumpIfFalse [7] (0x2206f88205d1 @ 135)
130 : CallRuntime [ThrowIteratorResultNotAnObject], r4-r4
135 : LdaNamedProperty r4, [3], [12]
139 : JumpIfToBooleanFalse [7] (0x2206f88205dc @ 146)
141 : LdaUndefined
142 : Star r5
144 : Jump [11] (0x2206f88205e5 @ 155)
146 : LdaNamedProperty r4, [4], [14]
150 : Star r5
152 : LdaFalse
153 : Star r3
155 : LdaSmi [2]
157 : Star r6
159 : Mov r5, r8
162 : LdaZero
163 : Star r6
165 : Jump [33] (0x2206f8820610 @ 198)
167 : Star r15
169 : CreateCatchContext r15, [5]
172 : PushContext r15
174 : Star r14
176 : LdaSmi [2]
178 : TestEqualStrict r6, [18]
181 : JumpIfFalse [6] (0x2206f8820605 @ 187)
183 : LdaSmi [1]
185 : Star r6
187 : LdaImmutableCurrentContextSlot [4]
189 : Star r16
191 : CallRuntime [ReThrow], r16-r16
196 : PopContext r15
198 : LdaSmi [-1]
200 : Star r12
202 : Star r11
204 : Jump [7] (0x2206f882061d @ 211)
206 : Star r12
208 : LdaZero
209 : Star r11
211 : LdaTheHole
212 : SetPendingMessage
213 : Star r13
215 : Ldar r3
217 : JumpIfToBooleanTrue [90] (0x2206f882067d @ 307)
219 : LdaNamedProperty r1, [6], [19]
223 : Star r9
225 : TestUndetectable
226 : JumpIfFalse [4] (0x2206f8820630 @ 230)
228 : Jump [79] (0x2206f882067d @ 307)
230 : LdaSmi [1]
232 : TestEqualStrict r6, [21]
235 : JumpIfFalse [47] (0x2206f8820664 @ 282)
237 : Ldar r9
239 : TestTypeOf #6
241 : JumpIfFalse [4] (0x2206f882063f @ 245)
243 : Jump [18] (0x2206f882064f @ 261)
245 : LdaSmi.Wide [154]
249 : Star r14
251 : LdaConstant [7]
253 : Star r15
255 : CallRuntime [NewTypeError], r14-r15
260 : Throw
261 : Mov <context>, r14
264 : Mov r9, r15
267 : Mov r1, r16
270 : InvokeIntrinsic [_Call], r15-r16
274 : Jump [6] (0x2206f8820662 @ 280)
276 : LdaTheHole
277 : SetPendingMessage
278 : Ldar r14
280 : Jump [27] (0x2206f882067d @ 307)
282 : Mov r9, r14
285 : Mov r1, r15
288 : InvokeIntrinsic [_Call], r14-r15
292 : Star r10
294 : InvokeIntrinsic [_IsJSReceiver], r10-r10
298 : JumpIfToBooleanFalse [4] (0x2206f8820678 @ 302)
300 : Jump [7] (0x2206f882067d @ 307)
302 : CallRuntime [ThrowIteratorResultNotAnObject], r10-r10
307 : Ldar r13
309 : SetPendingMessage
310 : LdaZero
311 : TestReferenceEqual r11
313 : JumpIfFalse [5] (0x2206f8820688 @ 318)
315 : Ldar r12
317 : ReThrow
318 : LdaConstant [8]
320 : Star r11
322 : Ldar r7
324 : Add r11, [22]
327 : Star r11
329 : LdaConstant [9]
331 : Add r11, [23]
334 : Return
Yes, that's really 335 bytes of bytecode! I've highlighted the different parts to make it easier to map them back to the pieces of the function:
function render() {
  const state = useState(0);
  const [x, setX] = state;
  return '<div>' + x + '</div>';
}
So the array destructuring const [x, setX] = state alone is responsible for 305 bytes of bytecode! The reason for this is that array destructuring uses the iteration protocol, and the BytecodeGenerator inlines the iteration protocol completely into the bytecode in this case. Summarizing the cost, we need: an IC to load Symbol.iterator, an IC to call it, an IC to load the next method, and then for each of the two destructured elements an IC to call next(), an IC to load done and an IC to load value, on top of allocating the JSArrayIterator and the two JSIterResultObjects at runtime.
And to make matters worse, all of this has to be wrapped into a try-finally to implement the iterator closing. So that's a whopping 9 ICs and 3 temporary objects just to implement the array destructuring.
In order to understand why the above bytecode is this big, it might help to look at some pseudo JavaScript, which describes the destructuring assignment:
function render() {
  const state = useState(0);

  // Pseudo code for: const [x, setX] = state;
  let x, setX;
  const iterator = state[Symbol.iterator]();
  if (!IsJSReceiver(iterator)) throw new TypeError();
  const next = iterator.next;
  let done = false;
  try {
    let iterResult = next.call(iterator);
    if (!IsJSReceiver(iterResult)) throw new TypeError();
    done = iterResult.done;
    x = done ? undefined : iterResult.value;
    if (!done) {
      iterResult = next.call(iterator);
      if (!IsJSReceiver(iterResult)) throw new TypeError();
      done = iterResult.done;
      setX = done ? undefined : iterResult.value;
    }
  } finally {
    // Iterator closing: if we didn't exhaust the iterator, call return().
    if (!done) {
      const returnMethod = iterator.return;
      if (returnMethod !== undefined && returnMethod !== null) {
        returnMethod.call(iterator);
      }
    }
  }

  return '<div>' + x + '</div>';
}
It's not 100% correct, but close enough to help understand the complexity of the iteration protocol for such (simple) cases.
AI(mythria) It might be worth sharing the feedback slots for the two "done" LdaNamedProperty and the two "value" LdaNamedProperty bytecodes, as it's very likely that a given iterator will always produce JSIterResultObjects with the same shape.
Contrast this with the code that Babel produces in loose mode:
function render() {
  var _useState = useState(0),
      x = _useState[0],
      setX = _useState[1];

  return '<div>' + x + '</div>';
}
It's important to recognize that this is NOT semantically equivalent to the original ES2015 version! Ignition's BytecodeGenerator turns this into the following bytecode:
  0 : StackCheck
  1 : LdaGlobal [0], [0]
  4 : Star r3
  6 : LdaZero
  7 : Star r4
  9 : CallUndefinedReceiver1 r3, r4, [2]
 13 : Star r0
 15 : LdaZero
 16 : LdaKeyedProperty r0, [4]
 19 : Star r1
 21 : LdaSmi [1]
 23 : LdaKeyedProperty r0, [6]
 26 : Star r2
 28 : LdaConstant [1]
 30 : Star r3
 32 : Ldar r1
 34 : Add r3, [8]
 37 : Star r3
 39 : LdaConstant [2]
 41 : Add r3, [9]
 44 : Return
So this is only 45 bytes (aka 13% of the size of the bytecode for the original ES2015 function). Again highlighting the different parts to make it easier to read the bytecode:
function render() {
  var _useState = useState(0),
      x = _useState[0],
      setX = _useState[1];

  return '<div>' + x + '</div>';
}
Summarizing the cost here: just an IC to load the useState global, an IC for the call, and two KEYED_LOAD_ICs for the element accesses, with no temporary objects at all. So it's easy to see that this version will be a lot more efficient than the original ES2015 version, in terms of both bytecode size and object allocation overhead while running in the interpreter.
AI(mythria) One thing that seems interesting here is that the two LdaKeyedProperty bytecodes get distinct feedback vector slots (4 vs 6). That might be something worth optimizing, having all KEYED_LOAD_ICs accessing the same object share a single feedback vector slot, since the feedback that is collected is the pair of map and handler, where the handler is usually one of the LoadElementStubs (i.e. it's the same for loading 0 and 1).
An alternative approach I suggested to Dan Abramov was to consider object destructuring instead of array destructuring (as mentioned above). So this would look something like this:
function useState(x) {
  return { value: x, set: function (y) { x = y; } };
}

function render() {
  const { value: x, set: setX } = useState(0);
  return '<div>' + x + '</div>';
}
Looking at the bytecode in Ignition we see:
  0 : StackCheck
  1 : LdaGlobal [0], [0]
  4 : Star r3
  6 : LdaZero
  7 : Star r4
  9 : CallUndefinedReceiver1 r3, r4, [2]
 13 : Star r0
 15 : JumpIfUndefined [6] (0x37efef72062f @ 21)
 17 : Ldar r0
 19 : JumpIfNotNull [16] (0x37efef72063d @ 35)
 21 : LdaSmi [81]
 23 : Star r3
 25 : LdaConstant [1]
 27 : Star r4
 29 : CallRuntime [NewTypeError], r3-r4
 34 : Throw
 35 : LdaNamedProperty r0, [1], [4]
 39 : Star r1
 41 : LdaNamedProperty r0, [2], [6]
 45 : Star r2
 47 : LdaConstant [3]
 49 : Star r3
 51 : Ldar r1
 53 : Add r3, [8]
 56 : Star r3
 58 : LdaConstant [4]
 60 : Add r3, [9]
 63 : Return
This is 64 bytes, which is not quite as good as the Babel loose mode version, but significantly better than the original ES2015 array destructuring version. Again using color boxing to explain the bytecode in terms of the original JavaScript:
function render() {
  const { value: x, set: setX } = useState(0);
  return '<div>' + x + '</div>';
}
Summarizing the cost of the destructuring: an explicit undefined/null check on the returned object plus two LOAD_ICs for the named properties. Again no additional temporary objects need to be constructed.
Jason Miller points out that it is also possible to destructure the return value of array-returning functions like useState() using object destructuring with numeric keys, avoiding the cost of the iteration protocol:
function render() {
  const { 0: x, 1: setX } = useState(0);
  return '<div>' + x + '</div>';
}
A somewhat representative benchmark seems to indicate that this outperforms both the array destructuring approach and Babel's transpiled output.
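A micro-benchmark along these lines could be sketched as follows. This is an illustrative harness, not the benchmark referenced above, and the timings it prints depend heavily on engine version and warm-up:

```javascript
// Oversimplified useState, as in the analysis above.
function useState(x) {
  return [x, function (y) { x = y; }];
}

// The three destructuring styles under comparison.
function arrayDestructuring() {
  const [x, setX] = useState(0);
  return x;
}
function numericKeyDestructuring() {
  const { 0: x, 1: setX } = useState(0);
  return x;
}
function babelLoose() {
  var _useState = useState(0),
      x = _useState[0],
      setX = _useState[1];
  return x;
}

// Crude timing loop; `sink` keeps the calls from being optimized away.
function time(fn, iterations) {
  const start = Date.now();
  let sink = 0;
  for (let i = 0; i < iterations; i++) sink += fn();
  return { ms: Date.now() - start, sink };
}

for (const fn of [arrayDestructuring, numericKeyDestructuring, babelLoose]) {
  console.log(fn.name + ':', time(fn, 1e5).ms, 'ms');
}
```

All three return the same value; only the destructuring machinery differs.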
Looking at the bytecode reveals that it's indeed close to the object destructuring with real named fields (i.e. not using numeric indices):
  0 : StackCheck
  1 : LdaGlobal [0], [0]
  4 : Star r3
  6 : LdaZero
  7 : Star r4
  9 : CallUndefinedReceiver1 r3, r4, [2]
 13 : Star r0
 15 : JumpIfUndefined [6] (0xe6cf78a0627 @ 21)
 17 : Ldar r0
 19 : JumpIfNotNull [16] (0xe6cf78a0635 @ 35)
 21 : LdaSmi [80]
 23 : Star r3
 25 : LdaConstant [1]
 27 : Star r4
 29 : CallRuntime [NewTypeError], r3-r4
 34 : Throw
 35 : LdaZero
 36 : LdaKeyedProperty r0, [4]
 39 : Star r1
 41 : LdaSmi [1]
 43 : LdaKeyedProperty r0, [6]
 46 : Star r2
 48 : LdaConstant [2]
 50 : Star r3
 52 : Ldar r1
 54 : Add r3, [8]
 57 : Star r3
 59 : LdaConstant [3]
 61 : Add r3, [9]
 64 : Return
This is 65 bytes of bytecode, so pretty good. Color boxing to match it back to the original JavaScript:
function render() {
  const { 0: x, 1: setX } = useState(0);
  return '<div>' + x + '</div>';
}
But this solution suffers from the same problem as the object destructuring: it's not really easy on the eyes. You don't need to agree on names, but object destructuring with numeric indices is not something JavaScript developers come across often, so it might lead to surprises, especially for newcomers.
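One such surprise is that numeric-key object destructuring is just a property load and never touches the iteration protocol, so it behaves differently from array destructuring on anything that is iterable but not array-like. A small illustration:

```javascript
// Numeric-key object destructuring is a plain property load:
const { 0: a, 1: b } = ['x', 'y'];   // a === 'x', b === 'y'
const { 0: c, 1: d } = 'hi';         // works on strings: c === 'h', d === 'i'

// A Set is iterable but has no indexed properties, so the two
// destructuring styles disagree:
const { 0: e } = new Set(['only']);  // e === undefined (no '0' property)
const [f] = new Set(['only']);       // f === 'only' (uses the iterator)
```

That asymmetry is fine for the useState() case, which always returns an array, but it is the kind of thing that can trip up newcomers.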
Now it's worth looking into the optimized code as well. The render functions will likely be called often while interacting with a React web application, so there's a good chance they'll become hot.
Let's first consider the optimized code that TurboFan generates for the original ES2015 array destructuring version:
  0 movq rbx,[rcx-0x20]
  4 testb [rbx+0xf],0x1
  8 jnz CompileLazyDeoptimizedCode
  e push rbp
  f movq rbp,rsp
 12 push rsi
 13 push rdi
 14 subq rsp,0x10
 18 movq [rbp-0x18],rsi
 1c cmpq rsp,[r13+0x1218]
 23 jna 0x135
 29 movq rdi,<JSFunction useState>
 33 movq rsi,[rdi+0x1f]
 37 movq rax,<JSGlobal Object>
 41 push rax
 42 push 0x0
 44 movl rax,0x1
 49 movq rcx,<undefined>
 4d movq rbx,rax
 50 movq rdx,rcx
 53 movq rcx,[rdi+0x2f]
 57 addq rcx,0x3f
 5b call rcx
 5d test al,0x1
 5f jz Deoptimize
 65 movq rbx,<JSArrayMap(PACKED_ELEMENTS)>
 6f cmpq [rax-0x1],rbx
 73 jnz Deoptimize
 79 movq rbx,[rax+0xf]        ; load elements
 7d movl rax,[rax+0x1b]       ; load length
 80 cmpl rax,0x0              ; check 0th element in bounds
 83 jna 0x153
 89 movq rbx,[rbx+0xf]        ; is in-bounds -> load the element
 8d movq rcx,rbx
 90 movl rbx,0x1              ; set iter index to the next one
 95 xorl rsi,rsi              ; set iterRes.done to 0
 97 xorl rdx,rdx
 99 cmpl rsi,0x0              ; check iterRes.done
 9c jnz 0xaa
 a2 movq rsi,rdx              ; ??? Some bizarrely reused 0 ???
 a5 jmp 0xb3
 aa movl rsi,0x1              ; set iterRes.done to 1 (uselessly!?)
 af movq rcx,<undefined>
 b3 cmpl rsi,0x0              ; check iterRes.done (again!?)
 b6 jnz 0xc7
 bc cmpl rbx,rax              ; check next element in bounds
 be jnc 0x16a
 c4 cmpl rdx,0x0              ; ??? Missing jump, perhaps jump-threaded away ???
 c7 movq rax,[r13+0x8618]
 ce movq [r13+0x8618],rax
 d5 movq rax,<String[5]: <div>>
 df movq rsi,<NativeContext[248]>
 e9 movq rbx,rcx
 ec movq r10,<Code: StringAdd_ConvertRight>
 f6 call r10
 f9 movl rbx,[rax+0xb]
 fc addl rbx,0x6
 ff cmpl rbx,0x3fffffe8
105 jnc Deoptimize
10b movq rbx,<String[6]: </div>>
115 xorl rsi,rsi
117 movq r10,<Code: StringAdd_CheckNone>
121 call r10
124 movq rsp,rbp
127 pop rbp
128 ret 0x8
12b movq rbx,0x7f5f1930ce00
135 xorl rax,rax
137 movq rsi,<NativeContext[248]>
141 movq r10,<Code: StackCheck>
14b call r10
14e jmp 0x29
153 movq [rbp-0x20],rax
157 movl rbx,0xffffffff       ; mark the iterator as exhausted
15c movl rsi,0x1              ; set iterRes.done to 1
161 movq rcx,<undefined>      ; set iterRes.value to undefined
165 jmp 0x97
16a movl rdx,0x1              ; set iterRes.done to 1
16f jmp 0xc4
This is admittedly a lot harder to read than the bytecode. Using color boxing we can again match it back to the original code:
function render() {
  const state = useState(0);
  const [x, setX] = state;
  return '<div>' + x + '</div>';
}
The interesting aspect here is that TurboFan completely eliminates not only the allocation of the JSArrayIterator instance, but also the two JSIterResultObject instances (there's no allocation code in the red part, although admittedly the optimized code could still be a little bit better).
The optimized code for the JavaScript generated by Babel in loose mode looks like this:
  0 movq rbx,[rcx-0x20]
  4 testb [rbx+0xf],0x1
  8 jnz CompileLazyDeoptimizedCode
  e push rbp
  f movq rbp,rsp
 12 push rsi
 13 push rdi
 14 subq rsp,0x8
 18 movq [rbp-0x18],rsi
 1c cmpq rsp,[r13+0x1218]
 23 jna 0xe3
 29 movq rdi,<JSFunction useState>
 33 movq rsi,[rdi+0x1f]
 37 movq rax,<JSGlobal Object>
 41 push rax
 42 push 0x0
 44 movq rdx,<undefined>
 48 movl rax,0x1
 4d movq rcx,[rdi+0x2f]
 51 addq rcx,0x3f
 55 call rcx
 57 test al,0x1
 59 jz Deoptimize
 5f movq rbx,<JSArrayMap(PACKED_ELEMENTS)>
 69 cmpq [rax-0x1],rbx
 6d jnz Deoptimize
 73 movq rbx,[rax+0xf]
 77 movl rdx,[rax+0x1b]
 7a cmpl rdx,0x0
 7d jna Deoptimize
 83 movq rbx,[rbx+0xf]
 87 cmpl rdx,0x1
 8a jna Deoptimize
 90 movq rax,<String[5]: <div>>
 9a movq rsi,<NativeContext[248]>
 a4 movq r10,<Code: StringAdd_ConvertRight>
 ae call r10
 b1 movl rbx,[rax+0xb]
 b4 addl rbx,0x6
 b7 cmpl rbx,0x3fffffe8
 bd jnc Deoptimize
 c3 movq rbx,<String[6]: </div>>
 cd xorl rsi,rsi
 cf movq r10,<Code: StringAdd_CheckNone>
 d9 call r10
 dc movq rsp,rbp
 df pop rbp
 e0 ret 0x8
 e3 movq rbx,0x7f11a142ce00
 ed xorl rax,rax
 ef movq rsi,<NativeContext[248]>
 f9 movq r10,<Code: StackCheck>
103 call r10
106 jmp 0x29
This is very close to the perfect code you can generate here. Color boxing to read the optimized code:
function render() {
  var _useState = useState(0),
      x = _useState[0],
      setX = _useState[1];

  return '<div>' + x + '</div>';
}
This code not only avoids any unnecessary allocations, but also executes only the minimal number of instructions, i.e. it even eliminates the memory access for setX = _useState[1], since TurboFan is able to tell that setX is not used afterwards (notice that the bounds check remains in the code though).
Given the above analysis, and the fact that during page load (aka for the initial render) most code will run in the interpreter, it probably makes sense to invest in the generated bytecode for array destructuring. The observation from TurboFan here is that if the engine can prove that no one messed with Array.prototype[Symbol.iterator] and that the intrinsic %ArrayIteratorPrototype%.next() is unchanged, we don't need to actually run the iteration protocol. So we could recognize simple array destructurings like
const [a, b] = iterable;
and instead of blowing that up into the full iteration protocol inlined into the bytecode, have a single bytecode that does it instead, i.e.
DestructureArray iterable, ra-rb
which takes an iterable and produces a register list. This bytecode would also collect feedback about the iterable for the TurboFan optimization, so TurboFan is probably able to generate the perfect code sequence in case the iterable is a JSArray instance (with fast elements and no elements on the Array.prototype chain); we can even take it one step further and go MEGAMORPHIC if the iterable has fewer than two elements, which means TurboFan could get along with just a single bounds check followed by two element loads. So that might look something like this:
CheckHeapObject(iterable)
CheckMaps[array map](iterable)
length = LoadField[length](iterable)
CheckBounds(1, length)
elements = LoadField[elements](iterable)
ra = LoadElement(iterable, 0)
rb = LoadElement(iterable, 1)
which is the best optimized code possible and will even beat the Babel loose mode code (although not by a lot).
More important than the TurboFan considerations is that for Ignition we can completely encapsulate the iteration protocol in one bytecode (in the same spirit as for the ArrayCloneIC), which means even before going to optimized code we can skip the iteration protocol if we see that the iterable is a JSArray and no one messed with the iterator methods (aka the FastArrayIteration protector is intact).
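To make "no one messed with the iterator methods" concrete, here is the kind of perfectly legal user code that would invalidate such a protector and force the fast path back to the full iteration protocol. The doubling iterator is a made-up example, chosen only because its effect on destructuring is observable:

```javascript
// Save the built-in array iterator so we can restore it afterwards.
const original = Array.prototype[Symbol.iterator];

// "Messing with" the iterator: replace it with one that doubles
// every element. Array destructuring goes through Symbol.iterator,
// so it must observe this patch.
Array.prototype[Symbol.iterator] = function () {
  let i = 0;
  const self = this;
  return {
    next() {
      return i < self.length
        ? { value: self[i++] * 2, done: false }
        : { value: undefined, done: true };
    },
  };
};

const [a, b] = [1, 2];  // a === 2, b === 4: the patch is observed

Array.prototype[Symbol.iterator] = original;
```

Because such patches are possible, the engine can only skip the protocol while the protector cell confirms nobody has done this; the first patch anywhere in the page permanently invalidates the fast path.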
We will probably have to limit this to one- and two-element patterns initially, since TurboFan cannot really handle stub calls that return more than two values (we once supported three-value returns to implement part of for..in, but we've since improved that, so they were no longer needed). And for the case where DestructureArray doesn't have usable feedback, TurboFan needs to insert a builtin call to handle the generic case.
Ross McIlroy suggested that we could do something similar to how we support for..in and have a bytecode that looks like this:
[iterator, next_method] = GetIterator(iterable)
loop {
iter_result = IteratorNext(iterator, next_method)
if (IteratorComplete(iterator, iter_result)) break;
value = IteratorValue(iterator, iter_result);
…
}
where, for the case of a JSArray iterable with fast elements, when both Array.prototype[Symbol.iterator] and %ArrayIteratorPrototype%.next are unchanged, the GetIterator bytecode does nothing but return the iterable for iterator and Smi #0 as the next_method. IteratorNext returns the index (the current value of next_method) and increments the next_method Smi. In IteratorComplete we perform the bounds check against iterator.length (iterator will be the JSArray iterable), and finally IteratorValue loads the value at the iter_result index (which might hit the slow path and walk up the prototype chain). For the generic case, the actual iterator and iter result objects would be allocated and held in the respective registers.
This wouldn't be quite as fast as the special-case approach above, but should be comparable to the object-destructuring-the-array approach for fast JSArrays. And it would allow us to sort out for..of as well. Also, this would be easier to match to the specification, which has similarly named abstract operations. Some of these operations will have to collect feedback to allow TurboFan to generate perfect optimized code.
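The fast-array behavior of these proposed bytecodes can be modeled in plain JavaScript. The functions below are hypothetical stand-ins for the bytecodes (they are not real V8 APIs), and since JavaScript cannot increment a Smi held in a register by reference, the caller bumps the index that next_method represents:

```javascript
// Fast path assumed: iterable is a JSArray and the protector is intact,
// so no JSArrayIterator is allocated; the "iterator" is the array itself
// and the "next method" slot just holds a Smi index.
function GetIterator(iterable) {
  return [iterable, 0];
}
function IteratorNext(iterator, nextMethod) {
  return nextMethod;                     // the index plays iter_result
}
function IteratorComplete(iterator, iterResult) {
  return iterResult >= iterator.length;  // the bounds check
}
function IteratorValue(iterator, iterResult) {
  return iterator[iterResult];           // the element load
}

// Desugared `for (const value of [10, 20, 30]) out.push(value);`
const out = [];
let [iterator, nextMethod] = GetIterator([10, 20, 30]);
for (;;) {
  const iterResult = IteratorNext(iterator, nextMethod);
  nextMethod++;                          // the bytecode would bump the Smi
  if (IteratorComplete(iterator, iterResult)) break;
  out.push(IteratorValue(iterator, iterResult));
}
// out is [10, 20, 30]
```

In the generic (non-fast-array) case the same four operations would instead allocate and thread through real iterator and iter result objects, which is what makes the scheme match the spec's abstract operations so directly.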
A similar pattern that developers got quite excited about in ES2015 is the ability to swap the values of variables with just a single line of code, i.e.:
[a, b] = [b, a];
This currently suffers from the same problems as the general case, and would also benefit from a special DestructureArray bytecode. But in reality it would be even better to recognize this and have a Swap bytecode that completely avoids any temporary allocations if the array iteration is not messed up, i.e. something like this:
Swap ra-rb
It's currently unclear how important this is, so maybe just having the DestructureArray bytecode, which solves a more general problem, is enough. Most uses in the wild will be transpiled code anyway, so it's hard to estimate the importance.
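For reference, a loose-mode transpiler turns the swap into plain element accesses through a temporary array; the snippet below is a sketch of the rough shape of such output, not a verbatim Babel dump:

```javascript
let a = 1;
let b = 2;

// Roughly what loose-mode transpilation produces for: [a, b] = [b, a];
var _ref = [b, a];
a = _ref[0];
b = _ref[1];

// a === 2, b === 1
```

So even transpiled swaps pay for one temporary array allocation, which is exactly what a dedicated Swap bytecode would avoid.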