Fast JavaScript with Data-oriented Design
Lessons from the Firefox Profiler
Markus Stange
FOSDEM 2024
About me
Firefox
Firefox Profiler: https://profiler.firefox.com/
Markus Stange @ Mozilla
Lots of samples = slow UI?
Lots of samples = fast UI! (after a lot of optimization work)
Let’s optimize a mini profiler together!
Mini profiler features
Profile JSON format:
Toy profile format, V1
{
"samples": [
{
"time": 17880.8104,
"stack": [
{ "name": "ZwWaitForAlertByThreadId", "category": "User" },
{ "name": "RtlSleepConditionVariableSRW", "category": "User" },
{ "name": "SleepConditionVariableSRW", "category": "User" },
{ "name": "mozilla::ConditionVariable::wait(...)", "category": "User" },
{ "name": "mozilla::OffTheBooksCondVar::Wait()", "category": "User" },
...
],
"weight": 643
},
{
"time": 17721.0814,
"stack": [
{ "name": "KiSearchForNewThreadOnProcessor", "category": "Kernel" },
{ "name": "KiSwapThread", "category": "Kernel" },
...
],
"weight": 1
},
...
]
}
export type Profile = {
samples: Sample[];
};
export type Sample = {
time: number;
stack: Stack;
weight: number;
};
export type Stack = Frame[];
export type Frame = {
name: string;
category: string;
};
V1: Computing the category breakdown
export function computeCategoryBreakdown(
profile: Profile,
range: SampleIndexRange
): CategoryBreakdown {
const map = new Map();
for (let i = range.start; i < range.end; i++) {
const { stack, weight } = profile.samples[i];
const topFrame = stack[0];
const category = topFrame.category;
map.set(category, (map.get(category) ?? 0) + weight);
}
return map;
}
export type CategoryBreakdown =
Map<string, number>;
export type SampleIndexRange =
{ start: number; end: number };
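To make the V1 API concrete, here is a tiny hand-written profile run through computeCategoryBreakdown. The sample data is invented for illustration; the types and function are restated so the snippet runs on its own:

```typescript
type Frame = { name: string; category: string };
type Stack = Frame[];
type Sample = { time: number; stack: Stack; weight: number };
type Profile = { samples: Sample[] };
type SampleIndexRange = { start: number; end: number };

function computeCategoryBreakdown(
  profile: Profile,
  range: SampleIndexRange
): Map<string, number> {
  const map = new Map<string, number>();
  for (let i = range.start; i < range.end; i++) {
    const { stack, weight } = profile.samples[i];
    const category = stack[0].category; // category of the top frame
    map.set(category, (map.get(category) ?? 0) + weight);
  }
  return map;
}

// Three samples, two categories; weights accumulate per category.
const profile: Profile = {
  samples: [
    { time: 1.0, stack: [{ name: "a", category: "User" }], weight: 3 },
    { time: 2.0, stack: [{ name: "b", category: "Kernel" }], weight: 1 },
    { time: 3.0, stack: [{ name: "a", category: "User" }], weight: 2 },
  ],
};
const breakdown = computeCategoryBreakdown(profile, { start: 0, end: 3 });
// breakdown.get("User") === 5, breakdown.get("Kernel") === 1
```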
V1: Computing the heaviest stack
export function computeHeaviestStack(
profile: Profile, range: SampleIndexRange
): Stack {
const map = new Map();
let heaviestStackWeight = 0;
let heaviestStack: Stack = [];
for (let i = range.start; i < range.end; i++) {
const { stack, weight } = profile.samples[i];
const stackJsonString = JSON.stringify(stack);
const stackWeight = (map.get(stackJsonString) ?? 0) + weight;
map.set(stackJsonString, stackWeight);
if (stackWeight > heaviestStackWeight) {
heaviestStackWeight = stackWeight;
heaviestStack = stack;
}
}
return heaviestStack;
}
Is it fast?
Throughput of computing the category breakdown: 104.9 nanoseconds per sample
Throughput of computing the heaviest stack: 36780.4 nanoseconds per sample
For 100,000 samples: 10.49ms and 3.6 seconds
Reasonable for small profiles, but it quickly falls over as profiles get bigger.
The JSON file is gigantic and repetitive.
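For context, per-sample throughput numbers like these come from a microbenchmark loop. A minimal harness of that general shape (a sketch, not the benchmark actually used for these slides; nsPerSample is an invented name) could look like:

```typescript
// Rough per-sample cost of a function, in nanoseconds.
// Runs the work once to warm up the JIT, then times repeated runs.
function nsPerSample(
  run: () => void,
  sampleCount: number,
  iterations: number = 100
): number {
  run(); // warm-up so the engine compiles the hot loop first
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    run();
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / (iterations * sampleCount);
}

// Example: time a trivial summing loop over 100,000 fake sample weights.
const weights = new Float64Array(100_000).fill(1);
let total = 0;
const ns = nsPerSample(() => {
  total = 0;
  for (let i = 0; i < weights.length; i++) total += weights[i];
}, weights.length);
```

Real microbenchmarking in JavaScript needs more care than this (GC pauses, JIT tiers, dead-code elimination); the sketch only illustrates the unit of measurement.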
Toy profile format, V2: Lots of indexes
export type Profile = {
samples: Sample[];
stacks: StackNode[];
frames: Frame[];
categories: string[];
};
export type Sample = {
time: number;
stackIndex: number;
weight: number;
};
export type StackNode = {
parentStackIndex: number | null;
frameIndex: number;
};
export type Frame = {
name: string;
categoryIndex: number;
};
{
"samples": [
{ "time": 17880.8104, "stackIndex": 27, "weight": 643 },
{ "time": 17721.0814, "stackIndex": 87, "weight": 1 },
...
],
"stacks": [
{ "frameIndex": 24, "parentStackIndex": null },
{ "frameIndex": 23, "parentStackIndex": 0 },
{ "frameIndex": 22, "parentStackIndex": 1 },
...
],
"frames": [
{ "name": "ZwWaitForAlertByThreadId", "categoryIndex": 0 },
{ "name": "RtlSleepConditionVariableSRW", "categoryIndex": 0 },
{ "name": "SleepConditionVariableSRW", "categoryIndex": 0 },
...
],
"categories": [
"User",
"Kernel",
"Trampoline",
"JIT",
...
]
}
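One thing V2 trades away is direct access to a sample's full stack: it now has to be reconstructed by walking parentStackIndex links from the leaf to the root. A small sketch with invented toy data (reconstructStack is an invented helper name):

```typescript
type StackNode = { parentStackIndex: number | null; frameIndex: number };
type Frame = { name: string; categoryIndex: number };

// Walk from a leaf stack node up to the root, collecting frames.
// The result is leaf-first, matching the V1 "stack" arrays.
function reconstructStack(
  stacks: StackNode[],
  frames: Frame[],
  stackIndex: number
): Frame[] {
  const result: Frame[] = [];
  let current: number | null = stackIndex;
  while (current !== null) {
    const node = stacks[current];
    result.push(frames[node.frameIndex]);
    current = node.parentStackIndex;
  }
  return result;
}

// Toy data: stack 1 represents the two-frame stack [leaf, root].
const frames: Frame[] = [
  { name: "root", categoryIndex: 0 },
  { name: "leaf", categoryIndex: 0 },
];
const stacks: StackNode[] = [
  { parentStackIndex: null, frameIndex: 0 }, // stack 0: [root]
  { parentStackIndex: 0, frameIndex: 1 },    // stack 1: [leaf, root]
];
const names = reconstructStack(stacks, frames, 1).map((f) => f.name);
// names -> ["leaf", "root"]
```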
V2: Computing the heaviest stack
Before (V1):
export function computeHeaviestStack(
profile: Profile, range: SampleIndexRange
): Stack {
const map = new Map();
let heaviestStackWeight = 0;
let heaviestStack: Stack = [];
for (let i = range.start; i < range.end; i++) {
const { stack, weight } = profile.samples[i];
const stackJsonString = JSON.stringify(stack);
const stackWeight = (map.get(stackJsonString) ?? 0) + weight;
map.set(stackJsonString, stackWeight);
if (stackWeight > heaviestStackWeight) {
heaviestStackWeight = stackWeight;
heaviestStack = stack;
}
}
return heaviestStack;
}
After (V2):
export function computeHeaviestStackIndex(
profile: Profile, range: SampleIndexRange
): number | null {
const map = new Map();
let heaviestStackWeight = 0;
let heaviestStackIndex: number | null = null;
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
const stackWeight =
(map.get(stackIndex) ?? 0) + weight;
map.set(stackIndex, stackWeight);
if (stackWeight > heaviestStackWeight) {
heaviestStackWeight = stackWeight;
heaviestStackIndex = stackIndex;
}
}
return heaviestStackIndex;
}
V1: ~30000 nanoseconds per sample
V2: 103.7 nanoseconds per sample (~300x faster)
V2: Computing the category breakdown
export function computeCategoryBreakdown(
profile: Profile,
range: SampleIndexRange
): CategoryBreakdown {
const map = new Map();
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
const frameIndex = profile.stacks[stackIndex].frameIndex;
const categoryIndex = profile.frames[frameIndex].categoryIndex;
const category = profile.categories[categoryIndex];
map.set(category, (map.get(category) || 0) + weight);
}
return map;
}
V2: Computing the category breakdown
export function computeCategoryBreakdownWithIndexKeyMap(
profile: Profile,
range: SampleIndexRange
): Map<number, number> {
const map = new Map();
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
const frameIndex = profile.stacks[stackIndex].frameIndex;
const categoryIndex = profile.frames[frameIndex].categoryIndex;
map.set(categoryIndex, (map.get(categoryIndex) || 0) + weight);
}
return map;
}
How fast is it now?
Throughput of computing the category breakdown: 47.1 nanoseconds per sample
Throughput of computing the heaviest stack: 51.1 nanoseconds per sample
(on a medium-size V2 profile)
Time to use the profiler!
Category breakdown: Replace Map
(Starting point: computeCategoryBreakdownWithIndexKeyMap from the previous slide, at 47 nanoseconds per sample.)
Category breakdown: Replace Map with typed array
export function computeCategoryBreakdownWithTypedArray(
profile: Profile,
range: SampleIndexRange
): Float64Array {
const map = new Float64Array(profile.categories.length);
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
const frameIndex = profile.stacks[stackIndex].frameIndex;
const categoryIndex = profile.frames[frameIndex].categoryIndex;
map[categoryIndex] += weight;
}
return map;
}
47 nanoseconds -> 16.3 nanoseconds (~3x faster)
Heaviest stack: Replace Map with typed array
export function computeHeaviestStackIndexWithTypedArray(
profile: Profile, range: SampleIndexRange
): number | null {
const map = new Float64Array(profile.stacks.length);
let heaviestStackWeight = 0;
let heaviestStackIndex: number | null = null;
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
const stackWeight = map[stackIndex] + weight;
map[stackIndex] = stackWeight;
if (stackWeight > heaviestStackWeight) {
heaviestStackWeight = stackWeight;
heaviestStackIndex = stackIndex;
}
}
return heaviestStackIndex;
}
51.1 nanoseconds -> 16.3 nanoseconds (~3x faster)
Where are we now?
Category breakdown: Thinking about bytes in memory
(Looking again at computeCategoryBreakdownWithTypedArray from the previous slide: every iteration starts by loading profile.samples[i]. Where do those bytes come from?)
Arrays of objects: memory layout
for (let i = range.start; i < range.end; i++) {
const { stackIndex, weight } = profile.samples[i];
...
}
[Memory-layout diagram: the samples array holds 8-byte pointers (0x1300aa60, 0x1300ab00, 0x1300ab20, ...), each pointing at a separately allocated sample object. Each object starts with a JS object header of roughly 24 bytes (varies by engine and by phase of the moon!), followed by the time, stackIndex and weight fields at 8 bytes each. For this loop, only stackIndex and weight are useful; the pointers, the object headers and the time fields are not useful, yet they fill the cache lines we fetch.]
Arrays of objects: memory layout
It’s time to do something radical.
Turning it on the side
Before:
"samples": [
{ "time": 17880.8104, "stackIndex": 27, "weight": 643 },
{ "time": 17721.0814, "stackIndex": 87, "weight": 1 },
{ "time": 17880.9783, "stackIndex": 93, "weight": 1 },
{ "time": 17881.569, "stackIndex": 102, "weight": 1 },
{ "time": 17881.8177, "stackIndex": 143, "weight": 1 },
{ "time": 17882.0657, "stackIndex": 190, "weight": 1 }
...
]
After:
"sampleTable": {
"length": 1691,
"timeColumn": [17880.8104, 17721.0814, 17880.9783, 17881.569, 17881.8177, 17882.0657, ...],
"stackIndexColumn": [27, 87, 93, 102, 143, 190, ...],
"weightColumn": [643, 1, 1, 1, 1, 1, 1, ...]
}
Turning it on the side: Everything is backwards
Before: profile.samples[i].weight
After: profile.sampleTable.weightColumn[i]
Toy profile format, V3: Tables everywhere
export type Profile = {
sampleTable: SampleTable;
stackTable: StackTable;
frameTable: FrameTable;
categories: string[];
};
export type SampleTable = {
length: number;
timeColumn: number[];
stackIndexColumn: number[];
weightColumn: number[];
};
export type StackTable = {
length: number;
parentStackIndexColumn: Array<number | null>;
frameIndexColumn: number[];
};
export type FrameTable = {
length: number;
nameColumn: string[];
categoryIndexColumn: number[];
};
{
"sampleTable": {
"length": 1691,
"timeColumn": [ 17880.8104, 17881.1623, 17881.4107, ... ],
"stackIndexColumn": [ 27, 27, 27, ... ],
"weightColumn": [ 643, 1, 1, ... ]
},
"stackTable": {
"length": 25776,
"parentStackIndexColumn": [ null, 0, 1, ... ],
"frameIndexColumn": [ 24, 23, 22, ... ]
},
"frameTable": {
"length": 6242,
"nameColumn": [ "ZwWaitForAlertByThreadId", ... ],
"categoryIndexColumn": [ 0, 0, 0, ... ]
},
"categories": [ "User", "Kernel", "Trampoline", ... ]
}
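Converting from V2's array of sample objects to the V3 column layout is a mechanical transposition. A sketch for the sample table, using the type shapes from these slides (toSampleTable is an invented helper name):

```typescript
type SampleV2 = { time: number; stackIndex: number; weight: number };
type SampleTable = {
  length: number;
  timeColumn: number[];
  stackIndexColumn: number[];
  weightColumn: number[];
};

// Transpose an array of structs into a struct of arrays.
function toSampleTable(samples: SampleV2[]): SampleTable {
  const table: SampleTable = {
    length: samples.length,
    timeColumn: [],
    stackIndexColumn: [],
    weightColumn: [],
  };
  for (const { time, stackIndex, weight } of samples) {
    table.timeColumn.push(time);
    table.stackIndexColumn.push(stackIndex);
    table.weightColumn.push(weight);
  }
  return table;
}

const table = toSampleTable([
  { time: 17880.8104, stackIndex: 27, weight: 643 },
  { time: 17721.0814, stackIndex: 87, weight: 1 },
]);
// table.weightColumn -> [643, 1]
```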
V3: Computing the heaviest stack
function computeHeaviestStackIndexBasic(
profile: Profile, range: SampleIndexRange
): number | null {
const map = new Float64Array(profile.stackTable.length);
let heaviestStackWeight = 0;
let heaviestStackIndex: number | null = null;
for (let i = range.start; i < range.end; i++) {
const stackIndex = profile.sampleTable.stackIndexColumn[i];
const weight = profile.sampleTable.weightColumn[i];
const stackWeight = map[stackIndex] + weight;
map[stackIndex] = stackWeight;
if (stackWeight > heaviestStackWeight) {
heaviestStackWeight = stackWeight;
heaviestStackIndex = stackIndex;
}
}
return heaviestStackIndex;
}
16.3 nanoseconds -> 8.7 nanoseconds (~2x faster)
V3: Computing the category breakdown
function computeCategoryBreakdownBasic(
profile: Profile,
range: SampleIndexRange
): Float64Array {
const map = new Float64Array(profile.categories.length);
for (let i = range.start; i < range.end; i++) {
const stackIndex = profile.sampleTable.stackIndexColumn[i];
const weight = profile.sampleTable.weightColumn[i];
const frameIndex = profile.stackTable.frameIndexColumn[stackIndex];
const categoryIndex = profile.frameTable.categoryIndexColumn[frameIndex];
map[categoryIndex] += weight;
}
return map;
}
16.3 nanoseconds -> 4.6 nanoseconds (~3.5x faster!)
Struct of arrays: memory layout
for (let i = range.start; i < range.end; i++) {
const stackIndex = profile.sampleTable.stackIndexColumn[i];
const weight = profile.sampleTable.weightColumn[i];
...
}
[Memory-layout diagram: stackIndexColumn (27, 87, 93, ...) and weightColumn (643, 1, 1, ...) are contiguous runs of 8-byte values, every byte useful. timeColumn (17880.8104, 17721.0814, 17880.9783, ...) sits in its own array and goes untouched by this loop ← no longer clogs the cache!]
Cache line utilization went way up!
Recap: struct of arrays
Benefits: far better cache line utilization; no per-object headers and no pointer chasing through the samples array.
Drawbacks: everything is backwards (profile.sampleTable.weightColumn[i] instead of profile.samples[i].weight); less idiomatic JavaScript.
Great work.
… Can we make it even faster?
Category breakdown: Less indirection?
(computeCategoryBreakdownBasic from the previous slides: each sample walks a chain of lookups, sample → stack → frame → category.)
We just want to know the category for a sample. We’re not interested in its stack or frame.
Category breakdown: Less indirection!
function computeCategoryBreakdownWithPrecomputedSampleCategoriesRegularArray(
profile: Profile,
sampleCategories: number[],
range: SampleIndexRange
): Float64Array {
const map = new Float64Array(profile.categories.length);
for (let i = range.start; i < range.end; i++) {
const weight = profile.sampleTable.weightColumn[i];
const categoryIndex = sampleCategories[i];
map[categoryIndex] += weight;
}
return map;
}
The mapping from samples to categories is independent of the selected range.
Computing the sample categories
const getSampleCategories = (profile: Profile): number[] => {
const sampleCategories = new Array(profile.sampleTable.length);
for (let i = 0; i < sampleCategories.length; i++) {
const stackIndex = profile.sampleTable.stackIndexColumn[i];
const frameIndex = profile.stackTable.frameIndexColumn[stackIndex];
const categoryIndex = profile.frameTable.categoryIndexColumn[frameIndex];
sampleCategories[i] = categoryIndex;
}
return sampleCategories;
};
... = computeCategoryBreakdownWithPrecomputedSampleCategoriesRegularArray(
profile,
getSampleCategories(profile),
selectedRange
);
Computing the sample categories only once
import memoize from 'memoize-one';
const getSampleCategories = memoize((profile: Profile): number[] => {
const sampleCategories = new Array(profile.sampleTable.length);
for (let i = 0; i < sampleCategories.length; i++) {
const stackIndex = profile.sampleTable.stackIndexColumn[i];
const frameIndex = profile.stackTable.frameIndexColumn[stackIndex];
const categoryIndex = profile.frameTable.categoryIndexColumn[frameIndex];
sampleCategories[i] = categoryIndex;
}
return sampleCategories;
});
... = computeCategoryBreakdownWithPrecomputedSampleCategoriesRegularArray(
profile,
getSampleCategories(profile),
selectedRange
);
Computing the sample categories only once
import memoize from 'memoize-one';
const getSampleCategories = (profile: Profile): number[] => getSampleCategoriesFromColumns(
profile.sampleTable.stackIndexColumn,
profile.stackTable.frameIndexColumn,
profile.frameTable.categoryIndexColumn
);
const getSampleCategoriesFromColumns = memoize(
(sampleStacks: number[], stackFrames: number[], frameCategories: number[]): number[] => {
const sampleCategories = new Array(sampleStacks.length);
for (let i = 0; i < sampleCategories.length; i++) {
const stackIndex = sampleStacks[i];
const frameIndex = stackFrames[stackIndex];
const categoryIndex = frameCategories[frameIndex];
sampleCategories[i] = categoryIndex;
}
return sampleCategories;
});
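memoize-one caches only the most recent call: it compares the new arguments to the previous ones (with === by default) and reruns the function only when they differ. A stripped-down sketch of that idea, not the library's actual implementation:

```typescript
// Simplified last-call memoizer in the spirit of memoize-one:
// caches a single (args, result) pair and compares args with ===.
function memoizeOne<Args extends unknown[], R>(
  fn: (...args: Args) => R
): (...args: Args) => R {
  let lastArgs: Args | null = null;
  let lastResult!: R;
  return (...args: Args): R => {
    if (
      lastArgs !== null &&
      lastArgs.length === args.length &&
      args.every((arg, i) => arg === lastArgs![i])
    ) {
      return lastResult;
    }
    lastArgs = args;
    lastResult = fn(...args);
    return lastResult;
  };
}

let calls = 0;
const double = memoizeOne((xs: number[]) => {
  calls++;
  return xs.map((x) => x * 2);
});
const col = [1, 2, 3];
double(col);
double(col);       // same array identity: cached, fn not rerun
// calls -> 1
double([1, 2, 3]); // new array identity: recomputed
// calls -> 2
```

This is why passing the individual columns (rather than the whole profile object) matters: as long as the immutable column arrays keep their identity, the memoized result stays valid.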
… Did it work?
It worked!
4.6 nanoseconds → 3.3 nanoseconds (30% faster!)
It seems like we’re no longer memory bound!
The speedup can be higher on machines with slower memory.
Taking a step back
What just happened?
Struct of arrays with immutable columns + memoization = ❤️
Taking another step back
What is Data-oriented Design?
General Remarks
Thank You
@mstange:mozilla.org
Install the Firefox Profiler: https://profiler.firefox.com/
Talk to profiler people on https://chat.mozilla.org/#/room/#profiler:mozilla.org
Happy profiling!