1 of 63

How browsers work

Domenico De Felice
DUBLIN, AUGUST 2015

domenicodefelice.blogspot.com

2 of 63

Why?

Web browsers are likely the most widely used software in the world.

They present web resources and create a sandboxed environment in which web applications can run.

The way the browser achieves this is very complex and dictated by many different standards.

Some mechanisms are deceptive and/or counter-intuitive.

Understanding how browsers work gives us useful insights for improving the efficiency of our websites and web applications, and the general structure of our code.

3 of 63

Complexity

Of course it is not possible to explore all the details of how a browser works in a single presentation.

Each browser is implemented in its own way.

4 of 63

General and modular approach

Browsers follow the same standards (IE?)

Most of them share the same overall structure and the same modules/phases.

We will dwell on these shared behaviours and this will give us some insights that we can apply in our everyday work as web developers.

5 of 63

Browsers have two main modules:

  • The rendering engine, also called the layout engine

  • The JavaScript interpreter

It is easy to understand what the job of the JavaScript interpreter is.

The rendering engine is where a great deal of the process takes place.

6 of 63

Mozilla Firefox uses Gecko, made by Mozilla

Safari uses WebKit, as did Google Chrome up to version 27.
Since version 28, Chrome uses Blink, a fork of WebKit.

7 of 63

Components of a web page:

  • HTML: the content of the page/application

  • CSS: the style of the content

  • Javascript: the logic of the application, sometimes also for animations, etc.

  • other assets

8 of 63

What is the job of the rendering engine?

Starting from these files (HTML, CSS, JS, etc.), render the web page on the screen of the user.

This is done roughly in four phases:

  1. Process the HTML to build the DOM, process the CSS to build the CSSOM
  2. Combine DOM and CSSOM together into a render tree
  3. Lay out the render tree (computing geometries)
  4. Paint the render tree on the screen

Optimizing this critical path will enable us to display the content to the user as soon as possible.

9 of 63

  1. Process the HTML and CSS to build the DOM and CSSOM
  2. Combine DOM and CSSOM together into a render tree
  3. Lay out the render tree (computing geometries)
  4. Paint the render tree on the screen

The important point here is that these phases are not strictly sequential.

The engine will display content to the user as soon as possible.

Phase 2 won’t wait for phase 1 to complete before starting. It will start consuming the output of phase 1 as soon as it is available, and so will the other phases.

10 of 63

Since browsers try to show the content as soon as possible, in some situations content may be displayed before the style has been loaded: this is known as the flash of unstyled content (FOUC) problem.

An example straight from Wikipedia:

11 of 63

PHASE 1: Parse the HTML and the CSS

In this phase, the engine starts parsing the HTML.

The output is a tree called the Document Object Model (DOM) or content tree, where each HTML tag is represented by a node of the tree (a DOM node).

The mapping between HTML tags and DOM nodes is almost, but not exactly, 1:1.

The root element of the tree is the <html> element.

The DOM serves at least two purposes:

  • as an input tree for the next phases (to build the render tree)
  • as an interface to connect the web page to scripts:
    • every DOM node implements an interface that defines how the structure of the tree can be accessed and manipulated
    • examples of common DOM methods are: .getElementById(), .createElement(), .removeChild()

12 of 63

Parsers

Parsers are heavily used in computer science.

They are software components that take input data, usually text, and build a data structure to be used by the software.

Most of the time the input data doesn’t change during the parsing process: it is static.

The HTML parser is an exception. It is a reentrant parser: its input can change dynamically while it is being parsed.

This makes this phase more complex and leads us to our first real insight. Let’s see why.

13 of 63

Why is HTML parsing reentrant?

Because:

  • Javascript code is executed immediately when the parser reaches a <script> tag,

  • Javascript can modify the input, e.g. using document.write

The HTML parsing is synchronous

14 of 63

The HTML parsing is synchronous

When the parser reaches a <script> tag:

  • it stops parsing
  • it fetches the script from the network, if it is external
  • it yields control to the JavaScript engine, which executes the code
  • parsing is resumed

Since <script>s block parsing, moving them to the bottom of the page (or just after the above-the-fold content) makes the page render faster.
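A toy model of the steps above may make the reentrancy more concrete. It is only a sketch under loud assumptions: tokens are plain strings, a token such as script:banner stands for an inline <script>, and the write callback is a hypothetical stand-in for document.write, not the real API.

```javascript
// Toy reentrant parser: scripts run synchronously and may extend the input.
function parse(tokens, scripts) {
  const input = [...tokens]; // the input stream, which scripts may modify
  const dom = [];            // node names, in the order they were created
  let position = 0;

  // Hypothetical stand-in for document.write: the written tokens are
  // inserted right after the script that is currently running.
  const write = (...newTokens) => input.splice(position, 0, ...newTokens);

  while (position < input.length) {
    const token = input[position++];
    if (token.startsWith('script:')) {
      // Parsing is paused here until the script has run to completion.
      scripts[token.slice('script:'.length)](write);
    } else {
      dom.push(token);
    }
  }
  return dom;
}

const dom = parse(['h1', 'script:banner', 'p'], {
  banner: (write) => write('div', 'span'),
});
console.log(dom); // [ 'h1', 'div', 'span', 'p' ]
```

The nodes written by the script end up between h1 and p, which is why the parser cannot simply skip ahead while a script is pending.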

15 of 63

defer and async

Two attributes of the <script> tag can also help when loading external scripts.

Marking a script with the defer attribute makes the browser fetch it without blocking the parser and run it only when the DOM has been completely built.

[Figure: browser support for the defer attribute. Source: caniuse.com]

16 of 63

defer and async

The async attribute makes the browser fetch the script without blocking the parser and run it as soon as it is available.

[Figure: browser support for the async attribute. Source: caniuse.com]
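The difference between the two attributes can be sketched with a small timeline model. Everything here is an assumption made for illustration: the field names mode and fetchedAt, the arbitrary time units, and the simplification that only download time and the end of parsing matter.

```javascript
// Toy model: in which order do async and defer scripts run?
// scripts are listed in document order; times are arbitrary units.
function executionOrder(scripts, parsingDoneAt) {
  const runs = [];
  let lastDeferredRun = parsingDoneAt;
  for (const script of scripts) {
    if (script.mode === 'async') {
      // async: runs as soon as its download has completed
      runs.push({ name: script.name, at: script.fetchedAt });
    } else if (script.mode === 'defer') {
      // defer: runs after parsing, never before an earlier deferred script
      lastDeferredRun = Math.max(lastDeferredRun, script.fetchedAt);
      runs.push({ name: script.name, at: lastDeferredRun });
    }
  }
  // The sort is stable, so ties keep document order.
  return runs.sort((a, b) => a.at - b.at).map((run) => run.name);
}

const order = executionOrder(
  [
    { name: 'a.js', mode: 'defer', fetchedAt: 50 },
    { name: 'b.js', mode: 'async', fetchedAt: 30 },
    { name: 'c.js', mode: 'defer', fetchedAt: 20 },
  ],
  100 // parsing completes at t = 100
);
console.log(order); // [ 'b.js', 'a.js', 'c.js' ]
```

In this model c.js is downloaded first but still runs last, because deferred scripts keep their document order, while the async script runs whenever it happens to arrive.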

17 of 63

Curiosity

In both Gecko and WebKit, when the engine is blocked fetching and executing a script, a second thread starts parsing the rest of the document looking for external resources to load: it won’t modify the DOM, it will just start fetching those resources.

18 of 63

What about CSS?

Javascript blocked the parsing because it could modify the document.

CSS can’t modify the structure of the document, so it would seem we have no reasons to block the parsing.

19 of 63

CSS is blocking

CSS itself is a blocking resource,

for two reasons.

20 of 63

1- Scripts

JavaScript could ask for style information that has not been parsed yet.

Different browsers behave differently, but roughly:

Mozilla Firefox holds script execution as long as there are stylesheets still being fetched and parsed.

WebKit pauses a script only when it tries to access style properties that may be affected by stylesheets that still need to be fetched or parsed.

21 of 63

This means that an external <script> in the head of the page, on Firefox:

  • blocks parsing
  • fetches the external script file
  • waits for previous stylesheets to be fetched and parsed, and for the CSSOM to be built
  • executes the script
  • resumes parsing

As before, this can be avoided by moving scripts as close to the bottom of the page as possible.

22 of 63

2- Rendering

The browser will hold the rendering process until the whole CSSOM has been built.

This can be mitigated by:

  • pushing scripts to the bottom of the page
  • delivering the stylesheets as soon as possible
    • splitting styles by media type and media query can help, since it marks some CSS resources as non-render-blocking
      • non-render-blocking resources will still be downloaded, but with lower priority

23 of 63

Network

Is there something we need to know about how the files are fetched?

24 of 63

HTTP/1.1

RFC 2616, released in 1999, defines version 1.1 of the HTTP protocol.

From section 8.1.4, Practical Considerations:

«A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.»

25 of 63

Imagine having these external files in the head of your web page:

  • my_style1.css
  • my_style2.css
  • my_style3.css
  • my_style4.css

With a limit of two connections, the download of my_style3.css won’t start until one of the first two downloads has completed.

The limits are per domain. So if two files were on www.example.com and the other two on assets.example.com, all four files could be downloaded at the same time.
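The effect of a per-host limit can be sketched with a small scheduling model. This is an illustration only: the function name downloadSchedule, the assumption that every file takes the same time to download, and the second/third host split below are all made up for the example.

```javascript
// Toy model of a per-host connection limit.
// Each host has maxPerHost connections; a file starts on the first free one.
function downloadSchedule(files, maxPerHost, secondsPerFile) {
  const slotsByHost = new Map(); // host -> time at which each connection is free
  return files.map(({ host, name }) => {
    if (!slotsByHost.has(host)) {
      slotsByHost.set(host, new Array(maxPerHost).fill(0));
    }
    const slots = slotsByHost.get(host);
    const slot = slots.indexOf(Math.min(...slots)); // earliest free connection
    const start = slots[slot];
    slots[slot] = start + secondsPerFile;
    return { name, start, end: slots[slot] };
  });
}

const names = ['my_style1.css', 'my_style2.css', 'my_style3.css', 'my_style4.css'];

// All four files on the same host, two connections, one second per file.
const oneHost = downloadSchedule(
  names.map((name) => ({ host: 'www.example.com', name })), 2, 1);
console.log(oneHost.map((file) => file.start)); // [ 0, 0, 1, 1 ]

// Two files on each of two hosts: nothing has to wait.
const twoHosts = downloadSchedule(
  names.map((name, i) => ({
    host: i < 2 ? 'www.example.com' : 'assets.example.com',
    name,
  })), 2, 1);
console.log(twoHosts.map((file) => file.start)); // [ 0, 0, 0, 0 ]
```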

26 of 63

It is important to make few HTTP requests during page load and to distribute the files across different domains.

However:

  1. the specification says SHOULD NOT: web browsers don’t have to follow it, and most browsers have a limit greater than 2
  2. in 2014 RFC 2616 was replaced by multiple RFCs (7230-7235) that, among other things, removed the limit of two connections

27 of 63

These are the connections-per-hostname limits today, according to browserscope.org:

So the limit is higher, but it is nevertheless good practice to reduce the number of HTTP requests to load the web page faster.

28 of 63

CSS

All the CSS of a page is parsed into a tree called the CSS Object Model (CSSOM).

Why is the CSS represented by a tree?

The answer is in the name itself: Cascading Style Sheets.

29 of 63

Cascading

Every element can be matched by many CSS rules

How can the browser choose which property to apply to the element?

Ordering them by:

  • origin
  • weight
  • specificity

30 of 63

CSS origin

  • Author defined - styles defined by the author of the page
  • User defined - these are styles defined by the user of the browser
  • User Agent - these are the browser default styles

Weight

Each property can have a normal weight or an !important weight.

Specificity

Note: the weight (!important) was introduced to let users override page styles. It was not intended for developers’ use!

31 of 63

Specificity

The specificity of a rule is represented by a group of four numbers (a, b, c, d):

  • The first number is 1 for inline styles, 0 otherwise

  • The second number is the number of IDs in the selector
      div.class_name has 0 IDs
      #sidebar #widget .class_name has 2 IDs
  • The third number is the number of classes, pseudo-classes and attribute selectors
  • The fourth number is the number of element names and pseudo-elements

The groups are compared like single numbers in a large base: a is the most significant digit and d the least significant.
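The counting rules above can be sketched in a few lines of JavaScript. This is a rough illustration, not how a browser does it: the regular expressions only understand simple selectors (no :not(), no special characters inside attribute values), and both function names are made up for the example.

```javascript
// Rough specificity calculator for simple selectors only.
function specificity(selector, inline = false) {
  const count = (pattern) => (selector.match(pattern) || []).length;
  const ids = count(/#[\w-]+/g);
  // classes, attribute selectors and pseudo-classes (a single colon)
  const classes = count(/\.[\w-]+|\[[^\]]*\]|(?<!:):[\w-]+/g);
  // element names (at the start or after a combinator) and pseudo-elements
  const elements = count(/(^|[\s>+~])[a-zA-Z][\w-]*|::[\w-]+/g);
  return [inline ? 1 : 0, ids, classes, elements];
}

// Compare position by position, most significant number first.
function compareSpecificity(a, b) {
  for (let i = 0; i < 4; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

console.log(specificity('div.class_name'));              // [ 0, 0, 1, 1 ]
console.log(specificity('#sidebar #widget .class_name')); // [ 0, 2, 1, 0 ]

// One ID beats any number of classes: the group is not a base-10 number.
console.log(compareSpecificity([0, 1, 0, 0], [0, 0, 11, 0]) > 0); // true
```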

32 of 63

HTML

<div id="sidebar">
  <div id="widget" class="class-div">
    <span class="span-class" style="color: red">Hello, world!</span>
  </div>
</div>

CSS

.span-class {          /* Specificity (0, 0, 1, 0) */
  color: green;
}

#sidebar #widget {     /* Specificity (0, 2, 0, 0) */
  color: orange;
}

#sidebar {             /* Specificity (0, 1, 0, 0) */
  color: yellow;
}

#sidebar .class-div {  /* Specificity (0, 1, 1, 0) */
  color: blue;
}

The inline style has a specificity of (1, 0, 0, 0).

33 of 63

When we find ourselves needing !important to get our style applied, we usually just need to increase the specificity of our rule, for example by adding a class to the selector.

34 of 63

Cascading summary

The rules that apply to an element are sorted by:

  1. origin
  2. weight
  3. specificity
  4. order of definition

Specificity is taken into consideration only if the rules have the same origin and weight.

If the specificity is also the same, the rule that was defined last is applied.
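The whole ordering can be sketched as a sort. Two assumptions go beyond what the slide states: origin and weight are folded into a single rank in the way CSS 2.1 describes (an !important user declaration beats an !important author declaration, which beats every normal declaration), and the declaration objects and function names are invented for the example.

```javascript
// Rank by origin and weight, lowest precedence first (assumed from CSS 2.1).
function precedence(declaration) {
  if (declaration.important) return declaration.origin === 'user' ? 4 : 3;
  return { 'user-agent': 0, user: 1, author: 2 }[declaration.origin];
}

function compareSpecificity(a, b) {
  for (let i = 0; i < 4; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// declarations must be given in order of definition
function winningDeclaration(declarations) {
  const sorted = declarations
    .map((declaration, order) => ({ ...declaration, order }))
    .sort((a, b) =>
      precedence(a) - precedence(b) ||
      compareSpecificity(a.specificity, b.specificity) ||
      a.order - b.order);
  return sorted[sorted.length - 1]; // the last one wins
}

const colors = [
  { origin: 'user-agent', specificity: [0, 0, 0, 1], value: 'black' },
  { origin: 'author', specificity: [0, 0, 1, 0], value: 'green' },
  { origin: 'author', specificity: [0, 1, 0, 0], value: 'yellow' },
  { origin: 'author', specificity: [1, 0, 0, 0], value: 'red' }, // inline
];
console.log(winningDeclaration(colors).value); // red

const withUserOverride = [
  ...colors,
  { origin: 'user', important: true, specificity: [0, 0, 0, 1], value: 'navy' },
];
console.log(winningDeclaration(withUserOverride).value); // navy
```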

35 of 63

The CSSOM is not the only data structure built from the CSS

CSS rule matching can be a heavy job.

To speed it up, each rule is added to one of several hash tables, according to the kind of its rightmost selector.

There are hash tables for IDs, class names and tag names, plus a general hash table for rules that don’t fit in any of the others.

When the browser needs to find the rules that apply to an element, it doesn’t need to look at every rule but only at the relevant hash table entries.

This leads us to another important insight: rules are always matched starting from their rightmost selector, which for this reason is also called the key selector.
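A minimal sketch of such an index follows. It is not taken from any engine: the function names, the plain-object elements and the very naive selector parsing are assumptions made to keep the example short.

```javascript
// Rightmost compound selector of a rule, e.g. '.a-class' in '#c2 .a-class'.
function keySelector(selector) {
  return selector.trim().split(/[\s>+~]+/).pop();
}

// File every rule under its key selector: by ID, class, tag, or "general".
function buildRuleIndex(rules) {
  const index = { ids: new Map(), classes: new Map(), tags: new Map(), general: [] };
  const add = (map, key, rule) => map.set(key, [...(map.get(key) || []), rule]);
  for (const rule of rules) {
    const key = keySelector(rule.selector);
    const id = key.match(/#([\w-]+)/);
    const className = key.match(/\.([\w-]+)/);
    const tag = key.match(/^[a-zA-Z][\w-]*/);
    if (id) add(index.ids, id[1], rule);
    else if (className) add(index.classes, className[1], rule);
    else if (tag) add(index.tags, tag[0], rule);
    else index.general.push(rule);
  }
  return index;
}

// Only these candidates need a full selector match against the element.
function candidateRules(index, element) {
  return [
    ...(index.ids.get(element.id) || []),
    ...element.classes.flatMap((name) => index.classes.get(name) || []),
    ...(index.tags.get(element.tag) || []),
    ...index.general,
  ];
}

const index = buildRuleIndex([
  { selector: '#sidebar' },
  { selector: '#container2 a' },
  { selector: '#container2 .a-class' },
  { selector: 'div p' },
  { selector: '*' },
]);
const link = { tag: 'a', id: '', classes: ['a-class'] };
console.log(candidateRules(index, link).map((rule) => rule.selector));
// [ '#container2 .a-class', '#container2 a', '*' ]
```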

36 of 63

<div id="container1">
  … thousands of <a> elements here …
  <a> … </a>
  … thousands of <a> elements here …
</div>

<div id="container2">
  <a class="a-class">...</a>
</div>

Let’s say we want to select only the <a> in container2.

#container2 a {
}

This could be quite expensive. Read left to right, the rule says: take the #container2 subtree and find all the <a>s in it.

But the browser reads it right to left: take every <a> and walk up the DOM until you find #container2.
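Right-to-left matching can be sketched as follows, for selectors that only use the descendant combinator. The element helper, the plain-object tree and the function names are all assumptions made for the example, not a browser API.

```javascript
// Hypothetical helper building a plain-object element with a parent link.
function element(tag, { id = '', classes = [], parent = null } = {}) {
  return { tag, id, classes, parent };
}

// Does the element match one compound selector such as 'a', '#x' or 'a.b'?
function matchesCompound(el, compound) {
  const tag = (compound.match(/^[a-zA-Z][\w-]*/) || [null])[0];
  const id = (compound.match(/#([\w-]+)/) || [])[1];
  const classes = [...compound.matchAll(/\.([\w-]+)/g)].map((m) => m[1]);
  return (!tag || el.tag === tag) &&
    (!id || el.id === id) &&
    classes.every((name) => el.classes.includes(name));
}

// Start from the key selector, then walk up the ancestors for the rest.
function matches(el, selector) {
  const parts = selector.trim().split(/\s+/);
  if (!matchesCompound(el, parts.pop())) return false;
  let ancestor = el.parent;
  while (parts.length > 0) {
    if (!ancestor) return false; // reached the root without a match
    if (matchesCompound(ancestor, parts[parts.length - 1])) parts.pop();
    ancestor = ancestor.parent;
  }
  return true;
}

const body = element('body');
const container1 = element('div', { id: 'container1', parent: body });
const container2 = element('div', { id: 'container2', parent: body });
const otherLink = element('a', { parent: container1 });
const wantedLink = element('a', { classes: ['a-class'], parent: container2 });

console.log(matches(wantedLink, '#container2 a')); // true
console.log(matches(otherLink, '#container2 a'));  // false, after walking up to the root
```

Every one of the thousands of <a> elements in container1 pays for that walk up to the root before it can be rejected.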

37 of 63

<div id="container1">
  … thousands of <a> elements here …
  <a> … </a>
  … thousands of <a> elements here …
</div>

<div id="container2">
  <a class="a-class">...</a>
</div>

We can write a more efficient rule:

#container2 .a-class { … }

or, if no other elements have that class, just

.a-class { … }

38 of 63

This can also be leveraged from JavaScript / jQuery

Instead of selecting an element with

$('#container .class-name');

we can achieve better performance by using:

$('.class-name', '#container');

In the first case, all the .class-name elements of the page are grabbed and then, for each of them, the DOM is traversed upwards looking for the #container element.

In the second case, the #container element is grabbed first (constant time) and then .class-name elements are looked for only in its subtree.

39 of 63

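When all the HTML has been parsed, the document is marked as interactive:

  • deferred scripts are executed
  • a DOMContentLoaded event is fired

When all the other resources (stylesheets, images, etc.) have been loaded too, the document state is set to complete and a load event is fired.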

40 of 63

2) DOM+CSSOM → Render tree

The nodes from the DOM and the CSSOM trees will be combined together in a new tree, the render tree.

Its structure is similar to the DOM tree but only visual elements will be there.

Also, some DOM elements may be represented by several nodes in the render tree (like multi-line text nodes).

It is a tree of the visual elements in the same order in which they will be rendered.
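As a rough sketch, building the render tree can be seen as copying the DOM while dropping everything that is not visual. The node shape used here (tag, style, children) is an assumption, and only display: none is handled.

```javascript
// Copy the tree, skipping subtrees that produce no visual output.
function buildRenderTree(node) {
  const style = node.style || {};
  if (style.display === 'none') return null; // the whole subtree is dropped
  const children = (node.children || [])
    .map(buildRenderTree)
    .filter((child) => child !== null);
  return { tag: node.tag, style, children };
}

const domTree = {
  tag: 'body',
  children: [
    {
      tag: 'div',
      style: { display: 'none' },
      children: [{ tag: 'p' }],
    },
    { tag: 'p', style: { color: 'red' }, children: [{ tag: '#text' }] },
  ],
};

const renderTree = buildRenderTree(domTree);
console.log(renderTree.children.map((child) => child.tag)); // [ 'p' ]
```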

41 of 63

Each node of the render tree stores the CSS properties that apply to it.

These nodes are called differently in different browsers.

WebKit calls them renderers (hence render tree).

Firefox calls them frames and the tree is called the frame tree.

Sometimes they are simply called boxes, after the CSS box concept, since each node represents a rectangular area that usually corresponds to the node’s CSS box.

42 of 63

3) Layout phase

The layout phase is also called flow (or reflow, as we’ll see later).

What else is missing to be able to render the page?

The position and dimensions of the nodes, together called their geometry.

43 of 63

Sometimes the geometry is specified in the stylesheet, sometimes not.

But in any case it needs to be computed for all nodes to be able to render them correctly.

The browser traverses the render tree once again, starting from the root, and computes the geometry of each node.

All the relative units used in the stylesheet are converted to pixels.
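A toy version of this traversal is sketched below. It assumes a drastically simplified model: every node is a block stacked vertically inside its parent, widths may be given in px, em or %, heights only in px or em, and the font size is a fixed 16px. The function and property names are made up.

```javascript
// Convert a CSS length to pixels (px, em and % only).
function toPixels(value, percentBase, fontSize) {
  const number = parseFloat(value);
  if (value.endsWith('px')) return number;
  if (value.endsWith('em')) return number * fontSize;
  if (value.endsWith('%') && percentBase !== undefined) {
    return (number / 100) * percentBase;
  }
  throw new Error(`Unsupported length: ${value}`);
}

// Compute the box of a node and, recursively, of its children.
function layout(node, x, y, availableWidth, fontSize = 16) {
  const width = node.width
    ? toPixels(node.width, availableWidth, fontSize)
    : availableWidth; // no width given: fill the parent
  let childY = y;
  for (const child of node.children || []) {
    layout(child, x, childY, width, fontSize);
    childY += child.box.height; // blocks are stacked vertically
  }
  const height = node.height
    ? toPixels(node.height, undefined, fontSize)
    : childY - y; // no height given: as tall as the content
  node.box = { x, y, width, height };
  return node.box;
}

const root = {
  children: [
    { width: '50%', height: '100px' },
    { width: '10em', height: '2em' },
  ],
};
layout(root, 0, 0, 800);
console.log(root.children[0].box); // { x: 0, y: 0, width: 400, height: 100 }
console.log(root.children[1].box); // { x: 0, y: 100, width: 160, height: 32 }
console.log(root.box);             // { x: 0, y: 0, width: 800, height: 132 }
```

Note how the height of the root depends on its children, and the position of the second child depends on the first: this is why a change in one node can force other nodes to be laid out again.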

44 of 63

4) Paint

This phase is also called painting (or repaint, as we’ll see later)

  • All nodes are traversed, starting from the root
  • their “paint” method is called
    • each node “knows” how to paint itself.
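The idea that each node knows how to paint itself is plain polymorphism, as this sketch tries to show. The class names and the display list of strings are invented for the example; real engines issue drawing commands instead.

```javascript
// Base render node: paints itself, then asks its children to do the same.
class RenderNode {
  constructor(name, children = []) {
    this.name = name;
    this.children = children;
  }
  paintSelf(displayList) {
    displayList.push(`box ${this.name}`);
  }
  paint(displayList) {
    this.paintSelf(displayList);
    for (const child of this.children) child.paint(displayList);
  }
}

// A different kind of node overrides only the way it paints itself.
class TextRenderNode extends RenderNode {
  paintSelf(displayList) {
    displayList.push(`text "${this.name}"`);
  }
}

const tree = new RenderNode('body', [
  new RenderNode('p', [new TextRenderNode('Hello')]),
]);
const displayList = [];
tree.paint(displayList);
console.log(displayList); // [ 'box body', 'box p', 'text "Hello"' ]
```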

45 of 63

Important points:

  • Layout and paint can be expensive (browser-blocking)
  • they can be retriggered

Reflows and repaints are unavoidable, but we can reduce their frequency and cost.

46 of 63

What triggers reflows and repaints?

Javascript manipulation of the page:

  • DOM manipulation
  • Stylesheet manipulation

User interactions:

  • Triggering a :hover effect
  • Scrolling the page
  • Entering text in an input box
  • Resizing the window

47 of 63

There’s not much we can do to avoid the reflows and repaints triggered by the user, but we can try to make them as cheap as possible:

  • optimizing our CSS and HTML

We have more control over the reflows and repaints triggered by our code.

48 of 63

Browsers are smart (1)

Browsers try to do as little work as possible in response to a change.

Changing a property of an element may just trigger a local repaint.

Hence, reflows and repaints can be global or incremental.

Applying changes to subtrees that are as small as possible will make the reflow/repaint faster.

In other words: strive to apply your changes to elements that are deep in the DOM and have small subtrees.

49 of 63

Reflows and repaints don’t have to happen together.

For example, if you change only the color of an element then only a repaint is triggered.

If you change the position of an element, both reflow and repaint are triggered.

50 of 63

What is more expensive, reflow or repaint?

The browser will use heavy caching to avoid recalculations.

A repaint requires the browser to search through all the elements to determine what is visible and what should be displayed.

A reflow recalculates the geometry of an element. It will also recursively reflow its children and sometimes its siblings.

A reflow of course will trigger a repaint to update the webpage.

51 of 63

Optimizing CSS and HTML for minimal reflow

  • The leaner the stylesheet, the faster the reflow
  • The deeper the DOM (the HTML structure), the more expensive reflows can be
  • Some elements and display modes are more expensive than others:
    • inline CSS styles may trigger a further reflow
    • <table>s with automatic cell widths are expensive since the browser will need more than one pass to calculate the cell dimensions
    • flexbox layout can itself be expensive, since the geometry of the flex items can change during parsing

Optimizing javascript code is more important.

52 of 63

Let’s see some javascript code:

var foo = document.getElementById('foobar');
foo.style.color = 'blue';
foo.style.marginTop = '30px';

How many reflows and repaints do we trigger?

53 of 63

Browsers are smart (2)

The browser accumulates the DOM manipulations done within a time frame in a queue and performs them in a batch.

54 of 63

Let’s see some other code

var foo = document.getElementById('foobar');
foo.style.color = 'blue';
// read the computed top margin of the element
var margin = parseInt(window.getComputedStyle(foo).marginTop, 10);
foo.style.marginTop = (margin + 10) + 'px';

How many reflows and repaints do we trigger?

55 of 63

Why?

After the first change to the color property, the manipulation is added to the accumulating queue.

Then, we ask for the computed style of that element.

To give us an up-to-date answer, the browser has to execute the queued batch immediately.

We are forcing an early reflow of the page.

56 of 63

How does this work?

When we change an element of the DOM, that element is flagged as dirty. Sometimes it may also get the “children are dirty” flag, meaning that at least one of its children needs a reflow.

Once the interval runs out, all the dirty elements are reflowed and repainted.

Asking for some properties of an element that has been marked as dirty forces the browser to perform the reflow early.
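The mechanism can be imitated with a small model. None of this is browser code: ToyLayoutEngine, ToyElement and their methods are hypothetical names, and a reflow is reduced to incrementing a counter.

```javascript
// Keeps track of dirty elements and counts how many reflows were needed.
class ToyLayoutEngine {
  constructor() {
    this.dirty = new Set();
    this.reflows = 0;
  }
  markDirty(element) {
    this.dirty.add(element);
  }
  // Reflow all the dirty elements in one go, if there are any.
  flush() {
    if (this.dirty.size === 0) return;
    this.reflows += 1;
    this.dirty.clear();
  }
}

class ToyElement {
  constructor(engine) {
    this.engine = engine;
    this.styles = {};
  }
  // A write only flags the element as dirty.
  setStyle(name, value) {
    this.styles[name] = value;
    this.engine.markDirty(this);
  }
  // A read needs up-to-date values, so it forces the pending reflow.
  readLayout(name) {
    this.engine.flush();
    return this.styles[name];
  }
}

function writesOnly() {
  const engine = new ToyLayoutEngine();
  const foo = new ToyElement(engine);
  foo.setStyle('color', 'blue');
  foo.setStyle('marginTop', '30px');
  engine.flush(); // the interval runs out
  return engine.reflows;
}

function writeReadWrite() {
  const engine = new ToyLayoutEngine();
  const foo = new ToyElement(engine);
  foo.setStyle('color', 'blue');
  foo.readLayout('marginTop'); // forces an early reflow
  foo.setStyle('marginTop', '40px');
  engine.flush(); // the interval runs out
  return engine.reflows;
}

console.log(writesOnly());     // 1
console.log(writeReadWrite()); // 2
```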

57 of 63

Forced synchronous layout

or

layout thrashing

Interleaving a lot of reads and writes to the DOM can lead to what is called layout thrashing.

58 of 63

How can we avoid layout thrashing?

  • Reorder the commands to group together DOM reads and DOM writes
  • Cache computed styles
  • Don’t change styles one by one, use CSS classes instead
  • Manipulate elements outside the DOM
    • document fragments can help here and they are widely supported by browsers
  • Set position absolute or fixed on animated elements
  • Display elements only when necessary
  • Use window.requestAnimationFrame()
  • Use a virtual DOM library
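The first item of the list, grouping reads and writes, can be automated with a tiny scheduler. The sketch below is an assumption about how such a helper might look: createBatcher, measure and mutate are made-up names, and the tasks only write to a log instead of touching a real DOM.

```javascript
// Collects read and write tasks, then runs all the reads before the writes.
function createBatcher() {
  const reads = [];
  const writes = [];
  return {
    measure(task) { reads.push(task); },
    mutate(task) { writes.push(task); },
    flush() {
      while (reads.length > 0) reads.shift()();
      while (writes.length > 0) writes.shift()();
    },
  };
}

const log = [];
const batcher = createBatcher();

// The calling code is still written element by element...
for (const name of ['a', 'b', 'c']) {
  batcher.measure(() => {
    log.push(`read ${name}`);
    batcher.mutate(() => log.push(`write ${name}`));
  });
}

// ...but the work is executed grouped by kind.
batcher.flush();
console.log(log.join(', '));
// read a, read b, read c, write a, write b, write c
```

In a browser, flush would typically be called from a window.requestAnimationFrame() callback, so that the writes land right before the next repaint.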

59 of 63

requestAnimationFrame

window.requestAnimationFrame() allows us to schedule some code to run just before the next repaint.

This is very useful because it lets us write reads and writes interleaved in our code while still having the writes executed at the right moment.

60 of 63

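// Thrashing version: every read of clientHeight forces a reflow,
// because the previous iteration has just written a new height
function doubleHeight(element) {
  var currentHeight = element.clientHeight;
  element.style.height = (currentHeight * 2) + 'px';
}

all_my_elements.forEach(doubleHeight);

// Batched version: all the reads happen now,
// all the writes are deferred to the next animation frame
function doubleHeight(element) {
  var currentHeight = element.clientHeight;

  window.requestAnimationFrame(function () {
    element.style.height = (currentHeight * 2) + 'px';
  });
}

all_my_elements.forEach(doubleHeight);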

61 of 63

RAF support

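In any case, JavaScript developers have long used a trick to achieve roughly the same result:

window.setTimeout(function () {...}, 0);

Setting a timeout with a zero delay postpones the callback until the current script has finished, so the writes are usually applied together before the next repaint, although the callback is not synchronized with rendering the way requestAnimationFrame is.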

[Figure: browser support for requestAnimationFrame. Source: caniuse.com]

62 of 63

Virtual DOM libraries

Virtual DOM libraries are gaining a lot of momentum lately, thanks to Facebook’s React library.

How does a virtual DOM work?

The developer performs the manipulations on a virtual DOM that will then:

  • aggregate the manipulations together
  • apply some heuristics
  • cache values
  • execute the manipulations at the right moment, avoiding layout thrashing
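The core idea can be sketched as a diff between two plain-object trees that produces a list of patches to apply to the real DOM in one batch. This is a deliberately naive illustration and not React’s algorithm: the node shape (tag, props, children), the patch types and the path strings are all assumptions.

```javascript
// Compare two virtual trees and list the changes needed to go from old to new.
function diff(oldNode, newNode, path = 'root') {
  if (!oldNode) return [{ type: 'create', path, node: newNode }];
  if (!newNode) return [{ type: 'remove', path }];
  if (oldNode.tag !== newNode.tag) return [{ type: 'replace', path, node: newNode }];

  const patches = [];
  const oldProps = oldNode.props || {};
  const newProps = newNode.props || {};
  for (const name of new Set([...Object.keys(oldProps), ...Object.keys(newProps)])) {
    if (oldProps[name] !== newProps[name]) {
      patches.push({ type: 'setProp', path, name, value: newProps[name] });
    }
  }

  // Children are compared position by position (no keys, no reordering).
  const oldChildren = oldNode.children || [];
  const newChildren = newNode.children || [];
  const count = Math.max(oldChildren.length, newChildren.length);
  for (let i = 0; i < count; i++) {
    patches.push(...diff(oldChildren[i], newChildren[i], `${path}/${i}`));
  }
  return patches;
}

const before = {
  tag: 'ul',
  props: { class: 'list' },
  children: [{ tag: 'li' }],
};
const after = {
  tag: 'ul',
  props: { class: 'list big' },
  children: [{ tag: 'li' }, { tag: 'li' }],
};

const patches = diff(before, after);
console.log(patches.map((patch) => `${patch.type} ${patch.path}`));
// [ 'setProp root', 'create root/1' ]
```

Only the two patches touch the real DOM, however many manipulations were performed on the virtual tree to get from before to after.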

63 of 63

Thank you for your attention :-)