1 of 16

How WebKit Works

Adam Barth (abarth)

October 30, 2012

2 of 16

What is WebKit?

WebKit is a rendering engine for web content

WebKit is not a browser, a science project, or the solution to every problem

HTML, JavaScript, CSS → WebKit → Rendering of a web page

3 of 16

Major Components

  • WebCore (HTML, CSS, DOM, etc.): the focus of this talk
  • WTF (Data structures, Threading primitives)
  • Platform (Network, Storage, Graphics)
  • JavaScriptCore (JavaScript Virtual Machine)
  • Bindings (JavaScript API, Objective-C API)
  • WebKit and WebKit2 (Embedding API)

4 of 16

Life of a Web Page

Network → Loader → HTML Parser → DOM → Render Tree → Graphics Context

  • Script interacts with the DOM
  • CSS combines with the DOM to produce the Render Tree

5 of 16

Pages, Frames, and Documents

[Diagram: a Page has a Main Frame; every Frame holds a Document and can have child Frames, each with its own Document]
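The Page, Frame, and Document hierarchy can be pictured as a small tree. The following is only a sketch in Python, not WebCore's C++ classes: the field names, the `descendants` helper, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    url: str  # hypothetical field; stands in for the loaded content


@dataclass
class Frame:
    document: Document
    children: List["Frame"] = field(default_factory=list)

    def descendants(self):
        """Yield this frame and every nested frame, depth first."""
        yield self
        for child in self.children:
            yield from child.descendants()


@dataclass
class Page:
    main_frame: Frame


# One page whose main frame embeds two subframes, one of them nested further.
page = Page(Frame(Document("https://example.com/"), [
    Frame(Document("https://example.com/ad")),
    Frame(Document("https://example.com/widget"),
          [Frame(Document("about:blank"))]),
]))

urls = [frame.document.url for frame in page.main_frame.descendants()]
```

Walking from `page.main_frame` visits every frame exactly once, and each frame contributes exactly one document.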

6 of 16

Lifecycle of a Frame

  • Committed is the quiescent state

Uninitialized

Initial Document

Provisional

Ready to Commit

Committed

Checking Policy

7 of 16

How the Loader Works (Idealized)

MemoryCache

CachedResourceLoader

CachedResource

ResourceRequest

ResourceLoader

ResourceHandle

CachedResourceRequest

The Loader is actually very messy and complicated, but we have a long-term project to clean up its nuttiness

Platform-specific code
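In the idealized picture, a resource request is answered from the MemoryCache when possible and otherwise handed down toward the platform's network code. A minimal sketch of that lookup in Python follows; the method names, the `network_loads` counter, and the `fake_network_fetch` stand-in for ResourceLoader and ResourceHandle are assumptions, and the real loader is far more involved.

```python
class MemoryCache:
    """Maps URLs to already-loaded resources (stand-in for MemoryCache)."""

    def __init__(self):
        self._resources = {}

    def get(self, url):
        return self._resources.get(url)

    def add(self, url, resource):
        self._resources[url] = resource


class CachedResourceLoader:
    """Checks the cache first and only goes to the network on a miss."""

    def __init__(self, cache, fetch):
        self.cache = cache
        self.fetch = fetch        # hypothetical stand-in for the network path
        self.network_loads = 0

    def request_resource(self, url):
        resource = self.cache.get(url)
        if resource is None:
            self.network_loads += 1
            resource = self.fetch(url)
            self.cache.add(url, resource)
        return resource


def fake_network_fetch(url):
    # Pretend to hit the network; returns a fresh object per call.
    return {"url": url, "data": b"..."}


loader = CachedResourceLoader(MemoryCache(), fake_network_fetch)
first = loader.request_resource("https://example.com/logo.png")
second = loader.request_resource("https://example.com/logo.png")
```

The second request returns the same cached object without another network load.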

8 of 16

How the HTML Parser Works

Bytes → Characters → Tokens → Nodes → DOM

  • Bytes: 3C 62 6F 64 79 3E 48 65 6C 6C 6F 2C 20 3C 73 70 61 6E 3E 77 6F 72 6C 64 21 3C 2F 73 70 61 6E 3E 3C 2F 62 6F 64 79 3E
  • Characters: <body>Hello, <span>world!</span></body>
  • Tokens (produced by the Tokenizer): StartTag: body, "Hello, ", StartTag: span, "world!", EndTag: span
  • Nodes (produced by the TreeBuilder): body, "Hello, ", span, "world!"
  • DOM: body contains "Hello, " and span; span contains "world!"
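The stages above can be sketched end to end on the slide's own example. This is a toy in Python that assumes well-formed, attribute-free markup; the real tokenizer and tree builder follow the HTML parsing specification and handle far more cases. The function names and the tuple and dict shapes are illustrative assumptions.

```python
import re

# Matches either a tag such as <body> or </span>, or a run of text.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


def tokenize(characters):
    """Characters -> tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(characters):
        closing, name, text = match.groups()
        if text is not None:
            tokens.append(("Character", text))
        elif closing:
            tokens.append(("EndTag", name))
        else:
            tokens.append(("StartTag", name))
    return tokens


def build_tree(tokens):
    """Tokens -> nested nodes, using a stack of open elements."""
    root = {"name": "#document", "children": []}
    open_elements = [root]
    for kind, value in tokens:
        if kind == "StartTag":
            node = {"name": value, "children": []}
            open_elements[-1]["children"].append(node)
            open_elements.append(node)
        elif kind == "EndTag":
            open_elements.pop()  # assumes tags are properly nested
        else:
            open_elements[-1]["children"].append(value)
    return root


raw = bytes.fromhex(
    "3C 62 6F 64 79 3E 48 65 6C 6C 6F 2C 20 3C 73 70 61 6E 3E "
    "77 6F 72 6C 64 21 3C 2F 73 70 61 6E 3E 3C 2F 62 6F 64 79 3E"
)
characters = raw.decode("ascii")   # Bytes -> characters
tokens = tokenize(characters)
dom = build_tree(tokens)
```

Unlike the slide, the toy tokenizer also emits the final EndTag for body, since it simply reports every tag it sees.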

9 of 16

Preload Scanning for Fun and Profit

Script execution can change the input stream

Preload scanner tokenizes ahead

  • When parser is blocked on external scripts
  • Starts resource loads earlier

Mary had a little lamb

Tokenizer

TreeBuilder

document.write("<textarea>");
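The idea of scanning ahead can be sketched as a lightweight pass over the not-yet-parsed markup that only collects URLs worth fetching early. The Python below is an assumption-laden toy: it looks only at `src` attributes on `script` and `img` tags with a regular expression, whereas the real preload scanner runs an actual tokenizer. Its result is speculative by nature, since a blocking script may still call document.write and change the stream.

```python
import re

# Finds src="..." on <script> and <img> tags in markup that the main
# parser has not reached yet.
PRELOAD_RE = re.compile(
    r"""<(script|img)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def preload_scan(unparsed_markup):
    """Return the URLs that could start loading before parsing resumes."""
    return [match.group(2) for match in PRELOAD_RE.finditer(unparsed_markup)]


# Markup that sits behind a parser-blocking external script.
pending = '<p>Mary had a little lamb</p><img src="lamb.png"><script src="b.js"></script>'
early_loads = preload_scan(pending)
```

Here `early_loads` lists the image and the second script, so both fetches can begin while the parser is still blocked.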

10 of 16

XSSAuditor

XSSAuditor examines token stream

Looks for scripts that were also in the request

  • Assumes those scripts were reflected XSS
  • Blocks them

Tokenizer

TreeBuilder

HTTP Request

HTTP Response

XSSAuditor
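The core check can be sketched as: decode the request, then see whether a script found in the response also appears there. The Python below is a deliberately naive illustration; the function names are hypothetical, and the real XSSAuditor works on tokens and uses considerably more careful matching.

```python
from urllib.parse import unquote_plus


def is_reflected(script_source, request_url):
    """True if the script text also appears in the decoded request."""
    decoded_request = unquote_plus(request_url)
    script = script_source.strip()
    return bool(script) and script in decoded_request


def audit(script_sources, request_url):
    """Return only the scripts that are allowed to run."""
    return [s for s in script_sources if not is_reflected(s, request_url)]


request = "http://example.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E"
scripts_in_response = ["alert(1)", "initSearchPage()"]
allowed = audit(scripts_in_response, request)
```

The script that was echoed from the query string is dropped, while the page's own script is left alone.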

11 of 16

DOM + CSS → Render Tree

body

Hello,

span

world!

html

head

title

Greeting

img

#footer { position: fixed; bottom: 0; left: 0 }

body > span { font-weight: bold; }

RenderBlock

RenderInline

RenderText

RenderImage

RenderText

bold

Layout

RenderBlock

fixed
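One way to picture the step from DOM to render tree: walk the DOM, skip anything that is not displayed, and create a render object whose type depends on the element and its display value. The Python sketch below assumes a tiny hard-coded `DISPLAY` table in place of real style resolution, and its tuple representation is purely illustrative.

```python
# Hypothetical stand-in for resolved CSS 'display' values.
DISPLAY = {
    "html": "block", "head": "none", "title": "none",
    "body": "block", "span": "inline", "img": "inline",
}


def build_render_tree(node):
    """DOM node -> render object, or None if the node is not rendered."""
    if isinstance(node, str):
        return ("RenderText", node)
    tag, children = node
    display = DISPLAY.get(tag, "inline")
    if display == "none":
        return None  # no render object, and none for its children either
    if tag == "img":
        kind = "RenderImage"
    elif display == "block":
        kind = "RenderBlock"
    else:
        kind = "RenderInline"
    rendered = [build_render_tree(child) for child in children]
    return (kind, [r for r in rendered if r is not None])


dom = ("html", [
    ("head", [("title", ["Greeting"])]),
    ("body", ["Hello, ", ("span", ["world!"]), ("img", [])]),
])
render_tree = build_render_tree(dom)
```

Note that head and title produce no render objects at all, which is one reason the render tree is not a one-to-one copy of the DOM.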

12 of 16

Anonymous RenderObjects

  • Not every RenderObject has a DOM Node
  • Every RenderBlock either:
    • Has all inline children
    • Has no inline children

div

Hello,

div

world!

RenderBlock

RenderBlock

RenderText

RenderBlock

RenderText

Anonymous
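The all-inline-or-no-inline rule can be enforced by wrapping each run of inline children in an anonymous block whenever a block has mixed children. A sketch in Python follows; the `RenderObject` dataclass and the `make_children_uniform` name are assumptions for illustration, not WebCore's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RenderObject:
    kind: str
    inline: bool
    anonymous: bool = False  # True when no DOM node stands behind it
    children: List["RenderObject"] = field(default_factory=list)


def make_children_uniform(block):
    """Wrap runs of inline children in anonymous blocks if children are mixed."""
    kids = block.children
    has_inline = any(kid.inline for kid in kids)
    has_block = any(not kid.inline for kid in kids)
    if not (has_inline and has_block):
        return  # already all inline or all block
    fixed, run = [], []

    def flush():
        if run:
            fixed.append(RenderObject("RenderBlock", inline=False,
                                      anonymous=True, children=list(run)))
            run.clear()

    for kid in kids:
        if kid.inline:
            run.append(kid)
        else:
            flush()
            fixed.append(kid)
    flush()
    block.children = fixed


# <div>Hello, <div>world!</div></div>
hello = RenderObject("RenderText", inline=True)
inner = RenderObject("RenderBlock", inline=False,
                     children=[RenderObject("RenderText", inline=True)])
outer = RenderObject("RenderBlock", inline=False, children=[hello, inner])
make_children_uniform(outer)
```

After the call, the outer block has two block children: an anonymous one holding the "Hello," text and the inner div's block.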

13 of 16

LayerTree

  • Sparse representation of RenderTree
  • Enables accelerated compositing, scrolling

RenderBlock

RenderInline

RenderText

RenderImage

RenderText

bold

RenderBlock

fixed

RenderLayer

RenderLayer
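"Sparse" here means that only some render objects get a layer of their own, and the rest paint into the layer of their nearest ancestor that has one. The Python sketch below assumes a much simplified rule (the root and any non-statically positioned object get a layer); the real criteria are broader, and the dict-based tree and the names used are illustrative only.

```python
def needs_layer(node, is_root):
    # Simplifying assumption: only the root and positioned objects get layers.
    return is_root or node.get("position", "static") != "static"


def assign_layers(node, owner=None, is_root=True, out=None):
    """Map each render object's name to the name of the layer it paints into."""
    out = {} if out is None else out
    if needs_layer(node, is_root):
        owner = node["name"]
    out[node["name"]] = owner
    for child in node.get("children", []):
        assign_layers(child, owner, False, out)
    return out


render_tree = {
    "name": "body", "children": [
        {"name": "text-hello"},
        {"name": "span", "children": [{"name": "text-world"}]},
        {"name": "img"},
        {"name": "footer", "position": "fixed"},
    ],
}
layer_of = assign_layers(render_tree)
layers = set(layer_of.values())
```

Six render objects end up sharing two layers, one for the root and one for the fixed-position block.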

14 of 16

Yet Another Tree: LineBoxTree

  • One RootInlineBox per line of text
  • List of inline flow and inline text boxes

<div>An old silent pond...

A frog jumps into the pond,

splash! <b>Silence again.</b></div>

InlineTextBox

InlineTextBox

InlineTextBox

RootInlineBox

RootInlineBox

RootInlineBox

InlineTextBox

RenderBlock

RenderText

RenderInline

RenderText

bold
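The relationship between lines and boxes can be sketched with a greedy line breaker: words fill a line until the next one would overflow, and each line (a RootInlineBox) then holds one text box per run of same-styled words. The Python below measures width in characters and takes `max_width` as given; both are simplifying assumptions, and the function name is hypothetical.

```python
def layout_lines(runs, max_width):
    """runs: list of (style, text). Returns one list of boxes per line."""
    words = [(style, word) for style, text in runs for word in text.split()]

    # Step 1: greedy line breaking by character count.
    lines, current, width = [], [], 0
    for style, word in words:
        needed = len(word) if not current else width + 1 + len(word)
        if current and needed > max_width:
            lines.append(current)
            current, needed = [], len(word)
        current.append((style, word))
        width = needed
    if current:
        lines.append(current)

    # Step 2: within each line, merge adjacent words of the same style
    # into a single text box.
    root_boxes = []
    for line in lines:
        boxes = []
        for style, word in line:
            if boxes and boxes[-1][0] == style:
                boxes[-1] = (style, boxes[-1][1] + " " + word)
            else:
                boxes.append((style, word))
        root_boxes.append(boxes)
    return root_boxes


runs = [("plain", "A frog jumps into the pond, splash!"),
        ("bold", "Silence again.")]
root_inline_boxes = layout_lines(runs, max_width=27)
```

With a width of 27 characters this yields two lines: the first holds a single plain box, and the second holds a plain box for "splash!" followed by a bold box, mirroring the last two lines of the haiku.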

15 of 16

Conclusion

  • WebCore's main processing pipeline:
    • Loader and Parser
    • CSS, DOM, and Script
    • RenderTree, LayerTree, and InlineBoxes
  • Other major subsystems
    • Accessibility, Editing, Events, CSS, Web Inspector
    • Plugins, SVG, MathML, XSLT...
  • Other components
    • WebKit, Bindings, Platform, JavaScriptCore, WTF
    • ... 1.5 MLOC of C++
  • Learn more:

16 of 16

Thanks!

abarth@webkit.org