1 of 111

Gecko and C++ Onboarding

5 June 2017

Nathan Froyd @froydnj

Improvements by Waldo, mstange, mccr8

2 of 111

Today’s suggested topics

  • Subsystems
  • XPCOM strings
  • XPCOM datatypes
  • infallible alloc
  • Coding style
  • C++ guidelines

3 of 111

Release the Gecko firehose

  • Subsystems
  • XPCOM strings
  • XPCOM datatypes
  • infallible alloc
  • Coding style
  • C++ guidelines
  • Refcounting
  • Smart pointers
  • XPIDL
  • WebIDL
  • IPDL
  • e10s
  • NSPR
  • Threading

4 of 111

Today’s goal

Familiarization, not expertise

5 of 111

Please ask questions!

#engineering-onboarding for backchannel

6 of 111

Overview

  • Brief overview of Gecko’s architecture
  • Gecko’s C++ dialect
  • Gecko’s C++ data structures
  • Code generation
  • Development tools

7 of 111

Gecko architecture: history

  • Gecko is 20ish years old
  • Good ideas then are not good ideas now
  • Microsoft COM-inspired architecture
    • We created cross-platform COM: XPCOM

8 of 111

Gecko architecture: XPCOM

  • Everything derives from nsISupports
  • Common interface definition language (XPIDL)
  • Set of rules for objects/methods to follow
  • Implement objects in different languages
    • This turns out to be really difficult

9 of 111

Gecko architecture: XPCOM basics

  • Classes are components
    • Implementing interfaces to expose functionality
  • Create instances of components using human-readable contract ID strings
  • Query for the interface you want to use
    • Yields null if the interface isn’t supported

10 of 111

Gecko architecture: XPCOM contract IDs

  • Contract IDs aren’t centrally registered
    • "@mozilla.org/network/server-socket;1"
    • "@mozilla.org/timer;1"
    • "@mozilla.org/file/local;1"
  • Ask/search to find them

11 of 111

Gecko architecture: XPCOM services

  • Certain components are services
    • Only one instance process-wide
    • Observer service
    • Preferences service
    • etc.
  • Convention only!
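The contract-ID and service conventions above can be modelled with a toy registry in standard C++. This is not XPCOM’s component manager: ToySupports, ToyRegistry, Register, CreateInstance and GetService are hypothetical names. The sketch shows that a contract ID is just a string key mapped to a factory, and that a “service” is merely an instance cached on first request.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for nsISupports.
struct ToySupports {
  virtual ~ToySupports() = default;
};

// Toy registry: contract ID string -> factory function.
class ToyRegistry {
 public:
  using Factory = std::function<std::shared_ptr<ToySupports>()>;

  void Register(const std::string& aContractID, Factory aFactory) {
    mFactories[aContractID] = std::move(aFactory);
  }

  // Like createInstance: a new object on every call, or null if unknown.
  std::shared_ptr<ToySupports> CreateInstance(const std::string& aContractID) const {
    auto it = mFactories.find(aContractID);
    if (it == mFactories.end()) {
      return nullptr;
    }
    return it->second();
  }

  // Like getService: created lazily on first request, then cached.
  // Nothing stops callers from also calling CreateInstance on the same
  // contract ID: "only one instance" is a convention, not enforced.
  std::shared_ptr<ToySupports> GetService(const std::string& aContractID) {
    auto cached = mServices.find(aContractID);
    if (cached != mServices.end()) {
      return cached->second;
    }
    std::shared_ptr<ToySupports> instance = CreateInstance(aContractID);
    if (instance) {
      mServices[aContractID] = instance;
    }
    return instance;
  }

 private:
  std::map<std::string, Factory> mFactories;
  std::map<std::string, std::shared_ptr<ToySupports>> mServices;
};
```

Keeping services as a caching layer over ordinary creation mirrors the slide’s point: the registry cannot tell a service from any other component.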

12 of 111

Gecko architecture: portability

  • Used to support many architectures/OSes
  • Created NSPR to abstract away differences

13 of 111

Gecko architecture: evolving

  • Components didn’t work
  • Embedding didn’t work (for many reasons)
  • Multiple languages didn’t work
    • JS is the One True Web Language
  • Portability is a good thing
    • Fewer platforms, better standards nowadays
  • Embrace one big library: libxul
    • XPCOM still useful for C++/JS interop
    • Leverage C++ more

14 of 111

Overview

  • Brief overview of Gecko’s architecture
  • Gecko’s C++ dialect
  • Gecko’s C++ data structures
  • Code generation
  • Development tools

15 of 111

C++ dialect: style

  • We have a style guide
  • Not all code conforms to the style guide
    • We are working on that
    • js/ has its own style
  • Please write new code to follow the guide
    • Follow local conventions when necessary
    • Use clang-format if you like

16 of 111

C++ dialect: infallible allocation

  • new T(...) and new T[n] are infallible
    • failure to allocate will crash the program
    • arguably better than trying to null-check every allocation consistently
  • new (fallible) T(...) is fallible
    • result must be null-checked
    • Use when we don’t control allocation amount
  • Data structures have fallible/infallible methods
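Standard C++ has a close analogue of the fallible form: `new (std::nothrow)`, which returns nullptr instead of throwing. The sketch below assumes a hypothetical policy in which sizes derived from untrusted input take a fallible path that also rejects absurd lengths up front; CopyUntrustedBytes and kMaxUntrustedLength are illustrative names, not Gecko APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical upper bound for sizes that come from untrusted input.
constexpr std::size_t kMaxUntrustedLength = 16 * 1024 * 1024;

// Hypothetical fallible helper: returns nullptr instead of crashing, so
// every caller must null-check the result (and delete[] it when done).
std::uint8_t* CopyUntrustedBytes(const std::uint8_t* aData, std::size_t aLength) {
  if (aLength > kMaxUntrustedLength) {
    return nullptr;  // refuse absurd requests outright
  }
  // std::nothrow is the closest standard counterpart of a fallible new.
  std::uint8_t* copy = new (std::nothrow) std::uint8_t[aLength];
  if (!copy) {
    return nullptr;  // out of memory: report failure to the caller
  }
  std::memcpy(copy, aData, aLength);
  return copy;
}
```

Plain `new std::uint8_t[aLength]` here would behave like Gecko’s default infallible allocation: no null check needed, but a huge attacker-chosen length would take the whole process down.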

17 of 111

C++ dialect: static analysis

  • Compiler warnings
  • Compiler-specific attributes: mfbt/Attributes.h
  • Custom static analyses
    • “S” builds on Treeherder
  • Coverity

18 of 111

C++ dialect: language features

  • We don’t use exceptions
  • We don’t use RTTI
    • Bloat, and won’t work with JS-implemented things
    • Also not supported on Android
  • Mostly complete C++ feature matrix

19 of 111

Brief note on portability

  • Rate of standards adoption differs
  • Quality of implementation differs
  • Interface guarantee mismatch
    • Memory reporting, for instance
  • Sometimes better to do our own thing

20 of 111

C++ dialect: std:: classes

  • Historically we have avoided std:: things
    • Concerns around exceptions
    • Not necessarily available everywhere
  • Use the Gecko versions absent a compelling reason to do otherwise
  • Please file bugs on terrible APIs

21 of 111

Progress bullets

  • Brief overview of Gecko’s architecture
  • Gecko’s C++ dialect
  • Gecko’s C++ data structures
  • Code generation
  • Development tools

22 of 111

XPCOM / MFBT data structures

  • We have home-grown versions of most of the CS201 data structures:
    • Growable vectors
    • Hashtables
    • Linked lists
    • Strings
  • Yes, we duplicate a lot of std:: stuff

23 of 111

Growable vectors

  • mozilla::Vector
    • mfbt/Vector.h
    • Follows C++ construction/destruction rules
  • nsTArray<T>
    • xpcom/ds/nsTArray.h

24 of 111

MFBT linked lists

  • mozilla::LinkedList<T>
    • “intrusive” doubly-linked lists
    • mfbt/LinkedList.h
  • No singly-linked list type
  • May also see NSPR’s PRCList
    • C-style, with macros
    • Please don’t use in new code
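“Intrusive” means the links live inside the element itself, so insertion never allocates and an element can unlink itself in O(1). A minimal sketch of the idea, loosely modelled on the mfbt/LinkedList.h approach of deriving elements from a link base class; ListElement and IntrusiveList are hypothetical names and far simpler than the real classes.

```cpp
#include <cassert>

template <typename T>
class IntrusiveList;

// Base class that embeds the prev/next links in each element.
template <typename T>
class ListElement {
 public:
  ListElement() = default;
  ListElement(const ListElement&) = delete;
  ListElement& operator=(const ListElement&) = delete;
  ~ListElement() { remove(); }  // elements unlink themselves when destroyed

  bool isInList() const { return mNext != nullptr; }

  // O(1) self-removal: no need to know which list we are in.
  void remove() {
    if (!isInList()) return;
    mPrev->mNext = mNext;
    mNext->mPrev = mPrev;
    mPrev = mNext = nullptr;
  }

 private:
  friend class IntrusiveList<T>;
  ListElement* mPrev = nullptr;
  ListElement* mNext = nullptr;
};

// Circular list with a sentinel; it never owns or allocates elements.
template <typename T>
class IntrusiveList {
 public:
  IntrusiveList() { mSentinel.mPrev = mSentinel.mNext = &mSentinel; }
  ~IntrusiveList() { clear(); }

  void insertBack(T* aElem) {
    ListElement<T>* e = aElem;  // T must derive from ListElement<T>
    assert(!e->isInList());
    e->mPrev = mSentinel.mPrev;
    e->mNext = &mSentinel;
    mSentinel.mPrev->mNext = e;
    mSentinel.mPrev = e;
  }

  bool isEmpty() const { return mSentinel.mNext == &mSentinel; }
  T* getFirst() const {
    return isEmpty() ? nullptr : static_cast<T*>(mSentinel.mNext);
  }
  void clear() {
    while (!isEmpty()) mSentinel.mNext->remove();
  }

 private:
  ListElement<T> mSentinel;
};
```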

25 of 111

XPCOM hashtables

  • nsTHashtable<T>
    • xpcom/ds/nsTHashtable.h
    • Really more of a hash-set…
    • Entry types must conform to a particular interface described therein
    • Common (inheritable) key types in xpcom/ds/nsHashKeys.h
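Because nsTHashtable is really a hash set, all it needs from an entry type is a few typedefs and functions. The sketch below is a simplified stand-in modelled on the entry-type shape described in xpcom/ds/nsTHashtable.h (KeyType, KeyTypePointer, KeyEquals, KeyToPointer, HashKey); it omits PLDHashEntryHdr and the move-related requirements, and StringKeyEntry and ToyHashSet are hypothetical, not Gecko code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Entry type with the nsTHashtable-style interface (simplified).
class StringKeyEntry {
 public:
  typedef const std::string& KeyType;
  typedef const std::string* KeyTypePointer;

  explicit StringKeyEntry(KeyTypePointer aKey) : mKey(*aKey) {}

  KeyType GetKey() const { return mKey; }
  bool KeyEquals(KeyTypePointer aKey) const { return mKey == *aKey; }
  static KeyTypePointer KeyToPointer(KeyType aKey) { return &aKey; }
  static std::uint32_t HashKey(KeyTypePointer aKey) {
    return static_cast<std::uint32_t>(std::hash<std::string>()(*aKey));
  }

 private:
  std::string mKey;
};

// Hypothetical toy set that only talks to entries through that interface.
// As with the real table, entry pointers are invalidated by later inserts.
template <class EntryType>
class ToyHashSet {
  using KeyType = typename EntryType::KeyType;
  using KeyTypePointer = typename EntryType::KeyTypePointer;

 public:
  ToyHashSet() : mBuckets(16) {}

  EntryType* GetEntry(KeyType aKey) {
    KeyTypePointer ptr = EntryType::KeyToPointer(aKey);
    for (EntryType& e : Bucket(ptr)) {
      if (e.KeyEquals(ptr)) return &e;
    }
    return nullptr;
  }

  EntryType* PutEntry(KeyType aKey) {
    if (EntryType* existing = GetEntry(aKey)) return existing;
    KeyTypePointer ptr = EntryType::KeyToPointer(aKey);
    std::vector<EntryType>& bucket = Bucket(ptr);
    bucket.emplace_back(ptr);
    ++mCount;
    return &bucket.back();
  }

  std::size_t Count() const { return mCount; }

 private:
  std::vector<EntryType>& Bucket(KeyTypePointer aPtr) {
    return mBuckets[EntryType::HashKey(aPtr) % mBuckets.size()];
  }

  std::vector<std::vector<EntryType>> mBuckets;
  std::size_t mCount = 0;
};
```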

26 of 111

XPCOM hashtables, cont’d

  • Several specializations for common cases
    • nsDataHashtable, nsClassHashtable, nsInterfaceHashtable, nsRefPtrHashtable
  • Key types conform to the nsTHashtable entry type spec

27 of 111

XPCOM hashtables, cont’d

  • MDN has hashtable documentation

28 of 111

XPCOM strings

  • Everybody wants better strings than C strings
  • Nobody likes their string API
  • Especially us!

29 of 111

XPCOM strings, cont’d

  • Compile-time class hierarchy
    • ...using run-time flags
  • String buffer sharing
    • Parts of Gecko do depend on this
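Buffer sharing means that assigning one string to another can share a single reference-counted buffer instead of copying characters, with the copy deferred until someone writes (copy-on-write). A simplified sketch built on std::shared_ptr; it is not how nsStringBuffer is implemented, and SharedString is a hypothetical, single-threaded illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class SharedString {
 public:
  explicit SharedString(std::string aText)
      : mBuffer(std::make_shared<std::string>(std::move(aText))) {}

  // Copies share the buffer: O(1), no character copying.
  SharedString(const SharedString&) = default;
  SharedString& operator=(const SharedString&) = default;

  const char* Data() const { return mBuffer->c_str(); }

  // Writers first detach from other sharers (copy-on-write).
  // use_count() is only a reliable test in single-threaded code.
  void Append(const std::string& aSuffix) {
    if (mBuffer.use_count() > 1) {
      mBuffer = std::make_shared<std::string>(*mBuffer);
    }
    mBuffer->append(aSuffix);
  }

 private:
  std::shared_ptr<std::string> mBuffer;
};
```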

30 of 111

XPCOM string classes

  • nsString/nsCString
    • wide-char/char versions
    • all-purpose string
  • nsAutoString/nsAutoCString
    • incorporates a fixed-size inline buffer
    • efficient short strings

31 of 111

XPCOM string classes, cont’d

  • nsDependent{,C}String
    • re-uses somebody else’s buffer
    • lifetime issues can, of course, bite you here
  • nsAString/nsACString
    • used for XPIDL argument passing
    • occasionally used for argument passing elsewhere

32 of 111

XPCOM strings, cont’d

  • No encoding requirement enforced
    • nsCString could be ASCII, Latin-1, UTF-8, etc.
    • Helpers exist for some encoding conversions

33 of 111

XPCOM string helpers

  • Conversion “strings”
    • NS_LossyConvertUTF16toASCII
    • NS_ConvertUTF16toUTF8
    • NS_ConvertUTF8toUTF16
  • nsPrintfCString

34 of 111

XPCOM string helpers: example

nsString x; …
NS_ConvertUTF16toUTF8 utf8(x);
// use utf8 as a normal nsCString

// pass conversion to a const char *
f(..., NS_ConvertUTF16toUTF8(x).get());
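The conversion helpers are string classes whose constructor does the conversion, which is why the result can be used directly as a string or via .get(). A hedged sketch of the same pattern in standard C++: ConvertUTF16toUTF8Sketch is hypothetical, derives from std::string, and substitutes U+FFFD for unpaired surrogates, which is an assumption about error handling rather than a description of Gecko’s behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for NS_ConvertUTF16toUTF8: converting in the
// constructor lets the object be used wherever a narrow string is expected.
// (Never delete it through a std::string*: std::string has no virtual dtor.)
class ConvertUTF16toUTF8Sketch : public std::string {
 public:
  explicit ConvertUTF16toUTF8Sketch(const std::u16string& aSource) {
    for (std::size_t i = 0; i < aSource.size(); ++i) {
      char32_t c = aSource[i];
      if (c >= 0xD800 && c <= 0xDBFF && i + 1 < aSource.size() &&
          aSource[i + 1] >= 0xDC00 && aSource[i + 1] <= 0xDFFF) {
        // High + low surrogate: combine into one supplementary code point.
        c = 0x10000 + ((c - 0xD800) << 10) + (aSource[i + 1] - 0xDC00);
        ++i;
      } else if (c >= 0xD800 && c <= 0xDFFF) {
        c = 0xFFFD;  // unpaired surrogate: use the replacement character
      }
      AppendCodePoint(c);
    }
  }

  const char* get() const { return c_str(); }

 private:
  // Standard UTF-8 encoding of a single code point.
  void AppendCodePoint(char32_t c) {
    if (c < 0x80) {
      push_back(static_cast<char>(c));
    } else if (c < 0x800) {
      push_back(static_cast<char>(0xC0 | (c >> 6)));
      push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
      push_back(static_cast<char>(0xE0 | (c >> 12)));
      push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
      push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else {
      push_back(static_cast<char>(0xF0 | (c >> 18)));
      push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
      push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
      push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
};
```

As with the real helper, `ConvertUTF16toUTF8Sketch(x).get()` is only valid until the end of the full expression, because the temporary dies there.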

35 of 111

XPCOM strings, documentation

36 of 111

Owned pointers

  • std::unique_ptr
  • mozilla::UniquePtr
    • mfbt/UniquePtr.h
    • Same great interface

37 of 111

Owned pointers: the old way

  • nsAutoPtr<T>
  • Acts like std::auto_ptr
    • ...and like std::auto_ptr, it is deprecated

38 of 111

Reference counting

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each [object]."

Moon patiently told the student the following story:

"One day a student came to Moon and said: ‘I understand how to make a better garbage collector…’"

—AI Koan from The New Hacker’s Dictionary

39 of 111

Reference counting: basic interface

  • AddRef() / Release() methods
    • virtualness of methods doesn’t matter
  • Gecko reference-counts objects
    • The count lives inside the object, unlike std::shared_ptr’s separate control block
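Because the count lives in the object, any raw pointer to it can be turned back into an owning reference. A minimal single-threaded sketch of AddRef()/Release() plus a RefPtr-like owner; Refcounted and RefPtrSketch are hypothetical, and the real RefPtr<T> has many more members.

```cpp
#include <cassert>
#include <utility>

// Hypothetical refcounted class: the count is a member of the object.
class Refcounted {
 public:
  static int sLiveObjects;  // demo instrumentation only
  Refcounted() { ++sLiveObjects; }

  void AddRef() { ++mRefCnt; }
  void Release() {
    if (--mRefCnt == 0) {
      delete this;  // last reference dropped: the object frees itself
    }
  }
  unsigned RefCount() const { return mRefCnt; }

 private:
  ~Refcounted() { --sLiveObjects; }  // only Release() may destroy it
  unsigned mRefCnt = 0;
};
int Refcounted::sLiveObjects = 0;

// Minimal RefPtr-like owner: AddRef when acquiring, Release when letting go.
template <typename T>
class RefPtrSketch {
 public:
  RefPtrSketch(T* aRaw = nullptr) : mRaw(aRaw) {
    if (mRaw) mRaw->AddRef();
  }
  RefPtrSketch(const RefPtrSketch& aOther) : RefPtrSketch(aOther.mRaw) {}
  RefPtrSketch& operator=(RefPtrSketch aOther) {  // copy-and-swap
    std::swap(mRaw, aOther.mRaw);
    return *this;
  }
  ~RefPtrSketch() {
    if (mRaw) mRaw->Release();
  }

  T* get() const { return mRaw; }
  T* operator->() const { return mRaw; }

 private:
  T* mRaw;
};
```

This is what lets a function that receives a plain T* decide to keep the object alive simply by storing it in a smart pointer.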

40 of 111

Reference counting: thread safety

  • Thread safety is per-class
  • DEBUG builds verify thread usage
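Per-class thread safety mostly comes down to the counter’s type: a plain integer for single-threaded classes, an atomic for threadsafe ones. A hedged sketch of both flavours; the class names are hypothetical, the owning-thread assert imitates the DEBUG-only check (assert also disappears under NDEBUG), and the memory orderings are a common choice for atomic refcounts rather than a transcription of Gecko’s code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Single-threaded flavour: plain counter plus a debug-style owning-thread check.
class SingleThreadRefcounted {
 public:
  void AddRef() {
    assert(std::this_thread::get_id() == mOwningThread && "AddRef on wrong thread");
    ++mRefCnt;
  }
  void Release() {
    assert(std::this_thread::get_id() == mOwningThread && "Release on wrong thread");
    if (--mRefCnt == 0) delete this;
  }
  unsigned RefCount() const { return mRefCnt; }

 private:
  ~SingleThreadRefcounted() = default;
  unsigned mRefCnt = 0;
  std::thread::id mOwningThread = std::this_thread::get_id();
};

// Threadsafe flavour: same interface, atomic counter.
class ThreadsafeRefcounted {
 public:
  void AddRef() { mRefCnt.fetch_add(1, std::memory_order_relaxed); }
  void Release() {
    // acq_rel so that every earlier use of the object happens-before the delete.
    if (mRefCnt.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }
  unsigned RefCount() const { return mRefCnt.load(); }

 private:
  ~ThreadsafeRefcounted() = default;
  std::atomic<unsigned> mRefCnt{0};
};
```

Atomic read-modify-write operations cost more than plain increments, so it makes sense that the single-threaded variant is the default.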

41 of 111

Reference counting: smart pointers

  • RefPtr<T>
    • Used with concrete classes
  • nsCOMPtr<T>
    • Used with interface classes (XPIDL, nsI*)
    • Cooperates with a raft of other things
      • Lots of do_Thing helpers

42 of 111

Refcounting: cycles

  • Reference counting not perfect
  • Easy to get cyclic references
    • ...which result in memory leaks
  • Not just C++ <-> C++, either
    • ...cycles can go through JS objects
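A cycle keeps every count above zero, so nothing is freed when the outside world lets go. std::shared_ptr is also reference-counted, which makes it a portable stand-in for demonstrating the leak; Node and MakeAndDropCycle are hypothetical, and the manual unlink in the usage below only gestures at what a cycle collector automates.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node; std::shared_ptr plays the role of a strong refcounted pointer.
struct Node {
  static int sDestroyed;
  std::shared_ptr<Node> mOther;  // strong reference to another node
  ~Node() { ++sDestroyed; }
};
int Node::sDestroyed = 0;

// Builds a <-> b, then drops our own references as the locals go out of scope.
// The returned weak_ptr observes `a` without keeping it alive.
inline std::weak_ptr<Node> MakeAndDropCycle() {
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();
  a->mOther = b;
  b->mOther = a;  // each node now keeps the other's count at >= 1
  return a;
}
```

After `MakeAndDropCycle()` returns, nobody outside can reach the nodes, yet both are still alive; breaking one edge (for example `a->mOther->mOther = nullptr` through a temporarily locked `a`) lets both counts reach zero.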

43 of 111

Refcounting: cycle collector

  • Solution: periodically collect cycles
    • close integration with the JavaScript GC
  • Generally only relevant for DOM things
  • Lots of macros for making this sort-of-nice

44 of 111

Refcounting: declaration

  • Macros handle lots of this
  • For nsISupports things:
    • single-threaded (default): NS_DECL_ISUPPORTS
    • multi-threaded: NS_DECL_THREADSAFE_ISUPPORTS

45 of 111

Refcounting: definition

// in nsThing.cpp
// concrete class name followed by
// variadic list of interfaces implemented
NS_IMPL_ISUPPORTS(nsThing, nsIThing);

// works the same for threadsafe ones, too
NS_IMPL_ISUPPORTS(nsThreadsafeThing, nsIThing);

46 of 111

Refcounting: declarations, cont’d

  • For non-nsISupports things:
    • single-threaded: NS_INLINE_DECL_REFCOUNTING(class-name)
    • multi-threaded: Same, but with NS_INLINE_DECL_THREADSAFE_REFCOUNTING

47 of 111

Refcounting: implementation

  • Most times, that will be all you need
  • See xpcom/base/nsISupportsImpl.h for all the grotty details

48 of 111

Refcounting: already_AddRefed

  • Returning refcounted things
  • Passing in refcounted things
    • Can cleverly avoid AddRef/Release on wrong thread
  • Useful pre-C++11
    • No useless refcounting, guaranteed
  • Less useful nowadays?

49 of 111

Leak checking

  • DEBUG-only leak checks
  • Counts instrumented classes
    • Reference-counted classes automagically get this
  • Doesn’t count individual malloc/new
    • ASan/Valgrind test runs cover those

50 of 111

Leak checking: macros

  • For non-reference-counted classes
    • Constructors: MOZ_COUNT_CTOR(klass-name)
    • Destructor: MOZ_COUNT_DTOR(klass-name)
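For classes without refcounting, the idea is simply to bump a per-class counter in every constructor and decrement it in the destructor, so leaked instances show up as non-zero counts at shutdown. A hedged sketch: COUNT_CTOR, COUNT_DTOR, LiveCounts, HasLeaks and Parser are hypothetical; the real MOZ_COUNT_* macros feed Gecko’s leak-logging (BloatView) machinery instead of a map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical global table: class name -> number of live instances.
inline std::map<std::string, int>& LiveCounts() {
  static std::map<std::string, int> counts;
  return counts;
}

// Hypothetical stand-ins for MOZ_COUNT_CTOR / MOZ_COUNT_DTOR.
#define COUNT_CTOR(klass) (++LiveCounts()[#klass])
#define COUNT_DTOR(klass) (--LiveCounts()[#klass])

class Parser {
 public:
  Parser() { COUNT_CTOR(Parser); }
  Parser(const Parser&) { COUNT_CTOR(Parser); }  // every constructor must count
  ~Parser() { COUNT_DTOR(Parser); }
};

// At "shutdown", any non-zero entry indicates leaked instances.
inline bool HasLeaks() {
  for (const auto& entry : LiveCounts()) {
    if (entry.second != 0) return true;
  }
  return false;
}
```

Forgetting the macro in one constructor (here, the copy constructor) is enough to produce a bogus negative count, one reason hierarchies and templates need the extra care described next.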

51 of 111

Leak checking: macro considerations

  • Checking hierarchies is difficult
    • MOZ_COUNT_CTOR_INHERITED
    • MOZ_COUNT_CTOR in the base class
  • Checking templated classes is difficult
    • Use a non-templated base class

52 of 111

Leak checking: more information

  • MDN article on BloatView
    • Runs by default on DEBUG tests

53 of 111

Logging

  • export MOZ_LOG="MyModule:5"
  • export MOZ_LOG_FILE="/path/to/file"
  • MDN describes Gecko’s logging

54 of 111

Logging modules

  • No central mechanism for naming modules
  • Search for LazyLogModule for interesting modules to log

55 of 111

Dynamic logging

  • Can toggle logging on the fly
    • Only for LazyLogModule modules
  • Set the logging.$MODULE_NAME pref
    • Numeric or string values
    • “debug” or “verbose” (4 or 5, resp.) most likely
    • See xpcom/base/Logging.h LogLevel for all values
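A logging pref value is either a number or a level name. The sketch below maps such a value onto the LogLevel scale from xpcom/base/Logging.h (Disabled = 0 through Verbose = 5); ParseLogLevel and ShouldLog are hypothetical, and their exact rules (lowercase names only, clamping out-of-range numbers, unknown text meaning Disabled) are assumptions, not Gecko’s implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class LogLevel { Disabled = 0, Error, Warning, Info, Debug, Verbose };

// Hypothetical parser for a logging.$MODULE_NAME pref value.
inline LogLevel ParseLogLevel(const std::string& aValue) {
  static const char* const kNames[] = {"disabled", "error", "warning",
                                       "info",     "debug", "verbose"};
  for (int i = 0; i <= 5; ++i) {
    if (aValue == kNames[i]) return static_cast<LogLevel>(i);
  }
  // Numeric form; anything that is not a whole number disables logging.
  char* end = nullptr;
  long n = std::strtol(aValue.c_str(), &end, 10);
  if (aValue.empty() || *end != '\0') return LogLevel::Disabled;
  if (n < 0) n = 0;
  if (n > 5) n = 5;  // assumption: clamp rather than reject
  return static_cast<LogLevel>(n);
}

// A message is emitted when its level is at or below the module's level.
inline bool ShouldLog(LogLevel aModuleLevel, LogLevel aMessageLevel) {
  return aMessageLevel != LogLevel::Disabled && aMessageLevel <= aModuleLevel;
}
```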

56 of 111

Progress bullets

  • Brief overview of Gecko’s architecture
  • Gecko’s C++ dialect
  • Gecko’s C++ data structures
  • Code generation
  • Development tools

57 of 111

Code generation

  • Three main kinds of code generation
    • XPIDL: old bindings to chrome/content JS
    • IPDL: multiprocess communication
    • WebIDL: new bindings to chrome/content JS
  • Other one-off code generators throughout the tree
  • All written in Python

58 of 111

XPIDL files

  • Define interfaces for C++ or JS to implement
  • Generates C++ class boilerplate
  • Generates metadata for calling into C++ from JS
  • MDN documentation on XPIDL

59 of 111

XPIDL files: generated code

  • Headers for each .idl wind up in $OBJDIR/dist/include/
  • nsIFile.idl generates nsIFile.h
  • IDL files can define multiple interfaces, and so can the generated header

60 of 111

XPIDL files: generated code

  • Each interface nsIFoo generates
    • nsIFoo class definition
    • NS_DECL_NSIFOO macro for declaring all of nsIFoo’s interface in C++
    • NS_IFOO_IID{,_STR} macros for the uuid of the interface
    • other macros not typically needed

61 of 111

XPIDL files: generated code

IDL text                                   C++ translation / prototypes
const unsigned long NORMAL_FILE_TYPE = 0;  enum { NORMAL_FILE_TYPE = 0 };
void append(in AString node);              NS_IMETHOD Append(const nsAString&);
boolean exists();                          NS_IMETHOD Exists(bool*);
readonly attribute long long fileSize;     NS_IMETHOD GetFileSize(int64_t*);
attribute AString leafName;                NS_IMETHOD GetLeafName(nsAString&);
                                           NS_IMETHOD SetLeafName(const nsAString&);
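Roughly speaking, the generated header is an abstract class of pure virtual methods, plus a macro that redeclares all of them so an implementation does not have to. The following is a hand-written, heavily simplified imitation for a subset of the table above; nsIFileLike, DECL_NSIFILELIKE, FakeFile, Result and the std::string parameters are hypothetical stand-ins for what xpidl actually emits (nsresult, nsAString, NS_IMETHOD).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Result = std::uint32_t;  // stand-in for nsresult
constexpr Result kOk = 0;      // stand-in for NS_OK

// Imitation of a generated interface: pure virtual methods, outparam style.
class nsIFileLike {
 public:
  enum { NORMAL_FILE_TYPE = 0 };
  virtual Result Exists(bool* aResult) = 0;
  virtual Result GetLeafName(std::string& aLeafName) = 0;
  virtual Result SetLeafName(const std::string& aLeafName) = 0;

 protected:
  virtual ~nsIFileLike() = default;
};

// Imitation of the generated NS_DECL_... macro: redeclares every method.
#define DECL_NSIFILELIKE                               \
  Result Exists(bool* aResult) override;               \
  Result GetLeafName(std::string& aLeafName) override; \
  Result SetLeafName(const std::string& aLeafName) override;

class FakeFile final : public nsIFileLike {
 public:
  DECL_NSIFILELIKE
  ~FakeFile() override = default;

 private:
  std::string mLeafName = "untitled";
};

Result FakeFile::Exists(bool* aResult) {
  *aResult = true;
  return kOk;
}
Result FakeFile::GetLeafName(std::string& aLeafName) {
  aLeafName = mLeafName;
  return kOk;
}
Result FakeFile::SetLeafName(const std::string& aLeafName) {
  mLeafName = aLeafName;
  return kOk;
}
```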

62 of 111

XPIDL files: generated code

  • NS_IMETHOD is short for “virtual nsresult” + Windows-specific goo
  • nsresult value indicates success/failure
    • success: NS_OK
    • failure: NS_ERROR_FAILURE (and many others)
    • test with NS_SUCCEEDED(val) or NS_FAILED(val)
  • Complete list in xpcom/base/ErrorList.h

63 of 111

XPIDL files: generated code

  • Return values passed through outparams
    • only set outparam when you’re returning NS_OK
    • you will sometimes see code setting it anyway, for safety or simplicity
  • Similarly, don’t use outparam at caller until NS_SUCCEEDED checked
  • As always, propagate errors to caller(s)!
  • Errors turn into JS exceptions
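The outparam and error-propagation rules add up to a very regular calling pattern. A hedged sketch with a simplified nsresult stand-in: the high bit marks failure (the bit NS_FAILED tests), and the two error values are copied from NS_ERROR_FAILURE and NS_ERROR_INVALID_ARG; GetSize, GetSizeInKB and the fixed 4096-byte size are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Result = std::uint32_t;                       // stand-in for nsresult
constexpr Result kOk = 0;                           // NS_OK
constexpr Result kErrorFailure = 0x80004005u;       // NS_ERROR_FAILURE
constexpr Result kErrorInvalidArg = 0x80070057u;    // NS_ERROR_INVALID_ARG

// Same test NS_FAILED / NS_SUCCEEDED perform: the high bit means failure.
constexpr bool Failed(Result aRv) { return (aRv & 0x80000000u) != 0; }
constexpr bool Succeeded(Result aRv) { return !Failed(aRv); }

// Callee: only write the outparam when returning success.
Result GetSize(int aHandle, std::int64_t* aSize) {
  if (!aSize) return kErrorInvalidArg;
  if (aHandle < 0) return kErrorFailure;  // *aSize left untouched
  *aSize = 4096;                          // hypothetical fixed size
  return kOk;
}

// Caller: check before using the outparam, and propagate errors upward.
Result GetSizeInKB(int aHandle, std::int64_t* aKB) {
  if (!aKB) return kErrorInvalidArg;
  std::int64_t size;
  Result rv = GetSize(aHandle, &size);
  if (Failed(rv)) {
    return rv;  // don't read `size`; pass the error to our own caller
  }
  *aKB = size / 1024;
  return kOk;
}
```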

64 of 111

IPDL files

  • Describe multi-process “protocols”
  • Generates the uninteresting, repetitive parts of the protocol(s)
  • Provides C++ structure for the interesting parts

65 of 111

IPDL codegen

  • ipc/ipdl/ipdl.py and associated bits
    • C++ code in $OBJDIR/ipc/ipdl
    • C++ headers in $OBJDIR/ipc/ipdl/_ipdlheaders
    • Headers organized by C++ namespace

66 of 111

IPDL links

67 of 111

WebIDL files

  • W3C standard for describing web interfaces
  • Similar to IDL
  • Semantics closer to JavaScript
  • Richer types, method overloading, etc.

68 of 111

WebIDL bindings: why?

  • XPIDL C++ <-> JavaScript is rather slow
  • JavaScript JIT ignorant of XPIDL methods
  • Use WebIDL files to autogenerate lots of code
  • Makes C++ objects look much more like regular JS objects
    • JIT can know about semantics

69 of 111

WebIDL bindings: links

  • Extremely nice page on MDN describing bindings and C++ interfaces

70 of 111

Progress bullets

  • Brief overview of Gecko’s architecture
  • Gecko’s C++ dialect
  • Gecko’s C++ data structures
  • Code generation
  • Development tools

71 of 111

Code search

  • DXR: dxr.mozilla.org
    • Also indexes addons
  • Searchfox: searchfox.org
    • DXR clone, philosophical differences
    • Better UI for some things

72 of 111

Profiling

  • Built-in profiler (Nightly only)
  • MDN documentation

73 of 111

Debugging

  • TL;DR: ./mach run --debug
    • gdb on Linux
    • Visual Studio on Windows (windbg available)
    • lldb on Mac
  • ./mach run --debug --debugger=$PROG
  • --debugger works with mach mochitest, etc. etc.

74 of 111

Debugging: rr

  • Record-and-replay tool
  • Linux-only, unfortunately
    • Works in VMware (get a license from ServiceNow)
  • “Where is this value coming from?”
    • Set a watchpoint on memory, reverse-continue
  • http://rr-project.org/

75 of 111

Last words

  • There’s a lot of code
    • ...but it’s all just code
  • Nothing sacrosanct about existing code
  • Don’t assume the current code is “the best” just because it’s already there

76 of 111

Feedback!

  • froydnj on IRC/Slack/bugzilla
  • froydnj@mozilla.com

77 of 111

Refcounting: ownership transfers

Klass* Klass::Create(...)
{
  RefPtr<Klass> k = new Klass(...);
  // Other stuff
  return /* ??? */;
}

78 of 111

Ownership transfer strategies

  • Outparam (T**)
    • Common with XPIDL-style interfaces
  • already_AddRefed<T> return value
    • Generally indicates new object entirely yours
  • RefPtr<T> or nsCOMPtr<T> return value
    • “Modern” C++ way

79 of 111

Ownership transfers

                     Pointer outparam               already_AddRefed<T>      Smart pointer
Identifying feature  T** argument, usually last     return type              return type
Passing ownership    p.forget(outparam);            return p.forget();       return p;
Caller               RefPtr<T> p;                   RefPtr<T> p;             RefPtr<T> p;
                     o->Method(getter_AddRefs(p));  p = o->Method();         p = o->Method();
Use cases            XPIDL-style interfaces         static Create() methods  many
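The outparam column of the table is the least obvious one: the callee forget()s its reference into a T**, and getter_AddRefs lets a smart pointer stand in for that T** so the adopted reference is not AddRef’d again. A hedged sketch: Thing, RefPtrLite, GetterAddRefsLite and GetThing are hypothetical simplifications (the real getter_AddRefs returns a helper object, and XPIDL methods return nsresult rather than bool).

```cpp
#include <cassert>
#include <utility>

struct Thing {
  int mRefCnt = 0;
  int mAddRefCalls = 0;  // demo instrumentation
  void AddRef() { ++mRefCnt; ++mAddRefCalls; }
  void Release() { if (--mRefCnt == 0) delete this; }
};

template <typename T>
class RefPtrLite {
 public:
  RefPtrLite() = default;
  explicit RefPtrLite(T* aRaw) : mRaw(aRaw) { if (mRaw) mRaw->AddRef(); }
  RefPtrLite(const RefPtrLite&) = delete;
  RefPtrLite& operator=(const RefPtrLite&) = delete;
  ~RefPtrLite() { if (mRaw) mRaw->Release(); }

  // Hand our reference to an outparam without AddRef/Release.
  void forget(T** aOut) { *aOut = std::exchange(mRaw, nullptr); }

  // Drop any current reference and expose our slot for the callee to fill.
  T** StartAssignment() {
    if (mRaw) mRaw->Release();
    mRaw = nullptr;
    return &mRaw;
  }
  T* get() const { return mRaw; }

 private:
  T* mRaw = nullptr;
};

// Lets a smart pointer be passed where a T** outparam is expected.
template <typename T>
T** GetterAddRefsLite(RefPtrLite<T>& aPtr) { return aPtr.StartAssignment(); }

// XPIDL-style callee: result through the T** outparam, status as return value.
inline bool GetThing(Thing** aResult) {
  RefPtrLite<Thing> thing(new Thing());
  // ... other setup ...
  thing.forget(aResult);
  return true;
}
```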

80 of 111

QueryInterface: XPCOM’s dynamic_cast

  • Every interface in Gecko gets a UUID
  • QueryInterface asks classes whether they support a particular UUID (== interface)
  • Calling QueryInterface manually would be annoying
  • Bunch of infrastructure to make it easier
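Underneath the helpers sits a virtual QueryInterface(iid, void**) that each class answers by comparing the requested ID with the interfaces it implements. A hedged sketch in standard C++: the IID struct, the interface names and their IDs are hypothetical, refcounting is omitted (a real QueryInterface also AddRefs the result), and failure is reported as a bool instead of NS_NOINTERFACE.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 128-bit interface ID (a UUID).
struct IID {
  unsigned char bytes[16];
  bool operator==(const IID& aOther) const {
    return std::memcmp(bytes, aOther.bytes, sizeof(bytes)) == 0;
  }
};

// Stand-in for nsISupports, reduced to QueryInterface.
struct ISupportsLike {
  static IID Iid() { return IID{{0x01}}; }
  virtual bool QueryInterface(const IID& aIID, void** aOut) = 0;

 protected:
  virtual ~ISupportsLike() = default;
};

struct IFooLike : ISupportsLike {
  static IID Iid() { return IID{{0x02}}; }
  virtual int Foo() = 0;
};

struct IBarLike : ISupportsLike {
  static IID Iid() { return IID{{0x03}}; }
  virtual int Bar() = 0;
};

// Implements both interfaces; static_cast selects the right base subobject.
class FooBar final : public IFooLike, public IBarLike {
 public:
  int Foo() override { return 1; }
  int Bar() override { return 2; }
  bool QueryInterface(const IID& aIID, void** aOut) override {
    if (aIID == IFooLike::Iid()) {
      *aOut = static_cast<IFooLike*>(this);
    } else if (aIID == IBarLike::Iid()) {
      *aOut = static_cast<IBarLike*>(this);
    } else if (aIID == ISupportsLike::Iid()) {
      // Two ISupportsLike bases exist; pick one deliberately.
      *aOut = static_cast<ISupportsLike*>(static_cast<IFooLike*>(this));
    } else {
      *aOut = nullptr;
      return false;
    }
    return true;
  }
};

// Implements only IFooLike.
class FooOnly final : public IFooLike {
 public:
  int Foo() override { return 10; }
  bool QueryInterface(const IID& aIID, void** aOut) override {
    if (aIID == IFooLike::Iid()) {
      *aOut = static_cast<IFooLike*>(this);
    } else if (aIID == ISupportsLike::Iid()) {
      *aOut = static_cast<ISupportsLike*>(this);
    } else {
      *aOut = nullptr;
      return false;
    }
    return true;
  }
};

// Typed helper in the spirit of do_QueryInterface: null when unsupported.
template <typename Interface>
Interface* QueryAs(ISupportsLike* aObj) {
  void* out = nullptr;
  if (!aObj || !aObj->QueryInterface(Interface::Iid(), &out)) {
    return nullptr;
  }
  return static_cast<Interface*>(out);
}
```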

81 of 111

QueryInterface example

nsCOMPtr<nsIFoo> foo = …;
nsCOMPtr<nsIBar> bar =
    do_QueryInterface(foo);
// bar is now non-null if foo was non-null
// and supports the nsIBar interface

82 of 111

nsCOMPtr helpers: do_CreateInstance

  • XPIDL needed object creation across C++/JS
  • Name components with “contract IDs”, not UUIDs
    • Semi-human-readable strings
  • Enforced creation/initialization separation
    • Easy to propagate nsresult errors from C++

83 of 111

do_CreateInstance example

nsresult rv;
nsCOMPtr<nsIFoo> f =
    do_CreateInstance("@mozilla.org/foo;1", &rv);
if (NS_FAILED(rv)) {
  return rv;
}
rv = f->Init();

// Roughly equivalent JS
let f = Cc["@mozilla.org/foo;1"]
          .createInstance(Ci.nsIFoo);
f.init();

84 of 111

nsCOMPtr helpers: do_GetService

  • Services are XPCOM’s singletons
  • Lazily created via do_GetService
    • We found do_GetService overhead prohibitive for common services
    • So we have an alternate, faster way to get those…
    • xpcom/build/ServiceList.h

85 of 111

do_GetService example

nsCOMPtr<nsICatchyService> service =
    do_GetService("@mozilla.org/catchy;1");

// Alternative, preferred if available
service = services::GetCatchyService();

86 of 111

nsCOMPtr helpers: do_GetInterface

  • nsIInterfaceRequestor is a reasonably common interface
  • Asks “do you have an object corresponding to this UUID?”
    • Might be a direct member of the concrete class
    • Might be the class itself
    • Might be something else entirely
  • Yes, this is evil, so we don’t add new implementations of it.

87 of 111

do_GetInterface example

nsCOMPtr<nsIFoo> foo = …;
nsCOMPtr<nsIInterfaceRequestor> iir =
    do_QueryInterface(foo);
nsCOMPtr<nsIBar> bar =
    do_GetInterface(iir);
// bar now non-null if
// - foo supported nsIInterfaceRequestor
// - foo knew how to provide nsIBar things

88 of 111

Progress bullets

  • What happens when I click a link?
  • Nuts and bolts of Gecko C++
  • Code generation
  • Memory management
  • Gecko C++ data structures

89 of 111

Debugging

  • TL;DR: ./mach run --debug
    • gdb on Linux
    • Visual Studio on Windows
    • lldb on Mac
  • ./mach run --debugger=$PROG
  • --debugger works with mach test, mach mochitest, etc. etc.

90 of 111

Debugging: rr

  • rr: record-and-replay debugger
    • rr record ./mach run
    • rr replay
  • Works in VMware
  • rr-project.org

91 of 111

Progress bullets

  • What happens when I click a link?
  • Nuts and bolts of Gecko C++
  • Code generation
  • Memory management
  • Gecko C++ data structures

92 of 111

What happens when I click a link?

93 of 111

Gecko overview: docshell

  • docshell/ directory
  • Handles grotty details around navigation, document loading, session history
  • See also the uriloader/ directory

94 of 111

Gecko overview: networking

  • netwerk/ directory
  • DNS lookups (dns/)
  • Protocols (HTTP, FTP, WebSocket, etc.) (protocol/)
  • Document caching (cache2/)
  • Cookies (cookie/)

95 of 111

Gecko overview: parsing

  • parser/ directory
  • HTML5 parser (html/)
  • Old-style HTML (htmlparser/)
    • Only used for about:blank
  • XML (xml/, expat/)
  • Other formats (CSS, JavaScript, images) handled elsewhere

96 of 111

Gecko overview: DOM

  • dom/ directory
  • Represents the structure of the page
  • Absolutely massive
    • Documents, elements (base/, html/)
    • Various web APIs (indexedDB/, others)
      • Distinct APIs usually in their own subdir
    • WebIDL definitions (webidl/)
    • New-style WebIDL-driven bindings (bindings/)

97 of 111

Gecko overview: layout

  • layout/ directory
  • Turns the DOM into boxes
  • Also absolutely massive
    • CSS implementation (style/)
    • Frames, display lists (base/, generic/)
    • SVG implementation (svg/)
    • XUL (xul/)

98 of 111

Gecko overview: graphics

  • gfx/ directory
  • Also absolutely massive
    • Basic 2D drawing primitives (2d/, thebes/)
    • Layers for compositing (layers/)
    • Various amounts of third-party code (angle/, cairo/, graphite2/, harfbuzz/, ots/, skia/, ycbcr/)

99 of 111

What does all that other code do?

We didn’t cover all the toplevel directories.

What are the other big pieces?

100 of 111

Gecko overview: storage

  • storage/ directory
  • We use SQLite for a lot of things
    • IndexedDB/localStorage
    • browser history/bookmarks (“Places”)
    • cookie database
    • ServiceWorker cache API
  • MozStorage provides a nice multithreaded API over raw SQLite

101 of 111

Gecko overview: images

  • image/ directory
  • Decoding and rendering images
    • PNG, BMP, JPEG, etc.
    • Moving as much work off the main thread as possible

102 of 111

Gecko overview: media

  • media/ directory
  • Imported code for sound/image/video handling
  • Also where our (imported) WebRTC implementation lives

103 of 111

Gecko overview: widget

  • widget/ directory
  • Interface between Gecko and native window/event handling
  • Separate directories for various platforms (android/, cocoa/, gtk/, windows/)

104 of 111

Gecko overview: IPC

  • ipc/ directory
  • Code for managing multiple processes
    • sending/receiving messages
    • object serialization
    • separate event loop
    • IPDL code generator
  • Based on ye olde Chromium code, modified beyond all mergeability

105 of 111

Gecko overview: JavaScript

  • js/ directory
  • JavaScript implementation, etc.
  • “public” interfaces in public/
  • Source code in src/
    • Garbage collector in src/gc/
    • JIT compiler in src/jit/

106 of 111

Gecko overview: XPConnect

  • js/xpconnect/ directory
  • “Deep Magic”
  • Manages reflection of objects between JS and C++
  • Also defines a fair amount of our security architecture for JavaScript

107 of 111

Gecko overview: MFBT

  • mfbt/ directory
  • “Mozilla Framework Based on Templates”
  • Shared between Gecko and the JS engine
    • Classes we find useful
    • Polyfills for C++ stdlib things

108 of 111

Gecko overview: XPCOM

  • xpcom/ directory
  • “Cross-platform COM”
  • Hodgepodge of building blocks
    • data structures (array, hashtables, strings)
    • core event loop (threads, timers)
    • parts of JS interface to chrome code
    • not-quite complete abstract I/O interface
    • cycle collector

109 of 111

Gecko overview: NSPR

  • nsprpub/ directory
  • “Netscape Portable Runtime”
  • Abstracts threads, locks, I/O, logging, etc.
    • Sometimes we have nice wrappers
    • Sometimes we use things directly
    • Sometimes the C++ stdlib has better alternatives
  • Different commit structure, release schedule, etc. from Gecko itself

110 of 111

Progress bullets

  • What happens when I click a link?
  • Nuts and bolts of Gecko C++
  • Code generation
  • Memory management
  • Gecko C++ data structures

111 of 111

Nuts and bolts: C++ visibility

  • We ship one big shared library, xul.{dll,so}
    • Improves startup time
  • Exporting symbols from xul is expensive
    • Runtime penalties
    • Size penalties
    • Exposed interface penalties (malware hooks, etc.)
  • Strive to export as little as possible
    • The Right Thing happens by default