1 of 49

(Why We Need) Large Code Base Change Ripple Management in C++

Niall Douglas

2 of 49

Contents:

What I am pitching: a new Boost library, and a possible motivating vision of a long-term future for C++ and Boost
Many (contentious) claims as to why ...

Large Code Base Change Ripple Management Niall Douglas Paper: http://arxiv.org/abs/1405.3323

3 of 49

What I am pitching: a low-level embedded graph database for Boost

4 of 49

The proposed embedded graph database

All first tier content is a standard file which can be opened, mmapped etc
Takes advantage of filing system specific features such as extents, metadata/data journaling, hole punching, copy on write, bitrot self healing etc
Strong versioning and MVCC concurrency
Per-graph content protection (e.g. parity healing)
Content addressable with a per-graph hash of your choice
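
As a rough illustration of the "content addressable with a per-graph hash of your choice" point, here is a minimal in-memory sketch. It is not the proposed library's API: the `ContentStore` name, its `put`/`get` members and the use of `std::hash` as the hash are all assumptions made for the example, and a real store would use a collision-resistant hash and file-backed storage.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical illustration only: not the proposed library's API.
// Hash stands in for the "per-graph hash of your choice"; std::hash is a
// placeholder and is NOT a cryptographic or collision-resistant hash.
template <class Hash = std::hash<std::string>>
class ContentStore {
public:
    using Key = std::size_t;

    // Store a blob under the hash of its content and return that key.
    // Storing identical content twice yields the same key and a single copy.
    Key put(const std::string& blob) {
        Key key = Hash{}(blob);
        auto result = blobs_.emplace(key, blob);
        // A real store needs a collision policy; this sketch only checks.
        assert(result.first->second == blob);
        (void)result;
        return key;
    }

    // Look up content by key; returns nullptr if the key is unknown.
    const std::string* get(Key key) const {
        auto it = blobs_.find(key);
        return it == blobs_.end() ? nullptr : &it->second;
    }

    // Number of distinct blobs held.
    std::size_t size() const { return blobs_.size(); }

private:
    std::map<Key, std::string> blobs_;
};
```

Because the key is derived from the content, two writers that store the same bytes end up with one object, which is the same property git relies on for its object store.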

5 of 49

The proposed embedded graph database

Per-graph optional ACID transactions
Per-graph arbitrary indexers (e.g. Boost.Graph, SQLite3, ZIP etc)
Network shardable to other copies with interrupted partial copy resumption
Objects can be executable (in fact the database is self hosting)
Uses an algorithm very close to git
Designed to act like increasing mount points of ‘database-ness’ overlaid onto the filesystem

6 of 49

The proposed embedded graph database

Performance is expected to be between two and five orders of magnitude slower than the big iron graphstores
Write transaction throughput should hopefully be no lower than ten per second
All designed to work during very early process bootstrap i.e. before shared libraries are loaded
There is nothing close in existing software - this design and its abilities are unique

7 of 49

It is really more of a “generic data persistence library”

8 of 49

Want more detail?

There is a 25,000 word accompanying position white paper on ArXiv at http://arxiv.org/abs/1405.3323:

What makes code changes ripple differently in C++ to other languages?
Why hasn’t C++ replaced C in most newly written code?
What does this mean for C++ a decade from now?
What C++ 17 is doing about complexity management
What C++ 17 is leaving well alone until later: Type Export
Detail about the embedded graph database design
Two example killer applications for such a graph database, namely:

An example C++ object components design (similar to Bandela’s)
An example extensible Filesystem design

9 of 49

What does any of this have to do with the price of fish?

What does this have to do with change ripple management?
Or Boost?
Or C++?
Or anything?

10 of 49

Let the contention begin!

I am now going to articulate a (motivating) vision for a long term future goal of C++ and Boost which explains why we might absolutely need one of these databases soon

I will then make a series of supporting claims most of which will be contentious (and hence I place them last!)

11 of 49

What is this (motivating) vision of the future of C++ and Boost?

12 of 49

A post-C++ 17 goal:

I’d like to see a world where we can write C++ as if everything in the solution (including Python, Lua, PHP etc, including C++ in closely related processes) is header only, no matter the size of the program

(It is a natural result of a complete ABI management solution)

13 of 49

Example:

class Foo { virtual void boo(int a); };

A rule in the graph database says that when this type is changed, it should be reflected via std::reflect into a Python binding:

class Foo:
    @accepts(int)
    @returns(None)
    def boo(self, a): ...

14 of 49

Example:

Let’s break Foo’s ABI:

class Foo { virtual void boo(double); };

When you hit compile in C++, for each use of Foo::boo() in Python code you get:

TypeWarning: 'boo' method accepts (float), but was given (int)
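
One way to picture how such a warning could be produced is a check of every recorded call site against the current signature of the method it calls. The sketch below is only an assumption about the mechanism: `Use`, `check_uses` and the string-based type names are invented for the example and do not come from the paper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration only; none of these names come from the paper.
// A recorded use of a method, with the argument type the caller passes.
struct Use {
    std::string location;  // e.g. "script.py:12"
    std::string method;    // e.g. "Foo::boo"
    std::string given;     // type the call site passes, e.g. "int"
};

// Compare every recorded use against the current signatures (method name ->
// accepted parameter type) and return one warning per mismatching use.
std::vector<std::string> check_uses(
    const std::map<std::string, std::string>& signatures,
    const std::vector<Use>& uses) {
    std::vector<std::string> warnings;
    for (const Use& use : uses) {
        auto sig = signatures.find(use.method);
        if (sig == signatures.end()) {
            warnings.push_back(use.location + ": '" + use.method +
                               "' no longer exists");
        } else if (sig->second != use.given) {
            warnings.push_back(use.location + ": '" + use.method +
                               "' accepts (" + sig->second +
                               "), but was given (" + use.given + ")");
        }
    }
    return warnings;
}
```

With `Foo::boo` recorded as accepting `int`, a use that passes `int` produces no warning. After the signature entry is changed to `double`, the same use is reported once, regardless of which language the call site is written in.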

15 of 49

To clarify:

C++ moves from a source file compilation model to a type graph compilation model (similar to exported templates)
The type graphs are compiled a bit like GLSL shaders into many tiny C++ Modules i.e. bits of precompile all put into the graphstore
Reflection (runtime) equals a graph query
Bootstrapping a C++ process equals running a graph query for all the matching tiny C++ Modules

16 of 49

Consequences:

Instant notification of breakages from a code change no matter how far away
Optimally minimal rebuild (and can dispense with external build tools)
Optimal optimisation which can be pushed onto a batch pass/cloud compute
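
The "instant notification" and "optimally minimal rebuild" consequences can be read as a reachability query over reverse dependency edges. The following is a sketch under that assumption; the `Dependents` alias and the `ripple` function are hypothetical names, and the real design would run this as a query inside the graphstore rather than over an in-memory map.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>

// Hypothetical illustration only. Maps each node (a type or tiny module) to
// the nodes that depend directly on it, i.e. reverse dependency edges.
using Dependents = std::map<std::string, std::set<std::string>>;

// Return every node affected by a change to `changed`: the node itself plus
// everything reachable along dependent edges (breadth-first traversal).
std::set<std::string> ripple(const Dependents& dependents,
                             const std::string& changed) {
    std::set<std::string> affected{changed};
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        std::string node = pending.front();
        pending.pop();
        auto it = dependents.find(node);
        if (it == dependents.end()) continue;  // nothing depends on this node
        for (const std::string& user : it->second) {
            // Only enqueue nodes we have not already marked as affected.
            if (affected.insert(user).second) pending.push(user);
        }
    }
    return affected;
}
```

Anything outside the returned set is untouched by the change and needs neither a rebuild nor a warning, which is what makes the rebuild minimal.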

17 of 49

Consequences:

Easy components via ripple propagation rules in the graphstore
C++ no longer pollutes the C symbol table
Finally a real ABI management solution
Other programming languages would be very interested in this

19 of 49

Claims (to which I shall return):

C++ is in relative decline
Boost is in both absolute and relative decline

Therefore:

We need to return to becoming a better systems programming language
We need “signature projects” for C++ 11/14

20 of 49

Why this instead of including more code per-compiland?

21 of 49

Where hardware is going soon

22 of 49

Where hardware is going soon

23 of 49

My best speculations on effects:

I therefore claim these likely outcomes in the future:

The cost of including ever more source code per-compiland stops being sunk by transistor density growth
Therefore build times stop falling over time and start to rise
Therefore C++ starts to look more like it did in the 1990s, with small compiles and large links

Except with all our fancy modern C++ techniques

24 of 49

My best speculations on effects:

Also:

There will be a return to growth for systems programming languages as software is refactored to cope with linear growth in CPU and RAM but still exponential growth in storage
I claim this will happen around 2017-2020 if present trends continue and no surprises turn up

(Mass production of Graphene or Phosphorene transistors won’t be ready by 2020 at present rates of R&D)

25 of 49

Will C++ be that majority choice of systems language?

26 of 49

The present structural revolution

27 of 49

Claim:

C++ has been in relative decline for a decade

(with a pause 2009-2011)

28 of 49

The present structural revolution

29 of 49

The present structural revolution

30 of 49

Possible reasons why C++ is in relative decline

And why should we care?

31 of 49

Why is C++ in relative decline?

C++ is no longer a general purpose programming language - it’s a niche specialist language suited for:

Low latency (async etc)
Maximum performance (maths etc)
Gluing application and service code written in other languages like Python or C# together

Note that C is good at all of the above too, and still remains more popular in open source than C++ for new code

32 of 49

Why is C++ in relative decline?

C++ has stopped trying to be the best systems programming language possible

C++ 11/14 adds a ton of great stuff BUT …

Did any of it persuade someone like Linus that C++ might be tolerable in the Linux kernel?
Do the Python/Ruby/Lua/PHP interpreter guys look at C++ 11/14 and go “wow that transforms our use case for C++ over C”?

33 of 49

Why should we care?

If:

C++ remains best in class for high performance math
C++ remains very strong in low latency async
BUT C remains preferred to C++ as a systems programming language

34 of 49

Why should we care?

Then, assuming the previous claims are true, a reasonable prediction of the future is:

C (or some extension thereof) gets the majority of post-exponential hardware systems language growth
C++ becomes ever more like Haskell, with only a rarified programmer elite able to touch it

Is that what we want?

35 of 49

Claim:

Since 2011 Boost has been in both absolute and relative decline

36 of 49

The Decline of Boost

37 of 49

The Decline of Boost

38 of 49

The Decline of Boost

39 of 49

Possible reasons why Boost is in decline

And why should we care?

40 of 49

Why is Boost in decline?

The C++ 11/14 standard library now can do what people used to need Boost for

Unfixed bugs in Boost mean that people simply switch to C++ 11/14 instead if they can
People think a particular Boost library needs all of Boost as a dependency - a real showstopper
Boost is seen as simply no longer relevant

41 of 49

Why is Boost in decline?

To quote a highly respected engineer from this very conference who said a few days ago:

“Boost used to be about all the stuff you really wanted in the standard. Now Boost looks like all the stuff that wasn’t good enough to get into the standard”

- somebody well known (not me)

42 of 49

Why is Boost in decline?

Boost has become two mutually incompatible sets of libraries:

The C++ 11 STL emulation library for C++ 98

A set of libraries which push the boundaries of C++, as Boost used to in the 1990s

With the latter being suffocated of late

43 of 49

Why is Boost in decline?

Most of the interesting C++ 11 libraries on the internet appear to have no interest in joining Boost (with a few honourable exceptions)

I personally find that very scary

44 of 49

Why is Boost in decline?

Boost makes no attempt to preserve ABI stability, and therefore is not welcome in large stable code bases

None of the improvements in C++ 11/14 do anything for change ripple management, so even mild ABI breakage is intolerable and therefore Boost is banned/pinned to some ancient version

45 of 49

Why should we care?

Simple answer:

How many of the changes in the C++ 11/14 standard over C++ 98/03 originated in Boost?

46 of 49

Assertion:

C++ ought to return to trying to become a better systems programming language

47 of 49

Assertion:

We want Boost to continue to lead out the future of C++

(and therefore all systems programming)

48 of 49

So what?

What if one or all of the earlier assertions is false?

What if none of the issues described is a real problem?

What if all this is merely hand wavy nonsense?

An embedded graph database is still extremely useful:

In 2014 it is still too hard for more than one process to write to many files concurrently
In 2014 it is still too easy to lose data

49 of 49

Want more detail?

There is a 25,000 word accompanying position white paper on ArXiv at http://arxiv.org/abs/1405.3323:

What makes code changes ripple differently in C++ to other languages?
Why hasn’t C++ replaced C in most newly written code?
What does this mean for C++ a decade from now?
What C++ 17 is doing about complexity management
What C++ 17 is leaving well alone until later: Type Export
Detail about the embedded graph database design
Two example killer applications for such a graph database, namely:

An example C++ object components design (similar to Bandela’s)
An example extensible Filesystem design
