1 of 21

Building an ISP in a box

By Justin Kilpatrick

2 of 21

Why is the last mile so expensive?

WISPs or Wireless ISPs build out networks for less than $300 in capital costs per user

May only have a single employee for hundreds of users

The last mile is profitable to build, but expensive to coordinate

As an example consider a wireless ISP

Wireless ISPs use point-to-point long-range antennas. All in, installing a user costs about $300 in hardware. These costs are well within the reach of a small business. In fact, all the equipment for a network of several hundred customers costs significantly less than starting a restaurant.

The real challenge of a last mile service provider is not the equipment cost, but the coordination required for installation and acquiring new customers.

<pause>

A significant amount of the cost savings for Wireless ISPs comes from the fact that the owner does all of the groundwork to build a viable network.

Fiber and cable show a similar cost breakdown. Actually building infrastructure is wildly profitable, while coordinating its construction is nearly impossible.

3 of 21

Althea is a decentralized ISP

Commoditize bandwidth the same way the cloud commoditized server time

Zero software setup, minimal hardware setup

Separate customer service through local ‘network organizers’

Reduce the cost and increase access by automating coordination

Fortunately we have a model to dramatically reduce the cost of organizing networks. The internet is made up of many independent, separately owned autonomous systems that peer together. If any one organization attempted to construct a single global network, the coordination cost would be crushing to the point of impossibility. Human communication simply does not scale that well.

It’s a mix of cooperation and competition, made possible by distributed ownership that makes global telecom feasible at all.

<pause>

So why can’t we apply this strategy to last mile networks?

<pause>

The subscription model of internet service is based on monopolizing the infrastructure and the user. It simply precludes the sort of cooperation that made the internet possible.

The goal of Althea is to change the economic model of last mile internet access by scaling down and automating the tasks of a network operator. Not only the routing but also the business functions such as payment and peering agreements.

We want to reduce the cost and increase the availability of last mile infrastructure by reducing an ISP to a single box, a self contained autonomous system, making cooperation as easy as plug and play.

4 of 21

Building an entire ISP, in one very small box

An OpenWrt distribution installed on home routers
Routing with link state detection and price tracking
Automated QoS and congestion management
Peer-to-peer autonomous billing
Automated incident resolution
All in 128MB RAM and 16MB storage

// not actually a script

At this point we can be done with the elevator pitch and get into the details. The real challenge here pretty obviously isn’t making such a network forward packets, or automating its setup, but making sure that an enormous fleet of these devices can actually operate continuously and securely with very minimal supervision.

We chose WiFi routers as a platform because cost in manufacturing is all about scale. By programming existing devices we take on many restrictions: we don’t get the exact hardware we want, but we do get the performance we need, in a package the user can easily have in their home, at a price point they can easily pay.

I’m going to go over our tooling, security measures, and then the challenges of building not just an automated network but automation that’s intended to run unsupervised for months or years.

5 of 21

[Packet flow diagram. Legend: encryption operation (WireGuard), queue operation, routing operation. Elements shown: video stream, game, and chat traffic; fq_codel, HTB, and TBF queues; NAT; per-hop tunnels; Alice’s and Bob’s exit tunnels; Babel routing over arbitrary hops / paths; router, exit, and Internet.]

So first things first, here’s a packet flow diagram. All of our networking operations are kept in the Linux kernel.

Our billing, routing, and really everything that makes Althea unique is in userspace. You could make a serious argument that our entire product is simply a very complicated script for setting up routing tables.

Discuss the life of a packet as it flows from a user through some relays and to the exit
The exit is a sort of integrated VPN, used to actually peer user traffic out to the internet
All IPv6 within the network; IPv4 exists only inside the client-exit WireGuard tunnel
Yes, there is a second layer of encryption between every set of two devices. This deals with L2 security by ensuring that even devices sharing a broadcast domain can’t impersonate other devices, or perform spam attacks that might be attributed to others. There’s no actual way to prevent a hostile node from spamming a broadcast domain, but we can at least isolate it.
The network as a whole is L3 routed by Babel; we’ll get into more of how exactly that works later.

6 of 21

WireGuard Perf

WireGuard is an in-kernel, high-performance VPN

Its sessionless nature and low overhead make Althea possible

For the ARM router (bottom) kernel routing is far slower than the cryptography

We’re going to start by talking over the loose components of the system, in this case WireGuard. We’re nesting encrypted VPN tunnels on puny embedded device processors. At first glance this seems to be inviting disaster, or performance bad enough to be considered a disaster.

But WireGuard really does live up to the hype of being dramatically more performant than its competitors. Even the least impressive MIPS processors get 30ish Mbps of throughput.

We have some kernel thread perf samples of a running Althea device here, graphed with a wonderful Perl script by Brendan Gregg.

MIPS device dominated by crypto operations
ARM device dominated by routing operations
ARM’s vector instructions allow encryption/decryption to be performed with fewer memory lookups
We get about 150 Mbps out of the IPQ40xx series of ARMv7 processors (the sample on the bottom, can be had in ~$85 devices), and this is primarily constrained by kernel routing performance / memory lookups, not cryptography.
Mention the connectionless and (almost) sessionless nature of WireGuard and how that’s very helpful in a dynamic network

7 of 21

FQ_Codel

Solves the traffic management problem without requiring any human involvement

Applied at two levels to fairly distribute bandwidth between user connections on each router and between routers

Althea can’t use hardware acceleration anyway

First off these are not my diagrams, they are from bufferbloat.net and also abridged to fit well on this slide. The biggest thing we’re missing is a speed boost to 80mbps on the left graph which represents performance without cake.

I decided to focus in on the steady state performance in both graphs which is what we’ll be going over.

Smallest traffic stream first prioritization
Provides the best average throughput and perceived internet responsiveness by the user
Key for Althea because we need to solve the sharing problem: if you’re running a relay and subdividing your bandwidth among an arbitrary number of people, a single bandwidth hog can’t be allowed to ruin things. Likewise, we can’t have a fixed distribution of the total device capacity, since there’s no defined number of devices using the connection. Babel may choose to migrate any given user’s traffic stream off of or onto your node at any time.
Actually provides a better user experience than a higher throughput connection, all modern home gateways should be running this.
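
To illustrate the sharing problem, here is a toy per-flow scheduler. It is not fq_codel itself (the real qdisc adds CoDel queue management and byte-based scheduling); it only sketches why giving each flow its own queue stops one heavy stream from delaying the light ones. The flow names and packet counts are made up for the example.

```python
from collections import OrderedDict, deque


def fifo_order(arrivals):
    """Single shared queue: packets leave in arrival order."""
    return list(arrivals)


def per_flow_round_robin(arrivals):
    """Toy fair queue: one queue per flow, one packet served per flow per turn."""
    queues = OrderedDict()
    for flow, seq in arrivals:
        queues.setdefault(flow, deque()).append((flow, seq))
    sent = []
    while queues:
        # Iterate over a copy so empty queues can be removed safely.
        for flow in list(queues):
            sent.append(queues[flow].popleft())
            if not queues[flow]:
                del queues[flow]
    return sent


# Hypothetical burst: a bulk download queues 6 packets before a chat
# message and a game update arrive behind it.
arrivals = [("download", i) for i in range(6)] + [("chat", 0), ("game", 0)]

fifo = fifo_order(arrivals)
fair = per_flow_round_robin(arrivals)

# Number of packets sent ahead of the chat message under each policy.
chat_wait_fifo = fifo.index(("chat", 0))
chat_wait_fair = fair.index(("chat", 0))
print(chat_wait_fifo, chat_wait_fair)
```

In the shared queue the chat packet waits behind the whole download burst; with per-flow queues it leaves right after the first download packet, and no fixed per-user bandwidth split was needed.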

8 of 21

Babel

Open source

Distance vector with link state detection

An exhaustively defined spec with multiple implementations

Extendable

Inserts routes directly as kernel routes

Althea

Babel is an open source distance vector, loop avoiding, mesh routing protocol that uses link detection to determine routing metrics rather than a simple hop count.

https://www.irif.fr/~jch/software/babel/

So I’ll take a moment to present the basics of how Babel works

Babel binds to a provided list of interfaces and sends a selection of IPv6 multicast packets to all other Babel instances on the broadcast domain (L2)
Hello packets and IHU (I Heard You) packets are for neighbor discovery and are also used to estimate packet loss by recording how often the exchange is completed successfully. By default these are sent every 4 seconds.
Route updates contain the routing table of the peer. These updates are combined with your local routing table, and the lowest cost route to any given destination is selected and installed as a kernel route. Likewise, you can configure Babel to rebroadcast a selection of your own routes.
Babel has a simple route ‘cost’ field. It adds a fixed amount to every route and increases this amount based on the packet loss estimate from the Hello/IHU exchange with the neighbor who broadcast the route. Functionally this is a combination of distance vector and link quality detection.
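
A minimal sketch of that selection step, under simplifying assumptions: the base cost of 96, the way loss scales it, and all names here are illustrative, and real Babel also applies feasibility conditions for loop avoidance that this ignores.

```python
from dataclasses import dataclass

BASE_HOP_COST = 96  # hypothetical fixed cost added for every hop


@dataclass
class Neighbor:
    name: str
    hellos_sent: int       # Hello/IHU exchanges attempted
    hellos_completed: int  # exchanges that completed successfully

    def link_cost(self) -> float:
        """Fixed cost, scaled up by the estimated loss on this link."""
        delivery = self.hellos_completed / self.hellos_sent
        if delivery == 0:
            return float("inf")
        return BASE_HOP_COST / delivery


def select_routes(neighbors, advertised):
    """Pick the lowest total cost per destination.

    advertised maps neighbor name -> {destination: advertised metric}.
    Returns {destination: (next hop name, total cost)}.
    """
    best = {}
    for n in neighbors:
        for dest, metric in advertised.get(n.name, {}).items():
            total = metric + n.link_cost()
            if dest not in best or total < best[dest][1]:
                best[dest] = (n.name, total)
    return best


neighbors = [
    Neighbor("clean", hellos_sent=16, hellos_completed=16),
    Neighbor("lossy", hellos_sent=16, hellos_completed=8),
]
advertised = {
    "clean": {"exit": 300},
    "lossy": {"exit": 250, "cabin": 100},
}
routes = select_routes(neighbors, advertised)
print(routes)
```

The lossy neighbor advertises the shorter path to the exit, but its link cost doubles because half the Hello/IHU exchanges fail, so the clean neighbor wins.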

9 of 21

The key Insight, multi-vector distance vector

We want to create a network that is also a market for bandwidth

If we phrase all purchase criteria as distance vector metrics the output of Bellman-Ford routing becomes the ‘best buy’

But we have to keep each vector separate so that it can be evaluated individually. You can’t know what price to pay if you have a single ‘metric’ field

So I mentioned on the last slide that canonically the Babel protocol advertises routes with a single metric field. Nodes are allowed and encouraged to arbitrarily modify this field based on their own willingness to route, packet loss detection, and whatever extensions they have installed. So long as they don’t subtract from it, it’s perfectly allowed by the Babel spec.

This isn’t really workable for us because we need to know more than one piece of info about routing, we need to know both the quality and the cost. If we were to combine these into a single metric we would obviously lose too much information to operate.

So what we did was extend babel with a ‘price’ distance vector metric. You simply add your own price to any route you rebroadcast. We’ll get into the details of exactly what a ‘price’ means a little later. Right now we have to decide what to do with the original metric field. Because it’s still arbitrary and that’s a problem.
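
As a rough sketch of the idea, a route can carry the price as its own field next to the existing metric, with each node adding its price when it rebroadcasts. The field names and price units below are assumptions for illustration, not the actual wire format of the extension.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    destination: str
    metric: int  # Babel's usual quality metric, left untouched here
    price: int   # hypothetical units: cumulative price along the path


def rebroadcast(route: Route, my_price: int) -> Route:
    """Re-advertise a route with our own price added on top."""
    if my_price < 0:
        raise ValueError("a node may add to the price, never subtract")
    return replace(route, price=route.price + my_price)


# The exit originates the route; relays B and then A rebroadcast it.
at_exit = Route("exit", metric=0, price=0)
via_b = rebroadcast(at_exit, my_price=5)
via_a = rebroadcast(via_b, my_price=3)
print(via_a)
```

Because the price stays separate from the metric, a downstream node can read off exactly what the path costs (8 here) without having to untangle it from link quality.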

10 of 21

Verifiable metrics

We can make an efficient automated market out of distance vector, but distance vector is trivially spoofable.

If we assume we have an encrypted connection to our destination we can measure some network properties objectively

Round trip time and cumulative packet loss can be determined by cooperating nodes on each end, then compared to the advertisement

So most routing protocols, and all of the routing protocols we deal with on a day to day basis as network operators, are trustful, meaning they assume that nodes are generally doing the right thing. Specific liveness failures are tolerated, but if a hostile node were to insert maliciously crafted data, BGP, OSPF, and Babel will all immediately fall to the attacker.

There are some examples of routing protocols designed to be secure, I don’t think I have enough time to cover them today though.

Cover the theoretical ways you can validate routes (if time permits)

Secured scalable source routing
SEMPTOR (BMX8)
Dig up that paper about ring detection (Axel Neumann’s advisor?)

Our validation scheme is a simple variant of what’s called ‘tunneled probing’ tuned to operate well in a babel style distance vector network.

Instead of having a general ‘metric’ field we create a well defined and explicit ‘packet loss’ metric that is propagated with all routes, then using an encrypted connection with the destination we can estimate the packet loss free of interference by intermediate nodes. This can then be compared to the advertised packet loss to sniff out any funny business.

The metric that we determine ourselves is then used to overwrite the falsely advertised data and our route selection is repaired.
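
A minimal sketch of that comparison, assuming probes are counted over the encrypted tunnel to the destination. The tolerance value and function names are hypothetical, and as noted below the real mechanism still needs refinement.

```python
def measured_loss(probes_sent: int, probes_answered: int) -> float:
    """Loss estimated end to end, out of reach of intermediate nodes."""
    if probes_sent <= 0:
        raise ValueError("need at least one probe")
    return 1.0 - probes_answered / probes_sent


def verify_route(advertised_loss, probes_sent, probes_answered, tolerance=0.05):
    """Compare the advertisement with our own measurement.

    Returns (loss value to use for route selection, suspicious flag).
    The measured value always replaces the advertised one; tolerance is
    an assumed slack for normal sampling noise.
    """
    observed = measured_loss(probes_sent, probes_answered)
    suspicious = observed > advertised_loss + tolerance
    return observed, suspicious


# A route advertised at 1% loss that actually drops a quarter of probes.
loss_to_use, flagged = verify_route(0.01, probes_sent=200, probes_answered=150)
print(loss_to_use, flagged)
```

A route that is honestly advertised passes unflagged, while one that under-reports its loss is both flagged and re-scored with the measured value.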

I will note that while we have implemented full path RTT as a metric and do some basic tunneled probing tasks, it needs significant refinement to be accurate enough to detect attacks and be trusted to respond reliably. This is actually the only subject in the talk that’s not implemented and actively operating in prod.

11 of 21

Example network

For routing decisions we re-aggregate the metrics as the sum of logs

Exact outcome dependent on user preference

Kernel routing only allows one route per destination at a time

While route computation is O(neighbors) memory usage is O(network * neighbors)

[Network diagram: nodes A through G. Labels shown: prices (5, 4, 8, 1, 10, 5, 3), loss rates (0.7 to 1), and RTTs (1ms to 10ms).]

So here is an example network with the best price in red, the best packet loss in blue, and the best latency in green. In order to make routing decisions we take all of these metrics, which are present in our route data due to our babel modifications and reaggregate them as a sum of logs.

This is pretty much re-creating Babel’s single metric from its constituent parts. We make this metric reflect our user’s preferences rather than be some proxy for the preferences of every node along the path. This is nice, but really much less effective than it sounds. Babel is just inserting kernel routes, so there is only one installed at any given time to any given destination. You can choose to install a specific route based on your own preferences, but you can’t decide a route through the entire network based on those preferences.

We’ve considered several strategies to allow user preferences to inform the whole route, but that’s moving from a very simple single network map into something closer to either source routing or many parallel networks with different sets of preferences. Both of these are complex and come with significant overhead.

The sum of logs method has two major goals: one, to keep a single metric from dominating the selection process; two, to keep the user from selecting a ridiculous set of preferences, by damping unreasonably extreme settings. Neither a dominant metric nor an extreme preference set is desirable behavior.
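
One way such a score could look is sketched below. The exact formula is an assumption: each metric is passed through a log so no single large value swamps the others, and user weights are clamped to a hypothetical range so extreme preferences are damped. Loss is treated here as the fraction of packets lost, and lower scores are better.

```python
import math

MIN_WEIGHT, MAX_WEIGHT = 0.5, 2.0  # hypothetical clamp on user preferences


def clamp_weight(weight: float) -> float:
    """Damp unreasonably extreme preference settings."""
    return max(MIN_WEIGHT, min(MAX_WEIGHT, weight))


def route_score(price, loss, rtt_ms, prefs):
    """Weighted sum of logs of the separate metrics (lower is better)."""
    metrics = {"price": price, "loss": loss, "rtt": rtt_ms}
    return sum(
        clamp_weight(prefs.get(name, 1.0)) * math.log1p(value)
        for name, value in metrics.items()
    )


def best_route(candidates, prefs):
    return min(
        candidates,
        key=lambda c: route_score(c["price"], c["loss"], c["rtt_ms"], prefs),
    )


candidates = [
    {"name": "cheap", "price": 4, "loss": 0.20, "rtt_ms": 10},
    {"name": "fast", "price": 10, "loss": 0.01, "rtt_ms": 2},
]
neutral_choice = best_route(candidates, {})["name"]
frugal_choice = best_route(candidates, {"price": 100})["name"]
print(neutral_choice, frugal_choice)
```

With neutral preferences the low-loss, low-latency route wins. A user who cranks the price preference to 100 gets the cheap route, but only with the clamped weight of 2.0, so the setting shifts the choice without letting price alone decide everything.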

A quick note on overhead, Babel requires a fixed amount of memory for every node in the network, since it maintains at least one route entry for each. Nodes with many neighbors will end up using more memory, since they receive a route to each destination from each neighbor which they will maintain in memory in order to quickly switch routes.

12 of 21

Routing - Challenges

Babel does not (and should not) operate in real time, meaning packet loss and rtt as the network advertises will be a sample over a longer period than the actual validation sample.

Overhead is actually quite practical for a last mile network. A network of 4k nodes generates 1.8 Mbps of overhead traffic per node. This can be tuned lower in exchange for worse worst-case convergence, and/or the network can be subnetted

So the big takeaway here is that verification is a hard problem still ahead of us and overhead really isn’t an issue.

It’s not a good user experience to exit to the internet after more than a dozen or so hops. The overhead problem can be resolved with a concept I call ‘fuzzy subnetting’, where a memory maximum is placed on Babel and routes beyond that limit are discarded worst route first. But having a network large enough to even be concerned about that is still a ways away.
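
A minimal sketch of the pruning step, assuming the memory maximum is expressed as a route count and that ‘worst’ means highest metric. Both are simplifications for illustration, not a description of an existing Babel feature.

```python
import heapq


def prune_routes(routes, max_routes):
    """Keep at most max_routes entries, discarding the worst routes first.

    routes maps destination -> metric, where lower is better.
    """
    if len(routes) <= max_routes:
        return dict(routes)
    kept = heapq.nsmallest(max_routes, routes.items(), key=lambda item: item[1])
    return dict(kept)


table = {"a": 96, "b": 500, "c": 192, "d": 1000}
pruned = prune_routes(table, max_routes=2)
print(pruned)
```

Each node ends up with its own view of the nearby network, which is why the resulting subnet boundary is fuzzy rather than administratively defined.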

Babel has an interesting concept called ‘triggered updates’. The majority of Babel’s routing overhead is routing table updates, but if you were to reduce the routing table update frequency from the default 4 seconds to, let’s say, 5 minutes, the effect on convergence speed would not be linear. When a route becomes infeasible Babel triggers an update, which will trigger other updates for infeasible routes. In theory you could rely purely on triggered updates to keep the network converged within a reasonable time frame. Of course this would also increase the sample period of the verified metrics enough to make it difficult to use them.

But it’s interesting to consider a system where the overhead is proportional to the frequency of state change within the network.

13 of 21

Billing

Cryptocurrency based payments provide a nice way to avoid being a networking startup AND a financial startup

The money is actually inside the router, this is both very convenient and troublesome.

Depending on the internet to pay for internet results in a sometimes delicate dance of being both an application and a network protocol

At this point we’ve covered everything except for how payments are done.

The routers pay each other peer to peer by submitting transactions to cryptocurrency full nodes. The cryptocurrency selection doesn’t really matter much, except insofar as it’s fast enough to qualify as ‘micropayments’.

Our average payment is on the order of 10c, for which we pay a 5% fee. We could increase that amount in exchange for a smaller proportion of fees. The traditional financial system simply doesn’t offer transaction pricing like this. Likewise the setup for a pay and be paid system would involve adding direct deposit to your router. Neither is it really acceptable for any one organization to have total control of the network by virtue of controlling payments.
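
The trade-off between payment size and fee proportion can be worked through with a small sketch. It assumes the fee is a fixed amount per transaction (0.5c, which is 5% of a 10c payment); real fees vary, so treat the numbers as illustrative only.

```python
from fractions import Fraction

# Assumption: a fixed per-transaction fee equal to 5% of a 10c payment.
FIXED_FEE_CENTS = Fraction(1, 2)


def fee_fraction(payment_cents: int) -> Fraction:
    """Share of each payment that goes to transaction fees."""
    return FIXED_FEE_CENTS / Fraction(payment_cents)


for payment in (10, 50, 100):
    print(f"{payment}c payment -> {float(fee_fraction(payment)):.1%} in fees")
```

Under that assumption, batching usage into 50c or $1 payments cuts the fee share to 1% or 0.5%, at the cost of extending more credit between payments.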

This isn’t very exciting networking wise except of course the catch 22, we’re using the internet to pay for our internet! This bootstrapping problem is resolved by providing all nodes with a small amount of bandwidth in order to let them talk to the blockchain.

14 of 21

The Key insight, pay per forward

We’re trying to automatically sell and buy bandwidth, how do we term that sale such that it’s possible for everyone to independently verify and agree that the exchange was completed?

What we want is a subset of fault tolerant consensus. Each party needs to observe their own data and determine who to pay, how much, and how much they should expect to be paid. Nodes must agree on these values at all times despite possibly having different inputs due to packet loss

15 of 21

The Key insight, pay per forward

If we define that each node is paid per packet it forwards, packet loss and other on-the-wire disruptions resolve to overpayment

For example, node A looks at its tx counter to determine how much to pay, and node B looks at its rx counter to determine how much it expects to be paid. Packet loss means that tx >= rx, therefore B may see more payment than it expects, but that still produces the consensus we need
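
The counter logic can be sketched as follows. The price per packet and the function name are hypothetical; the point is only that A bills itself from tx, B checks against rx, and loss on the wire can only push the payment above what B expects.

```python
def settle(tx_packets: int, rx_packets: int, price_per_packet: int):
    """Compare what the sender pays with what the receiver expects.

    tx_packets: packets node A counted as sent to node B.
    rx_packets: packets node B counted as received from node A.
    """
    paid = tx_packets * price_per_packet
    expected = rx_packets * price_per_packet
    return {
        "paid": paid,
        "expected": expected,
        "overpayment": paid - expected,
        "accepted": paid >= expected,  # B is satisfied if not underpaid
    }


# 1% of packets are lost between A's tx counter and B's rx counter.
result = settle(tx_packets=1000, rx_packets=990, price_per_packet=3)
print(result)
```

Neither side needs to see the other’s counter: any loss in between shows up as a small overpayment that B simply accepts.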

16 of 21

Payment map

Taken over 1 week in our production network

Interactive at https://bit.ly/2JCxenN

Pay per forward combined with the high ratio of download traffic means most payments are to or from only a few nodes

‘Exit’ nodes get paid for traffic they ‘upload’ to users, in an exception to the pay per forward model

17 of 21

Billing - challenges

Lightweight, architecture-portable cryptocurrency implementations

Payments and payment verification must be incredibly resilient

Fee amounts are variable

Payment consensus needs to be resilient to execution problems. No ECC on random routers

Unlike routing verification the key challenges for payments are all pretty much resolved at this point. Even in a production network experiencing significant packet loss we can maintain tight convergence.

The real problem we faced for several months was that calling out to the internet from inside an unstable network is not a friendly situation for network libraries. We write our application code in Rust, and even using a fairly mature non-blocking HTTP library we discovered about half a dozen ways for it to block and take down nodes due to strange network conditions. We would love to submit patches upstream, but the conditions required to cause the blocking tend to exist for only a few minutes at a time, and it’s difficult to instrument production nodes well enough to gather sufficient data. We’ve rooted most of them out at this point, but it’s quite difficult to root cause some of these.

This does bring into focus one of our biggest problems generally: these nodes are failure prone. People may unplug them at any time, or shove them in a cabinet where they will forever overheat and thermal throttle. They don’t have ECC, so they may start to believe in just about anything if you leave them alone long enough. That being said, we have payment convergence within half a percent for long periods these days (meaning nodes agree on the bill).

It was a real trip to get an Ethereum transaction generation library that worked on MIPS. But that’s a story for another talk.

18 of 21

Tales from the field

Average user downtime: 15min / week

Only one downtime event >1hr in the last year

Speed / Latency: 100mbps / 10ms

Patching, 0 failures for 4 months

User satisfaction

Pretty good

Pay per byte billing has upsides and downsides

The average user experiences about 15 minutes of unplanned downtime a week, and it’s mostly my fault. I mean me personally. Babel and OpenWrt are very stable.

Break down failure causes

Mostly calling things on the web using normal libraries in a very unstable network environment; the traditional divide between applications and network protocols is being ignored.

Break down reductions
Discuss linking some leaves of the tree in the diagram to get downtime to zero despite using consumer routers with no battery backups and no ecc
Discuss how auto recovery features keep the events contained in length.

Speed and latency are actually pretty much perfect. The antennas max out at 100 Mbps, and most users see that with no issue on the $90 OpenWrt routers we buy. Codel keeps latency low enough for gaming and VoIP even during high traffic periods.

Patching is a very small shell script. Using opkg for patching has actually been a better experience than I imagined

Anecdotes about user experience
Anecdotes about metered billing and its issues

19 of 21

Particularly memorable incidents

Long distance latency inversion combined with lack of TCP multipath

Gardening = Downtime

Web libraries are not designed to be used in unstable networks

Time between billing consensus in the lab and in prod: 5 months

No time based properties are allowed in billing

20 of 21

21 of 21

Thank you!

@AltheaNetwork
justin@althea.net