Multiple sign off in Balrog

Rings of Power - Multiple Sign Off System for Balrog



Overview

Balrog is Mozilla’s update server. It is responsible for deciding which updates to deliver for a given update request. Because updates deliver arbitrary code to users, a badly configured update server could orphan users or be used as an attack vector to infect them with malware. It is crucial that we make it more difficult for a single user’s account to make changes that affect large populations of users. Not only does this provide some footgun protection, but it also safeguards our users from attacks if an account is compromised or an employee goes rogue.

While the current version of Balrog has a notion of permissions, most people effectively have carte blanche access to one or more products. This means that an under-caffeinated Release Engineer could ship the wrong thing, or a single compromised account could be used to begin an attack. Requiring multiple different accounts to sign off on any sensitive change will protect us against both of these scenarios.

Multiple sign offs may also be used to enhance Balrog’s ability to support workflows that are more reflective of reality. For example, the Release Management team are the final gatekeepers for most products (i.e., we can’t ship without their sign off), but they are usually not the people in the best place to propose changes to Rules. A multiple sign off system that supports different types of roles would allow some people to propose changes and others to sign off on them.

Scope

Because so much automation relies on Balrog, we can’t simply require multiple sign off for every row of every table in Balrog’s database, but there are some things that clearly must be protected by it. At minimum, we must require multiple sign off for:

On the flip side, in order to ensure that automation continues to function correctly, we must NOT require multiple sign off for:

These explicit includes and excludes cover the vast majority of objects in Balrog’s database. We can decide later whether or not to require multiple sign off for things not covered by the above.

Also explicitly out of scope, but things to keep in mind when we implement:

Other Requirements

Without dictating the details, the following requirements must be met for a successful implementation of multiple sign off:


Implementation

Overview

The recently completed Scheduled Changes system has provided us with a way to “stage” changes to Rules, which is one of the key pieces needed to implement Multiple Sign Off. We can use Scheduled Changes as a base, and enhance it to support the idea of optional Signoffs. Using it as a base has a number of advantages:

While the Scheduled Changes tables will take care of storing proposed changes, we will still need new tables to associate Users with Roles, define Required Signoffs for different types of changes, and track Signoffs to proposed changes. These new tables will also need support in the web API and UI to expose them to humans and the Balrog Agent.

Scheduled Changes Enhancements

Currently, the Scheduled Changes system is only enabled for the Rules table. We will need to enable it for Releases, Permissions, and Required Signoffs (see the “Required Signoffs” section for more on this) in order to enable Multiple Signoff for them. No changes will be needed to the schema of the Scheduled Changes tables, as Signoff information will be tracked elsewhere (see the “Signoffs” section for more on this), but the Scheduled Changes table will need to be taught how to find and validate Signoffs.

Scheduled Changes currently have two conditions that can cause them to be enacted: a timestamp or when uptake hits a certain point. Signoff will be implemented as an additional type of condition. Unlike existing conditions, Signoff will not be user controllable when creating a Scheduled Change - it will be implied for any changes to objects that have Required Signoffs defined.
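
As a rough illustration of Signoff acting as an additional, implied condition, the following is a minimal sketch rather than Balrog's actual code. The `ScheduledChange` class, its field names, and `ready_to_enact` are hypothetical, and the sketch assumes that Signoff must be satisfied in addition to whichever existing condition was set.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScheduledChange:
    """Hypothetical in-memory stand-in for one Scheduled Change."""
    when: Optional[int] = None              # timestamp condition, if set
    uptake_threshold: Optional[int] = None  # uptake condition, if set
    # role -> number of Signoffs needed; implied by the object being
    # changed, never chosen by the user creating the Scheduled Change
    required_signoffs: Dict[str, int] = field(default_factory=dict)
    # username -> role that user signed off under
    signoffs: Dict[str, str] = field(default_factory=dict)


def signoffs_satisfied(sc: ScheduledChange) -> bool:
    """True when every required Role has at least its required count."""
    counts = Counter(sc.signoffs.values())
    return all(counts[role] >= n for role, n in sc.required_signoffs.items())


def ready_to_enact(sc: ScheduledChange, now: int, uptake: int) -> bool:
    """Check the user-set condition first, then the implied Signoff one."""
    if sc.when is not None and now < sc.when:
        return False
    if sc.uptake_threshold is not None and uptake < sc.uptake_threshold:
        return False
    return signoffs_satisfied(sc)


# Illustrative data: a change that needs two "relman" Signoffs.
sc = ScheduledChange(when=1000, required_signoffs={"relman": 2},
                     signoffs={"alice": "relman"})
blocked = ready_to_enact(sc, now=2000, uptake=0)   # one Signoff is missing
sc.signoffs["bob"] = "relman"
allowed = ready_to_enact(sc, now=2000, uptake=0)   # both Signoffs present
```

A change with no Required Signoffs has an empty `required_signoffs` mapping, so it behaves exactly as Scheduled Changes do today.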

At this time, Scheduled Changes does not support scheduling the deletion of objects. Because changes to the Required Signoffs table will be implemented as Scheduled Changes, we must add support for scheduling the deletion of objects. Without it, deletions would bypass Signoff, and a potential attacker could remove Required Signoffs and then directly manipulate important Rules or Releases.

We will also modify the Scheduled Changes implementation to make it possible to disable the uptake condition on certain tables, because it will not make sense for anything except Rules to accept it.

New UI will be added for Scheduled Changes on Releases, Permissions, and Required Signoffs.

The Balrog Agent will need to be taught how to look for Scheduled Changes for Releases, Permissions, and Required Signoffs. It will also need to be taught how to determine when Required Signoffs have been satisfied.

User Roles

We will add a new table that associates Users with Roles. The table will have a composite key consisting of username and role. No other columns are required.

New web APIs will be added to let us manage these User Roles. The existing Permissions UI will be modified to allow for management of User Roles.

Modifying User Roles will not require Multiple Signoff. Even if an attacker were to gain access to a highly privileged account, they would not be able to ship anything by modifying User Roles, so this does not provide an indirect way of bypassing Multiple Sign Off.

Required Signoffs

Different types of objects have different dependencies that affect which types of Signoffs are needed. Because of this, we will add new tables to track Required Signoffs: one shared by Rules and Releases, and one for Permissions. These tables will be structured as follows:

| Object(s) | Table | Columns |
|---|---|---|
| Rules, Releases, Required Signoffs | product_channel_required_signoffs | product (PK), channel (PK), role (PK), signoffs_required |
| Permissions | permissions_required_signoffs | product (PK), role (PK), signoffs_required |

Including “role” in the key allows us to require Signoffs from multiple different Roles for the same type of change. The signoffs_required column lets us require more than one Signoff from within one Role (eg: two Signoffs from “relman”).
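
To make the composite keys concrete, here is a minimal sketch of the two tables using Python's built-in `sqlite3` module. The table and column names come from the table above; the column types, the sample rows, and the `required_for` helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_channel_required_signoffs (
    product TEXT NOT NULL,
    channel TEXT NOT NULL,
    role TEXT NOT NULL,
    signoffs_required INTEGER NOT NULL,
    PRIMARY KEY (product, channel, role)
);
CREATE TABLE permissions_required_signoffs (
    product TEXT NOT NULL,
    role TEXT NOT NULL,
    signoffs_required INTEGER NOT NULL,
    PRIMARY KEY (product, role)
);
""")

# Illustrative rows: because "role" is part of the key, one product and
# channel can require Signoffs from several Roles at once.
conn.executemany(
    "INSERT INTO product_channel_required_signoffs VALUES (?, ?, ?, ?)",
    [("Firefox", "release", "relman", 2),
     ("Firefox", "release", "releng", 1)],
)


def required_for(product, channel):
    """Return {role: signoffs_required} for one product and channel."""
    rows = conn.execute(
        "SELECT role, signoffs_required "
        "FROM product_channel_required_signoffs "
        "WHERE product = ? AND channel = ?",
        (product, channel),
    )
    return dict(rows.fetchall())


release_requirements = required_for("Firefox", "release")
nightly_requirements = required_for("Firefox", "nightly")
```

In this sketch a product and channel with no rows simply yields an empty mapping, meaning no Signoff is required.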

Some columns have special constraints on them:

Changes to these tables will also require Signoff, but the Required Signoffs will be inherited from the associated table. That is to say, if changes to Firefox release channel Rules require 2 relman Signoffs, changing the number of Required Signoffs for Firefox release channel Rules will also require 2 relman Signoffs. This prevents a bad actor from indirectly modifying something by reducing the Signoff requirements, without the need to duplicate the Required Signoffs across two tables.
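
The inheritance rule can be sketched as a lookup against the requirements currently in force, rather than against the proposed new values. This is an illustrative sketch only: the function name and the dictionary representation are hypothetical, and the text above does not say how a brand new requirement (where nothing exists yet) should be handled, so the sketch just returns an empty mapping in that case.

```python
def signoffs_needed_to_change_requirement(current, product, channel):
    """Signoffs needed to modify the Required Signoffs of a product/channel.

    `current` maps (product, channel, role) -> signoffs_required and
    represents the requirements in force *before* the proposed change.
    """
    return {
        role: count
        for (p, c, role), count in current.items()
        if p == product and c == channel
    }


# Illustrative data: Firefox release channel Rules need 2 relman Signoffs.
current = {("Firefox", "release", "relman"): 2}

# Proposing to lower that requirement to 1 is itself gated by the
# existing requirement of 2 relman Signoffs, not by the proposed value.
needed = signoffs_needed_to_change_requirement(current, "Firefox", "release")
```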

New web APIs will be added to let us manage Required Signoffs. New sections will be added to the existing Rules, Releases, and Permissions UI to let us manage them from the Admin Interface.

Signoffs

The Signoffs tables are responsible for tracking who has signed off on proposed changes. Because they are related to the Scheduled Changes tables, we will need one Signoffs table for each of: Rules, Releases, Permissions, and Required Signoffs. Each table will have a composite key that consists of sc_id (a reference to the associated scheduled_changes table) and username, and also a role[5] column. The role column is not part of the primary key, because a user may sign off on a given Change under only one Role.
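
The effect of leaving role out of the primary key can be seen in a small sketch using Python's built-in `sqlite3` module. The table name `rules_signoffs`, the column types, and the `sign_off` helper are hypothetical; only the sc_id, username, and role columns come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rules_signoffs (
    sc_id INTEGER NOT NULL,   -- reference to the scheduled change
    username TEXT NOT NULL,
    role TEXT NOT NULL,       -- which Role the user signed off under
    PRIMARY KEY (sc_id, username)
)
""")


def sign_off(sc_id, username, role):
    """Record a Signoff; return False if this user already signed off."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO rules_signoffs VALUES (?, ?, ?)",
                (sc_id, username, role),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = sign_off(1, "alice", "relman")    # accepted
second = sign_off(1, "alice", "releng")   # rejected: same change, same user
other = sign_off(2, "alice", "releng")    # accepted: different change
```

Because the key is (sc_id, username), the database itself rejects a second Signoff from the same user on the same Change, even under a different Role.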

A new web API will be needed to allow users to Signoff on changes. UI for this will be part of the Scheduled Changes sections. Existing Signoffs will be shown for each proposed change as well as which Roles still need to Signoff. An “Approve” button will be shown for any Change that still requires signoff. If the User holds multiple Roles, they will be prompted to specify which Role they are signing off under.

If the proposer of a change is a member of one of the groups that the change requires Signoff from, a Signoff from them will be recorded as part of the proposal. This means that anything that we want to require more than one person to change must have the sum of its signoffs_required values be 2 or more. If the User is not a member of one of the required groups, they may still propose a change, but no Signoff will be recorded for it. For example, if a change requires 2 RelMan Signoffs and a RelEng person proposes it, 2 RelMan people will need to approve it. If a change requires 2 RelMan Signoffs and a RelMan person proposes it, only 1 other RelMan person will need to explicitly approve it.
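
The two examples above can be worked through with a short sketch. The function names are hypothetical, and the sketch assumes the proposer's Role is passed in explicitly (for instance, after the user has chosen which Role to act under).

```python
from collections import Counter


def initial_signoffs(proposer, proposer_role, required):
    """Signoffs recorded at proposal time, as {username: role}.

    A Signoff is recorded only if the proposer's Role is one that the
    change requires; otherwise the proposal starts with none.
    """
    if proposer_role in required:
        return {proposer: proposer_role}
    return {}


def remaining_signoffs(required, signoffs):
    """Return {role: count still needed}, omitting satisfied Roles."""
    counts = Counter(signoffs.values())
    return {
        role: needed - counts[role]
        for role, needed in required.items()
        if needed > counts[role]
    }


required = {"relman": 2}

# A RelEng person proposes: no Signoff is recorded, so 2 are still needed.
by_releng = remaining_signoffs(
    required, initial_signoffs("erin", "releng", required))

# A RelMan person proposes: their Signoff counts, so 1 more is needed.
by_relman = remaining_signoffs(
    required, initial_signoffs("maria", "relman", required))
```

With a requirement of only 1 Signoff, a proposer who holds the required Role would satisfy it alone, which is why the total must be 2 or more whenever a second person is meant to be involved.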

Other Enhancements

As a safeguard against future changes to the Signoffs system, whenever a change is made to a Rule, Release, Permission, or Required Signoff, Balrog will look up whether that change should have required Signoff, and verify that Signoff actually happened. This check will be implemented outside of the Scheduled Changes logic to ensure that it is a fully separate safeguard (and therefore less likely to break when the Signoff system is modified in the future).


Use Cases

Scenario 1: Shipping Firefox

Initial State

Workflow


Scenario 2: Granting a new Permission

Initial State

Workflow


Scenario 3: Adding a What’s New Page to an already shipped Firefox

Initial State

Workflow


Scenario 4: Shipping a new version of the CDM

Initial State

Workflow


Scenario 5: Shipping a System Addon

Initial State

Workflow


[1] The Firefox aurora channel will also require Signoff, but only during periods where updates are frozen during uplifts.

[2] This does not affect how existing automation submits Releases because beta, release, and esr channel Releases are not mapped to until after automation finishes populating them.

[3] Depending on implementation, proposing a change may require one role, and approving it may require another. Even if multiple Roles per user are not strictly necessary for the implementation to work, it’s still handy to keep us flexible for the future.

[4] After some security events in the past, we should ensure LDAP groups are not automatically granted if we go this route.

[5] It may seem unintuitive that Role is in this table, but it is necessary because a User may hold multiple Roles, and we need to know which they have performed the Signoff under.

[6] This does not count as one of the required Signoffs because Jeff does not hold the “relman” role. Recording this Signoff is not strictly necessary, and we may choose not to do so.