Ryan Sleevi <sleevi@google.com>
Last Updated: 2021-08-25
Discussions about this document, and possible mitigations, can be found in the OpenID Foundation’s Fast Federation (FastFed) WG. Comments and contributions are subject to the IPR Agreement, and can be directed to the FastFed Mailing List.
Current federation protocols, such as OpenID Connect (OIDC) and SAML, typically require some out-of-band configuration ceremony. This ceremony involves the Identity Provider (IdP) and Application Provider (AP) negotiating a set of authentication parameters, ranging from display parameters for end users and endpoints for services to security parameters, such as the SAML X.509 certificates or JSON Web Key Sets (JWKS) used to authenticate OIDC JSON Web Tokens (JWTs). In some cases, such as SAML, these parameters must be periodically renegotiated, or the entire federation relationship may fail.
While some federation protocols have developed solutions to help manage this, such as OpenID Connect Discovery, FastFed seeks to greatly simplify the problem, in a federation-protocol agnostic way, by developing a means of exchanging generic parameters, as well as updating them over time, in the hopes of minimizing downtime and ensuring easier on-boarding of federation relationships.
Existing federation protocols such as SAML or OIDC attempt to ensure security through the use of signed assertions, whether using use-case specific X.509 certificates or using JWKS. When these parameters are negotiated offline, they provide robust security that a given AP is talking to a particular IdP. Unfortunately, the current drafts of FastFed reduce the security of these protocols to that of TLS and domain names, making it such that a single misissued or compromised TLS certificate may be used to impersonate the IdP, and therefore any arbitrary user, to the AP.
This creates the potential for FastFed to reduce the security of some federation protocols to the security promises of TLS. For some protocols, this may be no change from the status quo, while for other protocols, this may be a degradation of security. In all cases, however, the reduction of security to “just” TLS makes many security elements of the authentication protocol superfluous or unnecessary. This document looks to explore ways to provide a more durable binding than “just” TLS, to either maintain the status quo with current authentication protocols, or improve the security of those existing protocols depending on just TLS.
Existing federation protocols, such as SAML, are cryptographically secure due to relying on an out-of-band ceremony for both parties to establish acceptable cryptographic parameters. These ceremonies are unspecified, but may be as simple as downloading a certificate from a web page, a form of Trust on First Use (TOFU), to in-person negotiations and exchanges. Once the ceremony has been performed, the relying party (the AP) can trust messages from the IdP, which will be signed with those negotiated keys.
The downside to this approach is that if the IdP has its keys compromised, or wishes to rotate keys regularly to reduce the risk of factoring or to shorten the window in which a compromised key is useful, the IdP must renegotiate these new parameters with every AP. In SAML, this may typically be every three years. If an AP fails to perform a new ceremony, the IdP’s messages will no longer be verified, and the federation relationship fails.
With SAML, the involvement of a human in the loop through an infrequent, manual process provides a useful backstop against attackers: there is only a limited window in which a compromise can take place, and an attack may further need to rely on additional human factors or social engineering. While not perfect, intermittent failures provide an opportunity to confirm whether the IdP has actually changed security parameters, and the manual nature makes it difficult for a network attacker to predict when or how the AP will obtain the new parameters.
Protocols such as OpenID Connect have a similar strategy, relying on JWKS to negotiate the keys used for signing. Unlike SAML’s manual configuration, it’s more common that OIDC keys are exchanged online, through a URI hosting the JWKS, typically discovered through OpenID Connect Discovery Metadata and the jwks_uri. Implementations may either pre-emptively fetch the jwks_uri, with some expectation of regular retrieval, or may download on demand whenever encountering a JWT with an unknown key ID.
With OIDC, the security of the assertions effectively assumes the security of TLS. If an attacker is able to obtain a certificate, through compromise or misissuance, they may mount a network attack on the jwks_uri contents, introducing an attacker-supplied key into the JWKS. If they are able to do so, the attacker will be able to mint arbitrary JWTs for arbitrary users, behaving as-if they were the IdP. This can be mitigated, in part, by hardcoding the JWKS contents within the AP, similar to SAML’s X.509 certificates. However, this brings the same risks as SAML, which is that if the IdP wishes to change its keys, it needs APs to update in order to validate the new assertions. If an AP relies on dynamic updates, a network attacker has a predictable window to intercept and interpose their malicious key.
FastFed borrows heavily from the OIDC and OIDC Discovery approach. It allows an IdP to provide metadata about a variety of federation protocols it supports, along with mechanisms to allow APs to automatically discover the relevant IdP for a user. These mechanisms rely on TLS, but are designed to have a manual confirmation ceremony by an Administrator of the AP. The protocols that the IdP and AP establish are fixed during this ceremony through an allowlist, which expires if the IdP does not complete the handshake and confirm the parameters within a fixed amount of time.
This handshake is protected using JWT, in order to authenticate the Identity Provider to the AP. The IdP provides a jwks_uri specific to FastFed Handshakes, which it provides to the AP, allowing the AP to authenticate the callbacks. FastFed specifies that the AP should validate these JWTs on-demand, fetching the jwks_uri whenever it encounters a message with an unrecognized key ID, in order to support its requirement that IdPs regularly rotate keys.
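The on-demand behavior described above can be sketched as follows. This is a minimal illustration, not FastFed's specified algorithm: the class name, the `fetch_jwks` callable (standing in for an HTTPS GET of the jwks_uri), and the `fetch_count` counter are all hypothetical and exist only to make the behavior observable.

```python
from typing import Callable, Dict, Optional

Jwk = Dict[str, str]


class OnDemandKeyCache:
    """Caches JWKs by key ID and refetches the whole set on a cache miss."""

    def __init__(self, fetch_jwks: Callable[[], dict]) -> None:
        # Hypothetical hook: in a real AP this would GET the jwks_uri over TLS.
        self._fetch_jwks = fetch_jwks
        self._keys: Dict[str, Jwk] = {}
        self.fetch_count = 0

    def key_for(self, kid: str) -> Optional[Jwk]:
        """Return the key for `kid`, fetching the JWKS if `kid` is unknown."""
        if kid not in self._keys:
            # Any message carrying an unrecognized kid triggers a network fetch.
            self.fetch_count += 1
            jwks = self._fetch_jwks()
            self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        return self._keys.get(kid)
```

Note that in this sketch the sender of a message, not the AP, decides when the fetch happens: presenting any unknown key ID is enough to cause one.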
Once a relationship has been established, the peers must periodically renegotiate information. Early drafts of FastFed allowed arbitrary update of metadata, although the current FastFed draft attempts to significantly restrict that to just display properties and signing algorithms. However, the protocol also allows for arbitrary re-establishment of parameters by the IdP, through subsequent re-registration of allowlisted protocols, and the negotiation of new parameters for these protocols, such as updated URIs.
These concerns are not limited to the core specification; they also extend to the protocol-specific mechanisms, such as SAML key rotation. FastFed Enterprise SAML makes use of a saml_metadata_uri, which behaves similarly to the jwks_uri within the core specification. As specified, the IdP is expected to publish new certificates at that URL, which the AP is expected to periodically poll, at least every 24 hours. In addition, the AP is expected (SHOULD) to permanently cache any 301 redirects, and use the new URL for subsequent requests.
These attacks assume that one or more unauthorized parties have access to a certificate for the IdP’s domain. This can manifest through several mechanisms:
Further, we can assume that even if detected, certificate revocation is largely ineffective for these certificates:
Finally, these attacks assume the attacker can position themselves on the network path between the AP and the IdP:
The jwks_uri is part of the fixed, allowlisted values by the AP when establishing a relationship with the IdP. It is not expected to change for the duration of the federation relationship.
However, an attacker with an unauthorized certificate can serve an HTTP 301 Moved Permanently status code, providing a new URI for the AP to obtain the JWKS. Although FastFed itself makes use of HTTP 302 Found responses, which indicate a temporary move, FastFed currently permits, and even encourages, other redirect codes. An attacker who can provide such a redirect has the functional ability to alter the jwks_uri indefinitely, as such redirects are treated and cached as permanent.
Similar risks exist for the 308 Permanent Redirect code.
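The persistence of this attack can be illustrated with a small model of the AP's stored key location. This is a sketch under stated assumptions: `JwksLocation` and its `honor_permanent_redirects` switch are hypothetical names, and the "hardened" setting is one possible client policy, not something FastFed specifies.

```python
from dataclasses import dataclass, field

# Status codes that HTTP clients may cache as permanent.
PERMANENT_REDIRECTS = {301, 308}


@dataclass
class JwksLocation:
    """Tracks which URI the AP will use for its next JWKS fetch."""

    allowlisted_uri: str
    honor_permanent_redirects: bool
    current_uri: str = field(init=False)

    def __post_init__(self) -> None:
        self.current_uri = self.allowlisted_uri

    def observe_redirect(self, status: int, location: str) -> None:
        """Record a redirect response seen while fetching `current_uri`."""
        if self.honor_permanent_redirects and status in PERMANENT_REDIRECTS:
            # A single poisoned response rewrites every future fetch.
            self.current_uri = location
        # Otherwise the redirect applies to this request only and the
        # allowlisted URI remains the starting point next time.
```

With `honor_permanent_redirects=True`, one 301 served during a brief interception moves `current_uri` to the attacker's URI and it stays there after the interception ends; with `False`, the next fetch starts again from the allowlisted value.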
Similar to Attack 1: Redirect poisoning of the jwks_uri, an attacker can serve a response with a far-future Expires header. Although FastFed mandates support for ETag, along with If-None-Match requests, these are only relevant to HTTP clients when they’re checking whether an expired entry is still usable. By preventing an entry from expiring, an attacker can pin an attacker-supplied key, persisting beyond the 24-hour refresh window mandated by the spec.
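One possible client-side defense is to cap whatever freshness lifetime the server requests at the 24-hour refresh window. The sketch below assumes that policy; the function name, the `headers` dictionary shape, and the choice to treat unparseable values as already stale are illustrative assumptions, and only the `max-age` directive and the `Expires` header are considered.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping

REFRESH_CEILING_SECONDS = 24 * 60 * 60  # the 24-hour refresh window


def clamped_freshness_seconds(headers: Mapping[str, str], now: datetime) -> int:
    """Freshness lifetime the AP will honor, never more than the ceiling.

    `now` must be timezone-aware so it can be compared with Expires.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age takes precedence over Expires when both are present.
        requested = int(match.group(1))
    elif "Expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            requested = int((expires - now).total_seconds())
        except (TypeError, ValueError):
            requested = 0  # unparseable date: treat the entry as stale
    else:
        requested = 0
    return max(0, min(requested, REFRESH_CEILING_SECONDS))
```

Under this policy a response claiming to be fresh until 2099 is revalidated after at most a day, which bounds how long a poisoned entry can persist but does not prevent the initial poisoning.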
The FastFed Enterprise SAML profile suffers the same risks as the Core specification, with respect to Attack 1: Redirect poisoning of the jwks_uri.
The FastFed Enterprise SAML profile suffers the same risks as the Core specification, with respect to Attack 2: Cache poisoning the jwks_uri.
Although the metadata refresh is restricted in what parameters can be renegotiated, the updates of individual, previously established authentication profiles are both explicit and intentional.
The attacker sends a new Registration Request to the AP, providing a new saml_metadata_uri pointing to an attacker-controlled resource (e.g. an attacker domain), as specified by the FastFed Enterprise SAML profile. This Registration Request is provided as a signed JWT, which the attacker signs with their own, attacker-controlled key. As per FastFed Core, the AP does not recognize the kid value in the JWT, so it fetches a new copy of the JWKS as required. The attacker provides a malicious JWKS, authorizing the existing, legitimate keys in addition to the attacker-provided key. The AP successfully processes this as an update to an existing registration.
At this point, the attacker has obtained a “Golden SAML” configuration that allows them to mint arbitrary assertions for arbitrary users, as-if they were the IdP.
The FastFed Core specification assumes that a domain name uniquely identifies an IdP, from past, present, and future, by virtue of the provider_domain attribute. However, domain names can and do come and go. If an AP establishes a relationship with an IdP, then if the IdP fails to de-register with the AP when releasing its domain name, it’s possible for an attacker who legitimately obtains the domain name to impersonate the IdP to the AP.
FastFed Core does not provide a way for such complete de-registration; an IdP providing an empty authentication_profiles message is treated as an incompatibility, and no updates are performed. If the protocol provided an explicit message, this would allow for IdP-initiated de-registration, but would also provide opportunities for attackers to maliciously de-register an IdP.
When establishing a new relationship, the IdP can establish a key to serve as the “root of trust” for the relationship. This may be shared with multiple APs, although to reduce the risk of security issues due to APs failing to correctly check the aud of the JWT and exposing them to cross-target replay attacks, it would be more secure for the IdP to negotiate a per-AP key. Although this runs counter to the explicit goals of FastFed Core, the use of long-lived keys is critical to security assumptions. FastFed implicitly is relying on long-lived keys by way of TLS and root Certificate Authorities, and so this is not, in practice, a functional difference.
As part of this, it’s important to remove indirection through URIs, such as jwks_uri and saml_metadata_uri, as while the URIs are protected, their contents are not. Moving them to be part of the protected protocol JWT mitigates this risk.
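One way an AP might retain keys established in-band, rather than re-deriving trust from a URI on every rotation, is to store a stable identifier for each accepted key and reject anything else. The sketch below uses the JWK Thumbprint construction from RFC 7638 as that identifier; this is a choice made for illustration, not something this document or FastFed specifies, and the function names are hypothetical.

```python
import base64
import hashlib
import json
from typing import Iterable, Mapping

# Required members per key type, as listed in RFC 7638.
_REQUIRED_MEMBERS = {
    "RSA": ("e", "kty", "n"),
    "EC": ("crv", "kty", "x", "y"),
    "oct": ("k", "kty"),
}


def jwk_thumbprint(jwk: Mapping[str, str]) -> str:
    """SHA-256 JWK thumbprint, base64url-encoded without padding."""
    kty = jwk.get("kty")
    if kty not in _REQUIRED_MEMBERS:
        raise ValueError(f"unsupported key type: {kty!r}")
    # Only the required members are hashed, sorted and without whitespace,
    # so kid, use, alg and member ordering do not affect the result.
    subset = {name: jwk[name] for name in _REQUIRED_MEMBERS[kty]}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_pinned(jwk: Mapping[str, str], pinned_thumbprints: Iterable[str]) -> bool:
    """True only if this exact key was accepted when the relationship was set up."""
    return jwk_thumbprint(jwk) in set(pinned_thumbprints)
```

Because the thumbprint covers only the key material, an attacker-supplied key added to a fetched JWKS would not match any pinned value, regardless of the kid it claims.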
Alternatively, signing the contents of such URIs is also an option. However, unless the signatures are delivered in-band (e.g. the contents of these URIs are JWTs themselves), this risks requiring separate URIs for contents and signature, and such two-URI schemes regularly result in cache inconsistencies that can cause operational issues (e.g. updating the jwks_uri without updating jwks_signature_uri).
Rather than attempting to integrate signing and key management into the protocol itself, another option is the use of privately-managed PKI, which can wholly avoid the risks of unauthorized or misissued certificates to something controllable by the IdP. In this scenario, the IdP generates a long-lived, privately managed Root CA, which is kept offline by the IdP. The IdP can then issue TLS server certificates for use with its IdP endpoints, specific to the FastFed use case.
When an AP establishes a relationship with an IdP, it pins this Root CA and trusts it. Any change in the Root CA should be seen as establishing a new IdP/AP relationship. The IdP can be uniquely identified not by the domain name, but by the Root CA used.
Note: This solution MUST NOT be used with Web PKI CAs, and relies on the IdP fully controlling all CAs within the chain.
The TLSA DNS record, as specified by RFC 6698, allows the domain holder to assert which certificate(s) are authorized for the domain. This can be used to mitigate some of the risks, although it does not mitigate the risk of a compromised registrar or registry, nor insider risks. Protocols such as SMTP are able to make use of DANE, via RFC 7672, to authenticate the peer without the dependency on TLS’ PKI.
However, in order to provide security value, DANE effectively requires DNSSEC to be deployed throughout the IdP. This may represent a significant operational burden for IdPs that are enterprises, and the failure to properly manage the DNSSEC status of the IdP’s zone may cause outages for DNSSEC-validating users.
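The comparison at the heart of a TLSA check is small. The sketch below covers only the matching-type step from RFC 6698 (0 for an exact match, 1 for SHA-256, 2 for SHA-512); it assumes the caller has already retrieved and DNSSEC-validated the record and has already extracted the bytes named by the record's selector (the full certificate or its SubjectPublicKeyInfo, in DER). The function name is hypothetical.

```python
import hashlib
import hmac


def tlsa_data_matches(selected: bytes, matching_type: int, association_data: bytes) -> bool:
    """Compare selector-chosen certificate bytes against a TLSA record's data.

    `selected` is the DER content chosen by the record's selector field;
    `association_data` is the Certificate Association Data from the record.
    """
    if matching_type == 0:
        candidate = selected  # exact match on the raw bytes
    elif matching_type == 1:
        candidate = hashlib.sha256(selected).digest()
    elif matching_type == 2:
        candidate = hashlib.sha512(selected).digest()
    else:
        raise ValueError(f"unknown TLSA matching type: {matching_type}")
    return hmac.compare_digest(candidate, association_data)
```

The check is only as trustworthy as the DNS answer it is applied to, which is why the DNSSEC requirement discussed above cannot be skipped.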
Separate from certificate pinning, RFC 8672 defines a solution for pinning opaque identities via the TLS protocol itself. This does not impair or limit certificate rotation, but relies on a TOFU-like model in which a separate set of keys, independent from certificates, are used to indicate that you’re speaking to the same (logical) TLS peer.
This is not widely supported in clients or servers and, as currently specified, is specific to TLS 1.3, so it borders on non-viable. It is included because it demonstrates the same conceptual approach to the problem as some of the other solutions, and could potentially be implemented if all other solutions are determined to be sub-optimal. However, like other solutions, it ultimately relies on establishing a long-lived identity that the AP needs to retain to properly authenticate the IdP.
One approach, as taken by some implementations of OIDC Discovery, is to “pin” the server’s certificate and/or CA, and cause errors if that changes. Certificate pinning using commonly-trusted CAs (the so called “Web PKI”) is a dangerous antipattern that a number of CAs and site operators (IdPs) prohibit, because it poses serious operational and security risks to the broad Internet ecosystem. For example, the need to be able to change CAs is critical, whether for business reasons (e.g. a more affordable price, changing CDNs causing changes to CAs) or for security reasons (e.g. a given CA is being removed from trust). Such systems are also not typically robust against changes to the CA ecosystem. In the TLS ecosystem, certificate pinning has caused multiple serious outages.
A variant of Certificate Pinning is organizational pinning, in which rather than pinning to the certificate or CA as a whole, the relying party pins to specific details within the certificate, such as a particular value for the organization field in the Subject. This approach implies the use of OV or EV certificates, which are no more technically secure than other forms of TLS certificates, and also imposes its own risks. Both OV and EV certificates have strong variability between different CAs in the expression of certificate fields, so such solutions typically impair CA migration. It also greatly increases the time that the IdP has to spend managing and replacing certificates, due to needing to validate this information, making disaster recovery and incident response significantly more difficult and error prone. Individual subject fields do not uniquely identify subject entities, and failing to consider all of the fields holistically provides inadequate security: for example, two entities may share the same organization name, but be in different jurisdictions. Finally, it remains brittle against non-technical changes, such as organizational rebranding or restructuring. Other types of certificates, such as those bearing LEIs from the Global LEI Foundation, share similar risks: an organizational restructure and partial divestiture may leave multiple entities potentially needing or wanting to assert the LEI of a defunct organization. In the TLS ecosystem, organizational pinning has caused multiple serious outages.
FastFed proposes a special DNS prefix of “fastfed._well_known” as part of WebFinger endpoints. IETF BCP 222 sets out the IANA registration list and policies for “Underscored and Globally Scoped DNS Node Names”, and this name will need to be registered.
Two issues exist with this. First, the existence of an underscore name is, itself, implying the notion of “_well_known”; that is, it would be sufficient to say “_fastfed”, rather than “fastfed._well_known”, and register with IANA appropriately.
Second, underscores are NOT valid in hostnames for TLS certificates, as hostnames conform to the Preferred Name Syntax of DNS, also known as the Letter-Digit-Hyphen rule. As such, no IdP can obtain a certificate for “https://fastfed._well_known.anything.example”, as that does not comprise a valid DNS name. Other DNS records (e.g. other than A/AAAA) do not necessarily have this restriction. If a well-known underscore name is expected, rather than a .well-known HTTP URI, then a DNS-based discovery mechanism, using DNS record types, needs to be used; an HTTP API may not be used.
Given the operational challenges, the use of the globally-scoped domain name should be reconsidered. Despite the potential challenges with the .well-known URI, it’s still preferable for operational deployment.
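The Letter-Digit-Hyphen rule referred to above is easy to check mechanically. The following is a minimal sketch of such a check (the function name is hypothetical, and it deliberately ignores internationalized names and any additional policy a CA may apply):

```python
import re

# One DNS label under the Letter-Digit-Hyphen rule: 1 to 63 characters,
# letters, digits and hyphens only, not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """True if every label of `name` satisfies the LDH rule."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

Applied to the names discussed here, `is_ldh_hostname("fastfed._well_known.anything.example")` is false because of the `_well_known` label, and the shorter `_fastfed.anything.example` fails for the same reason, while `fastfed.anything.example` passes.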
As part of Endpoint Validation, FastFed Core expects certain comparisons to be made with respect to DNS, expecting endpoints to be subdomains of the IdP’s provider_domain. However, the lack of specificity here creates the risk of implementations receiving a provider_domain of “foo.example” and performing a suffix match, determining it matches “notfoo.example”. More specificity is needed regarding the implementation steps, in order to ensure these properties hold.
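The difference between a string suffix match and a label-aware comparison can be made concrete. The sketch below is one way an implementation might perform the check; the function name is hypothetical, and treating the provider_domain itself as an acceptable endpoint host is an assumption that the specification would need to confirm or reject.

```python
def is_same_or_subdomain(host: str, provider_domain: str) -> bool:
    """True if `host` equals `provider_domain` or is a subdomain of it.

    Comparison is done on whole DNS labels, case-insensitively, so that
    "notfoo.example" is not mistaken for a child of "foo.example".
    """
    host_labels = host.lower().rstrip(".").split(".")
    domain_labels = provider_domain.lower().rstrip(".").split(".")
    if "" in host_labels or "" in domain_labels:
        return False  # empty input or empty label: refuse to match
    if len(domain_labels) > len(host_labels):
        return False
    return host_labels[-len(domain_labels):] == domain_labels
```

For comparison, the naive check `"notfoo.example".endswith("foo.example")` evaluates to true, which is exactly the failure mode described above, whereas `is_same_or_subdomain("notfoo.example", "foo.example")` is false and `is_same_or_subdomain("sso.foo.example", "foo.example")` is true.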
This may be mooted, however, by reducing dependencies on the notion of provider_domain to uniquely identify the IdP.