Client identifiers in AMP
November 11th, 2015
Authors: cramforce
Short link: https://goo.gl/Mwaacs
Published in GitHub intent to implement
Copyright 2015 The AMP HTML Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS-IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This document addresses how advanced analytics can be supported inside AMP files and, in particular, how a notion of a user or a session can be established.
See Intent to Implement: Analytics APIs for AMP for further context.
AMP documents will frequently be loaded in a 3rd party context. When an AMP document is loaded in a viewer, the actual document is loaded in an iframe from cdn.ampproject.org. This prevents cdn.ampproject.org from setting 1st party cookies/storage. Because we want the analytics fidelity of AMP documents displayed inside viewers to be comparable to that of documents loaded individually, we need an equivalent mechanism whose privacy properties are similar to or better than those of the cookie-based solution used by publishers today.
A separate issue arises when AMP documents are not loaded inside a viewer but directly from a cache domain like cdn.ampproject.org. A 1st party cookie/storage scoped to the entire domain would then allow tracking users across publishers, which is not desirable. Cookies can be limited in scope to a path like /nytimes.com/, but that mechanism is not usable at the scale of AMP due to per-domain cookie limits.
We introduce a client identifier that has the following properties:
This document proposes usage of LocalStorage for storing the BaseCID for the following reasons:
This document does not make a final decision as to whether any of the following technologies are used for LocalStorage
All of them behave exactly the same in terms of privacy and security with respect to this project (at least in terms of how they are exposed to the web), but they do have varying performance properties.
Confirmed that the following browsers delete localStorage when the user asks to delete cookies:
Once a day, we record in the localStorage entry when it was last read. If we read a key that has not been read for more than N days (likely 365 days, but TBD), we delete the key (expire it) and act as if none was present.
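The expiry rule above can be sketched as follows. This is a minimal illustration, not the AMP implementation: the storage key, the JSON record layout ({cid, lastRead}) and the function name readBaseCid are assumptions made for the example, and N is set to the tentative 365 days.

```javascript
// Hypothetical storage key and record layout; the proposal does not fix either.
const STORAGE_KEY = "amp-base-cid";
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_IDLE_DAYS = 365; // "N" in the text; still TBD there.

/**
 * Reads the stored BaseCID, or returns null if it is missing, malformed or
 * expired. `storage` is any object with the localStorage-style methods
 * getItem/setItem/removeItem; `now` is a timestamp in milliseconds.
 */
function readBaseCid(storage, now) {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) {
    return null;
  }
  let record = null;
  try {
    record = JSON.parse(raw);
  } catch (e) {
    record = null;
  }
  if (!record || typeof record.cid !== "string" ||
      typeof record.lastRead !== "number") {
    // Unreadable entry: drop it and behave as if none was present.
    storage.removeItem(STORAGE_KEY);
    return null;
  }
  const idleMs = now - record.lastRead;
  if (idleMs > MAX_IDLE_DAYS * DAY_MS) {
    // Not read for more than N days: expire the key.
    storage.removeItem(STORAGE_KEY);
    return null;
  }
  if (idleMs >= DAY_MS) {
    // Refresh the last-read timestamp at most once a day.
    record.lastRead = now;
    storage.setItem(STORAGE_KEY, JSON.stringify(record));
  }
  return record.cid;
}
```

In a browser, the call would be readBaseCid(window.localStorage, Date.now()). Writing the timestamp only when a day has passed keeps most reads free of storage writes.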
Assume CHASH is a cryptographic hash function. According to crypto experts, the scheme proposed here is safe when using SHA-384[1] (Closure implementation). A version based on XXTEA has also been proposed[2]; it would require a significantly less sophisticated JS implementation.
There are three types of client identifiers:
In this scenario the AMP file is loaded as the primary document into the browser from the proxy. A sample URL is
https://cdn.ampproject.org/c/www.theguardian.com/us-news/2015/sep/26/obama-africa-hiv-aids-treatment-women/amp
The www.theguardian.com portion of this URL is what we call the SourceOrigin. Documents from different source origins must not have access to the same client identifier.
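A minimal sketch of extracting the SourceOrigin from such a URL follows. It assumes the cache URL shape shown in the sample, /c/ followed by the source host and then the path. The function name sourceOriginFromCacheUrl is hypothetical, and other URL shapes are not handled.

```javascript
/**
 * Extracts the SourceOrigin host from a cache URL of the assumed form
 * https://cdn.ampproject.org/c/<source host>/<path>.
 * Returns null when the path does not match that shape.
 * Throws a TypeError if `cacheUrl` is not a valid absolute URL.
 */
function sourceOriginFromCacheUrl(cacheUrl) {
  const url = new URL(cacheUrl);
  const segments = url.pathname.split("/"); // ["", "c", "<source host>", ...]
  if (segments[1] !== "c" || !segments[2]) {
    return null;
  }
  return segments[2].toLowerCase();
}

// Using the sample URL from the text:
const sampleSourceOrigin = sourceOriginFromCacheUrl(
  "https://cdn.ampproject.org/c/www.theguardian.com/us-news/2015/sep/26/obama-africa-hiv-aids-treatment-women/amp"
);
// sampleSourceOrigin === "www.theguardian.com"
```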
To create the client identifier we run through the following steps:
In this version the AMP document is embedded into a viewer. The viewer resides on its own domain, different from cdn.ampproject.org, and loads an iframe on cdn.ampproject.org.
If no CID is available, we produce one by having the viewer generate an identifier and delegate it to the AMP document. An equivalent operation is, for example, a script from an ad network that sets 1st party cookies on a site and sends the values down to its third party iframe.
Note that the following is only a recommendation. Non-web environments may use different strategies, and web-based viewers may also decide on a different strategy for CID generation.
The steps to create the CID are:
This is the case where the AMP file is just another HTML document on the web. Publishers should be able to track it together with the other documents on their site.
To create the client identifier we run through the following steps:
The ExternalCID is the actual value that is sent to a 3rd party tracking provider. All ExternalCIDs are based on the same BaseCID (except in the “1st party on origin domain” case above) and SourceCID, but they have the property that, for example, the ExternalCID for Google Analytics is different from the ExternalCID for Adobe Analytics.
The API to get an ExternalCID is the actual API exposed by AMP’s core CID code to users such as tracking libraries.
The API is roughly this:
Promise<string> getCid(string fallback1pCookieName)
The API returns a Promise for the ExternalCID (a Promise because CID generation might be asynchronous). The fallback1pCookieName argument is the name of the cookie from which this value should be read in the “1st party on origin domain” case. In the other cases, the value of this string is also used to transform the SourceCID into the ExternalCID:
SourceCID = See above for the non-1st party on origin domain cases.
ExternalCID = CHASH(SourceCID + fallback1pCookieName)
It may sometimes be required for publishers to get user consent for the cookie/localStorage based tracking.
AMP supports dialogs that can be used to acquire user consent through the amp-user-notification element. This element requires a CID itself, so that publishers have an id with which they can associate consent (and avoid asking again). The CID exposed to the element is an ExternalCID that cannot be correlated with other CIDs exposed to the publisher. Publishers must not use this ExternalCID for tracking.
We add a second argument to the getCid API called consentDialogId. This id is the HTML id of the amp-user-notification element.
Promise<string> getCid(string fallback1pCookieName, string consentDialogId)
The Promise for the CID will not resolve (and the CID won’t actually be created) until the referenced dialog has either been accepted by the user or the publisher’s data-show-if-href has signaled that no consent is necessary.
If a BaseCID has to be created (because none exists yet) in order to derive the ExternalCID for the request to data-show-if-href, that BaseCID is not persisted until the user gives consent.
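The consent gating can be sketched roughly as below. This is an assumption-laden illustration that stops at the BaseCID: the storage key, the function name getBaseCidWithConsent, and the consentGiven Promise (resolved when the dialog is accepted or when data-show-if-href signals that no consent is needed) are all hypothetical.

```javascript
// Hypothetical storage key, used only in this sketch.
const PENDING_CID_KEY = "amp-base-cid-consent-demo";

/**
 * Returns a Promise for the BaseCID that resolves only after `consentGiven`
 * resolves. A newly generated BaseCID is held in memory and written to
 * `storage` only once consent has been given.
 */
function getBaseCidWithConsent(storage, generateId, consentGiven) {
  const stored = storage.getItem(PENDING_CID_KEY);
  // A fresh BaseCID lives only in memory until consent is given.
  const baseCid = stored !== null ? stored : generateId();
  return consentGiven.then(() => {
    if (stored === null) {
      storage.setItem(PENDING_CID_KEY, baseCid);
    }
    return baseCid;
  });
}
```

If the user never accepts the dialog, consentGiven never resolves, so nothing is persisted and callers never receive a CID.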
[1] SHA1 is subject to a length extension attack (https://en.wikipedia.org/wiki/Length_extension_attack) with the constructions suggested here.
[3] Not used because in AMP they contain no useful entropy.
[4] Not used because in AMP they contain no useful entropy.