Design doc for Geolocation in Chromium

Geolocation in Chromium

Status: Implemented (in Chrome M5)

Modified: Feb 25, 2010

Objective

This document describes a design for adding Geolocation support to Chrome.

Background

The W3C Geolocation API provides access to geolocation information of the host device via a simple JavaScript API. The API is agnostic of the underlying location information sources. Common sources of location information include Global Positioning System (GPS) and location inferred from network signals such as IP address, RFID, WiFi and Bluetooth MAC addresses, and GSM/CDMA cell IDs, as well as user input.

Today, Chrome supports a precursor to the Geolocation API via Gears. The goal of providing full support of the W3C Geolocation API is to remove the dependency on Gears (thereby addressing platforms that don't have this module, crucially Linux, Mac, ChromeOS) as well as to move to a standardized API.

Location in the browser enables applications ranging from mapping and navigation, through to social networking and location-based search.

Notes highlighted in red like this are changes that were made after initial document review, to update the design doc to reflect decisions arising during implementation

Overview

Support for the Geolocation API was added to WebKit in late 2008. Since then, it has been extended and updated, and is available in several browsers on both mobile and desktop.

The basic steps of an integration to Chrome include:

Providing a GeolocationService implementation that communicates with the browser process' I/O thread via IPC.
Implementing a system of location providers in the browser process.
Implementing a security UI in the browser process.

Examples of location providers include WiFi-based access point scanners that use a cloud-based service to return lat/lng location information, interfaces to GPS hardware, etc.

Existing work

Geolocation implementations

We should look at existing implementations for comparison:

Google Gears on various browsers and platforms: IE on Windows, Firefox on Windows, Mac and Linux, Safari on Mac, Chrome on Windows
Opera natively using Skyhook, in a March 2009 demo build
Firefox natively, version 3.5 onwards, using Google Location Service (GLS)
iPhone with CoreLocation
Android Web browser in Eclair using Webkit geolocation implementation
WebKit GTK is bound to the GeoClue location framework

Permission grant/review mechanisms

Within Chrome, the user permissions UI needs to fit within the context of these other UIs:

Password manager
Cookie manager
Client side storage
Info bar users include: SSL certificate warnings, save password, reload tabs after crash, plugin crash, set default browser prompt

Detailed Design

Overview

The implementation must be split across two processes, the Browser process and the Renderer process (see Chromium Multiprocess Architecture).

There is one instance of the Browser process per running Chrome instance, and multiple instances of the Renderer process (typically one per web page), providing a natural 1:N fan-out of location information from a combined location manager to multiple potential client pages.

In this phase of development the implementation strategy is to reuse, as far as possible, the location manager code from the Google Gears project (see next sections). This will be refactored to run within the browser process, where it is responsible for hardware access abstraction, location provider arbitration, recent location caching, and movement detection/change notification. (Not all of these are relevant in the initial implementation; arbitration, caching and movement detection will be enhanced in future iterations.)

In addition, new code will be developed to glue this to the UI path for geolocation, specifically the icon bar and the infobar prompt.

The Renderer process code will contain the existing WebKit JavaScript (V8) bindings, Chromium Bridge glue code, and interfacing to IPC to obtain location information from the browser process; renderer sandboxing blocks direct access from this process.

Browser ↔ Renderer IPC

(See Chromium Interprocess Communication)

Within the existing WebKit geolocation bindings there are 2 main interfaces that need implementing by the embedder to provide geolocation support:

Permissions:

Request: ChromeClient::requestGeolocationPermissionForFrame()
Callbacks: Geolocation::setIsAllowed()

Geolocation Service

Request: GeolocationService::startUpdating() / stopUpdating()
Callbacks: GeolocationServiceClient::geolocationServicePositionChanged() / geolocationServiceErrorOccurred()

New IPC messages will be defined in order to proxy these interfaces to the browser process. The parameters in these messages will be modelled on these function prototypes.

Renderer to Browser messages:

IPC_MESSAGE_ROUTED2(ViewHostMsg_GeolocationStartUpdating, int /* frame_id */, PositionOptions);

IPC_MESSAGE_ROUTED1(ViewHostMsg_GeolocationStopUpdating, int /* frame_id*/);

IPC_MESSAGE_ROUTED2(ViewHostMsg_GeolocationRequestPermissionForFrame, int /* frame_id*/, std::string /* origin */);

Browser to Renderer messages:

IPC_MESSAGE_ROUTED1(ViewMsg_GeolocationPositionChanged, Position);

IPC_MESSAGE_ROUTED1(ViewMsg_GeolocationPositionErrorOccurred, PositionError);

IPC_MESSAGE_ROUTED2(ViewMsg_GeolocationSetAllowed, int /* frame_id*/, bool);

In general it is up to the location manager (in the browser process) to interpret and obey the content of PositionOptions. An exception is any non-provider specific options which are already implemented within the WebKit bindings. Currently 2 parameters fall into this category:

timeout
maximumAge (see pending webkit bug 30676)
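As a rough illustration of an option that can be satisfied without consulting any provider, the sketch below checks a cached fix against maximumAge. The names CachedFix and CanUseCachedFix are hypothetical and are not taken from the WebKit bindings; the sketch assumes timestamps and ages in milliseconds, as in the W3C API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cached position record; not a WebKit type.
struct CachedFix {
  bool valid = false;
  int64_t timestamp_ms = 0;  // Time at which the fix was acquired.
};

// Returns true if the cached fix is recent enough to satisfy a request whose
// PositionOptions carried the given maximumAge. A maximumAge of 0 means a
// fresh position must always be acquired.
bool CanUseCachedFix(const CachedFix& fix, int64_t now_ms,
                     int64_t maximum_age_ms) {
  assert(maximum_age_ms >= 0);  // A negative age is not meaningful.
  if (!fix.valid || maximum_age_ms == 0)
    return false;
  return now_ms - fix.timestamp_ms <= maximum_age_ms;
}
```

With a fix acquired at t = 1000 ms and a request made at t = 1500 ms, a maximumAge of 600 would accept the cached fix and a maximumAge of 400 would not.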

Routed messages are used, as the WebKit Geolocation bindings have an implicit context of one set of updates per view, and the Chrome View and routed message architecture allows a convenient way to reflect this page context within the browser process.

In addition, frame identifiers are used on start and stop requests, and on permission notifications, to differentiate messages from independent frames within a tab. The data callbacks (position changed and position error) do not require frame identifiers as these are effectively broadcast messages, sent to all frames actively using geolocation.

Geolocation resources associated with a renderer can easily be freed in the case of a render process death, as the RenderViewHost class gets cleaned up by the framework.

Note that there is some duplication of functionality between ViewMsg_GeolocationPositionErrorOccurred and ViewMsg_GeolocationSetAllowed: both can be used to indicate a permission-denied error. It is done this way to keep the IPC close to the WebKit embedder API, and also because the former gives a single position failure whereas the latter "latches" for the remainder of the page load (or until a subsequent ViewMsg_GeolocationSetAllowed message is received).
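The difference between a one-off error and a latched permission can be sketched as below. FrameGeolocationState is a hypothetical renderer-side class invented for illustration, not a WebKit or Chromium type; the error code value follows the W3C PositionError definition (PERMISSION_DENIED = 1).

```cpp
#include <cassert>

// W3C PositionError code for a denied request.
constexpr int kPermissionDenied = 1;

// Hypothetical per-frame state in the renderer.
class FrameGeolocationState {
 public:
  // Models ViewMsg_GeolocationSetAllowed: the value latches until the next
  // such message (a new page load would recreate this object).
  void OnSetAllowed(bool allowed) {
    decided_ = true;
    allowed_ = allowed;
  }

  // Models ViewMsg_GeolocationPositionErrorOccurred: records a single
  // failure and leaves the latched permission untouched.
  void OnPositionError(int error_code) {
    assert(error_code >= 1 && error_code <= 3);  // W3C codes are 1 to 3.
    last_error_ = error_code;
  }

  bool MayDeliverPositions() const { return decided_ && allowed_; }
  int last_error() const { return last_error_; }

 private:
  bool decided_ = false;
  bool allowed_ = false;
  int last_error_ = 0;
};
```

In this sketch a permission-denied error after an earlier "allowed" message fails only the request in flight; later positions are still delivered until a SetAllowed(false) arrives.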

The diagram below illustrates the relationships between the classes involved in communication:

Each frame (geolocation bridge) that requests location updates (through 1 or more geolocation watches) will have its update options registered with the geolocation dispatcher host. This is responsible for registering these options with the location arbitrator, receiving location update callbacks when the location is determined or updated, and forwarding these calls back to the associated renderer processes.
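A minimal sketch of this registration and forwarding role follows. DispatcherSketch, its methods and the Position fields are hypothetical stand-ins, not the real dispatcher host interface. It assumes one position message per view, since the position messages carry no frame identifier, and it merges the per-frame options into a single flag that would be handed to the arbitrator.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <utility>

struct Position {
  double latitude;
  double longitude;
  double accuracy_m;
};

// Hypothetical stand-in for the geolocation dispatcher host.
class DispatcherSketch {
 public:
  using SendFn = std::function<void(int view_id, const Position&)>;

  explicit DispatcherSketch(SendFn send) : send_(std::move(send)) {
    assert(send_);  // A callable target is required.
  }

  // Registers (or updates) the options for one frame within a view.
  void RegisterFrame(int view_id, int frame_id, bool high_accuracy) {
    frames_[std::make_pair(view_id, frame_id)] = high_accuracy;
  }

  void UnregisterFrame(int view_id, int frame_id) {
    frames_.erase(std::make_pair(view_id, frame_id));
  }

  // The merged option that would be passed on to the arbitrator.
  bool AnyFrameWantsHighAccuracy() const {
    for (const auto& entry : frames_) {
      if (entry.second)
        return true;
    }
    return false;
  }

  // Called when the arbitrator reports a new location. Sends one message to
  // each view that has at least one registered frame.
  void OnLocationUpdate(const Position& position) const {
    std::set<int> views;
    for (const auto& entry : frames_)
      views.insert(entry.first.first);
    for (int view_id : views)
      send_(view_id, position);
  }

 private:
  std::map<std::pair<int, int>, bool> frames_;
  SendFn send_;
};
```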

Location Providers

The GeolocationArbitrator holds a number of location providers. Typically, there will be one for network-based location (see below) and one for a platform-specific location / GPS API (e.g. the Windows 7 location API, libgps, or CoreLocation, for Windows, Linux and Mac respectively).

The location providers are informed of the current update options that are in force for the location session (e.g. whether high accuracy is enabled) and then manage their own state in order to comply with these options whilst providing the best fix they can. For example, a GPS provider would disable itself if high accuracy were disabled, whereas a native OS location provider might configure the underlying system API to use only lower-accuracy sources.
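The two example behaviours above could look roughly like this. All class and method names here are hypothetical and chosen only for illustration; the real provider interface may differ.

```cpp
#include <cassert>
#include <vector>

struct UpdateOptions {
  bool use_high_accuracy = false;
};

// Hypothetical provider interface.
class LocationProviderSketch {
 public:
  virtual ~LocationProviderSketch() = default;
  virtual void SetUpdateOptions(const UpdateOptions& options) = 0;
  virtual bool IsRunning() const = 0;
};

// A GPS-style provider only runs while high accuracy is requested.
class GpsProviderSketch : public LocationProviderSketch {
 public:
  void SetUpdateOptions(const UpdateOptions& options) override {
    running_ = options.use_high_accuracy;
  }
  bool IsRunning() const override { return running_; }

 private:
  bool running_ = false;
};

// A native OS provider keeps running but restricts itself to lower-accuracy
// sources when high accuracy is not requested.
class NativeOsProviderSketch : public LocationProviderSketch {
 public:
  void SetUpdateOptions(const UpdateOptions& options) override {
    low_accuracy_only_ = !options.use_high_accuracy;
  }
  bool IsRunning() const override { return true; }
  bool low_accuracy_only() const { return low_accuracy_only_; }

 private:
  bool low_accuracy_only_ = true;
};

// Informs every provider of the options currently in force.
void NotifyProviders(const std::vector<LocationProviderSketch*>& providers,
                     const UpdateOptions& options) {
  for (LocationProviderSketch* provider : providers) {
    assert(provider != nullptr);
    provider->SetUpdateOptions(options);
  }
}
```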

Google Location provider

Initially, a single location provider will be implemented. This will use WiFi scan data from the host operating system and the JSON Google network location provider protocol to resolve the client location.

The server address will be configurable via a command line flag, to allow for automated testing.

Where required, the location provider will delegate work to a helper thread, e.g. when making calls to operating system API functions that block whilst the scan occurs.
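The delegation pattern can be sketched with standard C++ facilities. Here std::async stands in for whatever threading primitives the real implementation uses (Chromium code would presumably use its own), and AccessPoint, ScanFn and StartScanOnHelperThread are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scan result entry.
struct AccessPoint {
  std::string mac_address;
  int signal_dbm;
};

using ScanFn = std::function<std::vector<AccessPoint>()>;

// Runs a potentially blocking scan on a helper thread so that the calling
// thread is not stalled. The caller collects the result from the future.
std::future<std::vector<AccessPoint>> StartScanOnHelperThread(
    ScanFn blocking_scan) {
  assert(blocking_scan);  // An empty std::function cannot be invoked.
  return std::async(std::launch::async, std::move(blocking_scan));
}
```

The caller can continue with other work and call get() on the returned future once the scan results are needed.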

Future enhancements

In the initial version only a basic arbitrator and provider will be implemented, but these could be enhanced in future iterations, for example:

In the initial version, the location arbitrator only needs to support replacing the current provider, rather than full multi-provider arbitration and provider degradation arbitration.
Handle location provider degradation: if the provider callback indicates GPS signal loss (for example), at some point the watchers must be informed and provided with the next-best location.
NOTE 1 & 2 now implemented in r48631
Cell ID support can be added to the Google Location provider, where appropriate (an abstraction for accessing the Cell ID will be required).
Significant latency and battery savings can be made by aggressively caching Wifi and Cell ID database lookups, and interpolating responses, within the client. This will also require protocol enhancement.
Numerous enhancements could be made to runtime selection and loading of the location providers. In particular, the location arbitrator could be made accessible to extensions, e.g. to allow extensions to furnish location updates to the arbitrator.
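One possible shape for the provider degradation handling listed above is sketched here. It assumes that "next-best" means the valid fix with the smallest accuracy radius, which is an assumption of this sketch and not something the design fixes; ArbitratorSketch and Fix are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Fix {
  bool valid;
  double latitude;
  double longitude;
  double accuracy_m;  // Radius of uncertainty in metres; smaller is better.
};

// Hypothetical arbitrator that remembers the latest report per provider.
class ArbitratorSketch {
 public:
  // Records a provider's latest report. An invalid fix models signal loss.
  void OnProviderUpdate(const std::string& provider, const Fix& fix) {
    assert(!fix.valid || fix.accuracy_m >= 0.0);
    latest_[provider] = fix;
  }

  // Writes the best remaining valid fix to |out|. Returns false if no
  // provider currently has a valid fix.
  bool BestFix(Fix* out) const {
    bool found = false;
    for (const auto& entry : latest_) {
      const Fix& fix = entry.second;
      if (!fix.valid)
        continue;
      if (!found || fix.accuracy_m < out->accuracy_m) {
        *out = fix;
        found = true;
      }
    }
    return found;
  }

 private:
  std::map<std::string, Fix> latest_;
};
```

When the GPS provider reports signal loss, the next call to BestFix falls back to the network fix, which is the point at which watchers would be notified.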

Persisting geolocation permissions

In order to meet the intended user experience, it is necessary to persist the permission state per page origin. (Refer to the UI mocks section below for more explanation of permission states)

A persisted state may be one of {Allow, Deny, Undefined}. Undefined will be represented as no entry in the permissions table. In addition, a 'temporarily denied' state exists for any page that has had the info bar dismissed. This temporary state exists for the current page session only, and does not modify the persisted state.

Permissions will be stored in an sqlite table accessed via the db_thread.

Permissions will be stored in the HostContentSettingsMap framework (a relatively new component implemented for Chrome 4.1, after the initial version of this design doc was written).
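The interaction between the persisted states and the temporary denial can be sketched as follows. This is an in-memory illustration only: PermissionSketch, its methods and the integer page identifier are hypothetical and do not reflect the HostContentSettingsMap interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class PersistedSetting { kUndefined, kAllow, kDeny };
enum class Decision { kAllow, kDeny, kPrompt };

// Hypothetical permission bookkeeping for illustration.
class PermissionSketch {
 public:
  // Undefined is represented as the absence of an entry.
  void Persist(const std::string& origin, PersistedSetting setting) {
    assert(!origin.empty());
    if (setting == PersistedSetting::kUndefined)
      persisted_.erase(origin);
    else
      persisted_[origin] = setting;
  }

  // Dismissing the info bar denies the current page session only.
  void OnInfoBarDismissed(int page_id) { temporarily_denied_.insert(page_id); }
  void OnPageSessionEnded(int page_id) { temporarily_denied_.erase(page_id); }

  Decision Query(const std::string& origin, int page_id) const {
    auto it = persisted_.find(origin);
    if (it != persisted_.end()) {
      return it->second == PersistedSetting::kAllow ? Decision::kAllow
                                                    : Decision::kDeny;
    }
    if (temporarily_denied_.count(page_id) > 0)
      return Decision::kDeny;
    return Decision::kPrompt;
  }

 private:
  std::map<std::string, PersistedSetting> persisted_;
  std::set<int> temporarily_denied_;
};
```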

User Interface

There are two main entry points in the user interface: a URL bar icon and an info bar prompt. Each of these has a pop-up bubble that can provide options or help info respectively (see the UI Mocks section below).

All UI code is implemented in the browser process (UI thread), as it is painted in the browser's chrome, not in the web-page content.

In addition the new Content Settings dialog has a Geolocation tab added (accessible via URL bar icon bubble and the Spanner->Options menu). This allows a global 2 state setting of [Ask each time a new site tries to access geolocation] and [Do not allow sites to use geolocation], and in addition an exceptions dialog can be brought up to review sites explicitly allowed or denied access.

Upstream WebKit Code

The initial WebKit Geolocation implementation was added by Greg Bolsinga from Apple. Some fixes have been made to Android's version of WebKit by steveblock, but the process of upstreaming to webkit.org is still ongoing. You can see the status of the outstanding WebKit Geolocation bugs here.

In addition, the following modifications will be required for the Chromium implementation:

Allow a page to have geolocation access revoked (and be informed of this correctly via PositionErrorCallback) before, and without the need for, a reload.
Clarify the balancing of startUpdating and stopUpdating calls, when proxied via IPC to the browser process.

V8 Bindings

V8 bindings are one of the elements that have been added to Android but not yet upstreamed to webkit.org.

Platform specific API details

[New section added 2010-02-25]

Windows

There are 2 OS level APIs in use in Gears, WLAN and NDIS. Chrome will reuse this strategy.

Notes from Gears:

// Windows Vista uses the Native Wifi (WLAN) API for accessing WiFi cards. See
// http://msdn.microsoft.com/en-us/library/ms705945(VS.85).aspx. Windows XP
// Service Pack 3 (and Windows XP Service Pack 2, if upgraded with a hot fix)
// also support a limited version of the WLAN API. See
// http://msdn.microsoft.com/en-us/library/bb204766.aspx. The WLAN API uses
// wlanapi.h, which is not part of the SDK used by Gears, so is replicated
// locally using data from the MSDN.
//
// Windows XP from Service Pack 2 onwards supports the Wireless Zero
// Configuration (WZC) programming interface. See
// http://msdn.microsoft.com/en-us/library/ms706587(VS.85).aspx.
//
// The MSDN recommends that one use the WLAN API where available, and WZC
// otherwise.
//
// However, it seems that WZC fails for some wireless cards. Also, WLAN seems
// not to work on XP SP3. So we use WLAN on Vista, and use NDIS directly
// otherwise.

It has been reported by other projects that the OS level scan caching can impact the quality, where stale AP entries are mixed with fresh AP entries in one scan. Suggestion is to intermix active scan (OID_802_11_BSSID_LIST_SCAN) between every bssid list query (OID_802_11_BSSID_LIST). Open for more investigation.

From Windows 7 onward there is a native location API, which will be investigated in a future iteration.

Mac

Gears uses a single API to retrieve WiFi scan data: the Apple80211.h header taken from the iStumbler project (http://www.istumbler.net), which in turn is based on http://www.macstumbler.com/Apple80211.h, which was reverse engineered from the framework libraries.

This API works in OSX versions up to and including 10.5. In version 10.6, a new API, CoreWLAN, is provided and supported by Apple. Hence for Chrome geolocation to work on 10.6, a new API binding using CoreWLAN will be required.

Some useful info on this can be found here

Note that under OSX (reported on 10.5), the WiFi adapter might become unavailable during the active scan. Hence Gears used a longer polling interval to mitigate this effect. This restriction may no longer be present in CoreWLAN, and active scans may be possible (to be investigated).

Linux

Gears used support executables (iwlist / iwconfig) to perform scans, due to GPL linkage reasons.

Alternatives available:

1/ nl80211 API directly (as this is available under a more permissive license). Need to investigate what restrictions this places on compatible kernel versions / distributions. Awkward to use, as it requires root permission to initiate a scan, and relying on another app to trigger the scan would result in too high a user latency for first fix (as the driver only caches results for 10 seconds, in the common case we would have to wait some significant period for the next scan).

2/ wpa_supplicant. Exposes a D-Bus API, but again requires root. Example:

$ sudo dbus-send --system --print-reply --dest=fi.epitest.hostap.WPASupplicant /fi/epitest/hostap/WPASupplicant/Interfaces/0 fi.epitest.hostap.WPASupplicant.Interface.scanResults

3/ NetworkManager. This is the freedesktop.org standard component, and provides a D-Bus API which can be used by any user. Example:

$ dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager/Devices/2 org.freedesktop.NetworkManager.Device.Wireless.GetAccessPoints

Initially, Chrome will have a binding implemented onto the NetworkManager API for use in standard distributions.

As Chrome OS may not provide the NM component, it may be necessary to provide a setuid root executable that triggers the scan, and optionally also fetches the results and returns them on stdout (rationale being this localizes all the nl80211 kernel interface usage into a single place)

Chrome OS

Chrome OS does not support NetworkManager, so one of the other APIs must be used. Alternatively, it has a network manager equivalent called flimflam (a fork of the connman project) that sits above wpa_supplicant and which can be accessed via libcros and the network_library.h API. This currently looks the most appropriate way to obtain this data.

UI Mocks

Since geolocation information is sensitive, the W3C API mandates that "User Agents must not send location information to Web sites without the express permission of the user". The UI must solicit the user's permission before granting the Web page access to location.

Further, the API specification states: "The user interface must include the URI of the document origin." Hence permission is granted and persisted at the document origin level.

Note: this is not the effective script origin; changes to document.domain will not affect geolocation permissions.
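To make the origin-level keying concrete, the sketch below derives a permission key from a page URL. OriginOf and SharePermission are hypothetical helpers that only handle simple absolute URLs of the form scheme://host[:port]/path; real code would rely on the browser's own URL classes.

```cpp
#include <string>

// Returns "scheme://host[:port]" for a simple absolute URL, or an empty
// string if the input has no "://" separator.
std::string OriginOf(const std::string& url) {
  const std::string::size_type scheme_end = url.find("://");
  if (scheme_end == std::string::npos)
    return std::string();
  const std::string::size_type host_start = scheme_end + 3;
  // The origin ends at the first path, query or fragment delimiter. If none
  // is found, find_first_of returns npos and substr keeps the whole string.
  const std::string::size_type host_end = url.find_first_of("/?#", host_start);
  return url.substr(0, host_end);
}

// Two URLs share a geolocation permission only if their origins match.
bool SharePermission(const std::string& url_a, const std::string& url_b) {
  const std::string origin_a = OriginOf(url_a);
  return !origin_a.empty() && origin_a == OriginOf(url_b);
}
```

Under this keying, pages on www.yelp.com and yelp.com are distinct origins and hold separate permissions, regardless of any document.domain change made by script.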

See http://mocks/glen/chrome/spec/77_geo/ for up to date mocks.

TODO: The examples below have a bug: The pop-up should be asking about "www.yelp.com" not "yelp.com".

TODO: The info bars should be a neutral color (chrome blue) not green as shown in the mocks. (check this applies to themed browsers too)

The implementation has dropped "using wifi and GPS" from screen 4 as this sounds misleadingly definitive (other technologies may be used too), and the application may not have even requested high accuracy (GPS) at all.

The bubble in mock 4 will have a manage settings... link in the lower left-hand corner, to directly open the Geolocation Content Settings page.

http://mocks/glen/chrome/spec/77_geo/3/#01_bar.png

http://mocks/glen/chrome/spec/77_geo/3/#02_learnmore.png

http://mocks/glen/chrome/spec/77_geo/3/#03_tracking_icon.png

http://mocks/glen/chrome/spec/77_geo/3/#04_tracking_bubble.png

http://mocks/glen/chrome/spec/77_geo/3/#05_tracking_icon_off_01.png

NOTE: Specific details in the mocks are subject to change as they are finalized, however the basic flow is expected to remain as outlined here.

Basic flow:

On JavaScript call to geolocation API to fetch location, info bar prompt shown (see 01_bar.png)
If the user clicks Allow, this site is now allowed to access geolocation, and the tracking icon is shown in the URL bar (see 03_tracking_icon.png)
If the user clicks Reject or dismisses the info bar, the API calls are rejected and the "no tracking" icon is shown (see 05_tracking_icon_off_01.png)

An 'Allow' or 'Reject' is persisted to future page loads from this site
An info bar dismissal rejects all further API calls for this page load, but the info bar will show again on the next page load for this site.

At any subsequent time whilst on the page the user can click the tracking icon to control the state (see 04_tracking_bubble.png)

Working assumptions:

If the page is not visible when it makes its first geolocation request, the info bar will not be visible until the page is restored, and hence the success callback will be deferred until this time, at the earliest (excepting client specified timeout)
If a page cancels / times out its request without the user responding to the info bar, the info bar may never get shown. (Note WebKit currently only shows the prompt after successfully acquiring a location, which minimizes user disturbance in the failure case, but may have power consumption and privacy implications.)
When a page is opened in an incognito window, persisted permissions for that origin are not created or honored.
The Clear Browsing Data dialog will have an additional tick-box to clear geolocation permissions.
Multiple origins (frames) on the pages

Info bars are queued in fifo order stacked one above the other.

User can dismiss info bars in any order, or ignore
If an undesirable page has so many frames requesting geolocation that the info bars flood the screen, the user can easily close the tab to dismiss them all

URL bar icon can open a dialog with multiple entries.

Possible areas for future enhancement:

Require a user interaction to initiate the location info-bar prompt (possibly show a passive icon if the page wants location without user interaction). As discussed with erg:-

"bool ScriptController::processingUserGesture()" - when this returns true we're currently inside a javascript call stack handling user input
see DOMWindow::allowPopUp

Frames (multiple origins): postponing the info bar prompt until it is clicked means that it's clearer to the user what is requesting it / which origin is granted the permission
Multiple origins (frames) on the pages

We could treat the top-level origin different from others. Top-level could display notification straight away while others require a gesture.
Dismissing or revoking permissions for the top-level site could clear permissions for the embedded domains too. [Works better with a "forget" button]

When there is a redesigned control panel that enumerates permissions per site, add a geolocation attribute.

Testing Plan

WebKit layout tests (run under TestShell): upstreamed, testing the JavaScript bindings against a mock geolocation service.

Unit tests: for individual chromium classes

Browser tests: location manager and related classes.

UI tests: for the info bar & url bar icon.

Large tests can be made by configuring the browser to use a test server address.

Internationalization and Localization

Strings are required for the infobar, more info and icon bar bubbles as per the mocks. In addition, there will be strings for any tooltips, clear browsing data options, and any help pages.

These will be internationalized via the standard Chromium mechanism.

Besides strings, no specific internationalization requirements are identified.

Project

For project details, see Google project doc.