Architecture and API for WebRTC

This is based in part on the Adv/Prop proposal given in draft-jennings-rtcweb-api-00.

Abstract

The purpose of WebRTC is to add real-time voice and video support to web browsers. This draft outlines an architecture and JavaScript API that enables such communication using a low-level, flexible approach. The approach is not tied to existing signaling or media negotiation protocols, yet it permits both the implementation of standard protocols and future innovation.

Overview

This draft describes a model and API with the intention of being as flexible as possible. With this goal in mind, the only functionality that should be embedded in the browser is that which is required to be native due to security or performance reasons. This includes:

While we argue against mandating any signaling or media negotiation protocol, we do accept as a requirement that it must be possible to implement standard signaling and negotiation on top of the proposed API. This includes both SIP/SDP and XMPP/Jingle for interop with existing services. To enable interop we also propose adopting RTP/ICE as the media protocol with standard codec mappings to avoid the requirement for media gateways whenever practically possible.

Non-goals:

Architecture

The proposed architecture is depicted below:

            +-----------+
            |   Web     |
            |           |
            |           | ------------ Standard or proprietary protocol ->
            |  Server   |                     SIP/SDP, XMPP/Jingle etc
            |           |
            +-----------+
                /    \
               /      \   Proprietary over
              /        \  HTTP/Websockets
             /          \
            /            \
           /              \
          /                \
   +-----------+          +-----------+
   |Javascript |          |Javascript |
   +-----------+          +-----------+

       ^  |                    ^  |
   API |  v                API |  v
   +-----------+          +-----------+
   |           |          |           |
   |           |          |           |
   |  Browser  |----------|  Browser  |
   |           | ICE+RTP  |           |
   |           |          |           |
   +-----------+          +-----------+

Javascript API

The proposed API exposes the concepts of codecs, streams and connections to the JavaScript application. Exposing codecs and connections directly lets the application perform transport and media negotiation in an application-specific manner. This avoids any need for defined offer/answer semantics, while still allowing such semantics to be implemented as required.

Connection

The Connection object represents a single ICE RTP session. A set of relays to be used in candidate generation is passed to the object constructor, along with a set of callback functions which are invoked at various stages of operation.

Following creation the system will generate a set of near candidates, including both static local end points and those allocated using relays. As candidates become available the onCandidates() callback is invoked with the current complete list of candidates. This may be called multiple times as candidates become available and can be used to exchange partial information to enable fast call setup.
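Because each onCandidates() invocation carries the complete list so far, an application that wants to send only newly discovered candidates has to track what it has already sent. The following is a minimal sketch of that bookkeeping and is not part of the proposed API: the helper name makeCandidateTracker is hypothetical, and it assumes each candidate carries a unique id field, as in the candidate representation in this section.

```javascript
// Hypothetical helper (not part of the proposed API): remembers which
// candidate ids have already been returned and yields only the new ones.
// Assumes every candidate object has a unique `id` field.
function makeCandidateTracker() {
  const sentIds = new Set();
  return function selectUnsent(candidateList) {
    const unsent = candidateList.filter((c) => !sentIds.has(c.id));
    unsent.forEach((c) => sentIds.add(c.id));
    return unsent;
  };
}

// Usage: call the tracker from inside an onCandidates-style callback.
const selectUnsent = makeCandidateTracker();
const firstBatch = selectUnsent([{ id: "abcd", ip: "10.0.0.1", port: "1234" }]);
const secondBatch = selectUnsent([
  { id: "abcd", ip: "10.0.0.1", port: "1234" },
  { id: "efgh", ip: "192.0.2.7", port: "5678" },
]);
// firstBatch holds the "abcd" candidate; secondBatch holds only "efgh".
```

Sending only the difference keeps signaling messages small while still allowing the far side to start connectivity checks early.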

An example of a candidate representation is:

{pwd:"asd88fgpdd777uzjYhagZg",
 ufrag:"8hhy",
 candidates:[{foundation:"1",
   component:"1",
   id:"abcd",
   ip:"10.0.0.1",
   port:"1234",
   priority:"2130706431",
   type:"relay"
   }]
}
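For applications that choose SDP, an object of this shape can be serialized into ICE attribute lines. The sketch below is illustrative only: the function name is hypothetical, the candidate list is passed as a separate argument, and the transport is assumed to be UDP because the representation above carries no transport field. Relay candidates in SDP normally also carry a related address and port (raddr/rport); they are omitted here because the example object does not include them.

```javascript
// Hypothetical helper: turn ICE credentials plus a list of candidate
// objects into SDP attribute lines. Transport is assumed to be UDP.
function candidatesToSdp(credentials, candidateList) {
  const lines = [
    "a=ice-ufrag:" + credentials.ufrag,
    "a=ice-pwd:" + credentials.pwd,
  ];
  for (const c of candidateList) {
    // Field order follows the SDP candidate attribute grammar:
    // foundation, component, transport, priority, address, port, "typ", type.
    const fields = [c.foundation, c.component, "UDP", c.priority, c.ip, c.port, "typ", c.type];
    lines.push("a=candidate:" + fields.join(" "));
  }
  return lines;
}

const sdpLines = candidatesToSdp(
  { ufrag: "8hhy", pwd: "asd88fgpdd777uzjYhagZg" },
  [{ foundation: "1", component: "1", id: "abcd", ip: "10.0.0.1",
     port: "1234", priority: "2130706431", type: "relay" }]
);
// sdpLines[2] is "a=candidate:1 1 UDP 2130706431 10.0.0.1 1234 typ relay"
```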

When candidates are received from the far side, those far candidates are added using addCandidates(), which triggers the ICE state machine to start testing candidate pairs.

When a successful candidate pair is detected, the onReady() callback is invoked, signaling to the application that the transport is set up and ready to exchange media.

The onCongestion() and onStatistics() callbacks enable the application to respond to network and system level resource constraints. For congestion control we envisage a mechanism that gives the application a ‘last chance’ to modify its behavior, which if ignored results in direct system level intervention. This provides the application the ability to set codec parameters in a way that corresponds to its use model, for example reducing video frame-rate rather than sacrificing picture quality.
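As an illustration of the "last chance" idea, an onCongestion handler might trade frame rate for picture quality. The sketch below rests on several assumptions: the frameRate parameter name, the halving policy and the helper name are all hypothetical, since this draft does not define the congestion event or any codec-specific parameters.

```javascript
// Hypothetical policy (not defined by this draft): halve the frame rate on
// each congestion signal, never dropping below a floor. The `frameRate`
// key inside `parameters` is an assumed, codec-specific parameter name.
function reduceFrameRate(codec, minFrameRate) {
  const current = codec.parameters.frameRate;
  const reduced = Math.max(minFrameRate, Math.floor(current / 2));
  codec.parameters.frameRate = reduced;
  return reduced < current; // true if the codec was actually adapted
}

// Usage: this call would sit inside an onCongestion-style callback.
const videoCodec = { name: "VP8", parameters: { frameRate: 30 } };
const adapted = reduceFrameRate(videoCodec, 10);
```

Returning whether anything changed lets the application tell when it has run out of room to adapt, which is the point at which system level intervention would take over.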

[Constructor(in optional Object config)]
interface Connection {
        // Properties
        attribute Object relays;
        attribute Stream streams[];
        attribute Object farCandidates;
        attribute Object nearCandidates;
        readonly attribute ConnectionState state;

        // Callbacks
        attribute Function onCandidates(Object nearCandidates);
        attribute Function onReady(Object event);
        attribute Function onClose(Object event);
        attribute Function onError(Object event);
        attribute Function onCongestion(Object event);
        attribute Function onStatistics(Object event);

        // Functions
        Boolean addStream(Stream stream);
        Boolean removeStream(Stream stream);
        Boolean addCandidates(Object candidates[]);
}

Stream

The Stream object represents an association of a codec with source and sink media streams. There is a one-to-one mapping between stream and codec: a stream is essentially a source and sink wrapped with a codec and the parameters it requires. The codec will have been selected by the JavaScript application through a process of unspecified negotiation and priority ordering based on the local list of supported codecs and those supported by the remote end point.
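Since the negotiation itself is left unspecified, the following is only one possible selection rule, sketched under stated assumptions: codecs are matched by name (case-insensitively), the local list is assumed to be in priority order, and the helper name selectCodec is hypothetical.

```javascript
// Hypothetical helper: pick the first codec in the local priority order
// that the remote end point also supports. Matching by name only is a
// simplifying assumption; a real application may also compare parameters.
function selectCodec(localCodecs, remoteCodecs) {
  const remoteNames = new Set(remoteCodecs.map((c) => c.name.toLowerCase()));
  const match = localCodecs.find((c) => remoteNames.has(c.name.toLowerCase()));
  return match || null; // null when there is no codec in common
}

const localCodecs = [
  { name: "speex", payloadType: 97, packetTime: 20, parameters: {} },
  { name: "G711", payloadType: 0, packetTime: 20, parameters: {} },
];
const remoteCodecs = [{ name: "iLBC" }, { name: "g711" }];
const chosen = selectCodec(localCodecs, remoteCodecs);
// chosen is the local G711 entry, which can then be passed as the codec of a Stream.
```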

[ The syncGroup requires further discussion, but is intended to enable the signaling of streams that should be synchronized to the lower layers of the system. ]

Once created, streams may be attached to Connection objects. Media starts to flow once the Connection is in the ready state (which may already be the case on attachment).

[Constructor(in optional Object config)]
interface Stream {
        // Properties
        attribute Codec codec;
        attribute Media source;
        attribute Media sink;
        attribute DOMString syncGroup;
}

Codec

The Codec object represents a single codec with the required parameters. Codec-specific parameters are set using a JSON parameters object.

Upon initialization the JavaScript application may request an array of supported system codecs, which can then be used in combination with the Connection nearCandidates to generate an SDP offer or engage in alternative media negotiation.

interface Codec {
        // Properties
        attribute DOMString name;
        attribute int payloadType;
        attribute int packetTime;
        attribute Object parameters;
}

This codec representation is similar to that used in any low level implementation, such as SIP handset firmware, and as such can be mapped to SDP parameters in the same manner. For cases where SDP is too restrictive it is still possible to exchange the full set of capabilities using whatever mechanism the application chooses.
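One way such a mapping to SDP might look is sketched below. It is an illustration rather than a defined part of this proposal: the function name is hypothetical, the clock rate is passed explicitly because the Codec interface has no such attribute, and serializing the parameters object as semicolon-separated key=value pairs in an fmtp line is an assumption.

```javascript
// Hypothetical helper: map a Codec-shaped object to SDP attribute lines.
// `clockRate` is supplied by the caller since Codec does not carry it.
function codecToSdp(codec, clockRate) {
  const lines = [
    "a=rtpmap:" + codec.payloadType + " " + codec.name + "/" + clockRate,
    "a=ptime:" + codec.packetTime,
  ];
  // Assumed convention: codec-specific parameters become one fmtp line.
  const params = Object.entries(codec.parameters || {}).map(([k, v]) => k + "=" + v);
  if (params.length > 0) {
    lines.push("a=fmtp:" + codec.payloadType + " " + params.join(";"));
  }
  return lines;
}

const pcmuLines = codecToSdp(
  { name: "PCMU", payloadType: 0, packetTime: 20, parameters: {} },
  8000
);
// pcmuLines is ["a=rtpmap:0 PCMU/8000", "a=ptime:20"]
```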

Example Code

We present example pseudo-code for both an outbound and an inbound session. These are purely examples based loosely around SDP offer/answer, but use is not constrained to this semantic ordering.

Outbound session

We start by finding out the codecs supported by the local browser:

codecs = WebRTC.Codecs();

This returns an array of JSON objects representing the codecs.

We then set up the local ICE RTP connection:

conn = WebRTC.Connection({
    relays: [{type:"stun", ip:"192.0.2.1"},
             {type:"turn", ip:"10.0.1.2", username:"foo", password:"bar"}
            ],
    onCandidates: function(nearCandidates) {
        // Here we can generate and send an SDP offer by combining our
        // local codecs with the candidates we are given.
    },
    onReady: function(event) {
    },
    onClose: function(event) {
    },
    onError: function(event) {
    },
    onCongestion: function(event) {
        // Here we can modify the codec parameters according to
        // application-specific constraints.
    }
});

Upon receipt of a call accept from the remote endpoint, possibly including an SDP answer, we first set the far side candidates on our connection object.

conn.addCandidates(farCandidates);

Then set up a stream ready to attach to that connection, using the result of combining our local codecs with those given to us by the remote endpoint.

stream = WebRTC.Stream({
    codec:{name: "G711",
           payloadType: 0,
           packetTime: 20,
           parameters: { music: true }
          },
    source: mySource,
    sink: mySink,
    syncGroup: mySyncGroup
});

Finally we attach the stream to the connection and media will start to flow:

conn.addStream(stream);

Inbound Session

The inbound session is initiated by the reception of an inbound call invitation, possibly including an SDP offer.

We start by finding out the codecs supported by the local browser:

codecs = WebRTC.Codecs();

This returns an array of JSON objects representing the codecs. We then allocate our local stream based on the codecs we support and those from the remote end point:

stream = WebRTC.Stream({
    codec:{name: "G711",
           payloadType: 0,
           packetTime: 20,
           parameters: { music: true }
          },
    source: mySource,
    sink: mySink,
    syncGroup: mySyncGroup
});

Following which we create a local ICE RTP connection object:

conn = WebRTC.Connection({
    relays: [{type:"stun", ip:"192.0.2.1"},
             {type:"turn", ip:"10.0.1.2", username:"foo", password:"bar"}
            ],
    stream: stream,
    onCandidates: function(nearCandidates) {
        // Here we have our local candidates and can reply to the
        // invitation with an SDP answer based on the codec selected
        // above and the local candidates.
    }
});

Finally we set the far side candidates which will engage ICE negotiation.

conn.addCandidates(farCandidates);

Once ICE negotiation is complete media will automatically start to flow.

Third Party Call Setup

Under this proposal, handling glare (two endpoints attempting to set up a call to each other at the same time) becomes the responsibility of the JavaScript and web application implementation. It is possible to use existing solutions, or to take advantage of the fact that we have a central web application that is able to both initiate and mediate call setup. If call setup is initiated by the web application (in response to a user input) then we can avoid the race condition of two independent call setups occurring simultaneously.
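A minimal sketch of how a central web application might serialize call setup is given below. It is an assumption about one possible server-side design, not something this draft specifies: the CallMediator class, its method names and the use of string client identifiers are all hypothetical.

```javascript
// Hypothetical server-side sketch: the web application admits one call
// setup at a time per client, so two clients cannot both initiate a setup
// towards each other concurrently.
class CallMediator {
  constructor() {
    this.inSetup = new Set(); // ids of clients currently in call setup
  }

  // Returns true if the setup may proceed, false if either party is
  // already involved in another setup.
  beginSetup(callerId, calleeId) {
    if (this.inSetup.has(callerId) || this.inSetup.has(calleeId)) {
      return false;
    }
    this.inSetup.add(callerId);
    this.inSetup.add(calleeId);
    return true;
  }

  // Called once the setup completes or fails, freeing both parties.
  endSetup(callerId, calleeId) {
    this.inSetup.delete(callerId);
    this.inSetup.delete(calleeId);
  }
}

const mediator = new CallMediator();
const firstAttempt = mediator.beginSetup("alice", "bob");
const crossingAttempt = mediator.beginSetup("bob", "alice");
// firstAttempt is true; crossingAttempt is false because both are busy.
```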

Each client is initially requested to present their codec listing to the web application:

codecs = WebRTC.Codecs();

The web application can then perform any codec capability matching required and select a shared codec, instructing the clients to create streams and begin ICE candidate exchange:

stream = WebRTC.Stream({
    codec: webAppAllocatedCodec,
    source: mySource,
    sink: mySink,
    syncGroup: mySyncGroup
});

conn = WebRTC.Connection({
    relays: [{type:"stun", ip:"192.0.2.1"}
            ],
    stream: stream,
    onCandidates: function(nearCandidates) {
        // Here we have our local candidates and can forward them to
        // the remote client.
    }
});

Finally when we receive the far side candidates we engage ICE negotiation.

conn.addCandidates(farCandidates);

At this point both media and ICE have been negotiated and media can start to flow.