PubSubHubbub for Private Feeds

An implementor’s guide

John Panzer & Brett Slatkin

July 21st, 2010

The Scenario

The publisher provides various feeds for use by third parties.  For example, if you do WebFinger discovery on bob@example.com, you'll find a link rel="..#updates-from" which points at a feed of updates from bob@example.com. This is a semi-private feed in that it may contain a mixture of public and private (ACL'd) data. In the most complex case, it contains public entries and entries ACL'd to different access control groups, and the content of the entries themselves may differ depending on who is making the request; the feed may be arbitrarily personalized based on the requester.

As an example, say we want to enable NetVibes (a Consumer, meaning a service that subscribes via PubSubHubbub) to both view and subscribe to Bob's #updates-from feed. The most important criterion is that this is done safely, meaning that it's difficult to get the protocol wrong and unintentionally leak private information, and securely, meaning that an external attacker will have a very difficult time getting private data without the consent of the publisher.

Other goals include inventing as little as possible and being reasonably efficient.

Walkthrough

Larger diagram available here.

Step 1: Subscription

  1. A user finds a feed URI (possibly through WebFinger discovery on bob@example.com) and wants to subscribe to it, using a Consumer (NetVibes) agent. The first step is simply to GET the feed, e.g., http://example.com/feeds/bobsfeed
  2. Feed publisher returns either the feed, or a 401 response, depending on whether the feed is partially public (and so can return useful data) or is completely private (and so requires auth even to start).  In either case, it returns an OAuth WWW-Authenticate: header.
  3. Consumer starts the OAuth dance, talking to the Publisher's chosen Auth system (NB: Right now the Consumer may know the right API endpoints for this only by close reading of the service owner’s docs and hard-coding them; in the future this should be discoverable by some mechanism).
  4. At the end of the dance the Consumer gets an access token for the semi-private feed.
  5. Consumer constructs a capability URL for the feed.  The recipe for this is a profile of OAuth and/or WRAP/OAuth2; it mandates PLAINTEXT over SSL and makes the token a URL parameter. This is compatible with server OAuth libraries. (NB: This means that every feed in this system must be accessible via SSL, and indeed the capability URLs can only work over SSL and MUST return 400 Bad Request if accessed via http:. This is to discourage bozos from sending tokens in the clear.)
  6. The Consumer does a completely vanilla PubSubHubbub protocol step to subscribe itself to the capability URL it constructed in the previous step. (NB: At this point the Consumer also binds the original URL to the capability URL, because when it shows a URL to the end user, it needs to show the original one. No leaks!)
  7. The feed hub, which is in cahoots with the Publisher, uses a private API to map from the capability URL it was handed to a Publisher's user ID (for example, a Google User ID). (NB: A next step might be to define a public API for doing this between a publisher and a third-party but trusted hub, if there is demand for such a thing.) Presumably in this step the system also checks the ACL for that feed (to see if that token should be allowed access). The Hub records the relationship between the user ID, callback URL, and capability URL.
  8. Normal PubSubHubbub subscription verification callback. The Consumer confirms the subscription here.
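
Steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query parameter name oauth_token, the function names, and the choice of hub.verify=sync are all hypothetical, since the real profile would fix the parameter name and each Consumer picks its own verification mode. The sketch refuses to build a capability URL for a non-https feed, mirroring the SSL-only rule on the Consumer side.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter name; the OAuth/WRAP profile would define the real one.
TOKEN_PARAM = "oauth_token"


def build_capability_url(feed_url: str, access_token: str) -> str:
    """Return the feed URL with the access token attached as a query parameter."""
    parts = urlsplit(feed_url)
    if parts.scheme != "https":
        # Capability URLs only work over SSL, so never build one for http:.
        raise ValueError("capability URLs must use https")
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((TOKEN_PARAM, access_token))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def subscription_request_body(capability_url: str, callback_url: str) -> str:
    """Form-encode a plain PubSubHubbub subscribe request whose topic is the capability URL."""
    return urlencode(
        {
            "hub.mode": "subscribe",
            "hub.topic": capability_url,
            "hub.callback": callback_url,
            "hub.verify": "sync",  # assumption: "async" would work equally well
        }
    )
```

The resulting body would be POSTed to the hub over SSL as an application/x-www-form-urlencoded request; nothing about the request itself is specific to private feeds.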

Thus endeth the subscription flow. At this point the feed consumer user has a subscription to the feed.  They probably also do an initial feed fetch (not shown) using the capability URL, which works fine as a GET, in order to retrieve the last few entries of the feed into their system. Note that in the PubSubHubbub flow, they'll probably periodically do this GET as a background catchup task in any case (in case they missed something, or to verify their OAuth token is still good). So imagine this happening every day or so.
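
On the Consumer side, the binding from step 6 and the periodic catch-up GET might be tracked with a small table keyed by capability URL. The following is a minimal sketch, not a prescribed design: the class names, the fields, and the 24-hour interval are assumptions made for illustration. Keying by capability URL keeps each subscriber's entries apart even when several users subscribe to the same original URL.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# "Every day or so"; the exact interval is an assumption.
CATCHUP_INTERVAL_SECONDS = 24 * 60 * 60


@dataclass
class Binding:
    original_url: str    # the only URL ever shown to the end user
    capability_url: str  # contains the token; used for GETs and as the hub topic
    user: str            # Consumer-side user who started the subscription
    last_fetch: Optional[float] = None  # Unix time of the last catch-up GET
    entries: list = field(default_factory=list)


class BindingTable:
    def __init__(self):
        self._by_capability = {}

    def bind(self, original_url, capability_url, user):
        binding = Binding(original_url, capability_url, user)
        self._by_capability[capability_url] = binding
        return binding

    def display_url(self, capability_url):
        """URL that is safe to show to the end user (never the capability URL)."""
        return self._by_capability[capability_url].original_url

    def due_for_catchup(self, now=None):
        """Capability URLs never fetched, or last fetched at least one interval ago."""
        now = time.time() if now is None else now
        return [
            b.capability_url
            for b in self._by_capability.values()
            if b.last_fetch is None or now - b.last_fetch >= CATCHUP_INTERVAL_SECONDS
        ]

    def record_fetch(self, capability_url, entries, now=None):
        """Store fetched entries under this capability URL only."""
        binding = self._by_capability[capability_url]
        binding.entries.extend(entries)
        binding.last_fetch = time.time() if now is None else now
```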

Step 2: Bob posts an update into the Publisher's system.

  1. Publisher determines the set of Publisher user IDs who are allowed to see the content.
  2. The Publisher does a fat ping via a private API to the Private Feed Hub, including the original feed URL, the normal content (an entry), and the ACL for that entry (a list of user IDs).
  3. The hub maps from (original feed URL, user ID) to (capability URL, callback URL) for each user ID on the list. Any mappings that don't exist are skipped. The remaining list is now a list of standard PubSubHubbub topics (the capability URLs) and callbacks.
  4. Normal PubSubHubbub is used to push the content to the callback URLs.
  5. Finally, the Consumer translates from the capability URL it gets as a topic back to the user who initiated the subscription. This is just normal PubSubHubbub; the only tricky bit is that the Consumer must be careful never to merge data for the same original URL but different capability URLs, since this would leak private data.
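
The hub-side mapping in steps 2 through 4 can be sketched as a lookup from (original feed URL, user ID) to the subscriptions recorded during the subscription flow. This is a hedged illustration only: the class and method names are hypothetical, the private fat-ping API is reduced to a plain method call, and the sketch assumes one user may hold several subscriptions (for example, through different Consumers), which the text above does not specify.

```python
class PrivateFeedHub:
    """Toy in-memory model of the hub's subscription mapping (illustrative only)."""

    def __init__(self):
        # (original feed URL, publisher user ID) -> list of (capability URL, callback URL)
        self._subs = {}

    def record_subscription(self, original_url, user_id, capability_url, callback_url):
        """Remember the relationship recorded at the end of the subscription flow."""
        key = (original_url, user_id)
        self._subs.setdefault(key, []).append((capability_url, callback_url))

    def fan_out(self, original_url, entry, acl_user_ids):
        """Turn one fat ping into a list of standard (topic, callback) deliveries.

        User IDs on the ACL that have no recorded subscription are skipped.
        """
        deliveries = []
        for user_id in acl_user_ids:
            for capability_url, callback_url in self._subs.get((original_url, user_id), []):
                deliveries.append(
                    {"topic": capability_url, "callback": callback_url, "entry": entry}
                )
        return deliveries


hub = PrivateFeedHub()
hub.record_subscription(
    "https://example.com/feeds/bobsfeed",
    "user-1",
    "https://example.com/feeds/bobsfeed?oauth_token=A",
    "https://consumer.example/callback/1",
)
# "user-2" is on the ACL but never subscribed, so only one delivery results.
pending = hub.fan_out(
    "https://example.com/feeds/bobsfeed", "<entry>...</entry>", ["user-1", "user-2"]
)
```

Each item in the resulting list would then be pushed to its callback with the normal PubSubHubbub content distribution step, with the capability URL as the topic.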