Gnip API v2.0 (rev 5)

This document is divided into three sections:



Introduction


Before you can do anything with the API you'll need an account. Get one here.


The Gnip API provides two major components: change notifications for activities (events), and the full content associated with those activities. Activity examples include a user "notice" (twitter), a user "dugg" (digg), a user creating a blog post, etc. Activity examples can be user generated, or machine generated.


There are two primary roles that API users fall into: Publishers and Subscribers. You may be one, the other, or both, depending on your situation. Publishers push data into the system; here's a Publisher example. Subscribers consume data from the system; here's a Subscriber example.


Activities are published into activity streams that are related to a Publisher. Gnip API Subscribers can subscribe to activity streams as a way to be notified of activities of interest. Activity streams come in two flavors: Public Timelines and Filters. An activity stream is composed of change notifications and can optionally contain full data. Full data provides the raw, original, untouched (though encoded) activity information used to create the Activity XML. To protect the integrity of the Gnip XML, the full data is gzip'd and then base64 encoded in the <raw> element. Subscribers who are interested in this element are responsible for base64-decoding and gunzip'ing it.
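As a minimal sketch of the decoding step described above, the following Python reverses the two encodings in order (base64 first, then gzip). The function names are hypothetical and not part of any Gnip client library; encode_raw exists only so the round trip can be exercised locally.

```python
import base64
import gzip


def decode_raw(raw_text: str) -> bytes:
    """Base64-decode, then gunzip, the text content of a <raw> element."""
    compressed = base64.b64decode(raw_text.strip())
    return gzip.decompress(compressed)


def encode_raw(original: bytes) -> str:
    """Inverse of decode_raw (gzip, then base64); used here only for testing."""
    return base64.b64encode(gzip.compress(original)).decode("ascii")
```

The result is returned as bytes because the format of the original activity data depends on the Publisher.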


A Public Timeline is the stream of all the activities from a given Publisher. Note that not all Publishers support Public Timelines, and Public Timelines do not contain full data. For example, the Public Timeline of the "twitter" Publisher is the stream of all public tweet notifications published via twitter. Each Publisher has a system-wide unique ID that is used to identify it. Activities may be retrieved by any Subscriber, but only the Publisher may add activities to the stream.


Public Timelines can be "polled" (HTTP GET) from Gnip.


A Filter is a stream containing all the activities that meet Subscriber-defined criteria. For example, Filters allow Subscribers to create activity streams containing just the activities from the user names they're interested in, or all activities from Publisher "foo" that are tagged with a given string. The Subscriber who creates a Filter is the only one who has access to it. Filter activities have the option to contain full data.


Filters can be "polled" (HTTP GET) from Gnip, or "pushed" (HTTP POST) to Subscribers who provide a POST URL endpoint during Filter creation.


General API Usage Information


Examples 

Examples of Activities

The examples below are divided into those for Subscribers to data from Gnip and those for Publishers of data into Gnip.  Most Gnip users are Subscribers, but depending on the application, a user may be a Subscriber, a Publisher, or both.  Both Publishers and Subscribers need to be familiar with Gnip activities. Each activity is represented by an <activity/> document, and collections of <activity/> documents are wrapped in <activities/> documents.  Below are examples of each: the first is a "notification"-style activity that does not contain the activity's "full-data"; the second is a "full-data" <activity/> document.  Documents like these are emitted by a Publisher's notification stream as well as by a Filter's notification and activity streams.

Sample of a notification Activity XML (.../notification/...)

 <activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>

Sample of a full-data Activity XML (.../activity/...)

 <activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>

Examples for Subscribers

Retrieve recent Activities for a given Publisher

This requires prior knowledge of the Publisher ID (a current list of Publisher IDs can be found here -- requires a Gnip account). To retrieve recent activities for a given Publisher (e.g. "digg"), construct a URL for the Publisher of interest and perform an HTTP GET to retrieve the application/xml representation of that activity stream; for details go here. For a Publisher's Public Timeline you can only access activities without full data, i.e. "notifications." If you want full data, you need to create a Filter and access it that way; for an example, go here.

===>
GET /publishers/digg/notification/current.xml
Accept: application/xml

<---
200 OK
Content-Type: application/xml

<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>

Retrieve activity for a Publisher n minutes ago (up to 60 minutes in the past)

If you are interested in a point in the past relative to now, you will need to determine the difference between your local clock and Gnip’s clock, to minimize clock drift as a variable; for details go here.


===>

  HEAD /

<---

  200 OK

  Date: Mon, 09 Jun 2008 09:08:43 GMT


The returned Date header field can be used to work out the difference between Gnip’s clock and the local one. Once that is known, the activity bucket ID for the time of interest can be calculated. Once the activity bucket ID is known, its URL can be constructed using the same format as above, replacing "current" with an activity bucket ID (e.g. 200806090910), and the data can be retrieved. See the section called "Activity Buckets" below.

===>
GET /publishers/digg/notification/200806090910.xml
Accept: application/xml
<---
200 OK
Content-Type: application/xml

<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>
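
The clock-offset and bucket ID calculation described above might be sketched as follows. This assumes the Date header arrives in the standard HTTP date format (e.g. "Mon, 09 Jun 2008 09:08:43 GMT") and that bucket IDs have minute resolution, as in the example ID 200806090910. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def clock_offset(date_header: str, local_now: datetime) -> timedelta:
    """Return (server time - local time) from an HTTP Date header value."""
    server_now = parsedate_to_datetime(date_header)
    return server_now - local_now


def bucket_id(minutes_ago: int, offset: timedelta, local_now: datetime) -> str:
    """Bucket ID (UTC, minute resolution) for a point `minutes_ago` in the past."""
    if not 0 <= minutes_ago <= 60:
        raise ValueError("buckets are only valid for the last 60 minutes")
    # Shift the local clock onto the server's clock before stepping back.
    server_time = local_now + offset - timedelta(minutes=minutes_ago)
    return server_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
```

In real use, local_now would be datetime.now(timezone.utc) captured at the moment the HEAD response is received; it is passed in explicitly here so the calculation is easy to test.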

Create a Filter and retrieve its Activity

To create a Filter you need one or more "rules."  Each Publisher has a set of supported rule types, which constrains the types of rules that can be specified for a Filter on that Publisher.  For example, if a Publisher supports the rule types "Actor" and "To", a Filter can contain rules using those types but cannot contain rules of type "Regarding", "Tag", or "Source".  If a Filter is created with a rule type that a Publisher does not support, the server will return a response containing an error message and an error status code.  Here is a list of Publishers.


  Send an HTTP POST request to create a Filter.  For details go here.

===>
POST /publishers/digg/filters.xml
Accept: application/xml
Content-Type: application/xml

<filter name="example" fullData="true">
<rule type="actor" value="joe"/>
<rule type="actor" value="jane"/>

</filter>
<---
200 OK
Content-Type: application/xml

<result>Success</result>

Once the Filter is created, accessing full-data activity associated with it is similar to accessing Publishers' notifications.

===>
GET /publishers/digg/filters/example/activity/current.xml
Accept: application/xml
<---
200 OK
Content-Type: application/xml

<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>

Create a Filter and have its Activity POSTed to specified URL

Create a Filter as in the previous example and in addition specify a postUrl to which activities should be HTTP POSTed. Gnip validates the URL by making a HEAD request to it, so ensure that the postUrl responds successfully to an HTTP HEAD request.  For details go here.

===>
POST /publishers/digg/filters.xml
Accept: application/xml
Content-Type: application/xml

<filter name="example" fullData="true">
<postUrl>http://mysite.example/inbound-activity-handler.cgi</postUrl>
<rule type="actor" value="joe"/>
<rule type="actor" value="jane"/>

</filter>
<---
200 OK
Content-Type: application/xml

<result>Success</result>

Once the Filter is created, any activity that matches rules in the Filter will be POSTed to the specified URL; see below for an example of the HTTP exchange that occurs when activities are sent to a postUrl. Gnip activity POSTing is currently fire-and-forget: Gnip does not interpret the HTTP response codes returned for the POST.

===>
POST http://mysite.example/inbound-activity-handler.cgi
Content-Type: application/xml

<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>

<---
200 OK
A note on authentication: Gnip does not support a Filter owner specifying credentials to use on requests that post activities to a Filter's postUrl.  As an interim solution until Gnip supports such authentication, a Filter owner can provide a token on the postUrl as a parameter.  If this token is only known to Gnip and the postUrl's server, it can be validated on your server when Gnip sends activities to a postUrl to ensure that POST requests originate from Gnip.
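
As an illustration of the token approach described in the note above, the receiving server could check the query string of each incoming POST. This is only a sketch: the parameter name "token" and the secret value are assumptions, not something Gnip prescribes.

```python
import hmac
from urllib.parse import parse_qs, urlsplit

# Hypothetical shared secret, appended to the postUrl as ?token=... when the Filter is created.
EXPECTED_TOKEN = "replace-with-a-long-random-secret"


def request_has_valid_token(request_url: str, expected: str = EXPECTED_TOKEN) -> bool:
    """Check the (assumed) `token` query parameter of an incoming POST URL."""
    params = parse_qs(urlsplit(request_url).query)
    supplied = params.get("token", [""])[0]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Because the token travels in the URL, it is only as private as the transport and any logs that record request URLs.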

Examples for Publishers

Publish Activities to an Activity Stream

If you are a Publisher interested in publishing Activities to an Activity Stream, POST application/xml to a previously created Publisher like this. For details go here.


===>

  POST /publishers/PUBLISHER-NAME/activity.xml

  Accept: application/xml

  Content-Type: application/xml

 <activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>

<---

  200 OK

  Content-Type: application/xml


  <result>Success</result>

Details

Response Codes

Activity Streams

Activity streams are collections of activities which may be retrieved and, in some situations, to which activities may be published.   Publishers POST new activities to activity streams; they are the start of data flow through the system. Subscribers can retrieve activity data via activity buckets (see the Activity Buckets section below).


https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/activity.xml

(Replace PUBLISHER-NAME with the name of an actual Publisher. The list of current Publishers can be found here.)

GET:
Returns an XML representation (Activity Stream XML) of the Publisher's activity stream.
Response Content-Type: application/xml


https://prod.gnipcentral.com/collections/COLLECTION-NAME/activity.xml

(Replace COLLECTION-NAME with the name of an actual collection owned by you.)
GET:
Returns an XML representation (Activity Stream XML) of the collection's activity stream.

Activity Stream XML

An activity stream represented in XML looks like the following. Full Gnip XML schema can be found here.

<activityStream>
  <activitiesAddedAt>2008-07-28T11:32:50Z</activitiesAddedAt>
  <buckets>
    <bucket href="/publishers/example/activity/200807281130.xml"/>
    <bucket href="/publishers/example/activity/200807281125.xml"/>
    <bucket href="/publishers/example/activity/200807281120.xml"/>
    <!-- ... -->
  </buckets>
</activityStream>

Activity Buckets

Activity streams are segmented into buckets based on time. Each bucket contains all the activities for the stream that are reported during the bucket’s duration. Each bucket has an ID that is the timestamp, in UTC, of the minute at which the bucket starts. The bucket ID format is %Y%m%d%H%M (i.e. strftime conversion specifiers). For example, the bucket with ID 200806090913 would contain all activity reported between 9:13 am and 9:14 am UTC, inclusive, on June 9, 2008. Activity buckets are only valid for the last 60 minutes. The "current.xml" bucket holds the "current" activities, covering up to one minute.


https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/notification/current.xml
https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/notification/BUCKET-ID.xml
https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/filters/FILTER-NAME/activity[notification]/current.xml
https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/filters/FILTER-NAME/activity[notification]/BUCKET-ID.xml

GET:
Returns a representation (Activities XML) of the activity bucket specified.
Response Content-Type: application/xml
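
The four URL forms listed above could be generated by a small helper such as the one below. It is a sketch with hypothetical names, and it assumes Publisher and Filter names are already URI-safe, so no escaping is applied.

```python
from typing import Optional

BASE_URL = "https://prod.gnipcentral.com"


def bucket_url(publisher: str, bucket: str = "current",
               filter_name: Optional[str] = None, full_data: bool = False) -> str:
    """Build the URL of an activity bucket for a Publisher or for one of its Filters."""
    if filter_name is None:
        if full_data:
            # Per the text above, a Publisher's Public Timeline exposes notifications only.
            raise ValueError("full data is only available through a Filter")
        return f"{BASE_URL}/publishers/{publisher}/notification/{bucket}.xml"
    kind = "activity" if full_data else "notification"
    return f"{BASE_URL}/publishers/{publisher}/filters/{filter_name}/{kind}/{bucket}.xml"
```

For example, bucket_url("digg", "200806090910") addresses the Publisher bucket used in the Subscriber examples, and bucket_url("digg", filter_name="example", full_data=True) addresses the current full-data bucket of the Filter named "example".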

Activities XML

A set of activities is represented in an XML document that looks like this. Full Gnip XML schema can be found here.

 <activities publisher="digg">
    <activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
        <!-- <OPTIONAL> -->
        <payload>
            <body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
            <raw>gzip'd, base64'd original activity meta-data</raw>
        </payload>

        <!-- </OPTIONAL> -->
    </activity>
</activities>
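
One way a Subscriber might read such a document is sketched below, using Python's standard ElementTree parser. It assumes the document carries no XML namespace, as in the sample above; the function name and the shape of the returned records are illustrative choices.

```python
import xml.etree.ElementTree as ET


def parse_activities(xml_text: str) -> list:
    """Turn an <activities/> document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    activities = []
    for node in root.findall("activity"):
        record = dict(node.attrib)  # actor, action, at, url, ...
        record["publisher"] = root.get("publisher")
        payload = node.find("payload")
        if payload is not None:  # the payload is optional (notifications omit it)
            record["body"] = payload.findtext("body")
            record["raw"] = payload.findtext("raw")
        activities.append(record)
    return activities
```

The "raw" value is returned still encoded; it has to be base64-decoded and gunzip'd separately.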

Filters

A Filter consists of rules that are used to match activities flowing through a Publisher's activity stream.  Filters may be created, edited, and deleted at will by their owner. Filter names are unique across the Gnip system; two accounts cannot have a Filter with the same name. They are owned by a single account and only that account can manipulate the Filter's definition or access the activity stream produced by it. Filter rules are OR'd together, and there is no AND support.


If a Filter specifies a postUrl, new activities that meet the Filter criteria will be posted to the specified URL.  POSTed activities include full data by default.


https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/filters.xml

GET:
    Returns the Filters XML document which lists all the filters, including their rules, you own for a Publisher.
    Response Content-Type: application/xml

POST:
Accepts a Filter XML document and creates a new filter based on that information. If your Filter document is large it should be gzip'd, and sent with the "Content-Encoding: gzip" header. If you wish to update a filter, use PUT, described below.  POSTing a Filter twice with the same name will result in an error response from the server.
Request Content-Type: application/xml
Response Content-Type: application/xml

https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/filters/FILTER-NAME.xml

GET:
Returns a Filter XML representation of the specified Filter.
Response Content-Type: application/xml
PUT:
Accepts a Filter XML document and updates the specified Filter to match the data provided. Gnip will replace the existing Filter with this document. If you're trying to create a Filter for the first time, use POST, described above.
Request Content-Type: application/xml
Response Content-Type: application/xml
DELETE:
Removes the specified Filter.

Filter XML

Filter definitions are represented as XML documents similar to this. Full Gnip XML schema can be found here.


<filter name="example" fullData="true|false">
     <postUrl>optional</postUrl>
     <rule type="actor" value="joe"/>
     <rule type="tag" value="cars"/>
</filter>
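
A Filter document of this shape can be generated rather than written by hand. The sketch below uses Python's standard ElementTree; the function name and argument layout are hypothetical, and rules are passed as (type, value) pairs.

```python
import xml.etree.ElementTree as ET


def build_filter_xml(name, rules, full_data=False, post_url=None) -> str:
    """Serialize a Filter definition; `rules` is an iterable of (type, value) pairs."""
    root = ET.Element("filter", {"name": name,
                                 "fullData": "true" if full_data else "false"})
    if post_url:
        ET.SubElement(root, "postUrl").text = post_url
    for rule_type, value in rules:
        ET.SubElement(root, "rule", {"type": rule_type, "value": value})
    return ET.tostring(root, encoding="unicode")
```

Building the document through an XML library rather than string concatenation ensures that rule values containing characters such as & or < are escaped correctly.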

Adding, removing, and querying rules
A Filter's rule set can be incrementally updated and queried via the Filter's rules/ endpoint.  The rules/ endpoint can be used to add a single rule or to batch add a set of rules.

https://prod.gnipcentral.com/publishers/PUBLISHER-NAME/filters/FILTER-NAME/rules

GET:
        Accepts a query string consisting of a single rule to search for in the Filter named FILTER-NAME.
        For example: .../publishers/PUBLISHER-NAME/filters/FILTER-NAME/rules?type=actor&value=joe
        Request Content-Type: *
        Response Content-Type: application/xml
        Response Codes:
            200: if the rule is found in the Filter.  The response body will contain the XML of found <rule/>.
            404: if the rule could not be found.  
POST:
        Accepts two XML document types:
            Send a rule XML document (e.g. <rule type="" value=""/>) to add a single rule to FILTER-NAME.
            Send a rules XML document (e.g. <rules>...</rules>) that wraps one or more <rule/> documents to batch add to FILTER-NAME.
        Request Content-Type: application/xml
        Response Content-Type: application/xml
        Response Codes:
            200: if an individual rule is successfully added to the Filter.
                 If a <rules/> document is sent, a 200 is returned once the document is successfully received. The server processes batch updates asynchronously, so for <rules/> a 200 does not imply that the rules have been added to the Filter.  To confirm that all rules in the batch were added, either search for a newly added rule via the rules/ endpoint or GET the whole Filter.
            400: if adding a <rule/> with a rule type that isn't supported by the Publisher.

DELETE:
Accepts a query string consisting of a single rule to be removed from FILTER-NAME.
For example: .../publishers/PUBLISHER-NAME/filters/FILTER-NAME/rules?type=actor&value=joe

Request Content-Type: *
Response Content-Type: *

Large Filters and the rules/ endpoint

When using Gnip, it is common for a Filter's rule set to get "large" and have tens or hundreds of thousands (or more) rules used to match activities from a Publisher.  In some cases, the rules/ endpoint must be used to add rules to a large Filter, because making incremental changes by GETting, changing, and PUTting the entire <filter/> XML document causes a needless exchange of many megabytes of data.  Instead, large Filters can be incrementally augmented by sending batches of up to 5000 <rule/> documents wrapped inside a <rules/> document to the rules/ endpoint.  For example, to add 100,000 rules to a Filter that already has 100,000 rules, send 20 <rules/> documents containing 5000 rules each, rather than sending 100,000 individual <rule/> documents to the rules/ endpoint (which could take days) or sending a PUT of the Filter containing 200,000 rules (which unnecessarily retransmits the existing rules).
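
The batching described above could be sketched like this: a generator that splits a list of (type, value) pairs into <rules/> documents of at most 5000 rules each, ready to be POSTed one at a time to the rules/ endpoint. The names are hypothetical, and sending the documents is left out.

```python
import xml.etree.ElementTree as ET

MAX_RULES_PER_BATCH = 5000  # batch limit stated in the text above


def rules_batches(rules, batch_size=MAX_RULES_PER_BATCH):
    """Yield <rules/> documents, each wrapping at most `batch_size` rules."""
    for start in range(0, len(rules), batch_size):
        root = ET.Element("rules")
        for rule_type, value in rules[start:start + batch_size]:
            ET.SubElement(root, "rule", {"type": rule_type, "value": value})
        yield ET.tostring(root, encoding="unicode")
```

Since the server applies batches asynchronously, a caller would still need to verify afterwards that the rules were added, as described for the POST response codes.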

Publishers

A Publisher is a system-wide unique ID that identifies an activity stream. A current list of Publishers in the Gnip system can be found here. Each Publisher is owned by a single account, and only that account can manipulate the Publisher definition and publish activities to its activity stream.


https://prod.gnipcentral.com/publishers.xml
 

GET:
    Returns a Publisher XML document listing the current Publishers in the system.
    Response Content-Type: application/xml

POST:
Accepts a Publisher XML document and creates a new publisher using that information.
Request Content-Type: application/xml
Response Content-Type: application/xml

Publisher XML

Publisher definitions are represented as XML documents similar to this.


<publisher name="digg">
    <supportedRuleTypes>
        <type>actor</type>
    </supportedRuleTypes>
</publisher>

  Publisher XML documents conform to the following schema. Full Gnip XML schema can be found here.


    <xs:element name="publisher">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="supportedRuleTypes" minOccurs="1" maxOccurs="1">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="type" type="ruleType" minOccurs="1" maxOccurs="unbounded"/>                          
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>                               
            <xs:attribute name="name" type="uriSafeType" use="required"/>
        </xs:complexType>
    </xs:element>

Push Recipients

Filters can be subscribed to in such a way that any time activities are published that match the Filter criteria, those activities are pushed, via an HTTP POST, to a configurable URL. The URL to which activities will be pushed is specified by the postUrl of the Filter XML. Any activities added to a Filter which has a <postUrl/>, will automatically be POSTed to that URL.  The body of these POSTs will be an Activities XML document (described above in the Activity Buckets section).  Responses to these requests are ignored.


postUrl of a Filter


http://mysite.example/inbound-activity-handler.cgi

POST:

Pushes new activities, as an Activities XML document, to the Subscriber to this Filter.

Request Content-Type: application/xml

Activities XML

A set of activities is represented in an XML document that looks like this.

<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>
So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.
</body>

<raw>gzip'd, base64'd original activity meta-data</raw>

</payload>
</activity>
</activities>
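
A receiving endpoint for these POSTs might look like the WSGI sketch below. It answers HEAD requests successfully (Gnip validates a postUrl with a HEAD request) and parses POSTed Activities XML. All names are hypothetical, the in-memory list stands in for real storage, and simulate is only a helper for exercising the application without a web server.

```python
import io
import xml.etree.ElementTree as ET

received = []  # in-memory store, for illustration only


def push_app(environ, start_response):
    """Minimal WSGI application for a Filter's postUrl."""
    method = environ["REQUEST_METHOD"]
    empty = [("Content-Length", "0")]
    if method == "HEAD":
        start_response("200 OK", empty)
        return [b""]
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            start_response("400 Bad Request", empty)
            return [b""]
        received.extend(dict(node.attrib) for node in root.iter("activity"))
        start_response("200 OK", empty)
        return [b""]
    start_response("405 Method Not Allowed", [("Allow", "HEAD, POST")] + empty)
    return [b""]


def simulate(method, body=b""):
    """Call push_app with a minimal fake WSGI environ and return the status line."""
    statuses = []
    environ = {"REQUEST_METHOD": method,
               "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    push_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0]
```

Since responses to these POSTs are ignored, the status codes returned here mainly help with local debugging; a rejected POST will not be retried on that basis.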

Errors

If Gnip is unable to fulfill a request, it will return a response with the most appropriate response code and a human-readable message in the body.  If the request could not be fulfilled because there was something wrong with the request itself, a 4xx series response code will be used.  If the request was not fulfilled for any other reason, a 5xx series response code will be used.

Error XML


An error represented in XML looks like this:


<error>"foo_bar" is not a valid publisher name.  Only letter, number, '.', '+' and '-' characters are allowed in publisher names.</error>


Remember, depending on your HTTP client, an <error/> message may only be available as part of an error input stream in the event that the server returns an HTTP response with a non-200 response code.  For example, in the case of Java's java.net.HttpURLConnection, the <error/> message would be available via the getErrorStream() method.
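
The same point applies to Python's standard urllib, which raises an exception for 4xx and 5xx responses; the response body is then read from the exception object. The sketch below shows one way to surface the <error/> message. The function names are hypothetical, and authentication is omitted.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def error_message(body: bytes) -> str:
    """Extract the text of an <error/> document, falling back to the raw body."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return body.decode("utf-8", errors="replace")
    if root.tag == "error":
        return (root.text or "").strip()
    return body.decode("utf-8", errors="replace")


def fetch(url: str) -> bytes:
    """GET a URL, converting HTTP error responses into a readable exception."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The error body is available on the exception, not on a normal response.
        raise RuntimeError(f"{exc.code}: {error_message(exc.read())}") from exc
```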


Revision History

  • rev 5 - added documentation for batch and search support in the rules/ endpoint.  Added documentation to the Activities section of the samples.  Rearranged some links and made some XML formatting changes.
  • rev 4 - updated for publisher capabilities, added paragraph on authentication for activity push, remove references to JIDs, document cleanup
  • rev 3 - updating publisher activity endpoint to reflect the correct one.
  • rev 2 - added verbiage to further clarify filter management.
  • rev 1 - moved bucket-ID API from five minute buckets to single minute buckets
  • rev 0 - initial cut of the 2.0 document