This document is divided into three sections:
Before you can do anything with the API you'll need an account. Get one here.
The Gnip API provides two major components: change notifications for activities (events), and the full content associated with those activities. Examples of activities include a user "notice" (twitter), a user "dugg" (digg), and a user creating a blog post. Activities can be user-generated or machine-generated.
There are two primary roles that API users fall into: Publishers and Subscribers. You may be one, the other, or both depending on your situation. Publishers push data into the system; here's a Publisher example. Subscribers consume data from the system; here's a Subscriber example.
Activities are published into activity streams that are related to a Publisher. Gnip API Subscribers can subscribe to activity streams as a way to be notified of activities of interest. Activity streams come in two flavors: Public Timelines and Filters. An activity stream is composed of change notifications and can optionally contain full data. Full data provides the raw, original, untouched (though encoded) activity information used to create the Activity XML. To protect the integrity of Gnip XML, this original data is gzip'd and base64 encoded in the <raw> element. Subscribers who are interested in this element are responsible for base64-decoding and gunzip'ing it.
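As a rough illustration of the decoding step described above, the following Python sketch base64-decodes and then gunzips the text of a <raw> element. The function name decode_raw and the sample value are hypothetical, and the sketch assumes the original data is UTF-8 text, which this document does not state.

```python
import base64
import gzip


def decode_raw(raw_text: str) -> str:
    """Reverse the <raw> encoding: base64-decode first, then gunzip."""
    compressed = base64.b64decode(raw_text.strip())
    # Assumption: the original activity data is UTF-8 text.
    return gzip.decompress(compressed).decode("utf-8")


# Round trip with made-up data standing in for a real <raw> value.
sample = base64.b64encode(gzip.compress(b"<original>data</original>")).decode("ascii")
print(decode_raw(sample))
```

The order matters: base64 is the outer layer, so it has to be removed before the gzip layer can be read.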
A Public Timeline is the stream of all the activities from a given Publisher. Note that not all Publishers support Public Timelines, and Public Timelines do not contain full data. For example, the Public Timeline of the "twitter" Publisher is the stream of all public tweet notifications published via twitter. Each Publisher has a system-wide unique ID that is used to identify it. Activities may be retrieved by any Subscriber, but only the Publisher may add activities to the stream.
Public Timelines can be "polled" (HTTP GET) from Gnip.
A Filter is a stream containing all the activities that meet Subscriber-defined criteria. For example, Filters allow Subscribers to create activity streams containing just the activities from the user names they're interested in, or all activities from Publisher "foo" that are tagged with a given string. The Subscriber who creates a Filter is the only one who has access to it. Filter activities have the option to contain full data.
Filters can be "polled" (HTTP GET) from Gnip, or "pushed" (HTTP POST) to Subscribers who provide a POST URL endpoint during Filter creation.
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>
Sample of a full-data Activity XML (.../activity/...)
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>
Polling a Publisher's Public Timeline requires prior knowledge of the Publisher ID (a current list of Publisher IDs can be found here -- requires a Gnip account). To retrieve recent activities for a given Publisher (e.g. "digg"), construct a URL for the Publisher of interest and perform an HTTP GET to retrieve the application/xml representation of that activity stream; for details go here. For a Publisher's Public Timeline, you can only access activities without full data, i.e. "notifications." If you want full data, you need to create a Filter and access it that way; for an example, go here.
===>
GET /publishers/digg/notification/current.xml
Accept: application/xml
<---
200 OK
Content-Type: application/xml
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>
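The exchange above could be handled on the client side roughly as follows. This is only a sketch: BASE_URL is a placeholder host (this document does not give the API's base URL), the function names are hypothetical, and the response body is a shortened copy of the sample rather than live data.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder host; the real base URL is not given in this document.
BASE_URL = "https://gnip.example"


def notification_url(publisher: str, bucket: str = "current") -> str:
    """Build the Public Timeline URL for a Publisher and bucket."""
    return f"{BASE_URL}/publishers/{publisher}/notification/{bucket}.xml"


def parse_activities(xml_text: str) -> list:
    """Return each <activity> element's attributes as a dictionary."""
    root = ET.fromstring(xml_text)
    return [dict(activity.attrib) for activity in root.findall("activity")]


# Shortened copy of the sample response body shown above.
response_body = """<activities publisher="digg">
<activity source="web" action="comment" actor="cryosteel"
          at="2008-09-19T16:20:22.000-04:00"></activity>
</activities>"""

print(notification_url("digg"))
for activity in parse_activities(response_body):
    print(activity["actor"], activity["action"], activity["at"])
```

In a real client the body would come from an HTTP GET to the constructed URL with an Accept: application/xml header.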
If you are interested in a point in the past relative to now, you will need to determine the difference between your local clock and Gnip’s clock, to minimize clock drift as a variable; for details go here.
===>
HEAD /
<---
200 OK
Date: Mon, 09 Jun 2008 09:08:43 GMT
The returned Date header field can be used to calculate the difference between Gnip’s clock and the local one. Once that is known, the activity bucket ID for the time of interest can be calculated. Once the activity bucket ID is known, its URL can be constructed using the same format as above, replacing "current" with an activity bucket ID (e.g. 200806090910), and the data can be retrieved. See the "Activity Buckets" section below.
===>
GET /publishers/digg/notification/200806090910.xml
Accept: application/xml
<---
200 OK
Content-Type: application/xml
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
</activity>
</activities>
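The clock-offset and bucket-ID calculation described above might be sketched like this. The function names and all sample times are made up. The sketch assumes the Date header arrives in the standard HTTP date format and that bucket IDs have minute resolution, following the 12-digit example ID 200806090910.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def clock_offset(date_header: str, local_now: datetime) -> timedelta:
    """Return Gnip's clock minus the local clock."""
    server_now = parsedate_to_datetime(date_header)
    return server_now - local_now


def bucket_id(moment: datetime) -> str:
    """Format a time, converted to UTC, as a minute-resolution bucket ID."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


# Made-up values: the local clock is 17 seconds behind the server.
local_now = datetime(2008, 6, 9, 9, 8, 26, tzinfo=timezone.utc)
offset = clock_offset("Mon, 09 Jun 2008 09:08:43 GMT", local_now)

# Correct the local time of interest to Gnip's clock before formatting.
local_time_of_interest = datetime(2008, 6, 9, 9, 10, 5, tzinfo=timezone.utc)
print(offset.total_seconds())
print(bucket_id(local_time_of_interest + offset))
```

The resulting ID takes the place of "current" in the notification URL.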
To create a Filter you need one or more "rules." Each Publisher has a set of supported rule types, which constrains the types of rules that can be specified for a Filter on that Publisher. For example, if a Publisher supports the rule types "Actor" and "To", a Filter can contain rules using those types but cannot contain rules of type "Regarding", "Tag", or "Source". If a Filter is created with a rule type that a Publisher does not support, the server will return a response containing an error message and an error status code. Here is a list of Publishers.
Send an HTTP POST request to create a Filter. For details go here.
===>
POST /publishers/digg/filters.xml
Accept: application/xml
Content-Type: application/xml
<filter name="example" fullData="true">
<rule type="actor" value="joe"/>
<rule type="actor" value="jane"/>
</filter>
<---
200 OK
Content-Type: application/xml
<result>Success</result>
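A request body like the one above can be generated rather than written by hand. The sketch below uses Python's standard xml.etree.ElementTree; the helper name build_filter_xml and its parameters are hypothetical, and only the elements and attributes that appear in this document's samples are produced.

```python
import xml.etree.ElementTree as ET


def build_filter_xml(name, rules, full_data=True, post_url=None):
    """Serialize a Filter definition; rules is a list of (type, value) pairs."""
    filter_el = ET.Element("filter", name=name,
                           fullData="true" if full_data else "false")
    if post_url is not None:
        # Optional: only needed when activities should be pushed via HTTP POST.
        ET.SubElement(filter_el, "postUrl").text = post_url
    for rule_type, value in rules:
        ET.SubElement(filter_el, "rule", type=rule_type, value=value)
    return ET.tostring(filter_el, encoding="unicode")


filter_xml = build_filter_xml("example", [("actor", "joe"), ("actor", "jane")])
print(filter_xml)
```

The returned string would be sent as the body of the POST to /publishers/digg/filters.xml with Content-Type: application/xml.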
Once the Filter is created, accessing the full-data activities associated with it is similar to accessing a Publisher's notifications.
===>
GET /publishers/digg/filters/example/activity/current.xml
Accept: application/xml
<---
200 OK
Content-Type: application/xml
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>
Create a Filter as in the previous example and in addition specify a postUrl to which activities should be HTTP POSTed. Gnip validates the URL by making a HEAD request to it, so ensure that the postUrl responds successfully to an HTTP HEAD request. For details go here.
===>
POST /publishers/digg/filters.xml
Accept: application/xml
Content-Type: application/xml
<filter name="example" fullData="true">
<postUrl>http://mysite.example/inbound-activity-handler.cgi</postUrl>
<rule type="actor" value="joe"/>
<rule type="actor" value="jane"/>
</filter>
<---
200 OK
Content-Type: application/xml
<result>Success</result>
Once the Filter is created, any activity that matches rules in the Filter will be POSTed to the specified URL; see below for an example HTTP exchange that occurs when sending activities to a postUrl. Gnip activity POSTing is currently fire-and-forget and does not interpret HTTP response codes from the POST.
===>
POST http://mysite.example/inbound-activity-handler.cgi
Content-Type: application/xml
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>
<---
200 OK
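On the receiving side, a postUrl endpoint has to answer both the validating HEAD request and the activity POSTs. The following is a minimal sketch using Python's standard http.server, exercised against itself on a local port. The class name, the in-memory received list, and the UTF-8 and Content-Length assumptions are all illustrative and are not part of the Gnip API.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # bodies of POSTed Activities documents


class ActivityHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Gnip validates a postUrl with a HEAD request, so answer it with 200.
        self.send_response(200)
        self.end_headers()

    def do_POST(self):
        # Assumption: the sender supplies Content-Length and a UTF-8 body.
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


# Local demo: bind to a free port and exercise both methods once.
server = HTTPServer(("127.0.0.1", 0), ActivityHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/inbound-activity-handler.cgi")
head_status = conn.getresponse().status
conn.close()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/inbound-activity-handler.cgi",
             body='<activities publisher="digg"></activities>',
             headers={"Content-Type": "application/xml"})
post_status = conn.getresponse().status
conn.close()

server.shutdown()
server.server_close()
print(head_status, post_status, received)
```

Because the POSTs are fire-and-forget, a production handler would likely store the body quickly and do any heavier processing elsewhere.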
If you are a Publisher interested in publishing Activities to an Activity Stream, POST application/xml to a previously created Publisher like this. For details go here.
===>
POST /publishers/PUBLISHER-NAME/activity.xml
Accept: application/xml
Content-Type: application/xml
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>
<---
200 OK
Content-Type: application/xml
<result>Success</result>
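A Publisher could assemble the request body above along these lines. This is a sketch under assumptions: build_activities_xml is a hypothetical helper, the sample values are invented, and it assumes the Publisher itself gzips and base64-encodes the original data before placing it in <raw>, as the sample's placeholder text suggests.

```python
import base64
import gzip
import xml.etree.ElementTree as ET


def build_activities_xml(publisher, attrs, body=None, raw=None):
    """Build an <activities> document containing a single <activity>."""
    root = ET.Element("activities", publisher=publisher)
    activity = ET.SubElement(root, "activity", attrs)
    if body is not None or raw is not None:
        payload = ET.SubElement(activity, "payload")
        if body is not None:
            ET.SubElement(payload, "body").text = body
        if raw is not None:
            # gzip first, then base64, so the bytes are safe inside XML.
            encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
            ET.SubElement(payload, "raw").text = encoded
    return ET.tostring(root, encoding="unicode")


doc = build_activities_xml(
    "digg",
    {"source": "web", "action": "comment", "actor": "joe",
     "at": "2008-09-19T16:20:22.000-04:00"},
    body="An example comment.",
    raw=b"original activity data",
)
print(doc)
```

Omitting both body and raw yields a notification-style activity with no <payload> element.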
Activity streams are collections of activities which may be retrieved and, in some situations, to which activities may be published. Publishers POST new activities to activity streams; they are the start of data flow through the system. Subscribers can retrieve activity data via activity buckets (see the Activity Buckets section below).
(Replace PUBLISHER-NAME with the name of an actual Publisher. The list of current Publishers can be found here.)
Activity streams are segmented into buckets based on time. Each bucket contains all the activities for the stream that are reported during the bucket’s duration. Each bucket has an ID that is the time stamp, in UTC, of the bucket's first minute. The bucket ID format is %Y%m%d%H%M (in strftime conversion-specification notation). For example, the bucket with ID 200806090913 would contain all activities reported between 9:13 am and 9:14 am UTC, inclusive, on June 9, 2008. Activity buckets are only valid for the last 60 minutes. The "current.xml" bucket holds the "current" activities, covering up to one minute.
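To make the bucket rules concrete, the sketch below parses a bucket ID back into the UTC minute it starts at and checks it against the 60-minute window. The function names are hypothetical. It assumes 12-digit, minute-resolution IDs as in the example 200806090913, and treats "the last 60 minutes" as a bucket start time no more than 60 minutes before now, which is an interpretation rather than a documented rule.

```python
from datetime import datetime, timedelta, timezone


def bucket_start(bucket_id: str) -> datetime:
    """Parse a 12-digit bucket ID into the UTC start of its minute."""
    return datetime.strptime(bucket_id, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)


def is_available(bucket_id: str, now: datetime) -> bool:
    """True if the bucket starts within the 60 minutes before `now`."""
    age = now - bucket_start(bucket_id)
    return timedelta(0) <= age <= timedelta(minutes=60)


# Made-up reference time for the demonstration.
now = datetime(2008, 6, 9, 9, 30, tzinfo=timezone.utc)
print(bucket_start("200806090913").isoformat())
print(is_available("200806090913", now))
print(is_available("200806090813", now))
```

In practice `now` should be the local time corrected by the offset from Gnip's clock, so that buckets near the edge of the window are not misjudged.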
A set of activities is represented in an XML document that looks like this. The full Gnip XML schema can be found here.
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="joe" at="2008-09-19T16:20:22.000-04:00">
<!-- <OPTIONAL> -->
<payload>
<body>So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
<!-- </OPTIONAL> -->
</activity>
</activities>
A Filter consists of rules that are used to match activities flowing through a Publisher's activity stream. Filters may be created, edited, and deleted at will by their owner. Filter names are unique across the Gnip system; two accounts cannot have a Filter with the same name. Each Filter is owned by a single account, and only that account can manipulate the Filter's definition or access the activity stream produced by it. Filter rules are OR'd together; there is no AND support.
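The OR semantics can be pictured with a small sketch. It is not Gnip's matching code: it assumes that a rule's type names the activity attribute to compare (for example "actor") and that a rule matches on exact equality, neither of which this document specifies.

```python
def matches(activity: dict, rules: list) -> bool:
    """Return True if ANY (type, value) rule equals the activity's attribute."""
    return any(activity.get(rule_type) == value for rule_type, value in rules)


rules = [("actor", "joe"), ("actor", "jane")]
print(matches({"actor": "joe", "action": "comment"}, rules))        # True
print(matches({"actor": "cryosteel", "action": "comment"}, rules))  # False
```

Since there is no AND support, a condition such as "actor joe AND tagged foo" cannot be expressed in a single Filter; under these assumptions a Subscriber would have to apply the second condition after receiving the activities.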
If a Filter specifies a postUrl, new activities that meet the collection criteria will be posted to the specified URL. POSTed activities include full data by default.
Filter definitions are represented as XML documents similar to this. Full Gnip XML schema can be found here.
A Publisher is a system-wide unique ID that identifies an activity stream. A current list of Publishers in the Gnip system can be found here. Each Publisher is owned by a single account, and only that account can manipulate the Publisher definition and publish activities to its activity stream.
Publisher definitions are represented as XML documents similar to this.
<publisher name="digg">
<supportedRuleTypes>
<type>actor</type>
</supportedRuleTypes>
</publisher>
Publisher XML documents conform to the following schema. The full Gnip XML schema can be found here.
<xs:element name="publisher">
<xs:complexType>
<xs:sequence>
<xs:element name="supportedRuleTypes" minOccurs="1" maxOccurs="1">
<xs:complexType>
<xs:sequence>
<xs:element name="type" type="ruleType" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="name" type="uriSafeType" use="required"/>
</xs:complexType>
</xs:element>
Filters can be subscribed to in such a way that any time activities are published that match the Filter criteria, those activities are pushed, via an HTTP POST, to a configurable URL. The URL to which activities will be pushed is specified by the postUrl element of the Filter XML. Any activities added to a Filter that has a <postUrl/> will automatically be POSTed to that URL. The body of these POSTs will be an Activities XML document (described above in the Activity Buckets section). Responses to these requests are ignored.
postUrl of Filter
POST:
Pushes new activities, as an Activities XML document, to the subscriber to this collection.
Request Content-Type: application/xml
A set of activities is represented in an XML document that looks like this.
<activities publisher="digg">
<activity source="web" regarding="http://services.digg.com/story/8571625" to="" url="http://services.digg.com/story/8538612/comment/18959806" action="comment" actor="cryosteel" at="2008-09-19T16:20:22.000-04:00">
<payload><body>
So in summary, we have two choices. Allow routine catastrophic boom and bust population cycles that make humanity and all life miserable because we "shouldn't judge others". Or, not behave like bacteria-with-morals inhabiting the confines of a petri dish, take advantage of what we'll learn from the human genome, and let our limited space go to the most promising half billion from our species until we manage to successfully conquer new environs.
</body>
<raw>gzip'd, base64'd original activity meta-data</raw>
</payload>
</activity>
</activities>
An error represented in XML looks like this:
<error>"foo_bar" is not a valid publisher name. Only letter, number, '.', '+' and '-' characters are allowed in publisher names.</error>
Revision History