1 of 17

Wikipedia Tools for Google Spreadsheets

Thomas Steiner[Google]

WWW 2016, April 12, 2016, Montreal, Canada

2 of 17

Introduction

Proprietary + Confidential

3 of 17

Spreadsheet applications

  • Serve for the organization, analysis, and storage of data in tabular form.
  • Operate on data represented as cells of an array, organized in rows and columns.
  • Cells can contain numeric or textual data, or the results of formulas that automatically calculate and display a value based on the contents of other cells.

4 of 17

Wikipedia and Wikidata APIs

  • Wikipedia’s content and data is available through the Wikipedia API (https://{language}.wikipedia.org/w/api.php), where {language} represents one of the currently 291 supported Wikipedia languages.
  • Wikidata’s content and data is available through the Wikidata API (https://www.wikidata.org/w/api.php).
  • Wikipedia pageviews data can be obtained using the Pageviews API (https://wikimedia.org/api/rest v1/?doc).

5 of 17

Extending Google Spreadsheets

  • Google Spreadsheets is part of a free, Web-based software office suite offered by Google within its Google Docs service.
  • Google Spreadsheets can be extended with custom functions (or formulas) using Google Apps Scripts that are written in standard JavaScript.
  • Custom functions can access external resources on the Web by fetching URLs with the UrlFetchApp, one of the scripting services available in Google Apps Scripts.

6 of 17

Interacting with the Wikipedia API

7 of 17

Wikipedia Tools Custom Formulas

  • WIKITRANSLATE Returns Wikipedia translations (language links) for a Wikipedia article.
  • WIKISYNONYMS Returns Wikipedia synonyms (redirects) for a Wikipedia article.
  • WIKIEXPAND Returns Wikipedia translations (language links) and synonyms (redirects) for a Wikipedia article.
  • WIKICATEGORYMEMBERS Returns Wikipedia category members for a Wikipedia category.
  • WIKISUBCATEGORIES Returns Wikipedia subcategories for a Wikipedia category.
  • WIKIINBOUNDLINKS Returns Wikipedia inbound links for a Wikipedia article.
  • WIKIOUTBOUNDLINKS Returns Wikipedia outbound links for a Wikipedia article.
  • WIKIMUTUALLINKS Returns Wikipedia mutual links, i.e, the intersection of inbound and outbound links for a Wikipedia article.
  • WIKIGEOCOORDINATES Returns Wikipedia geocoordinates for a Wikipedia article.
  • WIKIDATAFACTS Returns Wikidata facts for a Wikipedia article.
  • WIKIDATAQID Returns the Wikidata qid of the corresponding Wikidata item for an article.
  • WIKIPAGEVIEWS Returns Wikipedia pageviews statistics for a Wikipedia article.
  • WIKIPAGEEDITS Returns Wikipedia pageedits statistics for a Wikipedia article.
  • WIKICOMMONSLINK Returns the Wikimedia Commons link for a file.
  • WIKICATEGORIES Returns Wikipedia categories for a Wikipedia article.
  • WIKIARTICLESAROUND Returns Wikipedia articles around a Wikipedia article or around a point.
  • WIKISEARCH Returns Wikipedia article results for a query.
  • WIKIQUARRY Returns the output of the Quarry (https://meta.wikimedia.org/wiki/Research:Quarry) query with the specified query ID.

Functions all start with a WIKI* prefix and in true spreadsheets manner are all uppercase.

8 of 17

Internals of the custom functions

  • Most functions simply directly wrap the corresponding API call.
  • WIKISYNONYMS and WIKITRANSLATE can be combined as WIKIEXPAND and each support language filters.
  • The convenience function WIKIMUTUALLINKS is the intersection of WIKIINBOUNDLINKS and WIKIOUTBOUNDLINKS.
  • WIKIDATAFACTS uses Maxime Lathuilière’s simplifyClaims function from his Wikidata SDK.
    • Single-value facts (e.g., Berlin’s ISO 3166-2 code) vs. multi-value facts (e.g., Berlin’s head of government).
    • WIKIDATAFACTS can return just single-value facts, only the first fact of multi-value facts, or all multi-value facts.

9 of 17

Wikipedia Tools Samples Sheet

10 of 17

Usage Scenarios

Proprietary + Confidential

11 of 17

Usage Scenario I: Ordered Category Panel

  • Wikipedia holds an enormous amount of categories, for example, visitor attractions in Montreal.
  • Category members obtained through a call of WIKICATEGORYMEMBERS are listed in alphabetical order.
  • If we additionally request pageviews data for each category member through a series of WIKIPAGEVIEWS calls and then sort by pageviews in descending order, we get a representative list of top-10 visitor attractions.
  • Can be enriched with photos retrieved through calls of WIKIDATAFACTS filtered on “image”.

12 of 17

Usage Scenario II: Search Ads

  • For hotel booking sites, it may be desirable to advertise based on points of interest (POIs) and create advertisements automatically featuring known facts of such POIs.
  • For example, skyscrapers listed in the category skyscrapers over 350 meter are first obtained via WIKICATEGORYMEMBERS and then checked for their “height” fact via WIKIDATAFACTS, which is then used in two templates to create ads.
  • Search keywords are generated by calling WIKISYNONYMS and combined with terms like “hotel”.

13 of 17

Usage Scenario III: Marketing Campaigns

  • On January 13, 2016, Google Maps added Street View imagery for the model railway Miniatur Wunderland.
  • Taking global Wikipedia pageviews as a popularity indicator, we can examine if the marketing campaign has had any impact on the attraction.
  • Therefore, we first obtain the Miniatur Wunderland article in all available languages via WIKITRANSLATE and then retrieve pageviews via WIKIPAGEVIEWS.

14 of 17

Usage Scenario IV: Obligatory US Elections Example

  • Get election candidates via WIKICATEGORYMEMBERS of the category United States presidential candidates, 2016.
  • Get the top-10 candidates based on pageviews via WIKIPAGEVIEWS over a customizable period of time.
  • “Super Saturday” peak on March 5, 2016 clearly visible.

15 of 17

Future Work and Conclusions

Proprietary + Confidential

16 of 17

Future Work and Conclusions

  • Add more functions as need be and potentially make the functions more parameterizable.
  • Advanced mode that allows experienced users to fine-tune the functions’ results, for example, to implicitly include bot traffic in WIKIPAGEVIEWS.�
  • Great for rapid prototyping of ideas around Wikipedia and Wikidata, solid uptake of the add-on.
  • Released as open-source with first pull requests merged and issues fixed.
  • Can encourage to add data to Wikipedia and Wikidata as a positive side effect.

17 of 17

Resources