Research Handbook

Beta version

By Juan Ortiz Freuler

Welcome to the Algorithms and AI Handbook and thank you for participating in this study. This guide introduces the assessment process, including information on the methodology and guidance on the sources you can use and how to cite them. This research is exploratory, so we welcome your feedback at any point in the implementation process. Please read this handbook in detail before starting your research.

Overview

This document acts as a handbook for researchers, and as a reference for those seeking to understand the data that has been made available through the study. It should be considered a living document while we define the best approach to the subject matter at hand. Your help and feedback are valuable in assisting us to test, verify and develop the research methodology for this project. You can send your feedback on the methodology and handbook to juan.ortiz@webfoundation.org and carlos.iglesias@webfoundation.org or comment directly on this online version.

The main aim of this handbook is to provide consistent referencing and to ensure a modular approach to the reporting, which will help us merge the findings, and facilitate the publication process.

Before you move forward, it is important that we agree on a common set of definitions for the terms statistical models, algorithms and AI. These definitions are by no means complete, nor should they be followed rigidly. They should be interpreted as a common ground from which to start these conversations. These concepts are relatively new, and are subject to contextual variations which are also relevant to this study. They are in fact all closely related, and are presented below in order from the least to the most complex inner workings.

Definitions:

Statistical Model: A type of mathematical model that is used to extract conclusions regarding a large population based on a smaller set of samples.

Algorithms: A set of encoded procedures, or a logical series of steps, for organising and acting on a body of data to quickly achieve a desired outcome.^[1]

Artificial Intelligence: Mechanisms that define a course of action to maximize the chance of achieving a given goal in an evolving environment.^[2] This includes techniques such as machine learning and deep learning.

RESEARCH METHODOLOGY

Given the exploratory nature of the research and the time constraints, we will take a two-pronged approach. On the one hand, we will specifically target three areas where we have reason to believe that interesting, and potentially problematic, uses of AI, algorithms and/or advanced statistical models are likely to be found (pensions, inflation and social benefits). We will target them by filing access to information petitions in the first week, and by identifying actors who might guide us through the available information on these topics. On the other hand, we will conduct an open exploration of the available information regarding the use of these methods by governments, and identify novel cases.

The purpose of this research is to:

Identify key public and private stakeholders in the field
Identify good and bad practices in terms of use of advanced data driven methods for decision-making by the public sector
Foster broad public debate on government use of statistical models, algorithms, and AI
Establish recommendations in terms of procurement processes that should be upheld by government units purchasing software with these capabilities

We have broken down the deliverables into four parts. Each set of deliverables will be reviewed by the project management and coordination team, who will provide feedback over email, and during regular calls. This will help keep an active feedback loop between all the researchers involved.

As a general note, we recommend that researchers consider gender in the design and implementation of all sub-instruments and collection tools. This implies collecting gender-disaggregated data where possible, ensuring a gender balance when arranging interviews, and making sure that any other instrument used for data collection is appropriate to capture the voices of men, women, and other gender identities given the country/region and topic in question.

WEEK 1. Scoping the terrain

Partial deadline: Feb 16

Deliverable: Tables I & II and the filed access to information petitions

Given this is exploratory research, it is important to first conduct a broad yet systemized search to identify cases, potential interviewees, and existing research.

Though the research will eventually narrow down to 2 countries, we will first conduct an exploration of 4 countries, to ensure we have an interesting set of cases before we decide which countries to focus on.

The preliminary step is for you to choose 4 countries. Things you might want to take into consideration when making this decision:

A. Where would you be interested in carrying out research on these topics?

B. Which countries do you think are most likely to be leveraging some of these methods?

C. In which countries do you have a set of contacts that might help you execute the research most effectively?

1. Access to information petitions (first set - a second set will be sent further on to complement these, and to gather further information regarding practices that might surface through the scoping exercise)

We have identified five areas where there is likely to be some interesting use of these data-driven methods. Given the limited timeline, we suggest you file these access to information petitions during the first days. The format is usually informal, though the means through which to submit and retrieve government information might vary from country to country. We are happy to provide assistance and contacts to facilitate this process.

Draft questions for access to information petitions on five separate topics are available (Spanish / English)

For each of the 4 countries:

2. Identify key actors and sources from:

Government
Media
Academia and Civil Society (will be explored through point 3)

A. Create a table with the names and URLs of each of the key Ministries and oversight institutions. Suggestion: search for “Ministries” + [Name of target country] + Wikipedia and you will likely find a list of Ministries. Repeat for autonomous governmental bodies, such as Supreme Audit Institutions, Regulatory Agencies, Ombuds, and other decentralized units of government.

Example Table I- URLs

Type	Name	URL
Government	Ministry of Health	https://www.argentina.gob.ar/salud
Government	Ministry of Innovation	https://www.argentina.gob.ar/ciencia
Media	Perfil	http://perfil.com/

B. Create a string of websites for each type of website (government or media), by introducing an OR between each URL. This will be used in step C. A sample spreadsheet is available here to simplify the process. It includes the formulas to automatically introduce the ORs between the URLs. Create a copy of the spreadsheet so the original is available to other researchers.
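For researchers who prefer a script to the spreadsheet, the step above can be sketched in Python as follows. This is only an illustration: the function name `site_string` is hypothetical, and prefixing each entry with the `site:` operator is an assumption based on the sample query URLs in Table II, not a description of the spreadsheet's formulas.

```python
from urllib.parse import urlparse


def site_string(urls):
    """Join a list of URLs into a single 'site:a OR site:b' search string."""
    terms = []
    for url in urls:
        parsed = urlparse(url)
        # Keep host plus path so that sections of a shared portal
        # (e.g. argentina.gob.ar/salud) remain separate entries.
        target = (parsed.netloc + parsed.path).rstrip("/")
        term = f"site:{target}"
        if target and term not in terms:
            terms.append(term)
    return " OR ".join(terms)


# URLs taken from Example Table I.
government_urls = [
    "https://www.argentina.gob.ar/salud",
    "https://www.argentina.gob.ar/ciencia",
]
media_urls = ["http://perfil.com/"]

government_sites = site_string(government_urls)
media_sites = site_string(media_urls)
print(government_sites)
print(media_sites)
```

One string is built per type of website, so the government and media strings can be pasted into the search queries separately.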

C. Run 6 separate queries on https://www.google.com/advanced_search

In the local language, we will search for:

i) Algorithms (and associated terms such as algorithmically)

ii) “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)

iii) “Statistical model”

We will do so in two stages:

Using the string of:

i) Government websites

Search for:

i) Algorithms (and associated terms such as algorithmically)

ii) “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)

iii) “Statistical model”

Using the string of:

ii) Media websites

Search for:

i) (Government OR Ministry OR State) AND [Name of country] AND Algorithms (and associated terms such as algorithmically)

ii) (Government OR Ministry OR State) AND “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)

iii) (Government OR Ministry OR State) AND “Statistical model”

Search settings: Run each query twice, once limiting results to PDF documents and once without any file type restriction.
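The query combinations described in step C can also be generated programmatically. The sketch below is a rough illustration under several assumptions: the site strings and English keywords are placeholders (the searches should be run in the local language), `build_query` and `build_rows` are hypothetical helper names, the country name is added to every media query although the list above includes it only in the first, and the parentheses are used for grouping and should be checked in the browser. The number of results still has to be read off the results page by hand.

```python
from urllib.parse import urlencode

# Placeholder inputs: replace with the strings built from Table I and
# with keywords translated into the local language.
SITE_STRINGS = {
    "Government": "site:www.argentina.gob.ar/salud OR site:www.argentina.gob.ar/ciencia",
    "Media": "site:perfil.com",
}
KEYWORDS = [
    "algorithms OR algorithmically",
    '"artificial intelligence" OR "machine learning" OR "deep learning"',
    '"statistical model"',
]
COUNTRY = "Argentina"  # stands in for [Name of country]


def build_query(site_type, keywords, pdf_only):
    """Assemble one search query for a site type, keyword group and file setting."""
    parts = []
    if site_type == "Media":
        # Media searches are narrowed to coverage that mentions the government.
        parts.append(f"(Government OR Ministry OR State) AND {COUNTRY} AND")
    parts.append(f"({keywords})")
    if pdf_only:
        parts.append("filetype:pdf")
    parts.append(f"({SITE_STRINGS[site_type]})")
    return " ".join(parts)


def build_rows():
    """Return (type, keywords, PDF/ALL, URL) tuples, ready to paste into Table II."""
    rows = []
    for site_type in SITE_STRINGS:
        for keywords in KEYWORDS:
            for pdf_only in (True, False):
                query = build_query(site_type, keywords, pdf_only)
                url = "https://www.google.com/search?" + urlencode({"q": query})
                rows.append((site_type, keywords, "PDF" if pdf_only else "ALL", url))
    return rows


rows = build_rows()
for row in rows:
    print("\t".join(row))
```

Six queries run under two search settings give twelve URLs per country.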

D. Create a new table with aggregate results and corresponding hyperlinks to the queries:

Example Table II - Metadata:

Type	Query keyword	Pdf/All	URL (after search)	Number of results
Government	Algorithms OR Algorithmically	PDF	https://www.google.com.ar/search?q=%22algoritmo%22+%2B+%22filetype%3Apdf+site%3Ainadi.gob.ar+OR	38
Government	Algorithms OR Algorithmically	ALL	https://www.google.com.ar/search?q=%22algoritmo%22+%2B+%22filetype%3Apdf+site%3Ainadi.gob.ar+OR	44
Government	"Artificial intelligence" OR "machine learning" OR

3. Existing Research

Using Google Scholar: https://scholar.google.com

Search settings:

Dates: 2013-2018
No citations or patents

Search for:

i) Government AND [Name of country] AND Algorithms (and associated terms such as algorithmically)

ii) Government AND [Name of country] AND “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)

iii) Government AND [Name of country] AND “Statistical model”

Add the results to Table II.
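The three Google Scholar searches can be prepared in the same way. The following is a minimal sketch, assuming that the `as_ylo` and `as_yhi` URL parameters set the date range; this should be verified in the browser, and the citations and patents options should still be unticked manually on the results page. The helper name `scholar_url` and the English keywords are placeholders.

```python
from urllib.parse import urlencode

# Placeholder keywords: translate into the local language where relevant.
SCHOLAR_KEYWORDS = [
    "algorithms OR algorithmically",
    '"artificial intelligence" OR "machine learning" OR "deep learning"',
    '"statistical model"',
]


def scholar_url(country, keywords, year_from=2013, year_to=2018):
    """Build a Google Scholar search URL limited to the given date range."""
    query = f"Government AND {country} AND ({keywords})"
    # Assumption: as_ylo / as_yhi carry the lower and upper year limits.
    params = {"q": query, "as_ylo": year_from, "as_yhi": year_to}
    return "https://scholar.google.com/scholar?" + urlencode(params)


urls = [scholar_url("Argentina", keywords) for keywords in SCHOLAR_KEYWORDS]
for url in urls:
    print(url)
```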

WEEK 2. Qualitative analysis of information retrieved and narrowing focus

Partial deadline: Feb 23

Deliverable: Table III

i) Narrowing the focus through desk research

Sift through the results retrieved by each query. Create a new table to systemize these findings and try to identify cases for further inquiry, and potential interviewees.

The following table is a placeholder. You are free to adapt it in ways that might best serve your interests and needs. The purpose of this phase is to narrow in on 5 cases per country, and eventually define the Y number of countries on which the in-depth research will be conducted. To do so, we want to detect those cases and countries for which the research seems more promising and interesting.

Table III - Systemizing relevant cases and contacts (8 per country)

Country	Ministry	Description	Links	Type (Algorithm, Stat Model, AI)	Promising (High, Medium, Low)	Potential interviewee Name	Gender (M/F)	Contact details	Contacted (Y/N)	Interview date	Conducted (Y/N)
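If you keep Table III in a script rather than a spreadsheet, a structure along the following lines may be useful. It is a sketch only: the column names are shortened versions of the headings above, the sample rows are placeholders rather than findings, and the helper names `shortlist` and `to_tsv` are hypothetical.

```python
import csv
import io

# Shortened versions of the Table III headings.
COLUMNS = [
    "Country", "Ministry", "Description", "Links", "Type", "Promising",
    "Interviewee", "Gender", "Contact", "Contacted", "Interview date", "Conducted",
]
PRIORITY = {"High": 0, "Medium": 1, "Low": 2}


def shortlist(rows, per_country=5):
    """Keep the most promising cases for each country, up to per_country."""
    selected = {}
    for row in sorted(rows, key=lambda r: PRIORITY[r["Promising"]]):
        cases = selected.setdefault(row["Country"], [])
        if len(cases) < per_country:
            cases.append(row)
    return selected


def to_tsv(rows):
    """Serialise rows as tab-separated text; missing cells are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows for illustration only.
rows = [
    {"Country": "Country A", "Ministry": "Ministry X", "Type": "Algorithm", "Promising": "Low"},
    {"Country": "Country A", "Ministry": "Ministry Y", "Type": "AI", "Promising": "High"},
]

print(to_tsv(rows))
top_case = shortlist(rows, per_country=1)["Country A"][0]
print(top_case["Ministry"])
```

Sorting by the "Promising" rating before truncating keeps the highest-rated cases when a country has more candidates than the target number.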

ii) Interviews to get a more in-depth perspective

Several components can be answered by drawing on online sources, published materials and desk research. For other questions, interviews with data experts, academics, NGOs, journalists or, preferably, the government officials who use the technology will be best placed to provide the relevant insights. When conducting an interview, you must explain clearly to these sources:

That you are undertaking research for the World Wide Web Foundation, a multi-country study of statistical models, algorithms, and AI for policy design, implementation, and evaluation;
That any responses they give may be placed in a public open dataset, and that such dataset will be shared openly at the website in the following months;
That they are under no obligation to respond to your questions, and can withdraw from the interview at any time.

You should keep a clear written record, ideally with an e-mail trail where interviews or consultations were arranged by e-mail, to show that you have asked each source to consent to the use of their responses.

Content of the interview: We suggest you conduct a semi-structured interview, with a set of pre-established questions, while allowing the interviewee to digress as she or he pleases.

You can find a draft template that might be useful here

We expect at least three interviews to be conducted per country, with a balance in gender and stakeholders (government, academia, civil society, and media). Industry actors might also be considered, if there is reason to believe they have valuable insights.
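A quick check of the interview list against this expectation can be sketched as follows. The handbook does not define "balance" precisely, so the rule used here (flagging a country when all interviewees share one gender or one stakeholder group) is an assumption, as are the field names and the helper name `interview_warnings`. Countries with no interviews recorded will not appear in the output and need to be checked separately.

```python
from collections import defaultdict


def interview_warnings(interviews, minimum=3):
    """Return human-readable warnings for countries that miss the targets."""
    by_country = defaultdict(list)
    for item in interviews:
        by_country[item["country"]].append(item)

    warnings = []
    for country, items in sorted(by_country.items()):
        if len(items) < minimum:
            warnings.append(
                f"{country}: only {len(items)} interview(s), expected at least {minimum}"
            )
        # Assumed balance rule: at least two genders and two stakeholder groups.
        if len({i["gender"] for i in items}) < 2:
            warnings.append(f"{country}: all interviewees share one gender")
        if len({i["stakeholder"] for i in items}) < 2:
            warnings.append(f"{country}: all interviewees come from one stakeholder group")
    return warnings


# Placeholder records for illustration only.
interviews = [
    {"country": "Country A", "gender": "F", "stakeholder": "government"},
    {"country": "Country A", "gender": "M", "stakeholder": "academia"},
    {"country": "Country A", "gender": "F", "stakeholder": "media"},
    {"country": "Country B", "gender": "M", "stakeholder": "government"},
]

warnings = interview_warnings(interviews)
for warning in warnings:
    print(warning)
```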

WEEK 3. Narrowing in on the selected cases

Partial deadline: March 2

Deliverables:

Table III + interview notes
Table IV
Second set of access to information petitions submitted (note that the final report is due March 11; please bear in mind that governments often take up to 30 days to reply)
Datasets retrieved

In this part we will narrow in on the selected cases and carry out an in-depth assessment of the statistical models, algorithms and AI being deployed. This will combine desk research with the findings from the interviews. At this stage, the researcher is expected to file access to information petitions to obtain the information that is lacking. When the information is in fact available, the access to information petition should be used to check that the sources are not outdated, and to request any other key piece of information that might help complete the final report.

Table IV - See model table here

WEEKS 4 & 5. Consolidating the findings and writing the report

Partial deadline: March 16

Deliverables:

Replies to access to information petitions
Country report for each country

The report should not exceed 3 pages per country.

In line with the general objectives of the project (check Handbook introduction), the report should include:

i) General introduction to the region (including general development statistics, access to the internet, open data)^[3] (maximum 1 page)

ii) For each country:

Overview of the country’s position in regional terms (drawing on the general introduction), and overview of the findings.

For each of the five chosen statistical models/Algorithms/AI:

A brief description of the findings consolidated into Table IV
An assessment regarding the issues of public interest regarding the findings

iii) A general conclusion and set of recommendations for the region

iv) Bibliography and Annex should be consolidated for all countries, using appropriate headings.

The bibliography should include links to the relevant public documents and relevant reports.

The annex should include each of the tables (I-IV)

WEEK 6. Peer Review and adjustments

Deadline: March 23

Web Foundation team members and other researchers may ask you to provide feedback, with brief justifications for each of your comments and evidence to support them.

You may be requested to review the documents presented by researchers from other regions, and detect any anomalies that may need further review.

WEEK 7. Final edits, consolidation and publication

Deadline: March 30

Web Foundation will consolidate the reports and coordinate their publication with the designers and the internal communications team.

[1] For more information please read http://webfoundation.org/docs/2017/07/WF_Algorithms.pdf

[2] For more information see http://webfoundation.org/docs/2017/07/AI_Report_WF.pdf

[3] Sources to consider:

https://publicadministration.un.org/egovkb/Data-Center
http://opendatabarometer.org/4thedition/data/
https://publicadministration.un.org/egovkb/en-us/Reports/UN-E-Government-Survey-2016