By Juan Ortiz Freuler
Welcome to the Algorithms and AI Handbook, and thank you for participating in this study. This guide introduces the assessment process, including information on the methodology and guidance on the sources you can use and how to cite them. This research is exploratory, so we welcome your feedback at any point during implementation. You should read this handbook in detail before starting your research.
This document acts as a handbook for researchers and as a reference for those seeking to understand the data that has been made available through the study. It should be considered a living document as we define the best approach to the subject matter at hand. Your help and feedback are valuable in assisting us to test, verify and develop the research methodology for this project. You can send your feedback on the methodology and handbook to juan.ortiz@webfoundation.org and carlos.iglesias@webfoundation.org or comment directly on this online version.
The main aim of this handbook is to provide consistent referencing and to ensure a modular approach to the reporting, which will help us merge the findings, and facilitate the publication process.
Before you move forward, it is important that we discuss a common set of definitions for the terms statistical models, algorithms and AI. These definitions are by no means complete, nor should they be followed rigidly. They should be interpreted as a common ground from which to start these conversations. These concepts are relatively new and are subject to contextual variations, which are also relevant to this study. They are in fact all closely related, and are presented below in order from the least to the most complex inner workings.
Definitions:
Statistical Model: A type of mathematical model used to draw conclusions about a large population based on a smaller sample.
Algorithms: A set of encoded procedures, or a logical series of steps, for organising and acting on a body of data to quickly achieve a desired outcome.[1]
Artificial Intelligence: Mechanisms that define a course of action to maximize the chance of achieving a given goal in an evolving environment.[2] This includes techniques such as machine learning and deep learning.
Suggested readings:
New York City moves to establish algorithm-monitoring task force; Devin Coldewey - https://techcrunch.com/2017/12/12/new-york-city-moves-to-establish-algorithm-monitoring-task-force/
Bias in Criminal Risk Scores Is Mathematically Inevitable, Researchers Say; Julia Angwin and Jeff Larson - https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say
Brauneis, Robert and Goodman, Ellen P., Algorithmic Transparency for the Smart City (August 2, 2017). Yale Journal of Law & Technology, Forthcoming; GWU Law School Public Law Research Paper; GWU Legal Studies Research Paper. Available at SSRN: https://ssrn.com/abstract=3012499 or http://dx.doi.org/10.2139/ssrn.3012499
Doshi-Velez, Finale and Kortz, Mason and Budish, Ryan and Bavitz, Christopher and Gershman, Samuel J. and O'Brien, David and Shieber, Stuart and Waldo, Jim and Weinberger, David and Wood, Alexandra, Accountability of AI Under the Law: The Role of Explanation (November 3, 2017). Berkman Center Research Publication Forthcoming. Available at SSRN: https://ssrn.com/abstract=3064761 or http://dx.doi.org/10.2139/ssrn.3064761
Given the exploratory nature of the research and the time constraints, we will take a two-pronged approach. On the one hand, we will specifically target three areas where we have reason to believe there is a high chance of finding interesting, and potentially problematic, uses of AI, algorithms and/or advanced statistical models (pensions, inflation and social benefits). We will target them by filing access to information petitions in the first week and by identifying actors who might guide us through the available information on these topics. On the other hand, we will conduct an open exploration of the available information on governments' use of these methods and identify novel cases.
The purpose of this research is to
We have broken down the deliverables into four parts. Each set of deliverables will be reviewed by the project management and coordination team, who will provide feedback over email, and during regular calls. This will help keep an active feedback loop between all the researchers involved.
As a general note, we recommend that researchers consider gender in the design and implementation of all sub-instruments and collection tools. This implies collecting gender-disaggregated data where possible, ensuring a gender balance when arranging interviews, and making sure that any other instrument used for data collection is appropriate to capture the voices of men, women, and other gender identities given the country/region and topic in question.
Partial deadline: Feb 16
Deliverable: Tables I & II and the filed access to information petitions
Given this is exploratory research, it is important to first conduct a broad yet systemized search to identify cases, potential interviewees, and existing research.
Though the research will eventually narrow down to 2 countries, we will first conduct an exploration of 4 countries, to ensure we have an interesting set of cases before we decide which countries to focus on.
The preliminary step is for you to choose 4 countries. Things you might want to take into consideration when making this decision:
A. Where would you be interested in carrying out research on these topics?
B. Which countries do you think are most likely to be leveraging some of these methods?
C. In which countries do you have a set of contacts that might help you execute the research most effectively?
1. Access to information petitions (first set - a second set will be sent further on to complement these, and to gather further information regarding practices that might surface through the scoping exercise)
We have identified five areas where there is likely to be some interesting use of these data-driven methods. Given the limited timeline, we suggest you file these access to information petitions during the first days. The format is usually informal, though the means through which to submit and retrieve government information might vary from country to country. We are happy to provide assistance and contacts to facilitate this process.
Draft questions for access to information petitions on five separate topics are available (Spanish / English)
For each of the 4 countries:
2. Identify key actors and sources from:
A. Create a table with the names and URLs of each of the key Ministries and oversight institutions. Suggestion: Type “Ministries” + [Name of target country] + Wikipedia and it is likely that you will find a list of Ministries. Repeat for autonomous governmental bodies, such as Supreme Audit Institutions, Regulatory Agencies, Ombuds, and other decentralized units of government.
Example Table I - URLs
Type | Name | URL |
Government | Ministry of Health | |
Government | Ministry of Innovation | |
Media | Perfil | |
B. Create a string of websites for each type of website (government or media), by introducing an OR between each URL. This will be used in step C. A sample spreadsheet is available here to simplify the process. It includes the formulas to automatically introduce the ORs between the URLs. Create a copy of the spreadsheet so the original is available to other researchers.
C. Run 6 separate queries on https://www.google.com/advanced_search
In the local language, we will search for:
i) Algorithms (and associated terms such as algorithmically)
ii) “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)
iii) “Statistical model”
We will do so in two stages
Using the string of:
i) Government websites
Search for:
i) Algorithms (and associated terms such as algorithmically)
ii) “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)
iii) “Statistical model”
Using the string of:
ii) Media websites
Search for:
i) (Government OR Ministry OR State) AND [Name of country] AND Algorithms (and associated terms such as algorithmically)
ii) (Government OR Ministry OR State) AND “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)
iii) (Government OR Ministry OR State) AND “Statistical model”
Search settings: For each query, conduct one search that limits results to PDF documents and one that searches across all results.
D. Create a new table with aggregate results and corresponding hyperlinks to the queries:
Example Table II - Metadata:
Type | Query keyword | Pdf/All | URL (after search) | Number of results |
Government | Algorithms OR Algorithmically | PDF | https://www.google.com.ar/search?q=%22algoritmo%22+%2B+%22filetype%3Apdf+site%3Ainadi.gob.ar+OR | 38 |
Government | Algorithms OR Algorithmically | ALL | https://www.google.com.ar/search?q=%22algoritmo%22+%2B+%22filetype%3Apdf+site%3Ainadi.gob.ar+OR | 44 |
Government | "Artificial intelligence" OR "machine learning" OR |
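Steps B and C can also be scripted rather than assembled by hand. The following is a minimal Python sketch, not part of the study's tooling. It assumes the search string relies on Google's `site:` and `filetype:pdf` operators, as the example URL in Table II suggests. The second domain, the English keyword groups and the helper names (`or_group`, `site_string`, `build_queries`) are hypothetical placeholders to be replaced with the entries from Table I and the local-language terms.

```python
from urllib.parse import urlencode

# Hypothetical inputs: replace with the domains from Table I and
# the keyword groups translated into the local language.
GOVERNMENT_SITES = ["inadi.gob.ar", "example.gob.ar"]
KEYWORD_GROUPS = [
    ["algorithms", "algorithmically"],
    ["artificial intelligence", "machine learning", "deep learning"],
    ["statistical model"],
]


def or_group(terms):
    """Quote each term and join the terms with OR, inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def site_string(domains):
    """Step B: join the domains into one OR-separated site: string."""
    return "(" + " OR ".join(f"site:{d}" for d in domains) + ")"


def build_queries(domains, keyword_groups):
    """Step C: one PDF-only and one unrestricted query per keyword group."""
    rows = []
    for terms in keyword_groups:
        for scope in ("PDF", "ALL"):
            parts = [or_group(terms)]
            if scope == "PDF":
                parts.append("filetype:pdf")
            parts.append(site_string(domains))
            query = " ".join(parts)
            url = "https://www.google.com/search?" + urlencode({"q": query})
            rows.append({
                "keywords": " OR ".join(terms),
                "scope": scope,
                "query": query,
                "url": url,
            })
    return rows


if __name__ == "__main__":
    # Print one tab-separated line per query, ready to paste into Table II.
    for row in build_queries(GOVERNMENT_SITES, KEYWORD_GROUPS):
        print(row["keywords"], row["scope"], row["url"], sep="\t")
```

The number of results still has to be read from the results page and entered into Table II by hand; the sketch only produces the query strings and their URLs. For the media queries, the same functions can be reused by prepending the (Government OR Ministry OR State) group to each keyword group.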
3. Existing Research
Using Google Scholar: https://scholar.google.com
Search settings:
Search for:
i) Government AND [Name of country] AND Algorithms (and associated terms such as algorithmically)
ii) Government AND [Name of country] AND “Artificial Intelligence” (and associated terms such as “machine learning”, “deep learning”)
iii) Government AND [Name of country] AND “Statistical model”
Add results to Table II
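The three Google Scholar searches follow a fixed template in which only the country name and the keyword group change. As a rough sketch, assuming the standard `https://scholar.google.com/scholar?q=` URL form and treating the function name `scholar_queries` and the English keyword groups as hypothetical placeholders, the queries could be generated as follows:

```python
from urllib.parse import urlencode

# Hypothetical keyword groups: translate into the local language as needed.
KEYWORD_GROUPS = [
    ["algorithms", "algorithmically"],
    ["artificial intelligence", "machine learning", "deep learning"],
    ["statistical model"],
]


def scholar_queries(country, keyword_groups):
    """Return (query, url) pairs of the form
    Government AND "<country>" AND ("term" OR "term" ...)."""
    results = []
    for terms in keyword_groups:
        group = " OR ".join(f'"{t}"' for t in terms)
        query = f'Government AND "{country}" AND ({group})'
        url = "https://scholar.google.com/scholar?" + urlencode({"q": query})
        results.append((query, url))
    return results


if __name__ == "__main__":
    # "Argentina" is only an example value for [Name of country].
    for query, url in scholar_queries("Argentina", KEYWORD_GROUPS):
        print(query, url, sep="\t")
```

Each printed line gives the query text and the URL to record alongside the number of results.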
Partial deadline: Feb 23
Deliverable: Table III
i) Narrowing the focus through desk research
Sift through the results retrieved by each query. Create a new table to systemize these findings and try to identify cases for further inquiry, and potential interviewees.
The following table is a placeholder. You are free to adapt it in ways that might best serve your interests and needs. The purpose of this phase is to narrow in on 5 cases per country, and eventually define the Y number of countries on which the in-depth research will be conducted. To do so, we want to detect those cases and countries for which the research seems more promising and interesting.
Table III - Systemizing relevant cases and contacts (8 per country)
Country | Ministry | Description | Links | Type (Algorithm, Stat Model, AI) | Promising (High, Medium, Low) | Potential interviewee Name | Gender (M/F) | Contact details | Contacted (Y/N) | Interview date | Conducted (Y/N) |
ii) Interviews to get a more in-depth perspective
Though several components can be answered by drawing upon online sources, published materials and desk research, we hope to answer a further set of questions through interviews. Data experts, academics, NGOs, journalists or, preferably, the government officials who leverage the technology will be best positioned to provide the relevant insights. When conducting an interview you must explain clearly to these sources:
You should keep a clear written record, ideally with an e-mail trail where interviews or consultations were arranged by e-mail, to show that you have asked each source to consent to the use of their responses.
Content of the interview: We suggest you conduct a semi-structured interview, with a set of pre-established questions, while allowing the interviewee to digress as she or he pleases.
You can find a draft template that might be useful here
We expect at least three interviews to be conducted per country, with a balance in gender and stakeholders (government, academia, civil society, and media). Industry actors might also be considered if there is reason to believe they have valuable insights.
Partial deadline: March 2
Deliverables:
In this part we will narrow in on the selected cases and carry out an in-depth assessment of the statistical models, algorithms and AI being deployed. This will combine desk research with the findings from the interviews. At this stage, the researcher is expected to file access to information petitions to fill gaps in the available information. Where the information is in fact available, the access to information petition should be used to check that the sources are not outdated, and to confirm any other key piece of information that might help complete the final report.
Table IV - See model table here
Partial deadline: March 16
Deliverables:
The report should not exceed 3 pages per country.
In line with the general objectives of the project (see the Handbook introduction), the report should include:
i) General introduction to the region (including general development statistics, access to the internet, open data)[3]. - Maximum 1 page
ii) For each country:
Overview of the country’s position in regional terms (drawing on the general introduction), and overview of the findings.
For each of the five chosen statistical models/Algorithms/AI:
iii) A general conclusion and set of recommendations for the region
iv) Bibliography and Annex should be consolidated for all countries, using appropriate headings.
The bibliography should include links to the relevant public documents and relevant reports.
The annex should include each of the tables (I-IV)
Deadline: March 23
Web Foundation team members and other researchers may ask you to provide feedback, brief justifications for each of your comments, and evidence to support them.
You may be requested to review the documents presented by researchers from other regions, and detect any anomalies that may need further review.
Deadline: March 30
Web Foundation will consolidate the reports and coordinate publication with designers and the internal communications team.
[1] For more information please read http://webfoundation.org/docs/2017/07/WF_Algorithms.pdf
[2] For more information see http://webfoundation.org/docs/2017/07/AI_Report_WF.pdf