Human Contexts and Ethics in Data 100 Lesson 2
Predictive Analytics, Hiring Decisions,
and Sociotechnical Systems
December 3rd, 2019
Margo Boenig-Liptsin (mboenigliptsin@berkeley.edu)
Ari Edmundson
HCE Student Team: Alyssa Sugarman, Mateo Montoya, Ollie Downs, Lauren Hom, Mariel Aquino, Priyans Desai, Eva Newsom, Joanne Ma, Maya Hammond, Michelle Li, Owen Hart, Alexis Oddi, Pauline Hidalgo
Human Contexts and Ethics in your work as a data scientist
Data science task + Data Science Lifecycle + HCE Tools
Data Science Lifecycle
Formulate Question or Problem
Acquire and Clean Data
Exploratory Data Analysis
Draw Conclusions from Predictions and Inference
Reports, Decisions,
and Solutions … and products
How does your algorithm interact with the wider world?
As a data scientist, your “product” is not only advice or predictions; it may include predictive algorithms that do work and make decisions without your direct participation. Predictive tools that aid decision-making often have messy and unpredictable sociological effects beyond their stated goals.
In other words, your work in the data science lifecycle becomes an element or node in highly complex sociotechnical systems.
We can define these as organizations in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed.
Example: Predictive models used to make hiring decisions.
Automating Hiring Discrimination
Amazon began using predictive algorithms to score job candidates on a 1-5 scale to partially automate hiring decisions.
By 2015 they recognized the system’s recommendations were not gender neutral.
Why do you think that happened?
Automating Hiring Discrimination
Algorithmic discrimination is not simply an effect of poor technical work (e.g., failure to balance bias and variance, underfitting, or poor sampling and unrepresentative data sets). Sometimes the most accurate predictions are the problem!
Case Study: “Big Data’s Disparate Impact”
From Solon Barocas and Andrew D. Selbst, “Big Data’s Disparate Impact” 104 Calif. L. Rev. 671 (2016)
“Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.”
Case Study: “Big Data’s Disparate Impact”
Title VII of the Civil Rights Act broadly protects members of protected classes (race, color, religion, sex, national origin) from employment discrimination.
Employers may become liable for discrimination in two different ways:
Disparate Treatment: formal classification, intentional discrimination
Disparate Impact: facially neutral policies that lead to discriminatory outcomes
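To make “disparate impact” concrete, here is a minimal sketch of how an audit might quantify it. It uses the selection-rate ratio that EEOC guidance calls the “four-fifths rule”: a protected group’s selection rate below 80% of the highest group’s rate is treated as evidence of adverse impact. All numbers below are made up for illustration; this is not from the reading.

```python
# Sketch: auditing a model's hiring recommendations for disparate impact
# using the selection-rate ratio ("four-fifths rule"). Hypothetical data.
import pandas as pd

# One row per applicant, with the model's recommendation (1 = advance).
applicants = pd.DataFrame({
    "gender": ["F"] * 50 + ["M"] * 50,
    "recommended": [1] * 15 + [0] * 35 + [1] * 30 + [0] * 20,
})

# Selection rate per group: the fraction of each group the model advances.
rates = applicants.groupby("gender")["recommended"].mean()
print(rates)  # F: 0.30, M: 0.60

# Impact ratio: lowest group's selection rate over the highest group's.
impact_ratio = rates.min() / rates.max()
print(f"impact ratio: {impact_ratio:.2f}")  # 0.50, well below 0.8

if impact_ratio < 0.8:
    print("Potential disparate impact under the four-fifths heuristic.")
```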
Case Study: “Big Data’s Disparate Impact”
Example: A company aims to prioritize candidates most likely to continue working for the company for a long period of time (the target variable). What happens if the best predictor of this difference in tenure is gender, i.e., if men are more likely to keep a job for longer? The model predicts with very little error, but systematically discriminates against women.
What might be the reasons for this result?
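One way to see the mechanism is a small synthetic simulation (all data invented for illustration) in which long tenure is historically correlated with gender. Even when gender is excluded as an explicit feature, a close proxy feature lets an accurate model reproduce the gap.

```python
# Sketch: an accurate tenure model that systematically disadvantages women.
# Synthetic data only; the "proxy" feature stands in for resume features
# (clubs, activities, word choice) that correlate with gender.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, size=n)   # 0 = man, 1 = woman (invented)
skill = rng.normal(0, 1, size=n)
# Long tenure driven partly by skill, partly by a historical gender gap.
long_tenure = (0.5 * skill - 0.8 * gender + rng.normal(0, 1, n) > 0).astype(int)

# Gender is not a feature, but a close proxy for it is.
proxy = gender + rng.normal(0, 0.1, size=n)
X = np.column_stack([skill, proxy])

model = LogisticRegression().fit(X, long_tenure)
preds = model.predict(X)

print("accuracy:", (preds == long_tenure).mean())
print("selection rate, men:  ", preds[gender == 0].mean())
print("selection rate, women:", preds[gender == 1].mean())
# The model is "right" about tenure, yet recommends women far less often.
```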
Case Study: “Big Data’s Disparate Impact”
There are many possible sources of discrimination when building a model. Barocas and Selbst highlight the following:
Defining the target variable and class labels
Training data: how examples are labeled and how data is collected
Feature selection
Proxies: facially neutral features that correlate with protected-class membership
Masking: intentional discrimination disguised as any of the above
Case Study: “Big Data’s Disparate Impact”
Models can go wrong… but they can also be too right!
When judging candidates for a job based on their potential for success, what counts as success in the workplace? What makes a “good” employee?
Example: Workplace “fit” selects for candidates similar to previous and current employees. It predicts “success” accurately, but ignores the possibility that restructuring the workplace could change the likelihood of different applicants becoming successful, and could change what counts as success in an organization.
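A minimal sketch of how such a “fit” score can work, with invented feature vectors and a hypothetical fit_score helper: rank applicants by similarity to the average current employee, and the shortlist ends up mirroring the existing workforce.

```python
# Sketch: a similarity-based "fit" score locks in the incumbent profile.
# All feature vectors are made up (think: hobbies, schools, writing style).
import numpy as np

rng = np.random.default_rng(1)

# A homogeneous current workforce clustered around one profile.
incumbents = rng.normal(loc=[2.0, -1.0, 0.5], scale=0.3, size=(100, 3))
profile = incumbents.mean(axis=0)  # the "typical employee"

# Two applicant pools: one resembling incumbents, one that does not.
similar_pool = rng.normal(loc=[2.0, -1.0, 0.5], scale=0.3, size=(50, 3))
different_pool = rng.normal(loc=[-1.0, 1.5, 0.0], scale=0.3, size=(50, 3))
applicants = np.vstack([similar_pool, different_pool])

def fit_score(x, profile):
    """Cosine similarity between an applicant and the incumbent profile."""
    return x @ profile / (np.linalg.norm(x) * np.linalg.norm(profile))

scores = np.array([fit_score(a, profile) for a in applicants])
top = np.argsort(scores)[::-1][:20]  # shortlist the 20 best "fits"
print("shortlisted from the incumbent-like pool:", (top < 50).sum(), "of 20")
# Nearly the whole shortlist mirrors the existing workforce, regardless of
# whether a restructured workplace would let other applicants succeed.
```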
Sociotechnical Systems
An organization in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed. Large and highly complex sociotechnical systems distribute risks and responsibilities widely and unevenly, and are difficult to regulate. When they fail, it is often difficult or even impossible to identify a single human or mechanical cause.
1. Question / Problem Formulation
Formulate Question or Problem
Why are you, as a data scientist, a relevant expert on this question? What do you bring to the table? Who else might have the relevant knowledge to help with this problem?
What are the broader contexts and stakes of the task? How does it negotiate existing power structures?
What do your employers believe that data analysis can achieve? What social values do they imagine this technology can support? How have they defined your target variables?
2. Data Acquisition and Cleaning
Acquire and Clean Data
What is the context in which this data was collected?
What is represented in the data? Are individual people represented? How (i.e. with what features)?
What kinds of identities are captured? Who or what is excluded? What else do we need to know?
3. Exploratory Data Analysis and Visualization
Exploratory Data Analysis
What kind of classification system is used in the data set? How does data analysis revise the classification system?
What argument does your visualization make? How else could the data be represented? What different conclusions might be drawn by different visualizations?
4. Predictions and Inference
Draw Conclusions from Predictions and Inference
Reports, Decisions, and Solutions
What story are you telling with the data? Why does it matter? What reservations do you have?
Who is listening? What will they do with your recommendation? What kind of power and agency do they have? What are the consequences of following the recommendation?
Do you have the ability to challenge the framing of the problem you have been given? What kind of control do you exercise over your model once you have completed it? Are you continuously involved in its use?
Data Science Lifecycle Embedded in a Sociotechnical System