
Human Contexts and Ethics in Data 100 Lesson 2

Predictive Analytics, Hiring Decisions,

and Sociotechnical Systems

December 3rd, 2019

Margo Boenig-Liptsin mboenigliptsin@berkeley.edu

Ari Edmundson aedmundson@berkeley.edu

HCE Student Team: Alyssa Sugarman, Mateo Montoya, Ollie Downs, Lauren Hom, Mariel Aquino, Priyans Desai, Eva Newsom, Joanne Ma, Maya Hammond, Michelle Li, Owen Hart, Alexis Oddi, Pauline Hidalgo


Human Contexts and Ethics in your work as a data scientist

Data science task + Data Science Lifecycle + HCE Tools

  • Classification
  • Identity
  • Representation
  • Agency
  • Expertise
  • Power
  • Context
  • Sociotechnical imaginaries
  • Sociotechnical Systems

Data Science Lifecycle

Formulate Question or Problem → Acquire and Clean Data → Exploratory Data Analysis → Draw Conclusions from Predictions and Inference → Reports, Decisions, and Solutions … and products


How does your algorithm interact with the wider world?

As a data scientist, your “product” is not only advice or predictions; it may also include predictive algorithms that do work and make decisions without your direct participation. Predictive tools that aid decision-making often have messy and unpredictable sociological effects beyond their stated goals.

In other words, your work in the data science lifecycle becomes an element or node in highly complex sociotechnical systems.

We can define these as organizations in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed.

Example: Predictive models used to make hiring decisions.

Automating Hiring Discrimination

Amazon began using predictive algorithms to score job candidates on a 1-5 scale to partially automate hiring decisions.

By 2015, Amazon recognized that the system’s recommendations were not gender neutral.

Why do you think that happened?


Automating Hiring Discrimination

Algorithmic discrimination is not simply an effect of poor technical work (e.g. failure to balance bias and variance, underfitting, poor sampling, unrepresentative data sets). Sometimes the most accurate predictions are the problem!


Case Study: “Big Data’s Disparate Impact”

From Solon Barocas and Andrew D. Selbst, “Big Data’s Disparate Impact” 104 Calif. L. Rev. 671 (2016)

Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.

  • How do the data mining practices used in hiring algorithms interact with the law, specifically Title VII of the Civil Rights Act, which governs employment discrimination in the US?
  • Under current interpretations of Title VII, much discrimination arising from data mining will not generate liability for employers.
  • Why?


Case Study: “Big Data’s Disparate Impact”

Title VII broadly protects protected classes (race, color, religion, sex, national origin) from employment discrimination.

Employers may become liable for discrimination in two different ways:

Disparate Treatment: formal classification, intentional discrimination

Disparate Impact: facially neutral policies that lead to discriminatory outcomes
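In practice, the disparate impact prong is often screened with the EEOC’s “four-fifths rule”: a selection rate for a protected group below 80% of the highest group’s rate is taken as evidence of adverse impact. The rule itself is not from these slides, and the numbers below are invented; this is a minimal sketch of the check:

```python
def four_fifths_check(outcomes_by_group):
    """Flag groups whose selection rate (hired / applicants) falls below
    80% of the highest group's rate: the EEOC four-fifths screen."""
    rates = {g: hired / applicants
             for g, (hired, applicants) in outcomes_by_group.items()}
    top = max(rates.values())
    return {g: (round(rate, 3), rate / top >= 0.8) for g, rate in rates.items()}

# Invented applicant pool: (hired, applicants) per group.
print(four_fifths_check({"men": (48, 120), "women": (24, 130)}))
# women's rate (~0.185) is ~46% of men's (0.400), so the check fails.
```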


Case Study: “Big Data’s Disparate Impact”

Example: A company aims to prioritize candidates most likely to continue working for the company for a long period of time (target variable). What happens if the best predictor of this difference in tenure is gender: that men are more likely to keep a job for longer? The model predicts with very little error, but systematically discriminates against women.

What might be the reasons for this result?
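One concrete mechanism is sketched below with entirely synthetic data: even when gender is withheld from the model, a correlated proxy feature (the feature names and probabilities here are invented) carries it back in, so an accurately fitted model still ranks women lower.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic history in which men were more likely to stay long (the target).
is_woman = rng.random(n) < 0.5
long_tenure = rng.random(n) < np.where(is_woman, 0.3, 0.6)

# The model never sees gender, only a gender-correlated proxy
# (e.g. a resume keyword that skews by gender) and an unrelated skill score.
proxy = is_woman + rng.normal(0, 0.5, n)
skill = rng.normal(0, 1, n)
X = np.column_stack([proxy, skill])

model = LogisticRegression().fit(X, long_tenure)
scores = model.predict_proba(X)[:, 1]  # predicted P(long tenure)

print("mean score, men:  ", scores[~is_woman].mean().round(3))
print("mean score, women:", scores[is_woman].mean().round(3))
# The model tracks the historical pattern with low error, yet it
# systematically scores women lower: accurate and discriminatory.
```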


Case Study: “Big Data’s Disparate Impact”

There are many possible sources of discrimination when building a model. Barocas and Selbst highlight the following (a short empirical probe of item 4, proxies, follows the list):

  1. Defining the target variable and class labels
  2. Training Data
    1. Labelling
    2. Data collection
  3. Feature Selection
  4. Proxies
  5. Masking
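The sketch below probes item 4 under assumed column names: if an auxiliary classifier can recover the protected attribute from the ostensibly neutral features, those features are acting as proxies, and dropping the protected column alone will not help.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def proxy_audit(df: pd.DataFrame, protected: str, features: list[str]) -> float:
    """Cross-validated accuracy of recovering the protected attribute from
    'neutral' features; accuracy well above chance signals proxies."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, df[features], df[protected], cv=5).mean()

# Hypothetical usage (column names invented; categorical features would
# need to be numerically encoded first):
# proxy_audit(applicants, "gender", ["zip_code", "college", "hobbies"])
```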



Case Study: “Big Data’s Disparate Impact”

Models can go wrong… but they can also be too right!

When judging candidates for a job based on their potential for success, what counts as success in the workplace? What makes a “good” employee?

Example: workplace “fit” selects for candidates similar to previous and current employees. It predicts “success” accurately, but ignores the possibility that restructuring the workplace could change the likelihood of different applicants becoming successful, and could even change what counts as success in an organization.
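A minimal sketch of why “fit” is self-reinforcing, with invented feature vectors: scoring applicants by their similarity to current employees simply rewards resemblance to the incumbent workforce, whatever its current skew.

```python
import numpy as np

def fit_score(applicants: np.ndarray, employees: np.ndarray) -> np.ndarray:
    """Mean cosine similarity of each applicant to the current employees:
    a naive 'culture fit' score that rewards resemblance to incumbents."""
    a = applicants / np.linalg.norm(applicants, axis=1, keepdims=True)
    e = employees / np.linalg.norm(employees, axis=1, keepdims=True)
    return (a @ e.T).mean(axis=1)

# Invented vectors: whoever most resembles current staff "fits" best,
# so a workforce skewed on any trait stays skewed.
employees = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]])
applicants = np.array([[1.0, 0.1],   # resembles incumbents -> high score
                       [0.1, 1.0]])  # different profile    -> low score
print(fit_score(applicants, employees).round(2))
```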

Sociotechnical Systems

An organization in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed. Large and highly complex sociotechnical systems distribute risks and responsibilities widely and unevenly, and are difficult to regulate. When they fail, it is often difficult or even impossible to identify a single human or mechanical cause.

Examples:

  • Self-driving cars; nuclear power plants; airplanes; streetlights
  • Bureaucracies
  • Automated decision-making systems (e.g. organizations using hiring algorithms)

Questions to ask with this tool:

  • How do humans interact with a particular technology?
  • How are risk and responsibility distributed in a sociotechnical system? Whose agency is affected?
  • How does a sociotechnical system come about and change over time? Through which pressures and mechanisms?


1. Question / Problem Formulation

  • What do we want to know?
  • What problems are we trying to solve?
  • What are the hypotheses we want to test?
  • What are our metrics for success? How is success defined?

Why are you, as a data scientist, a relevant expert on this question? What do you bring to the table? Who else might have the relevant knowledge to help with this problem?

What are the broader contexts and stakes of the task? How does it negotiate existing power structures?

What do your employers believe that data analysis can achieve? What social values do they imagine this technology can support? How have they defined your target variables?


2. Data Acquisition and Cleaning

  • What data do we have and what data do we need?
  • How will we collect more data?
  • How do we organize the data for analysis?

What is the context in which this data was collected?

What is represented in the data? Are individual people represented? How (i.e. with what features)?

What kinds of identities are captured? Who or what is excluded? What else do we need to know?
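These questions can be made concrete with a quick representation audit; the sketch below assumes invented column names and reference shares, and simply compares who appears in the data against a reference population.

```python
import pandas as pd

def representation_gap(df: pd.DataFrame, column: str, reference: dict) -> pd.DataFrame:
    """Compare group shares in the data with a reference population to
    surface who is over-, under-, or entirely un-represented."""
    observed = df[column].value_counts(normalize=True)
    out = pd.DataFrame({"in_data": observed, "reference": pd.Series(reference)})
    out["gap"] = out["in_data"].fillna(0) - out["reference"]
    return out

# Hypothetical usage with an invented 'resumes' DataFrame:
# representation_gap(resumes, "gender", {"women": 0.5, "men": 0.5})
```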


3. Exploratory Data Analysis and Visualization

  • Do we already have relevant data?
  • What are the biases, anomalies, or other issues with our data?
  • How do we transform the data to enable effective analysis?

What kind of classification system is used in the data set? How does data analysis revise the classification system?

What argument does your visualization make? How else could the data be represented? What different conclusions might be drawn by different visualizations?
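As a small illustration of the last point, with invented numbers: the same hiring data supports opposite conclusions depending on whether selection rates are shown in aggregate or broken out by department (Simpson's paradox).

```python
import pandas as pd

# Invented hiring data: applicants and hires by gender and department.
df = pd.DataFrame({
    "dept":       ["A", "A", "B", "B"],
    "gender":     ["men", "women", "men", "women"],
    "applicants": [100, 20, 20, 100],
    "hired":      [60, 13, 2, 15],
})

# View 1: aggregated rates suggest women fare far worse overall.
overall = df.groupby("gender")[["hired", "applicants"]].sum()
print(overall["hired"] / overall["applicants"])   # men ~0.52, women ~0.23

# View 2: within each department women's rate is higher; the aggregate
# gap comes from which departments people applied to.
df["rate"] = df["hired"] / df["applicants"]
print(df.pivot(index="dept", columns="gender", values="rate"))
```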


4. Predictions and Inference

  • Does it answer our questions or accurately solve the problem?
  • How robust are our conclusions and can we trust the predictions?

What story are you telling with the data? Why does it matter? What reservations do you have?

Who is listening? What will they do with your recommendation? What kind of power and agency do they have? What are the consequences of following the recommendation?

Do you have the ability to challenge the framing of the problem you have been given? What kind of control do you exercise over your model once you have completed it? Are you continuously involved in its use?

Data Science Lifecycle… Embedded in a Sociotechnical System