
Human Contexts and Ethics in Data 100 Lesson 2

Predictive Analytics, Hiring Decisions,

and Sociotechnical Systems

December 3rd, 2019

Margo Boenig-Liptsin mboenigliptsin@berkeley.edu

Ari Edmundson aedmundson@berkeley.edu

HCE Student Team: Alyssa Sugarman, Mateo Montoya, Ollie Downs, Lauren Hom, Mariel Aquino, Priyans Desai, Eva Newsom, Joanne Ma, Maya Hammond, Michelle Li, Owen Hart, Alexis Oddi, Pauline Hidalgo


Human Contexts and Ethics in your work as a data scientist

Data science task + Data Science Lifecycle + HCE Tools

  • Classification
  • Identity
  • Representation
  • Agency
  • Expertise
  • Power
  • Context
  • Sociotechnical imaginaries
  • Sociotechnical Systems

Data Science Lifecycle

Formulate Question or Problem → Acquire and Clean Data → Exploratory Data Analysis → Draw Conclusions from Predictions and Inference → Reports, Decisions, and Solutions … and products


How does your algorithm interact with the wider world?

As a data scientist, your “product” is not only advice or predictions; it may also include predictive algorithms that do work and make decisions without your direct participation. Predictive tools that aid decision-making often have messy and unpredictable sociological effects beyond their stated goals.

In other words, your work in the data science lifecycle becomes an element or node in highly complex sociotechnical systems.

We can define these as organizations in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed.

Example: Predictive models used to make hiring decisions.

Automating Hiring Discrimination

Amazon began using predictive algorithms to score job candidates on a 1-5 scale to partially automate hiring decisions.

By 2015, Amazon recognized that the system’s recommendations were not gender neutral.

Why do you think that happened?


Automating Hiring Discrimination

Algorithmic discrimination is not simply an effect of poor technical work (e.g. failure to balance bias and variance, underfitting, poor sampling, unrepresentative data sets). Sometimes the most accurate predictions are the problem!


Case Study: “Big Data’s Disparate Impact”

From Solon Barocas and Andrew D. Selbst, “Big Data’s Disparate Impact” 104 Calif. L. Rev. 671 (2016)

Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.

  • How do the data mining practices used in hiring algorithms interact with the law, specifically Title VII of the Civil Rights Act, which governs employment discrimination in the US?
  • Under current interpretations of Title VII, much discrimination arising from data mining will not generate liability for employers.
  • Why?


Case Study: “Big Data’s Disparate Impact”

Title VII broadly protects protected classes (race, color, religion, sex, national origin) from employment discrimination.

Employers may become liable for discrimination in two different ways:

Disparate Treatment: formal classification, intentional discrimination

Disparate Impact: facially neutral policies that lead to discriminatory outcomes
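In practice, the disparate impact prong is often screened with the EEOC’s “four-fifths rule”: a selection rate for a protected group below 80% of the highest group’s rate is taken as evidence of adverse impact. The rule itself is not from these slides, and the numbers below are invented; this is a minimal sketch of the check:

```python
def four_fifths_check(outcomes_by_group):
    """Flag groups whose selection rate (hired / applicants) falls below
    80% of the highest group's rate: the EEOC four-fifths screen."""
    rates = {g: hired / applicants
             for g, (hired, applicants) in outcomes_by_group.items()}
    top = max(rates.values())
    return {g: (round(rate, 3), rate / top >= 0.8) for g, rate in rates.items()}

# Invented applicant pool: (hired, applicants) per group.
print(four_fifths_check({"men": (48, 120), "women": (24, 130)}))
# women's rate (~0.185) is ~46% of men's (0.400), so the check fails.
```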


Case Study: “Big Data’s Disparate Impact”

Example: A company aims to prioritize candidates most likely to continue working for the company for a long period of time (target variable). What happens if the best predictor of this difference in tenure is gender: that men are more likely to keep a job for longer? The model predicts with very little error, but systematically discriminates against women.

What might be the reasons for this result?
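One concrete mechanism is sketched below with entirely synthetic data: even when gender is withheld from the model, a correlated proxy feature (the feature names and probabilities here are invented) carries it back in, so an accurately fitted model still ranks women lower.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic history in which men were more likely to stay long (the target).
is_woman = rng.random(n) < 0.5
long_tenure = rng.random(n) < np.where(is_woman, 0.3, 0.6)

# The model never sees gender, only a gender-correlated proxy
# (e.g. a resume keyword that skews by gender) and an unrelated skill score.
proxy = is_woman + rng.normal(0, 0.5, n)
skill = rng.normal(0, 1, n)
X = np.column_stack([proxy, skill])

model = LogisticRegression().fit(X, long_tenure)
scores = model.predict_proba(X)[:, 1]  # predicted P(long tenure)

print("mean score, men:  ", scores[~is_woman].mean().round(3))
print("mean score, women:", scores[is_woman].mean().round(3))
# The model tracks the historical pattern with low error, yet it
# systematically scores women lower: accurate and discriminatory.
```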


Case Study: “Big Data’s Disparate Impact”

There are many possible sources of discrimination when building a model. Barocas and Selbst highlight the following (a short empirical probe of item 4, proxies, follows the list):

  1. Defining the target variable and class labels
  2. Training Data
    1. Labelling
    2. Data collection
  3. Feature Selection
  4. Proxies
  5. Masking
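The sketch below probes item 4 under assumed column names: if an auxiliary classifier can recover the protected attribute from the ostensibly neutral features, those features are acting as proxies, and dropping the protected column alone will not help.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def proxy_audit(df: pd.DataFrame, protected: str, features: list[str]) -> float:
    """Cross-validated accuracy of recovering the protected attribute from
    'neutral' features; accuracy well above chance signals proxies."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, df[features], df[protected], cv=5).mean()

# Hypothetical usage (column names invented; categorical features would
# need to be numerically encoded first):
# proxy_audit(applicants, "gender", ["zip_code", "college", "hobbies"])
```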



Case Study: “Big Data’s Disparate Impact”

Models can go wrong… but they can also be too right!

When judging candidates for a job based on their potential for success, what counts as success in the workplace? What makes a “good” employee?

Example: workplace “fit” selects for candidates similar to previous and current employees. It predicts “success” accurately, but ignores the possibility that restructuring the workplace could change the likelihood of different applicants becoming successful, and could even change what counts as success in an organization.
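A minimal sketch of why “fit” is self-reinforcing, with invented feature vectors: scoring applicants by their similarity to current employees simply rewards resemblance to the incumbent workforce, whatever its current skew.

```python
import numpy as np

def fit_score(applicants: np.ndarray, employees: np.ndarray) -> np.ndarray:
    """Mean cosine similarity of each applicant to the current employees:
    a naive 'culture fit' score that rewards resemblance to incumbents."""
    a = applicants / np.linalg.norm(applicants, axis=1, keepdims=True)
    e = employees / np.linalg.norm(employees, axis=1, keepdims=True)
    return (a @ e.T).mean(axis=1)

# Invented vectors: whoever most resembles current staff "fits" best,
# so a workforce skewed on any trait stays skewed.
employees = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]])
applicants = np.array([[1.0, 0.1],   # resembles incumbents -> high score
                       [0.1, 1.0]])  # different profile    -> low score
print(fit_score(applicants, employees).round(2))
```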

Sociotechnical Systems

An organization in which people and technology interact and work together such that human and technical agency is complexly intertwined and distributed. Large and highly complex sociotechnical systems distribute risks and responsibilities widely and unevenly, and are difficult to regulate. When they fail, it is often difficult or even impossible to identify a single human or mechanical cause.

Examples:

  • Self-driving cars; nuclear power plants; airplanes; streetlights
  • Bureaucracies
  • Automated decision-making systems (e.g. organizations using hiring algorithms)

Questions to ask with this tool:

  • How do humans interact with a particular technology?
  • How are risk and responsibility distributed in a sociotechnical system? Whose agency is affected?
  • How does a sociotechnical system come about and change over time? Through which pressures and mechanisms?


1. Question / Problem Formulation

  • What do we want to know?
  • What problems are we trying to solve?
  • What are the hypotheses we want to test?
  • What are our metrics for success? How is success defined?

Why are you, as a data scientist, a relevant expert on this question? What do you bring to the table? Who else might have the relevant knowledge to help with this problem?

What are the broader contexts and stakes of the task? How does it negotiate existing power structures?

What do your employers believe that data analysis can achieve? What social values do they imagine this technology can support? How have they defined your target variables?


2. Data Acquisition and Cleaning

  • What data do we have and what data do we need?
  • How will we collect more data?
  • How do we organize the data for analysis?

What is the context in which this data was collected?

What is represented in the data? Are individual people represented? How (i.e. with what features)?

What kinds of identities are captured? Who or what is excluded? What else do we need to know?
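These questions can be made concrete with a quick representation audit; the sketch below assumes invented column names and reference shares, and simply compares who appears in the data against a reference population.

```python
import pandas as pd

def representation_gap(df: pd.DataFrame, column: str, reference: dict) -> pd.DataFrame:
    """Compare group shares in the data with a reference population to
    surface who is over-, under-, or entirely un-represented."""
    observed = df[column].value_counts(normalize=True)
    out = pd.DataFrame({"in_data": observed, "reference": pd.Series(reference)})
    out["gap"] = out["in_data"].fillna(0) - out["reference"]
    return out

# Hypothetical usage with an invented 'resumes' DataFrame:
# representation_gap(resumes, "gender", {"women": 0.5, "men": 0.5})
```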


3. Exploratory Data Analysis and Visualization

  • Do we already have relevant data?
  • What are the biases, anomalies, or other issues with our data?
  • How do we transform the data to enable effective analysis?

What kind of classification system is used in the data set? How does data analysis revise the classification system?

What argument does your visualization make? How else could the data be represented? What different conclusions might be drawn by different visualizations?
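As a small illustration of the last point, with invented numbers: the same hiring data supports opposite conclusions depending on whether selection rates are shown in aggregate or broken out by department (Simpson's paradox).

```python
import pandas as pd

# Invented hiring data: applicants and hires by gender and department.
df = pd.DataFrame({
    "dept":       ["A", "A", "B", "B"],
    "gender":     ["men", "women", "men", "women"],
    "applicants": [100, 20, 20, 100],
    "hired":      [60, 13, 2, 15],
})

# View 1: aggregated rates suggest women fare far worse overall.
overall = df.groupby("gender")[["hired", "applicants"]].sum()
print(overall["hired"] / overall["applicants"])   # men ~0.52, women ~0.23

# View 2: within each department women's rate is higher; the aggregate
# gap comes from which departments people applied to.
df["rate"] = df["hired"] / df["applicants"]
print(df.pivot(index="dept", columns="gender", values="rate"))
```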


4. Predictions and Inference

  • Does it answer our questions or accurately solve the problem?
  • How robust are our conclusions and can we trust the predictions?

What story are you telling with the data? Why does it matter? What reservations do you have?

Who is listening? What will they do with your recommendation? What kind of power and agency do they have? What are the consequences of following the recommendation?

Do you have the ability to challenge the framing of the problem you have been given? What kind of control do you exercise over your model once you have completed it? Are you continuously involved in its use?

Data Science Lifecycle… Embedded in a Sociotechnical System