Data
Fall 2024
Outline
Big Data
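Boyd & Crawford (2012), in their influential paper on big data, examine mass-scale data collection. They define “big data” as a cultural, technological, and scholarly phenomenon that rests on the interplay of three dimensions:
Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.
Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate previously impossible insights, with an aura of truth, objectivity, and accuracy.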
What was the world like before Big Data?
In the pre-Internet era, databases were...
...but of course today...
How have these new developments impacted us?
Atlas of AI (Crawford, 2021), Chapter 3: Data
“It’s that the NIST databases foreshadow the emergence of a logic that has now thoroughly pervaded the tech sector: the unswerving belief that everything is data and is there for the taking. It doesn’t matter where a photograph was taken or whether it reflects a moment of vulnerability or pain or if it represents a form of shaming the subject. It has become so normalized across the industry to take and use whatever is available that few stop to question the underlying politics.…It is all treated as data to be run through functions, material to be ingested to improve technical performance. This is a core premise in the ideology of data extraction.”
Outline
Machine Learning: Traditional Approach
Known, domain-specific math functions used
Source: MathWorks
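In the traditional approach, a person chooses the features by hand, using known, domain-specific functions. The learning algorithm only ever sees those features. The sketch below is a minimal illustration under stated assumptions, not the MathWorks example. It uses hypothetical toy 1-D signals labeled "smooth" or "noisy" and two hand-picked features: the average level and the average jump between samples. A simple nearest-centroid classifier stands in for whatever model a real pipeline would use. All names and data are invented for illustration.

```python
from statistics import mean

# Hypothetical toy training data: short 1-D signals with a human-assigned label.
TRAINING = [
    ([1.0, 1.1, 1.2, 1.3], "smooth"),
    ([2.0, 2.1, 2.0, 2.1], "smooth"),
    ([1.0, 3.0, 0.0, 4.0], "noisy"),
    ([2.0, -1.0, 3.0, 0.0], "noisy"),
]

def extract_features(signal: list[float]) -> tuple[float, float]:
    """Hand-engineered features picked by a person using domain knowledge:
    the average level and the average jump between consecutive samples."""
    level = mean(signal)
    roughness = mean(abs(b - a) for a, b in zip(signal, signal[1:]))
    return level, roughness

def train_centroids(examples):
    """'Training' for a nearest-centroid classifier: average the feature
    vectors of all examples that share a label."""
    by_label: dict[str, list[tuple[float, float]]] = {}
    for signal, label in examples:
        by_label.setdefault(label, []).append(extract_features(signal))
    return {
        label: (mean(f[0] for f in feats), mean(f[1] for f in feats))
        for label, feats in by_label.items()
    }

def classify(signal: list[float], centroids) -> str:
    """Return the label whose centroid is nearest (squared Euclidean distance)
    to the signal's hand-engineered feature vector."""
    level, roughness = extract_features(signal)
    return min(
        centroids,
        key=lambda lab: (level - centroids[lab][0]) ** 2
        + (roughness - centroids[lab][1]) ** 2,
    )

centroids = train_centroids(TRAINING)
print(classify([0.5, 0.6, 0.5, 0.6], centroids))
print(classify([0.0, 5.0, 0.0, 5.0], centroids))
```

The model can only be as good as the hand-picked features. Deep learning, by contrast, learns its features from the raw input rather than relying on functions a person chose in advance.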
Deep Learning (type of ML): Newer Approach
Source: MathWorks
Outline
Big Data: Big Ideas and Societal Implications
1. Big data changes the definition of knowledge
Example 2. WordNet, ImageNet, and Amazon Mechanical Turk (MTurk)
Ghost Work / Gig Work: “Services delivered by companies like Amazon, Google, Microsoft, and Uber rely on a vast, invisible human labor force – who usually earn less than legal minimums for traditional work, have no health benefits, and can be fired at any time for any reason, or none.”
2. Claims to Objectivity & Accuracy are Misleading
Example. Gender Shades: Joy Buolamwini
Systematically compared the gender-classification accuracy of three commercial facial recognition systems (Microsoft, Face++, and IBM) across male and female faces with different skin shades.
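Gender Shades reported error rates separately for each intersectional subgroup (for example, darker-skinned women) instead of as a single overall accuracy figure. Below is a minimal sketch of that kind of disaggregated audit. It assumes invented toy records of the form (skin type, true gender, predicted gender); these are not the study's data, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy audit records: (skin_type, true_gender, predicted_gender).
# Invented for illustration; these are NOT Gender Shades data.
RECORDS = [
    ("lighter", "male", "male"), ("lighter", "male", "male"),
    ("lighter", "male", "male"), ("lighter", "male", "male"),
    ("lighter", "female", "female"), ("lighter", "female", "female"),
    ("lighter", "female", "male"), ("lighter", "female", "female"),
    ("darker", "male", "male"), ("darker", "male", "male"),
    ("darker", "male", "female"), ("darker", "male", "male"),
    ("darker", "female", "male"), ("darker", "female", "female"),
    ("darker", "female", "male"), ("darker", "female", "male"),
]

def error_rates_by_subgroup(records):
    """Error rate for every (skin_type, true_gender) combination."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for skin, truth, pred in records:
        group = (skin, truth)
        totals[group] += 1
        if pred != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

def overall_error_rate(records):
    """A single aggregate error rate, which can hide subgroup disparities."""
    return sum(pred != truth for _, truth, pred in records) / len(records)

print("overall:", overall_error_rate(RECORDS))
for group, rate in sorted(error_rates_by_subgroup(RECORDS).items()):
    print(group, rate)
```

In this toy data the overall error rate is 31.25%. That single number hides a range from 0% for lighter-skinned men to 75% for darker-skinned women. Disaggregating by subgroup is what makes such a gap visible.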
3. Bigger data are not always better data
Importance of systematicity over volume:
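A toy comparison can make the point about systematicity over volume concrete. This is a minimal sketch under stated assumptions. The population is a hypothetical set of the values 1 to 100,000. The large "convenience" sample contains only the upper 60% of records, as if only some people opted in. The small systematic sample takes every 1,000th record. The function names and thresholds are invented for illustration.

```python
from statistics import mean

# Hypothetical population: the values 1..100,000 stand in for some measurement
# taken on every member of a population (invented for illustration).
POPULATION = list(range(1, 100_001))

def convenience_sample(population, threshold):
    """Large but self-selected sample: only records at or above `threshold`
    end up in the data set (a stand-in for, e.g., only the most active users)."""
    return [x for x in population if x >= threshold]

def systematic_sample(population, step, start):
    """Small systematic sample: every `step`-th record, beginning at index `start`."""
    return population[start::step]

true_mean = mean(POPULATION)
big = convenience_sample(POPULATION, threshold=40_001)
small = systematic_sample(POPULATION, step=1_000, start=499)

for name, sample in [("convenience", big), ("systematic", small)]:
    print(name, len(sample), "records, error in mean:", abs(mean(sample) - true_mean))
```

The 60,000-record convenience sample misses the true mean by 20,000. The 100-record systematic sample misses it by only 0.5. More records cannot correct a skewed selection process.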
4. Just because it’s accessible doesn’t make it ethical
5. Limited access creates new digital divides
Outline
We’re going to start watching The Great Hack