On Bullshit, and beyond
Giovanni Colavizza
Announcements
Please fill in your group info to prepare the poster session (more info soon): https://docs.google.com/spreadsheets/d/11mFYtoONgLYR4Dky3CERYi4rRBVox-TGbB9MVu-YJ1A/edit?usp=sharing
Tomorrow:
Tue 19th 23:59 CET hard deadline for Project M3
Next (last) week:
“One of the most salient features of our culture is that there is so much bullshit. Everyone knows this. Each of us contributes his share. But we tend to take the situation for granted.”
Frankfurt, On Bullshit, 2005 [1986] https://www.stoa.org.uk/topics/bullshit/pdf/on-bullshit.pdf
Fake news
Fake news
Bias and discrimination
Sensationalization of scientific results
Rise of terminator machines and the like
Everyday bullshit
This class
Essentially about self-awareness
Two parts:
Cf. "Calling Bullshit"
http://callingbullshit.org/index.html
What is bullshit, anyway?
Bullshit as a fundamental disregard for truth
(liar != bullshitter)
Frankfurt, On Bullshit, 2005 [1986] https://www.stoa.org.uk/topics/bullshit/pdf/on-bullshit.pdf
Bullshit as a mental state.
Meibauer, Aspects of a Theory on Bullshit, 2016
What is bullshit, anyway?
Bullshit is also about unclarifiable unclarity
Bullshit as an ontological entity.
Cohen, Deeper Into Bullshit, 2002
What is bullshit, anyway?
Postmodernist Generator:
Cargo Cult Science
Feynman, Cargo Cult Science, 1974
Bullshit goes digital
Brandolini’s asymmetry principle
“The amount of effort necessary to refute bullshit is one order of magnitude bigger than to produce it.”
Is bullshit getting worse?
Tactics for fighting bullshit
3 parts
Spotting bullshit
Fox: $70M wasted in food stamp fraud last year (2016)
Orders of magnitude: Fermi estimation
Fermi questions:
Fraud: $70M
Prop. of Americans on food stamps: ~10%
Dollars per recipient/year: ~$1000
Total program: ~$30'000M
Fraud: ~0.2% of the program!!
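The estimate above can be redone in a few lines. This is only a sketch: the US population of about 300 million is an assumption (the slide does not state it), and every input is an order-of-magnitude guess, which is the point of a Fermi estimate.

```python
# Fermi estimate: how large is $70M of fraud relative to the whole program?
US_POPULATION = 300e6          # assumption, not stated on the slide
SHARE_ON_FOOD_STAMPS = 0.10    # from the slide: ~10% of Americans
DOLLARS_PER_RECIPIENT = 1_000  # from the slide: ~$1000 per recipient per year
REPORTED_FRAUD = 70e6          # from the slide: $70M


def fraud_share(fraud, population, share_enrolled, dollars_each):
    """Return (fraud as a fraction of the program, total program size)."""
    total_program = population * share_enrolled * dollars_each
    return fraud / total_program, total_program


share, total = fraud_share(
    REPORTED_FRAUD, US_POPULATION, SHARE_ON_FOOD_STAMPS, DOLLARS_PER_RECIPIENT
)
print(f"Total program: ~${total / 1e6:,.0f}M")
print(f"Fraud share: ~{share:.2%}")
```

Changing any input by a factor of two still leaves the fraud share well below 1%, which is why rough inputs are good enough here.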
More...
How to spot fake news (FactCheck): http://www.factcheck.org/2016/11/how-to-spot-fake-news/
Tim O’Reilly, How I detect fake news: https://www.oreilly.com/ideas/how-i-detect-fake-news
Statistical pitfalls
Causation and correlation
Prosecutor's fallacy
         | Match | No match  |
Guilty   | 1     | 0         |
Innocent | 8     | 7'999'992 |
P(match | innocent) = 8/8'000'000 = 1 in a million
P(guilty | match) = 1/(1+8) = 1/9!!
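The two probabilities can be read straight off the counts in the table. A minimal sketch, using exact fractions so that no rounding hides the gap between them (the dictionary layout is just one convenient choice):

```python
from fractions import Fraction

# Counts from the table: (true status, test outcome) -> number of people.
counts = {
    ("guilty", "match"): 1,
    ("guilty", "no match"): 0,
    ("innocent", "match"): 8,
    ("innocent", "no match"): 7_999_992,
}

innocent_total = counts[("innocent", "match")] + counts[("innocent", "no match")]
match_total = counts[("guilty", "match")] + counts[("innocent", "match")]

# What the prosecutor quotes: how rarely an innocent person matches.
p_match_given_innocent = Fraction(counts[("innocent", "match")], innocent_total)

# What the court needs: how likely a matching person is to be guilty.
p_guilty_given_match = Fraction(counts[("guilty", "match")], match_total)

print("P(match | innocent) =", p_match_given_innocent)
print("P(guilty | match)   =", p_guilty_given_match)
```

The fallacy is to treat the first number as if it were one minus the second.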
Prosecutor's fallacy
         | Reject H0               | Don't reject H0          |
H0 false | True positive (TP)      | False negative (Type II) |
H0 true  | False positive (Type I) | True negative (TN)       |
FP/(FP+TN) = P(reject H0 | H0 true): the Type I error rate, i.e. the significance level the p-value is compared against
TP/(TP+FN) = P(reject H0 | H0 false): the statistical power, which depends on the alternative H1 we want to test
TP/(TP+FP) = P(H1 | reject H0): even with a very low p-value and high power, the alternative hypothesis (i.e. guilty) could be quite unlikely!
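A short sketch of why P(H1 | reject H0) can stay small: it depends on how likely H1 was before the test, not only on the test's error rates. The function name and the inputs are hypothetical; the numbers are chosen to mirror the match example (one guilty person among roughly 8 million, a one-in-a-million false match rate, and a test that never misses the guilty person).

```python
def prob_h1_given_reject(prior_h1, alpha, power):
    """Bayes' rule for P(H1 | reject H0).

    prior_h1: P(H1) before seeing any data
    alpha:    P(reject H0 | H0 true), the Type I error rate
    power:    P(reject H0 | H0 false)
    """
    true_positives = power * prior_h1
    false_positives = alpha * (1 - prior_h1)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs mirroring the match table: 1 guilty in 8'000'001 people.
posterior = prob_h1_given_reject(prior_h1=1 / 8_000_001, alpha=1e-6, power=1.0)
print(f"P(H1 | reject H0) = {posterior:.3f}")
```

With a larger prior, for example a small pool of plausible suspects, the same test gives a much higher posterior.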
Right censoring
Age of death of musicians:
http://callingbullshit.org/case_studies/case_study_musician_mortality.html
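A toy simulation of right censoring, under stated assumptions: every genre has the same true lifespan distribution, careers start at age 20, and we only see the age at death of musicians who have already died. All numbers (lifespan mean and spread, genre start years, the year of observation) are hypothetical and not taken from the case study.

```python
import random


def mean_observed_death_age(genre_start, now=2015, n=20_000, seed=0):
    """Mean age at death among simulated musicians who died before `now`."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        # Career starts at age 20, at a uniformly random time since the genre began.
        birth_year = rng.uniform(genre_start - 20, now - 20)
        # Same true lifespan distribution for every genre (hypothetical numbers).
        lifespan = rng.gauss(75, 12)
        # Musicians still alive at `now` are censored: their age at death is unknown.
        if birth_year + lifespan <= now:
            observed.append(lifespan)
    return sum(observed) / len(observed)


old_genre = mean_observed_death_age(genre_start=1900)
new_genre = mean_observed_death_age(genre_start=1980)
print(f"old genre: {old_genre:.1f} years, new genre: {new_genre:.1f} years")
```

In the young genre nobody can be older than 55 at the time of observation, so every recorded death is an early one, even though the underlying lifespans are identical.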
Garbage in, garbage out
Missing 11th of the month:
Simpson's paradox
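A minimal illustration of the paradox with made-up counts (hypothetical, not from any study): treatment A does better than B within each group of cases, yet worse overall, because A was mostly given to the hard cases.

```python
# Hypothetical (successes, patients) per treatment and case difficulty.
outcomes = {
    "A": {"easy": (90, 100), "hard": (300, 500)},
    "B": {"easy": (400, 500), "hard": (50, 100)},
}


def rate(successes, patients):
    return successes / patients


def overall_rate(groups):
    """Success rate after pooling all groups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(p for _, p in groups.values())
    return successes / patients


for group in ("easy", "hard"):
    a = rate(*outcomes["A"][group])
    b = rate(*outcomes["B"][group])
    print(f"{group}: A = {a:.0%}, B = {b:.0%}")

overall_a = overall_rate(outcomes["A"])
overall_b = overall_rate(outcomes["B"])
print(f"overall: A = {overall_a:.0%}, B = {overall_b:.0%}")
```

Which comparison is the right one depends on why the groups differ in size, which the aggregated numbers alone cannot tell you.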
Biased sampling
(+ cohort effects)
Sensible trends
Kenneth Rice:
Data visualization done wrong
Wiki: Misleading chart
The y-axis
The only global warming chart you need from now on:
http://www.powerlineblog.com/archives/2015/10/the-only-global-warming-chart-you-need-from-now-on.php
Wall Street Journal, April 17th 2011.
Unreadability
Missing proportionality
Wiki: Misleading chart
Improper scaling
Wiki: Misleading chart
Aberrations
Tufte's rules
"It is right to decorate construction, but never to construct decoration"
http://www.sealthreinhold.com/school/tuftes-rules/rule_one.php
Sagan’s toolkit
Sagan, The Fine Art of Baloney Detection
Sagan’s fallacies
Sagan, The Fine Art of Baloney Detection
Cognitive biases
In the end, why is it a problem?
The work of a data scientist, in context
Data science in the wild
[Diagram: WORLD, data, infrastructure, data science and ML work (data crunching, analysis, models, tools, …), products, insights, policies, ..., WORLD]
Let’s take a different perspective
[Diagram: WORLD, data, analysis, action, WORLD]
3 stories
Cesare Lombroso’s positive criminology (data)
“We study, for the first time, automated inference on criminality based solely on still face images, which is free of any biases of subjective judgments of human observers.”
Convicted persons’ IDs
Photos crawled from the Web
A smile will save you!
Black Mirror S03E01 “Nosedive”
Hubris is in the air…
“Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages [sic], having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc., no mental fatigue, no preconditioning of a bad sleep or meal. The automated inference on criminality eliminates the variable of meta-accuracy (the competence of the human judge/examiner) all together.”
Criminal machine learning:
http://callingbullshit.org/case_studies/case_study_criminal_machine_learning.html
Big picture
Machine Bias: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Discussion: https://news.ycombinator.com/item?id=11753805
Technical analysis: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm and https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb
Rebuttals: https://www.documentcloud.org/documents/2998391-ProPublica-Commentary-Final-070616.html and https://www.documentcloud.org/documents/3248777-Lowenkamp-Fedprobation-sept2016-0.html
More: https://www.propublica.org/article/propublica-responds-to-companys-critique-of-machine-bias-story and https://www.propublica.org/article/technical-response-to-northpointe and https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say …
If interested, check Kate Crawford’s talk at NIPS 2017: https://www.facebook.com/nipsfoundation/videos/1553500344741199/
AI Now Institute: https://ainowinstitute.org/
Big picture
5 principles for accountable algorithms: https://www.fatml.org/resources/principles-for-accountable-algorithms
Google Flu Trends (analysis)
Detecting influenza epidemics using search engine query data,
doi:10.1038/nature07634
Google Flu Trends (analysis)
The Parable of Google Flu: Traps in Big Data Analysis, http://science.sciencemag.org/content/343/6176/1203
University rankings (acting)
Launched by US News in 1983.
How do they work?
Main transition over the years: from inputs to outputs
What could possibly go wrong?
Feedback loops
“U.S. News’s first data-driven ranking came out in 1988, and the results seemed sensible. However, as the ranking grew into a national standard, a vicious feedback loop materialized. The trouble was that the rankings were self-reinforcing.”
“When you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent.”
Cathy O’Neil, Weapons of Math Destruction, 2016.
What you don’t measure...
Measurements in the wild
“When a measure becomes a target, it ceases to be a good measure.”
Goodhart’s law
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Campbell’s law
By the way
CS rankings based on research profiles:
http://csrankings.org/#/index?all
For more:
https://www.theguardian.com/education/2010/sep/21/university-world-rankings
Interconnectedness and cascading effects
“There can be errors in these systems which propagate very quickly. Because of the scale of their action space--they can be hitting a billion or two billion users per day--that means the costs of getting it wrong are very very high.”
Mustafa Suleyman - DeepMind
Historical social networks detour
“Bring the world closer together”
Conclusions
Data science and AI are great! But…
Keep questioning
Be aware
Develop an ethical stance
A final message
“The first principle is that you must not fool yourself, and you’re the easiest person to fool.”
Richard Feynman
Credits
A lot is taken from the University of Washington class “Calling Bullshit” (Carl T. Bergstrom and Jevin West): http://callingbullshit.org/index.html