1
Category | Instance | What Does It Measure? | Use Case | Did/Does It Work? | What's Used To Measure If It Worked? | Cost To Develop | Marginal Cost | Accuracy | Human Judgement In Evaluation | Predictive vs Evaluative | Comments | Sources
2
IQ Tests | Intelligence | School placement, work placement | $20
Scores correlate about 0.5 with school grades (roughly 25% of the variance explained)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4557354/#:~:text=Predictive%20Validity%20of%20IQ,those%20criteria%20have%20been%20reported.&text=It%20is%20widely%20accepted%20that,0.5%20(Mackintosh%2C%202011), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5346574/
3
Wechsler Adult Intelligence Scale (WAIS) | 0 | Predictive
4
Wechsler Intelligence Scale for Children (WISC) | 0 | Predictive
5
Stanford-Binet Intelligence Scales | 0 | Predictive
6
Woodcock-Johnson Tests of Cognitive Abilities | 0 | Predictive
7
Kaufman Assessment Battery for Children | 0 | Predictive
8
Cognitive Assessment System | 0 | Predictive
9
Differential Ability Scales | 0 | Predictive
10
Raven's Progressive Matrices | 0 | Predictive
11
Cattell Culture Fair III | 0 | Predictive
12
Reynolds Intellectual Assessment Scales | 0 | Predictive
13
Thurstone's Primary Mental Abilities | 0 | Predictive
14
Kaufman Brief Intelligence Test | 0 | Predictive
15
Multidimensional Aptitude Battery II | 0 | Predictive
16
Das–Naglieri Cognitive Assessment System | 0 | Predictive
17
Naglieri Nonverbal Ability Test | 0 | Predictive
18
Animal Behavior
19
Pavlov's Saliva Measurements | How much did a dog salivate? | Classical Conditioning Research
I hope so, a lot of science is based on it
0 | Evaluative
https://sites.psu.edu/dps16/2016/02/18/pavlovs-dogs/ , https://www.youtube.com/watch?v=NzBDScsHL44
20
Inebriometer | How drunk is your fruit fly?
Enable evaluation of intoxication, mostly for examining genes' effects on alcohol processing
0 | Evaluative
21
Market Value | "How much can I sell X for?" | Insurance, sales, taxes
22
Art Appraisal | High | Depends on use
23
Subject Matter Tests
24
AP Exams | Evaluative
25
SAT IIs | $22 | Evaluative
26
NYS Regents Exams
27
Academic Admissions Exams
28
29
SATs
30
ACTs
31
Job Performance Evaluations- Teachers
"How useful is this teacher (possibly relative to other teachers)?"
Hiring, firing, incentive-based pay | Both
32
Specific types of evaluation would go here
33
Job Performance- Software Engineer
34
35
Job or Task Performance- Generic/Other
36
Stack Ranking | How is a worker performing relative to their peers? | Firing, promotions, raises
Depends on the situation: it can work initially in a bloated company, but degrades as the deadweight is removed
Corporate profits, morale
Lowered morale, lower employee cooperation, best employees leave
Self-validating | Very high | Evaluative
https://www.perdoo.com/resources/stack-ranking/#:~:text=Stack%20ranking%20is%20a%20practice,General%20Electric%20in%20the%201980s.
37
Peer Review on a Paper
Is this paper empirically correct? Interpreted correctly? Impactful?
Journals deciding whether to publish, decline, or ask for rewrites of a submission
Depends on the field | Paper retractions | $1,000 | Self-validating | Very high | Evaluative
38
Code Review
What changes does this code need before being checked in?
Coding projects with more than one contributor
Performance, future defects, necessity of refactoring
$500 | High | Evaluative
https://en.wikipedia.org/wiki/Code_review
39
Animal Evaluation
40
Westminster Dog Show | Conformation to breed standards set by Westminster | Give dog lovers a hobby
Dogs conform to breed standards more but are becoming less healthy
Longevity, health, and behavior of dogs under their jurisdiction
41
Thoroughbred Racing | Which horse can run the fastest? | Leisure | Self-justifying | 0 | Both
42
Athletics
43
Rhythmic Gymnastics | Leisure | Self-validating
44
100 meter dash | Which human can run the fastest over 100 meters? | Leisure
45
Moneyball | Which athletes bring in the most wins per dollar? | Financial gain | Temporarily | Game wins | $200,000/year | $0 | Medium | Predictive
Cost to Develop is a WAG based on salary of practitioner
https://grantland.com/features/the-economics-moneyball/
46
Rock Climbing Difficulty Grades | How difficult is this climb?
Allow climbers to choose the right difficulty level for themselves
People commonly call them unreliable, but in practice the ratings are remarkably stable
High | Evaluative
https://forum.effectivealtruism.org/posts/oTN5t79mXRpafHDsL/prize-interesting-examples-of-evaluations?commentId=Tptkju6rKq7kbBNmm
47
Food
48
USDA Egg Grading | How good is this egg? | Decide what is fit for human consumption | Self-validating | $0.01 | Self-validating | High | Evaluative
Marginal cost is a guess based on inspector salaries
https://www.ams.usda.gov/grades-standards/egg/grade-shields , https://www.ams.usda.gov/services/grading/fees#egg
49
Medical
50
Mammogram | Density and regularity of density of breast tissue | Early diagnosis of breast cancer
Yes, although with many false negatives
False negatives. We care about false positives, but those are significantly harder to measure
$100 | Medium | Predictive
https://en.wikipedia.org/wiki/Mammography , https://cityhospital.co/cost-of-a-mammogram/#:~:text=How%20Much%20Is%20a%20Mammogram,depends%20on%20where%20it's%20done.
51
Apgar | Overall health of a newborn baby
Should resuscitation be continued? Are interventions necessary?
Yes, as part of a larger set of cultural changes
Decrease in child mortality/stillbirths (babies would previously be recorded as stillborn when they were alive and revivable)
$1,000,000 | $1
Unknown, no control
Medium | Predictive
Cost to develop is a WAG for developer's career earnings, since the test was basically operationalizing her intuitions
https://healthmatters.nyp.org/apgar-score/ , https://en.wikipedia.org/wiki/Apgar_score
52
Checklist Manifesto
53
54
Physics
55
Digital Thermometer | Temperature | Many | Yes | Other thermometers | $0
56
Ruler
57
Scale
58
Air Quality Meter | 0
59
Professional Admittance Exams
60
Chinese Imperial Exam (Sui dynasty, 581-618) | Fitness for work as a Chinese bureaucrat | Merit-based hiring by ancient Chinese bureaucracy
No (not in this time period). In practice, appointments were still made by recommendation
Yes | Predictive
https://en.wikipedia.org/wiki/Imperial_examination#Sui_dynasty_(581%E2%80%93618)
61
Chinese Imperial Exam (Qing dynasty, 1636–1912)
Fitness for work as a Chinese bureaucrat | Merit-based hiring by ancient Chinese bureaucracy | Predictive
https://en.wikipedia.org/wiki/Imperial_examination#Qing_dynasty_(1636%E2%80%931912)
62
Armed Forces Qualification Test 1965 | Usefulness as a soldier to the US military
Determine admission and placement to/in the US military
Yes. Troops admitted as part of a standards-lowering initiative in 1966 performed noticeably worse, suggesting the overall evaluation was predictive. They had 3x the casualty rate (overall, not controlling for role; they were 2x as likely to see combat), were reassigned 11x more often, and were 7-9x more likely to need remedial training
Death rate, reassignment rate
$20 | No | Predictive
Test is free to takers; I guessed at marginal cost from the SATs
https://en.wikipedia.org/wiki/Project_100,000 , https://bigthink.com/politics-current-affairs/story-behind-mcnamaras-morons?rebelltitem=1#rebelltitem1 , https://medium.com/@LivingHistory/project-100-000-the-mentally-disabled-men-who-fought-in-vietnam-1cbe145cc126
63
California General Electrician Certification Exam | Knowledge of electrical repair | Admission to become a certified electrician in CA | $100 | No? | Evaluative
https://www.dir.ca.gov/dlse/ecu/electricaltrade.html
64
Institutions
65
Netflix Chaos Monkey
How robust is Netflix's system against random server failures?
Find bugs in Netflix's system while they are easy and cheap to fix
Yes, according to Netflix (actual numbers not released)
Customer downtime | $1m/year | $10,000 | NA | No | Evaluative
https://www.gremlin.com/chaos-monkey/
66
Dodd-Frank Act Stress Testing
How robust is a financial institution against various potential problems?
Discovering failure points ahead of time so they can be remedied before a financial crisis
No, according to Tim Harford
Actual robustness to financial crises (which haven't repeated since 2008)
$10b/year
Low (according to Tim Harford)
Yes | Predictive
Marginal cost is a WAG based on https://www.americanbanker.com/slideshow/the-real-world-impact-of-dodd-frank-stress-tests-and-other-regs
https://www.econtalk.org/tim-harford-on-the-virtues-of-disorder-and-messy/ , https://en.wikipedia.org/wiki/Stress_test_(financial)
67
68
69
Internet
70
PageRank
71
HN Rankings
72
Interpersonal
73
Dueling
74
75
Other
76
Polygraph
77
78
79
80
Justice
81
20th Century Trial
82
Trial by Combat
83
Trial by Ordeal
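
Note on the Armed Forces Qualification Test row: the 3x casualty figure is overall and not controlled for role, while the same troops were 2x as likely to see combat. A rough sketch of the implied adjustment follows. It assumes casualties scale in proportion to combat exposure, which the table does not state, and the function name `exposure_adjusted_ratio` is hypothetical, introduced only for illustration.

```python
def exposure_adjusted_ratio(overall_ratio: float, exposure_ratio: float) -> float:
    """Divide an overall rate ratio by the ratio of exposure to the risk.

    Assumption (not from the table): casualties scale linearly with
    the chance of seeing combat.
    """
    if exposure_ratio <= 0:
        raise ValueError("exposure_ratio must be positive")
    return overall_ratio / exposure_ratio


# Figures quoted in the Armed Forces Qualification Test row.
overall_casualty_ratio = 3.0  # 3x the casualty rate overall
combat_exposure_ratio = 2.0   # 2x as likely to see combat

adjusted = exposure_adjusted_ratio(overall_casualty_ratio, combat_exposure_ratio)
print(f"Casualty ratio per unit of combat exposure: {adjusted:.1f}x")
```

Under that assumption, about half of the raw gap is explained by role assignment rather than by performance once in combat, so the 3x figure is best read as an upper bound on the test's predictive signal for casualties.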