Predicting Human Deliberative Judgments (2018)
Owain Evans (FHI), Andreas Stuhlmüller, Chris Cundy, Ryan Carey, Zachary Kenton, Thomas McGrath, Andrew Schreiber (Ought)
How to spend a million dollars cleverly and get no conference paper out of it
Moravec’s paradox
“Hard things are easy for computers, easy things are hard”
(Theorem proving, matrix multiplication, search) vs (object recognition, walking, few-shot learning)
Labelling, fast and slow
Recall AlphaGo: double imitation
Desiderata for a deliberative dataset
Two tasks: “Fermi” (weird composite estimates) and “Politifact” (online research for verifying news stories)
Open dataset of “slow judgments”
n=25,000 probability judgments on different timescales.
What are they trying to do?
Enough talk
p_u(q, t): user u’s probability judgment on question q given deliberation time t
p_u(q, 2): slow judgment, time category t = 2
Collaborative Filtering (KNN and SVD)
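A minimal numpy sketch of the SVD variant: treat the (user × question) judgments as a partially observed matrix, mean-impute the gaps, and read predictions off a low-rank approximation. The matrix values here are invented for illustration.

```python
import numpy as np

# Hypothetical users x questions matrix of probability judgments; NaN = unanswered.
R = np.array([
    [0.9, 0.2, np.nan, 0.7],
    [0.8, np.nan, 0.4, 0.6],
    [np.nan, 0.3, 0.5, 0.7],
], dtype=float)

# Mean-impute missing entries per question, then take a rank-k SVD.
col_means = np.nanmean(R, axis=0)
filled = np.where(np.isnan(R), col_means, R)

k = 2
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
R_hat = (U[:, :k] * s[:k]) @ Vt[:k]   # rank-k approximation
R_hat = np.clip(R_hat, 0.0, 1.0)      # judgments are probabilities

# Predicted judgment for user 0 on the question they skipped:
pred = R_hat[0, 2]
```

The KNN variant instead averages the judgments of the most similar users on the held-out question.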
Neural Collaborative Filtering: NN maps the latent question and user embeddings to judgments
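A sketch of the forward pass, assuming the simplest architecture consistent with the description: concatenate learned user and question embeddings and feed them through an MLP with a sigmoid output. Weights are random here rather than trained; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_questions, d = 10, 20, 4

# Latent embeddings (learned jointly with the MLP in the real model; random here).
user_emb = rng.normal(size=(n_users, d))
q_emb = rng.normal(size=(n_questions, d))

# One-hidden-layer MLP mapping [user ; question] to a probability judgment.
W1 = rng.normal(size=(2 * d, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1));     b2 = np.zeros(1)

def predict(u, q):
    x = np.concatenate([user_emb[u], q_emb[q]])
    h = np.maximum(x @ W1 + b1, 0.0)                  # ReLU
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)[0]))   # sigmoid -> (0, 1)

p = predict(3, 7)
```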
Hierarchical Bayesian Linear Regression
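The core idea of the hierarchical model is partial pooling: each user's parameters are shrunk toward a population-level prior, with stronger shrinkage for users who contributed few judgments. A toy sketch with invented data and (for simplicity) known variances, pooling only per-user means:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: each user's judgments scatter around a per-user bias.
counts = np.array([2, 5, 20, 3, 50])          # judgments per user
true_bias = rng.normal(0.6, 0.1, size=len(counts))
data = [rng.normal(true_bias[u], 0.05, size=counts[u]) for u in range(len(counts))]

sigma2 = 0.05 ** 2   # within-user variance (assumed known for this sketch)
tau2 = 0.1 ** 2      # between-user variance (assumed known for this sketch)
global_mean = np.mean(np.concatenate(data))

# Posterior mean for each user: precision-weighted blend of that user's
# raw mean and the population mean. Few judgments => pulled toward the prior.
post_means = np.array([
    (counts[u] / sigma2 * np.mean(data[u]) + global_mean / tau2)
    / (counts[u] / sigma2 + 1.0 / tau2)
    for u in range(len(counts))
])
```

In the full model the same pooling applies to regression weights over question features, not just intercepts.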
Limitations
Limitation: can’t distinguish really good labellers
Ultra detailed data?
record the individual steps people take during deliberating
requiring users to make their reasoning explicit
recording their use of web search.
Or the usual (doomed?) fallback of adding structure to the NN