1 of 75

Presentation for EA Anywhere, 5 Nov. 2023

YouTube video: The Unjournal: Bridging the gap between EA and academia

David Reinstein - Founder and Co-Director

Unjournal.org: Our full explanation and progress

These slides are linked at bit.ly/ujpresents; see speakers’ notes for more details

Also see

Intro - The Unjournal Presentation for March 25 Event

2 of 75

What is The Unjournal?

The Unjournal is not a journal.

We coordinate and fund the public evaluation of research projects in any format.

We’re building an open, sustainable system

for evaluation, feedback, ratings, and assessment.

Our initial focus is quantitative work that informs global priorities,

especially in economics, policy, and other social sciences.

Links: Unjournal.org, An Introduction to The Unjournal

Output: unjournal.pubpub.org

3 of 75

“Academic peer review” (background)

In Economics:

  • ~A ‘working paper’ is publicly released

  • The ‘publication’ (review) process takes 6 months to 10 years

  • At the end, we only know “which journal it ‘landed’ in”

4 of 75

Main ingredients

Research Submission/Identification and Selection

Paid Evaluators (AKA 'reviewers')

Eliciting Quantifiable and Comparable Metrics

Public Evaluation

Linking, Not Publishing

Financial Prizes

Transparency

5 of 75

Our Theory of Change

6 of 75

To believe The Unjournal has value, one must believe...

  1. Research matters
    • Rigorous prioritization research can positively influence funding, decision-making, and/or policy.

  2. Rigor and expertise add value.

  3. (Peer) review & evaluation can add value to research (and research use).

  4. The status quo peer-review system is suboptimal
    • Academic publishing has substantial room for improvement, and/or
    • The global-priorities-relevant research world would benefit from more scrutiny.

  5. The UJ’s approach can succeed

7 of 75

EA Research

  • High focus on impact
  • High flexibility/agency/agility
  • High coordination on innovations
  • Limited subject matter expertise
  • Limited external credibility
  • Limited formal feedback, evaluation, or quality-control processes

Academic Research

  • High general resources
  • High subject matter expertise
  • High prestige and credibility
  • Limited incentive/funding for impact
  • Limited flexibility/agency
  • Inefficient, rent-seeking publishing ecosystem
  • Limited ability to coordinate innovation

Commission Direct Public Evaluation

The Unjournal Collaboration

  • Focus on impact
  • Flexibility, agency, agility, innovation
  • Funding to change incentives
  • Connection to resources
  • Subject matter expertise
  • Prestige and credibility

8 of 75

Our Approach: leveraging problem synergy

  • EA needs research, but EA research has problems.
    • Especially: rigorously answering the question “what is most impactful?”
    • EA research can struggle with rigor and external credibility.

  • Academia could be highly valuable to EA.
    • It conducts “impact-adjacent” research.
    • It has immense resources (~2.5% of US GDP), expertise, and cachet.
  • “Peer review” is broken in several ways.

Solution: Direct public evaluation of (global-priorities-relevant) research

  • Addresses many limitations of the outdated “journal system”
  • Leverages this to shift attention and resources towards impact

→ The Unjournal is funding, organizing, and scaling this up. See our full ToC here.

9 of 75

10 of 75

Impactful Research and Evaluation:

What is it and why does it matter?

11 of 75

How do we ‘have a positive impact’?

  1. Define a moral framework. What matters?
    • E.g., human and animal well-being over the long run

  2. Work towards success according to that framework.
    • How do we make progress on the things that matter?
    • Which activities are most likely to make things better?

12 of 75

How can research support impact?

Research needs to produce true, useful information which enables better decision-making, driving choices and behavior that lead to better outcomes.

This can occur through:

  • Influencing resource allocation, i.e., via funding decisions
  • Affecting the nature of policies or other interventions
  • Providing models and logical arguments to improve decision-makers’ thinking

→ We focus on global-priorities-relevant research.

13 of 75

In brief: some of The Unjournal’s paths to impact

14 of 75

Prioritizing research: existing frameworks

EA “Cause Areas”, INT framework. Ex.: 80k problem profiles, OP focus areas.

But cause prioritization ≠ research prioritization!

  • Crux/pivotal issues: uncertainty and importance
  • Value of information: how much would a decision-maker pay for the information gain (generated by the research) prior to making a decision? (See the sketch below.)
  • Indirect evidence: researcher credibility and track record
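
One textbook way to make “value of information” concrete (this is the standard expected-value-of-sample-information formula, shown purely for illustration; it is not an Unjournal metric):

\mathrm{EVSI} \;=\; \mathbb{E}_{s}\!\left[\,\max_{a}\ \mathbb{E}_{\theta \mid s}\, u(a,\theta)\right] \;-\; \max_{a}\ \mathbb{E}_{\theta}\, u(a,\theta)

Here a ranges over the decision-maker’s options (e.g., grant allocations), \theta is the uncertain state of the world, u(a,\theta) is the value of the resulting outcome, and s is the signal the research would provide. Research is a stronger candidate for evaluation where this gap is plausibly large.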

15 of 75

Cf. the GPI framework

  1. Foundations
    • Moral theory, decision theory, epistemology

  2. “Theory”-building to support intervention design
    • E.g., behavioral research on altruism, game theory & peacebuilding

  3. Empirical measurement of interventions & outcomes
    • E.g., development economics, monitoring & evaluation, cost-benefit and predictive modeling

Increasingly relevant to the UJ focus (moving down this list)

16 of 75

What global-priorities-relevant research does Unjournal (currently) cover?

  1. Fields: “Human behavior and its consequences”
    • Economics and quantitative social science (psychology, political science, etc.)
    • Business/policy, forecasting, cost/benefit, etc.
    • Not: philosophy, computer science (AI interpretability), animal behavior, pure math

  2. Approaches
    • Empirical measurement (evidence)
    • ~Theory/modeling and methodology with direct applications to policy/prioritization
    • Not: pure theory, research inputs, shallow reviews, informal discussion

  3. Sources and formats
    • Academic-aimed work
      • Working papers/preprints; ideally notebooks, dynamic formats, etc.
      • Journal-published ‘under-evaluated’ work
    • Our ‘rigorous policy/non-academic papers’ stream (10-20%)

  4. Causes/outcomes, including… →

17 of 75

Our “Field Specialist” groups (updated 3 Nov. 2023; see “Our Team” for more)

  • GH&D (global health & development)
  • Development economics
  • Economics/welfare
  • Psychology and attitudes
  • Innovation, meta-science
  • Catastrophic risks & AI governance
  • Environmental economics

Building: animal welfare; social impact of tech; macro/growth/finance; long-term trends and demographics

18 of 75

Why should research be evaluated?

19 of 75

The value of journals

Journals are not really publishers; we have arXiv (and RePEc, etc.) for that.

Journals are evaluators: they offer quality control, credibility, and prestige.

20 of 75

The value of evaluation

Rigor & Quality Control. Researchers should receive feedback and be held accountable to high standards of logic and evidence.

21 of 75

The value of evaluation

Credibility, domain, usefulness. Research-users want to know how much to trust research, update their beliefs, & adjust their decisions in different contexts.

(Without validating it all themselves)

22 of 75

The value of evaluation

Prioritizing research. We need a way to choose which researchers and organizations should receive more funding.

23 of 75

The state of the art in EA research evaluation

  • Ad-hoc internal discussion.

  • Ad-hoc external feedback.
    • EA Forum posts (academics rarely engage, technical posts get fewer comments)
    • Blogs
    • Widely shared Google docs

  • Some private red-teaming takes place

  • Occasional formal academic publications*

24 of 75

Weak underbelly of “EA/GPI/adjacent research”?

HLI on Deworming: Discrepancy between GiveWell’s data and model

25 of 75

26 of 75

Bates:

… A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). … these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.

Authors:

We agree that this estimate from the published work is likely low and have since updated our view on cost upwards. The nuclear war probability utilized does not include other sources of nuclear risk such as accidental detonation of nuclear weapons leading to escalation, intentional attack, or dyads involving China.

27 of 75

Other ‘wins, rethinks, and potential’

  • CBT: little long-term effect? See UJ evaluations of “The Comparative Impact of Cash Transfers and a Psychotherapy Program…”

  • Animal welfare: attitudes, interventions, markets – almost no rigorous work in economics+

  • A major water-quality grant largely based on an unreviewed paper?

28 of 75

Apr. 2022: Evidence Action’s Dispensers for Safe Water program receives “… a remarkable new investment of up to $64.7 million,” “recommended by GiveWell… and funded by Open Philanthropy.”

“Underpinned by rigorous research by Nobel Laureate Michael Kremer and colleagues…”

a recent meta-analysis by Michael Kremer … shows that water treatment reduces the odds of mortality of children under five, from all causes, by around 25%.

Release cites:

  1. “Social Engineering: Evidence from a Suite of Take-up Experiments in Kenya”; 2011 working paper without evident peer review

  2. “Water Treatment and Child Mortality: A Meta-analysis and Cost-effectiveness Analysis”; working paper updated 2023, no evident peer review

29 of 75

Why not just use academic publishing?

30 of 75

31 of 75

Problems with academic publishing:

  1. It rewards ‘flexing’ and research ties, not realism and impact.

  2. Rents & barriers to research access.

  3. Static, limited formats: “the PDF prison”.

  4. Inefficient, convoluted systems divert researcher effort.

  5. Private evaluation keeps users in the dark and slows down feedback loops.

32 of 75

Why are we still doing this?

33 of 75

How do we solve these problems?

34 of 75

Towards a New Equilibrium: The Unjournal

Commission evaluations which are:

  • Modular
  • Public
  • Paid
  • Credible
  • Quantifiable
  • Impact-Oriented

35 of 75

A New Equilibrium: The Unjournal

1. Rents & barriers to research access.

2. Static, limited formats: the PDF prison.

3. Gaming the system: wasted research & review effort.

4. Encourages academic flexing.

36 of 75

A New Equilibrium: The Unjournal

1. Rents & barriers to research access. Completely free to access.

2. Static, limited formats: the PDF prison.

3. Gaming the system: wasted research & review effort.

4. Encourages academic flexing

37 of 75

A New Equilibrium: The Unjournal

1. Rents & barriers to research access. Completely free to access.

2. Static, limited formats: the PDF prison. Open to any format.

3. Gaming the system: wasted research & review effort.

4. Encourages academic flexing.

38 of 75

A New Equilibrium: The Unjournal

1. Rents & barriers to research access. Completely free to access.

2. Static, limited formats: the PDF prison. Open to any format.

3. Gaming the system: wasted research & review effort. Paid, quantified, public evaluations.

4. Encourages academic flexing.

39 of 75

A New Equilibrium: The Unjournal

1. Rents & barriers to research access. Completely free to access.

2. Static, limited formats: the PDF prison. Open to any format.

3. Gaming the system: wasted research & review effort. Paid, quantified, public evaluations.

4. Encourages academic flexing. Directly rated on credibility, relevance & impact.

40 of 75

Progress, Challenges, and Roadmap

41 of 75

The Unjournal’s paths to impact

See here for an explanation in context

42 of 75

The academic collective action problem

Overcoming academic inertia is hard. Image source.

43 of 75

Overcoming the academic collective action problem

Our advantages:

  • Ability to take risks
  • External funding (and incentives)
  • Bridging steps for first movers
  • Rewards for early adopters

Make ourselves impossible to ignore.

UJ evaluations should be public before traditional journals do their reviews.

44 of 75

Our progress: evaluation

(Pilot) output hosted at unjournal.pubpub.org:

  • 10 research papers evaluated
  • 21 evaluations
  • 5 author responses
  • “Prize winners” chosen

45 of 75

46 of 75

Systems for prioritization, evaluation, aggregation…

47 of 75

Our workflow …

  1. Submission or selection

  2. Prioritization and ‘what to evaluate’

  3. Engage authors

  4. Assign ‘evaluation manager’

48 of 75

… workflow

5. Source evaluators with relevant complementary expertise

6. Evaluation & rating process

7. Author response

8. Eval. Manager summary+

9. “Publish” package, link to bibliometrics+

49 of 75

We built a team

See “Our Team”

  • Management committee = 8
    • ~Mainly mid-career econ. & psych. academics with EA and open-science interests

  • Advisory board = 14
    • Range of disciplines, career paths (policy, academia, EA, consulting), experience

  • Field Specialists = 29 (including many of the above)
    • Seeking to fill some further gaps here (AI safety/GCR connections, animal welfare, macro/finance…)

  • Contracted staff = 9
    • Ops, comms, tech support, research support

50 of 75

Building, improving, and grounding/benchmarking

  1. Our ‘evaluation management platform’ (PubPub)
  2. Our dissemination/communication platform (PubPub)
  3. Evaluation interface and guidelines

  4. Criteria and systems for monitoring and prioritizing research
  5. Criteria for evaluations (consulting research-users!)

  6. Partnerships and connection tools (replication, prediction markets, etc.)

51 of 75

Our roadmap ahead

  1. Raise awareness

  2. Establish credibility

  3. Scale and broaden scope

  4. Build tools and systems to help us grow without compromising quality

52 of 75

Challenges, pivotal choices, and heavy lifts

Not there yet: Get some ‘crowned heads of academia’ on board. Commitments from open science/open access orgs. Get submissions of “dynamic docs” (code/data).

Bigger fish: Getting UJ evaluations to ‘count’ in academia and research orgs.

Unanticipated challenges: the most recent version of a paper is often not public (“hidden”); author engagement

Big questions: Evaluation criteria and aggregation. Useful outputs for research-users. Quantification/quantified uncertainty. ‘Performance’ incentives for evaluators; evaluation manager oversight.
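
On quantification and aggregation, here is a minimal, purely illustrative sketch (the Rating fields, the normal approximation, and the precision-weighting rule are all assumptions for illustration, not The Unjournal’s adopted method): it combines each evaluator’s midpoint rating and 90% credible interval into a precision-weighted summary.

  from dataclasses import dataclass
  from math import sqrt

  @dataclass
  class Rating:
      """One evaluator's score on a 0-100 criterion, with a 90% credible interval."""
      midpoint: float
      lower: float  # 5th percentile
      upper: float  # 95th percentile

  def aggregate(ratings: list[Rating]) -> tuple[float, float]:
      """Return a precision-weighted mean rating and its standard error.

      Each 90% interval is treated as roughly +/-1.645 standard deviations
      around the midpoint; evaluators are weighted by 1/variance.
      Purely illustrative."""
      weights, weighted_sum = [], 0.0
      for r in ratings:
          sd = max((r.upper - r.lower) / (2 * 1.645), 1e-6)  # implied standard deviation
          w = 1.0 / sd ** 2
          weights.append(w)
          weighted_sum += w * r.midpoint
      total = sum(weights)
      return weighted_sum / total, sqrt(1.0 / total)

  # Example: three evaluators rating one paper on a single criterion
  mean, se = aggregate([Rating(78, 65, 88), Rating(62, 40, 80), Rating(70, 60, 82)])
  print(f"aggregate rating ~ {mean:.1f} (standard error {se:.1f})")

A weighting rule like this rewards evaluators who state tight, well-calibrated intervals; whether that is desirable is exactly the kind of open question flagged above.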

53 of 75

How you can get involved

  • Help us solve the collective action problem by engaging with our work
    • Submit your research for evaluation (here)
    • Read, use and cite our evaluations (unjournal.pubpub.org)
    • Spread the word: at your university or org
      • and on socials: Twitter: #unjournal.org/@givingtools, Linkedin, @unjournal.bsky.social
      • Help us find partners, communication ops, event co-hosts*

  • Join our team
    • Become a paid evaluator (join our pool here)
    • Apply to be a field specialist (same link)
    • Suggest work to evaluate (here)
    • We’re hiring! (Ops, management: Description, Application form)

  • Give us feedback (here, on our EA Forum posts, etc)

54 of 75

Questions?

55 of 75

Bonus slides

56 of 75

How?

  1. Identify/solicit relevant research, hosted on any open platform with a time-stamped DOI.

  2. Pay reviewers to evaluate and give careful feedback on this work.
    • Publish the evaluations, with authors’ replies where offered. (Reviewers: opt-in anonymity)
    • Elicit quantifiable and comparable metrics/predictions from reviewers as credible signals (see the sketch after this list).

  3. Link work, don’t ‘publish’ it: not ‘exclusive’.
    • Authors can ‘submit their work to a journal’ at any point,
    • which lets us benchmark our evaluations against ‘traditional publications’.

  4. Financial prizes for the strongest work
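
For concreteness, here is a hypothetical example of what one quantified, comparable evaluation record could look like (the field names, criteria, and scales below are assumptions for illustration, not The Unjournal’s actual schema or criteria):

  # Hypothetical record; all field names and values are placeholders.
  evaluation = {
      "paper_doi": "10.xxxx/placeholder",       # time-stamped DOI of the evaluated work
      "evaluator": "evaluator-1 (anonymous)",   # anonymity is opt-in for reviewers
      "ratings": {                              # 0-100 midpoints with 90% credible intervals
          "methods_credibility": {"midpoint": 72, "ci_90": [60, 83]},
          "relevance_to_global_priorities": {"midpoint": 80, "ci_90": [70, 90]},
      },
      "journal_tier_prediction": {"midpoint": 3.5, "ci_90": [2.5, 4.5]},  # for benchmarking against traditional journals
      "written_evaluation_url": "https://unjournal.pubpub.org/",          # linked publicly, not an exclusive publication
  }

Because the midpoints and intervals share a common scale, evaluations of different papers (and by different evaluators) can be compared and aggregated.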

57 of 75

Replication crisis, p-hacking, fraud/error → transparent ‘research pipeline’ formats

58 of 75

🔎: “Rejected after third revision”

Notorious “THIRD REVIEWER”

Rap stylings

59 of 75

Traditional binary (0/1) ‘publish or reject’ process

Wastes resources

  • Effort, gamesmanship, submitting to the right sequence of journals in clever ways

#sneakypubsorgoodresearch ???


  • Adds unnecessary risk

  • You can’t continue to improve a work and get credit → a ‘clutter’ of new papers

→ Evaluate & rate, don’t accept/reject

60 of 75

Global priorities/EA research orgs need:

  1. Feedback & quality control
  2. Dissemination
  3. External credibility

How?

  • Make our own ‘peer review circles’?
  • Leverage/network with academic research(ers)
  • Submit to traditional journals?
  • Work with the Unjournal (and related)

61 of 75

Key Hurdles…

  1. Academic reluctance to engage
    • Risk-averse
    • Looking for ‘big shots’ and prestige
    • Weirdness/hippie factor

  2. EAs/EA orgs
    • Interested in highly rigorous empirical work?
    • Do we trust academics?

62 of 75

… and about 60 volunteers for the referee pool

63 of 75

Protocol for choosing/communicating work to evaluate

64 of 75

Guideline/form for evaluators (HERE)

65 of 75

66 of 75

67 of 75

68 of 75

69 of 75

70 of 75

71 of 75

Academic publishers

Extract rents and discourage innovation

But there is a coordination problem in ‘escaping’ this.

Funders like Open Phil and EA-affiliated researchers are not stuck; we can facilitate an exit.

72 of 75

Enable new formats and approaches

More readable, reliable, and replicable research formats (e.g., dynamic documents), allowing research projects to continue improving without “paper bloat”

73 of 75

74 of 75

75 of 75