Three Challenges for Recommender Regulation
Jonathan Stray
Senior Scientist
Center for Human Compatible AI, UC Berkeley
2022-5-9
Recommenders are the largest deployed AI systems
Information
Products
Jobs
Recommenders are personalized information filters
This talk: ranking, not moderation
(what is shown, not what is not shown)
Three Challenges for Recommender Regulation
No Neutral Baseline
Standardized Outcome Measurements
Transparency of What?
No Neutral Baseline
Which of these is neutral?
The proposed Protecting Americans from Dangerous Algorithms Act (H.R. 2154) says platforms are immune from liability if they sort (see the code sketch after this list):
(aa) chronologically or reverse chronologically;
(bb) by average user rating or number of user reviews;
(cc) alphabetically;
(dd) randomly; and
(ee) by views, downloads, or a similar usage metric
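Even these "immune" orderings embed design choices. Here is a minimal Python sketch, purely illustrative (every item field, value, and variable name below is invented, not taken from the bill or any platform), of the decisions each ordering forces about tie-breaking, timestamps, seeds, and which counter counts as a "usage metric":

```python
from datetime import datetime
from random import Random

# Hypothetical item records; all field names and values are invented.
items = [
    {"title": "B", "posted": datetime(2022, 5, 1), "avg_rating": 4.2, "views": 900},
    {"title": "a", "posted": datetime(2022, 5, 3), "avg_rating": 4.2, "views": 1500},
]

# (aa) chronological / reverse chronological: still needs rules for ties,
# edits, and reposts, plus a choice of which timestamp to use.
by_time = sorted(items, key=lambda i: i["posted"], reverse=True)

# (bb) by average rating or review count: equal ratings need a tie-break
# (here, views), and someone decides how ratings are aggregated.
by_rating = sorted(items, key=lambda i: (i["avg_rating"], i["views"]), reverse=True)

# (cc) alphabetical: case folding and locale handling are themselves choices.
by_title = sorted(items, key=lambda i: i["title"].casefold())

# (dd) random: someone picks the seed and how often the order reshuffles.
rng = Random(42)
by_random = list(items)
rng.shuffle(by_random)

# (ee) by views, downloads, or "a similar usage metric": which counter,
# over what time window, filtered how (bots, reposts)?
by_views = sorted(items, key=lambda i: i["views"], reverse=True)
```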
Popularity is not neutral
We find a regime of intermediate exploration cost where an optimal balance exists, such that choosing what is popular actually promotes high-quality items to the top. Outside of these limits, however, popularity bias is more likely to hinder quality.
How algorithmic popularity bias hinders or promotes quality
Nematzadeh et al., 2017
[Figure: popularity bias vs. user exploration (Nematzadeh et al., 2017)]
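A toy simulation, not the authors' model (the adoption rule and every parameter below are invented for illustration), of one part of this finding: when users rarely explore, popularity-based selection tends to lock in early winners rather than the highest-quality items, while adding exploration lets popularity do a better job of surfacing quality.

```python
import random

def simulate(exploration: float, n_items: int = 50, n_users: int = 5000, seed: int = 0) -> float:
    """Toy model: each arriving user either explores a uniformly random item
    (with probability `exploration`) or picks an item proportional to its
    current popularity; they then adopt it with probability equal to its quality.
    Returns the average quality of the 5 most-adopted items."""
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(n_items)]
    adoptions = [1] * n_items  # pseudo-count so popularity sampling is well-defined
    for _ in range(n_users):
        if rng.random() < exploration:
            item = rng.randrange(n_items)
        else:
            item = rng.choices(range(n_items), weights=adoptions)[0]
        if rng.random() < quality[item]:
            adoptions[item] += 1
    top = sorted(range(n_items), key=lambda i: adoptions[i], reverse=True)[:5]
    return sum(quality[i] for i in top) / len(top)

def average_over_runs(exploration: float, runs: int = 20) -> float:
    return sum(simulate(exploration, seed=s) for s in range(runs)) / runs

for explore in (0.0, 0.2, 1.0):
    print(f"exploration={explore:.1f}  "
          f"mean quality of most-popular items: {average_over_runs(explore):.2f}")
```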
Chronological is not neutral
A chronological newsfeed can be spammed by people or bots posting the same or similar things every few seconds. … If 50 people I follow on Twitter all retweet the same post, should I see that post 50 times, or one time? Showing a popular post just once would obscure the other 49 people’s posts.
…
Another problem has to do with human behavior. There is reason to believe that a purely chronological system will show more “borderline” content—material that almost, but not quite, violates whatever speech prohibitions a platform enforces.
Amplification and its Discontents
Daphne Keller, 2021
“Amplification” can be defined for Twitter…
…but a chronological baseline makes no sense for YouTube, Google News, Spotify, Amazon.
When Twitter introduced machine learning to personalize the Home timeline in 2016, it excluded a randomly chosen control group of 1% of all global Twitter users from the new personalized Home timeline.
...
We define the amplification ratio of set T of tweets in an audience U as the ratio of the reach of T in U intersected with the treatment [algorithmic] group and the reach of T in U intersected with the control [chronological] group.
Algorithmic amplification of politics on Twitter
Huszár et al., 2022
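Written out (notation mine, not the paper's), with reach left as the paper defines it:

```latex
% Amplification ratio of a set of tweets T within an audience U,
% where reach(T, G) is the reach of T within group G, as defined in the paper.
a(T, U) \;=\;
  \frac{\operatorname{reach}\bigl(T,\; U \cap \text{treatment group}\bigr)}
       {\operatorname{reach}\bigl(T,\; U \cap \text{control group}\bigr)}
```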
Standardized Outcome Measurements
The Leaked Experiment We Wish We Could Do
Linear regression on user histories
Randomized controlled trial (a toy contrast with the regression approach is sketched below)
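A purely synthetic sketch of why the two approaches can disagree (every number and variable name below is invented; this is not Facebook's analysis): when a hidden factor drives both heavier use and lower well-being, regression on user histories reports a strong negative association even though the randomized change has, by construction, zero true effect.

```python
import random

rng = random.Random(0)
N = 100_000

# Synthetic world: hidden "baseline distress" drives BOTH heavier app use and
# lower well-being; the hypothetical product change has zero true effect.
users = []
for _ in range(N):
    distress = rng.gauss(0, 1)
    hours = max(0.0, 2 + 0.8 * distress + rng.gauss(0, 0.5))  # observed usage history
    treated = rng.random() < 0.5                               # randomly assigned change
    wellbeing = 5 - 1.0 * distress + 0.0 * treated + rng.gauss(0, 0.5)
    users.append((hours, treated, wellbeing))

# "Linear regression on user histories": slope of well-being on hours of use.
mean_h = sum(u[0] for u in users) / N
mean_w = sum(u[2] for u in users) / N
slope = (sum((u[0] - mean_h) * (u[2] - mean_w) for u in users)
         / sum((u[0] - mean_h) ** 2 for u in users))
print(f"observational slope (well-being per hour of use): {slope:+.2f}")  # strongly negative

# Randomized controlled trial: compare mean well-being, treated vs. control.
treated_w = [u[2] for u in users if u[1]]
control_w = [u[2] for u in users if not u[1]]
ate = sum(treated_w) / len(treated_w) - sum(control_w) / len(control_w)
print(f"RCT estimate of the change's effect: {ate:+.2f}")  # near zero, the true effect
```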
Facebook Knows Instagram Is Toxic for Teen Girls, Company Documents Show
Wall Street Journal, 2021
For years, Facebook experimented with hiding the tallies of “likes” that users see on their photos. Teens told Facebook in focus groups that “like” counts caused them anxiety and contributed to their negative feelings.
When Facebook tested a tweak to hide the “likes” in a pilot program they called Project Daisy, it found it didn’t improve life for teens. “We didn’t observe movements in overall well-being measures,” Facebook employees wrote in a slide they presented to Mr. Zuckerberg about the experiment in 2020.
An IEEE standard with 1000 well-being metrics
IEEE 7010: A New Standard for Assessing the Well-being Implications of AI,
Schiff, Ayesh, Musikanski, Havens, 2020
Thinking Through Regulation of Outcomes
Direct regulation of outcomes would require standardization of which outcomes to measure, how to measure them, and what values are acceptable…
…all on a per-domain basis (social media, news, music, shopping, etc.)
Transparency of What?
Transparency Policy Possibilities - Code
Recommender Code
A lot of it! Hard to interpret, security and IP challenges.
May not tell us anything about user outcomes, though system architecture is helpful context.
Major “features” or “parameters”
This sort of language is common (e.g. in the DSA), but it is unclear what it means or what could be done with this information.
Transparency Policy Possibilities - Data
User outcomes
Probably a key component of ensuring healthy platforms, but we are far from consensus on which metrics to monitor in which domains, and what acceptable values are.
User trajectories
What was shown to each user, what they did.
Could create an API for aggregated data protected with differential privacy.
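A minimal sketch of what such an API could release, assuming the Laplace mechanism as the privacy layer (the sources, counts, and epsilon below are all hypothetical, not any platform's actual interface):

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, sampled as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_release(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Release a user count with epsilon-differential privacy.
    Sensitivity is 1 (adding or removing one user changes the count by at most 1),
    so Laplace noise with scale 1/epsilon suffices."""
    return round(true_count + laplace_noise(1.0 / epsilon, rng))

# Hypothetical aggregate: how many users in a cohort were shown items from each source.
# Small counts become unreliable by design, which is what protects individuals.
rng = random.Random(0)
exposures = {"source_a": 12_400, "source_b": 310, "source_c": 57}
noisy = {source: dp_release(count, epsilon=0.5, rng=rng) for source, count in exposures.items()}
print(noisy)
```

Repeated queries consume privacy budget, so an API like this would also need to track cumulative epsilon per requester.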
Transparency Policy Possibilities - Documentation
Risk Assessments
In the DSA and various draft bills. No standard definition of “risk” or process yet.
Plausible, but be careful that disclosure doesn’t disincentivize asking hard questions.
Platform “change log”
Transparency of the decision-making process and the objectives driving it.
Explanation of each major change, and the data that drove it.
Transparency Policy Possibilities - Research
Audits
Right of external audit in the DSA for various “risks”
Researcher data access via the FTC in the proposed Platform Accountability and Transparency Act, but limited to observational analyses.
External research collaborations
Requires protection of platform, user, and public interests.
Experiments possible.
Probably the only way to get reliable causal inference of platform effects.
What Now?
Suggested Responses to Regulatory Challenges
Learn to live without “neutral” choices
Don’t assume that comparisons to “no algorithm” are desirable or even possible (and “amplification” is such a comparison).
Specify how harms are to be measured (and who decides)
There is currently no consensus on which outcomes matter most on which types of platforms, and how to measure them in a causally robust way.
Enable social science
Transparency of “code” or “parameters” is unlikely to be helpful. Focus on privacy-protected APIs that give insight into what users saw and did, and contractual structures for industry-academic research collaborations.
Thanks!
@jonathanstray