Three Challenges for Recommender Regulation
Jonathan Stray
Senior Scientist
Center for Human Compatible AI, UC Berkeley
2022-5-9
Recommenders are the largest deployed AI systems
Information
Products
Jobs
Recommenders are personalized information filters
This talk: ranking, not moderation
(what is shown, not what is not shown)
Three Challenges for Recommender Regulation
No Neutral Baseline
Standardized Outcome Measurements
Transparency of What?
No Neutral Baseline
Which of these is neutral?
The proposed Protecting Americans from Dangerous Algorithms Act (H.R. 2154) says platforms are immune from liability if they sort (see the code sketch after this list):
(aa) chronologically or reverse chronologically;
(bb) by average user rating or number of user reviews;
(cc) alphabetically;
(dd) randomly; and
(ee) by views, downloads, or a similar usage metric
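Even these "immune" orderings embed design choices. Here is a minimal Python sketch, purely illustrative (every item field, value, and variable name below is invented, not taken from the bill or any platform), of the decisions each ordering forces about tie-breaking, timestamps, seeds, and which counter counts as a "usage metric":

```python
from datetime import datetime
from random import Random

# Hypothetical item records; all field names and values are invented.
items = [
    {"title": "B", "posted": datetime(2022, 5, 1), "avg_rating": 4.2, "views": 900},
    {"title": "a", "posted": datetime(2022, 5, 3), "avg_rating": 4.2, "views": 1500},
]

# (aa) chronological / reverse chronological: still needs rules for ties,
# edits, and reposts, plus a choice of which timestamp to use.
by_time = sorted(items, key=lambda i: i["posted"], reverse=True)

# (bb) by average rating or review count: equal ratings need a tie-break
# (here, views), and someone decides how ratings are aggregated.
by_rating = sorted(items, key=lambda i: (i["avg_rating"], i["views"]), reverse=True)

# (cc) alphabetical: case folding and locale handling are themselves choices.
by_title = sorted(items, key=lambda i: i["title"].casefold())

# (dd) random: someone picks the seed and how often the order reshuffles.
rng = Random(42)
by_random = list(items)
rng.shuffle(by_random)

# (ee) by views, downloads, or "a similar usage metric": which counter,
# over what time window, filtered how (bots, reposts)?
by_views = sorted(items, key=lambda i: i["views"], reverse=True)
```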
Popularity is not neutral
We find a regime of intermediate exploration cost where an optimal balance exists, such that choosing what is popular actually promotes high-quality items to the top. Outside of these limits, however, popularity bias is more likely to hinder quality.
How algorithmic popularity bias hinders or promotes quality
Nematzadeh et al., 2017
[Figure: popularity bias vs. user exploration (Nematzadeh et al., 2017)]
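A toy simulation, not the authors' model (the adoption rule and every parameter below are invented for illustration), of one part of this finding: when users rarely explore, popularity-based selection tends to lock in early winners rather than the highest-quality items, while adding exploration lets popularity do a better job of surfacing quality.

```python
import random

def simulate(exploration: float, n_items: int = 50, n_users: int = 5000, seed: int = 0) -> float:
    """Toy model: each arriving user either explores a uniformly random item
    (with probability `exploration`) or picks an item proportional to its
    current popularity; they then adopt it with probability equal to its quality.
    Returns the average quality of the 5 most-adopted items."""
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(n_items)]
    adoptions = [1] * n_items  # pseudo-count so popularity sampling is well-defined
    for _ in range(n_users):
        if rng.random() < exploration:
            item = rng.randrange(n_items)
        else:
            item = rng.choices(range(n_items), weights=adoptions)[0]
        if rng.random() < quality[item]:
            adoptions[item] += 1
    top = sorted(range(n_items), key=lambda i: adoptions[i], reverse=True)[:5]
    return sum(quality[i] for i in top) / len(top)

def average_over_runs(exploration: float, runs: int = 20) -> float:
    return sum(simulate(exploration, seed=s) for s in range(runs)) / runs

for explore in (0.0, 0.2, 1.0):
    print(f"exploration={explore:.1f}  "
          f"mean quality of most-popular items: {average_over_runs(explore):.2f}")
```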
Chronological is not neutral
A chronological newsfeed can be spammed by people or bots posting the same or similar things every few seconds. … If 50 people I follow on Twitter all retweet the same post, should I see that post 50 times, or one time? Showing a popular post just once would obscure the other 49 people’s posts.
…
Another problem has to do with human behavior. There is reason to believe that a purely chronological system will show more “borderline” content—material that almost, but not quite, violates whatever speech prohibitions a platform enforces.
Amplification and its Discontents
Daphne Keller, 2021
“Amplification” can be defined for Twitter…
…but a chronological baseline makes no sense for YouTube, Google News, Spotify, Amazon.
When Twitter introduced machine learning to personalize the Home timeline in 2016, it excluded a randomly chosen control group of 1% of all global Twitter users from the new personalized Home timeline.
...
We define the amplification ratio of set T of tweets in an audience U as the ratio of the reach of T in U intersected with the treatment [algorithmic] group and the reach of T in U intersected with the control [chronological] group.
Algorithmic amplification of politics on Twitter
Huszár et al., 2022
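Written out (notation mine, not the paper's), with reach left as the paper defines it:

```latex
% Amplification ratio of a set of tweets T within an audience U,
% where reach(T, G) is the reach of T within group G, as defined in the paper.
a(T, U) \;=\;
  \frac{\operatorname{reach}\bigl(T,\; U \cap \text{treatment group}\bigr)}
       {\operatorname{reach}\bigl(T,\; U \cap \text{control group}\bigr)}
```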
Standardized Outcome Measurements
The Leaked Experiment We Wish We Could Do
Linear regression on user histories
Randomized controlled trial (a toy contrast with the regression approach is sketched below)
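A purely synthetic sketch of why the two approaches can disagree (every number and variable name below is invented; this is not Facebook's analysis): when a hidden factor drives both heavier use and lower well-being, regression on user histories reports a strong negative association even though the randomized change has, by construction, zero true effect.

```python
import random

rng = random.Random(0)
N = 100_000

# Synthetic world: hidden "baseline distress" drives BOTH heavier app use and
# lower well-being; the hypothetical product change has zero true effect.
users = []
for _ in range(N):
    distress = rng.gauss(0, 1)
    hours = max(0.0, 2 + 0.8 * distress + rng.gauss(0, 0.5))  # observed usage history
    treated = rng.random() < 0.5                               # randomly assigned change
    wellbeing = 5 - 1.0 * distress + 0.0 * treated + rng.gauss(0, 0.5)
    users.append((hours, treated, wellbeing))

# "Linear regression on user histories": slope of well-being on hours of use.
mean_h = sum(u[0] for u in users) / N
mean_w = sum(u[2] for u in users) / N
slope = (sum((u[0] - mean_h) * (u[2] - mean_w) for u in users)
         / sum((u[0] - mean_h) ** 2 for u in users))
print(f"observational slope (well-being per hour of use): {slope:+.2f}")  # strongly negative

# Randomized controlled trial: compare mean well-being, treated vs. control.
treated_w = [u[2] for u in users if u[1]]
control_w = [u[2] for u in users if not u[1]]
ate = sum(treated_w) / len(treated_w) - sum(control_w) / len(control_w)
print(f"RCT estimate of the change's effect: {ate:+.2f}")  # near zero, the true effect
```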
Facebook Knows Instagram Is Toxic for Teen Girls, Company Documents Show
Wall Street Journal, 2021
For years, Facebook experimented with hiding the tallies of “likes” that users see on their photos. Teens told Facebook in focus groups that “like” counts caused them anxiety and contributed to their negative feelings.
When Facebook tested a tweak to hide the “likes” in a pilot program they called Project Daisy, it found it didn’t improve life for teens. “We didn’t observe movements in overall well-being measures,” Facebook employees wrote in a slide they presented to Mr. Zuckerberg about the experiment in 2020.
An IEEE standard with 1000 well-being metrics
IEEE 7010: A New Standard for Assessing the Well-being Implications of AI,
Schiff, Ayesh, Musikanski, Havens, 2020
Thinking Through Regulation of Outcomes
Direct regulation of outcomes would require standardization of which outcomes to measure, how to measure them, and what values are acceptable…
…all on a per-domain basis (social media, news, music, shopping, etc.)
Transparency of What?
Transparency Policy Possibilities - Code
Recommender Code
A lot of it! Hard to interpret, security and IP challenges.
May not tell us anything about user outcomes, though system architecture is helpful context.
Major “features” or “parameters”
This sort of language is common (e.g. in the DSA), but it is unclear what it means or what could be done with this information.
Transparency Policy Possibilities - Data
User outcomes
Probably a key component of ensuring healthy platforms, but we are far from consensus on which metrics to monitor in which domains, and what acceptable values are.
User trajectories
What was shown to each user, what they did.
Could create an API for aggregated data protected with differential privacy.
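A minimal sketch of what such an API could release, assuming the Laplace mechanism as the privacy layer (the sources, counts, and epsilon below are all hypothetical, not any platform's actual interface):

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, sampled as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_release(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Release a user count with epsilon-differential privacy.
    Sensitivity is 1 (adding or removing one user changes the count by at most 1),
    so Laplace noise with scale 1/epsilon suffices."""
    return round(true_count + laplace_noise(1.0 / epsilon, rng))

# Hypothetical aggregate: how many users in a cohort were shown items from each source.
# Small counts become unreliable by design, which is what protects individuals.
rng = random.Random(0)
exposures = {"source_a": 12_400, "source_b": 310, "source_c": 57}
noisy = {source: dp_release(count, epsilon=0.5, rng=rng) for source, count in exposures.items()}
print(noisy)
```

Repeated queries consume privacy budget, so an API like this would also need to track cumulative epsilon per requester.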
Transparency Policy Possibilities - Documentation
Risk Assessments
In the DSA and various draft bills. No standard definition of “risk” or process yet.
Plausible, but be careful that disclosure doesn’t disincentivize asking hard questions.
Platform “change log”
Transparency of the decision-making process and the objectives driving it.
Explanation of each major change, and the data that drove it.
Transparency Policy Possibilities - Research
Audits
Right of external audit in the DSA for various “risks”
Researcher data access via the FTC in the proposed Platform Accountability and Transparency Act, but limited to observational analyses.
External research collaborations
Requires protection of platform, user, and public interests.
Experiments possible.
Probably the only way to get reliable causal inference of platform effects.
What Now?
Suggested Responses to Regulatory Challenges
Learn to live without “neutral” choices
Don’t assume that comparisons to “no algorithm” are desirable or even possible (and “amplification” is such a comparison).
Specify how harms are to be measured (and who decides)
There is currently no consensus on which outcomes matter most on which types of platforms, and how to measure them in a causally robust way.
Enable social science
Transparency of “code” or “parameters” is unlikely to be helpful. Focus on privacy-protected APIs that give insight into what users saw and did, and contractual structures for industry-academic research collaborations.
Thanks!
@jonathanstray