1 of 21

Introduction to Trust & Safety

Camille François (Columbia University / Niantic Labs)

Mariana Olaizola Rosenblat (NYU Stern Center for Business and Human Rights)

Matthew Soeth (All Tech is Human / Tremau / formerly TikTok)

2 of 21

Learning Objectives

Today we will:

Learn about the purpose and history of trust & safety
Learn about approaches to trust & safety

3 of 21

A note on difficult content, and on class etiquette

4 of 21

Purpose and History of T&S

5 of 21

What drives T&S

Corporate responsibility
Crisis sensitivity (cf. Zoom paper)
Regulation, regulatory pressure (from Europe’s DSA to the Australian Safety by Design framework)
Upstream technological standards applied through the stack (see, e.g., Apple’s app rules)

Copyright: Trust and Safety Foundation

“Trust and safety is the study of how people abuse the internet to cause real human harm, often using [online] products the way they are designed to work” (Journal of Online Trust & Safety).

Trust and safety is also a practice and a field within technology companies that is concerned with the reduction, prevention, and mitigation of online harms. Per the Trust & Safety Professional Association: “As internet communities, online services, and the use of digital technologies to mediate our daily lives and interactions have continued to grow, technology companies have needed to determine the kinds of content and behaviors that are appropriate and those that are not. The teams that handle this responsibility often fall under the general term ‘trust and safety.’”

📎 Source: App store review guidelines

6 of 21

High-level taxonomy of relevant abuses

Violent & Criminal Behavior

Dangerous Organizations (e.g., extremist groups, criminal organizations)
Violence (e.g., explicit threats, bomb-making instructions)
Child Abuse & Nudity (e.g., child sexual abuse material, solicitation of minors)
Sexual Exploitation (e.g., non-consensual sex acts, sextortion)
Human Exploitation (e.g., human trafficking, forced marriage)

Regulated Goods & Services

Regulated Goods (e.g., weapons, drugs, alcohol, endangered animals)
Regulated Services (e.g., gambling, addiction treatment, financial services)
Commercial Sexual Activity (e.g., advertisements for sex work, selling access to nude images)

Offensive & Objectionable Content

Hateful Content (e.g., slurs, support for supremacy movements, mockery of victims)
Graphic & Violent Content (e.g., imagery of fatal incidents, dismembered bodies, animal cruelty)
Nudity & Sexual Activity (e.g., pornography, explicit art)

7 of 21

High-level taxonomy of relevant abuses (cont.)

User Safety

Suicide & Self Harm (e.g., intention to self harm, encouraging self harm, instructions)
Harassment & Bullying (e.g., hateful conduct, dogpiling, blackmail threats, doxxing)
Dangerous Misinformation & Endangerment (e.g., conspiracy theories, false safety info, dangerous challenges)

Scaled Abuse

Spam (e.g., mass unsolicited messaging, auto-generated comments)
Malware (e.g., viruses, spyware, ransomware)
Inauthentic Behavior (e.g., fake engagement, disinformation campaigns)

Deceptive & Fraudulent Behavior

Fraud (e.g., loan scams, pyramid schemes, fake charity solicitation, stolen goods)
Impersonation (e.g., hacked accounts, fake names, impersonating celebrities)
Cybersecurity (e.g., phishing, sharing/requesting login details)
Intellectual Property (e.g., unauthorized use of trademarks/copyrighted content)
Defamation (e.g., publication of false or outdated damaging statements)

Community-Specific Rules

Format (e.g., word limits, restrictions on links, insufficient details)
Content Limitation (e.g., off-topic content, selling/advertising restrictions, spoilers)

Key point: relevant abuse types depend on your audiences, feature, product, etc.

8 of 21

History and evolution of T&S field

Origins

eBay: a prominent early user of the term “trust and safety”

Where T&S “grew” from

Operations (e.g., eBay: from Customer Service)
Legal (e.g., 📺 “Origins of Trust & Safety” Databite podcast with Alexander Macgillivray and Nicole Wong)
Information Security, Cybersecurity (e.g., 📺 Alex Stamos, “Battle for the Soul of the Internet” lecture)

Expanding scope of T&S

Today, Trust & Safety teams across the technology industry have different scopes, missions and organizational structures: the field continues to evolve.

As an academic topic, T&S shares borders with:

Internet governance, cybersecurity, internet policy, internet freedom, and platform governance, as well as online terrorism and violent extremism, disinformation, hate speech, online forensics, etc.

9 of 21

Approaches and Best Practices in T&S

10 of 21

Reactive vs. proactive models

Reactive moderation

Responding to user reports
Automatically flagging content

Proactive approaches

Safety by design
Community Building & Prosocial Behaviors

Sample company response following a user report of T&S violation

11 of 21

Managing trade-offs

Privacy and safety may sometimes be at odds
Restrictive platform features that require less moderation vs. open features that require more moderation
More false positives or more false negatives
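
The false positive / false negative trade-off can be made concrete with a small sketch. It assumes a hypothetical classifier that assigns each item a score between 0 and 1, and a team that flags everything at or above a chosen threshold. The scores, labels, and the `count_errors` helper are invented for illustration and do not come from any real system.

```python
# Hypothetical classifier scores (0.0-1.0) paired with ground-truth labels.
# True means the item actually violates policy. All values are made up.
SCORED_ITEMS = [
    (0.95, True),
    (0.80, True),
    (0.65, False),
    (0.55, True),
    (0.40, False),
    (0.30, True),
    (0.10, False),
]


def count_errors(scored_items, threshold):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    false_positives = sum(
        1 for score, violating in scored_items if score >= threshold and not violating
    )
    false_negatives = sum(
        1 for score, violating in scored_items if score < threshold and violating
    )
    return false_positives, false_negatives


for threshold in (0.2, 0.5, 0.9):
    fp, fn = count_errors(SCORED_ITEMS, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, a low threshold wrongly flags more acceptable items, while a high threshold lets more violating items through. Where to set it is a policy decision, not only an engineering one.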

12 of 21

Where T&S fits in an organization

13 of 21

Gaining senior management support

Educating senior management about issues
Making the case for why trust & safety is related to core product mission
Involving senior management in important edge case decisions

14 of 21

Building a T&S team

Components of a T&S team
Types of T&S teams
Types of T&S professionals

15 of 21

Sample functions

Policy

Operations

Safety by Design

Engineering

Threat Detection, Intelligence

Child Safety

Product (tooling)

Enforcement & Investigations

16 of 21

Discussion: contrasting perspectives on building T&S teams

What resonated from the stories collected and shared by Alex Feerst?
What themes are echoed by Feerst, by Zoom’s and Pinterest’s papers, and by Nicole Wong and Alexander Macgillivray?
What are significant differences emerging from these testimonies?

17 of 21

Technologies used to implement T&S

18 of 21

Overview of automated technologies

Digital hash technology
Image recognition tools
Metadata filtering
Natural language processing (NLP) classifiers
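
As a rough illustration of the first item, hash matching compares a fingerprint of an uploaded file against a database of fingerprints of previously identified content. The sketch below uses an exact SHA-256 match from Python's standard library as a simplification. Deployed systems often rely on perceptual hashes that tolerate small changes, which are not shown here. The database contents and the `is_known_violation` helper are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical database of hashes of previously identified violating files.
# In practice such a list would come from a shared industry or internal source.
KNOWN_VIOLATING_HASHES = {
    sha256_hex(b"stand-in bytes for a previously identified file"),
}


def is_known_violation(upload: bytes) -> bool:
    """Flag an upload only if its hash exactly matches a known entry."""
    return sha256_hex(upload) in KNOWN_VIOLATING_HASHES


print(is_known_violation(b"stand-in bytes for a previously identified file"))
print(is_known_violation(b"some unrelated new upload"))
```

The first call matches the database and the second does not. Hash matching of this kind can only recognize content that has already been identified and added to the database.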

19 of 21

Shortcomings

Circumvention techniques

E.g., for audio and video hashes (by altering the length or encoding format of the file)
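
A minimal sketch of why such alterations can defeat matching, assuming for simplicity an exact cryptographic hash (SHA-256) rather than the audio or video hashes used in practice. The byte strings and the `matches_known_file` helper are made-up stand-ins.

```python
import hashlib

# Stand-in bytes for a media file already present in a hash database.
original_file = b"\x00\x01\x02\x03" * 1000

# The same content with one extra trailing byte, mimicking a trivial
# change in length or encoding.
altered_file = original_file + b"\x00"

known_hashes = {hashlib.sha256(original_file).hexdigest()}


def matches_known_file(data: bytes) -> bool:
    """Exact-match lookup: any change to the bytes changes the hash."""
    return hashlib.sha256(data).hexdigest() in known_hashes


print(matches_known_file(original_file))  # True
print(matches_known_file(altered_file))   # False
```

Perceptual hashing is designed to be more tolerant of small edits than this exact-match example, but sufficiently large alterations can still push a file outside the matching tolerance.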

Biases in training data

E.g., as a result of skewed representation of certain languages and geographic regions

Lack of transparency in how databases are populated

But downsides of being too transparent…

NLP classifiers struggle with nuance and can be under- or over-inclusive in their coverage
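
Over- and under-inclusion can be illustrated with a deliberately naive keyword filter, used here as a simplified stand-in for a text classifier (real NLP classifiers are statistical models, but they can fail in analogous ways). The blocked term, the example messages, and the `should_flag` labels are hypothetical.

```python
# Hypothetical blocklist with a single term.
BLOCKED_TERMS = {"scam"}


def naive_flag(text: str) -> bool:
    """Flag text if any blocked term appears anywhere as a substring."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


# (message, should_flag) pairs; should_flag is an imagined reviewer judgment.
EXAMPLES = [
    ("Join my scam and get rich quick", True),
    ("Join my sc4m and get rich quick", True),      # obfuscated spelling
    ("Sharing my favorite scampi recipe", False),   # innocent substring
    ("Warning: that giveaway is a scam", False),    # user warning others
]

for message, should_flag in EXAMPLES:
    flagged = naive_flag(message)
    if flagged and not should_flag:
        outcome = "false positive (over-inclusive)"
    elif not flagged and should_flag:
        outcome = "false negative (under-inclusive)"
    else:
        outcome = "correct"
    print(f"{message!r}: {outcome}")
```

The obfuscated spelling slips through, while the recipe and the warning are wrongly flagged. The last case shows the nuance problem: the same word appears in both abusive and protective messages.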

20 of 21

The Future of T&S

Advancements in large-language models (LLMs) and generative AI technology:

New and enhanced risks
Potential for LLMs to assist in content moderation?

T&S in the “metaverse”:

Moderating user conduct – not just content – in real time
New attack vectors

Copyright: Sanal Savunma, https://www.sanalsavunma.com/what-is-metaverse/

21 of 21

Additional Reading & Materials

Books:

The 26 Words that Created the Internet
Custodians of the Internet
Speech Police
Prosocial

LinkedIn Learning:

Becoming a Trust & Safety Leader

Articles, Papers, & Standards:

The Santa Clara Principles
Making the Business Case for Trust and Safety
Oasis Consortium - trust & safety standards
US/EU: Joint Statement on Protecting Human Rights Defenders Online
Digital Thriving and Prosocial Design in Gaming