1 of 21

Introduction to Trust & Safety

Camille François (Columbia University / Niantic Labs)

Mariana Olaizola Rosenblat (NYU Stern Center for Business and Human Rights)

Matthew Soeth (All Tech Is Human / Tremau / Former TikTok)

2 of 21

Learning Objectives

Today we will:

  • Learn about the purpose and history of trust & safety
  • Learn about approaches to trust & safety

3 of 21

A note on difficult content, and on class etiquette

4 of 21

Purpose and History of T&S

5 of 21

What drives T&S

  • Corporate responsibility
  • Crisis sensitivity (cf. Zoom paper)
  • Regulation, regulatory pressure (from Europe’s DSA to the Australian Safety by Design framework)
  • Upstream technological standards applied through the stack (see, e.g., Apple’s app rules)

Copyright: Trust and Safety Foundation

6 of 21

High-level taxonomy of relevant abuses

Violent & Criminal Behavior

    • Dangerous Organizations (e.g., extremist groups, criminal organizations)
    • Violence (e.g., explicit threats, bomb-making instructions)
    • Child Abuse & Nudity (e.g., child sexual abuse material, solicitation of minors)
    • Sexual Exploitation (e.g., non-consensual sex acts, sextortion)
    • Human Exploitation (e.g., human trafficking, forced marriage)

Regulated Goods & Services

    • Regulated Goods (e.g., weapons, drugs, alcohol, endangered animals)
    • Regulated Services (e.g., gambling, addiction treatment, financial services)
    • Commercial Sexual Activity (e.g., advertisements for sex work, selling access to nude images)

Offensive & Objectionable Content

    • Hateful Content (e.g., slurs, support for supremacy movements, mockery of victims)
    • Graphic & Violent Content (e.g., imagery of fatal incidents, dismembered bodies, animal cruelty)
    • Nudity & Sexual Activity (e.g., pornography, explicit art)

7 of 21

High-level taxonomy of relevant abuses (cont.)

User Safety

    • Suicide & Self-Harm (e.g., stated intent to self-harm, encouraging self-harm, instructions)
    • Harassment & Bullying (e.g., hateful conduct, dogpiling, blackmail threats, doxxing)
    • Dangerous Misinformation & Endangerment (e.g., conspiracy theories, false safety info, dangerous challenges)

Scaled Abuse

    • Spam (e.g., mass unsolicited messaging, auto-generated comments)
    • Malware (e.g., viruses, spyware, ransomware)
    • Inauthentic Behavior (e.g., fake engagement, disinformation campaigns)

Deceptive & Fraudulent Behavior

    • Fraud (e.g., loan scams, pyramid schemes, fake charity solicitation, stolen goods)
    • Impersonation (e.g., hacked accounts, fake names, impersonating celebrities)
    • Cybersecurity (e.g., phishing, sharing/requesting login details)
    • Intellectual Property (e.g., unauthorized use of trademarks/copyrighted content)
    • Defamation (e.g., publication of false or outdated damaging statements)

Community-Specific Rules

    • Format (e.g., word limits, restrictions on links, insufficient details)
    • Content Limitation (e.g., off-topic content, selling/advertising restrictions, spoilers)

Key point: the relevant abuse types depend on your audience, features, product, etc.

8 of 21

History and evolution of T&S field

  • Origins
    • eBay: a prominent early user of the term “trust and safety”
  • Where T&S “grew” from
    • Operations (e.g., eBay: from Customer Service)
    • Legal (e.g., 📺 “Origins of Trust & Safety” Databite podcast with Alexander Macgillivray and Nicole Wong)
    • Information Security, Cybersecurity (e.g., 📺 Alex Stamos “Battle for the Soul of the Internet” lecture)
  • Expanding scope of T&S
    • Today, Trust & Safety teams across the technology industry have different scopes, missions, and organizational structures; the field continues to evolve.
  • As an academic topic, T&S shares borders with:
    • Internet governance, cybersecurity, Internet policy, Internet freedom, platform governance, but also online terrorism and violent extremism, disinformation, hate speech, online forensics, etc.

9 of 21

Approaches and Best Practices in T&S

10 of 21

Reactive vs. proactive models

  • Reactive moderation
    • Responding to user reports
    • Automatically flagging content
  • Proactive approaches
    • Safety by design
    • Community Building & Prosocial Behaviors

Sample company response following a user report of a T&S violation (a minimal triage sketch follows below)
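
In the spirit of the sample response flow above, here is a minimal sketch, assuming a hypothetical review queue, of how a reactive pipeline might route reports: items flagged by users or by automated classifiers land in one queue, and the most severe categories are reviewed first. The category names and severity ranking are illustrative, not any platform's actual taxonomy.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue

# Illustrative severity ranking; real policy taxonomies are far more granular.
SEVERITY = {"child_safety": 0, "violent_threat": 1, "harassment": 2, "spam": 3}

@dataclass(order=True)
class Report:
    priority: int
    content_id: str = field(compare=False)
    category: str = field(compare=False)
    source: str = field(compare=False)  # "user_report" or "automated_flag"

def enqueue_report(queue: PriorityQueue, content_id: str, category: str, source: str) -> None:
    """Queue a reported item for human review, most severe categories first."""
    queue.put(Report(SEVERITY.get(category, len(SEVERITY)), content_id, category, source))

# A user report and an automated flag end up in the same review queue.
review_queue: PriorityQueue = PriorityQueue()
enqueue_report(review_queue, "post_123", "spam", "automated_flag")
enqueue_report(review_queue, "post_456", "violent_threat", "user_report")
print(review_queue.get())  # the violent-threat report is pulled for review first
```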

11 of 21

Managing trade-offs

  • Privacy and safety may sometimes be at odds
  • Restrictive platform features that require less moderation vs. open features that require more moderation
  • More false positives or more false negatives (see the threshold sketch below)
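
One concrete form of the last trade-off: many enforcement systems rely on a classifier score plus a removal threshold, and where that threshold sits determines which kind of error dominates. The scores and threshold values below are made up for illustration.

```python
# Each item: (content_id, model-estimated probability that the content violates policy)
scored_items = [("a", 0.95), ("b", 0.62), ("c", 0.30)]

def items_to_remove(items, threshold):
    """Lower thresholds remove more content (more false positives);
    higher thresholds remove less (more false negatives)."""
    return [content_id for content_id, score in items if score >= threshold]

print(items_to_remove(scored_items, 0.5))  # ['a', 'b'] -- aggressive enforcement
print(items_to_remove(scored_items, 0.9))  # ['a']      -- conservative enforcement
```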

12 of 21

Where T&S fits in an organization

13 of 21

Gaining senior management support

  • Educating senior management about issues
  • Making the case for why trust & safety is related to core product mission
  • Involving senior management in important edge case decisions

14 of 21

Building a T&S team

  • Components of a T&S team
  • Types of T&S teams
  • Types of T&S professionals

15 of 21

Sample functions

Policy

Operations

Safety by Design

Engineering

Threat Detection, Intelligence

Child Safety

Product (tooling)

Enforcement & Investigations

16 of 21

Discussion: contrasting perspectives on building T&S teams

  • What resonated from the stories collected and shared by Alex Feerst?
  • What themes are echoed by Feerst, by the Zoom and Pinterest papers, and by Nicole Wong and Alexander Macgillivray?
  • What significant differences emerge from these testimonies?

17 of 21

Technologies used to implement T&S

18 of 21

Overview of automated technologies

  • Digital hash technology (see the matching sketch after this list)
  • Image recognition tools
  • Metadata filtering
  • Natural language processing (NLP) classifiers
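
A minimal sketch of the first technology on the list, hash-based matching: each upload is hashed and checked against a database of hashes of known violating files. For simplicity this uses an exact cryptographic hash (SHA-256); production systems such as PhotoDNA or audio/video fingerprinting rely on perceptual hashes that tolerate small edits. The blocklist below is a stand-in, not a real hash database.

```python
import hashlib

# Stand-in for a shared database of hashes of known violating files.
KNOWN_VIOLATING_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",  # SHA-256 of b"test"
}

def matches_known_content(data: bytes) -> bool:
    """True if the uploaded bytes exactly match a file already in the database."""
    return hashlib.sha256(data).hexdigest() in KNOWN_VIOLATING_HASHES

print(matches_known_content(b"test"))   # True: byte-for-byte identical to a known file
print(matches_known_content(b"other"))  # False: no match, so no automated action
```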

19 of 21

Shortcomings

Circumvention techniques

  • E.g., for audio and video hashes, by altering the length or encoding format of the file (see the sketch below)

Biases in training data

  • E.g., as a result of skewed representation of certain languages and geographic regions

Lack of transparency in how databases are populated

  • But downsides of being too transparent…

NLP classifiers struggle with nuance and can be under- or over-inclusive in their coverage
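
To make the circumvention point above concrete: with exact hashing, even a one-byte change (for example, padding added by re-encoding) yields an entirely different digest, so the altered copy no longer matches the database entry. Perceptual hashes are more robust but can still be evaded with larger edits; the bytes below are placeholders, not real media.

```python
import hashlib

original = b"example video bytes"   # placeholder for a file already in the hash database
altered = original + b"\x00"        # e.g., a trailing byte added by re-encoding or padding

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(altered).hexdigest())
# The two digests share no resemblance, so an exact-match lookup misses the altered copy.
```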

20 of 21

The Future of T&S

Advancements in large language models (LLMs) and generative AI technology:

  • New and enhanced risks
  • Potential for LLMs to assist in content moderation? (see the sketch below)
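
A hedged sketch of the second bullet, assuming a hypothetical `call_llm` helper rather than any specific provider's API: the model is shown a policy snippet and a piece of content, returns a provisional label, and anything it does not clearly allow is escalated to a human reviewer. Prompt wording, labels, and routing rules are illustrative only.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a hosted LLM call; wire up a real model client here."""
    raise NotImplementedError

POLICY_SNIPPET = "Remove explicit threats of violence against a person or group."

def provisional_label(content: str) -> str:
    """Ask the model for a first-pass policy judgment on one piece of content."""
    prompt = (
        f"Policy: {POLICY_SNIPPET}\n"
        f"Content: {content}\n"
        "Answer with exactly one word: VIOLATING, BORDERLINE, or ALLOWED."
    )
    return call_llm(prompt).strip().upper()

def route(content: str) -> str:
    """Treat the LLM output as advisory: anything not clearly allowed goes to a human queue."""
    return "auto_allow" if provisional_label(content) == "ALLOWED" else "human_review"
```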

T&S in the “metaverse”:

  • Moderating user conduct – not just content – in real time
  • New attack vectors

21 of 21

Additional Reading & Materials