1 of 21

Data Science and Social Media

SPCP ComMonth 2021: How to Criticize the Media

Amber Teng

SPCP Batch 2013
angelamarieteng@gmail.com

2 of 21

AGENDA

  • INTRODUCTIONS
  • WHY IS IT IMPORTANT TO THINK CRITICALLY ABOUT THE MEDIA WE CONSUME?
  • WHAT IS DATA SCIENCE AND HOW DOES IT AFFECT SOCIAL MEDIA?
  • RECOMMENDER SYSTEMS, INFORMATION RETRIEVAL, AND SOCIAL MEDIA
  • FILTER BUBBLES AND ECHO CHAMBERS
  • DATA PRIVACY AND ETHICS
  • ACTIVITY
  • Q&A

3 of 21

INTRODUCTIONS

*CAVEAT: this is a very big and widely discussed topic. During today’s talk, I aim to start a discussion and share preliminary resources rather than to comprehensively speak about all the implications and technicalities of data science, recommendation systems, and social media.

  • SPCP Batch 2013
  • Brown University, BA Economics, Archaeology
  • NYU, MS Data Science
  • Author, The Data Resource
  • Teaching Assistant, Data Science for Everyone
  • Research Assistant, Data Science Software & Services
  • Co-Founder & Co-President, NYU Women in Data Science

4 of 21

Why is it important to think critically about the media we consume?

  • Technological advancement and data generation have skyrocketed, and with them our reliance on media for information has increased
  • At a societal level, we generally trust media as an authority
  • In a world where we’re inundated with news and data, media literacy is key.
  • “Data is the new oil of the digital economy.”
  • In the age of globalization and hyperconnectivity, data and social media have become helpful drivers of efficiency, but they can also be equally harmful when misused

5 of 21

Why is it important to think critically about the media we consume?

6 of 21

Why is it important to think critically about the media we consume?

MENTAL HEALTH

A 5,000 person study found that higher social media use correlated with self-reported declines in mental and physical health and life satisfaction

American Journal of Epidemiology, 2017

DEMOCRACY

The # of countries with political disinformation campaigns on social media doubled in the past 2 years.

The New York Times

DISCRIMINATION

64% of the people who joined extremist groups on Facebook did so because the algorithms steered them there.

Internal Facebook report, 2018

7 of 21

What is data science and how does it affect social media?

8 of 21

What is data science and how does it affect social media?

  • Data science lies at the intersection of computer science, mathematics, and domain applications
  • “Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains”
  • We all make decisions every day, and each of these decisions involves selecting among a huge space of choices
  • Increasingly, our choices are informed by algorithms and data

9 of 21

What is data science and how does it affect social media?

  • Machine learning models can alter societal behaviors
  • Broader societal impacts of search engines and recommender systems
  • Data ethics, data privacy, consent, confidentiality
  • Fairness, polarization, diversity
  • Models and datasets are inherently biased

10 of 21

Recommender systems, information retrieval, and social media

  • Basic Information Retrieval Problem:
    • A collection of documents
    • Queries
    • A notion of relevance (How well does each document match a given query?)
    • Objective: help users retrieve documents to satisfy their information needs (e.g. web search)
  • Basic Recommendation Problem:
    • Collection of documents
    • Collection of users
    • Observed interactions between documents and users
    • Objective: Estimate relevance of each unobserved interaction
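
The recommendation problem above can be sketched in a few lines of Python. This is a minimal illustration, not how any real platform works: the users, documents, and the choice of user-to-user cosine similarity are all assumptions made for the example. It scores each unobserved (user, document) pair by how similar the user is to other users who did interact with that document.

```python
from math import sqrt

# Hypothetical toy data: 1 means the user interacted with the document.
interactions = {
    "ana":   {"doc_a": 1, "doc_b": 1},
    "ben":   {"doc_a": 1, "doc_b": 1, "doc_c": 1},
    "carla": {"doc_d": 1},
}
documents = ["doc_a", "doc_b", "doc_c", "doc_d"]


def cosine(u, v):
    """Cosine similarity between two users' interaction dictionaries."""
    shared = set(u) & set(v)
    dot = sum(u[d] * v[d] for d in shared)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def estimate_relevance(user):
    """Score every document the user has NOT interacted with yet."""
    seen = interactions[user]
    scores = {}
    for doc in documents:
        if doc in seen:
            continue  # observed interaction, nothing to estimate
        # Add up the similarity of every other user who interacted with doc.
        scores[doc] = sum(
            cosine(seen, other_items)
            for other, other_items in interactions.items()
            if other != user and doc in other_items
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = estimate_relevance("ana")
print(ranking)
```

In this toy data, ana's history overlaps with ben's and not with carla's, so doc_c (seen by ben) ranks above doc_d (seen by carla). The estimate comes entirely from other users' behavior, which is why the observed interactions matter so much.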

Source: Search and Discovery Course taught by Professor Brian McFee, TA-ed by Guido Petri and Amber Teng, Fall 2020 https://newclasses.nyu.edu/access/content/group/cacae473-9f49-419d-9163-674e9d75a323/Week%2001/Week%2001_2%20-%20Information%20retrieval.pdf

11 of 21

Recommender systems, information retrieval, and social media

Information Retrieval

  • Focuses on short-term user needs
  • Applications derive from search engines
  • Queries are assumed to be independent
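
As a rough sketch of the retrieval setup (a collection of documents, a query, and a notion of relevance), the Python below scores each document against a query with a simple TF-IDF-style weight. The documents and the scoring formula are assumptions chosen for illustration; real search engines use far richer relevance signals. Each call to `search` looks only at the query it is given, which mirrors the assumption that queries are independent.

```python
import math
from collections import Counter

# Hypothetical toy collection of documents.
documents = {
    "d1": "social media and mental health",
    "d2": "recommender systems for social media feeds",
    "d3": "archaeology field methods",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


# Term counts per document, computed once up front.
doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def idf(term):
    """Smoothed inverse document frequency: rarer terms count for more."""
    df = sum(1 for counts in doc_tokens.values() if term in counts)
    return math.log((1 + len(doc_tokens)) / (1 + df)) + 1


def relevance(query, doc_id):
    """Sum of term frequency times IDF over the query terms."""
    counts = doc_tokens[doc_id]
    return sum(counts[term] * idf(term) for term in tokenize(query))


def search(query):
    """Rank all documents for one query; no memory of earlier queries is kept."""
    scored = [(doc_id, relevance(query, doc_id)) for doc_id in documents]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


results = search("recommender systems")
print(results)
```

Here only d2 contains the query terms, so it ranks first and the other documents score zero. Nothing about the person searching enters the score, which is the main contrast with a recommender system.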

Recommender Systems

  • Focus on longer-term user needs
  • Applications primarily in entertainment
  • Adapt to a user’s behavior over time
  • Queries are sequential and interdependent
  • Also use other users’ behavior

Source: Search and Discovery Course taught by Professor Brian McFee, TA-ed by Guido Petri and Amber Teng, Fall 2020 https://newclasses.nyu.edu/access/content/group/cacae473-9f49-419d-9163-674e9d75a323/Week%2001/Week%2001_2%20-%20Information%20retrieval.pdf

[Figure: the core ingredients of a recommender system]

12 of 21

Recommender systems, information retrieval, and social media

Questions to consider when building or using recommender systems:

  • Does the evaluation accurately reflect how the system is used in practice?
  • Do we have enough interaction data to reliably evaluate a model?
  • Do we have enough interaction data to reliably fit a model?
  • What sources of bias exist in the content?
  • What sources of bias exist in the users?
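
One way to make the evaluation questions above concrete is a small hold-out experiment. The sketch below is an assumption-laden toy: the interaction log is invented, the "model" is a plain popularity ranking, and hit rate at k is just one of many possible metrics. It hides each user's last interaction, recommends from the rest, and checks whether the hidden item shows up in the top k.

```python
from collections import Counter

# Hypothetical interaction log: (user, item) pairs, oldest first.
log = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"), ("u3", "d"),
]


def split_last_interaction(log):
    """Hold out each user's last interaction for testing; train on the rest."""
    last = {user: item for user, item in log}  # later pairs overwrite earlier ones
    train = [(user, item) for user, item in log if last[user] != item]
    return train, last


def recommend(user, train, k):
    """Recommend the k most popular training items the user has not seen."""
    seen = {item for u, item in train if u == user}
    popularity = Counter(item for _, item in train)
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]


def hit_rate_at_k(log, k):
    """Fraction of users whose held-out item appears in their top-k list."""
    train, held_out = split_last_interaction(log)
    hits = sum(
        1 for user, item in held_out.items() if item in recommend(user, train, k)
    )
    return hits / len(held_out)


print(hit_rate_at_k(log, 1), hit_rate_at_k(log, 2))
```

With only three users, one extra hit or miss moves the score by a third, which illustrates why the amount of interaction data matters for reliable evaluation. User u3's held-out item never appears in the training data, so no model fit on it could recommend that item, and a popularity ranking also builds in a bias toward content that is already widely seen.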

Source: Search and Discovery Course taught by Professor Brian McFee, TA-ed by Guido Petri and Amber Teng, Fall 2020 https://newclasses.nyu.edu/access/content/group/cacae473-9f49-419d-9163-674e9d75a323/Week%2001/Week%2001_2%20-%20Information%20retrieval.pdf

13 of 21

Filter bubbles and echo chambers

Benefits of Recommender Systems:

  • Can help users efficiently navigate large collections of items
  • The recommender can adapt to the user to be more efficient

But...

  • What happens when users adapt to the recommender system instead?
  • How does the recommender affect users over time?

Filter Bubbles: intellectual isolation that results from personalized search, when a website’s algorithm selectively guesses what information a user would like to see based on user information such as location, past click behavior, and search history
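
The feedback loop behind a filter bubble can be illustrated with a deliberately simplified simulation. Everything here is an assumption for the sake of the sketch: the topic names, a user who always clicks one favorite topic, and a feed that hands out slots in proportion to past clicks. No real platform is being modeled.

```python
from collections import Counter

TOPICS = ["politics", "sports", "science", "music"]  # hypothetical topics


def build_feed(clicks, feed_size):
    """Give feed slots to topics roughly in proportion to past clicks."""
    slots = Counter()
    feed = []
    for _ in range(feed_size):
        # Highest-averages rule: a topic's claim shrinks as it receives slots.
        # Ties go to the topic listed first in TOPICS.
        topic = max(TOPICS, key=lambda t: clicks[t] / (slots[t] + 1))
        slots[topic] += 1
        feed.append(topic)
    return feed


def simulate_feed(rounds, favorite="politics", feed_size=4):
    """Return the number of distinct topics shown in each round."""
    clicks = Counter({t: 1 for t in TOPICS})  # start with no preference
    distinct = []
    for _ in range(rounds):
        feed = build_feed(clicks, feed_size)
        distinct.append(len(set(feed)))
        # The user clicks their favorite topic whenever it is shown.
        clicked = favorite if favorite in feed else feed[0]
        clicks[clicked] += 1
    return distinct


print(simulate_feed(5))
```

In this toy setup the feed starts with four distinct topics and narrows to a single topic within a few rounds, even though the user never asked to stop seeing the others. The recommender adapts to the clicks, and the clicks are limited to what the recommender shows.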

Source: Search and Discovery Course taught by Professor Brian McFee, TA-ed by Guido Petri and Amber Teng, Fall 2020 https://newclasses.nyu.edu/access/content/group/cacae473-9f49-419d-9163-674e9d75a323/Week%2001/Week%2001_2%20-%20Information%20retrieval.pdf

14 of 21

Filter bubbles and echo chambers

Why is this a problem?

  • “users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles” (i.e., you only hear what you want to hear)
  • Increased ideological and political polarization, extremism, cultural tribalism
  • Examples: Google Search, Facebook Feed / Newstream, the 2016 US election, the 2016 Philippine election

Echo Chambers: situations in which beliefs are amplified or reinforced by communication and repetition inside a closed system and insulated from rebuttal

  • People are able to seek out information that reinforces their existing views without encountering opposing views, potentially resulting in an unintended exercise in confirmation bias.

Source: Search and Discovery Course taught by Professor Brian McFee, TA-ed by Guido Petri and Amber Teng, Fall 2020 https://newclasses.nyu.edu/access/content/group/cacae473-9f49-419d-9163-674e9d75a323/Week%2001/Week%2001_2%20-%20Information%20retrieval.pdf

15 of 21

Data privacy and ethics

To build these models and recommender systems, we need data…

  • Unsolicited data collection
  • 3rd-party data sharing
  • Unsolicited access by employees / developers
  • Exposure of sensitive information
  • Targeted advertisement (e.g. the “Target story”)
  • (Pricing) discrimination and profiling
  • Leaking data to other users
  • Users may not be fully aware of what data is collected and how it’s used

16 of 21

Data privacy and ethics

The Attention Extraction Economy: technology platforms profit from the monetization of human attention and engagement

Surveillance Capitalism: mass surveillance of our online activity in ways we are often unaware of, and the commodification of this data for commercial purposes

Biases in Data and Modeling: datasets are inherently biased; the key is understanding how to account for, contextualize, and mitigate the effects of those biases

Trade-offs to Consider:

  • Accuracy / efficiency vs privacy
  • Accuracy vs transparency and model interpretability

17 of 21

So, what now?

Some questions to ask when consuming social/media:

  • Who is this intended for?
  • What assumptions does this content make about society, or about its audience?
  • How real and accurate is this information?
  • Why did I get this social media post recommended to me?
  • What data could I have given or shared on this platform for this to show up on my feed?
  • Am I okay with sharing this personal data with the public?
  • What values are presented?
  • What is the commercial message of this?
  • Is there additional political or social messaging latent in this context?
  • Do I agree or disagree with this message?
  • What biases persist in this post?
  • Be vigilant: Discuss, listen, and research
  • Be cognizant of your use of social media, and be aware of where and with whom you are sharing your data
  • Rebuild the system

18 of 21

In today’s age of hyperconnectivity and increased dependency on social media, how can we think critically about the media we consume and the algorithms that curate this content?

ACTIVITY

19 of 21

QUESTIONS?

20 of 21

References

A list of the references used for this deck can be found on my notion page:

https://www.notion.so/ateng2507/3c9271f4309246ceaf085935052b6b3e?v=2ec7d38ed2aa424bb43c14d3bda5d220

21 of 21

Thank you!

Amber Teng - SPCP Batch 2013
Email: angelamarieteng@gmail.com
Book: https://www.amazon.com/Data-Resource-Emerging-Countries-Landscape/dp/1641372524
Twitter: @ambervteng
LinkedIn: https://www.linkedin.com/in/angelavteng/
Instagram: hambergur_fries