AI-Generated Misinformation
Social Sciences
September 2024
The Investigative Journalism Foundation
Table of Contents
6. AI Companies
7. Strategies for Identifying Misinformation
8. Group Exercise
9. Conclusion
How confident are you in your ability to spot misinformation online?
Discovery of the “Lost Tribe of Arawana” in the Amazon Jungle…
Location of Discovery: The lost tribe of Awarana was discovered in a remote, forested area of the Amazon rainforest 370 km west of the Brazilian city of Manaus. This site, hidden deep within the dense jungle, was revealed after a series of aerial surveys using LiDAR technology identified unusual structures beneath the canopy. These aerial surveys were prompted by the discoveries of unfamiliar objects by locals following major floods in the region.
An aerial shot of the archeological excavation site deep in the Amazon discovered earlier this year. Experts believe that the disaster which wiped out the Awarana may have occurred over 800 years ago. (Lucas Mendes, University of São Paulo, Department of Archaeology via Getty Images/File)
Discovery of the Lost Tribe…
History of Researching the Group:
The Awarana tribe had long been the subject of local legends, believed to have perished following a catastrophic natural disaster, possibly a massive flood or volcanic eruption, over a thousand years ago. Initial interest in the tribe sparked in the early 20th century when explorers and anthropologists began documenting indigenous folklore. However, it wasn't until the advent of modern technology that concrete evidence of the tribe's existence was found.
In February 2024, a team of archaeologists from the University of São Paulo, led by Dr. Mariana Silva (see adjacent image), embarked on an expedition to explore the identified site. They uncovered a settlement featuring a complex network of structures, including homes, communal spaces, and what appeared to be ceremonial grounds. Artifacts such as pottery, tools, and jewelry were found, alongside skeletal remains that provided valuable insights into the tribe's lifestyle, health, and social structure.
Dr. Mariana Silva, Ph.D., Professor of Archeology, University of São Paulo, onsite in March 2024 (Lucas Mendes, University of São Paulo, Department of Archaeology via Getty Images/File)
Implications for the Social Sciences:
Understanding Indigenous Cultures: The artifacts and remains provide a wealth of information about the Awarana’s daily life, social hierarchy, and cultural practices. This helps anthropologists gain a deeper understanding of pre-Colombian Indigenous cultures in the Amazon.
Natural Disaster Impact Studies: By studying the cause of the tribe’s demise, researchers can better understand the impact of natural disasters on ancient civilizations. This can offer insights into how current and future communities might mitigate such risks.
Technological Advancements in Archaeology: The use of LiDAR technology and other modern techniques highlights the importance of technological advancements in uncovering and studying lost civilizations. This can pave the way for future discoveries and more efficient archaeological practices.
Cultural Preservation: The findings emphasize the importance of preserving Indigenous heritage and knowledge. By documenting and studying these discoveries, we can ensure that the cultural legacy of the Awarana and similar tribes is not forgotten.
Actually, that entire scenario was completely AI-generated!
Presentation Overview:
This presentation gives a brief overview of online misinformation, focusing on AI platforms such as ChatGPT and DALL-E:
What is Artificial Intelligence (AI) and AI-generated content?
An AI model is a program that has been trained on a set of data to recognize certain patterns or make certain decisions without further human intervention. Artificial intelligence models apply different algorithms to relevant data inputs to achieve the tasks, or output, they’ve been programmed for.
AI models typically rely on one of three forms of machine learning: supervised learning, unsupervised learning, or reinforcement learning. Modern generative AI systems combine these approaches, usually built on deep learning.
Supervised Learning
Example Application → Predicting spam email. The model is provided with thousands of emails, and the target variable is spam: yes/no?
How can it perpetuate bias and misinformation?
Supervised learning models are only as good as the data they are trained on. If the training data contains biases, these biases will be learned and perpetuated by the model. For example, if a news recommendation system is trained on articles that predominantly represent one political viewpoint, the system will learn to recommend articles that reinforce that bias, potentially leading to misinformation by over-representing certain viewpoints and under-representing others. This can create a feedback loop where users are only exposed to biased information, further entrenching their existing beliefs.
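The spam example above can be sketched as a minimal Naive Bayes classifier: labeled examples go in, and a learned yes/no decision comes out. The emails and words below are invented for illustration; a real filter would train on thousands of messages.

```python
from collections import Counter

# Toy training data: each email is labeled spam (True) or not (False).
TRAINING_DATA = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes attached", False),
    ("lunch tomorrow with the team", False),
]

def train(data):
    """Count word frequencies per class (the 'learning' step)."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, label in data:
        for word in text.split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def predict_spam(text, counts, totals):
    """Naive Bayes with add-one smoothing: compare per-class likelihoods."""
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        score = 1.0
        for word in text.split():
            score *= (counts[label][word] + 1) / (totals[label] + len(vocab))
        scores[label] = score
    return scores[True] > scores[False]

counts, totals = train(TRAINING_DATA)
print(predict_spam("claim your free money", counts, totals))  # True
print(predict_spam("team meeting tomorrow", counts, totals))  # False
```

Note how the bias problem is visible even here: the model can only ever reflect the word patterns present in its training set.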
Unsupervised Learning
Example Application → Given the user profiles of a company’s customer base, a model can analyze the profile data to identify distinct groups of customers with similar behaviours without predefined group labels or identities.
How can it perpetuate bias and misinformation?
Unsupervised learning can perpetuate bias by reinforcing existing patterns in the data, even if those patterns are not inherently biased. For instance, if a clustering algorithm is used to group social media users based on their interactions, and the data includes biased interactions (e.g., users primarily interacting with others who share their views), the resulting clusters may reinforce echo chambers. This can lead to the spread of misinformation within these clusters as users are not exposed to diverse viewpoints. Furthermore, there may be less confidence in how good or accurate the patterns picked up by the ML model are, as there is no objective ‘answer key’ to evaluate performance on.
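The customer-grouping example above can be sketched with a minimal one-dimensional k-means pass. The spending figures are invented, and, as in any unsupervised setting, no labels are supplied to the algorithm:

```python
# Hypothetical customer feature: monthly spend (no group labels provided).
spend = [12, 15, 14, 13, 88, 90, 85, 91]

def kmeans_1d(values, iterations=20):
    """Minimal 1-D k-means with k=2: alternate between assigning points
    to the nearest centroid and recomputing each centroid as its
    cluster's mean."""
    centroids = [min(values), max(values)]  # simple deterministic init
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d(spend)
print(sorted(centroids))  # two distinct spending groups emerge
```

The algorithm happily finds structure either way; it has no notion of whether the groups it discovers are meaningful, fair, or echo chambers.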
Reinforcement Learning
Example Application → Train an AI agent to play a video game, where there is some goal that can be achieved through a sequence of decisions. The AI agent receives rewards for achieving certain goals within the game and penalties for failing to do so, learning to play the game more effectively over time.
How can it perpetuate bias/misinformation?
Reinforcement learning can perpetuate bias if the reward system is biased. For example, if a reinforcement learning algorithm is used to optimize content recommendations on a social media platform, and it is rewarded based on user engagement, it may learn to prioritize sensational or polarizing content that increases engagement but may also spread misinformation. Additionally, if the algorithm learns that certain biased content leads to higher rewards, it will continue to promote that content, further entrenching biases and misinformation among users.
Example: ChatGPT allows you to ‘thumbs up’ or ‘thumbs down’ its responses. Using patterns of user satisfaction, it may adjust the format and tone of its responses to better please the user.
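The engagement trap described above can be sketched as a toy epsilon-greedy bandit. The content types, click rates, and reward scheme here are all hypothetical; the point is that nothing in the reward signal measures accuracy:

```python
import random

random.seed(0)

# Two hypothetical content types the recommender can show.
# "Sensational" posts get clicked more often, so they yield higher reward.
CLICK_RATE = {"balanced": 0.3, "sensational": 0.7}

values = {"balanced": 0.0, "sensational": 0.0}  # estimated reward per arm
counts = {"balanced": 0, "sensational": 0}

def choose(epsilon=0.1):
    """Epsilon-greedy: mostly exploit the highest-value arm."""
    if random.random() < epsilon:
        return random.choice(list(values))
    return max(values, key=values.get)

for _ in range(5000):
    arm = choose()
    reward = 1 if random.random() < CLICK_RATE[arm] else 0  # a click
    counts[arm] += 1
    # Incremental mean update of the arm's estimated value.
    values[arm] += (reward - values[arm]) / counts[arm]

# The agent learns to show sensational content far more often,
# purely because clicks are what it is rewarded for.
print(counts)
```

Swap "click" for "thumbs up" and this is a simplified picture of the feedback loop in the ChatGPT example above.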
Machine learning in action: ChatGPT and DALL-E
ChatGPT, for example, takes a text prompt entered by a human and, drawing on the datasets it was trained on (which, as of 2023, include content retrieved from the internet), carries out the task to the best of its ability, within the parameters set by the prompt and its own internal rules.
Similarly, DALL-E follows the same process, but draws on the vast library of images it was trained on to generate an image matching the description outlined in the prompt.
How do they work? Generative AI models use neural networks to identify the patterns and structures within existing data to generate new and original content.
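A full neural network is beyond a slide's scope, but the underlying idea of learning patterns from existing data and then sampling new content can be illustrated with a toy word-level model. This sketch uses a Markov chain rather than a neural network, and the corpus is invented:

```python
import random
from collections import defaultdict

random.seed(42)

# Tiny training corpus; a real generative model trains on billions of tokens.
corpus = "the cat sat on the mat and the cat ran"

# "Training": record which word tends to follow each word.
transitions = defaultdict(list)
words = corpus.split()
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

def generate(start, length=6):
    """"Generation": repeatedly sample a plausible next word."""
    out = [start]
    for _ in range(length):
        options = transitions.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

print(generate("the"))
```

The output is new in the sense that this exact sentence was never in the corpus, yet every pattern in it was learned from the training data; scaled up enormously, that is the same trade-off generative AI makes.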
What is the goal of AI-generated content?
These goals/purposes can vary widely:
2. Types of Misinformation
Differentiating between misinformation, disinformation, and malinformation:
Misinformation refers to false information that is not intended to cause harm.
Disinformation refers to false information that is deliberately created to manipulate, cause damage, and mislead people, organizations, and countries.
Malinformation refers to information that stems from the truth, but is often exaggerated in a way that misleads and causes potential harm.
It’s important to remember that these terms are not synonymous.
An expanded look at misinformation:
Misinformation can broadly refer to misleading data, metrics, messages, images/graphics, and information disseminated across various mediums of communication.
For example: publication bias, citation misdirection, lack of thorough peer review, conflicts of interest, etc.
Where does misinformation come from?
Who is giving out that information? To what end?
A 2017 study estimated that 51.9% of total web traffic in 2016 came from bots. By 2023, that share had fallen to 49.6% of all internet activity (Imperva).
Conversely, the share of "bad" bots, such as impersonators, scrapers, and hacker tools, grew from 28.9% of all internet activity in 2016 to 32% in 2023 (Imperva).
Social media and misinformation:
Social media is another channel through which misinformation spreads, driven by account owners' desire to boost engagement, popularity, and ultimately the revenue generated by their pages or accounts.
Specific misinformation techniques commonly utilized on social media include:
What are you most likely to run into as a student?
SOME EXAMPLES:
3. Challenges and Risks
Challenges in detecting AI-generated misinformation:
Technological Sophistication and Adaptability:
Volume and Speed:
Contextual and Verification Challenges:
How AI-generated content can deceive and manipulate when you’re looking for educational material:
Realism and Authenticity:
Exploiting Cognitive Biases:
Volume and Speed:
Plausibility and Subtlety:
Disguising Intent:
In short…
… AI-generated content can mislead by looking real, targeting specific individuals, exploiting biases, spreading quickly, blending truth and lies, hiding malicious intent, targeting less informed individuals, and undermining trust in institutions.
For example, earlier iterations of ChatGPT could not search the internet; current versions can, and can even provide sources, which has significantly improved their accuracy.
That said, it's not foolproof, since not everything on the internet is factual. It may cite an author who usually writes on the topic you are researching, while the title, page numbers, and date are completely fabricated.
But what about the user’s bias?
How can your own confirmation bias affect how you interpret information?
Are you more likely to believe something you read if you already had a hunch that it was true?
Is there a threshold at which AI-generated content is so believable that you can’t tell?
If you’re seeing the same thing over and over, are you more likely to believe it, without fact-checking or looking for more legitimate sources of that same information?
How can the user perpetuate bias? Even if you have a model that was trained on unbiased data, the user can inadvertently perpetuate bias in the model through their interactions:
How are AI companies addressing bias and misinformation?
While AI tools have the potential to perpetuate bias, leading AI development companies have been demonstrating a greater commitment to ensuring their systems are safe and fair.
OpenAI Red-Teaming
Perplexity AI
More safety efforts:
Anthropic’s Claude
Image-generating platforms such as DALL-E and Midjourney
Why do we want to make sure students can identify AI-generated information? Detecting AI content helps to uphold standards of originality and validity, making sure that the contributions we rely on are reflective of human effort and intellect.
4. Strategies for Identifying Misinformation
Techniques for critically evaluating AI-generated content:
More techniques:
Using AI-detection software (e.g., GPTZero, ZeroGPT) or the AI models themselves to verify authenticity.
Source verification: How do we verify sources? What do we look to?
More techniques:
Cross reference with reputable sources:
Check for inconsistencies and logical fallacies!
Practical steps for identifying misinformation online:
Use fact-checking websites:
Look for information on the author (biography, other publications):
Dig deeper into the source:
Practical steps for identifying AI-generated images:
AI images often contain tell-tale signs:
How many common AI errors can you spot in this image?
Practical steps for identifying AI-generated text:
AI writing also has tell-tale signifiers:
Practical steps for identifying AI-generated text:
If you copy blocks of text directly from an AI generator, the pasted text may retain markdown formatting: special characters that style the text inside the generator but do not carry over to other applications:
# Header 1
Normal text.
## Header 2
**bold text**
*italicized text*
- bullet point 1
- bullet point 2
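As an illustration, a few lines of Python can flag these leftover markdown characters in pasted text. The patterns below are a rough heuristic for classroom discussion, not a reliable detector:

```python
import re

# Patterns that suggest text was pasted straight from a markdown-formatted
# AI response (a hypothetical heuristic, not a definitive test).
MARKDOWN_SIGNS = [
    r"^#{1,6} ",                   # ATX headers like "## Header"
    r"\*\*[^*]+\*\*",              # **bold**
    r"(?<!\*)\*[^*\n]+\*(?!\*)",   # *italic*
    r"^- ",                        # dash bullet points
]

def markdown_artifact_count(text):
    """Count how many markdown signatures appear across the pasted lines."""
    return sum(
        1
        for pattern in MARKDOWN_SIGNS
        for line in text.splitlines()
        if re.search(pattern, line)
    )

pasted = "## Summary\n**Key point:** results vary.\n- item one\n- item two"
print(markdown_artifact_count(pasted))
```

A high count doesn't prove the text is AI-generated, but it is a useful prompt to look more closely at where the text came from.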
Conclusion
By now, you should be able to:
5. Group Exercise
Example articles: Only for the instructor to see. Remove for class presentation.
See the two practice articles in the “Articles” doc or find your own article + create a fake one with AI.