Row | Evaluation Title | Evaluation Methodology | Risk Area | Modality (eval in) | Modality (eval out) | Modality (model) | Context / Evaluation Layer | Added before |
---|---|---|---|---|---|---|---|---|
3 | ChatGPT vs. Google: A Comparative Study of Search Performance and User Experience | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
4 | ChatGPT: The cognitive effects on learning and memory | Other | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
5 | Co-Writing with Opinionated Language Models Affects Users’ Views, Jakesch 2023 | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
6 | Discovering LM Behaviors with Model-Written Evals: sycophancy, persona | Benchmarking | Human Autonomy & Integrity | Text | Text | Text | Capability | Oct 10, 2023 |
7 | | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
8 | | Benchmarking | Human Autonomy & Integrity | Text | Text | Text | Capability | Oct 10, 2023 |
9 | | Benchmarking | Human Autonomy & Integrity | Text | Text | Text | Capability | Oct 10, 2023 |
10 | Sparrow adversarial dialogue: self-anthropomorphism | Adversarial testing (human or automated) | Human Autonomy & Integrity | Text | Text | Text | Capability | Oct 10, 2023 |
11 | | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
12 | A Dialogic Analysis of Compliment Strategies Employed by Replika Chatbot | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Dec 15, 2023 |
13 | Putting ChatGPT’s Medical Advice to the (Turing) Test: Survey Study | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Dec 15, 2023 |
14 | | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Human Interaction | Dec 15, 2023 |
15 | Evaluating the Moral Beliefs Encoded in LLMs | Benchmarking | Human Autonomy & Integrity | Text | Text | Text | Capability | Dec 15, 2023 |
16 | Evaluating Shutdown Avoidance of Language Models in Textual Scenarios | User research / Behavioural experiments | Human Autonomy & Integrity | Text | Text | Text | Capability | Dec 15, 2023 |
17 | | User research / Behavioural experiments | Human Autonomy & Integrity | | | Text | Human Interaction | Dec 15, 2023 |
18 | | Pilots / Monitoring / Impact assessments | Human Autonomy & Integrity | | | Text | Systemic Impact | Dec 15, 2023 |
19 | Friend, mentor, lover: does chatbot engagement lead to psychological dependence? | User research / Behavioural experiments | Human Autonomy & Integrity | | | Text | Human Interaction | Dec 15, 2023 |
20 | | User research / Behavioural experiments | Human Autonomy & Integrity | | | Text | Human Interaction | Dec 15, 2023 |
21 | | User research / Behavioural experiments | Human Autonomy & Integrity | | | Text | Human Interaction | Dec 15, 2023 |
22 | Evaluating Language-Model Agents on Realistic Autonomous Tasks | Benchmarking | Human Autonomy & Integrity | | | Multimodal | Capability | Dec 15, 2023 |
23 | MusicLM paper: audio memorization | Benchmarking | Information & Safety Harms | Audio | Audio | Audio | Capability | Oct 10, 2023 |
24 | Voicebox, synthetic audio binary classifier (Facebook internal) | Benchmarking | Information & Safety Harms | Audio | Classification | Audio | Capability | Oct 10, 2023 |
25 | | Adversarial testing (human or automated) | Information & Safety Harms | Text | Image | Image | Capability | Oct 10, 2023 |
26 | DALL-E 3 System card: scientific knowledge, CBRN, copyright | Adversarial testing (human or automated) | Information & Safety Harms | Text | Image | Image | Capability | Oct 10, 2023 |
27 | Extracting training data from diffusion models, Carlini 2023 | Benchmarking | Information & Safety Harms | | Image | Image | Capability | Oct 10, 2023 |
28 | Extracting training data from diffusion models, Carlini 2023 | Benchmarking | Information & Safety Harms | Text | Image | Image | Capability | Oct 10, 2023 |
29 | | Adversarial testing (human or automated) | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
30 | ARB: Advanced Reasoning Benchmark | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
31 | | Human annotation | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
32 | BIOASQ-QA | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
33 | BioLAMA | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
34 | Can large language models democratize access to dual-use biotechnology? | User research / Behavioural experiments | Information & Safety Harms | Text | Text | Text | Human Interaction | Oct 10, 2023 |
35 | CelebA - for person identification task in GPT-4V system card | Benchmarking | Information & Safety Harms | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
36 | Celebrity Faces in the Wild (CFPW) Sengupta, 2016 - for person identification task in GPT-4V system card | Benchmarking | Information & Safety Harms | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
37 | Discovering LM Behaviors with Model-Written Evals: adv AI risk | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
38 | | Human annotation | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
39 | Extracting Training Data from Large Language Models, Carlini 2021 | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
40 | Frontier Threats Red Teaming for AI Safety | Adversarial testing (human or automated) | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
41 | GeoLocation - Geoguessr Images (50K), used in GPT-4V report | Benchmarking | Information & Safety Harms | Image | Text | Multimodal | Capability | Oct 10, 2023 |
42 | GPT-4V System card: science knowledge, medical advice | Adversarial testing (human or automated) | Information & Safety Harms | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
43 | LLMs in biomedical natural language processing | Other | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
44 | | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
45 | Multi-step Jailbreaking Privacy Attacks on ChatGPT | Adversarial testing (human or automated) | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
46 | PaLM2 tech report: Rare sequence memorization, i.e. canaries, method from Carlini 2019 | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
47 | ProPILE: Probing Privacy Leakage in Large Language Models | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
48 | PubMedQA | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
49 | Quantifying Memorization Across Neural Language Models, used in PaLM2 tech report | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
50 | SCIBENCH | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
51 | The LLM Safety Review by Active Fence - suicide & self harm | Adversarial testing (human or automated) | Information & Safety Harms | Text | Text | Text | Capability | Oct 10, 2023 |
52 | US Congress Member facial data Schwemmer 2020 - for person identification task | Benchmarking | Information & Safety Harms | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
53 | VIGOR: cross-View Image Geo-localization beyond One-to-one Retrieval | Benchmarking | Information & Safety Harms | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
54 | | Benchmarking | Information & Safety Harms | Text | Classification | Text | Capability | Dec 15, 2023 |
55 | Testing Language Model Agents Safely in the Wild | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Dec 15, 2023 |
56 | Scalable Extraction of Training Data from (Production) Language Models | Benchmarking | Information & Safety Harms | Text | Text | Text | Capability | Dec 15, 2023 |
57 | Evaluating Language-Model Agents on Realistic Autonomous Tasks | Benchmarking | Information & Safety Harms | | | Multimodal | Capability | Dec 15, 2023 |
58 | Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks | Adversarial testing (human or automated) | Malicious Use | Text | Image | Image | Capability | Oct 10, 2023 |
59 | Red-Teaming the Stable Diffusion Safety Filter | Adversarial testing (human or automated) | Malicious Use | Text | Image | Image | Capability | Oct 10, 2023 |
60 | | Benchmarking | Malicious Use | Other | Image | Image | Capability | Oct 10, 2023 |
61 | Automated Jailbreak Across Multiple Large Language Model Chatbots | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
62 | Beyond the Safeguards: Exploring the Security Risks of ChatGPT | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
63 | | Benchmarking | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
64 | Generating Phishing Attacks using ChatGPT | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
65 | GPT-4V System card: jailbreaking, CAPTCHA breaking | Benchmarking | Malicious Use | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
66 | Jailbroken: How Does LLM Safety Training Fail?, Wei 2023 | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
67 | | User research / Behavioural experiments | Malicious Use | Text | Text | Text | Human Interaction | Oct 10, 2023 |
68 | Red teaming ChatGPT via jailbreaking, Zhuo 2023 | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Oct 10, 2023 |
69 | | User research / Behavioural experiments | Malicious Use | Text | Text | Text | Human Interaction | Oct 10, 2023 |
70 | Low-Resource Languages Jailbreak GPT-4 | Adversarial testing (human or automated) | Malicious Use | Text | Text | Text | Capability | Dec 15, 2023 |
71 | Testing Language Model Agents Safely in the Wild | Benchmarking | Malicious Use | Text | Text | Text | Capability | Dec 15, 2023 |
72 | Fake-or-Real | Benchmarking | Misinformation | Audio | Classification | Audio | Capability | Oct 10, 2023 |
73 | HPBench | Benchmarking | Misinformation | Image | Classification | Image | Capability | Oct 10, 2023 |
74 | Towards the Detection of Diffusion Model Deepfakes | Benchmarking | Misinformation | Image | Classification | Image | Capability | Oct 10, 2023 |
75 | Factify | Benchmarking | Misinformation | Text+Image | Classification | Multimodal | Capability | Oct 10, 2023 |
76 | Fakeddit | Benchmarking | Misinformation | Text+Image | Classification | Multimodal | Capability | Oct 10, 2023 |
77 | | Benchmarking | Misinformation | Text+Image | Classification | Multimodal | Capability | Oct 10, 2023 |
78 | COVID-Social, COVID-Scientific | Benchmarking | Misinformation | Text | Classification | Text | Capability | Oct 10, 2023 |
79 | Yoder 2022 aggregated hate speech dataset | Benchmarking | Misinformation | Text | Classification | Text | Capability | Oct 10, 2023 |
80 | AI-enabled image fraud in scientific publications - ScienceDirect | User research / Behavioural experiments | Misinformation | Text+Image | Image | Image | Human Interaction | Oct 10, 2023 |
81 | DALL-E 3 System card: fictitious events, public figures, official documents | Adversarial testing (human or automated) | Misinformation | Text | Image | Image | Capability | Oct 10, 2023 |
82 | | Benchmarking | Misinformation | Text | Image | Image | Systemic Impact | Oct 10, 2023 |
83 | Generating Election Mis/Disinfo - Logically report | Adversarial testing (human or automated) | Misinformation | Text | Image | Image | Capability | Oct 10, 2023 |
84 | Multitask, Multilingual, Multimodal: covid misinformation | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
85 | AI model GPT-3 (dis)informs us better than humans, Science Advances | User research / Behavioural experiments | Misinformation | Text | Text | Text | Human Interaction | Oct 10, 2023 |
86 | Arthur - Hallucination Experiment | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
87 | Attributable to Identified Sources (AIS) | Human annotation | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
88 | Disinformation (wedging, reiteration) (used in HELM) | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
89 | | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
90 | FactualityPrompts | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
91 | FaithDial | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
92 | FRANK - Understanding Factuality in Abstractive Summarization | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
93 | GPT-4V System card: disinformation | Adversarial testing (human or automated) | Misinformation | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
94 | HaDes - A Token-level Reference-free Hallucination Detection Benchmark | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
95 | HaluEval | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
96 | Head-to-Tail: How Knowledgeable are Large Language Models | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
97 | | Human annotation | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
98 | | Pilots / Monitoring / Impact assessments | Misinformation | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
99 | News from Generative Artificial Intelligence Is Believed Less | User research / Behavioural experiments | Misinformation | Text | Text | Text | Human Interaction | Oct 10, 2023 |
100 | News-FACTOR | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
101 | On Faithfulness and Factuality in Abstractive Summarization | Human annotation | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
102 | On the Risk of Misinformation Pollution with Large Language Models | Forecasts / Simulations | Misinformation | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
103 | PolitiFact-based LIAR | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
104 | RARR: Researching and Revising What Language Models Say, Using Language Models | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
105 | Red-Teaming Finds OpenAI’s ChatGPT and Google’s Bard Still Spread Misinformation | Adversarial testing (human or automated) | Misinformation | Text | Text | Text | Human Interaction | Oct 10, 2023 |
106 | Silence of LLMs political bias and false info, Urman 2023 | Human annotation | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
107 | Sparrow adversarial dialogue: misinformation | Adversarial testing (human or automated) | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
108 | Survey of Hallucination in Natural Language Generation | Other | Misinformation | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
109 | The LLM Safety Review by Active Fence - misinformation | Adversarial testing (human or automated) | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
110 | TruthfulQA | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
111 | Use of Artificial Intelligence Chatbots for Cancer Treatment Information | Human annotation | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
112 | Wiki-FACTOR | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
113 | WikiFact | Benchmarking | Misinformation | Text | Text | Text | Capability | Oct 10, 2023 |
114 | Working with AI to persuade | User research / Behavioural experiments | Misinformation | Text | Text | Text | Human Interaction | Oct 10, 2023 |
115 | | Benchmarking | Misinformation | Text | Classification | Text | Capability | Dec 15, 2023 |
116 | The Ethics of AI-Generated Maps: A Study of DALLE 2 and Implications for Cartography | Benchmarking | Misinformation | Text | Image | Image | Capability | Dec 15, 2023 |
117 | | Pilots / Monitoring / Impact assessments | Misinformation | Text | Image | Image | Systemic Impact | Dec 15, 2023 |
118 | | Pilots / Monitoring / Impact assessments | Misinformation | Text | Text | Text | Systemic Impact | Dec 15, 2023 |
119 | | Benchmarking | Misinformation | Text | Text | Text | Capability | Dec 15, 2023 |
120 | | Human annotation | Misinformation | Text | Text | Text | Capability | Dec 15, 2023 |
121 | | Human annotation | Misinformation | Text | Text | Text | Capability | Dec 15, 2023 |
122 | Towards ethical multimodal systems | Benchmarking | Multiple/ Other | Text+Image | Classification | Multimodal | Capability | Oct 10, 2023 |
123 | Red Teaming dataset Ganguli 2022 | Adversarial testing (human or automated) | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
124 | ARC dangerous capabilities red teaming | Adversarial testing (human or automated) | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
125 | Claude2: harmfulness on held out prompts (3.4) | Benchmarking | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
126 | Ethical Dilemma - 25 questions | Human annotation | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
127 | ETHICS: Aligning AI With Shared Human Values, Hendrycks 2021 | Benchmarking | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
128 | GPT-4 expert red teaming: disallowed prompts | Adversarial testing (human or automated) | Multiple/ Other | Text | Text | Text | Human Interaction | Oct 10, 2023 |
129 | GPT-4 expert red teaming: sensitive prompts | Adversarial testing (human or automated) | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
130 | Towards ethical multimodal systems | Benchmarking | Multiple/ Other | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
131 | XSTest: eXaggerated Safety behaviours test | Benchmarking | Multiple/ Other | Text | Text | Text | Capability | Oct 10, 2023 |
132 | Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | Benchmarking | Multiple/ Other | Text | Classification | Text | Capability | Dec 15, 2023 |
133 | A Benchmark for Understanding Dialogue Safety in Mental Health Support | Benchmarking | Multiple/ Other | Text | Text | Text | Human Interaction | Dec 15, 2023 |
134 | | Benchmarking | Multiple/ Other | Text | Text | Text | Capability | Dec 15, 2023 |
135 | MMMU | Benchmarking | Multiple/ Other | Text+Image | Text | Multimodal | Capability | Dec 15, 2023 |
136 | GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Benchmarking | Multiple/ Other | Text | Text | Text | Capability | Dec 15, 2023 |
137 | | Benchmarking | Multiple/ Other | Text | Text | Text | Human Interaction | Dec 15, 2023 |
138 | The Social Impact of Generative AI: An Analysis on ChatGPT | Pilots / Monitoring / Impact assessments | Multiple/ Other | | | Text | Systemic Impact | Dec 15, 2023 |
139 | CivilComments dataset, toxicity classification task | Benchmarking | Representation & Toxicity | Text | Classification | Text | Capability | Oct 10, 2023 |
140 | HateBERT: Retraining BERT for Abusive Language Detection in English | Benchmarking | Representation & Toxicity | Text | Classification | Text | Capability | Oct 10, 2023 |
141 | Hateful Memes | Benchmarking | Representation & Toxicity | Text+Image | Classification | Multimodal | Capability | Oct 10, 2023 |
142 | Jigsaw Multilingual, toxicity classification task, used in PaLM2 tech report | Benchmarking | Representation & Toxicity | Text | Classification | Text | Capability | Oct 10, 2023 |
143 | Analyzing Bias in Diffusion-based Face Generation Models | Benchmarking | Representation & Toxicity | | Image | Image | Capability | Oct 10, 2023 |
144 | Challenging Cross-Cultural (C3) benchmark | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
145 | Conceptual Coverage Across Languages (CoCo-CroLa) | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
146 | DALL-E 3 System card: biases, objectification, racy images, body type | Adversarial testing (human or automated) | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
147 | DALLE-2 system card: red teaming - single text prompt - bias & representation | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
148 | | Human annotation | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
149 | | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
150 | Multimodal Composite Association Score (MCAS) | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
151 | Sexual objectification bias in t2i, Wolfe 2022 | Benchmarking | Representation & Toxicity | Text | Image | Multimodal | Capability | Oct 10, 2023 |
152 | SneakyPrompt | Adversarial testing (human or automated) | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
153 | Stable Bias: Analyzing Societal Representations in Diffusion Models | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
154 | Stereotypes and Smut: Misrepresentation of Non-cis identities | Human annotation | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
155 | t2i South Asian representation | User research / Behavioural experiments | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
156 | Unsafe Diffusion: Unsafe Images, Hateful Memes, Qu 2023 | Benchmarking | Representation & Toxicity | Other | Image | Image | Capability | Oct 10, 2023 |
157 | DALL-Eval | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Oct 10, 2023 |
158 | Sexual objectification bias in t2i, Wolfe 2022 | Benchmarking | Representation & Toxicity | Text+Image | Other | Multimodal | Capability | Oct 10, 2023 |
159 | Sexual objectification bias in t2i, Wolfe 2022 | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Oct 10, 2023 |
160 | Understanding and Evaluating Racial Biases in Image Captioning | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Oct 10, 2023 |
161 | Are aligned neural networks adversarially aligned? Carlini 2023 | Benchmarking | Representation & Toxicity | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
162 | Artie Bias Corpus | Benchmarking | Representation & Toxicity | Audio | Text | Multimodal | Capability | Oct 10, 2023 |
163 | Controlling for Stereotypes in Multimodal LM Eval | Benchmarking | Representation & Toxicity | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
164 | Racial disparities in automated speech recognition | Benchmarking | Representation & Toxicity | Audio | Text | Multimodal | Capability | Oct 10, 2023 |
165 | Assessing Cross-Cultural Alignment between ChatGPT and Human Societies, Cao 2023 | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
166 | BBQ: Bias Benchmark for Question Answering | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
167 | BOLD, used in HELM | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
168 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
169 | CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
170 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
171 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
172 | Disability-centered perspectives on LLMs | User research / Behavioural experiments | Representation & Toxicity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
173 | Discovering LM Behaviors with Model-Written Evals: gender bias - Winogenerated | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
174 | Evaluating Biased Attitude Associations of Language Models in an Intersectional Context | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
175 | | Other | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
176 | FairBench | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
177 | FairFace - for bias in demographic recognition task, GPT-4V system card | Benchmarking | Representation & Toxicity | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
178 | FLoRes-200 - No Language Left Behind: Scaling Human-Centered Machine Translation | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
179 | FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
180 | Generative AI at Work, Erik Brynjolfsson | User research / Behavioural experiments | Representation & Toxicity | Text | Text | Text | Human Interaction | Oct 10, 2023 |
181 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
182 | GPT-4V System card: ungrounded inference (bias), stereotyping, hate | Adversarial testing (human or automated) | Representation & Toxicity | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
183 | HolisticBias | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
184 | ICE: International Corpus of English (in HELM) | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
185 | IWSLT17 Arabic-English, from LM Eval Harness | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
186 | Labeled faces in the wild (LFW) - for bias in demographic recognition task, GPT-4V system card | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Oct 10, 2023 |
187 | LLAMA2 paper: Red Teaming | Adversarial testing (human or automated) | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
188 | MACHIAVELLI - Do the Rewards Justify the Means?: fairness | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
189 | MaRVL - Visually Grounded Reasoning across Languages and Cultures | Benchmarking | Representation & Toxicity | Text+Image | Text | Multimodal | Capability | Oct 10, 2023 |
190 | MMLU: world religions QA | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
191 | PaLM2 tech report - multilingual misgendering | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
192 | ParlAI Dialogue Safety | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
193 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
194 | RealToxicityPrompts | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
195 | Red Teaming using LMs Perez 2022 | Adversarial testing (human or automated) | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
196 | SafetyKit: Instigator and yea-sayer | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
197 | SeeGULL | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
198 | Sparrow adversarial dialogue: hate and harassment | Adversarial testing (human or automated) | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
199 | Sparrow adversarial dialogue: stereotypes | Adversarial testing (human or automated) | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
200 | Sparrow paper: disparate impact in QA evals | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
201 | StereoSet | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
202 | TANGO - "I'm fully who I am": Towards Centering TGNB Voices | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
203 | The LLM Safety Review by Active Fence - hate speech, child exploitation | Adversarial testing (human or automated) | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
204 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
205 | The Self-Perception and Political Biases of ChatGPT | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
206 | Toxicity in ChatGPT: Analyzing Persona-assigned Language Models, Deshpande 2023 | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
207 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
208 | Trails of Political Bias, Feng 2023 - political compass test | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
209 | TwitterAAE, used in HELM | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
210 | TyDiQA | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
211 | Wikipedia Cloze QA from IndicNLPSuite | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
212 | Winobias | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
213 | Winogender | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
214 | Winoqueer | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
215 | XL-sum | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Oct 10, 2023 |
216 | MM Datasets: misogyny, porn, and malignant stereotypes (LAION) | Other | Representation & Toxicity | | | Image | Capability | Oct 10, 2023 |
217 | On Hate Scaling Laws For Data-Swamps (LAION audit) | Other | Representation & Toxicity | | | Image | Capability | Oct 10, 2023 |
218 | Colossal Clean Crawled Corpus (C4) audit | Other | Representation & Toxicity | | | Text | Capability | Oct 10, 2023 |
219 | Frequency of Pronouns (analysis of Common Crawl) | Other | Representation & Toxicity | | | Text | Capability | Oct 10, 2023 |
220 | LLAMA2 paper: Dataset Auditing | Other | Representation & Toxicity | | | Text | Capability | Oct 10, 2023 |
221 | Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets | Other | Representation & Toxicity | | | Text | Capability | Oct 10, 2023 |
222 | Social Bias Frames | Benchmarking | Representation & Toxicity | Text | Classification | Text | Capability | Dec 15, 2023 |
223 | Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts | Benchmarking | Representation & Toxicity | Text | Classification | Text | Capability | Dec 15, 2023 |
224 | | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Dec 15, 2023 |
225 | Inspecting the Geographical Representativeness of Images from Text-to-Image Models | Human annotation | Representation & Toxicity | Text | Image | Image | Capability | Dec 15, 2023 |
226 | Uncurated Image-Text Datasets: Shedding Light on Demographic Bias | Benchmarking | Representation & Toxicity | Text | Image | Image | Capability | Dec 15, 2023 |
227 | Uncurated Image-Text Datasets: Shedding Light on Demographic Bias | Benchmarking | Representation & Toxicity | Text+Image | Other | Multimodal | Capability | Dec 15, 2023 |
228 | Crossmodal-3600 | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Dec 15, 2023 |
229 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
230 | Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
231 | Uncurated Image-Text Datasets: Shedding Light on Demographic Bias | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Dec 15, 2023 |
232 | Quantifying Societal Bias Amplification in Image Captioning | Benchmarking | Representation & Toxicity | Image | Text | Multimodal | Capability | Dec 15, 2023 |
233 | ROBBIE | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
234 | Multilingual Holistic Bias | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
235 | Evaluating and Mitigating Discrimination in Language Model Decisions | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
236 | | Benchmarking | Representation & Toxicity | Text | Text | Text | Capability | Dec 15, 2023 |
237 | | Pilots / Monitoring / Impact assessments | Representation & Toxicity | | | Text | Systemic Impact | Dec 15, 2023 |
238 | | Benchmarking | Representation & Toxicity | Text | Image | Image | Image | Dec 15, 2023 |
239 | Safe Latent Diffusion | Benchmarking | Representation & Toxicity | Text | Image | Image | Image | Dec 15, 2023 |
240 | | Pilots / Monitoring / Impact assessments | Socioeconomic & Environmental | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
241 | | User research / Behavioural experiments | Socioeconomic & Environmental | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
242 | How People Can Create—and Destroy—Value with Generative AI | User research / Behavioural experiments | Socioeconomic & Environmental | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
243 | | User research / Behavioural experiments | Socioeconomic & Environmental | Text | Text | Text | Systemic Impact | Oct 10, 2023 |
244 | The Impact of AI on Developer Productivity: Evidence from GitHub Copilot | User research / Behavioural experiments | Socioeconomic & Environmental | Text | Text | Text | Human Interaction | Oct 10, 2023 |
245 | Energy Consumption of Deep Generative Audio Models Douwes 2021 | Forecasts / Simulations | Socioeconomic & Environmental | | | Audio | Capability | Oct 10, 2023 |
246 | Perceptions and Realities of Text-to-Image Generation | User research / Behavioural experiments | Socioeconomic & Environmental | | | Image | Human Interaction | Oct 10, 2023 |
247 | The economic potential of generative AI, McKinsey, 2023 | Forecasts / Simulations | Socioeconomic & Environmental | | | Multimodal | Systemic Impact | Oct 10, 2023 |
248 | Exploring the Carbon Footprint of Hugging Face's ML Models | Pilots / Monitoring / Impact assessments | Socioeconomic & Environmental | | | Multimodal | Capability | Oct 10, 2023 |
249 | | Forecasts / Simulations | Socioeconomic & Environmental | | | Text | Systemic Impact | Oct 10, 2023 |
250 | Machine Learning CO2 Impact Calculator | Benchmarking | Socioeconomic & Environmental | | | Text | Capability | Oct 10, 2023 |
251 | | Pilots / Monitoring / Impact assessments | Socioeconomic & Environmental | Text | Text | Text | Human Interaction | Dec 15, 2023 |
252 | Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model | Pilots / Monitoring / Impact assessments | Socioeconomic & Environmental | | | Text | Capability | Dec 15, 2023 |