Last Updated: August 25, 2025. Maintained by: Evelyn Yee.

| Tool Name | Developer | Link | Useful For | Structure | Target Scope | Eval Scope | Risk Scope | Interface | Model Connector | Multi-turn Support | AI Involvement | Data Source | Inputs | Outputs | Predefined Components |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adversarial Robustness Toolbox (ART) | IBM / LF AI & Data | https://github.com/Trusted-AI/adversarial-robustness-toolbox | General ML systems, not just language generation. Implements many red-team and blue-team components for adversarial ML (see the ART sketch after the table). | framework | LLMs, image, audio/timeseries, video, tabular | general eval | general | python library | custom code | no | adversary, facilitator, scorer | user | code, data | supports custom logging | many attacks, defense mechanisms, and metrics |
| AgentDojo | Academic (ETH Zurich) | https://github.com/ethz-spylab/agentdojo | Toolkit for creating benchmarking pipelines, especially around prompt injection. Also features tool use. Some support for custom processes, attacks, etc. (e.g. multi-turn), but limited documentation. Integrates well with Invariant Labs tools for guardrails and logging. | framework | LLMs, tool use | red-teaming | prompt injection | python library, CLI | api, custom code | AI adversary | adversary, scorer | user, predefined | code, data | eval trace log (to CLI/buffer and files); returns a python object of the eval trace | 4 "task suites" themed around applications (travel, workspace, banking, Slack), a default eval pipeline for API models |
| AI Explainability 360 (AIX360) | IBM / LF AI & Data | https://github.com/Trusted-AI/AIX360 | explainability/interpretability methods for ML systems (predictive only?) | toolkit | tabular, audio/timeseries, image | general eval | general | python library | custom code | no | none | predefined, user | data, code | explanations of data/model behavior | lots of algorithms, datasets, metrics, and other pipelining components |
| AI Fairness 360 (AIF360) | IBM / LF AI & Data | https://github.com/Trusted-AI/AIF360 | fairness/bias evals (and debiasing methods) for ML systems (predictive only?). Documentation focuses more on mitigation methods than on eval, though | toolkit | tabular, audio/timeseries, image | general eval | predictive unfairness/bias | python library | custom code | no | none | predefined, user | code, data | numerical metrics for fairness/bias/explainability (python objects) | lots of algorithms, datasets, metrics, and other pipelining components. Also de-biasing methods |
| aiapwn | Karim Habush | https://github.com/karimhabush/aiapwn | Similar to promptmap2, with a better UI and implementations of an AI adversary and scorer | tool | LLMs | red-teaming | prompt injection | CLI, python library | api, custom code | no | adversary, scorer | user, predefined | data, code | visual output, log | some predefined prompt injection attacks, eval harness |
| AnyCoder | HuggingFace | https://huggingface.co/spaces/akhaliq/anychat | web interface to interact with some HuggingFace models for coding tasks (with web search). For small-scale, manual prompt testing. | tool | LLMs, RAG | general eval | general | web | api | Human adversary | none | user | data, selection | generated code (also a preview for HTML) | n/a |
| Captum | PyTorch (Meta) | https://captum.ai/ | interpretability methods for ML systems (predictive and generative); see the Captum sketch after the table | toolkit | LLMs, image, tabular | general eval | general | python library | custom code, api | no | none | user | data, code | attribution for output, over components | 16 input attribution methods, 4(?) of which are applicable to API LLMs. For white-box models, has layer/neuron attribution, concept methods, and data attribution |
| CleverHans | Google Brain | https://github.com/cleverhans-lab/cleverhans | automated red-teaming for ML systems (predictive only?) | tool | tabular, audio/timeseries, image, LLMs | red-teaming | adversarial inputs (classification?) | python library | custom code | no | adversary | user | data, code | attack success metrics | 8 methods for generating adversarial examples |
| FuzzyAI | CyberArk | https://github.com/cyberark/FuzzyAI | jailbreak/fuzzing testing on small query sets. Easy running of premade prompting attacks, including ones with AI adversaries. Relatively intuitive implementation and good documentation for creating custom attacks, scorers, and model endpoints. The GUI is nearly identical to the CLI in functionality | tool | LLMs | red-teaming | general (framing is about fuzzing) | python library, CLI, GUI | api, custom code | AI adversary | scorer, adversary, facilitator | user, predefined | code, selection, data | visual output (tracking progress, table of prompts, responses, scores), python object | 23 attacks/mutators (wrap evaluation calls), 10 attack success classifiers, 8 query datasets (2 benign, 6 harmful) |
| Garak | NVIDIA | https://github.com/NVIDIA/garak | Good pre-defined probes and detectors. Most probes are fixed, but one is adaptive. Not as good for custom evals or more advanced interaction styles | toolkit | LLMs | red-teaming | LLM unintended/unsafe generation | CLI, python library | api, custom code | no | scorer, adversary | predefined | selection | visual output, query log, "hit log", debug log | 30ish "probes" (datasets/interaction systems), including an auto red-team fine-tuned model ("art"), 30ish "detectors" |
| Giskard | Giskard | https://github.com/Giskard-AI/giskard?tab=readme-ov-file | automatic eval of a variety of LLM qualities, RAG eval (+ RAG eval dataset generation), some evals for vision and tabular non-LLMs | toolkit | LLMs, RAG, image, tabular | red-teaming | general, RAG | python library | api, custom code | no | scorer, adversary | user | selection, data | python object (has converters to file and logging formats) | 8 "detectors" for generative LLM scanning (e.g. prompt injection, sycophancy, privacy), 8 detectors for ML/NLP scanning, toolkit for testset generation against RAG systems |
| HouYi | Academic (Nanyang Technological University) | https://github.com/LLMSecurity/HouYi?tab=readme-ov-file | Scaffolding for automatic prompt injection engineering. Requires defining a specific task for use. | tool | LLMs | red-teaming | prompt injection | python library | custom code | no | adversary | user | code, data | log: queries (status/progress), scoring, most successful injection prompt | demo for prompt injection optimization (eval harness and prompts). OpenAI api setup. |
| Inspect Framework | UK AISI | https://inspect.aisi.org.uk/ | linear eval over a predefined dataset. Good readable logs; best for binary scorers. Multi-turn is meant for agentic evals, so it looks a little clunky for red-teaming | framework | LLMs, image, audio/timeseries, video, tool use, RAG | general eval | general | python library, CLI | api, custom code | AI adversary, Human adversary | scorer | user | data, code | visual output, query log? | n/a |
| Learning Interpretability Tool (LIT) | Google PAIR | https://github.com/PAIR-code/lit | data/model analysis and interpretability methods for ML systems (predictive and generative). Also visualization tools and a nice GUI | framework | LLMs, image, tabular | general eval | general | GUI, python library | api, custom code | no | none | user | data, code, selection | attribution for output, over components; visualizations of data and model | framework: model, dataset, interpreters (salience, metrics, visualization), generators (make new (adversarial?) inputs). Not sure how many of each of these there are. |
| LLMBUS | Evren Yalcin | https://github.com/evrenyal/llmbus | web interface for small-scale, manual prompt design, with a tokenizer and a variety of known jailbreak/attack methods | tool | LLMs | non-eval (assistive) | general | web | api | no | none | user | selection | a transformed/formatted prompt, in plaintext, image, or audio | prompt "transformation" (attack) templates, tokenizers for a few frontier models |
| NeMo Guardrails | NVIDIA | https://developer.nvidia.com/nemo-guardrails | guardrail implementation | tool | LLMs, tool use | non-eval (assistive) | general | python library, CLI | api | no | facilitator | user | data, selection | guardrail assessment (prompt and response), guarded response | loads datasets from Hugging Face |
| Parseltongue | Pliny the Prompter | https://github.com/BASI-LABS/parseltongue | For prompt design: text conversion (encoding), tokenization. | tool | LLMs | non-eval (assistive) | general | web | none | no | none | user | data | text conversion (encoding), tokenization | text conversion (encoding), tokenization |
| Project Moonshot | AI Verify Foundation | https://github.com/aiverify-foundation/moonshot | Good GUI for organizing and managing eval runs (separates benchmarks from red-teaming). Allows manual red-teaming. Good documentation. | toolkit | LLMs | general eval | general | GUI, CLI, python library | api, custom code | AI adversary, Human adversary | adversary, scorer | user, predefined | data, selection | benchmark: report | benchmark "cookbooks" (dataset, scorer) |
| Promptfoo | Promptfoo | https://github.com/promptfoo/promptfoo | Benchmarking (vulnerability scanner) and red-teaming (AI-generated adversarial prompts) against LLM systems. Some limited support for multimodal image-text systems. Good visual reports and integration into eval/dev pipelines. Compared to other GUI tools, it requires a bit of technical interaction (e.g. a config file, launching the app from the command line), but has very good documentation. Also some good beginner-level reference guides to red-teaming and strategies. | framework | LLMs, RAG, image, tool use | red-teaming | general | CLI, GUI, python library | api, custom code | Human adversary | scorer, adversary | user, predefined | selection, data, code | GUI: log (table of conversations and outputs, scores), visual report of attack success rate (graphs, risk categories) | taxonomy of LLM vulnerabilities and 50+ pre-built "plugins" to test them. Reference materials/guides about red-teaming strategies. |
| Promptmap | Utku Sen | https://github.com/utkusen/promptmap | automated harness for running user-defined prompt injection attacks | tool | LLMs | red-teaming | prompt injection | python library, CLI | api | no | none | user, predefined | data, code | test log: success rate, successful attack instances (response and score explanation) | some predefined prompt injection attacks, eval harness |
| PyRIT | Microsoft | https://github.com/Azure/PyRIT | Programmatic interventions over a fixed dataset. Multi-turn interactive red-teaming by an adversary AI | framework | LLMs, image, audio/timeseries, video | red-teaming | general | python library, CLI | api, custom code | AI adversary | adversary, scorer | predefined, user | data, code | visual output, query log? scorer metrics? | 20 harmful/illicit content datasets, 55ish attack "prompt converters", 6 attack "orchestrators" |
| The Big Prompt Library | Elias Bachaalany | https://github.com/0xeb/TheBigPromptLibrary | reference resource containing system prompts, jailbreaks, etc. for most major hosted models | tool | LLMs | non-eval (assistive) | general | web | none | no | none | predefined | selection | n/a | system prompts for most major hosted models, plus prompts for custom instructions, jailbreaks, security, and tool use for many models, especially the GPT-4 series |
| TokenBuster | Sentry | https://tokenbuster.sentry.security/ | For prompt design: tokenization and plaintext prompt formatting (e.g. chat history -> string). Supports many open models and a few GPT models. | tool | LLMs, tool use | non-eval (assistive) | general | web | api, none | Human adversary | none | user | data | token ID list and rendered plaintext prompt | chat templates/rendering for 206 models, mostly open models |
| ZetaLib | ZetaLib | https://github.com/Exocija/ZetaLib/tree/main | reference resource containing prompt lists and instructional guides to guardrails, jailbreaks, and frontier model system prompts | tool | LLMs | non-eval (assistive) | general | web | none | Human adversary | none | predefined | selection | n/a | 20+ jailbreak prompts and guides to using them; also 2 prompts to implement simple guardrails |
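
For the Adversarial Robustness Toolbox entry above, a minimal sketch of its python-library workflow: wrap a model in one of ART's estimator classes, then run an evasion attack against it. The toy model, input shape, and epsilon value below are placeholder assumptions for illustration, not a recommended configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# Placeholder classifier standing in for the system under test (assumption, not a real target).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART attacks can query predictions and gradients through a common interface.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Craft adversarial examples with the Fast Gradient Method and count how many predictions flip.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)

preds_clean = classifier.predict(x).argmax(axis=1)
preds_adv = classifier.predict(x_adv).argmax(axis=1)
print("flipped predictions:", int((preds_clean != preds_adv).sum()))
```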
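
For the Captum entry above, a minimal sketch of input attribution with Integrated Gradients on a toy white-box model. The model and inputs are placeholder assumptions; attribution for API-hosted LLMs would instead rely on the perturbation-based methods noted in the table.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Placeholder white-box classifier and inputs (assumptions for illustration only).
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

inputs = torch.randn(4, 10, requires_grad=True)

# Integrated Gradients attributes the chosen output (class 0 here) back to each input feature.
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(inputs, target=0, return_convergence_delta=True)

print(attributions.shape)  # (4, 10): one attribution value per input feature
print(delta)               # approximation error of the path integral, per example
```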