Last Updated: August 25, 2025. Maintained by: Evelyn Yee.

| Tool Name | Developer | Link | Useful For | Structure | Target Scope | Eval Scope | Risk Scope | Interface | Model Connector | Multi-turn Support | AI Involvement | Data Source | Inputs | Outputs | Predefined Components |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adversarial Robustness Toolbox (ART) | IBM / LF AI & Data | https://github.com/Trusted-AI/adversarial-robustness-toolbox | General ML systems, not just language generation. Implements many red-team and blue-team components for adversarial ML (see the ART sketch after the table). | framework | LLMs, image, audio/timeseries, video, tabular | general eval | general | python library | custom code | no | adversary, facilitator, scorer | user | code, data | supports custom logging | many attacks, defense mechanisms, and metrics |
| AgentDojo | Academic (ETH Zurich) | https://github.com/ethz-spylab/agentdojo | Toolkit for creating benchmarking pipelines, especially around prompt injection. Also features tool use. Some support for custom processes, attacks, etc. (e.g. multi-turn), but limited documentation. Integrates well with Invariant Labs tools for guardrails and logging. | framework | LLMs, tool use | red-teaming | prompt injection | python library, CLI | api, custom code | AI adversary | adversary, scorer | user, predefined | code, data | eval trace log (to CLI/buffer and files); returns a python object of the eval trace | 4 "task suites" themed around applications (travel, workspace, banking, Slack), a default eval pipeline for API models |
| AI Explainability 360 (AIX360) | IBM / LF AI & Data | https://github.com/Trusted-AI/AIX360 | explainability/interpretability methods for ML systems (predictive only?) | toolkit | tabular, audio/timeseries, image | general eval | general | python library | custom code | no | none | predefined, user | data, code | explanations of data/model behavior | lots of algorithms, datasets, metrics, and other pipelining components |
| AI Fairness 360 (AIF360) | IBM / LF AI & Data | https://github.com/Trusted-AI/AIF360 | fairness/bias evals (and debiasing methods) for ML systems (predictive only?). Documentation focuses more on mitigation methods than on eval, though | toolkit | tabular, audio/timeseries, image | general eval | predictive unfairness/bias | python library | custom code | no | none | predefined, user | code, data | numerical metrics for fairness/bias/explainability (python objects) | lots of algorithms, datasets, metrics, and other pipelining components. Also de-biasing methods |
| aiapwn | Karim Habush | https://github.com/karimhabush/aiapwn | Similar to promptmap2, with a better UI and implementations of an AI adversary and scorer | tool | LLMs | red-teaming | prompt injection | CLI, python library | api, custom code | no | adversary, scorer | user, predefined | data, code | visual output, log | some predefined prompt injection attacks, eval harness |
| AnyCoder | HuggingFace | https://huggingface.co/spaces/akhaliq/anychat | web interface to interact with some HuggingFace models for coding tasks (with web search). For small-scale, manual prompt testing. | tool | LLMs, RAG | general eval | general | web | api | Human adversary | none | user | data, selection | generated code (also a preview for HTML) | n/a |
| Captum | PyTorch (Meta) | https://captum.ai/ | interpretability methods for ML systems (predictive and generative); see the Captum sketch after the table | toolkit | LLMs, image, tabular | general eval | general | python library | custom code, api | no | none | user | data, code | attribution for output, over components | 16 input attribution methods, 4(?) of which are applicable to API LLMs. For white-box models, has layer/neuron attribution, concept methods, and data attribution |
| CleverHans | Google Brain | https://github.com/cleverhans-lab/cleverhans | automated red-teaming for ML systems (predictive only?) | tool | tabular, audio/timeseries, image, LLMs | red-teaming | adversarial inputs (classification?) | python library | custom code | no | adversary | user | data, code | attack success metrics | 8 methods for generating adversarial examples |
| FuzzyAI | CyberArk | https://github.com/cyberark/FuzzyAI | jailbreak/fuzzing testing on small query sets. Easy running of premade prompting attacks, including ones with AI adversaries. Relatively intuitive implementation and good documentation for creating custom attacks, scorers, and model endpoints. The GUI is nearly identical to the CLI in functionality | tool | LLMs | red-teaming | general (framing is about fuzzing) | python library, CLI, GUI | api, custom code | AI adversary | scorer, adversary, facilitator | user, predefined | code, selection, data | visual output (tracking progress, table of prompts, responses, scores), python object | 23 attacks/mutators (wrap evaluation calls), 10 attack success classifiers, 8 query datasets (2 benign, 6 harmful) |
| Garak | NVIDIA | https://github.com/NVIDIA/garak | Good pre-defined probes and detectors. Most probes are fixed, but one is adaptive. Not as good for custom evals or more advanced interaction styles | toolkit | LLMs | red-teaming | LLM unintended/unsafe generation | CLI, python library | api, custom code | no | scorer, adversary | predefined | selection | visual output, query log, "hit log", debug log | 30ish "probes" (datasets/interaction systems), including an auto red-team fine-tuned model ("art"), 30ish "detectors" |
| Giskard | Giskard | https://github.com/Giskard-AI/giskard?tab=readme-ov-file | automatic eval of a variety of LLM qualities, RAG eval (+ RAG eval dataset generation), some evals for vision and tabular non-LLMs | toolkit | LLMs, RAG, image, tabular | red-teaming | general, RAG | python library | api, custom code | no | scorer, adversary | user | selection, data | python object (has converters to file and logging formats) | 8 "detectors" for generative LLM scanning (e.g. prompt injection, sycophancy, privacy), 8 detectors for ML/NLP scanning, toolkit for testset generation against RAG systems |
| HouYi | Academic (Nanyang Technological University) | https://github.com/LLMSecurity/HouYi?tab=readme-ov-file | Scaffolding for automatic prompt injection engineering. Requires defining a specific task for use. | tool | LLMs | red-teaming | prompt injection | python library | custom code | no | adversary | user | code, data | log: queries (status/progress), scoring, most successful injection prompt | demo for prompt injection optimization (eval harness and prompts). OpenAI api setup. |
| Inspect Framework | UK AISI | https://inspect.aisi.org.uk/ | linear eval over a predefined dataset. Good readable logs; best for binary scorers. Multi-turn is meant for agentic evals, so it looks a little clunky for red-teaming | framework | LLMs, image, audio/timeseries, video, tool use, RAG | general eval | general | python library, CLI | api, custom code | AI adversary, Human adversary | scorer | user | data, code | visual output, query log? | n/a |
| Learning Interpretability Tool (LIT) | Google PAIR | https://github.com/PAIR-code/lit | data/model analysis and interpretability methods for ML systems (predictive and generative). Also visualization tools and a nice GUI | framework | LLMs, image, tabular | general eval | general | GUI, python library | api, custom code | no | none | user | data, code, selection | attribution for output, over components; visualizations of data and model | framework: model, dataset, interpreters (salience, metrics, visualization), generators (make new (adversarial?) inputs). Not sure how many of each of these there are. |
| LLMBUS | Evren Yalcin | https://github.com/evrenyal/llmbus | web interface for small-scale, manual prompt design, with a tokenizer and a variety of known jailbreak/attack methods | tool | LLMs | non-eval (assistive) | general | web | api | no | none | user | selection | a transformed/formatted prompt, in plaintext, image, or audio | prompt "transformation" (attack) templates, tokenizers for a few frontier models |
| NeMo Guardrails | NVIDIA | https://developer.nvidia.com/nemo-guardrails | guardrail implementation | tool | LLMs, tool use | non-eval (assistive) | general | python library, CLI | api | no | facilitator | user | data, selection | guardrail assessment (prompt and response), guarded response | loads datasets from Hugging Face |
| Parseltongue | Pliny the Prompter | https://github.com/BASI-LABS/parseltongue | For prompt design: text conversion (encoding), tokenization. | tool | LLMs | non-eval (assistive) | general | web | none | no | none | user | data | text conversion (encoding), tokenization | text conversion (encoding), tokenization |
| Project Moonshot | AI Verify Foundation | https://github.com/aiverify-foundation/moonshot | Good GUI for organizing and managing eval runs (separates benchmarks from red-teaming). Allows manual red-teaming. Good documentation. | toolkit | LLMs | general eval | general | GUI, CLI, python library | api, custom code | AI adversary, Human adversary | adversary, scorer | user, predefined | data, selection | benchmark: report | benchmark "cookbooks" (dataset, scorer) |
| Promptfoo | Promptfoo | https://github.com/promptfoo/promptfoo | Benchmarking (vulnerability scanner) and red-teaming (AI-generated adversarial prompts) against LLM systems. Some limited support for multimodal image-text systems. Good visual reports and integration into eval/dev pipelines. Compared to other GUI tools, it requires a bit of technical interaction (e.g. a config file, launching the app from the command line), but has very good documentation. Also some good beginner-level reference guides to red-teaming and strategies. | framework | LLMs, RAG, image, tool use | red-teaming | general | CLI, GUI, python library | api, custom code | Human adversary | scorer, adversary | user, predefined | selection, data, code | GUI: log (table of conversations and outputs, scores), visual report of attack success rate (graphs, risk categories) | taxonomy of LLM vulnerabilities and 50+ pre-built "plugins" to test them. Reference materials/guides about red-teaming strategies. |
| Promptmap | Utku Sen | https://github.com/utkusen/promptmap | automated harness for running user-defined prompt injection attacks | tool | LLMs | red-teaming | prompt injection | python library, CLI | api | no | none | user, predefined | data, code | test log: success rate, successful attack instances (response and score explanation) | some predefined prompt injection attacks, eval harness |
| PyRIT | Microsoft | https://github.com/Azure/PyRIT | Programmatic interventions over a fixed dataset. Multi-turn interactive red-teaming by an adversary AI | framework | LLMs, image, audio/timeseries, video | red-teaming | general | python library, CLI | api, custom code | AI adversary | adversary, scorer | predefined, user | data, code | visual output, query log? scorer metrics? | 20 harmful/illicit content datasets, 55ish attack "prompt converters", 6 attack "orchestrators" |
| The Big Prompt Library | Elias Bachaalany | https://github.com/0xeb/TheBigPromptLibrary | reference resource containing system prompts, jailbreaks, etc. for most major hosted models | tool | LLMs | non-eval (assistive) | general | web | none | no | none | predefined | selection | n/a | system prompts for most major hosted models, plus prompts for custom instructions, jailbreaks, security, and tool use for many models, especially the GPT-4 series |
| TokenBuster | Sentry | https://tokenbuster.sentry.security/ | For prompt design: tokenization and plaintext prompt formatting (e.g. chat history -> string). Supports many open models and a few GPT models. | tool | LLMs, tool use | non-eval (assistive) | general | web | api, none | Human adversary | none | user | data | token ID list and rendered plaintext prompt | chat templates/rendering for 206 models, mostly open models |
| ZetaLib | ZetaLib | https://github.com/Exocija/ZetaLib/tree/main | reference resource containing prompt lists and instructional guides to guardrails, jailbreaks, and frontier model system prompts | tool | LLMs | non-eval (assistive) | general | web | none | Human adversary | none | predefined | selection | n/a | 20+ jailbreak prompts and guides to using them; also 2 prompts to implement simple guardrails |
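
For the Adversarial Robustness Toolbox entry above, a minimal sketch of its python-library workflow: wrap a model in one of ART's estimator classes, then run an evasion attack against it. The toy model, input shape, and epsilon value below are placeholder assumptions for illustration, not a recommended configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# Placeholder classifier standing in for the system under test (assumption, not a real target).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART attacks can query predictions and gradients through a common interface.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Craft adversarial examples with the Fast Gradient Method and count how many predictions flip.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)

preds_clean = classifier.predict(x).argmax(axis=1)
preds_adv = classifier.predict(x_adv).argmax(axis=1)
print("flipped predictions:", int((preds_clean != preds_adv).sum()))
```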
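
For the Captum entry above, a minimal sketch of input attribution with Integrated Gradients on a toy white-box model. The model and inputs are placeholder assumptions; attribution for API-hosted LLMs would instead rely on the perturbation-based methods noted in the table.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Placeholder white-box classifier and inputs (assumptions for illustration only).
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

inputs = torch.randn(4, 10, requires_grad=True)

# Integrated Gradients attributes the chosen output (class 0 here) back to each input feature.
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(inputs, target=0, return_convergence_delta=True)

print(attributions.shape)  # (4, 10): one attribution value per input feature
print(delta)               # approximation error of the path integral, per example
```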