Use this form to express interest in receiving support from the SETS AI Content Red Team Clinic,
administered by Alexios Mantzarlis at Cornell Tech and staffed by graduate students. Alexios created the Content Adversarial Red Team for Gemini at Google. The graduate students are recruited from among the top performers in CS 5342: Trust & Safety.
Our free clinic is meant to provide external assistance to public service organizations that do not have the resources or bandwidth to conduct adversarial red teaming internally. It does not replace normal assurance processes, but it can surface unexpected vulnerabilities, helping you build safeguards and make decisions about launch.
This clinic concentrates on the content risks of AI tools, meaning we focus most of our efforts on eliciting harmful, unintended, or unexpected outputs from these tools. Depending on the skills of our staff at the time of the engagement and the type of tool, we may be able to do more; this will be discussed at the first client meeting.
PLEASE READ BEFORE APPLYING:
What we do:
- Review the purpose and content risks of a public-facing AI tool with the client, and identify the top vulnerabilities and the adversaries who might want to deface or misuse the tool
- Conduct several hundred attempts, primarily prompt-based, to circumvent the tool's protections and achieve outcomes your organization deems undesirable
- Prepare a report with our findings and recommendations on how to mitigate the vulnerabilities detected
What we don't do:
- Conduct penetration testing
- Produce ready-to-implement technical fixes
- Release any form of certificate vouching for the safety of the tool