This project is run by SecureBio, a Boston-based nonprofit organization dedicated to securing the world against biological threats. Evaluating biosecurity-relevant capabilities of large language models to understand misuse risks is one of SecureBio’s main projects, along with building systems for early detection of unknown pathogens and developing DNA synthesis screening methods.
The SecureBio AI+Bio team is led by Dr. Seth Donoughe, assisted by several research scientists; see the full team on the project’s website.
Why is SecureBio doing this?
Large Language Models (LLMs, a type of AI model) have the potential to dramatically alter our world, in ways both intended and not. While the organic world of biology might in some ways seem far removed from the silicon world of AI, recent LLMs have demonstrated a surprising capacity for retrieving and explaining even the most specialized medical and biological information.
Given the centrality of biology to areas of fundamental importance to humanity, such as medicine and agriculture, we think it is imperative that policymakers and the public have the best possible tools at their disposal to realistically assess the practical capabilities of LLMs to make meaningful contributions to biological problems.
Our evaluation is intended to provide a rigorous and unbiased picture of how an LLM performs not only on theoretical questions in biology, but also on the practical problems biologists face in the laboratory. Our hope is that such a benchmark will give policymakers the best information possible to help push the balance of AI’s impacts on society towards the intended and away from the unintended.
Who will benefit from this?
SecureBio’s primary audience for this evaluation is governmental agencies and non-governmental organizations focused on monitoring, regulating, and safeguarding the emerging field of large AI models. The eval is meant to be an unbiased, rigorous, third-party tool that these organizations can use to meaningfully assess the capabilities of AI models as they evolve.
It is explicitly NOT meant as a training set or ‘teaching tool’ for improving AI models. While SecureBio has communicated with LLM developers, an important aspect of this project is that the underlying data remain partitioned from them. See Safety & Security below.
How is this project different from other AI evaluations?
In this evaluation, SecureBio aims to assess the capabilities of various LLMs in helping individuals perform difficult, specialized tasks, not just demonstrate theoretical knowledge of a particular subject. By the end of the experiment, SecureBio wants to estimate how useful the models are in assisting professionals with following protocols and conducting laboratory work for specific purposes. Other evals in this space largely test basic knowledge of biology; they also tend to neglect the multimodal, visual aspects of assistance.
What is the final output of the project?
The final output of the project will be an evaluation tool (eval) designed to compare the performance of human experts, non-experts, and LLMs on practical questions related to laboratory activities. The tool is intended to help assess how well current and future models can assist researchers of varying proficiency in achieving results across a range of laboratory methodologies.
The eval itself will not be open access, and SecureBio reserves the right not to publish it, to publish it partially, or to provide conditional access, depending on the context and the request.
Additionally, SecureBio intends to write a peer-reviewed academic publication detailing our methodology and findings (i.e., the performance of various LLMs on the eval).
What is the timeline of the project?
The current phase, collecting questions, is scheduled to end on August 31st. The project's completion is expected by September 30th.
SecureBio reserves the right to alter the timeline at its discretion, without prior notice.
Will there be follow-up studies or projects?
Yes, SecureBio has several initiatives aimed at advancing how we evaluate AI models; this is just one of its ongoing projects.
Contribution Details
What is my role in this project?
SecureBio’s primary need is for experts who can write and review difficult questions about practical problems in biology laboratories. “Difficult,” in this context, means questions that address information that isn’t easily acquired through a web search, or even through a fairly serious literature review by a trained biologist in a different field.
In addition, SecureBio is open to more extensive intellectual contribution and collaboration with interested experts who would like to participate in the peer-reviewed publication expected to result from this project.
I might know someone who could help with this project. What are the required qualifications for additional contributors?
At this particular stage of the project, we are looking for specialists who have, at a minimum, 1 year of in-lab research experience at the graduate level (e.g., pursuing a doctorate in microbiology/molecular biology/virology).
If you can think of someone, simply let the person from SecureBio who first contacted you know, or email lab-mm@securebio.org. You will receive a referral bonus of $200 for each onboarded specialist you recommend who submits at least five questions of sufficiently high quality to be retained in the final dataset.
Are there a minimum number of questions I need to create in order to be compensated?
Nope! We pay per question, and the minimum number of questions is 1.
Note that payment will be calculated based on completed tasks (i.e., SecureBio cannot provide compensation for a half-finished question).
Do I need to fill out any paperwork in order to participate?
Yep, we have a single doc for you to read over and sign if you want to contribute to the project. It covers the basic aspects of the work agreement: privacy, compensation, and so on.
Credit & Reimbursement
How will my contribution be acknowledged?
Contributors have the option of remaining anonymous; alternatively, if you’d like, we can list you in the acknowledgements of the paper we publish describing our work. We also consider offering co-authorship to contributors who engage extensively with the project beyond submitting and reviewing questions. And, of course, you will be compensated for all completed tasks. If the first set of questions you submit is suitable for inclusion in the project, we will invite you to sign up for as many tasks as your time and interest allow.
What if I am not allowed to take paid side gigs?
SecureBio wants to make sure all contributors are acting in accordance with their existing obligations. Some organizations have clauses limiting outside paid work. Note that such clauses often have exemptions based on time commitment or total compensation, so it’s worth checking the fine print. If monetary compensation is definitely not possible but you still want to contribute, let us know and we can discuss what an adequate acknowledgement of your contribution would look like.
Do you intend to make any money from the results of this project?
In short: no. SecureBio, Inc. is a federally registered 501(c)(3) non-profit organization. Moreover, funding for this project comes from donations and partnerships, and we have committed to making the results available to any organization with a track record of responsibly handling datasets like this.
Safety & Security
Will my questions remain confidential?
All questions will be anonymized, ensuring that no one involved in the evaluation process (reviewers, other authors, or the LLMs themselves) will know who submitted a given question. The question dataset itself may be part of the publication.
Can any harm result from this project?
LLMs are incredibly effective at collating information from across diverse sources. This presents something of a paradox to a would-be evaluator: how can we be sure our assessment isn’t itself training the model?
Given the potentially sensitive nature of biological methods, compartmentalizing the information in our study is at the very top of our list of priorities. This is part of why we insist that subject experts not use LLM tools in drafting their questions. Within SecureBio, we have established internal protocols to prevent this information from being used as training data.
We are also designing the evaluation to limit the ability of models to access the full set of underlying information.