The dam on AI security automation will break
And it’s on us to break it faster than our adversaries
Joshua Saxe, AI security engineer, Meta, https://substack.com/@joshuasaxe181906
We’re early on the path in AI security, but there is a paved road for rapid progress thanks to coding
The only barrier to achieving coding-like progress in offensive security is us
End-to-end offensive agents for cyber don’t work that well yet
But they’re advancing rapidly for coding
Because there’s a paved road, adversaries will build offensive AI cyber capabilities; if we don’t move fast, they will get there first
If we stop at the assistant level, the other side wins
Our goal must be to reach meaningful autonomy and capabilities quickly, or the adversary wins
How do we work towards meaningful autonomy when we have so far to go?
My theses
Part one
On the data foundations necessary to achieve autonomous reliability
ML/AI engineering differs from traditional security software engineering in that its foundations are mainly about data
“It works on my dataset” won’t get us there for AI cybersecurity capabilities; we need data-driven evals to succeed
We can’t avoid training LLMs to do cyber, and we’ll need high-quality, representative data to do it
The importance of careful data work for reliable cyber capabilities isn’t always obvious to newcomers to the field
Data is expensive, boring, but foundational
(How ImageNet solved computer vision)
| ImageNet Fact | Number / Detail |
| --- | --- |
| Number of classes in labels | About 21,000 in the full ImageNet hierarchy |
| Images | 1.28 million training, 50,000 validation, 100,000 test |
| Labeling workforce | 49,000 Mechanical Turk workers from 167 countries; 160 million candidate images screened |
| Labeling effort | 53,000 worker-hours of labeling |
| Label verification | 2–5 independent workers per image; precision spot-check ≈99.7%; validation set error ≈6% |
| Cost | Roughly $0.5–0.8M total (based on AMT pay rates, not officially published) |
| Funding | Mix of NSF research grants plus Google, Intel, Microsoft, Yahoo industry support |
| Build timeline | ≈2008–2010 (about 21 months from project start to first public release) |
ImageNet data was the operating system for solving computer vision, and enabling many autonomous vision applications
Imagine not knowing which of these models performed better, and not having data to train them
As with ImageNet, we’ve paid for good data for coding; cyber will need this too
Training environments and optimization-based solutions are the new data, and table stakes for autonomous AI progress in cybersecurity
An example cybersecurity gym
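A minimal sketch of what such a gym’s agent-facing interface might look like: an episodic environment where the agent issues commands against a sandboxed target and is rewarded only for a verifiable outcome. The class, task, and reward scheme below are illustrative assumptions, not a description of any production system.

```python
# Hypothetical sketch of a minimal cybersecurity "gym": an episodic environment
# where an agent submits shell commands against a sandboxed target and is
# rewarded only for a verifiable outcome (here, recovering a planted flag).
import subprocess
from dataclasses import dataclass, field


@dataclass
class CTFEnv:
    """One sandboxed target per episode; reward comes from execution, not labels."""

    flag: str = "FLAG{example}"
    max_steps: int = 20
    _steps: int = field(default=0, init=False)

    def reset(self) -> str:
        """Start a fresh episode and return the initial observation (a task brief)."""
        self._steps = 0
        return "Target: local sandbox. Goal: locate the flag file and print its contents."

    def step(self, action: str) -> tuple[str, float, bool]:
        """Run the agent's command in the sandbox; return (observation, reward, done)."""
        self._steps += 1
        proc = subprocess.run(
            action, shell=True, capture_output=True, text=True, timeout=10
        )
        observation = (proc.stdout + proc.stderr)[-2000:]  # truncate long tool output
        captured = self.flag in proc.stdout
        reward = 1.0 if captured else 0.0  # sparse, outcome-based reward
        done = captured or self._steps >= self.max_steps
        return observation, reward, done
```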
Environments are becoming ubiquitous in AI coding, will need to be in AI cyber, and this community needs to build them
| Agent | Public evidence of RL in code execution environments |
| --- | --- |
| Google DeepMind AlphaCode 2 | “We use reinforcement learning from code execution feedback to improve model solutions… running candidate programs in a sandbox environment to collect rewards.” (AlphaCode 2 blog / paper) |
| OpenAI Codex / GPT-4 for code | “Codex was further improved with reinforcement learning from human and code execution feedback, allowing the model to iteratively test, debug, and refine programs.” (OpenAI system card / blog) |
| Replit Ghostwriter / Replit Agent | “We leverage deep reinforcement learning, using signals such as unit test results, compiler messages, and runtime errors to train models to write more reliable code.” (Replit engineering blog) |
| Cognition Labs – Kevin-32B (Devin’s codegen core) | “If a kernel fails to compile, we pass the compiler error trace as feedback; if it runs correctly, we measure runtime performance. This feedback is used for reinforcement learning.” (Cognition technical post) |
| Poolside | “Our models are trained with Reinforcement Learning from Code Execution Feedback (RL-CE), where every program is executed and its results directly inform the reward signal.” (Poolside blog) |
| Anthropic Claude (for code tasks) | “To fine-tune the model, Anthropic used reinforcement learning with human feedback plus automated signals, including execution-based evaluations on code problems.” (Anthropic technical overview) |
| Cursor (Tab / Autocomplete) | “We use data from user interactions to improve Tab using online reinforcement learning, continuously optimizing completions from accept/reject and run outcomes.” (Cursor blog post) |
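The common thread in the table above is a reward computed by actually executing the model’s output. A hedged sketch of that pattern, assuming pytest is available in the sandbox; the file names and reward weights are illustrative, not tuned.

```python
# Hedged sketch of reward-from-execution for RL fine-tuning: write the model's
# candidate program and its tests to a scratch directory, run them, and map the
# outcome to a scalar reward.
import pathlib
import subprocess
import sys
import tempfile


def execution_reward(candidate_code: str, test_code: str) -> float:
    """0.0 if the candidate doesn't even import, 0.2 if it imports but fails
    its tests, 1.0 if all tests pass. Weights are illustrative."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = pathlib.Path(tmp)
        (workdir / "candidate.py").write_text(candidate_code)
        (workdir / "test_candidate.py").write_text(test_code)

        imports_ok = subprocess.run(
            [sys.executable, "-c", "import candidate"],
            cwd=tmp, capture_output=True, text=True,
        ).returncode == 0
        if not imports_ok:
            return 0.0

        tests_pass = subprocess.run(
            [sys.executable, "-m", "pytest", "-q", "test_candidate.py"],
            cwd=tmp, capture_output=True, text=True,
        ).returncode == 0
    return 1.0 if tests_pass else 0.2
```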
The difference between having training ‘gyms’ and not having training gyms is 20% vs. 85% AIME accuracy; we need to do this for cyber and no one is really doing it yet
Reinforcement learning will be the way out of AI security engineering being about futzing with prompts
The good news about reinforcement learning environments is that they’re way more data efficient than expert data, which is important for cyber, where relevant data are often private
Summing up the ingredients for success
Data-driven evaluations that allow us to know whether changing a prompt or a tool makes our agent better or worse
Fine-tuning data and reinforcement learning execution environments to improve the underlying LLM
Compute for training and to support a rapid feedback loop between changing an agent and understanding how well the agent is doing
How we should crawl, then walk, then run towards data foundations in AI security
Crawl: No training data, no evals, optimize agent scaffolding against a handful of examples, ship based on vibes
The eventual, inevitable problem with “crawl”
Walk: No fine-tuning, but good, statistically significant evals so engineers can hill climb and compare results
(DARPA AIxCC an example of this)
Being able to compute and compare results quickly is a huge boost in “walk” state
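One way to make “compare results quickly” concrete: score two agent configurations on the same task set and use a paired bootstrap to check whether the improvement is more than noise. This is a sketch of one standard approach; the task scores and resample count below are illustrative assumptions.

```python
# Sketch of a "walk"-stage comparison: score two agent configurations on the
# same task set, then use a paired bootstrap to estimate how often the
# candidate beats the baseline.
import random


def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                     iters: int = 10_000, seed: int = 0) -> float:
    """Return the fraction of resamples in which config B outscores config A."""
    assert len(scores_a) == len(scores_b), "evals must run on the same tasks"
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]  # resample tasks with replacement
        delta = sum(scores_b[i] - scores_a[i] for i in idx) / n
        wins += delta > 0
    return wins / iters


# Per-task success (1.0 / 0.0) for a baseline agent vs. a new prompt or tool.
baseline = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
candidate = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
print(f"P(candidate > baseline) ≈ {paired_bootstrap(baseline, candidate):.2f}")
```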
The inevitable problem with “walk” (generalist LLMs aren’t trained to explicitly target Meta’s cybersecurity problems)
Run: Continuously refreshed annotated production data, online A/B testing, just like mature AI organizations do for mature capabilities
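A hedged sketch of the “run”-stage plumbing: deterministic A/B assignment of production work items (e.g., SOC alerts) to a control or treatment agent, so online metrics can be compared on live traffic. The experiment name and 50/50 split below are assumptions for illustration.

```python
# Illustrative sketch of online A/B testing: hash-based, stable assignment of
# each work item to an experiment arm.
import hashlib


def ab_bucket(item_id: str, experiment: str = "triage-agent-v2",
              treatment_pct: int = 50) -> str:
    """Stable assignment: the same alert always lands in the same arm."""
    digest = hashlib.sha256(f"{experiment}:{item_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"


# Route an incoming alert, then log its bucket alongside outcome metrics
# (time-to-triage, analyst overrides) for the eventual comparison.
print(ab_bucket("alert-8675309"))
```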
Getting to “run” is urgent: we need significant autonomy, and labor productivity gains, as fast as possible, or the adversary wins
Why we need to change our organizations and risk philosophies with AI progress
As in other industries, we’ll need to experiment with organization-AI interaction to realize the gains of cyber autonomy; the problem isn’t just the technology, it’s also our organizational structures, skill compositions, and the need to discover AI applications through experimentation
Some SOC roles today
Some SOC roles tomorrow
Surprises we’ve already encountered when introducing AI into cybersecurity labor processes
Even when we’ve found the right processes, stakeholder risk alignment can block deployment and requires attention
Wrapping up
We’re early on the path in AI security, but there is a paved road for rapid progress thanks to coding and math
The only barrier to achieving coding-like progress in offensive security is us
Let’s not forget the size of the opportunity, and the risk
AI agents will amplify human labor in cyber conflict, and if we don’t lean in, our adversaries will win
How do we work towards meaningful autonomy when we have so far to go?
My theses
Discussion, questions; (my substack: https://substack.com/@joshuasaxe181906)