1 of 57

AI Automation

Making AI Work for You

Nathan Labenz, August 2024

2 of 57

3 of 57

About Me

Founder, AI R&D

Host, "AI Scout"

AI Advisor

GPT-4 Red Team

Llama Safety Review

4 of 57

I Study AI

5 of 57

Today's Agenda:

AI Automation: Making AI Work for You

Foundations – What is Intelligence? What can AI do today, and when should we use it?

AI Automation in 3 Easy Steps - a Method Proven Over 3 Years

Choose Work for AI to Do
Understand and Document the Work
Optimize AI Performance

Extras: Managing Trade-offs, Hiring Checklists, Enterprise Considerations, Future Proofing

6 of 57

What is Work?

7 of 57

Work: Inputs → Outputs

8 of 57

What is Intelligence?

9 of 57

Intelligence:

the ability to do work without precise instructions

10 of 57

Recognizing Numbers – Easy for Humans

11 of 57

Recognizing Numbers – Impossible in Code

14%

Correct

Code generated by Claude 3.5 Sonnet

Answer fact-checked with Perplexity

12 of 57

Recognizing Numbers – Easy for AI

Source: 3Blue1Brown Neural Nets; Also see: AI Scouting Report

99.7%

Correct

13 of 57

Use AI for

Work that Requires Intelligence

14 of 57

What can AI do today?

15 of 57

Source: LifeArchitect

SOTA AIs → Human Experts on Routine Tasks

16 of 57

"AI Doctor"

AIs >= Human Doctors on Medical Diagnosis

Source: Towards Conversational Diagnostic AI

Human Doctors

17 of 57

It's Not Just

Research!

18 of 57

AI Can Very Likely

Do Routine Work For You

19 of 57

3 Ways to Work with AI

Chat	Agents	Automation
The first killer app	The "Missing Middle"*	Unrealized Potential
ex: ChatGPT, Copilot	ex: MultiOn, Devin, etc.	ex: Zapier Zaps, etc.
Real-time interaction	On-the-fly delegation	Background / Batch processing
Human pilot, AI copilot	Autonomous	Human design, AI execution
"Can you help with…"	"Come back when it's done"	"Your job is to…"
One-off, Do Anything	Do Small Projects	Do One Thing Many Times
Immediate validation	Trust but Verify	Structured Evals

* As of July 2024 – Agents subject to "Wake Up" at any time

20 of 57

AI Automation in 3 Easy Steps:

Choose Work for AI to Do
Understand & Document the Work
Optimize AI Performance

21 of 57

Choosing Work for AI to Do

22 of 57

23 of 57

Good Targets for AI

Intelligence Required
Task-Sized
Slow / Expensive
Repetitive
Explicit Context Available
"Gold Standard" Examples
Low Risk
Fast Feedback
Not Fun
AI Can Do It

Bad Targets for AI

Purely Procedural
Job-Sized
Already Fast & Cheap
"One-offs"
Implicit Context Only
"Beauty in the Eye of the Beholder"
High Sensitivity
Blind Spots
Fun
AI Can't Do It

When should we use AI?

24 of 57

AI = Alien Intelligence

[Chart: Human Performance vs. AI Performance on Translation, Common Sense Spatial Reasoning, and Programming]

25 of 57

10,000 Hours for Expertise?

10 Hours for a New Model

Follow Ethan Mollick on Twitter

26 of 57

AI: Human-level, not Human-like

Trait	AI	Human Expert	Notes
Breadth	+++	+	AIs have “read the whole internet”
Depth	++	+++	Human experts know better
Insight	+	+++	AIs have few “Eureka moments”
Speed	+++	+	AIs are ~10X faster
Cost	$	$$$	AIs are ~10% cost
Availability	+++	+	AIs are instantly available 24/7
Scalability	+++	+	You can run 1000 AIs at once
Context	+	++	AIs know nothing about your business
Memory	+ → +++	++	AIs have brittle memory, but improving

27 of 57

Understanding the Work:

Process Mapping

28 of 57

"It's Pretty Simple"

OK, but … How?

29 of 57

"Let's Break it Down, Step-by-Step"

30 of 57

"You're Right – There's a Bit More To It"

31 of 57

"How do you think about it?"

32 of 57

How many open tickets do we have?
Does the customer have an SLA?
Is the customer free / paid / VIP?
Did the customer provide repro steps?
Is this a known issue?
Is the product team already working on it?
Do we have a solution?
Is the customer upset?
Is the issue likely to affect others?

"Honestly, There's a Lot That Goes Into It"

33 of 57

Documenting the Work:

"What Does 'Great' Look Like?"

34 of 57

Examples Must Include

Inputs, Outputs – And Reasoning

35 of 57

Without 10 “Gold Standard” Examples,

You Can’t Successfully Automate

36 of 57

Designing a New Process?

37 of 57

"What if We Did it This Way?"

👨

🤖

🔎

(With the speed & parallelizability of AI, do we really need prioritization?)

38 of 57

Prioritize by Value, Risk

Expected Value = Task Value x Task Volume x % of Success

Task	Answer Investor Qs	Draft Sales Outreach	Answer Service Ticket
Value	$250	$10	$2
Volume	50	1000	5000
Potential Value	$12500	$10000	$10000
% of Success	50%	75%	90%
Expected Value	$6250	$7500	$9000
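
The table above can be reproduced with a few lines of Python. This is a minimal sketch: the `Task` class and its field names are illustrative, and the numbers are the ones on the slide.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    value_per_task: float  # dollar value of one completed task
    volume: int            # number of tasks in the period
    success_rate: float    # estimated fraction the AI handles successfully

    @property
    def potential_value(self) -> float:
        return self.value_per_task * self.volume

    @property
    def expected_value(self) -> float:
        return self.potential_value * self.success_rate


tasks = [
    Task("Answer Investor Qs", 250, 50, 0.50),
    Task("Draft Sales Outreach", 10, 1000, 0.75),
    Task("Answer Service Ticket", 2, 5000, 0.90),
]

# Rank candidate tasks by expected value, highest first.
ranked = sorted(tasks, key=lambda t: t.expected_value, reverse=True)
for t in ranked:
    print(f"{t.name}: potential ${t.potential_value:,.0f}, "
          f"expected ${t.expected_value:,.0f}")
```

Note that the task with the lowest per-task value comes out on top once volume and success rate are factored in.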

39 of 57

Optimizing AI Performance

40 of 57

Optimizing AI Performance: Information & Behavior

41 of 57

Best Practice Prompts, "Gold Standard" Examples

Role	You are a Customer Service agent…
Task Definition	Your job is to assign a priority level to a ticket…
Instructions	If the customer is a "VIP", assign "High" priority… If an error code is provided, assign "Medium"...
Format	Use the format <reasoning> ... <answer>
Examples	Follow the below examples:
Sample Inputs	Customer: …; Message: …; Open issues: …
Reasoning	The customer is requesting a password reset…
Outputs	Prioritization: High
Inputs	Customer: [info]; Message: [message]; Open issues: [list of issues]
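
One way to turn the prompt structure above into a reusable template is sketched below. The function and class names are hypothetical, the closing tags in the format line are an assumption (the slide only shows the opening tags), and the example ticket is a made-up placeholder that should be replaced with real "Gold Standard" examples.

```python
from dataclasses import dataclass


@dataclass
class Example:
    inputs: str
    reasoning: str
    output: str


def build_prompt(role, task, instructions, examples, new_inputs):
    """Assemble the sections in the order listed above:
    role, task definition, instructions, format, examples, inputs."""
    lines = [role, task, "", "Instructions:"]
    lines += [f"- {rule}" for rule in instructions]
    lines += [
        "",
        # Assumption: closing tags added for easier parsing.
        "Use the format <reasoning>...</reasoning><answer>...</answer>",
        "",
        "Follow the below examples:",
    ]
    for ex in examples:
        lines += [
            "",
            f"Inputs: {ex.inputs}",
            f"<reasoning>{ex.reasoning}</reasoning>",
            f"<answer>{ex.output}</answer>",
        ]
    lines += ["", f"Inputs: {new_inputs}"]
    return "\n".join(lines)


# Placeholder example for illustration only; use real tickets in practice.
placeholder = Example(
    inputs="Customer: VIP; Message: I can't reset my password; Open issues: none",
    reasoning="The customer is requesting a password reset and is a VIP.",
    output="Prioritization: High",
)

prompt = build_prompt(
    role="You are a Customer Service agent.",
    task="Your job is to assign a priority level to a ticket.",
    instructions=[
        'If the customer is a "VIP", assign "High" priority.',
        'If an error code is provided, assign "Medium".',
    ],
    examples=[placeholder],
    new_inputs="Customer: free; Message: Export fails; Open issues: 2",
)
print(prompt)
```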

42 of 57

Strategy	Explanation / Examples
Dynamic Example Selection	Select most similar / relevant examples from database at runtime
Majority Vote	Run prompt N times, then choose the most popular answer
Multi-Model	Run prompt with N different models, check for agreement
Multimodal	Include screenshots, audio recordings, etc.
Multi-Lingual	Translate to English, perform task, translate back; majority vote across languages
Data Augmentation	Caption, OCR, segment images; transcribe audio
Self-Correction	Give AI 3-5 chances to "Make it Better"
Claude Meta-Prompting	Maybe Claude can do a better job?
DSPy	Framework for algorithmically optimizing LM prompts & weights

Advanced Prompt Engineering

Source: The Prompt Report
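
As an illustration of the Majority Vote strategy, here is a minimal sketch. `fake_model` is a stand-in that returns canned answers; in practice it would be replaced by a call to whatever model API is in use. The function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def majority_vote(ask, prompt, n=5):
    """Run the same prompt n times and return the most popular answer
    together with the share of runs that agreed with it."""
    answers = [ask(prompt) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n


# Stand-in for a real model call: cycles through canned answers
# to imitate run-to-run variation.
_canned = cycle(["High", "Medium", "High", "High", "Low"])


def fake_model(prompt: str) -> str:
    return next(_canned)


answer, agreement = majority_vote(fake_model, "Assign a priority to this ticket", n=5)
print(answer, agreement)
```

A low agreement share can also serve as a signal to route the item to a human reviewer. The same pattern extends to the Multi-Model strategy by passing a different `ask` function per model and comparing the results.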

43 of 57

Optimizing Information: Retrieval Augmented Generation

44 of 57

Be Sure to Include Enough Context
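
The retrieval step can be sketched as follows. This is a toy version that ranks documents by word overlap; production systems typically use embedding similarity instead. The knowledge base entries and function names are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_context_prompt(query, documents, k=2):
    """Put the retrieved context in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base entries.
knowledge_base = [
    "Password resets are handled from the account settings page.",
    "VIP customers have a four-hour response SLA.",
    "Invoices are emailed on the first day of each month.",
]

prompt = build_context_prompt("How do I reset my password?", knowledge_base, k=1)
print(prompt)
```

Raising `k` is the simplest way to include more context, at the cost of a longer prompt.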

45 of 57

Optimizing Behavior: The Fine-Tuning Loop

46 of 57

Expect ~3 Rounds of Fine-Tuning

(+ More for Edge Cases)

47 of 57

Optimizing LLM Accuracy

48 of 57

AI Optimization Best Practices

De-risk by Doing the Hard Part(s) First

Maximize Performance First, then Improve Cost and Speed

Compare AI to Human Performance, Not to Perfection

Remember: Each “9” of Performance Requires 10X More Work

Plan for Iteration and Obsolescence

Always Look at Your Data
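
Two of these practices, comparing against human performance and looking at the data, can be combined in a small evaluation loop. This is a sketch under stated assumptions: the test set, the `toy_predict` stand-in, and the `HUMAN_BASELINE` figure are all hypothetical.

```python
def evaluate(predict, test_set):
    """Score a predictor on (inputs, expected) pairs and keep every miss
    so the failures can be read, not just counted."""
    failures = []
    for inputs, expected in test_set:
        got = predict(inputs)
        if got != expected:
            failures.append((inputs, expected, got))
    accuracy = 1 - len(failures) / len(test_set)
    return accuracy, failures


# Hypothetical test set; in practice, use real "Gold Standard" examples.
test_set = [
    ("VIP customer cannot log in", "High"),
    ("Error code E42 on export", "Medium"),
    ("Feature request: dark mode", "Low"),
    ("VIP customer asks about invoice", "High"),
]


def toy_predict(text):
    """Stand-in for the AI system; deliberately never answers 'Low'."""
    if "VIP" in text:
        return "High"
    return "Medium"


HUMAN_BASELINE = 0.80  # hypothetical measured human accuracy on the same set

accuracy, failures = evaluate(toy_predict, test_set)
print(f"AI accuracy: {accuracy:.0%} (human baseline: {HUMAN_BASELINE:.0%})")
for inputs, expected, got in failures:
    print(f"MISS: {inputs!r} expected {expected}, got {got}")
```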

49 of 57

Congratulations!

You are Automating Work with AI

50 of 57

Extras

51 of 57

Managing Common Trade Offs

Performance	-vs-	Investment
Accuracy	-vs-	Complexity
Development Cost	-vs-	Inference Cost
Value	-vs-	Risk
False Positives	-vs-	False Negatives

52 of 57

No-Code Platforms > Custom Development

Key Term	Definition
Workflow	The overall process
Trigger	"Every time X happens…"
Inputs	What do we need to do the task?
Logic	How the task is done
Output	The finished work product
Action	What happens next?
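
For readers who do end up in code, the key terms above map onto a simple structure. This is a minimal sketch, not any platform's actual API: the `Workflow` class and the `prioritize` function are hypothetical, and `prioritize` stands in for the AI call, borrowing the VIP and error-code rules from the prompt example earlier in the deck.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Workflow:
    trigger: str                   # "Every time X happens..."
    logic: Callable[[dict], str]   # how the task is done
    action: Callable[[str], str]   # what happens next

    def run(self, inputs: dict) -> str:
        output = self.logic(inputs)  # the finished work product
        return self.action(output)


def prioritize(inputs: dict) -> str:
    """Stand-in for the AI step."""
    if inputs.get("tier") == "VIP":
        return "High"
    if inputs.get("error_code"):
        return "Medium"
    return "Needs review"  # assumption: fallback not specified in the deck


workflow = Workflow(
    trigger="Every time a support ticket is created",
    logic=prioritize,
    action=lambda output: f"Routed with priority {output}",
)

result = workflow.run({"tier": "VIP", "message": "Cannot log in"})
print(result)
```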

53 of 57

AI Engineer

AI Obsession
(Async) Computer Programming
Does the Hard Part First
Advanced Prompt Engineering
RAG
Tool Use
Fine-Tuning
Pay Attention to Outputs
Know When To Change Course
Trade-Off Optimization
UI Development
Automated Testing / Monitoring

AI Automation Skills Checklist

AI Advisor

AI Obsession
(Async) Human Communication
Knows AI Capabilities
Target Selection
Task Decomposition
Context Mapping
Example Curation
Process Design
Test Set Development
Viability Demonstration
Expectations Management
Training & Feedback

54 of 57

What You Want from an AI Advisor

Best Practices	Anti-Patterns
“What makes this hard?”	Assume It's Easy
Laser Focused on Core Cognitive Work	Preoccupation with "Plumbing"
Collect Real "Gold Standard" Examples	Make Up Data
Establish a Standard Test Set	Work Purely on "Vibes"
Test for Viability with Top Models	Prototype with Inferior Models
Maximize Performance First	Premature Focus on Costs
Review AI Outputs with Stakeholders	Assume It's "Good Enough"
Plan for Iteration	Promise "One and Done"
Build Confidence with Quick Wins	Propose Huge Projects Immediately

55 of 57

Everything's Harder in the Enterprise

Data Access – Can't Be Taken for Granted
SLAs – Challenging Given Immature, External Dependencies
Privacy & Data Security – API Providers Make Strong Promises
Risk Management – Internal/External Use, High/Low Stakes
Compliance – Are we complying with laws around AI use?

Aversion to Change – Makes Task Selection extra important!

56 of 57

Future Proofing Planning

LLMs are getting better, faster, cheaper → Worry about $$ Later

Architectures are getting simpler → Don't Over-Engineer

Fine-Tuning Options Improving → Build a Test Suite

Infrastructure Commoditizing → Work at Highest Possible Layer

No-Code is Maturing → Buy>Build (Mostly)

Last Mile Customization Drives Huge Value → "Copy & Customize"

Future Capabilities May Surprise Us! → Plan for Obsolescence

57 of 57

Thank You

This Presentation

All Presentations & Podcasts