About Me
Founder, AI R&D
Host, "AI Scout"
AI Advisor
AI Advisor
GPT-4 Red Team
Llama Safety Review
I Study AI
Today's Agenda:
AI Automation: Making AI Work for You
What is Work?
Work: Inputs → Outputs
What is Intelligence?
Intelligence:
the ability to do work without precise instructions
Recognizing Numbers – Easy for Humans
Recognizing Numbers – Impossible in Code
14%
Correct
Code generated by Claude 3.5 Sonnet
Answer fact-checked with Perplexity
Recognizing Numbers – Easy for AI
Source: 3Blue1Brown Neural Nets; Also see: AI Scouting Report
99.7%
Correct
Use AI for
Work that Requires Intelligence
What can AI do today?
SOTA AIs → Human Experts on Routine Tasks
"AI Doctor"
AIs >= Human Doctors on Medical Diagnosis
Human Doctors
It's Not Just
Research!
AI Can Very Likely
Do Routine Work For You
3 Ways to Work with AI
Chat
The first killer app
ex: ChatGPT, Copilot
Real-time interaction
Human pilot, AI copilot
"Can you help with…"
One-off, Do Anything
Immediate validation
Agents
The "Missing Middle"*
ex: MultiOn, Devin, etc
On-the-fly delegation
Autonomous
"Come back when it's done"
Do Small Projects
Trust but Verify
Automation
Unrealized Potential
ex: Zapier Zaps, etc
Background / Batch processing
Human design, AI execution
"Your job is to…"
Do One Thing Many Times
Structured Evals
* As of July 2024 – Agents subject to "Wake Up" at any time
AI Automation in 3 Easy Steps:
Choosing Work for AI to Do
Good Targets for AI
Bad Targets for AI
When should we use AI?
Translation
AI = Alien Intelligence
Common Sense Spatial Reasoning
Programming
Human Performance
AI Performance
10,000 Hours for Expertise?
10 Hours for a New Model
AI: Human-level, not Human-like
Trait | AI | Human Expert | Notes |
Breadth | +++ | + | AIs have “read the whole internet” |
Depth | ++ | +++ | Human experts know better |
Insight | + | +++ | AIs have few “Eureka moments” |
Speed | +++ | + | AIs are ~10X faster |
Cost | $ | $$$ | AIs cost ~10% as much |
Availability | +++ | + | AIs are instantly available 24/7 |
Scalability | +++ | + | You can run 1000 AIs at once |
Context | + | ++ | AIs know nothing about your business |
Memory | + → +++ | ++ | AIs have brittle memory, but improving |
Understanding the Work:
Process Mapping
"It's Pretty Simple"
OK, but … How?
"Let's Break it Down, Step-by-Step"
"You're Right – There's a Bit More To It"
"How do you think about it?"
"Honestly, There's a Lot That Goes Into It"
Documenting the Work:
"What Does 'Great' Look Like?"
Examples Must Include
Inputs, Outputs – And Reasoning
Without 10 “Gold Standard” Examples,
You Can’t Successfully Automate
Designing a New Process?
"What if We Did it This Way?"
[Diagram: 👨 🤖 🤖 🤖 🔎]
(With the speed & parallelizability of AI, do we really need prioritization?)
Prioritize by Value, Risk
Expected Value = Task Value x Task Volume x % of Success
Task | Answer Investor Qs | Draft Sales Outreach | Answer Service Ticket |
Value | $250 | $10 | $2 |
Volume | 50 | 1000 | 5000 |
Potential Value | $12500 | $10000 | $10000 |
% of Success | 50% | 75% | 90% |
Expected Value | $6250 | $7500 | $9000 |
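The table above can be reproduced in a few lines of Python. This is a minimal sketch: the `Task` class and its field names are illustrative assumptions, and the figures are the ones in the table.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    value: float         # dollar value of one completed task
    volume: int          # number of tasks
    success_rate: float  # estimated fraction the AI completes successfully

    @property
    def potential_value(self) -> float:
        # Value x Volume
        return self.value * self.volume

    @property
    def expected_value(self) -> float:
        # Value x Volume x % of Success
        return self.potential_value * self.success_rate


tasks = [
    Task("Answer Investor Qs", 250, 50, 0.50),
    Task("Draft Sales Outreach", 10, 1000, 0.75),
    Task("Answer Service Ticket", 2, 5000, 0.90),
]

# Highest expected value first.
ranked = sorted(tasks, key=lambda t: t.expected_value, reverse=True)
for t in ranked:
    print(f"{t.name}: ${t.expected_value:,.0f}")
```

Sorting by expected value rather than potential value puts the service-ticket task first, even though the investor task has the highest potential value.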
Optimizing AI Performance
Optimizing AI Performance: Information & Behavior
Best Practice Prompts, "Gold Standard" Examples
Role | You are a Customer Service agent… |
Task Definition | Your job is to assign a priority level to a ticket… |
Instructions | If the customer is a "VIP", assign "High" priority… If an error code is provided, assign "Medium"... |
Format | Use the format <reasoning> ... <answer> |
Examples | Follow the below examples: |
Sample Inputs | Customer: …; Message: …; Open issues: … |
Reasoning | The customer is requesting a password reset… |
Outputs | Prioritization: High |
Inputs | Customer: [info]; Message: [message]; Open issues: [list of issues] |
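One way to assemble these sections into a single prompt is sketched below. The `Example` class and `build_prompt` function are hypothetical names, not part of any library, and wrapping the example reasoning and answer in closing tags is an assumption about what the format row intends. Elided text ("…") is carried over from the table unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    inputs: str
    reasoning: str
    output: str


def build_prompt(role: str, task: str, instructions: List[str],
                 output_format: str, examples: List[Example], inputs: str) -> str:
    """Join the prompt sections in the order they appear in the table."""
    lines = [role, task, "", "Instructions:"]
    lines += [f"- {rule}" for rule in instructions]
    lines += ["", output_format, "", "Follow the below examples:"]
    for ex in examples:
        lines += [
            "",
            f"Inputs: {ex.inputs}",
            # Assumption: reasoning and answer are wrapped in paired tags.
            f"<reasoning>{ex.reasoning}</reasoning>",
            f"<answer>{ex.output}</answer>",
        ]
    # The new case goes last, so the model continues from it.
    lines += ["", f"Inputs: {inputs}"]
    return "\n".join(lines)


examples = [
    Example(
        inputs="Customer: …; Message: …; Open issues: …",
        reasoning="The customer is requesting a password reset…",
        output="Prioritization: High",
    )
]

prompt = build_prompt(
    role="You are a Customer Service agent…",
    task="Your job is to assign a priority level to a ticket…",
    instructions=[
        'If the customer is a "VIP", assign "High" priority…',
        'If an error code is provided, assign "Medium"...',
    ],
    output_format="Use the format <reasoning> ... <answer>",
    examples=examples,
    inputs="Customer: [info]; Message: [message]; Open issues: [list of issues]",
)
print(prompt)
```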
Strategy | Explanation / Examples |
Dynamic Example Selection | Select most similar / relevant examples from database at runtime |
Majority Vote | Run prompt N times, then choose the most popular answer |
Multi-Model | Run prompt with N different models, check for agreement |
Multimodal | Include screenshots, audio recordings, etc. |
Multi-Lingual | Translate to English, perform task, translate back; majority vote across languages |
Data Augmentation | Caption, OCR, segment images; transcribe audio |
Self-Correction | Give AI 3-5 chances to "Make it Better" |
Claude Meta-Prompting | Maybe Claude can do a better job? |
DSPy | Framework for algorithmically optimizing LM prompts & weights |
Advanced Prompt Engineering
Source: The Prompt Report
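The Majority Vote strategy from the table can be sketched as follows. The `majority_vote` function and the `fake_model` stub are hypothetical. In practice `ask` would wrap a call to whichever model API is in use.

```python
import itertools
from collections import Counter
from typing import Callable, Tuple


def majority_vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> Tuple[str, float]:
    """Run the prompt n times; return the most popular answer and its share of votes."""
    answers = [ask(prompt).strip() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n


# Stub standing in for a real model call: it cycles through canned answers.
canned = itertools.cycle(["High", "Medium", "High", "High", "Low"])


def fake_model(prompt: str) -> str:
    return next(canned)


answer, agreement = majority_vote(fake_model, "Assign a priority level to this ticket: …", n=5)
print(answer, agreement)  # High 0.6
```

The agreement share is a rough confidence signal: a low value suggests the case may deserve human review. The same function covers the Multi-Model row if `ask` rotates across different models instead of repeating one.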
Optimizing Information: Retrieval Augmented Generation
Be Sure to Include Enough Context
Optimizing Behavior: The Fine-Tuning Loop
Expect ~3 Rounds of Fine-Tuning
(+ More for Edge Cases)
Optimizing LLM Accuracy
AI Optimization Best Practices
Congratulations!
You are Automating Work with AI
Extras
Managing Common Trade-Offs
Performance | -vs- | Investment |
Accuracy | -vs- | Complexity |
Development Cost | -vs- | Inference Cost |
Value | -vs- | Risk |
False Positives | -vs- | False Negatives |
No-Code Platforms > Custom Development
Key Term | Definition |
Workflow | The overall process |
Trigger | "Every time X happens…" |
Inputs | What do we need to do the task? |
Logic | How the task is done |
Output | The finished work product |
Action | What happens next? |
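The key terms above map onto a small data structure. The sketch below is illustrative only: `Workflow` and `assign_priority` are hypothetical names, the rule-based logic stands in for an AI step, and the "Low" default is an assumption the source does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Workflow:
    trigger: str                            # "Every time X happens…"
    logic: Callable[[Dict[str, Any]], Any]  # how the task is done
    action: Callable[[Any], None]           # what happens next

    def run(self, inputs: Dict[str, Any]) -> Any:
        output = self.logic(inputs)  # the finished work product
        self.action(output)
        return output


def assign_priority(ticket: Dict[str, Any]) -> str:
    # Stand-in for the AI step; a real workflow would call a model here.
    if ticket.get("vip"):
        return "High"
    if ticket.get("error_code"):
        return "Medium"
    return "Low"  # assumption: default when no rule matches


routed: List[str] = []
workflow = Workflow(
    trigger="Every time a new ticket arrives",
    logic=assign_priority,
    action=routed.append,
)
output = workflow.run({"customer": "…", "vip": True, "error_code": None})
print(output, routed)
```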
AI Engineer
AI Automation Skills Checklist
AI Advisor
What You Want from an AI Advisor
Best Practices | Anti-Patterns |
“What makes this hard?” | Assume It's Easy |
Laser Focused on Core Cognitive Work | Preoccupation with "Plumbing" |
Collect Real "Gold Standard" Examples | Make Up Data |
Establish a Standard Test Set | Work Purely on "Vibes" |
Test for Viability with Top Models | Prototype with Inferior Models |
Maximize Performance First | Premature Focus on Costs |
Review AI Outputs with Stakeholders | Assume It's "Good Enough" |
Plan for Iteration | Promise "One and Done" |
Build Confidence with Quick Wins | Propose Huge Projects Immediately |
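A standard test set can be as simple as a list of input and expected-output pairs scored on every change. The sketch below assumes exact-match scoring. The `evaluate` function, the stub predictor, and the four test cases are hypothetical placeholders, not real data.

```python
from typing import Callable, List, Tuple


def evaluate(predict: Callable[[str], str],
             test_set: List[Tuple[str, str]]) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Score a predictor against the test set; return accuracy and the failures."""
    failures = []
    for inputs, expected in test_set:
        actual = predict(inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    accuracy = 1 - len(failures) / len(test_set)
    return accuracy, failures


# Hypothetical test cases; a real set would use collected "Gold Standard" examples.
test_set = [
    ("ticket A", "High"),
    ("ticket B", "Medium"),
    ("ticket C", "High"),
    ("ticket D", "Low"),
]


def stub_predict(inputs: str) -> str:
    # Stand-in for a model call: always answers "High" except for ticket B.
    return "Medium" if inputs == "ticket B" else "High"


accuracy, failures = evaluate(stub_predict, test_set)
print(f"accuracy={accuracy:.0%}")
for inputs, expected, actual in failures:
    print(f"{inputs}: expected {expected}, got {actual}")
```

Keeping the failures alongside the score gives stakeholders concrete outputs to review instead of a number alone.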
Everything's Harder in the Enterprise
Future Proofing Planning
Thank You
This Presentation
All Presentations & Podcasts