1 of 57

AI Automation

Making AI Work for You

Nathan Labenz, August 2024

2 of 57

2

2

3 of 57

About Me

Founder, AI R&D

Host, "AI Scout"

AI Advisor

AI Advisor

GPT-4 Red Team

Llama Safety Review

4 of 57

I Study AI

5 of 57

5

5

Today's Agenda:

AI Automation: Making AI Work for You

  1. Foundations – What is Intelligence? What can AI do today, and when should we use it?

  • AI Automation in 3 Easy Steps - a Method Proven Over 3 Years
    1. Choose Work for AI to do
    2. Understand and Document the Work
    3. Optimize AI performance

  • Extras: Managing Trade-offs, Hiring Checklists, Enterprise Considerations, Future Proofing

6 of 57

What is Work?

7 of 57

Work: Inputs → Outputs

8 of 57

What is Intelligence?

9 of 57

Intelligence:

the ability to do work�without precise instructions

10 of 57

Recognizing Numbers – Easy for Humans

11 of 57

Recognizing Numbers – Impossible in Code

14%

Correct

Code generated by Claude 3.5 Sonnet

Answer fact-checked with Perplexity

12 of 57

Recognizing Numbers – Easy for AI

99.7%

Correct

13 of 57

Use AI for

Work that Requires Intelligence

14 of 57

What can AI do today?

15 of 57

15

SOTA AIs → Human Experts on Routine Tasks

16 of 57

16

"AI Doctor"

AIs >= Human Doctors on Medical Diagnosis

Human Doctors

17 of 57

It's Not Just

Research!

18 of 57

AI Can Very Likely

Do Routine Work For You

19 of 57

19

3 Ways to Work with AI

Chat

The first killer app

ex: ChatGPT, Copilot

Real-time interaction

Human pilot, AI copilot

"Can you help with…"

One-off, Do Anything

Immediate validation

Agents

The "Missing Middle"*

ex: MultiOn, Devin, etc

On-the-fly delegation

Autonomous

"Come back when it's done"

Do Small Projects

Trust but Verify

Automation

Unrealized Potential

ex: Zapier Zaps, etc

Background / Batch processing

Human design, AI execution

"Your job is to…"

Do One Thing Many Times

Structured Evals

* As of July, 2024 – Agents subject to "Wake Up" at any time

20 of 57

AI Automation in 3 Easy Steps:

  1. Choose Work for AI to Do
  2. Understand & Document the Work
  3. Optimize AI Performance

21 of 57

Choosing Work for AI to Do

22 of 57

22

23 of 57

23

23

Good Targets for AI

  1. Intelligence Required
  2. Task-Sized
  3. Slow / Expensive
  4. Repetitive
  5. Explicit Context Available
  6. "Gold Standard" Examples
  7. Low Risk
  8. Fast Feedback
  9. Not Fun
  10. AI Can Do It

Bad Targets for AI

  1. Purely Procedural
  2. Job-Sized
  3. Already Fast & Cheap
  4. "One-offs"
  5. Implicit Context Only
  6. "Beauty in the Eye of the Beholder"
  7. High Sensitivity
  8. Blind Spots
  9. Fun
  10. AI Can't Do It

When should we use AI?

24 of 57

Translation

AI = Alien Intelligence

Common Sense Spatial Reasoning

Programming

Human Performance

AI Performance

25 of 57

10,000 Hours for Expertise?

10 Hours for a New Model

26 of 57

AI: Human-level, not Human-like

Trait

AI

Human Expert

Notes

Breadth

+++

+

AIs have “read the whole internet”

Depth

++

+++

Human experts know better

Insight

+

+++

AIs have few “Eureka moments”

Speed

+++

+

AIs are ~10X faster

Cost

$

$$$

AIs are ~10% cost

Availability

+++

+

AIs are instantly available 24/7

Scalability

+++

+

You can run 1000 AIs at once

Context

+

++

AIs know nothing about your business

Memory

+ → +++

++

AIs have brittle memory, but improving

27 of 57

Understanding the Work:

Process Mapping

28 of 57

28

"It's Pretty Simple"

OK, but … How?

29 of 57

"Let's Break it Down, Step-by-Step"

30 of 57

30

"You're Right – There's a Bit More To It"

31 of 57

"How do you think about it?"

32 of 57

32

  • How many open tickets do we have?
  • Does the customer have an SLA?
  • Is the customer free / paid / VIP?
  • Did the customer provide repro steps?
  • Is this a known issue?
  • Is the product team already working on it?
  • Do we have a solution?
  • Is the customer upset?
  • Is the issue likely to affect others?

"Honestly, There's a Lot That Goes Into It"

33 of 57

Documenting the Work:

"What Does 'Great' Look Like?"

34 of 57

Examples Must Include

Inputs, Outputs – And Reasoning

35 of 57

Without 10 “Gold Standard” Examples,

You Can’t Successfully Automate

36 of 57

Designing a New Process?

37 of 57

37

"What if We Did it This Way?"

👨

🤖

🤖

🤖

🔎

(With the speed & parallelizability of AI, do we really need prioritization?)

38 of 57

38

38

Prioritize by Value, Risk

Expected

Value

=

x

Task

Value

x

Task

Volume

% of Success

Task

Answer Investor Qs

Draft Sales Outreach

Answer Service Ticket

Value

$250

$10

$2

Volume

50

1000

5000

Potential Value

$12500

$10000

$10000

% of Success

50%

75%

90%

Expected Value

$6250

$7500

$9000

39 of 57

Optimizing AI Performance

40 of 57

40

Optimizing AI Performance: Information & Behavior

41 of 57

41

Best Practice Prompts, "Gold Standard" Examples

Role

You are a Customer Service agent…

Task Definition

Your job is to assign a priority level to a ticket…

Instructions

If the customer is a "VIP", assign "High" priority…

If an error code is provided, assign "Medium"...

Format

Use the format <reasoning> ... <answer>

Examples

Follow the below examples:

Sample Inputs

Customer: …; Message: …; Open issues:

Reasoning

The customer is requesting a password reset…

Outputs

Prioritization: High

Inputs

Customer: [info]; Message: [message];

Open issues: [list of issues]

42 of 57

42

Strategy

Explanation / Examples

Dynamic Example Selection

Select most similar / relevant examples from database at runtime

Majority Vote

Run prompt N times, then choose the most popular answer

Multi-Model

Run prompt with N different models, check for agreement

Multimodal

Include screenshots, Audio recordings, etc

Multi-Lingual

Translate to English, perform task, translate back

Majority vote across languages

Data Augmentation

Caption, ORC, segment images; Transcribe audio

Self-Correction

Give AI 3-5 chances to "Make it Better"

Claude Meta-Prompting

Maybe Claude can do a better job?

DSPy

Framework for algorithmically optimizing LM prompts & weights

Advanced Prompt Engineering

43 of 57

43

Optimizing Information: Retrieval Augmented Generation

44 of 57

Be Sure to Include Enough Context

45 of 57

45

Optimizing Behavior: The Fine-Tuning Loop

46 of 57

Expect ~3 Rounds of Fine-Tuning

(+ More for Edge Cases)

47 of 57

47

Optimizing LLM Accuracy

48 of 57

AI Optimization Best Practices

  • De-risk by Doing the Hard Part(s) First

  • Maximize Performance First, then Improve Cost and Speed

  • Compare AI to Human Performance, Not to Perfection

  • Remember: Each “9” of Performance Requires 10X More Work

  • Plan for Iteration and Obsolescence

  • Always Look at Your Data

49 of 57

Congratulations!

You are Automating Work with AI

50 of 57

Extras

51 of 57

Managing Common Trade Offs

Performance

-vs-

Investment

Accuracy

-vs-

Complexity

Development Cost

-vs-

Inference Cost

Value

-vs-

Risk

False Positives

-vs-

False Negatives

52 of 57

52

No-Code Platforms > Custom Development

Key Term

Definition

Workflow

The overall process

Trigger

"Every time X happens…"

Inputs

What do we need to do the task?

Logic

How the task is done

Output

The finished work product

Action

What happens next?

53 of 57

53

53

AI Engineer

  • AI Obsession
  • (Async) Computer Programming
  • Does the Hard Part First
  • Advanced Prompt Engineering
  • RAG
  • Tool Use
  • Fine-Tuning
  • Pay Attention to Outputs
  • Know When To Change Course
  • Trade-Off Optimization
  • UI Development
  • Automated Testing / Monitoring

AI Automation Skills Checklist

AI Advisor

  • AI Obsession
  • (Async) Human Communication
  • Knows AI Capabilities
  • Target Selection
  • Task Decomposition
  • Context Mapping
  • Example Curation
  • Process Design
  • Test Set Development
  • Viability Demonstration
  • Expectations Management
  • Training & Feedback

54 of 57

54

54

What You Want from an AI Advisor

Best Practices

Anti-Patterns

“What makes this hard?”

Assume It's Easy

Laser Focused on Core Cognitive Work

Preoccupation with "Plumbing"

Collect Real "Gold Standard" Examples

Make Up Data

Establish a Standard Test Set

Work Purely on "Vibes"

Test for Viability with Top Models

Prototype with Inferior Models

Maximize Performance First

Premature Focus on Costs

Review AI Outputs with Stakeholders

Assume It's "Good Enough"

Plan for Iteration

Promise "One and Done"

Build Confidence with Quick Wins

Propose Huge Projects Immediately

55 of 57

Everything's Harder in the Enterprise

  • Data Access – Can't Be Taken for Granted�
  • SLAs – Challenging Given Immature, External Dependencies�
  • Privacy & Data Security – API Providers Make Strong Promises�
  • Risk Management – Internal/External Use, High/Low Stakes�
  • Compliance – Are we complying with laws around AI use?

  • Aversion to Change – Makes Task Selection extra important!

56 of 57

Future Proofing Planning

  • LLMs are getting better, faster, cheaper → Worry about $$ Later�
  • Architectures are getting simpler → Don't Over-Engineer�
  • Fine-Tuning Options Improving → Build a Test Suite

  • Infrastructure Commoditizing → Work at Highest Possible Layer

  • No-Code is Maturing → Buy>Build (Mostly)

  • Last Mile Customization Drives Huge Value → "Copy & Customize"

  • Future Capabilities May Surprise Us! → Plan for Obsolescence

57 of 57

Thank You

This Presentation

All Presentations & Podcasts