1 of 17

2 of 17

DevTools for Large Language Models: Unlocking the Future of AI-Driven Applications

Diego M. Oppenheimer - @doppenhe

3 of 17

Foundation Models vs. Large Language Models

FMs:

  • Broad, general-purpose models that aim to capture a wide range of knowledge and capabilities (GPT-4, CLIP, DALL-E)

  • Billions of parameters

  • Designed to be adapted and fine-tuned to specific tasks using limited data

LLMs:

  • Specifically focused on language understanding and generation (GPT, BERT, LLaMA)

  • Trained on massive text datasets from multiple domains

  • Tasks include classification, generation, translation, summarization, and more

4 of 17

A quick walk down memory lane

Era                           | Model Size | Capabilities                            | Data
------------------------------|------------|-----------------------------------------|--------------------------------------------
Big Bang                      | No models  | Emptiness                               | Nothing
Self-Supervision              | 125-350M   | Vomit up Reddit                         | Small web/book dump
Who Could Get to 100B First?  | 1-100B     | Taskless text generation                | All the web
Instruction Tuned and Massive | 10-200+B   | Task generality and listens to feedback | Heavily curated and labeled web-scale data (probably cost billions)

As size and data quality increase, you get more generalization and in-context behaviors, but at higher cost

5 of 17

Entering the “Holy $#A!” phase

Early stages of development around new platforms tend to produce simple wrappers:

  • Microprocessor -> single-board computers

  • Operating system -> wrappers on utilities

  • Internet -> wrappers on Unix and network utilities

  • GenAI today -> wrappers on LLMs

6 of 17

Thriving developer ecosystem

7 of 17

The Development Process with LLMs

8 of 17

The Development Process with LLMs

GenAI today -> wrappers on LLMs

9 of 17

Orchestration, Experimentation and Prompting Tools

  • LLMs' “APIs” are natural language in the form of prompts

  • Mastering the API requires tinkering and experimenting with single and chained prompts

  • Various tools have emerged that provide:
    • connections to data sources,
    • indices,
    • coordination of chained calls, and
    • other core abstractions

*not a complete list
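The chaining these tools coordinate can be sketched in a few lines. In this sketch, `call_llm` is a hypothetical stand-in for any provider's completion API (here it is just a stub), and the prompt templates are illustrative; the point is only that each call's output feeds the next call's prompt:

```python
# Minimal sketch of a prompt chain: the output of one LLM call becomes
# part of the next prompt. `call_llm` is a hypothetical stand-in for a
# real provider's completion API, stubbed here so the sketch runs.
def call_llm(prompt: str) -> str:
    # A real implementation would call a hosted model here.
    return f"<response to: {prompt!r}>"

def chain(steps, initial_input: str) -> str:
    """Run a list of prompt templates, feeding each output forward."""
    text = initial_input
    for template in steps:
        text = call_llm(template.format(input=text))
    return text

result = chain(
    ["Summarize the following text: {input}",
     "List three follow-up questions about: {input}"],
    "LLMs are trained on massive text corpora...",
)
```

Orchestration frameworks add data connectors, retries, and richer abstractions on top of this same feed-forward pattern.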

10 of 17

Knowledge Retrieval and Vector Databases

  • Provide LLMs with contextual knowledge

  • “Memory”

  • Vector databases provide:
    • efficient vector similarity searches (retrieval)
    • efficient storage of up to billions of embeddings
    • efficient indexing capabilities

*not a complete list
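The retrieval step can be illustrated with a brute-force cosine-similarity search. This is only a sketch: the 3-dimensional "embeddings" and the stored keys are toy values, and real vector databases replace the linear scan with approximate nearest-neighbor indices to scale to billions of vectors:

```python
# Sketch of the retrieval step a vector database performs: store
# embeddings, then return the stored items most similar to a query
# vector. Brute-force scan for illustration; real systems use
# approximate nearest-neighbor indices. Vectors are toy values.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

store = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "privacy notice": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the k stored keys most similar to the query vector."""
    ranked = sorted(store, key=lambda key: cosine(store[key], query_vec),
                    reverse=True)
    return ranked[:k]

context = retrieve([0.85, 0.2, 0.05])  # → ["refund policy", "shipping times"]
```

The retrieved keys would then be stuffed into the prompt as contextual "memory" for the LLM.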

11 of 17

Building V2 of LLM Features - Fine-Tuning Language Models

*not a complete list

  • Goal: make models
    • More accurate
    • Faster and cheaper to run

  • Quality of labels is more important than quantity
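The label-quality point shows up concretely in dataset preparation. Below is a minimal sketch of assembling a supervised fine-tuning set as JSONL with a validation pass that drops bad examples; the prompt/completion schema is an assumption (providers differ), and the examples are invented:

```python
# Sketch of preparing a fine-tuning dataset as JSONL. The
# prompt/completion schema is an assumption; providers differ.
# A simple validation pass drops low-quality (empty) labels
# before anything is written.
import json

examples = [
    {"prompt": "Classify sentiment: 'Great product!'", "completion": "positive"},
    {"prompt": "Classify sentiment: 'Never buying again.'", "completion": "negative"},
    {"prompt": "Classify sentiment: 'It arrived.'", "completion": ""},  # bad: empty label
]

def is_valid(ex) -> bool:
    """Keep only examples with a non-empty prompt and completion."""
    return bool(ex["prompt"].strip()) and bool(ex["completion"].strip())

clean = [ex for ex in examples if is_valid(ex)]
jsonl = "\n".join(json.dumps(ex) for ex in clean)  # one JSON object per line
```

Even this trivial filter reflects the slide's claim: a smaller, cleaner set generally fine-tunes better than a larger, noisier one.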

12 of 17

Monitoring, Observability and Testing

*not a complete list

  • LLM Performance: Unique challenge; assess quality via user interactions.

  • A/B Testing: Evaluate LLM features via product analytics (full workflow)

  • Eye Test vs. HELM: as OSS gains traction, comparison frameworks will become more critical.

  • Performance Impact: Affects UX; guides model selection and fine-tuning.
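The A/B-testing bullet can be sketched as deterministic bucketing plus outcome logging. The variant names and the thumbs-up signal are illustrative assumptions; the key property is that hashing the user ID gives each user a stable variant across sessions:

```python
# Sketch of A/B-testing two model variants behind one feature.
# Users are hashed deterministically into buckets (stable
# assignment), and per-variant outcomes are tallied for later
# analysis. Variant names and feedback signal are illustrative.
import hashlib
from collections import Counter

VARIANTS = ["model-v1", "model-v2-finetuned"]

def assign_variant(user_id: str) -> str:
    """Stable assignment: the same user always gets the same variant."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(VARIANTS)
    return VARIANTS[bucket]

outcomes = Counter()

def record(user_id: str, thumbs_up: bool) -> None:
    """Log one user interaction against the user's assigned variant."""
    outcomes[(assign_variant(user_id), thumbs_up)] += 1

for uid, ok in [("alice", True), ("bob", False), ("alice", True)]:
    record(uid, ok)
```

In a real product the tally would flow into the analytics pipeline that evaluates the full workflow, not just the model call.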

13 of 17

Testing, Assurance and Guardrails

*not a complete list

  • LLMs can generate plausible but incorrect information, which poses risks for ‘low-affordability’ use cases (those that cannot afford errors), like medical diagnosis or financial decisions.

  • Guardrails are needed to ensure safety, accuracy, and reliability

  • Tools that allow users to define rules, schemas, and heuristics for LLM outputs will be crucial for building trust in these systems
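One simple form of such a guardrail is schema validation on the model's output. The sketch below assumes the application asked the model for a JSON object with a hand-written `diagnosis`/`confidence` schema (both names invented for illustration); anything that fails to parse or match is rejected rather than shown to the user:

```python
# Sketch of an output guardrail: require the model's answer to parse
# as JSON and match a small hand-written schema before it reaches the
# user; otherwise reject it. Real guardrail tools add retries,
# re-prompting, and richer rule languages. Field names are invented.
import json

SCHEMA = {"diagnosis": str, "confidence": float}

def guard(raw_output: str):
    """Return the parsed output if it matches SCHEMA, else None."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if set(data) != set(SCHEMA):
        return None
    if not all(isinstance(data[k], t) for k, t in SCHEMA.items()):
        return None
    return data

ok = guard('{"diagnosis": "flu", "confidence": 0.72}')
bad = guard('I think it is probably flu?')  # free text → rejected
```

Rejections can then trigger a re-prompt or a safe fallback response instead of surfacing unverified output.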

14 of 17

Some future predictions

* Iteration cycles will define winning developer experiences

* Larger models, even more access, and more powerful wrappers

15 of 17

* GPT-You:

MLOps tooling evolves to enable “personalized” FMs, trained on your own data and workflows:

  • Data is the most durable moat
  • Last mile is where the value is generated

16 of 17

Thank you

Diego Oppenheimer

doppenheimer

@doppenhe

17 of 17

Credits

David Hershey

Laurel Orr

Matt Turk