2 of 27

Emerging Stack for LLM-powered Products

What does it take to go from PoC to Production

Kuldeep Yadav, PhD

https://www.linkedin.com/in/kyadav/

Real World

AI Bootcamp

3 of 27

Agenda

Why should you think about the LLM stack?

Lifecycle of an LLM product

Design patterns to take LLM products from PoC to Production

Key libraries/platforms that you need

Infrastructure/tooling that is yet to be built


4 of 27

What is the most important ingredient of a successful LLM-powered product?

Slido: #3067587


5 of 27

A few years back …

Now…

Everyone has access to the same model

How do you differentiate?


6 of 27

Why is there a different stack for LLMs vs. ML?


7 of 27

Typical ML Pipeline

Weeks

Weeks

Weeks

Infrequent (months)

Secret Sauce


8 of 27

LLM Lifecycle

Days

Days

Days

Very Frequent

What is the secret sauce?


9 of 27

User Expectations

Ref: https://www.sh-reya.com/blog/ai-engineering-short/


10 of 27

Typical Development Lifecycle with LLMs

Scope

Does your product/feature really need LLMs?

Build

Iterate to design, develop, and test the product

Deploy

Deploy it for real users; ensure system reliability; plan to scale

Monitor

Monitor performance and availability closely

Iterate to improve

To ask the right question is already half the solution of a problem – Carl Jung


11 of 27

Scope: Key Questions to #ASK

Scope

“Testing the Waters”

Does this task/feature/product need LLMs?

What are the business metrics that it will impact?

Where does this fit in the user journey?

How will I present the output to the users?


12 of 27

Scope: Things to Do

  • Create a dataset of at least 10 examples with input and output pairs

  • Write prompt and generate the LLM output for these examples

  • Compare the output across paid (GPT, Gemini, Claude, etc.) and open-source LLMs (e.g., Llama)

  • Evaluate the initial effort/complexity of prompting techniques:
    • Breaking down the prompts
    • In-context examples
    • Single-shot vs Agentic

  • Does your product follow a well-established design?
    • Semantic Search/Retrieval-augmented Generation (RAG)
    • Text to SQL
    • Agents
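The "Scope" steps above can be sketched in a few lines: run a small input/output dataset through one prompt template and collect outputs per model for side-by-side comparison. Here `call_llm` is a hypothetical stand-in for a real provider SDK call (OpenAI, Gemini, Claude, a local Llama server), and the dataset/model names are illustrative:

```python
# Run a small benchmark dataset through a prompt template across models.
DATASET = [
    {"input": "Summarize: LLM stacks evolve quickly.", "expected": "LLM stacks change fast."},
    {"input": "Summarize: Evaluation should be continuous.", "expected": "Evaluate continuously."},
    # ... at least 10 such pairs in practice
]

PROMPT_TEMPLATE = "You are a concise assistant.\n\nTask: {input}\nAnswer:"

def call_llm(model: str, prompt: str) -> str:
    # Placeholder: replace with the provider's chat/completions call.
    return f"[{model}] response to: {prompt.splitlines()[-1]}"

def compare_models(models, dataset):
    """Return {example_index: {model: output}} for manual or automatic review."""
    results = {}
    for i, example in enumerate(dataset):
        prompt = PROMPT_TEMPLATE.format(input=example["input"])
        results[i] = {m: call_llm(m, prompt) for m in models}
    return results

results = compare_models(["gpt-4o", "claude-3", "llama-3"], DATASET)
```

Even this toy harness makes the scoping questions concrete: the same table of outputs tells you whether the task needs a frontier model, a cheaper one, or no LLM at all.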


13 of 27

What are agents really?


14 of 27

Scope: Tools to Use

ChainForge (Open-source, easy to install)

Athina IDE (SaaS, easy to try)

Several other tools: LangChain, Jupyter Notebooks, ChatGPT, and so on….


15 of 27

What are the common mistakes that people make in scoping?

Slido: #3067587


16 of 27

Build: Making it real

  • Continue asking yourself questions

  • How do I break my business metrics down into "evaluation metrics"? Are my evaluation metrics measurable?

  • Does my workflow need tool calling?

  • What latency does my application need to meet?

  • How do I build the product UI/UX to handle hallucinations and errors?

  • What are the high-level cost estimates?
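A high-level cost estimate is simple arithmetic over token counts and per-token prices; the prices and traffic numbers below are illustrative assumptions, not current list prices:

```python
# Back-of-the-envelope monthly cost estimate for an LLM feature.
def monthly_cost(requests_per_day, in_tokens, out_tokens,
                 price_in_per_m, price_out_per_m, days=30):
    # Price inputs are dollars per million tokens.
    per_request = (in_tokens * price_in_per_m + out_tokens * price_out_per_m) / 1_000_000
    return requests_per_day * days * per_request

# e.g. 10k requests/day, 1.5k prompt tokens, 400 completion tokens,
# at an assumed $2.50 / $10.00 per million input/output tokens:
cost = monthly_cost(10_000, 1_500, 400, 2.50, 10.00)
print(f"~${cost:,.0f}/month")  # prints ~$2,325/month
```

Running the same formula per candidate model is often enough to rule options in or out before any serious build work.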


17 of 27

Build: Benchmarking Datasets

  • Create a data annotation pipeline; the quality of data collection/calibration depends on how easy it is for users to rate

LabelStudio (Open-source, easy to use templates)
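The annotation pipeline can be sketched as a minimal record format written out as JSONL; the field names here are illustrative, not a Label Studio schema:

```python
import json

def make_record(example_id, model_input, model_output, rating, annotator):
    # One benchmarking-dataset row: what the model saw, what it produced,
    # and how a human rated it.
    return {
        "id": example_id,
        "input": model_input,
        "output": model_output,
        "rating": rating,        # e.g. 1-5; kept simple so raters can answer fast
        "annotator": annotator,
    }

records = [
    make_record(1, "Summarize X", "X summary", 4, "alice"),
    make_record(2, "Summarize Y", "Y summary", 2, "bob"),
]
jsonl = "\n".join(json.dumps(r) for r in records)
```

Keeping the rating dimension this small is a deliberate choice: the easier the question, the more consistent (and calibratable) the labels you collect.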


18 of 27

Build: Workflow Orchestration

https://www.techtarget.com/searchenterpriseai/definition/LangChain

  • Complex LLM applications are interconnected “graphs” of LLM calls and require careful orchestration
  • Pre-built modules and workflow capabilities for your task
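The "graph of LLM calls" idea can be shown with three stubbed steps wired together; real orchestrators (LangChain, LangGraph) add retries, streaming, and tracing on top of this skeleton, and the step functions here are stand-ins:

```python
def retrieve(query):
    return ["doc about " + query]          # stand-in for a retriever

def draft(query, docs):
    # Stand-in for an LLM call that writes a grounded answer.
    return f"Answer to '{query}' using {len(docs)} doc(s)"

def critique(answer):
    return answer + " [checked]"           # stand-in for a second LLM pass

def run_pipeline(query):
    docs = retrieve(query)                 # node 1
    answer = draft(query, docs)            # node 2 depends on node 1's output
    return critique(answer)                # node 3 depends on node 2's output

result = run_pipeline("LLM stacks")
```

Each node's output feeds the next; orchestration frameworks exist largely so you don't hand-roll error handling and observability around every such edge.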


19 of 27

Build: Workflow Orchestrators

  • Pick a workflow orchestrator well known for the task at hand (e.g., LangGraph for agents, LlamaIndex for RAG)

  • Examples: LangChain, LangGraph, LlamaIndex, Flowise


20 of 27

Build: Prompting

  • New prompting paradigms keep emerging; keep yourself updated (follow social media)
    • Chain-of-thought
    • Reflection
    • Read it again
  • Iterate, iterate, iterate on the evaluation datasets to see what works
    • Single-shot
    • In-context learning
    • Agentic
    • Or do you need fine-tuning?
  • Use tools for storing different prompt versions, experiment history, and evaluation results (LangChain, PromptHub, Athina)
  • Prompt optimization to find the best prompt (DSPy, AdalFlow)
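Prompt version tracking boils down to content-addressing each prompt and linking it to the evaluation scores it produced, so "what changed, and did it help?" is always answerable. A minimal sketch (tools like PromptHub or LangSmith provide this with much more; the class and metric names here are invented for illustration):

```python
import hashlib

class PromptRegistry:
    def __init__(self):
        self.versions = {}    # version hash -> prompt text
        self.results = []     # (version hash, metric name, score)

    def register(self, prompt: str) -> str:
        # Content-address the prompt so identical text always gets the same id.
        h = hashlib.sha256(prompt.encode()).hexdigest()[:12]
        self.versions[h] = prompt
        return h

    def record(self, version: str, metric: str, score: float):
        self.results.append((version, metric, score))

    def best(self, metric: str) -> str:
        # Return the version hash with the highest score on this metric.
        scored = [(s, v) for v, m, s in self.results if m == metric]
        return max(scored)[1]

reg = PromptRegistry()
v1 = reg.register("Summarize the text.")
v2 = reg.register("Summarize the text in one sentence, citing the source.")
reg.record(v1, "accuracy", 0.62)
reg.record(v2, "accuracy", 0.78)
```

With scores attached to immutable versions, "iterate, iterate, iterate" becomes a comparison over a table rather than guesswork over a chat history.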


21 of 27

Build: Testing/Evaluating LLM Products


22 of 27

Build: Invest in creating a data flywheel

https://www.sh-reya.com/blog/ai-engineering-flywheel/

Continuous Development

Continuous Integration

Continuous Improvement


23 of 27

Build: Let's take a deeper dive into Evaluation

Evaluation sees the most activity in the LLM tooling space; there are 8–10 good SaaS providers, including LangSmith, Galileo, Braintrust, Arize, etc.

Unit Tests

    • Assertions using PyTest or other libraries

Model Evaluations

    • Specific Q&A
    • LLM as a Judge

Human Evaluations

    • Analyze the traces as you build
    • Keep building the training datasets

A/B Testing

    • Useful where inter-human subjectivity plays a major role
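The cheapest layer, unit tests, is just deterministic assertions on model output, runnable under pytest. A sketch in which `generate` is a hypothetical stand-in for your model call; the checks are the point:

```python
import json

def generate(prompt: str) -> str:
    # Stub output for illustration; a real call goes to your model here.
    return '{"sentiment": "positive"}'

def test_output_is_valid_json():
    out = generate("Classify: 'great product'")
    parsed = json.loads(out)               # raises, failing the test, on malformed JSON
    assert "sentiment" in parsed

def test_sentiment_in_allowed_labels():
    parsed = json.loads(generate("Classify: 'great product'"))
    assert parsed["sentiment"] in {"positive", "negative", "neutral"}
```

Structural checks like these catch format regressions instantly and cost nothing, which is why they sit below model evaluations and human review in the stack.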


24 of 27

Methodical evaluation is a major difference between mediocre and great apps!


25 of 27

Deployment and Monitoring

  • Use caching to save costs

  • Continuous observability
    • Observe costs, failure rates, new data patterns
    • Keep track of key performance metrics
    • Automatically flag drift across metrics

  • Evaluation is a constant across the whole lifecycle
    • Keep iterating on “Build” and adding new data points to the mix

  • Online guardrails to prevent attacks

Key Tools: PortKey, Galileo, Redis, GPTCache, Helicone
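Caching to save costs can be as simple as an exact-match cache keyed on (model, prompt); tools like GPTCache extend the same idea to semantic (embedding-based) matching. A sketch with an invented class name and a fake model call:

```python
import hashlib

class ResponseCache:
    def __init__(self):
        self.store = {}
        self.hits = 0

    def key(self, model, prompt):
        # Exact-match key; a semantic cache would embed the prompt instead.
        return hashlib.sha256(f"{model}|{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call_fn):
        k = self.key(model, prompt)
        if k in self.store:
            self.hits += 1                      # cache hit: no tokens billed
        else:
            self.store[k] = call_fn(model, prompt)  # only pay for misses
        return self.store[k]

cache = ResponseCache()
fake_llm = lambda model, prompt: f"answer({prompt})"
a = cache.get_or_call("gpt-4o", "What is RAG?", fake_llm)
b = cache.get_or_call("gpt-4o", "What is RAG?", fake_llm)
```

The hit counter doubles as an observability signal: a falling hit rate is often the first sign of a new data pattern worth investigating.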


26 of 27

Design Patterns of Successful Builders

Make data your friend

    • Collect, Analyze, Annotate, Breathe it
    • Invest in data pipelines and continuous feedback generation

Focus deeply on your user, workflow, domain, and the use-case

    • Everyone else has access to the same model

Evaluation should be omni-present in your entire lifecycle

    • Do not make product testing an afterthought especially with LLMs

Iterate quickly

    • Speed of experimentation and shipping is the new moat

Observability

    • You cannot take your eyes off it even for a week

Tools and Stack

    • They are great, but not a replacement for a sound methodology and approach


27 of 27

Thank You!
