1 of 47

LLM application development with LangChain

Lance Martin

Software Engineer, LangChain

@RLanceMartin

2 of 47

LangChain makes it as easy as possible to develop LLM-powered applications.

3 of 47

Building Blocks

[Overview diagram]
• Open source building blocks: Document Loaders, Document Transformers, Embedding Models, Vectorstores, Prompts, LLMs, more …
• Use-cases / template apps: RAG Chain, Chat, SQL Chain, LangServe, more …
• Platform: Observability, Data Management, Eval

4 of 47

Central concepts

5 of 47

Two ways pre-trained LLMs learn:

• Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks involving form (e.g., extraction, text-to-SQL).
• Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall, i.e., facts (e.g., QA).

6 of 47

• LLM alone: memory only
• Search engines: retrieval only
• Retrieval Augmented Generation: add task-related documents to the LLM context window (working memory)

7 of 47

Building Blocks

8 of 47

Document Loaders: > 140 integrations

Load private / company data across two axes: unstructured vs. structured, and public vs. proprietary. Sources include files (.pdf, .txt, .json, .md, …) and datastores.

9 of 47

Document Transformers: text splitters
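
Basic splitting can be sketched in a few lines. This is an illustrative fixed-size splitter with overlap, not LangChain's actual text splitter implementation (the real splitters also respect separator boundaries such as paragraphs and sentences):

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Greedy fixed-size splitter with overlap (illustrative sketch,
    not LangChain's RecursiveCharacterTextSplitter)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = chunk_size - chunk_overlap  # each chunk re-includes the tail of the previous one
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

doc = "x" * 250
chunks = split_text(doc, chunk_size=100, chunk_overlap=20)
# 250 chars with step 80 -> chunks starting at 0, 80, 160, 240
```

The overlap keeps a little shared context across chunk boundaries, which helps retrieval later.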

10 of 47

Beyond basic splitting: context-aware splitting, function calling (Doctran)

Split on structure the document already has, keeping each chunk with its enclosing context:

• Code: split on function definitions, e.g.

    def foo(...):
        for section in sections:
            sect = section.find("head")

• Markdown: split on headers, e.g. "# Introduction / Notion templates are effective . . ."
• PDF: split on sections, e.g. "Abstract / The dominant sequence transduction models . . ."

Pipeline: Document Loaders → Document Transformers → Embeddings + Storage
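
The Markdown case can be sketched directly: group lines under their nearest preceding header so each chunk carries its section context. (Illustrative only; LangChain's header-aware splitter is richer and tracks nested header levels.)

```python
def split_markdown_by_header(md_text):
    """Group body lines under their nearest preceding '#'-style header,
    a sketch of context-aware splitting."""
    sections, header, buf = [], None, []
    for line in md_text.splitlines():
        if line.startswith("#"):
            if buf:  # close out the previous section
                sections.append({"header": header, "text": "\n".join(buf).strip()})
            header, buf = line.lstrip("#").strip(), []
        else:
            buf.append(line)
    if buf:
        sections.append({"header": header, "text": "\n".join(buf).strip()})
    return sections

md = "# Introduction\nNotion templates are effective.\n# Usage\nStart here."
sections = split_markdown_by_header(md)
```

Each returned section pairs the chunk text with its header as metadata, so retrieval can surface where a chunk came from.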

11 of 47

> 40 vectorstore integrations, > 30 embedding integrations

Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations)
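
The embed-then-retrieve step reduces to nearest-neighbor search over vectors. A toy sketch with a bag-of-words "embedding" standing in for a real embedding model, and an in-memory store standing in for a real vectorstore integration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a real system would call an
    embedding model here)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Minimal in-memory vector store: embed on add, top-k by cosine."""
    def __init__(self):
        self.docs = []
    def add(self, text):
        self.docs.append((embed(text), text))
    def similarity_search(self, query, k=1):
        q = embed(query)
        scored = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in scored[:k]]

store = ToyVectorStore()
store.add("LangChain makes it easy to build LLM apps")
store.add("Vectorstores index embedded document chunks")
top = store.similarity_search("how are document chunks indexed", k=1)
```

Real vectorstores add persistence and approximate nearest-neighbor indexes, but the interface is essentially this.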

12 of 47

Hosted or private vectorstore + embeddings: both vector storage and embeddings are available as hosted services or as private (on-device) options.

13 of 47

> 60 LLM integrations

Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations) → LLMs (> 60 integrations)

14 of 47

LLM landscape

• OpenAI: 4k - 32k token context window; GPT-4 is SOTA (best overall); $0.06 / 1k tokens (input).
• Anthropic: 100k token context window; Claude-2 is getting closer to GPT-4; 4-5x cheaper than GPT-4-32k.
• Llama-2 (SOTA OSS): 4k token context window; 70b on par w/ GPT-3.5-turbo*; free.

*Llama2-70b is on par with GPT-3.5-turbo on language, but lags on coding and math.

15 of 47

[Figure: open-source LLM landscape. Base models with training-token counts (e.g., LLaMA 1.4T, LLaMA-2 2T, MPT 1T, GPT-J 400B, plus BLOOM, OPT, Falcon, GPT-NeoX-20b, StableLM) and instruction fine-tunes with instruction counts (e.g., Alpaca 52k, Vicuna 70k, Koala, GPT4All 800K, LLaMA-2-Chat). LLaMA-2-Chat marked SOTA.]

16 of 47

OSS models can run on-device (private): Llama2-13b runs at ~50 tok/sec on a Mac M2 Max (32 GB).

17 of 47

Integrations Hub

18 of 47

Use Cases

19 of 47

RAG: Load working memory w/ retrieved information relevant to a task

Pipeline: Document Loading (PDFs, URLs, databases → documents) → Splitting (splits) → Storage → Retrieval (question → query → relevant splits) → Output (prompt → LLM → answer)
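
The whole pipeline fits in a short sketch. A stub LLM and word-overlap scoring stand in for the real model and embedding similarity; a real chain would swap in a document loader, embedding model, vectorstore, and one of the LLM integrations:

```python
def retrieve(question, chunks, k=2):
    """Rank chunks by word overlap with the question (a stand-in for
    embedding similarity)."""
    q = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def rag_answer(question, chunks, llm):
    """Retrieve relevant splits, stuff them into the prompt, call the LLM."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

chunks = [
    "LangChain has over 140 document loader integrations.",
    "RAG loads working memory with retrieved information.",
]
stub_llm = lambda prompt: prompt.splitlines()[1]  # stub: echo the top retrieved chunk
answer = rag_answer("how many document loader integrations?", chunks, stub_llm)
```

The structure (retrieve → format prompt → generate) is exactly what the higher-level chains below wrap up for you.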

20 of 47

Pick desired level of abstraction (from most abstracted / simplest to most manual):

• VectorstoreIndexCreator → answer
• RetrievalQA → answer
• load_qa_chain (you supply the relevant splits) → answer

21 of 47

Or, use runnables

RAG
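
Runnables compose steps with the `|` operator. Here is a toy sketch of that composition idea only; LangChain's actual Runnable interface is much richer (batching, streaming, async):

```python
class Step:
    """A tiny composable unit mimicking the spirit of runnables:
    `a | b` builds a pipeline that feeds a's output into b."""
    def __init__(self, fn):
        self.fn = fn
    def invoke(self, x):
        return self.fn(x)
    def __or__(self, other):
        # Compose: run self, then pass the result to the next step.
        return Step(lambda x: other.invoke(self.invoke(x)))

prompt = Step(lambda q: f"Question: {q}")
llm = Step(lambda p: p.upper())  # stand-in for a model call
chain = prompt | llm
result = chain.invoke("what is RAG?")
```

The appeal of this style is that a prompt, model, retriever, and output parser all share one `invoke` interface, so any of them can be piped together.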

22 of 47

LangSmith trace for RetrievalQA chain: the trace shows the question, the retrieved docs, the assembled prompt, and the response.

23 of 47

Distilling useful ideas / tricks to improve RAG:

• Base case RAG: top K retrieval on embedded document chunks; return doc chunks for the LLM context window. Sources: Pinecone docs (here, here, here); supported by many vectorstores and LLM frameworks.
• Condensed content embedding: top K retrieval on embedded document summaries, but return the full doc for the LLM context window; or top K retrieval on embedded chunks or sentences, but return an expanded window or the full doc.
• Fine-tune RAG embeddings: fine-tune the embedding model on your data.
• 2-stage RAG: first-stage keyword search followed by second-stage semantic top K retrieval.
• Agents: may benefit more complex RAG use-cases.

24 of 47

Useful ideas / tricks: store documents with a condensed content embedding

Embed a condensed representation of each document (a chunk, a summary, or generated questions), e.g. the summary of a doc beginning "Top K RAG can fail when we do not ...". At query time ("When can top K RAG fail?"), retrieve against the condensed embeddings, but pass the full documents to the LLM to generate the answer.
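
The trick is simply to score against the condensed text while returning the full text. A toy sketch (word-overlap scoring stands in for embedding similarity; the document pairs are invented for illustration):

```python
def best_doc(question, docs_with_summaries):
    """Retrieve on the condensed summary, return the full document."""
    q = set(question.lower().split())
    def score(item):
        summary, _full = item
        return len(q & set(summary.lower().split()))
    _summary, full = max(docs_with_summaries, key=score)
    return full  # the full doc goes into the LLM context window

docs = [
    ("failure modes of top K RAG", "Top K RAG can fail when we do not ... <full text>"),
    ("intro to vectorstores", "Vectorstores index embeddings ... <full text>"),
]
doc = best_doc("When can top K RAG fail?", docs)
```

This decouples what is good to *search over* (short, focused summaries) from what is good to *answer from* (full documents).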

25 of 47

Chat: Persist conversation history

Question + memory (chat history) + optional retrieval (retrieved chunks from storage) → prompt → LLM → answer
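
Memory is just the loop of saving each turn and rendering the history into the next prompt. A minimal sketch of a buffer memory (illustrative; LangChain's memory classes add windowing, summarization, and persistence):

```python
class BufferMemory:
    """Keep (human, ai) turns and render them into the next prompt."""
    def __init__(self):
        self.turns = []
    def save(self, human, ai):
        self.turns.append((human, ai))
    def as_history(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

def chat(question, memory, llm):
    prompt = f"{memory.as_history()}\nHuman: {question}\nAI:"
    answer = llm(prompt)
    memory.save(question, answer)  # persist this turn for next time
    return answer

memory = BufferMemory()
echo_llm = lambda prompt: f"echo[{prompt.count('Human:')}]"  # stub model
chat("hi", memory, echo_llm)
second = chat("remember me?", memory, echo_llm)
```

By the second call the prompt contains the first turn, which is exactly what the trace on the next slide shows.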

26 of 47

LangSmith trace for LLMChain w/ chat model + memory: the prompt includes the chat history; the response is shown alongside it.

27 of 47

Summarization: Summarize a corpus of text

• Fits in the LLM context window: stuff the documents in the context window and distill into a summary directly.
• Does not fit in the LLM context window, two options:
  - Map-reduce: summarize chunks (map, "Summarize themes in the group of docs"), then distill the partial summaries into a final summary (reduce, "Extract final summary from input list").
  - Embed-and-cluster: cluster the embedded docs, sample from clusters, summarize each cluster, then extract the final summary.

Pipeline: Document Loader → Summarize
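
The map-reduce option has a simple skeleton: one model call per batch, then one call over the partial summaries. A stub LLM stands in for the real model here:

```python
def map_reduce_summarize(docs, llm, batch_size=2):
    """Map: summarize each batch of docs; reduce: distill the partial
    summaries into one final summary."""
    partial = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        partial.append(llm("Summarize themes in the group of docs:\n" + "\n".join(batch)))
    return llm("Extract final summary from input list:\n" + "\n".join(partial))

docs = ["doc a", "doc b", "doc c"]
# Stub "model": reports how many input lines it was asked to summarize.
stub_llm = lambda prompt: "S:" + str(len(prompt.splitlines()) - 1)
final = map_reduce_summarize(docs, stub_llm)
```

Each map call sees only one batch, so no single prompt has to fit the whole corpus; only the (much shorter) partial summaries meet in the reduce step.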

28 of 47

Case-study: Apply to thousands of user questions asked about the LangChain docs. Summarized themes from the questions using different methods and LLMs.

29 of 47

Extraction: Getting structured output from LLMs

Input: "Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde."

LLM output via function call:

    {'name': 'Alex', 'height': 5, 'hair_color': 'blonde'}
    {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}

Schema (tell the LLM the schema we want):

    schema = {
        "properties": {
            "name": {"type": "string"},
            "height": {"type": "integer"},
            "hair_color": {"type": "string"},
        },
    }

Function (tell the LLM the function):

    "name": "information_extraction",
    "description": "Extracts information from the passage.",
    "parameters": {schema}
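
The schema and function above can be assembled into an OpenAI-style function-calling payload, and the model's JSON arguments parsed back into records. A sketch with a mocked model response (the `mock_arguments` string is invented for illustration; a real call would return a JSON string of arguments in the API response's function call):

```python
import json

schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
}

# Function spec handed to the model: extract an array of objects
# matching the schema.
function = {
    "name": "information_extraction",
    "description": "Extracts information from the passage.",
    "parameters": {
        "type": "object",
        "properties": {"info": {"type": "array", "items": schema}},
    },
}

# Mocked model output (what the arguments JSON would look like).
mock_arguments = json.dumps({
    "info": [
        {"name": "Alex", "height": 5, "hair_color": "blonde"},
        {"name": "Claudia", "height": 6, "hair_color": "brunette"},
    ]
})
people = json.loads(mock_arguments)["info"]
```

The output-parsing step is just `json.loads` plus validation against the schema; that is what extraction chains wrap up.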

30 of 47

Extraction: LangSmith trace for LLMChain w/ function call + output parsing. The trace shows the prompt, the output from the function call, and the parsed response.

31 of 47

Text-to-SQL

Question → LLM → SQL query → execute against the database → LLM → answer. Optional: SQL Agent.
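
The two LLM calls bracket a plain database execution. A sketch against an in-memory SQLite database (the `employees` table and the stub model's canned query are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", "Eng"), ("Grace", "Eng"), ("Lin", "Sales")])

def text_to_sql_answer(question, llm):
    """Call 1: question -> SQL. Execute. Call 2: (question, rows) -> answer."""
    query = llm(f"Write SQL for: {question}")
    rows = conn.execute(query).fetchall()
    return llm(f"Answer {question!r} given rows {rows}")

# Stub standing in for a real model: canned query, echoed answer prompt.
def stub_llm(prompt):
    if prompt.startswith("Write SQL"):
        return "SELECT COUNT(*) FROM employees WHERE department = 'Eng'"
    return prompt  # echo, so we can see what the answer call received

answer = text_to_sql_answer("How many engineers are there?", stub_llm)
```

Because the generated SQL is executed verbatim, real deployments add validation and read-only credentials around this step.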

32 of 47

LangSmith trace for text-to-SQL: the prompt includes a CREATE TABLE description for each table and three example rows in a SELECT statement; the response is shown alongside it.

33 of 47

Chains and agents by capability (access to memory × access to tools):

• Basic LLM: no memory, no tools
• Chat chains (e.g., ConversationalRetrievalChain): memory, no tools
• API / function chains (e.g., APIChain): tools, no memory
• Agents: memory and tools

34 of 47

Agents: a plan → action → observation loop, with access to tools (> 60 tools + toolkits) and short-term memory (buffers). Agent types range from autonomous agents to simulations. Large agent ecosystem (will focus on ReAct as one example).

35 of 47

Agent styles vary along two axes: multi-step reasoning* (yes / no) and action-observation, i.e. tool use (yes / no); ReAct combines both.

*Condition the LLM to show its work.
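
The ReAct loop itself is small: the model emits either an action or a final answer, and tool observations are appended to a scratchpad. A sketch with a scripted stand-in for the model and a single invented `calculator` tool (real agents parse richer Thought/Action/Observation formats):

```python
def calculator(expr):
    """The agent's only tool (illustrative)."""
    return str(eval(expr, {"__builtins__": {}}))

def react_agent(question, llm, max_steps=5):
    """Minimal ReAct loop: act, observe, repeat until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)
        scratchpad += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip()
        if step.startswith("Action: calculator["):
            expr = step[len("Action: calculator["):-1]
            scratchpad += f"Observation: {calculator(expr)}\n"
    return None  # gave up within the step budget

# Scripted stand-in for the model: first act, then answer from the observation.
def scripted_llm(scratchpad):
    if "Observation:" not in scratchpad:
        return "Action: calculator[6 * 7]"
    obs = scratchpad.rsplit("Observation: ", 1)[1].strip()
    return f"Final Answer: {obs}"

result = react_agent("What is 6 * 7?", scripted_llm)
```

The `max_steps` cap matters in practice: it is the guard against the reliability failure mode discussed in the case-study below, where an agent loops without converging.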

36 of 47

LangSmith trace for SQL ReAct agent: prompt → (chain-of-thought) reasoning → tool / action → observation → uses tool at next step → response.

37 of 47

Case-study on reliability: Web researcher started as an agent; a retriever was better

Question → LLM generates queries 1..N → HTML pages (document loader) → document transformation → vector storage → document retrieval + QA (retrieved chunks → LLM) → answer

38 of 47

Case-study on reliability: Web researcher started as an agent; a retriever was better

Agents

39 of 47

Tooling

40 of 47

Two ways pre-trained LLMs learn:

• Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks involving form (e.g., extraction).
• Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall, i.e., facts (e.g., QA).

LangSmith Case study: Fine-tuning for extraction

41 of 47

LangSmith Case study: Fine-tuning for extraction of knowledge graph triples

42 of 47

LangSmith Case study: Fine-tuning workflow

App generations → dataset → data cleaning (plus LLM-generated synthetic data) → train / test split → fine-tune → eval
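
The dataset-preparation step of that workflow can be sketched as formatting (input, output) pairs into chat-style records and splitting them. The exact message layout varies by provider; this one mirrors common chat fine-tuning formats and the example pairs are invented:

```python
import json
import random

def to_records(examples, system="Extract knowledge graph triples."):
    """Format (input, output) pairs into chat-style fine-tuning records."""
    return [
        {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": x},
            {"role": "assistant", "content": y},
        ]}
        for x, y in examples
    ]

def train_test_split(records, test_frac=0.25, seed=0):
    """Shuffle deterministically, hold out a test fraction for eval."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

examples = [(f"passage {i}", f"(s{i}, p, o)") for i in range(8)]
records = to_records(examples)
train, test = train_test_split(records)
lines = [json.dumps(r) for r in train]  # JSONL: one record per line
```

Holding out the test split before fine-tuning is what makes the eval comparison (fine-tuned model vs. few-shot prompting) on the next slides fair.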

43 of 47

LangSmith evaluation: fine-tuning vs few-shot prompting for triple extraction

44 of 47

LLaMA-7b-chat (before fine-tuning): informal answers with hallucinations

45 of 47

LLaMA-7b-chat (after fine-tuning): answers closer to the reference

46 of 47

Case-study lessons

  • LangSmith can help address pain points in the fine-tuning workflow: data collection, evaluation, and inspection of results.
  • RAG or few-shot prompting should be considered first! Few-shot prompting GPT-4 performed best.
  • Fine-tuning small open-source models can outperform much larger generalist models: fine-tuned LLaMA2-chat-7B beat GPT-3.5-turbo.

47 of 47

Questions