1 of 47

LLM application development with LangChain

Lance Martin

Software Engineer, LangChain

@RLanceMartin

2 of 47

LangChain makes it as easy as possible to develop LLM-powered applications.

3 of 47

Building Blocks

[Overview diagram]
• Open source building blocks: Document Loaders, Document Transformers, Embedding Models, Vectorstores, Prompts, LLMs, more …
• Use-cases / template apps: RAG Chain, Chat, SQL Chain, LangServe, more …
• Platform: Observability, Data Management, Eval

4 of 47

Central concepts

5 of 47

Two ways pre-trained LLMs learn:

• Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks involving form (e.g., extraction, text-to-SQL).
• Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall, i.e., facts (e.g., QA).

6 of 47

• LLM alone: memory only
• Search engines: retrieval only
• Retrieval Augmented Generation: add task-related documents to the LLM context window (working memory)

7 of 47

Building Blocks

8 of 47

Document Loaders: > 140 integrations

Load private / company data across two axes: unstructured vs. structured, and public vs. proprietary. Sources include files (.pdf, .txt, .json, .md, …) and datastores.

9 of 47

Document Transformers: text splitters
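
Basic splitting can be sketched in a few lines. This is an illustrative fixed-size splitter with overlap, not LangChain's actual text splitter implementation (the real splitters also respect separator boundaries such as paragraphs and sentences):

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Greedy fixed-size splitter with overlap (illustrative sketch,
    not LangChain's RecursiveCharacterTextSplitter)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = chunk_size - chunk_overlap  # each chunk re-includes the tail of the previous one
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

doc = "x" * 250
chunks = split_text(doc, chunk_size=100, chunk_overlap=20)
# 250 chars with step 80 -> chunks starting at 0, 80, 160, 240
```

The overlap keeps a little shared context across chunk boundaries, which helps retrieval later.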

10 of 47

Beyond basic splitting: context-aware splitting, function calling (Doctran)

Split on structure the document already has, keeping each chunk with its enclosing context:

• Code: split on function definitions, e.g.

    def foo(...):
        for section in sections:
            sect = section.find("head")

• Markdown: split on headers, e.g. "# Introduction / Notion templates are effective . . ."
• PDF: split on sections, e.g. "Abstract / The dominant sequence transduction models . . ."

Pipeline: Document Loaders → Document Transformers → Embeddings + Storage
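
The Markdown case can be sketched directly: group lines under their nearest preceding header so each chunk carries its section context. (Illustrative only; LangChain's header-aware splitter is richer and tracks nested header levels.)

```python
def split_markdown_by_header(md_text):
    """Group body lines under their nearest preceding '#'-style header,
    a sketch of context-aware splitting."""
    sections, header, buf = [], None, []
    for line in md_text.splitlines():
        if line.startswith("#"):
            if buf:  # close out the previous section
                sections.append({"header": header, "text": "\n".join(buf).strip()})
            header, buf = line.lstrip("#").strip(), []
        else:
            buf.append(line)
    if buf:
        sections.append({"header": header, "text": "\n".join(buf).strip()})
    return sections

md = "# Introduction\nNotion templates are effective.\n# Usage\nStart here."
sections = split_markdown_by_header(md)
```

Each returned section pairs the chunk text with its header as metadata, so retrieval can surface where a chunk came from.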

11 of 47

> 40 vectorstore integrations, > 30 embedding integrations

Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations)
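
The embed-then-retrieve step reduces to nearest-neighbor search over vectors. A toy sketch with a bag-of-words "embedding" standing in for a real embedding model, and an in-memory store standing in for a real vectorstore integration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a real system would call an
    embedding model here)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Minimal in-memory vector store: embed on add, top-k by cosine."""
    def __init__(self):
        self.docs = []
    def add(self, text):
        self.docs.append((embed(text), text))
    def similarity_search(self, query, k=1):
        q = embed(query)
        scored = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in scored[:k]]

store = ToyVectorStore()
store.add("LangChain makes it easy to build LLM apps")
store.add("Vectorstores index embedded document chunks")
top = store.similarity_search("how are document chunks indexed", k=1)
```

Real vectorstores add persistence and approximate nearest-neighbor indexes, but the interface is essentially this.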

12 of 47

Hosted or private vectorstore + embeddings: both vector storage and embeddings are available as hosted services or as private (on-device) options.

13 of 47

> 60 LLM integrations

Pipeline: Document Loaders (> 140 integrations) → Document Transformers (e.g., text splitters, OAI functions) → Embeddings (> 30 integrations) → Vector Storage (> 40 integrations) → LLMs (> 60 integrations)

14 of 47

LLM landscape

• OpenAI: 4k - 32k token context window; GPT-4 is SOTA (best overall); $0.06 / 1k tokens (input).
• Anthropic: 100k token context window; Claude-2 is getting closer to GPT-4; 4-5x cheaper than GPT-4-32k.
• Llama-2 (SOTA OSS): 4k token context window; 70b on par w/ GPT-3.5-turbo*; free.

*Llama2-70b is on par with GPT-3.5-turbo on language, but lags on coding and math.

15 of 47

[Figure: open-source LLM landscape. Base models with training-token counts (e.g., LLaMA 1.4T, LLaMA-2 2T, MPT 1T, GPT-J 400B, plus BLOOM, OPT, Falcon, GPT-NeoX-20b, StableLM) and instruction fine-tunes with instruction counts (e.g., Alpaca 52k, Vicuna 70k, Koala, GPT4All 800K, LLaMA-2-Chat). LLaMA-2-Chat marked SOTA.]

16 of 47

OSS models can run on-device (private): Llama2-13b runs at ~50 tok/sec on a Mac M2 Max (32 GB).

17 of 47

Integrations Hub

18 of 47

Use Cases

19 of 47

RAG: Load working memory w/ retrieved information relevant to a task

Pipeline: Document Loading (PDFs, URLs, databases → documents) → Splitting (splits) → Storage → Retrieval (question → query → relevant splits) → Output (prompt → LLM → answer)
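
The whole pipeline fits in a short sketch. A stub LLM and word-overlap scoring stand in for the real model and embedding similarity; a real chain would swap in a document loader, embedding model, vectorstore, and one of the LLM integrations:

```python
def retrieve(question, chunks, k=2):
    """Rank chunks by word overlap with the question (a stand-in for
    embedding similarity)."""
    q = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def rag_answer(question, chunks, llm):
    """Retrieve relevant splits, stuff them into the prompt, call the LLM."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

chunks = [
    "LangChain has over 140 document loader integrations.",
    "RAG loads working memory with retrieved information.",
]
stub_llm = lambda prompt: prompt.splitlines()[1]  # stub: echo the top retrieved chunk
answer = rag_answer("how many document loader integrations?", chunks, stub_llm)
```

The structure (retrieve → format prompt → generate) is exactly what the higher-level chains below wrap up for you.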

20 of 47

Pick desired level of abstraction (from most abstracted / simplest to most manual):

• VectorstoreIndexCreator → answer
• RetrievalQA → answer
• load_qa_chain (you supply the relevant splits) → answer

21 of 47

Or, use runnables

RAG
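
Runnables compose steps with the `|` operator. Here is a toy sketch of that composition idea only; LangChain's actual Runnable interface is much richer (batching, streaming, async):

```python
class Step:
    """A tiny composable unit mimicking the spirit of runnables:
    `a | b` builds a pipeline that feeds a's output into b."""
    def __init__(self, fn):
        self.fn = fn
    def invoke(self, x):
        return self.fn(x)
    def __or__(self, other):
        # Compose: run self, then pass the result to the next step.
        return Step(lambda x: other.invoke(self.invoke(x)))

prompt = Step(lambda q: f"Question: {q}")
llm = Step(lambda p: p.upper())  # stand-in for a model call
chain = prompt | llm
result = chain.invoke("what is RAG?")
```

The appeal of this style is that a prompt, model, retriever, and output parser all share one `invoke` interface, so any of them can be piped together.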

22 of 47

LangSmith trace for RetrievalQA chain: the trace shows the question, the retrieved docs, the assembled prompt, and the response.

23 of 47

Distilling useful ideas / tricks to improve RAG:

• Base case RAG: top K retrieval on embedded document chunks; return doc chunks for the LLM context window. Sources: Pinecone docs (here, here, here); supported by many vectorstores and LLM frameworks.
• Condensed content embedding: top K retrieval on embedded document summaries, but return the full doc for the LLM context window; or top K retrieval on embedded chunks or sentences, but return an expanded window or the full doc.
• Fine-tune RAG embeddings: fine-tune the embedding model on your data.
• 2-stage RAG: first-stage keyword search followed by second-stage semantic top K retrieval.
• Agents: may benefit more complex RAG use-cases.

24 of 47

Useful ideas / tricks: store documents with a condensed content embedding

Embed a condensed representation of each document (a chunk, a summary, or generated questions), e.g. the summary of a doc beginning "Top K RAG can fail when we do not ...". At query time ("When can top K RAG fail?"), retrieve against the condensed embeddings, but pass the full documents to the LLM to generate the answer.
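
The trick is simply to score against the condensed text while returning the full text. A toy sketch (word-overlap scoring stands in for embedding similarity; the document pairs are invented for illustration):

```python
def best_doc(question, docs_with_summaries):
    """Retrieve on the condensed summary, return the full document."""
    q = set(question.lower().split())
    def score(item):
        summary, _full = item
        return len(q & set(summary.lower().split()))
    _summary, full = max(docs_with_summaries, key=score)
    return full  # the full doc goes into the LLM context window

docs = [
    ("failure modes of top K RAG", "Top K RAG can fail when we do not ... <full text>"),
    ("intro to vectorstores", "Vectorstores index embeddings ... <full text>"),
]
doc = best_doc("When can top K RAG fail?", docs)
```

This decouples what is good to *search over* (short, focused summaries) from what is good to *answer from* (full documents).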

25 of 47

Chat: Persist conversation history

Question + memory (chat history) + optional retrieval (retrieved chunks from storage) → prompt → LLM → answer
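
Memory is just the loop of saving each turn and rendering the history into the next prompt. A minimal sketch of a buffer memory (illustrative; LangChain's memory classes add windowing, summarization, and persistence):

```python
class BufferMemory:
    """Keep (human, ai) turns and render them into the next prompt."""
    def __init__(self):
        self.turns = []
    def save(self, human, ai):
        self.turns.append((human, ai))
    def as_history(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

def chat(question, memory, llm):
    prompt = f"{memory.as_history()}\nHuman: {question}\nAI:"
    answer = llm(prompt)
    memory.save(question, answer)  # persist this turn for next time
    return answer

memory = BufferMemory()
echo_llm = lambda prompt: f"echo[{prompt.count('Human:')}]"  # stub model
chat("hi", memory, echo_llm)
second = chat("remember me?", memory, echo_llm)
```

By the second call the prompt contains the first turn, which is exactly what the trace on the next slide shows.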

26 of 47

LangSmith trace for LLMChain w/ chat model + memory: the prompt includes the chat history; the response is shown alongside it.

27 of 47

Summarization: Summarize a corpus of text

• Fits in the LLM context window: stuff the documents in the context window and distill into a summary directly.
• Does not fit in the LLM context window, two options:
  - Map-reduce: summarize chunks (map, "Summarize themes in the group of docs"), then distill the partial summaries into a final summary (reduce, "Extract final summary from input list").
  - Embed-and-cluster: cluster the embedded docs, sample from clusters, summarize each cluster, then extract the final summary.

Pipeline: Document Loader → Summarize
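
The map-reduce option has a simple skeleton: one model call per batch, then one call over the partial summaries. A stub LLM stands in for the real model here:

```python
def map_reduce_summarize(docs, llm, batch_size=2):
    """Map: summarize each batch of docs; reduce: distill the partial
    summaries into one final summary."""
    partial = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        partial.append(llm("Summarize themes in the group of docs:\n" + "\n".join(batch)))
    return llm("Extract final summary from input list:\n" + "\n".join(partial))

docs = ["doc a", "doc b", "doc c"]
# Stub "model": reports how many input lines it was asked to summarize.
stub_llm = lambda prompt: "S:" + str(len(prompt.splitlines()) - 1)
final = map_reduce_summarize(docs, stub_llm)
```

Each map call sees only one batch, so no single prompt has to fit the whole corpus; only the (much shorter) partial summaries meet in the reduce step.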

28 of 47

Case-study: Apply to thousands of user questions asked about the LangChain docs. Summarized themes from the questions using different methods and LLMs.

29 of 47

Extraction: Getting structured output from LLMs

Input: "Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde."

LLM output via function call:

    {'name': 'Alex', 'height': 5, 'hair_color': 'blonde'}
    {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}

Schema (tell the LLM the schema we want):

    schema = {
        "properties": {
            "name": {"type": "string"},
            "height": {"type": "integer"},
            "hair_color": {"type": "string"},
        },
    }

Function (tell the LLM the function):

    "name": "information_extraction",
    "description": "Extracts information from the passage.",
    "parameters": {schema}
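
The schema and function above can be assembled into an OpenAI-style function-calling payload, and the model's JSON arguments parsed back into records. A sketch with a mocked model response (the `mock_arguments` string is invented for illustration; a real call would return a JSON string of arguments in the API response's function call):

```python
import json

schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
}

# Function spec handed to the model: extract an array of objects
# matching the schema.
function = {
    "name": "information_extraction",
    "description": "Extracts information from the passage.",
    "parameters": {
        "type": "object",
        "properties": {"info": {"type": "array", "items": schema}},
    },
}

# Mocked model output (what the arguments JSON would look like).
mock_arguments = json.dumps({
    "info": [
        {"name": "Alex", "height": 5, "hair_color": "blonde"},
        {"name": "Claudia", "height": 6, "hair_color": "brunette"},
    ]
})
people = json.loads(mock_arguments)["info"]
```

The output-parsing step is just `json.loads` plus validation against the schema; that is what extraction chains wrap up.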

30 of 47

Extraction: LangSmith trace for LLMChain w/ function call + output parsing. The trace shows the prompt, the output from the function call, and the parsed response.

31 of 47

Text-to-SQL

Question → LLM → SQL query → execute against the database → LLM → answer. Optional: SQL Agent.
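
The two LLM calls bracket a plain database execution. A sketch against an in-memory SQLite database (the `employees` table and the stub model's canned query are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", "Eng"), ("Grace", "Eng"), ("Lin", "Sales")])

def text_to_sql_answer(question, llm):
    """Call 1: question -> SQL. Execute. Call 2: (question, rows) -> answer."""
    query = llm(f"Write SQL for: {question}")
    rows = conn.execute(query).fetchall()
    return llm(f"Answer {question!r} given rows {rows}")

# Stub standing in for a real model: canned query, echoed answer prompt.
def stub_llm(prompt):
    if prompt.startswith("Write SQL"):
        return "SELECT COUNT(*) FROM employees WHERE department = 'Eng'"
    return prompt  # echo, so we can see what the answer call received

answer = text_to_sql_answer("How many engineers are there?", stub_llm)
```

Because the generated SQL is executed verbatim, real deployments add validation and read-only credentials around this step.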

32 of 47

LangSmith trace for text-to-SQL: the prompt includes a CREATE TABLE description for each table and three example rows in a SELECT statement; the response is shown alongside it.

33 of 47

Chains and agents by capability (access to memory × access to tools):

• Basic LLM: no memory, no tools
• Chat chains (e.g., ConversationalRetrievalChain): memory, no tools
• API / function chains (e.g., APIChain): tools, no memory
• Agents: memory and tools

34 of 47

Agents: a plan → action → observation loop, with access to tools (> 60 tools + toolkits) and short-term memory (buffers). Agent types range from autonomous agents to simulations. Large agent ecosystem (will focus on ReAct as one example).

35 of 47

Agent styles vary along two axes: multi-step reasoning* (yes / no) and action-observation, i.e. tool use (yes / no); ReAct combines both.

*Condition the LLM to show its work.
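
The ReAct loop itself is small: the model emits either an action or a final answer, and tool observations are appended to a scratchpad. A sketch with a scripted stand-in for the model and a single invented `calculator` tool (real agents parse richer Thought/Action/Observation formats):

```python
def calculator(expr):
    """The agent's only tool (illustrative)."""
    return str(eval(expr, {"__builtins__": {}}))

def react_agent(question, llm, max_steps=5):
    """Minimal ReAct loop: act, observe, repeat until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)
        scratchpad += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip()
        if step.startswith("Action: calculator["):
            expr = step[len("Action: calculator["):-1]
            scratchpad += f"Observation: {calculator(expr)}\n"
    return None  # gave up within the step budget

# Scripted stand-in for the model: first act, then answer from the observation.
def scripted_llm(scratchpad):
    if "Observation:" not in scratchpad:
        return "Action: calculator[6 * 7]"
    obs = scratchpad.rsplit("Observation: ", 1)[1].strip()
    return f"Final Answer: {obs}"

result = react_agent("What is 6 * 7?", scripted_llm)
```

The `max_steps` cap matters in practice: it is the guard against the reliability failure mode discussed in the case-study below, where an agent loops without converging.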

36 of 47

LangSmith trace for SQL ReAct agent: prompt → (chain-of-thought) reasoning → tool / action → observation → uses tool at next step → response.

37 of 47

Case-study on reliability: Web researcher started as an agent; a retriever was better

Question → LLM generates queries 1..N → HTML pages (document loader) → document transformation → vector storage → document retrieval + QA (retrieved chunks → LLM) → answer

38 of 47

Case-study on reliability: Web researcher started as an agent; a retriever was better

Agents

39 of 47

Tooling

40 of 47

Two ways pre-trained LLMs learn:

• Weight updates via fine-tuning: like cramming before a test. Bad for factual recall, good for tasks involving form (e.g., extraction).
• Prompting (e.g., via retrieval): like an open-book exam. Good for factual recall, i.e., facts (e.g., QA).

LangSmith Case study: Fine-tuning for extraction

41 of 47

LangSmith Case study: Fine-tuning for extraction of knowledge graph triples

42 of 47

LangSmith Case study: Fine-tuning workflow

App generations → dataset → data cleaning (plus LLM-generated synthetic data) → train / test split → fine-tune → eval
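
The dataset-preparation step of that workflow can be sketched as formatting (input, output) pairs into chat-style records and splitting them. The exact message layout varies by provider; this one mirrors common chat fine-tuning formats and the example pairs are invented:

```python
import json
import random

def to_records(examples, system="Extract knowledge graph triples."):
    """Format (input, output) pairs into chat-style fine-tuning records."""
    return [
        {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": x},
            {"role": "assistant", "content": y},
        ]}
        for x, y in examples
    ]

def train_test_split(records, test_frac=0.25, seed=0):
    """Shuffle deterministically, hold out a test fraction for eval."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

examples = [(f"passage {i}", f"(s{i}, p, o)") for i in range(8)]
records = to_records(examples)
train, test = train_test_split(records)
lines = [json.dumps(r) for r in train]  # JSONL: one record per line
```

Holding out the test split before fine-tuning is what makes the eval comparison (fine-tuned model vs. few-shot prompting) on the next slides fair.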

43 of 47

LangSmith evaluation: fine-tuning vs few-shot prompting for triple extraction

44 of 47

LLaMA-7b-chat (before fine-tuning): informal answers with hallucinations

45 of 47

LLaMA-7b-chat (after fine-tuning): answers closer to the reference

46 of 47

Case-study lessons

  • LangSmith can help address pain points in the fine-tuning workflow: data collection, evaluation, and inspection of results.
  • RAG or few-shot prompting should be considered first! Few-shot prompting GPT-4 performed best.
  • Fine-tuning small open-source models can outperform much larger generalist models: fine-tuned LLaMA2-chat-7B beat GPT-3.5-turbo.

47 of 47

Questions