1 of 41

How to Use Your Development Data to Make LLMs Code

Like You and Your Team

Tyler Dunn, Co-founder & CEO of Continue

2 of 41

Continue is on a mission to make building software feel like making music

Continue is a modular, open-source Copilot alternative

It’s built as a reusable set of components that enable developers to create their own copilot

3 of 41

First, why do I want to make LLMs code like me and my team?

4 of 41

As developers, we want to experience flow state

5 of 41

Getting stuck disrupts our flow state

This is why so many of us are excited about software development copilots

6 of 41

But bad / wrong suggestions disrupt flow state too

7 of 41

Okay, but what is development data?

8 of 41

Dev data = how you build software

Data on the stuff that happens in between Git commits

Created as a by-product of using LLMs while coding
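
For illustration, a single dev-data record might look something like this (the field names are hypothetical, not a fixed schema):

```python
# One hypothetical development-data event: an accepted tab-autocomplete
# suggestion. Field names are illustrative only.
accepted_completion_event = {
    "event": "autocomplete",
    "accepted": True,
    "timestamp": "2024-03-21T14:02:11Z",
    "filepath": "src/payments/invoice.py",
    "prefix": "def total_due(invoice):\n    ",
    "suggestion": "return sum(line.amount for line in invoice.lines)",
    "model": "starcoder2-3b",
}
```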

9 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

11 of 41

Collect your dev data and look at it

12 of 41

Collect your development data and look at it
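
A minimal sketch of the "look at it" step, assuming the copilot writes JSON Lines logs to a local folder (the ~/.continue/dev_data path and the "accepted" field are assumptions about the layout; adjust to your own setup):

```python
import json
from collections import Counter
from pathlib import Path

# Assumed location of the JSON Lines dev-data logs; change this to
# wherever your copilot actually writes them.
DEV_DATA_DIR = Path.home() / ".continue" / "dev_data"

def iter_events(directory: Path):
    """Yield one parsed JSON object per non-empty line across all .jsonl files."""
    for path in sorted(directory.glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)

if __name__ == "__main__":
    counts = Counter()
    for event in iter_events(DEV_DATA_DIR):
        # "accepted" is a hypothetical field; open a few records yourself
        # to see which signals are actually being logged.
        counts["accepted" if event.get("accepted") else "other"] += 1
    print(dict(counts))
```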

13 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

15 of 41

Improve the compound AI system

16 of 41

Software dev copilots are compound AI systems

Software development AI systems today include many components (see the sketch after this list)

  • “Chat” model
  • “Tab” model
  • “Embeddings” model
  • Local context engine
  • Server context engine
  • Filtering engine
  • etc.
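
A rough sketch of how those components might be wired together; every name here is invented for illustration, and this is not Continue's actual architecture:

```python
from dataclasses import dataclass
from typing import Protocol

# Illustrative interfaces; a real copilot plugs in concrete implementations.
class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ContextEngine(Protocol):
    def retrieve(self, query: str) -> list[str]: ...

class Filter(Protocol):
    def ok(self, suggestion: str) -> bool: ...

@dataclass
class Copilot:
    chat_model: ChatModel
    context_engine: ContextEngine
    filters: list[Filter]

    def answer(self, question: str) -> str:
        # Retrieve context, build the prompt, generate, then screen the result.
        context = "\n".join(self.context_engine.retrieve(question))
        suggestion = self.chat_model.complete(f"{context}\n\n{question}")
        if not all(f.ok(suggestion) for f in self.filters):
            suggestion = self.chat_model.complete(
                f"{context}\n\n{question}\n\nThe previous answer was rejected; try again."
            )
        return suggestion
```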

17 of 41

Provide clear and comprehensive instructions

(side-by-side comparison: a vague prompt vs. clear, comprehensive instructions)

18 of 41

Add a system message with instructions that should always be followed

(side-by-side comparison: responses with vs. without the system message)
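
For instance, with any chat-completions-style API the system message travels with every request, so team conventions are always in the prompt. The conventions below are made up for illustration:

```python
# A system message with instructions that should always be followed.
# The specific conventions here are examples only.
messages = [
    {
        "role": "system",
        "content": (
            "You are a coding assistant for our team. "
            "Target Python 3.11, use type hints, and write tests with pytest. "
            "Never suggest our deprecated internal APIs."
        ),
    },
    {"role": "user", "content": "Write a helper that parses ISO 8601 timestamps."},
]
```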

19 of 41

Automatically filter out obviously bad suggestions and ask for a new one (see the sketch below)

Examples

  • Block suggestions matching public code
  • Ensure only certain libraries are used
  • Make sure suggestions pass your linter
  • etc.
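
A minimal sketch of that filtering loop, using a syntax check and a banned-imports check as stand-ins for the richer filters listed above. The request_suggestion argument is a placeholder for whatever function actually calls the model:

```python
import ast
from typing import Callable, Optional

BANNED_IMPORTS = {"pickle"}  # example policy only

def passes_filters(code: str) -> bool:
    """Reject suggestions that do not parse or that import banned modules."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name in BANNED_IMPORTS for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom) and node.module in BANNED_IMPORTS:
            return False
    return True

def get_suggestion(
    request_suggestion: Callable[[str], str], prompt: str, max_tries: int = 3
) -> Optional[str]:
    """Keep asking for a new suggestion until one passes the filters."""
    for _ in range(max_tries):
        suggestion = request_suggestion(prompt)  # placeholder model call
        if passes_filters(suggestion):
            return suggestion
    return None
```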

20 of 41

Improve how context from your codebase + software development lifecycle is retrieved and used
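
One way to picture retrieval: chunk the codebase (plus docs, PRs, tickets), index the chunks, and pull the most relevant ones into the prompt. The sketch below uses TF-IDF from scikit-learn as a simple stand-in for a real embeddings model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "codebase"; in practice these would be chunks of real files and
# other software development lifecycle artifacts.
chunks = [
    "def total_due(invoice): return sum(l.amount for l in invoice.lines)",
    "class Invoice: ...  # billing domain model",
    "def send_email(to, subject, body): ...",
]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(chunks)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

print(retrieve("how do we compute what an invoice owes?"))
```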

21 of 41

Select the right model for the job (see the routing sketch below)

“Chat” model

  • Typically 30B+ parameters
  • Highest quality responses
  • Often run on a server or used via an API endpoint
  • Examples: GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, etc.

“Tab” model

  • Typically 1-15B parameters
  • Quality vs. latency tradeoffs
  • Often run locally or on a server
  • Examples: Codex, StarCoder 2, Replit Code, etc.
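
One way to express the tradeoff in code: route each request to a different model depending on the job. The model names are examples, and the comments are rough characterizations, not benchmarks:

```python
# Illustrative routing table; swap in whichever models you actually run.
MODELS = {
    "chat": "deepseek-coder-33b",  # bigger, slower, highest quality
    "tab": "starcoder2-3b",        # smaller, low latency for keystroke-level use
}

def pick_model(task: str) -> str:
    """Tab-autocomplete needs low latency; chat can afford a larger model."""
    return MODELS["tab" if task == "autocomplete" else "chat"]

print(pick_model("autocomplete"))  # starcoder2-3b
print(pick_model("chat"))          # deepseek-coder-33b
```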

22 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

24 of 41

Improve the LLMs

25 of 41

The ideal data for an LLM

26 of 41

By-product of using LLMs → close to ideal data

When you use LLMs while coding, you create development data that shows

  • The step-by-step process a developer takes to complete a task
  • The context a developer uses to decide what to do at each step
  • Natural language that explains the reasoning behind the steps

27 of 41

Google is already using their development data

28 of 41

So what development data is helpful now?

Examples (see the sketch after this list)

  • Tab-autocomplete accepted / rejected suggestions
  • /edit accepted / rejected suggestions
  • Thumbs up / down on chat responses
  • The “apply this code” button
  • Manual edits made to the accepted code 1 min, 1 hour, 1 day later
  • Which retrieved (RAG) results actually get used in the response
  • etc.
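
A sketch of turning accepted autocomplete events into training pairs for later fine-tuning. It reuses the hypothetical record shape from earlier; the "accepted", "prefix", and "suggestion" fields are assumptions about the log format:

```python
import json
from pathlib import Path

def to_training_pairs(events_path: Path, out_path: Path) -> int:
    """Write accepted suggestions as prompt/completion pairs; skip rejections."""
    count = 0
    with events_path.open() as src, out_path.open("w") as dst:
        for line in src:
            event = json.loads(line)
            if not event.get("accepted"):  # hypothetical field name
                continue
            pair = {"prompt": event["prefix"], "completion": event["suggestion"]}
            dst.write(json.dumps(pair) + "\n")
            count += 1
    return count

# Example usage:
# to_training_pairs(Path("autocomplete.jsonl"), Path("train_pairs.jsonl"))
```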

29 of 41

Use fine-tuning to improve existing LLMs

dltHub fine-tuned StarCoder 2 on their codebase, docs, accepted tab autocomplete data, etc.

Domain-specific instruction data + hundreds of GPU hours

GigaML is fine-tuning StarCoder 2 on accepted tab autocomplete data
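
A heavily simplified sketch of what LoRA fine-tuning a code model on those accepted-completion pairs could look like with Hugging Face transformers, peft, and datasets. This is not dltHub's or GigaML's actual pipeline; the base model, data path, and hyperparameters are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

BASE = "bigcode/starcoder2-3b"  # example base model
DATA = "train_pairs.jsonl"      # accepted-completion pairs from the previous sketch

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

def tokenize(example):
    # Train on prompt + completion as one causal-LM sequence.
    out = tokenizer(example["prompt"] + example["completion"],
                    truncation=True, max_length=1024)
    out["labels"] = out["input_ids"].copy()
    return out

dataset = load_dataset("json", data_files=DATA, split="train").map(tokenize)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
).train()
```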

30 of 41

Use domain-adaptive continued pre-training to improve open-source LLMs

How Code Llama was created by Meta

How ChipNeMo was created by Nvidia

Billions of tokens of relevant company data + thousands of GPU hours

31 of 41

Pre-train your own LLM from scratch

OpenAI, MosaicML, Together, etc. will help you train your own custom model

Trillions of tokens of Internet data + company data + millions of GPU hours

Replit trained their own model

32 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

33 of 41

TL;DR: Dev data can be used to automate even more

34 of 41

Thanks!

We are at the beginning of this journey :)

Lots more R&D to come!

We are hiring

35 of 41

Appendix
