1 of 41

How to Use Your Development Data to Make LLMs Code

Like You and Your Team

Tyler Dunn, Co-founder & CEO of Continue

2 of 41

Continue is on a mission to make building software feel like making music

Continue is a modular, open-source Copilot alternative

It’s built as a reusable set of components that enable developers to create their own copilot

3 of 41

First, why do I want to make LLMs code like me and my team?

4 of 41

As developers, we want to experience flow state

5 of 41

Getting stuck disrupts our flow state

This is why so many of us are excited about software development copilots

6 of 41

But bad / wrong suggestions disrupt flow state too

7 of 41

Okay, but what is development data?

8 of 41

Dev data = how you build software

Data on the stuff that happens in between Git commits

Created as a by-product of using LLMs while coding
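
For illustration, a single dev-data record might look something like this (the field names are hypothetical, not a fixed schema):

```python
# One hypothetical development-data event: an accepted tab-autocomplete
# suggestion. Field names are illustrative only.
accepted_completion_event = {
    "event": "autocomplete",
    "accepted": True,
    "timestamp": "2024-03-21T14:02:11Z",
    "filepath": "src/payments/invoice.py",
    "prefix": "def total_due(invoice):\n    ",
    "suggestion": "return sum(line.amount for line in invoice.lines)",
    "model": "starcoder2-3b",
}
```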

9 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

11 of 41

Collect your dev data and look at it

12 of 41

Collect your development data and look at it
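
A minimal sketch of the "look at it" step, assuming the copilot writes JSON Lines logs to a local folder (the ~/.continue/dev_data path and the "accepted" field are assumptions about the layout; adjust to your own setup):

```python
import json
from collections import Counter
from pathlib import Path

# Assumed location of the JSON Lines dev-data logs; change this to
# wherever your copilot actually writes them.
DEV_DATA_DIR = Path.home() / ".continue" / "dev_data"

def iter_events(directory: Path):
    """Yield one parsed JSON object per non-empty line across all .jsonl files."""
    for path in sorted(directory.glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)

if __name__ == "__main__":
    counts = Counter()
    for event in iter_events(DEV_DATA_DIR):
        # "accepted" is a hypothetical field; open a few records yourself
        # to see which signals are actually being logged.
        counts["accepted" if event.get("accepted") else "other"] += 1
    print(dict(counts))
```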

13 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

15 of 41

Improve the compound AI system

16 of 41

Software dev copilots are compound AI systems

Software development AI systems today include many components (see the sketch after this list)

  • “Chat” model
  • “Tab” model
  • “Embeddings” model
  • Local context engine
  • Server context engine
  • Filtering engine
  • etc.
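
A rough sketch of how those components might be wired together; every name here is invented for illustration, and this is not Continue's actual architecture:

```python
from dataclasses import dataclass
from typing import Protocol

# Illustrative interfaces; a real copilot plugs in concrete implementations.
class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ContextEngine(Protocol):
    def retrieve(self, query: str) -> list[str]: ...

class Filter(Protocol):
    def ok(self, suggestion: str) -> bool: ...

@dataclass
class Copilot:
    chat_model: ChatModel
    context_engine: ContextEngine
    filters: list[Filter]

    def answer(self, question: str) -> str:
        # Retrieve context, build the prompt, generate, then screen the result.
        context = "\n".join(self.context_engine.retrieve(question))
        suggestion = self.chat_model.complete(f"{context}\n\n{question}")
        if not all(f.ok(suggestion) for f in self.filters):
            suggestion = self.chat_model.complete(
                f"{context}\n\n{question}\n\nThe previous answer was rejected; try again."
            )
        return suggestion
```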

17 of 41

Provide clear and comprehensive instructions

(side-by-side comparison: a vague prompt vs. clear, comprehensive instructions)

18 of 41

Add a system message with instructions that should always be followed

(side-by-side comparison: responses with vs. without the system message)
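
For instance, with any chat-completions-style API the system message travels with every request, so team conventions are always in the prompt. The conventions below are made up for illustration:

```python
# A system message with instructions that should always be followed.
# The specific conventions here are examples only.
messages = [
    {
        "role": "system",
        "content": (
            "You are a coding assistant for our team. "
            "Target Python 3.11, use type hints, and write tests with pytest. "
            "Never suggest our deprecated internal APIs."
        ),
    },
    {"role": "user", "content": "Write a helper that parses ISO 8601 timestamps."},
]
```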

19 of 41

Automatically filter out obviously bad suggestions and ask for a new one (see the sketch below)

Examples

  • Block suggestions matching public code
  • Ensure only certain libraries are used
  • Make sure suggestions pass your linter
  • etc.
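
A minimal sketch of that filtering loop, using a syntax check and a banned-imports check as stand-ins for the richer filters listed above. The request_suggestion argument is a placeholder for whatever function actually calls the model:

```python
import ast
from typing import Callable, Optional

BANNED_IMPORTS = {"pickle"}  # example policy only

def passes_filters(code: str) -> bool:
    """Reject suggestions that do not parse or that import banned modules."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name in BANNED_IMPORTS for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom) and node.module in BANNED_IMPORTS:
            return False
    return True

def get_suggestion(
    request_suggestion: Callable[[str], str], prompt: str, max_tries: int = 3
) -> Optional[str]:
    """Keep asking for a new suggestion until one passes the filters."""
    for _ in range(max_tries):
        suggestion = request_suggestion(prompt)  # placeholder model call
        if passes_filters(suggestion):
            return suggestion
    return None
```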

20 of 41

Improve how context from your codebase + software development lifecycle is retrieved and used
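
One way to picture retrieval: chunk the codebase (plus docs, PRs, tickets), index the chunks, and pull the most relevant ones into the prompt. The sketch below uses TF-IDF from scikit-learn as a simple stand-in for a real embeddings model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "codebase"; in practice these would be chunks of real files and
# other software development lifecycle artifacts.
chunks = [
    "def total_due(invoice): return sum(l.amount for l in invoice.lines)",
    "class Invoice: ...  # billing domain model",
    "def send_email(to, subject, body): ...",
]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(chunks)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

print(retrieve("how do we compute what an invoice owes?"))
```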

21 of 41

Select the right model for the job (see the routing sketch below)

“Chat” model

  • Typically 30B+ parameters
  • Highest quality responses
  • Often run on a server or used via an API endpoint
  • Examples: GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, etc.

“Tab” model

  • Typically 1-15B parameters
  • Quality vs. latency tradeoffs
  • Often run locally or on a server
  • Examples: Codex, StarCoder 2, Replit Code, etc.
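
One way to express the tradeoff in code: route each request to a different model depending on the job. The model names are examples, and the comments are rough characterizations, not benchmarks:

```python
# Illustrative routing table; swap in whichever models you actually run.
MODELS = {
    "chat": "deepseek-coder-33b",  # bigger, slower, highest quality
    "tab": "starcoder2-3b",        # smaller, low latency for keystroke-level use
}

def pick_model(task: str) -> str:
    """Tab-autocomplete needs low latency; chat can afford a larger model."""
    return MODELS["tab" if task == "autocomplete" else "chat"]

print(pick_model("autocomplete"))  # starcoder2-3b
print(pick_model("chat"))          # deepseek-coder-33b
```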

22 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

24 of 41

Improve the LLMs

25 of 41

The ideal data for an LLM

26 of 41

By-product of using LLMs → close to ideal data

When you use LLMs while coding, you create development data that shows

  • The step-by-step process a developer takes to complete a task
  • The context a developer uses to decide what to do at each step
  • Natural language that explains the reasoning behind the steps

27 of 41

Google is already using their development data

28 of 41

So what development data is helpful now?

Examples (see the sketch after this list)

  • Tab-autocomplete accepted / rejected suggestions
  • /edit accepted / rejected suggestions
  • Thumbs up / down on chat responses
  • The “apply this code” button
  • Manual edits made to the accepted code 1 min, 1 hour, 1 day later
  • Which retrieved (RAG) results actually get used in the response
  • etc.
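
A sketch of turning accepted autocomplete events into training pairs for later fine-tuning. It reuses the hypothetical record shape from earlier; the "accepted", "prefix", and "suggestion" fields are assumptions about the log format:

```python
import json
from pathlib import Path

def to_training_pairs(events_path: Path, out_path: Path) -> int:
    """Write accepted suggestions as prompt/completion pairs; skip rejections."""
    count = 0
    with events_path.open() as src, out_path.open("w") as dst:
        for line in src:
            event = json.loads(line)
            if not event.get("accepted"):  # hypothetical field name
                continue
            pair = {"prompt": event["prefix"], "completion": event["suggestion"]}
            dst.write(json.dumps(pair) + "\n")
            count += 1
    return count

# Example usage:
# to_training_pairs(Path("autocomplete.jsonl"), Path("train_pairs.jsonl"))
```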

29 of 41

Use fine-tuning to improve existing LLMs

dltHub fine-tuned StarCoder 2 on their codebase, docs, accepted tab autocomplete data, etc.

Domain-specific instruction data + hundreds of GPU hours

GigaML is fine-tuning StarCoder 2 on accepted tab autocomplete data
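
A heavily simplified sketch of what LoRA fine-tuning a code model on those accepted-completion pairs could look like with Hugging Face transformers, peft, and datasets. This is not dltHub's or GigaML's actual pipeline; the base model, data path, and hyperparameters are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

BASE = "bigcode/starcoder2-3b"  # example base model
DATA = "train_pairs.jsonl"      # accepted-completion pairs from the previous sketch

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

def tokenize(example):
    # Train on prompt + completion as one causal-LM sequence.
    out = tokenizer(example["prompt"] + example["completion"],
                    truncation=True, max_length=1024)
    out["labels"] = out["input_ids"].copy()
    return out

dataset = load_dataset("json", data_files=DATA, split="train").map(tokenize)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
).train()
```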

30 of 41

Use domain-adaptive continued pre-training to improve open-source LLMs

How Code Llama was created by Meta

How ChipNeMo was created by Nvidia

Billions of tokens of relevant company data + thousands of GPU hours

31 of 41

Pre-train your own LLM from scratch

OpenAI, MosaicML, Together, etc. will help you train your own custom model

Trillions of tokens of Internet data + company data + millions of GPU hours

Replit trained their own model

32 of 41

How to use your development data

Step 1

Collect your dev data and look at it

Step 2

Improve the compound AI system

Step 3

Improve the Large Language Models (LLMs)

33 of 41

TL;DR: Dev data can be used to automate even more

34 of 41

Thanks!

We are at the beginning of this journey :)

Lots more R&D to come!

We are hiring

35 of 41

Appendix
