Project Memo “Your_Devin”
It’s funny, but just two months before Cognition Labs gave their demo, my friend Devin and I were at the AGI House hackathon in Hillsborough, hosted by Josh from Coframe. Guess what? We built a code-generation autopilot that writes, deploys, and debugs code in a loop for front-end tasks. We won that hackathon. And what happened next?
These guys at Cognition stole our idea and didn’t even bother to change the name (it’s a joke). Well, I’m not jealous; let them take all the ideas, as long as we build something useful for the public good.
Hmm, so since our idea proved so practical, I thought: why not take it a step further? What lies beyond Devin? The Devin beyond Devin is “Your_Devin”.
What does that mean? And why?
Let’s answer the following question: what is the biggest problem with all coding copilots? Hallucinations. But why? Well, just having codebase-context awareness isn’t enough to get a model to write code for you. You have to push the model further and fine-tune it. But how? How do you fine-tune a model on a single codebase?
Well, here comes the Ada-Instruct paper. From a single text corpus, we generate ten high-quality question-answer pairs that serve as our seed training set. But is that enough? No. As the next step, we use these seed pairs to instruct a larger LLM to generate a large number of downstream training tasks for the small custom model. Since models hallucinate, we execute every generated code snippet and keep only the ones that run and produce a valid result. We then fine-tune the small private model on this filtered dataset and get Your_Devin: a Devin that performs much better on tasks involving your codebase.
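To make the loop concrete, here’s a minimal sketch of the execution-filtering step. Everything here is illustrative, not a final design: `generate_tasks` is a hypothetical wrapper around whatever call we make to the larger LLM, and `seed_pairs` stands for the ten hand-checked question-answer pairs from the previous step.

```python
import subprocess
import sys
import tempfile

def runs_cleanly(snippet: str, timeout_s: int = 10) -> bool:
    """Run a generated code snippet in a subprocess and report whether
    it exits without an error -- our cheap anti-hallucination check."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(snippet)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout_s
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

# seed_pairs: the ten question-answer pairs distilled from the codebase
# (the Ada-Instruct step).
# generate_tasks: hypothetical call to the larger LLM that expands the
# seeds into many (instruction, code_snippet) training tasks.
dataset = [
    (instruction, snippet)
    for instruction, snippet in generate_tasks(seed_pairs)
    if runs_cleanly(snippet)  # drop snippets that crash or hang
]
```

The surviving pairs become the fine-tuning set for the small private model.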
Well, that’s not the end of it.
What would be even better than Your_Devin? How can we generalize from a copilot well trained on our own codebase to a broader set of problems, without having to share the training dataset derived from that codebase?
We crossbreed Devins by merging fine-tuned models together (the MergeKit library comes in handy here).
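For illustration, here’s what a single crossbreed attempt could look like, assuming MergeKit’s YAML config format and its `mergekit-yaml` CLI. The repo names are placeholders, and linear weight averaging is only one of MergeKit’s merge methods (SLERP, TIES, and DARE are others):

```python
import subprocess

# Placeholder repos: any two fine-tuned Devins that share the same base
# architecture can be merged.
config = """\
models:
  - model: alice/your-devin
    parameters:
      weight: 0.5
  - model: bob/your-devin
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
"""

with open("crossbreed.yml", "w") as f:
    f.write(config)

# MergeKit's CLI entry point: config in, merged checkpoint out.
subprocess.run(["mergekit-yaml", "crossbreed.yml", "./devin-crossbreed"], check=True)
```

Each crossbreed would then be evaluated on a held-out slice of the generated tasks.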
Will it work? Not necessarily. Most of the time, it won’t. But who cares? If even one crossbreed out of a hundred performs better, we’ll be happy!
So, imagine a world where every software engineer can create a Devin and then crossbreed them to produce a better, golden Devin! 😁 It’s a world of decentralized autonomous models which, through a kind of natural selection, can hopefully pose a challenge to the huge, trillion-parameter generalist models produced by big tech. 😊
How does it work?
Stack to be used (based on the list of judges):
TBD:
Your_Devin script flow:
User experience:
Demo prompt: “I’m building a RAG app for my company and would like to set up a MongoDB vector database”
Other notes:
Include citations:
@article{cui2023ada,
  title={Ada-Instruct: Adapting Instruction Generators for Complex Reasoning},
  author={Cui, Wanyun and Wang, Qianle},
  journal={arXiv preprint arXiv:2310.04484},
  year={2023}
}
@misc{Genstruct,
  title={Genstruct},
  author={euclaise},
  url={https://huggingface.co/NousResearch/Genstruct-7B}
}
What to do next: