Evolving Code with A Large Language Model
Erik Hemberg1*, Stephen Moskal1 and Una-May O’Reilly1 1*EECS, MIT CSAIL, 32 Vassar St, Cambridge, MA 02139, USA.
Presenter: Zihan Zhang
Motivation
Limitations of Traditional GP
GP relies on syntactic operations that lack semantic awareness, often producing invalid or low-quality programs and requiring many generations to converge.
Strengths of LLMs in Code Semantics
Toward Semantics-Driven Evolution
LLMs demonstrate strong capabilities in understanding, rewriting, and generating code, making them suitable as intelligent variation operators.
Integrating LLMs into GP enables semantically guided variation, reduces manual design effort, and improves evolutionary efficiency.
Background
GP Algorithm (Genetic Programming)
LLM
Genetic Programming (GP) is an evolutionary algorithm that automatically discovers programs.
It begins with a population of candidate programs and iteratively improves them through fitness evaluation, selection, and variation (crossover and mutation).
Through repeated evolutionary cycles, GP gradually evolves a program that solves the problem.
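The cycle above can be sketched as a minimal GP loop. This is an illustrative toy, not the paper's implementation: the target function, operator set, population size, and mutation scheme are all assumptions chosen to keep the example small.

```python
import random

# Toy GP sketch (illustrative assumptions throughout): evolve an expression
# tree that fits f(x) = x*x + x on a few sample points.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
TERMINALS = ["x", 1]

def random_tree(depth=3):
    """Grow a random expression tree of bounded depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Lower is better: squared error against the target on sample points.
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-3, 4))

def mutate(tree):
    # Syntactic mutation: replace a random subtree with a fresh random one.
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

random.seed(0)
pop = [random_tree() for _ in range(30)]
for gen in range(30):
    pop.sort(key=fitness)                     # fitness evaluation + ranking
    survivors = pop[:10]                      # truncation selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(20)]
best = min(pop, key=fitness)
print(fitness(best))
```

Note that `mutate` here is purely syntactic subtree replacement; this is exactly the kind of semantics-blind operator the paper proposes to replace with an LLM.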
Pre-trained on massive text corpora
Strong at pattern recognition and code generation
Can rewrite, refactor, and combine code in a semantically meaningful way
LLMs understand code patterns, so they can generate better offspring than random mutation/crossover in traditional GP.
LLM_GP Algorithm
Algorithm Overview
LLM-Based GP
Mutation is replaced by LLM-based code rewriting
Crossover is replaced by LLM-based code merging
Some variants even use LLMs for selection.
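An LLM-backed mutation operator can be sketched as below. The real system sends the prompt to an LLM API; here `fake_llm` is a hypothetical stand-in (it just echoes the code back with a small edit) so the example runs without network access, and the prompt wording is an assumption, not the paper's exact prompt.

```python
def mutation_prompt(code: str) -> str:
    # Build a rewriting request around the parent program.
    return (
        "Rewrite the following program so that it behaves slightly "
        "differently while remaining valid Python:\n```python\n"
        f"{code}\n```"
    )

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call: extracts the code from the
    # prompt and returns it with a trivial edit, wrapped in a code block.
    code = prompt.split("```python\n")[1].split("\n```")[0]
    return "```python\n" + code.replace("+ 1", "+ 2") + "\n```"

def parse_code(response: str) -> str:
    # Extract the fenced code block from the model response.
    return response.split("```python\n")[1].split("\n```")[0]

def llm_mutate(code: str) -> str:
    # Prompt -> LLM -> parse: an LLM-based replacement for subtree mutation.
    return parse_code(fake_llm(mutation_prompt(code)))

print(llm_mutate("def f(x):\n    return x + 1"))
```

LLM-based crossover follows the same pattern, except the prompt embeds two parent programs and asks the model to merge them into one child.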
LLM-based Operator
Each LLM-based operator works in three small steps:
1. Build a prompt from the operator's arguments using a prompt-function.
2. Query the LLM with the prompt.
3. Parse the response to extract the resulting program(s).
Prompt-functions and Prompts
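A prompt-function maps an operator's arguments (e.g. one or two parent programs) to the prompt string sent to the LLM. The template below is a hypothetical illustration of a crossover prompt-function; the paper's actual prompt wording differs.

```python
def crossover_prompt(parent_a: str, parent_b: str) -> str:
    # Hypothetical crossover prompt-function: embeds both parents and asks
    # the model to merge them into a single child program.
    return (
        "You are performing crossover in genetic programming.\n"
        "Combine the two parent programs below into a single child program "
        "that mixes features of both.\n\n"
        f"Parent A:\n```python\n{parent_a}\n```\n\n"
        f"Parent B:\n```python\n{parent_b}\n```\n\n"
        "Return only the child program in a Python code block."
    )

print(crossover_prompt("def f(x): return x + 1", "def f(x): return x * 2"))
```

The same pattern covers the other operators: an initialization prompt-function takes a problem description, a mutation prompt-function takes one parent, and a selection prompt-function would embed several candidates and ask the model to pick.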
Experiments
Baselines
Computational Resources
Random — Generates expressions purely at random.
Tutorial GP — A standard Genetic Programming implementation.
LLM — Directly prompts the LLM to generate random expressions.
LLM-GP_Mu_XO — Uses the LLM for initialization, mutation, and crossover only.
LLM-GP — Uses the LLM for all evolutionary operators except fitness evaluation.
Experiment Results
1. Traditional GP is the fastest and cheapest, while LLM-based methods are much slower and more costly due to frequent model calls.
2. LLM-generated programs tend to be longer and more complex, whereas GP produces smaller and more efficient solutions.
3. Full LLM-GP suffers from higher operator error rates, but reducing LLM usage (LLM-GP_Mu_XO) improves stability and efficiency.
Conclusion
LLM-GP represents a new paradigm for code evolution: It retains the GP framework but uses prompts and LLMs to perform initialization, selection, mutation, and other operators.
LLMs introduce new costs and risks: Frequent LLM interactions during runtime lead to significant time and monetary overhead, and the outputs can be unstable or ill-formed.
Still worth exploring: Despite these uncertainties, LLM-GP offers expressive capabilities and search behaviors that traditional GP cannot achieve, making it a promising direction for future research.
Thank you