1 of 28

DSPy - Declarative Programming in the era of AI

Jayita Bhattacharyya

@jayitabhattac11

2 of 28

$ whoami

👋🏻 Jayita Bhattacharyya, or JB, is what I go by!

🤺 AKA a vibe-debugger & a glorified if-else coder inside Jupyter notebooks

🫠 Pretending to be a Data Scientist these days.

🪄 A Tech Speaker and Hackathon Wizard.

📝 I like to pen down my tech thoughts on Medium Blogs

Contribute to open source as jayita13

Started sharing my deep tech opinions on @jayitabhattac11

Volunteer @ Bangpypers (Bangalore Python User Group)

My video tutorials & conferences.

3 of 28

Table of Contents

  • What is DSPy
  • Prompt to program
  • Signatures
  • Modules
  • CoT, ReAct
  • Adaptors
  • Optimizers
  • BootstrapFewShot
  • MIPROv2
  • Evaluations
  • Observability with MLflow
  • GEPA
  • References

4 of 28

Ain’t this normal?

  • Prompts are a great way of communicating with LLMs.

But…

  • Prompts need to be elaborate and exhaustive to get anything useful out of LLMs.

  • Maintenance is a challenge: prompts break when you switch from one LM to another.

  • This unreliability hinders trust when deploying to production systems.

  • Updating prompts takes manual effort, and fine-tuning is a costly business.

5 of 28

The problem or challenge

LLMs are just stochastic parrots and…

We’re developers, not parrots…

So let's change the paradigm with DSPy

6 of 28

DSPy - The Vibe Prompter via Programming

DSPy is an open-source declarative framework for building and optimizing modular AI software, emphasizing programming, not prompting, language models (LMs).

No more prompt engineering guesswork—DSPy handles the 'how' so you focus on the 'what'.

It is Python-native with type hints, reliable thanks to built-in Pydantic, and supports multiple LLMs via the LiteLLM backend. And of course the optimizers are the charmer: they fine-tune your prompts based on your dataset and metrics.

7 of 28

Ain’t that cool enough to already get excited about DSPy?

Let's see it in action now!

8 of 28

Signatures - from prompt to programs

9 of 28

How does the actual prompt look?

10 of 28

More Efficient Signatures

11 of 28

CoT

12 of 28

ReAct

13 of 28

Optimizers - the magician in DSPy

Optimizers (historically called teleprompters) are general-purpose optimization strategies that determine how DSPy modules should learn from data. They are designed to automate the task of prompting, ensuring it happens "at a distance, without manual intervention".

Kinds of Optimizers:

  • Automatic Few-Shot Learners - LabeledFewShot, BootstrapFewShot, BootstrapFewShotWithRandomSearch, KNNFewShot

  • Automatic Instruction Tuning - COPRO, MIPROv2, SIMBA, GEPA

  • Automatic Finetuning - BootstrapFinetune

14 of 28

Optimizers - some ground rules

  • If you have very few examples (around 10), start with BootstrapFewShot.

  • If you have more data (50 examples or more), try BootstrapFewShotWithRandomSearch.

  • If you prefer to do instruction optimization only (i.e. you want to keep your prompt 0-shot), use MIPROv2 configured for 0-shot optimization.

  • If you’re willing to use more inference calls to perform longer optimization runs (e.g. 40 trials or more), and have enough data (e.g. 200 examples or more to prevent overfitting) then try MIPROv2. Keep a check on cost for hosted models!

  • If you have been able to use one of these with a large LM (e.g., 7B parameters or above) and need a very efficient program, finetune a small LM for your task with BootstrapFinetune.

  • GEPA is creating quite a buzz now, using reflective prompt evolution that can rival RL-style approaches to instruction tuning.

15 of 28

BootstrapFewShot

The primary function of dspy.BootstrapFewShot is to automatically synthesize good few-shot examples (demonstrations) for the modules within a DSPy program, optimizing them based on a defined metric, without requiring manual prompt engineering.

16 of 28

Trace with MLflow
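Enabling tracing is a two-line configuration fragment (assumes a recent MLflow with the DSPy integration, roughly 2.18+; the experiment name is illustrative):

```python
import mlflow

# Autolog DSPy module calls and optimizer runs as MLflow traces.
mlflow.dspy.autolog()
mlflow.set_experiment("dspy-bootstrap-fewshot")
# ...run your DSPy program / optimizer here; traces appear in the MLflow UI
```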

17 of 28

Initial vs Final Prompt

18 of 28

MIPROv2

MIPROv2 (Multiprompt Instruction PRoposal Optimizer, version 2) is considered one of the key optimization techniques in DSPy, aiming to find the best instructions and demonstrations to maximize a given metric.

Uses Bayesian Optimization to effectively search over the space of generation instructions/demonstrations across your modules.

19 of 28

Trace with MLflow

20 of 28

Evaluation Results

21 of 28

Evaluation Results

22 of 28

Evaluation Results

23 of 28

Compare Eval runs with MLflow

24 of 28

Improvement in generation with training

25 of 28

GEPA - the game changer

GEPA (Genetic-Pareto) is a framework for optimizing arbitrary systems composed of text components—like AI prompts, code snippets, or textual specs—against any evaluation metric. It employs LLMs to reflect on system behavior, using feedback from execution and evaluation traces to drive targeted improvements.

Through iterative mutation, reflection, and Pareto-aware candidate selection, GEPA evolves robust, high-performing variants with minimal evaluations, co-evolving multiple components in modular systems for domain-specific gains.

26 of 28

GEPA Results

The optimized prompt that GEPA generates for AIME improves GPT-4.1 Mini's performance from 46.6% to 56.6%, a 10-point improvement on AIME 2025. Note the detail captured in the prompt after just 2 iterations of GEPA. GEPA can be thought of as precomputing some reasoning (during optimization) to come up with a good plan for future task instances.

27 of 28

References

  • Official Website - https://dspy.ai/

  • DSPy Newsletter - https://dspyweekly.com

  • Context Engineering with DSPy - https://youtu.be/1I9PoXzvWcs

  • Engineering AI Systems - https://youtu.be/qdmxApz3EJI

  • Join @DSPyOSS on X

28 of 28