Python DSPy Agent Framework: 5 Fatal Pitfalls in LLM Programming and Auto-Optimization in 2026

The Era of Hand-Written Prompts Is Over

You spend three days crafting the perfect prompt, and it stops working when you switch models. Your carefully designed few-shot examples perform worse on a newer model version. Your agent chain keeps growing, and every step's prompt becomes a maintenance nightmare. In 2026, DSPy (Declarative Self-improving Python) moves LLM programming from hand-written prompts to declarative programming plus automatic optimization: you define input/output signatures and a metric, and the framework's optimizers search for better prompts, few-shot examples, and fine-tuning strategies.

This article guides you through building a DSPy-based AI Agent from scratch and solving the 5 most common fatal pitfalls in production.

DSPy Core Concepts

Concept	Description
Signature	Declarative definition of module input/output, e.g., `"question -> answer"`
Module	Composable LLM call unit, similar to PyTorch's nn.Module
Optimizer (formerly Teleprompter)	Searches automatically for better prompts and few-shot examples against a metric
Example	Standardized input/output data sample
Metric	Scoring function that evaluates module output quality
Adapter	Layer that formats a signature into the prompt sent to the LM and parses the response back into fields

DSPy vs Traditional Prompt Engineering

Dimension	Hand-written Prompt	DSPy Declarative
Development	Manual writing, trial and error	Declare signatures, auto-optimize
Model Migration	Rewrite all Prompts	Swap the configured LM and re-run the optimizer
Maintainability	Low, Prompts scattered	High, signatures as documentation
Optimization	Relies on human experience	Automatic search for optimal
Multi-step Reasoning	Manual chaining, error-prone	Modular composition with typed fields

Problem Analysis: 5 Major DSPy Development Challenges

Poor signature design: Vague input/output field names cause LLM misunderstanding
Optimizer selection difficulty: BootstrapFewShot, MIPROv2 suit different scenarios
Multi-step reasoning chain breaks: Type mismatches between modules cause mid-chain crashes
Inaccurate metric functions: Evaluation criteria misaligned with business goals, optimization goes off-track
Async concurrency pitfalls: No concurrency control during large-scale optimization triggers API rate limits
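
The last challenge, rate limiting, has no dedicated pitfall section later in this article, so here is a minimal sketch of the usual remedy: cap the number of in-flight calls and retry with exponential backoff. It uses only the standard library. `call_with_backoff`, `run_bounded`, and `RateLimited` are hypothetical names, and `RateLimited` stands in for whatever exception your client raises (for example `openai.RateLimitError`).

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Stand-in for a provider's rate-limit error (hypothetical)."""


def call_with_backoff(fn, *args, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(*args), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except RateLimited:
            if attempt == max_retries:
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def run_bounded(fn, items, max_workers=4, **backoff_kwargs):
    """Apply fn to every item with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(call_with_backoff, fn, item, **backoff_kwargs)
            for item in items
        ]
        # Collect in submission order so results line up with items.
        return [future.result() for future in futures]
```

Lowering `max_workers` plays the same role as lowering `num_threads` in the optimizer and evaluator calls shown later.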

Step-by-Step: Complete DSPy Agent Implementation

Step 1: Environment Setup

pip install dspy-ai==2.6.0
pip install openai==1.35.0
pip install datasets==2.19.0

import os

import dspy

lm = dspy.LM(
    model="openai/gpt-4o-mini",
    # Read the key from the environment instead of hard-coding it.
    api_key=os.environ["OPENAI_API_KEY"],
    temperature=0.7,
    max_tokens=2048,
)
dspy.configure(lm=lm)

Step 2: Define Signatures and Modules

class QuestionAnswer(dspy.Signature):
    """Answer the question based on the given context. If the context doesn't contain the answer, respond with 'Cannot answer'."""

    context: str = dspy.InputField(desc="Context text containing the answer")
    question: str = dspy.InputField(desc="Question to answer")
    answer: str = dspy.OutputField(desc="Brief answer based on context")


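class RAGModule(dspy.Module):
    def __init__(self, num_passages: int = 3):
        super().__init__()
        # dspy.Retrieve needs a retrieval model configured beforehand,
        # e.g. dspy.configure(lm=lm, rm=your_retriever) (placeholder name).
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(QuestionAnswer)

    def forward(self, question: str) -> dspy.Prediction:
        passages = self.retrieve(question).passages
        # The signature declares `context: str`, so join the passage list.
        context = "\n".join(passages)
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)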

Step 3: Build Multi-Step Reasoning Agent

class DecomposeQuestion(dspy.Signature):
    """Decompose a complex question into multiple simple sub-questions."""

    question: str = dspy.InputField(desc="Complex question to decompose")
    sub_questions: list[str] = dspy.OutputField(desc="List of decomposed sub-questions")


class SynthesizeAnswer(dspy.Signature):
    """Synthesize a final answer from multiple sub-question answers."""

    original_question: str = dspy.InputField(desc="Original complex question")
    sub_answers: list[str] = dspy.InputField(desc="Answers to each sub-question")
    final_answer: str = dspy.OutputField(desc="Synthesized final answer")


class MultiStepAgent(dspy.Module):
    def __init__(self, num_passages: int = 3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.decompose = dspy.ChainOfThought(DecomposeQuestion)
        self.sub_answer = dspy.ChainOfThought(QuestionAnswer)
        self.synthesize = dspy.ChainOfThought(SynthesizeAnswer)

    def forward(self, question: str) -> dspy.Prediction:
        decomposed = self.decompose(question=question)
        sub_answers = []
        for sub_q in decomposed.sub_questions:
            context = self.retrieve(sub_q).passages
            sub_pred = self.sub_answer(context="\n".join(context), question=sub_q)
            sub_answers.append(sub_pred.answer)
        final = self.synthesize(
            original_question=question,
            sub_answers=sub_answers,
        )
        return dspy.Prediction(
            sub_questions=decomposed.sub_questions,
            sub_answers=sub_answers,
            answer=final.final_answer,
        )

Step 4: Define Metric Functions

def answer_exact_match(example: dspy.Example, prediction: dspy.Prediction, trace=None) -> float:
    """Exact match metric"""
    return float(
        example.answer.strip().lower() == prediction.answer.strip().lower()
    )

def answer_f1_score(example: dspy.Example, prediction: dspy.Prediction, trace=None) -> float:
    """F1 score metric"""
    pred_tokens = set(prediction.answer.strip().lower().split())
    gold_tokens = set(example.answer.strip().lower().split())
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = pred_tokens & gold_tokens
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
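
The F1 metric above compares sets of unique tokens, so a repeated word counts only once. A common alternative, sketched here with the standard library only, counts token multiplicity with `collections.Counter`. `token_f1` is a hypothetical helper that works on plain strings rather than `dspy.Example` objects.

```python
from collections import Counter


def token_f1(gold: str, pred: str) -> float:
    """Token-level F1 that respects repeated tokens (multiset overlap)."""
    gold_tokens = gold.strip().lower().split()
    pred_tokens = pred.strip().lower().split()
    if not gold_tokens or not pred_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(gold_tokens == pred_tokens)
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For the gold answer "the cat" and the prediction "the the the cat", the set-based version scores 1.0, while this version scores about 0.67 because the padding lowers precision.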

Step 5: Automatic Optimization

from dspy.teleprompt import BootstrapFewShot, MIPROv2

trainset = [
    dspy.Example(question="What is DSPy?", answer="A declarative LLM programming framework").with_inputs("question"),
    dspy.Example(question="What does LoRA do?", answer="Reduces LLM fine-tuning memory requirements").with_inputs("question"),
    dspy.Example(question="What does RAG stand for?", answer="Retrieval-Augmented Generation").with_inputs("question"),
]

optimizer_fewshot = BootstrapFewShot(
    metric=answer_exact_match,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    max_rounds=3,
)

optimized_module = optimizer_fewshot.compile(
    RAGModule(),
    trainset=trainset,
)

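optimizer_mipro = MIPROv2(
    metric=answer_f1_score,
    num_threads=4,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    num_candidates=10,
)

# In DSPy 2.6.0, num_trials is passed to compile() rather than to the
# constructor (verify against your installed version).
fully_optimized = optimizer_mipro.compile(
    RAGModule(),
    trainset=trainset,
    num_trials=20,
)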

Step 6: Evaluation and Deployment

from dspy.evaluate import Evaluate

evaluator = Evaluate(
    # Reusing trainset keeps the example short; score on a held-out devset in practice.
    devset=trainset,
    metric=answer_f1_score,
    num_threads=4,
    display_progress=True,
    display_table=5,
)

score = evaluator(fully_optimized)
print(f"Optimized F1 score: {score:.2f}")

result = fully_optimized(question="What is the core advantage of DSPy framework?")
print(f"Answer: {result.answer}")

Pitfall Guide

Pitfall 1: Missing Signature Field Descriptions

# ❌ Wrong: no description, LLM doesn't know output format
class BadSig(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# ✅ Correct: add detailed descriptions to guide LLM output
class GoodSig(dspy.Signature):
    """Answer the user's question concisely, within 50 words."""
    question: str = dspy.InputField(desc="Question asked by the user")
    answer: str = dspy.OutputField(desc="Concise and accurate answer, within 50 words")

Pitfall 2: Insufficient Optimizer Training Data

# ❌ Wrong: insufficient training data, optimizer can't learn effective patterns
trainset = [dspy.Example(question="1+1=?", answer="2").with_inputs("question")]

# ✅ Correct: at least 50-200 high-quality training examples
# load_training_data is a hypothetical helper; replace it with your own loader.
trainset = load_training_data(min_size=50)
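
Once you have enough examples, hold some of them out so that evaluation does not reuse the data the optimizer saw. A minimal sketch of a seeded shuffle-and-split follows. `split_examples` and the 80/20 ratio are illustrative assumptions, and the items can be any objects, including `dspy.Example` instances.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of examples deterministically and split into (train, dev)."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]
```

The first list can then be passed as `trainset` to the optimizer and the second as `devset` to the evaluator.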

Pitfall 3: Overly Lenient Metric Function

# ❌ Wrong: always returns 1.0, optimizer can't distinguish good from bad
def bad_metric(example, prediction, trace=None):
    return 1.0

# ✅ Correct: use a discriminative metric
def good_metric(example, prediction, trace=None):
    return answer_f1_score(example, prediction, trace)
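
A quick way to catch an overly lenient metric before a long optimization run is to score a few hand-labeled good and bad predictions and confirm that the metric separates them. This sketch uses `types.SimpleNamespace` as a stand-in for `dspy.Example` and `dspy.Prediction`. `metric_discriminates`, the 0.2 margin, and the sample data are hypothetical.

```python
from types import SimpleNamespace


def metric_discriminates(metric, good_pairs, bad_pairs, margin=0.2):
    """Return True if the mean score on good pairs beats bad pairs by at least margin."""
    good = [metric(example, prediction) for example, prediction in good_pairs]
    bad = [metric(example, prediction) for example, prediction in bad_pairs]
    return (sum(good) / len(good)) - (sum(bad) / len(bad)) >= margin


def always_one(example, prediction, trace=None):
    """The lenient metric from the 'wrong' example."""
    return 1.0


def exact_match(example, prediction, trace=None):
    """Case-insensitive exact match, as in Step 4."""
    return float(example.answer.strip().lower() == prediction.answer.strip().lower())


gold = SimpleNamespace(answer="Retrieval-Augmented Generation")
good_pairs = [(gold, SimpleNamespace(answer="retrieval-augmented generation"))]
bad_pairs = [(gold, SimpleNamespace(answer="Random Access Generator"))]

print(metric_discriminates(always_one, good_pairs, bad_pairs))   # False
print(metric_discriminates(exact_match, good_pairs, bad_pairs))  # True
```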

Pitfall 4: Type Mismatch Between Modules

# ❌ Wrong: sub-module returns list, downstream expects str
class StepA(dspy.Signature):
    items: list[str] = dspy.OutputField()

class StepB(dspy.Signature):
    text: str = dspy.InputField()

# ✅ Correct: do type conversion in forward
def forward(self, question):
    result_a = self.step_a(question=question)
    joined = "\n".join(result_a.items)
    result_b = self.step_b(text=joined)
    return result_b
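
When several modules feed each other, the conversion can live in one small helper rather than being repeated in every `forward`. A sketch, assuming the downstream field expects a single string; `to_text` is a hypothetical name.

```python
from collections.abc import Iterable


def to_text(value, sep: str = "\n") -> str:
    """Coerce a module output (None, str, or an iterable of items) to one string."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    if isinstance(value, Iterable):
        # Stringify each item so mixed lists (ints and strs, say) still join.
        return sep.join(str(item) for item in value)
    return str(value)
```

With this helper the conversion step becomes `self.step_b(text=to_text(result_a.items))`.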

Pitfall 5: Not Handling LLM Output Parse Failures

# ❌ Wrong: directly accessing output fields, may throw exceptions
prediction = self.module(question=q)
answer = prediction.answer

# ✅ Correct: add exception handling and default values
try:
    prediction = self.module(question=q)
    answer = prediction.answer if prediction.answer else "Cannot answer"
except Exception as e:
    answer = f"Processing failed: {str(e)}"
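
Storing the error text in `answer` makes failures hard to tell apart from real answers downstream. One alternative, sketched under the assumption that callers can handle a small result object, keeps the answer and the error separate. `SafeResult` and `safe_call` are hypothetical names, and `module` is any callable that returns an object with the requested field.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SafeResult:
    answer: str
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def safe_call(module: Callable[..., Any], field: str = "answer",
              default: str = "Cannot answer", **kwargs) -> SafeResult:
    """Call module(**kwargs) and read one output field without raising."""
    try:
        prediction = module(**kwargs)
    except Exception as exc:  # broad on purpose: parse errors vary by setup
        return SafeResult(default, error=f"{type(exc).__name__}: {exc}")
    value = getattr(prediction, field, None)
    if not value:
        return SafeResult(default, error=f"missing or empty field '{field}'")
    return SafeResult(str(value))
```

Callers can then branch on `result.ok` and log `result.error` instead of showing it to the user.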

Error Troubleshooting

#	Error Message	Cause	Solution
1	`AssertionError: Signature must have at least one output field`	Signature missing output field	Ensure Signature has at least one OutputField
2	`TypeError: Expected str, got list`	Type mismatch between modules	Do type conversion in forward
3	`dspy.primitives.assertions.AssertionError`	Assertion condition not met	Check dspy.Assert condition logic
4	`openai.RateLimitError`	API call rate exceeded	Reduce num_threads or add retry logic
5	`KeyError: 'answer'`	LLM output missing expected field	Check signature definition, add field descriptions
6	`ValueError: No demos were bootstrapped`	Insufficient training data quality	Increase training data, check metric function
7	`JSONDecodeError`	LLM output not valid JSON	Simplify the output fields, state the expected format in `desc`, and retry on parse failure
8	`AttributeError: module has no attribute 'retrieve'`	Module not initialized with retriever	Ensure all sub-modules are initialized in `__init__`
9	`TimeoutError: LLM call timed out`	LLM response timeout	Reduce max_tokens or raise the client timeout
10	`ImportError: cannot import name 'MIPROv2'`	DSPy version too low	Upgrade to dspy-ai>=2.5.0

Advanced Optimization

1. Custom Adapter for Local Models

class LocalModelAdapter(dspy.Adapter):
    def format(self, signature, demos, inputs):
        # signature.instructions holds the task description (the docstring).
        prompt = f"Task: {signature.instructions}\n\n"
        for demo in demos:
            for key, val in demo.items():
                prompt += f"{key}: {val}\n"
            prompt += "\n"
        for key, val in inputs.items():
            prompt += f"{key}: {val}\n"
        prompt += "\nPlease output:\n"
        # List each expected output field on its own line.
        for field_name in signature.output_fields:
            prompt += f"{field_name}:\n"
        return prompt

    def parse(self, signature, completion):
        outputs = {}
        for line in completion.strip().split("\n"):
            if ":" in line:
                key, val = line.split(":", 1)
                # Keep only fields the signature declares.
                if key.strip() in signature.output_fields:
                    outputs[key.strip()] = val.strip()
        return outputs
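
The line-by-line `parse` above keeps only the text on the same line as each field name, so a multi-line answer is truncated. A sketch of a parser that appends continuation lines to the most recent field follows. `parse_fields` is a hypothetical standalone function, not part of the DSPy API.

```python
def parse_fields(completion: str, field_names) -> dict:
    """Parse 'name: value' output, attaching continuation lines to the last field."""
    wanted = set(field_names)
    outputs = {}
    current = None
    for line in completion.strip().split("\n"):
        key, sep, val = line.partition(":")
        if sep and key.strip() in wanted:
            # A declared field name starts a new field.
            current = key.strip()
            outputs[current] = val.strip()
        elif current is not None:
            # Anything else is more text for the field currently being read.
            outputs[current] = (outputs[current] + "\n" + line.strip()).strip()
    return outputs
```

Lines that appear before the first declared field are ignored, which also drops any preamble a local model adds before the structured output.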

2. Assertion-Driven Output Constraints

class ConstrainedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(QuestionAnswer)

    def forward(self, question: str, context: str) -> dspy.Prediction:
        result = self.generate(question=question, context=context)
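        # Check that dspy.Assert exists in your installed version; recent DSPy
        # documentation describes dspy.Refine and dspy.BestOfN as its replacements.
        dspy.Assert(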
            len(result.answer) > 0,
            "Answer cannot be empty",
        )
        dspy.Assert(
            len(result.answer) <= 200,
            "Answer cannot exceed 200 characters",
        )
        return result
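
Conceptually, assertion-driven constraints amount to a validate-and-retry loop: run the module, check the output, and retry with the failure message as feedback. Below is a framework-independent sketch of that loop, with hypothetical names (`generate_with_constraints`, `first_failure`, `CHECKS`) and a plain callable in place of a DSPy module. It illustrates the idea rather than DSPy's internal implementation.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[Callable[[str], bool], str]  # (predicate, message if it fails)


def first_failure(answer: str, checks: List[Check]) -> Optional[str]:
    """Return the message of the first failed check, or None if all pass."""
    for predicate, message in checks:
        if not predicate(answer):
            return message
    return None


def generate_with_constraints(generate: Callable[[str, Optional[str]], str],
                              question: str, checks: List[Check],
                              max_retries: int = 2) -> str:
    """Call generate(question, feedback) until every check passes or retries run out."""
    feedback = None
    for _ in range(max_retries + 1):
        answer = generate(question, feedback)
        feedback = first_failure(answer, checks)
        if feedback is None:
            return answer
    raise ValueError(f"Constraints still failing after retries: {feedback}")


# The same two constraints as in ConstrainedQA above.
CHECKS: List[Check] = [
    (lambda a: len(a) > 0, "Answer cannot be empty"),
    (lambda a: len(a) <= 200, "Answer cannot exceed 200 characters"),
]
```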

3. Cache Optimization to Reduce API Calls

import hashlib
import json

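class CachedModule(dspy.Module):
    """In-memory memoization keyed on the call's keyword arguments."""

    def __init__(self, module: dspy.Module, cache_dir: str = ".dspy_cache"):
        super().__init__()
        self.module = module
        self.cache_dir = cache_dir  # stored but unused here; nothing is written to disk
        self.cache = {}

    def _cache_key(self, **kwargs):
        # default=str keeps non-JSON-serializable arguments from raising TypeError.
        content = json.dumps(kwargs, sort_keys=True, default=str)
        # MD5 serves only as a cache key here, not as a security measure.
        return hashlib.md5(content.encode()).hexdigest()

    def forward(self, **kwargs):
        key = self._cache_key(**kwargs)
        if key in self.cache:
            return self.cache[key]
        result = self.module(**kwargs)
        self.cache[key] = result
        return result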
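
The class above accepts `cache_dir` but keeps everything in memory, so the cache is lost when the process exits. Here is a minimal sketch of a disk-backed variant using only the standard library. `DiskCache` is a hypothetical helper and assumes results are JSON-serializable (for a DSPy prediction you would store its fields as a dict).

```python
import hashlib
import json
from pathlib import Path


class DiskCache:
    """Tiny JSON-file cache keyed on a hash of the call's keyword arguments."""

    def __init__(self, cache_dir: str = ".dspy_cache"):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, kwargs: dict) -> Path:
        # Sorting keys makes the hash independent of argument order.
        content = json.dumps(kwargs, sort_keys=True, default=str)
        return self.dir / (hashlib.sha256(content.encode()).hexdigest() + ".json")

    def get_or_compute(self, fn, **kwargs):
        """Return the cached result for kwargs, calling fn(**kwargs) on a miss."""
        path = self._path(kwargs)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = fn(**kwargs)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
```

`dspy.LM` also caches calls by default (`cache=True`), so a wrapper like this mainly helps when you want to reuse whole-module results across processes.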

Comparison Analysis

Dimension	DSPy	LangChain	LlamaIndex	Raw Prompt
Programming Paradigm	Declarative	Imperative chain	Imperative index	Hand-written Prompt
Auto Optimization	✅ Built-in optimizer	❌ Manual	❌ Manual	❌ Fully manual
Reproducibility	✅ Fixed signatures	⚠️ Template-dependent	⚠️ Template-dependent	❌ Hard to reproduce
Model Migration	✅ Swap the LM, re-optimize	⚠️ Change templates	⚠️ Change templates	❌ Rewrite all
Learning Curve	Medium	Low	Low	Low
Production Ready	✅ Type-safe	⚠️ Flexible but fragile	✅ Strong for RAG	❌ High maintenance
Community	Fast-growing	Mature	Mature	N/A

Summary: DSPy is less "yet another LLM framework" than a shift in how LLM programs are written, from hand-written prompts to declarative programming plus automatic optimization. Its core value is threefold: 1) signatures double as documentation, which reduces prompt maintenance; 2) optimizers search for better prompts and examples against a metric instead of relying on human intuition; 3) modular composition with typed fields makes multi-step reasoning chains easier to keep consistent, provided you convert types between modules. The 2026 DSPy practice path: use ChainOfThought + signatures for quick validation → BootstrapFewShot for example optimization → MIPROv2 for full optimization. The key is a high-quality metric function, because it determines whether optimization heads in the right direction.
