Python LLM Structured Output: 6 Production Patterns from JSON Schema to Function Calling
LLM Outputs a Blob of Free Text, and Your Downstream Systems Crash
You ask GPT to return JSON, and it gives you JSON with comments. You specify an integer field, and it returns the string "42". You request a list of 3 items, and it gives you 5. Reliable structured output is a foundational capability for AI engineering in 2026: without it, your RAG pipelines, agent tool calls, and data extraction workflows are all ticking time bombs.
This article walks through 6 production patterns, from concept to implementation: JSON Schema validation → OpenAI function calling → Instructor auto-retry → multi-model adaptation → streaming structured output → production reliability.
Key Takeaways
- Understand 3 core mechanisms for LLM structured output: Prompt constraints, JSON Schema, function calling protocol
- Master 6 Python structured output patterns from simple to complex
- Learn Instructor library's auto-retry and validation strategies
- Implement cross-model (OpenAI/Anthropic/Gemini) structured output adaptation
- Build a production-grade reliability assurance system
Table of Contents
- LLM Structured Output Core Concepts
- Pattern 1: JSON Schema Constrained Output
- Pattern 2: OpenAI Function Calling Protocol
- Pattern 3: Instructor Library Auto-Retry
- Pattern 4: Multi-Model Structured Output Adaptation
- Pattern 5: Streaming Structured Output
- Pattern 6: Production-Grade Reliability
- 5 Common Pitfalls and Solutions
- 10 Common Error Troubleshooting
- Advanced Optimization Tips
- Comparison: 3 Structured Output Approaches
- Recommended Online Tools
LLM Structured Output Core Concepts
| Concept | Description |
|---|---|
| Structured Output | LLM outputs structured data (JSON/XML) conforming to a predefined Schema |
| JSON Schema | Specification for describing JSON data structure, used to constrain and validate LLM output |
| Function Calling | Protocol proposed by OpenAI, making LLM output JSON conforming to function parameter Schema |
| Tool Use | Anthropic/Gemini's implementation of function calling, semantically equivalent |
| Constrained Decoding | Constraining token selection during inference so that a completed response conforms to the Schema (within the Schema subset the provider supports) |
| Instructor | Python library that auto-generates Schema + validation + retry based on Pydantic models |
Why LLM Structured Output Matters
Traditional LLM output flow:
User Prompt → LLM free generation → String → Regex/JSON parsing → May fail → Retry
Structured output flow:
User Prompt + Schema → LLM constrained generation → Valid JSON → Pydantic validation → Success
Key differences:
1. Traditional: unpredictable output, fragile parsing, high retry cost
2. Structured: predictable output, reliable validation, fewer and cheaper retries
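The parse-then-validate step in the structured flow can be sketched with the standard library alone. This is a minimal illustration, not the approach used later in the article: `EXPECTED_FIELDS` and `validate_payload` are hypothetical names, and in practice a Pydantic model would do this job.

```python
import json
from typing import Optional

# Hypothetical field spec for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"title": str, "rating": int, "recommended": bool}


def validate_payload(raw_text: str) -> tuple[Optional[dict], list[str]]:
    """Parse raw LLM text as JSON and check each field's type.

    Returns (data, errors); data is None whenever errors is non-empty.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]

    errors = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int, so True/False must not pass as an int.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return (None, errors) if errors else (data, [])


ok, ok_errors = validate_payload('{"title": "Dune", "rating": 9, "recommended": true}')
bad, bad_errors = validate_payload('{"title": "Dune", "rating": "9", "recommended": true}')
```

The second call shows the "integer returned as a string" failure from the introduction being caught at the boundary, before it reaches downstream code.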
3 Structured Output Mechanisms Compared
| Mechanism | Principle | Reliability | Latency | Compatibility |
|---|---|---|---|---|
| Prompt constraints | Describe output format in prompt | ⭐Low | No extra | All models |
| JSON Schema | Constrain output structure via Schema | ⭐⭐Medium | Slight | Some models |
| Function calling protocol | Dedicated API channel + Constrained Decoding | ⭐⭐⭐High | Slight | Specific models |
Pattern 1: JSON Schema Constrained Output
The most basic structured output approach: describe format requirements in the prompt, validate results with JSON Schema.
import json
import re
from typing import Optional
from pydantic import BaseModel, Field, ValidationError
class MovieReview(BaseModel):
title: str = Field(description="Movie title")
rating: int = Field(ge=1, le=10, description="Rating 1-10")
sentiment: str = Field(pattern="^(positive|negative|neutral)$")
summary: str = Field(max_length=200, description="Brief review")
recommended: bool = Field(description="Whether recommended")
MOVIE_REVIEW_SCHEMA = MovieReview.model_json_schema()
STRUCTURED_PROMPT = """You are a professional movie review analyzer.
Analyze the following review and return results strictly following the JSON Schema.
JSON Schema:
{schema}
Review content:
{review}
Important requirements:
1. Must return valid JSON
2. rating must be an integer from 1-10
3. sentiment can only be positive/negative/neutral
4. Do not add any content outside the JSON
5. Do not wrap with ```json```
"""
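def extract_json_from_response(text: str) -> Optional[dict]:
    """Pull the first JSON object out of an LLM reply, tolerating fences and prose."""
    candidates = [text.strip()]
    # Prefer the contents of a fenced block (with or without a json tag) if present.
    fenced = re.search(r'```(?:json)?\s*(.*?)\s*```', text, re.DOTALL)
    if fenced:
        candidates.insert(0, fenced.group(1))
    decoder = json.JSONDecoder()
    for candidate in candidates:
        # Decode from each '{' in turn, so leading prose and arbitrarily deep
        # nesting are both handled (a fixed regex only covers one nested level).
        start = candidate.find('{')
        while start != -1:
            try:
                parsed, _ = decoder.raw_decode(candidate, start)
                return parsed
            except json.JSONDecodeError:
                start = candidate.find('{', start + 1)
    return None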
def parse_structured_output(raw_text: str) -> Optional[MovieReview]:
parsed_json = extract_json_from_response(raw_text)
if parsed_json is None:
return None
try:
return MovieReview.model_validate(parsed_json)
except ValidationError as e:
print(f"Validation failed: {e}")
return None
async def call_llm_with_schema(prompt: str) -> Optional[MovieReview]:
from openai import AsyncOpenAI
client = AsyncOpenAI()
formatted_prompt = STRUCTURED_PROMPT.format(
schema=json.dumps(MOVIE_REVIEW_SCHEMA, ensure_ascii=False, indent=2),
review=prompt
)
response = await client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": formatted_prompt}],
temperature=0.1,
)
raw_text = response.choices[0].message.content or ""
return parse_structured_output(raw_text)
Limitations of JSON Schema Validation
Issue 1: LLM may return invalid JSON
→ Need extract_json_from_response for fault-tolerant extraction
Issue 2: LLM may ignore Schema constraints
→ rating returns "9 points" instead of 9
→ sentiment returns "very positive" instead of positive
Issue 3: Nested structures are error-prone
→ List length is uncontrollable
→ Optional fields may be missing
Issue 4: Must hand-write prompts every time
→ High maintenance cost, easy to miss details
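Issue 2 can be partly absorbed by a lenient coercion step before validation. The helpers below are a sketch under stated assumptions: `coerce_rating` and `coerce_sentiment` are hypothetical names, and the substring matching is deliberately naive (it would map "not positive" to "positive"), so treat it as a starting point only.

```python
import re
from typing import Optional

ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")


def coerce_rating(value: object) -> Optional[int]:
    """Return an int rating in 1-10, or None if no usable number is present."""
    if isinstance(value, bool):  # bool is an int subclass; treat it as invalid
        return None
    if isinstance(value, int):
        rating = value
    else:
        # Take the first integer found in text such as "9 points".
        match = re.search(r"-?\d+", str(value))
        if match is None:
            return None
        rating = int(match.group())
    return rating if 1 <= rating <= 10 else None


def coerce_sentiment(value: str) -> Optional[str]:
    """Map loose labels such as 'very positive' onto the allowed set."""
    lowered = value.strip().lower()
    for label in ALLOWED_SENTIMENTS:
        if label in lowered:
            return label
    return None
```

Returning `None` instead of guessing keeps the decision to retry or reject with the caller.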
Pattern 2: OpenAI Function Calling Protocol
OpenAI's function calling protocol is a standard approach to structured output: a dedicated API channel makes the LLM emit JSON arguments that conform to the function's parameter Schema.
import json
from typing import Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
class SentimentAnalysis(BaseModel):
text: str = Field(description="The analyzed text")
sentiment: str = Field(description="Sentiment: positive/negative/neutral")
confidence: float = Field(ge=0.0, le=1.0, description="Confidence 0-1")
keywords: list[str] = Field(description="Keyword list")
language: str = Field(description="Detected language")
class EntityExtraction(BaseModel):
entities: list[dict] = Field(description="Extracted entity list")
relationships: list[dict] = Field(default_factory=list, description="Entity relationships")
summary: str = Field(description="Text summary")
def pydantic_to_function_schema(model_class: type[BaseModel]) -> dict:
    """Wrap a Pydantic model's JSON Schema as an OpenAI tool definition."""
    # Pass the whole schema through: nested models are emitted as $ref entries
    # pointing into "$defs", so copying only "properties" and "required"
    # would leave those references dangling.
    schema = model_class.model_json_schema()
    return {
        "type": "function",
        "function": {
            "name": model_class.__name__,
            "description": model_class.__doc__ or f"Extract {model_class.__name__}",
            "parameters": schema,
        }
    }
async def function_calling_extract(
text: str,
model_class: type[BaseModel],
model: str = "gpt-4o"
) -> Optional[BaseModel]:
client = AsyncOpenAI()
function_schema = pydantic_to_function_schema(model_class)
response = await client.chat.completions.create(
model=model,
messages=[
{
"role": "system",
"content": "You are a precise data extraction assistant. Use the provided function for structured output."
},
{
"role": "user",
"content": text
}
],
tools=[function_schema],
tool_choice={"type": "function", "function": {"name": model_class.__name__}},
)
message = response.choices[0].message
if message.tool_calls and len(message.tool_calls) > 0:
tool_call = message.tool_calls[0]
try:
args = json.loads(tool_call.function.arguments)
return model_class.model_validate(args)
except (json.JSONDecodeError, Exception) as e:
print(f"Failed to parse function call result: {e}")
return None
return None
async def multi_function_calling(
text: str,
model_classes: list[type[BaseModel]],
model: str = "gpt-4o"
) -> dict[str, BaseModel]:
client = AsyncOpenAI()
tool_schemas = [pydantic_to_function_schema(cls) for cls in model_classes]
response = await client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": "You are a multi-task data extraction assistant."},
{"role": "user", "content": text}
],
tools=tool_schemas,
tool_choice="auto",
)
results = {}
message = response.choices[0].message
if message.tool_calls:
for tool_call in message.tool_calls:
for cls in model_classes:
if tool_call.function.name == cls.__name__:
try:
args = json.loads(tool_call.function.arguments)
results[cls.__name__] = cls.model_validate(args)
except Exception as e:
print(f"Failed to parse {cls.__name__}: {e}")
return results
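Strict Mode with response_format (Structured Outputs)
import json
from typing import Optional
from pydantic import BaseModel
from openai import AsyncOpenAI
async def strict_structured_output(
    text: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o-2024-08-06"
) -> Optional[BaseModel]:
    client = AsyncOpenAI()
    # Strict mode rejects a schema unless every object sets
    # "additionalProperties": false and lists all of its properties as required.
    # Pydantic does not emit that by default: declare
    # model_config = ConfigDict(extra="forbid") on the model and avoid default values.
    schema = model_class.model_json_schema()
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract structured data"},
            {"role": "user", "content": text}
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": model_class.__name__,
                "strict": True,
                "schema": schema,
            }
        }
    )
    raw = response.choices[0].message.content
    if raw:
        try:
            return model_class.model_validate_json(raw)
        except ValueError as e:  # Pydantic's ValidationError subclasses ValueError
            print(f"Strict mode parse failed: {e}")
    return None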
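A schema produced by `model_json_schema()` usually needs adjusting before a strict request will accept it. The helper below is a sketch, assuming the two strict-mode requirements OpenAI documents for Structured Outputs (every object forbids extra keys, and every property is listed as required). `to_strict_schema` is a hypothetical name, not part of any SDK.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy in which every object schema forbids extra keys
    and lists all of its properties as required."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node):
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            for child in node.values():
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(strict)
    return strict


example = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "required": ["name"],
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        }
    },
}
strict_example = to_strict_schema(example)
```

Marking a field as required changes its meaning, so a field that is genuinely optional is better modelled as nullable than silently forced.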
Pattern 3: Instructor Library Auto-Retry
Instructor is a widely used Python library for LLM structured output. Built on Pydantic models, it generates the Schema, validates the output, and retries automatically when validation fails.
import instructor
from pydantic import BaseModel, Field, field_validator
from openai import AsyncOpenAI
class ProductInfo(BaseModel):
name: str = Field(description="Product name")
price: float = Field(gt=0, description="Price, must be greater than 0")
category: str = Field(description="Product category")
features: list[str] = Field(description="Product features list", min_length=1, max_length=5)
in_stock: bool = Field(description="Whether in stock")
@field_validator("price")
@classmethod
def round_price(cls, v: float) -> float:
return round(v, 2)
@field_validator("category")
@classmethod
def normalize_category(cls, v: str) -> str:
return v.strip().lower()
class ArticleMetadata(BaseModel):
title: str = Field(description="Article title")
author: str = Field(description="Author")
publish_date: str = Field(description="Publish date, format YYYY-MM-DD")
tags: list[str] = Field(description="Tag list")
word_count: int = Field(ge=0, description="Word count")
reading_time_minutes: int = Field(ge=1, description="Estimated reading time (minutes)")
@field_validator("publish_date")
@classmethod
def validate_date_format(cls, v: str) -> str:
import re
if not re.match(r'^\d{4}-\d{2}-\d{2}$', v):
raise ValueError(f"Invalid date format: {v}, expected YYYY-MM-DD")
return v
async def instructor_extract_product(text: str) -> ProductInfo:
client = instructor.from_openai(AsyncOpenAI())
result = await client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": f"Extract product info from the following text:\n\n{text}"}
],
response_model=ProductInfo,
max_retries=3,
temperature=0.1,
)
return result
async def instructor_extract_with_mode(
text: str,
mode: instructor.Mode = instructor.Mode.TOOLS
) -> ArticleMetadata:
client = instructor.from_openai(AsyncOpenAI(), mode=mode)
result = await client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": f"Extract article metadata:\n\n{text}"}
],
response_model=ArticleMetadata,
max_retries=3,
)
return result
async def instructor_partial_streaming(text: str):
    client = instructor.from_openai(AsyncOpenAI())
    # On an async client, create_partial returns an async generator:
    # iterate it directly with "async for" rather than awaiting it.
    article = client.chat.completions.create_partial(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract article metadata:\n\n{text}"}
        ],
        response_model=ArticleMetadata,
        max_retries=3,
    )
    async for partial in article:
        print(f"Partial result: {partial.model_dump_json(exclude_none=True)}")
Instructor Mode Selection
Mode.JSON_SCHEMA     → OpenAI's response_format=json_schema (recommended, most reliable)
Mode.TOOLS           → OpenAI's function calling (good compatibility)
Mode.JSON            → Request JSON output in prompt (most universal, least reliable)
Mode.ANTHROPIC_TOOLS → Anthropic's tool_use
Mode.GEMINI_JSON     → Gemini's JSON mode
Mode names and the requests they produce have changed between Instructor releases, so confirm each one against the documentation for your installed version.
Instructor Retry Strategy Deep Dive
import instructor
from pydantic import BaseModel, Field, ValidationError
from openai import AsyncOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential
class StrictUser(BaseModel):
name: str = Field(min_length=2, max_length=50)
age: int = Field(ge=0, le=150)
email: str = Field(pattern=r'^[\w\.-]+@[\w\.-]+\.\w+$')
async def instructor_with_custom_retry(text: str) -> StrictUser:
client = instructor.from_openai(
AsyncOpenAI(),
mode=instructor.Mode.JSON_SCHEMA,
)
result, completion = await client.chat.completions.create_with_completion(
model="gpt-4o",
messages=[{"role": "user", "content": text}],
response_model=StrictUser,
max_retries=3,
validation_context={"strict": True},
)
print(f"Token usage: prompt={completion.usage.prompt_tokens}, "
f"completion={completion.usage.completion_tokens}")
return result
async def instructor_batch_extract(
texts: list[str],
) -> list[StrictUser]:
client = instructor.from_openai(AsyncOpenAI())
results = []
for text in texts:
try:
result = await client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": text}],
response_model=StrictUser,
max_retries=2,
)
results.append(result)
except instructor.exceptions.InstructorRetryException as e:
print(f"Batch extraction failed, skipping: {e}")
results.append(None)
return results
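Conceptually, a validation-driven retry feeds the validation errors back to the model and asks again. The loop below is a library-free sketch of that idea, not Instructor's actual implementation: `reask_until_valid`, `fake_llm`, and `check_age` are hypothetical names, and the stub stands in for a real API call.

```python
import json
from typing import Callable


def reask_until_valid(
    call_llm: Callable[[list], str],
    validate: Callable[[dict], list],
    messages: list,
    max_retries: int = 3,
) -> dict:
    """Call the model, validate, and feed errors back until valid or exhausted."""
    history = list(messages)  # copy so the caller's list is not mutated
    for _ in range(max_retries):
        raw = call_llm(history)
        try:
            data = json.loads(raw)
            errors = validate(data)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        if not errors:
            return data
        # Show the model its own output plus the concrete errors to correct.
        history.append({"role": "assistant", "content": raw})
        history.append({
            "role": "user",
            "content": "Fix these validation errors and reply with JSON only: "
                       + "; ".join(errors),
        })
    raise ValueError(f"no valid output after {max_retries} attempts")


def fake_llm(history: list) -> str:
    # Stub: returns a bad payload first, then a corrected one after feedback.
    return '{"age": "forty"}' if len(history) == 1 else '{"age": 40}'


def check_age(data: dict) -> list:
    return [] if isinstance(data.get("age"), int) else ["age must be an integer"]


result = reask_until_valid(fake_llm, check_age, [{"role": "user", "content": "Extract age"}])
```

Each retry resends the growing history, so token cost rises with every attempt; that is one reason to keep `max_retries` small.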
Pattern 4: Multi-Model Structured Output Adaptation
Different LLM vendors have different structured output APIs, requiring an adaptation layer for unified handling.
import json
from abc import ABC, abstractmethod
from typing import Optional, TypeVar
from pydantic import BaseModel
from openai import AsyncOpenAI
T = TypeVar("T", bound=BaseModel)
class StructuredOutputAdapter(ABC):
@abstractmethod
async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
pass
class OpenAIStructuredAdapter(StructuredOutputAdapter):
def __init__(self, model: str = "gpt-4o"):
self.client = AsyncOpenAI()
self.model = model
async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
import instructor
client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)
return await client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": text}],
response_model=model_class,
max_retries=3,
)
class AnthropicStructuredAdapter(StructuredOutputAdapter):
def __init__(self, model: str = "claude-sonnet-4-20250514"):
try:
import anthropic
self.client = anthropic.AsyncAnthropic()
except ImportError:
raise ImportError("Install anthropic: pip install anthropic")
self.model = model
async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
import anthropic
import instructor
client = instructor.from_anthropic(
self.client,
mode=instructor.Mode.ANTHROPIC_TOOLS,
)
return await client.chat.completions.create(
model=self.model,
max_tokens=4096,
messages=[{"role": "user", "content": text}],
response_model=model_class,
max_retries=3,
)
class GeminiStructuredAdapter(StructuredOutputAdapter):
def __init__(self, model: str = "gemini-2.0-flash"):
self.model_name = model
async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
import google.generativeai as genai
import os
genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))
model = genai.GenerativeModel(self.model_name)
schema = model_class.model_json_schema()
prompt = f"""Extract structured data from the following text.
Return results strictly following the JSON Schema. Do not add any extra content.
JSON Schema:
{json.dumps(schema, ensure_ascii=False, indent=2)}
Text:
{text}"""
response = await model.generate_content_async(prompt)
raw = response.text.strip()
if raw.startswith("```json"):
raw = raw[7:]
if raw.endswith("```"):
raw = raw[:-3]
raw = raw.strip()
try:
return model_class.model_validate(json.loads(raw))
except Exception as e:
print(f"Gemini parse failed: {e}")
return None
class MultiModelStructuredExtractor:
def __init__(self):
self.adapters: dict[str, StructuredOutputAdapter] = {}
def register(self, name: str, adapter: StructuredOutputAdapter):
self.adapters[name] = adapter
async def extract(
self,
text: str,
model_class: type[T],
preferred: str = "openai",
fallback: bool = True,
) -> Optional[T]:
order = [preferred]
if fallback:
order.extend([k for k in self.adapters if k != preferred])
for model_name in order:
adapter = self.adapters.get(model_name)
if adapter is None:
continue
try:
result = await adapter.extract(text, model_class)
if result is not None:
return result
except Exception as e:
print(f"[{model_name}] Extraction failed: {e}")
continue
return None
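    async def extract_consensus(
        self,
        text: str,
        model_class: type[T],
        min_agreement: int = 2,
    ) -> Optional[T]:
        import asyncio
        from collections import Counter
        names = list(self.adapters)
        results = await asyncio.gather(
            *(self.adapters[name].extract(text, model_class) for name in names),
            return_exceptions=True,
        )
        valid_results = []
        for name, result in zip(names, results):
            if isinstance(result, Exception):
                print(f"[{name}] Exception: {result}")
                continue
            if result is not None:
                valid_results.append(result)
        if not valid_results:
            return None
        # Count identical outputs (compared as serialized JSON) as votes;
        # merely having several valid results is not agreement.
        votes = Counter(r.model_dump_json() for r in valid_results)
        winner, count = votes.most_common(1)[0]
        if count < min_agreement:
            print(f"No output reached {min_agreement} matching votes; returning first valid result")
            return valid_results[0]
        return next(r for r in valid_results if r.model_dump_json() == winner)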
Multi-Model Adaptation Architecture
┌─────────────────────┐
│ MultiModelExtractor │
│ (Unified Interface) │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ OpenAI Adapter │ │Anthropic Adp │ │ Gemini Adptr │
│ JSON_SCHEMA │ │ANTHROPIC_TOOLS│ │ Prompt+Parse │
│ Instructor │ │ Instructor │ │ Manual parse │
└────────────────┘ └──────────────┘ └──────────────┘
Pattern 5: Streaming Structured Output
This pattern combines LLM structured output with streaming to enable real-time parsing and progressive display.
import json
import asyncio
from typing import AsyncIterator, Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
class StreamingJsonParser:
def __init__(self):
self.buffer = ""
self.depth = 0
self.in_string = False
self.escape_next = False
self.started = False
    def feed(self, chunk: str) -> list[dict]:
        """Consume a chunk and return every top-level JSON object it completes."""
        results = []
        for char in chunk:
            if not self.started:
                # Skip any preamble until the first top-level '{'.
                if char == '{':
                    self.started = True
                    self.depth = 1
                    self.buffer = '{'
                continue
            # Buffer only the characters that belong to the current object.
            self.buffer += char
            if self.escape_next:
                self.escape_next = False
                continue
            if self.in_string:
                if char == '\\':
                    self.escape_next = True
                elif char == '"':
                    self.in_string = False
                continue
            if char == '"':
                self.in_string = True
            elif char == '{':
                self.depth += 1
            elif char == '}':
                self.depth -= 1
                if self.depth == 0:
                    try:
                        results.append(json.loads(self.buffer))
                    except json.JSONDecodeError:
                        pass
                    self.buffer = ""
                    self.started = False
        return results
class PartialModelBuilder:
def __init__(self, model_class: type[BaseModel]):
self.model_class = model_class
self.current_json = {}
self.last_valid = None
def update(self, json_data: dict) -> Optional[BaseModel]:
self.current_json.update(json_data)
try:
self.last_valid = self.model_class.model_validate(self.current_json)
return self.last_valid
except Exception:
return self.last_valid
class StreamingEvent(BaseModel):
event_type: str = Field(description="Event type")
data: dict = Field(description="Event data")
confidence: float = Field(ge=0.0, le=1.0, description="Confidence")
async def stream_structured_output(
    prompt: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o",
) -> AsyncIterator[BaseModel]:
    import instructor
    client = instructor.from_openai(AsyncOpenAI())
    # create_partial on an async client returns an async generator,
    # so it is iterated directly rather than awaited.
    stream = client.chat.completions.create_partial(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_model=model_class,
        max_retries=2,
    )
    async for partial in stream:
        yield partial
async def stream_with_raw_parser(
    prompt: str,
    model: str = "gpt-4o",
) -> AsyncIterator[dict]:
    client = AsyncOpenAI()
    schema_prompt = f"""Return results in JSON format. Only return JSON, nothing else.
{prompt}"""
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": schema_prompt}],
        stream=True,
        temperature=0.1,
    )
    parser = StreamingJsonParser()
    full_content = ""
    yielded_any = False
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_content += content
            for result in parser.feed(content):
                yielded_any = True
                yield result
    # Fall back to parsing the whole reply only if nothing was emitted while
    # streaming; checking just the last chunk's results could emit a duplicate.
    if not yielded_any and full_content:
        try:
            yield json.loads(full_content)
        except json.JSONDecodeError:
            pass
async def stream_sse_structured(
    prompt: str,
    model_class: type[BaseModel],
):
    # Call this from a FastAPI route handler and return its result;
    # the app itself is created once at module level, not per request.
    from fastapi.responses import StreamingResponse
    async def generate():
        async for partial in stream_structured_output(prompt, model_class):
            data = partial.model_dump_json(exclude_none=True)
            yield f"data: {data}\n\n"
    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},
    )
Streaming Structured Output Architecture
Client Request
│
▼
FastAPI SSE Endpoint
│
▼
Instructor create_partial()
│
├──→ chunk1: {"name": "Product A"...}
├──→ chunk2: {"name": "Product A", "price": 99...}
├──→ chunk3: {"name": "Product A", "price": 99.0, "category": "electronics"...}
└──→ Final: Complete Pydantic model instance
Each chunk pushed to client via SSE
Client renders UI progressively
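One way to turn an incomplete chunk into something renderable is to close whatever is still open and attempt a parse. The function below is a best-effort sketch of that idea, not how Instructor's `create_partial` works internally; `parse_partial_json` is a hypothetical name.

```python
import json
from typing import Optional


def parse_partial_json(fragment: str) -> Optional[dict]:
    """Best-effort parse of a JSON object prefix; None if it cannot be parsed yet."""
    closers = []  # closing characters still owed, innermost last
    in_string = False
    escape_next = False
    for char in fragment:
        if escape_next:
            escape_next = False
        elif in_string:
            if char == "\\":
                escape_next = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "{":
            closers.append("}")
        elif char == "[":
            closers.append("]")
        elif char in "}]" and closers:
            closers.pop()
    # Terminate an open string, then close containers from the inside out.
    candidate = fragment + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None  # e.g. cut off in the middle of a key or after a comma
    return parsed if isinstance(parsed, dict) else None


snapshot = parse_partial_json('{"name": "Product A", "price": 99')
```

Values in a snapshot are provisional: a number or string cut off mid-token parses as a shorter value and may change in the next chunk.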
Pattern 6: Production-Grade Reliability
This pattern integrates the previous ones into a production-ready structured output service.
import json
import time
import logging
from typing import Optional
from dataclasses import dataclass, field
from enum import Enum
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
logger = logging.getLogger(__name__)
class OutputStatus(str, Enum):
SUCCESS = "success"
VALIDATION_FAILED = "validation_failed"
PARSE_FAILED = "parse_failed"
LLM_ERROR = "llm_error"
TIMEOUT = "timeout"
RETRY_EXHAUSTED = "retry_exhausted"
@dataclass
class ExtractionResult:
data: Optional[BaseModel] = None
status: OutputStatus = OutputStatus.SUCCESS
attempts: int = 0
latency_ms: float = 0.0
error_message: str = ""
model_used: str = ""
tokens_used: dict = field(default_factory=dict)
class CircuitBreaker:
def __init__(
self,
failure_threshold: int = 5,
recovery_timeout: float = 60.0,
):
self.failure_threshold = failure_threshold
self.recovery_timeout = recovery_timeout
self.failure_count = 0
self.last_failure_time: Optional[float] = None
self.is_open = False
def record_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.failure_count >= self.failure_threshold:
self.is_open = True
def record_success(self):
self.failure_count = 0
self.is_open = False
def can_execute(self) -> bool:
if not self.is_open:
return True
if self.last_failure_time and \
time.time() - self.last_failure_time > self.recovery_timeout:
self.is_open = False
self.failure_count = 0
return True
return False
class StructuredOutputService:
def __init__(
self,
max_retries: int = 3,
timeout: float = 30.0,
fallback_models: Optional[list[str]] = None,
):
self.client = AsyncOpenAI()
self.max_retries = max_retries
self.timeout = timeout
self.fallback_models = fallback_models or ["gpt-4o", "gpt-4o-mini"]
self.circuit_breakers: dict[str, CircuitBreaker] = {}
def _get_breaker(self, model: str) -> CircuitBreaker:
if model not in self.circuit_breakers:
self.circuit_breakers[model] = CircuitBreaker()
return self.circuit_breakers[model]
async def extract(
self,
text: str,
model_class: type[BaseModel],
preferred_model: Optional[str] = None,
) -> ExtractionResult:
import instructor
models = [preferred_model] if preferred_model else self.fallback_models
models = [m for m in models if self._get_breaker(m).can_execute()]
if not models:
return ExtractionResult(
status=OutputStatus.RETRY_EXHAUSTED,
error_message="All model circuit breakers are open",
)
for model in models:
result = await self._try_extract(text, model_class, model)
if result.status == OutputStatus.SUCCESS:
self._get_breaker(model).record_success()
return result
else:
self._get_breaker(model).record_failure()
logger.warning(f"Model {model} extraction failed: {result.error_message}")
return result
async def _try_extract(
self,
text: str,
model_class: type[BaseModel],
model: str,
) -> ExtractionResult:
import instructor
start_time = time.time()
client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)
for attempt in range(1, self.max_retries + 1):
try:
result = await client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": text}],
response_model=model_class,
max_retries=0,
timeout=self.timeout,
)
latency = (time.time() - start_time) * 1000
return ExtractionResult(
data=result,
status=OutputStatus.SUCCESS,
attempts=attempt,
latency_ms=latency,
model_used=model,
)
except instructor.exceptions.InstructorRetryException as e:
logger.warning(f"Attempt {attempt} validation failed: {e}")
continue
except Exception as e:
error_msg = str(e)
if "timeout" in error_msg.lower():
return ExtractionResult(
status=OutputStatus.TIMEOUT,
attempts=attempt,
latency_ms=(time.time() - start_time) * 1000,
error_message=error_msg,
model_used=model,
)
logger.error(f"Attempt {attempt} LLM error: {e}")
continue
return ExtractionResult(
status=OutputStatus.RETRY_EXHAUSTED,
attempts=self.max_retries,
latency_ms=(time.time() - start_time) * 1000,
error_message=f"Failed after {self.max_retries} retries",
model_used=model,
)
class StructuredOutputCache:
def __init__(self, ttl: float = 3600.0, max_size: int = 1000):
self.ttl = ttl
self.max_size = max_size
self._cache: dict[str, tuple[float, BaseModel]] = {}
def _make_key(self, text: str, model_class: type[BaseModel]) -> str:
import hashlib
content_hash = hashlib.sha256(text.encode()).hexdigest()[:16]
return f"{model_class.__name__}:{content_hash}"
def get(self, text: str, model_class: type[BaseModel]) -> Optional[BaseModel]:
key = self._make_key(text, model_class)
if key in self._cache:
timestamp, data = self._cache[key]
if time.time() - timestamp < self.ttl:
return data
del self._cache[key]
return None
def set(self, text: str, model_class: type[BaseModel], data: BaseModel):
if len(self._cache) >= self.max_size:
oldest_key = min(self._cache, key=lambda k: self._cache[k][0])
del self._cache[oldest_key]
key = self._make_key(text, model_class)
self._cache[key] = (time.time(), data)
Production Architecture Overview
┌────────────────────────────┐
│ StructuredOutputService │
│ (Unified Entry Point) │
└──────────┬─────────────────┘
│
┌──────────▼─────────────────┐
│ CircuitBreaker │
│ (Per-model circuit breaker) │
└──────────┬─────────────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ gpt-4o │ │ gpt-4o-mini │ │ fallback │
│ JSON_SCHEMA │ │ JSON_SCHEMA │ │ Prompt+Parse│
│ +Instructor │ │ +Instructor │ │ +retry │
└────────────────┘ └──────────────┘ └──────────────┘
│ │ │
└────────────────┼────────────────┘
│
┌──────────▼─────────────────┐
│ StructuredOutputCache │
│ (Result cache) │
└────────────────────────────┘
5 Common Pitfalls and Solutions
Pitfall 1: LLM Returns JSON with Comments
import json
import re
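def strip_json_comments(text: str) -> str:
    # Match string literals first and keep them, so "//" or "/*" inside a
    # value (for example a URL) is not mistaken for a comment.
    pattern = r'("(?:\\.|[^"\\])*")|//[^\n]*|/\*.*?\*/'
    return re.sub(pattern, lambda m: m.group(1) or '', text, flags=re.DOTALL)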
raw = '''{
"name": "Product A", // this is a comment
"price": 99.0
/* multi-line
comment */
}'''
clean = strip_json_comments(raw)
data = json.loads(clean)
Pitfall 2: Nested Schema Causes Output Truncation
from pydantic import BaseModel, Field
class Address(BaseModel):
street: str
city: str
zip_code: str
class PersonFlat(BaseModel):
name: str
street: str = Field(description="Street address")
city: str = Field(description="City")
zip_code: str = Field(description="ZIP code")
class PersonNested(BaseModel):
name: str
address: Address
# Recommendation: Keep nesting depth under 2 levels; flatten if deeper
# Not recommended: Person → Address → GeoLocation → Coordinates
# Recommended: PersonFlat (all fields at the same level)
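The depth recommendation can be checked mechanically against a model's JSON Schema. The function below is a sketch under stated assumptions: `schema_nesting_depth` is a hypothetical helper, it only resolves local `#/$defs/...` references, it ignores `anyOf` branches, and it assumes the schema is not self-referential.

```python
from typing import Optional


def schema_nesting_depth(schema: dict, defs: Optional[dict] = None) -> int:
    """Count how many object levels a JSON Schema nests (a flat object is 1)."""
    if defs is None:
        defs = schema.get("$defs", {})
    if "$ref" in schema:
        # Resolve local references of the form "#/$defs/Name".
        target = defs.get(schema["$ref"].rsplit("/", 1)[-1], {})
        return schema_nesting_depth(target, defs)
    if schema.get("type") == "array":
        return schema_nesting_depth(schema.get("items", {}), defs)
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((schema_nesting_depth(child, defs) for child in children), default=0)


flat_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "city": {"type": "string"}},
}
nested_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}}
    },
}
flat_depth = schema_nesting_depth(flat_schema)
nested_depth = schema_nesting_depth(nested_schema)
```

A check like this could run in a unit test over `Model.model_json_schema()` so that a deeper structure is noticed at review time and not in production.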
Pitfall 3: LLM Doesn't Respect Enum Values
from enum import Enum
from pydantic import BaseModel, Field, field_validator
class Sentiment(str, Enum):
POSITIVE = "positive"
NEGATIVE = "negative"
NEUTRAL = "neutral"
class ReviewWithEnum(BaseModel):
text: str
sentiment: Sentiment
@field_validator("sentiment", mode="before")
@classmethod
def normalize_sentiment(cls, v):
if isinstance(v, str):
v = v.strip().lower()
mapping = {
"great": "positive", "good": "positive", "excellent": "positive",
"bad": "negative", "poor": "negative", "terrible": "negative",
"okay": "neutral", "average": "neutral", "so-so": "neutral",
}
return mapping.get(v, v)
return v
Pitfall 4: Uncontrollable List Length
from pydantic import BaseModel, Field, field_validator
class TaggedContent(BaseModel):
content: str
tags: list[str] = Field(min_length=1, max_length=5)
@field_validator("tags")
@classmethod
def deduplicate_tags(cls, v: list[str]) -> list[str]:
seen = set()
result = []
for tag in v:
normalized = tag.strip().lower()
if normalized not in seen:
seen.add(normalized)
result.append(tag.strip())
return result[:5]
Pitfall 5: Strict Mode Doesn't Support All Schema Features
from pydantic import BaseModel, Field
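# Strict Mode constraints to plan for (details vary by model and change over time):
# 1. additionalProperties: false must be explicitly set on every object
# 2. Every field must be listed as required; express an optional field
#    as a nullable type instead of leaving it out of "required"
# 3. Union types may be restricted or unsupported (some models)
# 4. Complex regex patterns may not be supported
# Solution: Simplify Schema + compensate with field_validator
class SimpleProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    # Defaults are applied by Pydantic after parsing; a strict schema sent
    # directly to the API still needs these fields marked as required.
    category: str = Field(default="other")
    tags: list[str] = Field(default_factory=list)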
10 Common Error Troubleshooting
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | json.decoder.JSONDecodeError | LLM returned invalid JSON | Use extract_json_from_response for fault-tolerant extraction |
| 2 | ValidationError: field required | LLM omitted required field | Add default values or use Instructor auto-retry |
| 3 | InstructorRetryException: max retries | Still failing validation after 3 retries | Check if Schema is too complex, simplify nesting |
| 4 | TypeError: 'NoneType' object | tool_calls is empty, LLM didn't call function | Check tool_choice setting, confirm model supports function calling |
| 5 | RateLimitError: 429 | API call rate exceeded | Add exponential backoff retry, reduce concurrency |
| 6 | Timeout: request timed out | LLM inference timeout | Reduce Schema complexity, increase timeout parameter |
| 7 | BadRequestError: Invalid schema | Schema doesn't meet model requirements | Check strict mode limitations, simplify Schema |
| 8 | ValidationError: string too long | LLM returned overly long string | Add max_length constraint |
| 9 | KeyError: 'tool_calls' | Model doesn't support function calling | Switch to JSON Schema mode or Prompt mode |
| 10 | RecursionError: maximum depth | Schema nesting too deep | Flatten nested structure, max 2 levels |
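The exponential backoff suggested for row 5 can be sketched with `asyncio` alone. This is an illustrative helper, not part of any SDK: `retry_with_backoff` and `flaky` are hypothetical names, and in real code `retry_on` would be narrowed to the client's rate-limit exception.

```python
import asyncio
import random


async def retry_with_backoff(
    operation,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple = (Exception,),
):
    """Await operation(), retrying with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Double the ceiling each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount up to the ceiling so that
            # concurrent callers do not retry in lockstep.
            await asyncio.sleep(random.uniform(0, ceiling))


calls = {"count": 0}


async def flaky():
    # Stub that fails twice before succeeding, simulating a rate limit.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated 429")
    return "ok"


outcome = asyncio.run(retry_with_backoff(flaky, base_delay=0.01))
```

Backoff addresses bursts; if 429s persist, lowering concurrency (for example with an `asyncio.Semaphore`) is the complementary fix the table mentions.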
Advanced Optimization Tips
Tip 1: Few-shot Examples Improve Accuracy
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
import instructor


class Classification(BaseModel):
    category: str = Field(description="Category")
    confidence: float = Field(ge=0.0, le=1.0)


async def few_shot_extract(text: str) -> Classification:
    client = instructor.from_openai(AsyncOpenAI())
    # Each user/assistant pair shows the model one input and the exact JSON expected for it
    examples = [
        {"role": "user", "content": "This product is amazing, highly recommended!"},
        {"role": "assistant", "content": '{"category": "positive", "confidence": 0.95}'},
        {"role": "user", "content": "Quality is average, price is too high"},
        {"role": "assistant", "content": '{"category": "neutral", "confidence": 0.7}'},
    ]
    return await client.chat.completions.create(
        model="gpt-4o",
        # Few-shot pairs come first; the text to classify is the final user message
        messages=examples + [{"role": "user", "content": text}],
        response_model=Classification,
        max_retries=2,
    )
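Hand-written few-shot messages can drift from the schema as fields are renamed. A minimal sketch of building the message list from (input, expected output) pairs follows, with a check that every example has the same keys as the target model. The function name `build_few_shot_messages` and the `required_keys` parameter are hypothetical helpers for illustration.

```python
import json


def build_few_shot_messages(
    examples: list[tuple[str, dict]],
    text: str,
    required_keys: frozenset[str] = frozenset({"category", "confidence"}),
) -> list[dict[str, str]]:
    """Turn (input, expected_output) pairs into alternating user/assistant messages."""
    messages: list[dict[str, str]] = []
    for user_text, expected in examples:
        # Fail fast if an example would teach the model a different shape than the schema.
        if set(expected) != required_keys:
            raise ValueError(
                f"Example keys {sorted(expected)} do not match {sorted(required_keys)}"
            )
        messages.append({"role": "user", "content": user_text})
        # json.dumps guarantees the assistant turn is valid JSON.
        messages.append({"role": "assistant", "content": json.dumps(expected)})
    # The real input always goes last.
    messages.append({"role": "user", "content": text})
    return messages
```

The returned list can be passed as `messages=` in place of `examples + [...]`.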
Tip 2: Schema Description Optimization
from pydantic import BaseModel, Field


# Bare field names give the model no hint about allowed values or formats
class BadSchema(BaseModel):
    type: str
    value: str


# Descriptions are included in the generated JSON Schema, so the model sees them
class GoodSchema(BaseModel):
    type: str = Field(
        description="Entity type, must be one of: person, organization, location, date"
    )
    value: str = Field(
        description="Standardized entity value. Use full name for person, "
        "official name for organization, YYYY-MM-DD for date, "
        "city+country for location"
    )
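A description such as "must be one of" is advisory: the model reads it, but nothing enforces it. The sketch below uses a hand-written JSON Schema dict (not output generated by Pydantic) to show the allowed values stated once and reused both in the description and in an enforceable `enum`. The names `ENTITY_TYPES`, `ENTITY_SCHEMA`, and `check_entity` are illustrative.

```python
ENTITY_TYPES = ("person", "organization", "location", "date")

# Hand-written JSON Schema for illustration; the same list feeds description and enum.
ENTITY_SCHEMA = {
    "type": "object",
    "properties": {
        "type": {
            "type": "string",
            "enum": list(ENTITY_TYPES),
            "description": "Entity type, must be one of: " + ", ".join(ENTITY_TYPES),
        },
        "value": {"type": "string", "description": "Standardized entity value"},
    },
    "required": ["type", "value"],
    "additionalProperties": False,
}


def check_entity(data: dict) -> list[str]:
    """Return a list of problems (empty if none) for the required and enum rules only."""
    problems = [
        f"missing field: {key}" for key in ENTITY_SCHEMA["required"] if key not in data
    ]
    allowed = ENTITY_SCHEMA["properties"]["type"]["enum"]
    if "type" in data and data["type"] not in allowed:
        problems.append(f"type must be one of {allowed}, got {data['type']!r}")
    return problems
```

In Pydantic, annotating the field as `Literal["person", "organization", "location", "date"]` produces an equivalent `enum` in the generated schema and rejects other values at validation time.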
Tip 3: Progressive Extraction for Complex Structures
from pydantic import BaseModel
from openai import AsyncOpenAI
import instructor


class BasicInfo(BaseModel):
    title: str
    summary: str


class DetailedInfo(BasicInfo):
    key_points: list[str]
    entities: list[str]
    sentiment: str


async def progressive_extract(text: str) -> DetailedInfo:
    client = instructor.from_openai(AsyncOpenAI())
    # Stage 1: extract only the small, easy schema
    basic = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract basic info:\n{text}"}],
        response_model=BasicInfo,
        max_retries=2,
    )
    # Stage 2: feed the stage 1 result back in as context for the larger schema
    detailed = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": (
                    "Based on the basic info below, extract detailed analysis:\n"
                    f"Title: {basic.title}\nSummary: {basic.summary}\n\nOriginal text:\n{text}"
                ),
            }
        ],
        response_model=DetailedInfo,
        max_retries=2,
    )
    return detailed
Tip 4: Output Quality Self-Check
from pydantic import BaseModel, Field, model_validator


class SelfValidatingOutput(BaseModel):
    question: str
    answer: str
    sources: list[str] = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0)

    # Runs after field-level validation, so every field is already populated and typed
    @model_validator(mode="after")
    def check_answer_quality(self):
        if len(self.answer) < 10:
            raise ValueError("Answer too short, may be incomplete")
        if self.confidence > 0.9 and len(self.sources) < 2:
            raise ValueError("High confidence but insufficient sources, please re-verify")
        return self
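The error messages in the validator are phrased as instructions because, assuming the retry mechanism sends the validation error back to the model (as Instructor's max_retries is understood to do), the model reads them on the next attempt. The sketch below imitates that loop in plain Python. `validate_answer`, `ask_with_feedback`, and the `ask_model` callable are hypothetical stand-ins, not Instructor internals.

```python
import json
from typing import Callable


def validate_answer(data: object) -> None:
    """Plain-Python stand-in for the rules in check_answer_quality."""
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    if len(data.get("answer", "")) < 10:
        raise ValueError("Answer too short, may be incomplete")
    if data.get("confidence", 0.0) > 0.9 and len(data.get("sources", [])) < 2:
        raise ValueError("High confidence but insufficient sources, please re-verify")


def ask_with_feedback(
    ask_model: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 2,
) -> dict:
    """Call the model; on a failed check, re-ask with the error text appended."""
    history = list(messages)  # copy so the caller's list is not mutated
    for attempt in range(max_retries + 1):
        raw = ask_model(history)
        try:
            data = json.loads(raw)
            validate_answer(data)
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            if attempt == max_retries:
                raise
            # Show the model its rejected output and the reason it was rejected.
            history.append({"role": "assistant", "content": raw})
            history.append(
                {"role": "user", "content": f"Your previous answer was rejected: {exc}. Please correct it."}
            )
    raise RuntimeError("max_retries must be >= 0")
```

A vague message such as "invalid" gives the model nothing to act on, while "insufficient sources, please re-verify" tells it what to change.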
Comparison: 3 Structured Output Approaches
| Dimension | Prompt+JSON Parse | Function Calling Protocol | Instructor Library |
|---|---|---|---|
| Reliability | ⭐⭐ 60-80% | ⭐⭐⭐⭐ 90-95% | ⭐⭐⭐⭐⭐ 95-99% |
| Implementation complexity | Low | Medium | Low (after wrapping) |
| Model compatibility | All models | OpenAI/some models | OpenAI/Anthropic/Gemini |
| Auto-retry | ❌ Manual | ❌ Manual | ✅ Built-in |
| Streaming support | ❌ Difficult | ⚠️ Limited | ✅ create_partial |
| Schema validation | ❌ Manual | ⚠️ Partial | ✅ Pydantic auto |
| Debugging difficulty | High | Medium | Low |
| Production recommendation | ⭐ Not recommended | ⭐⭐⭐ Recommended | ⭐⭐⭐⭐⭐ Strongly recommended |
| Token overhead | Low | Medium (+tool definition) | Medium (+tool definition) |
| Nesting depth | Unlimited | Limited | Limited |
Decision Tree
Need structured output?
├── No → Use Chat Completion directly
└── Yes → What models are you using?
├── OpenAI only → Instructor + Mode.JSON_SCHEMA
├── OpenAI + Anthropic → Instructor + Adapter pattern
├── Any model → Prompt+JSON parse + strict validation
└── Need streaming → Instructor + create_partial
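The tree can also be written as a small function, which is handy when the choice is made from configuration. This is a sketch under one assumption not stated in the tree: the streaming branch is checked before the provider branches. The function name `choose_strategy` and the lowercase provider strings are illustrative.

```python
# Providers the tree routes through Instructor (taken from the tree above).
INSTRUCTOR_PROVIDERS = {"openai", "anthropic"}


def choose_strategy(
    needs_structure: bool,
    providers: set[str],
    needs_streaming: bool = False,
) -> str:
    """Return the recommendation from the decision tree as a string."""
    if not needs_structure:
        return "Use Chat Completion directly"
    # Assumption: streaming takes priority over the provider branches.
    if needs_streaming:
        return "Instructor + create_partial"
    if providers == {"openai"}:
        return "Instructor + Mode.JSON_SCHEMA"
    # Any non-empty mix limited to the Instructor-routed providers.
    if providers and providers <= INSTRUCTOR_PROVIDERS:
        return "Instructor + Adapter pattern"
    # Everything else falls back to prompt-based JSON with strict validation.
    return "Prompt+JSON parse + strict validation"
```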
Recommended Online Tools
- JSON Formatter & Validator: /en/json/format
- JSONPath Query: /en/json/jsonpath
- cURL to Code: /en/dev/curl-to-code
Summary: Structured output is core infrastructure for LLM applications in Python. The 6 patterns build on one another, from simple to complex: JSON Schema constraints → function calling protocol → Instructor auto-retry → multi-model adaptation → streaming structured output → production reliability. For production, prefer the Instructor library with Pydantic validation and auto-retry, which the comparison above rates at 95-99% reliability. Key points:
- Keep Schema nesting to at most 2 levels
- Use field_validator to normalize enum values
- Use circuit breakers to protect downstream models
- Use caching to reduce duplicate calls
For multi-model scenarios, use the adapter pattern for a unified interface; for streaming scenarios, use create_partial for progressive output.