Python LLM Structured Output: 6 Production Patterns from JSON Schema to Function Calling

LLM Outputs a Blob of Free Text, and Your Downstream Systems Crash

You ask GPT to return JSON, and it gives you JSON with comments. You specify an integer field, and it returns the string "42". You request a list of 3 items, and it gives you 5. Reliable structured output is a foundational capability for AI engineering: without it, your RAG pipelines, agent tool calls, and data extraction workflows are all ticking time bombs.

This article walks through 6 production patterns, from concept to implementation: JSON Schema validation → OpenAI function calling → Instructor auto-retry → multi-model adaptation → streaming structured output → production reliability.

Key Takeaways

Understand 3 core mechanisms for LLM structured output: Prompt constraints, JSON Schema, function calling protocol
Master 6 Python structured output patterns from simple to complex
Learn the Instructor library's auto-retry and validation strategies
Implement cross-model (OpenAI/Anthropic/Gemini) structured output adaptation
Build a production-grade reliability assurance system

LLM Structured Output Core Concepts
Pattern 1: JSON Schema Constrained Output
Pattern 2: OpenAI Function Calling Protocol
Pattern 3: Instructor Library Auto-Retry
Pattern 4: Multi-Model Structured Output Adaptation
Pattern 5: Streaming Structured Output
Pattern 6: Production-Grade Reliability
5 Common Pitfalls and Solutions
10 Common Error Troubleshooting
Advanced Optimization Tips
Comparison: 3 Structured Output Approaches
Recommended Online Tools

LLM Structured Output Core Concepts

Concept	Description
Structured Output	LLM outputs structured data (JSON/XML) conforming to a predefined Schema
JSON Schema	Specification for describing JSON data structure, used to constrain and validate LLM output
Function Calling	Protocol proposed by OpenAI, making LLM output JSON conforming to function parameter Schema
Tool Use	Anthropic/Gemini's implementation of function calling, semantically equivalent
Constrained Decoding	Constraining token selection during inference, guaranteeing 100% Schema compliance
Instructor	Python library that auto-generates Schema + validation + retry based on Pydantic models

Why LLM Structured Output Matters

Traditional LLM output flow:
  User Prompt → LLM free generation → String → Regex/JSON parsing → May fail → Retry

Structured output flow:
  User Prompt + Schema → LLM constrained generation → Valid JSON → Pydantic validation → Success

Key differences:
  1. Traditional: unpredictable output, fragile parsing, high retry cost
  2. Structured: predictable output, reliable validation, fewer and bounded retries

3 Structured Output Mechanisms Compared

Mechanism	Principle	Reliability	Latency	Compatibility
Prompt constraints	Describe output format in prompt	⭐Low	No extra	All models
JSON Schema	Constrain output structure via Schema	⭐⭐Medium	Slight	Some models
Function calling protocol	Dedicated API channel + Constrained Decoding	⭐⭐⭐High	Slight	Specific models
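
To make the idea of Constrained Decoding concrete, here is a toy sketch. It is not how any vendor implements it: the grammar table `ALLOWED_NEXT` and the scoring function `toy_scores` are invented for illustration. At each step the decoder only considers tokens the grammar permits, so even a "model" that prefers an invalid label cannot emit it.

```python
import json

# Hypothetical toy grammar: maps the last emitted token to the tokens allowed next.
# The only outputs it can produce are {"sentiment": "<label>"} objects.
ALLOWED_NEXT = {
    "<start>": ['{"sentiment": "'],
    '{"sentiment": "': ["positive", "negative", "neutral"],
    "positive": ['"}'],
    "negative": ['"}'],
    "neutral": ['"}'],
    '"}': [],  # end of output
}


def toy_scores(token: str) -> float:
    """Stand-in for model logits; this 'model' likes an invalid label best."""
    preferences = {"very positive": 0.9, "positive": 0.6, "neutral": 0.3, "negative": 0.1}
    return preferences.get(token, 0.0)


def constrained_decode() -> str:
    state, output = "<start>", []
    while ALLOWED_NEXT[state]:
        # Mask step: only grammar-approved tokens compete, whatever the raw scores say.
        token = max(ALLOWED_NEXT[state], key=toy_scores)
        output.append(token)
        state = token
    return "".join(output)


result = constrained_decode()
print(result)
```

The invalid label "very positive" has the highest score but is never a candidate, which is the property that prompt-only constraints cannot provide.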

Pattern 1: JSON Schema Constrained Output

The most basic structured output approach: describe the format requirements in the prompt, then validate the response against the Pydantic model that the JSON Schema was generated from.

import json
import re
from typing import Optional
from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: int = Field(ge=1, le=10, description="Rating 1-10")
    sentiment: str = Field(pattern="^(positive|negative|neutral)$")
    summary: str = Field(max_length=200, description="Brief review")
    recommended: bool = Field(description="Whether recommended")


MOVIE_REVIEW_SCHEMA = MovieReview.model_json_schema()

STRUCTURED_PROMPT = """You are a professional movie review analyzer.

Analyze the following review and return results strictly following the JSON Schema.

JSON Schema:
{schema}

Review content:
{review}

Important requirements:
1. Must return valid JSON
2. rating must be an integer from 1-10
3. sentiment can only be positive/negative/neutral
4. Do not add any content outside the JSON
5. Do not wrap with ```json```
"""


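def extract_json_from_response(text: str) -> Optional[dict]:
    # Try the whole response first so a clean JSON reply is never
    # truncated by the fallbacks below.
    candidates = [text.strip()]
    # Then try the contents of a fenced block, with or without a "json" tag.
    fenced = re.search(r'```(?:json)?\s*(.*?)\s*```', text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # Last resort: decode the first JSON object embedded in surrounding prose.
    # raw_decode handles arbitrary nesting, unlike a brace-matching regex,
    # which can return an inner object instead of the outer one.
    decoder = json.JSONDecoder()
    for match in re.finditer(r'\{', text):
        try:
            parsed, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None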


def parse_structured_output(raw_text: str) -> Optional[MovieReview]:
    parsed_json = extract_json_from_response(raw_text)
    if parsed_json is None:
        return None
    try:
        return MovieReview.model_validate(parsed_json)
    except ValidationError as e:
        print(f"Validation failed: {e}")
        return None


async def call_llm_with_schema(prompt: str) -> Optional[MovieReview]:
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    formatted_prompt = STRUCTURED_PROMPT.format(
        schema=json.dumps(MOVIE_REVIEW_SCHEMA, ensure_ascii=False, indent=2),
        review=prompt
    )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": formatted_prompt}],
        temperature=0.1,
    )

    raw_text = response.choices[0].message.content or ""
    return parse_structured_output(raw_text)

Limitations of JSON Schema Validation

Issue 1: LLM may return invalid JSON
  → Need extract_json_from_response for fault-tolerant extraction

Issue 2: LLM may ignore Schema constraints
  → rating returns "9 points" instead of 9
  → sentiment returns "very positive" instead of positive

Issue 3: Nested structures are error-prone
  → List length is uncontrollable
  → Optional fields may be missing

Issue 4: Must hand-write prompts every time
  → High maintenance cost, easy to miss details
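
The failure modes listed above can be reproduced without calling a model. The sketch below is a hand-rolled check using only the standard library; `check_review` and the sample payloads are hypothetical and not part of the original code. It shows why type and enum checks have to run on every response, even when the JSON itself parses.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def check_review(raw: str) -> list[str]:
    """Return the problems found in one raw model response (empty list = OK)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]  # Issue 1
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    rating = data.get("rating")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rating, int) or isinstance(rating, bool) or not 1 <= rating <= 10:
        problems.append(f"rating must be an integer 1-10, got {rating!r}")  # Issue 2
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        problems.append(f"sentiment not allowed: {data.get('sentiment')!r}")  # Issue 2
    for key in ("title", "summary", "recommended"):
        if key not in data:
            problems.append(f"missing field: {key}")  # Issue 3
    return problems


good = '{"title": "Dune", "rating": 9, "sentiment": "positive", "summary": "Great.", "recommended": true}'
bad_types = '{"title": "Dune", "rating": "9 points", "sentiment": "very positive", "summary": "Great.", "recommended": true}'
with_comment = '{"rating": 9} // trailing comment'

for sample in (good, bad_types, with_comment):
    print(check_review(sample))
```

Maintaining checks like these by hand for every schema is exactly the maintenance cost described in Issue 4, which is what Pydantic models remove.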

Pattern 2: OpenAI Function Calling Protocol

OpenAI's Function Calling protocol is the standard approach for structured output, using a dedicated API channel to make LLM output JSON conforming to the Schema.

import json
from typing import Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI


class SentimentAnalysis(BaseModel):
    text: str = Field(description="The analyzed text")
    sentiment: str = Field(description="Sentiment: positive/negative/neutral")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence 0-1")
    keywords: list[str] = Field(description="Keyword list")
    language: str = Field(description="Detected language")


class EntityExtraction(BaseModel):
    entities: list[dict] = Field(description="Extracted entity list")
    relationships: list[dict] = Field(default_factory=list, description="Entity relationships")
    summary: str = Field(description="Text summary")


def pydantic_to_function_schema(model_class: type[BaseModel]) -> dict:
    schema = model_class.model_json_schema()
    return {
        "type": "function",
        "function": {
            "name": model_class.__name__,
            "description": model_class.__doc__ or f"Extract {model_class.__name__}",
            # Pass the full schema through so nested models keep their "$defs" references.
            "parameters": schema,
        }
    }


async def function_calling_extract(
    text: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o"
) -> Optional[BaseModel]:
    client = AsyncOpenAI()

    function_schema = pydantic_to_function_schema(model_class)

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are a precise data extraction assistant. Use the provided function for structured output."
            },
            {
                "role": "user",
                "content": text
            }
        ],
        tools=[function_schema],
        tool_choice={"type": "function", "function": {"name": model_class.__name__}},
    )

    message = response.choices[0].message

    if message.tool_calls and len(message.tool_calls) > 0:
        tool_call = message.tool_calls[0]
        try:
            args = json.loads(tool_call.function.arguments)
            return model_class.model_validate(args)
        except ValueError as e:  # JSONDecodeError and Pydantic's ValidationError both subclass ValueError
            print(f"Failed to parse function call result: {e}")
            return None

    return None


async def multi_function_calling(
    text: str,
    model_classes: list[type[BaseModel]],
    model: str = "gpt-4o"
) -> dict[str, BaseModel]:
    client = AsyncOpenAI()

    tool_schemas = [pydantic_to_function_schema(cls) for cls in model_classes]

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a multi-task data extraction assistant."},
            {"role": "user", "content": text}
        ],
        tools=tool_schemas,
        tool_choice="auto",
    )

    results = {}
    message = response.choices[0].message

    if message.tool_calls:
        for tool_call in message.tool_calls:
            for cls in model_classes:
                if tool_call.function.name == cls.__name__:
                    try:
                        args = json.loads(tool_call.function.arguments)
                        results[cls.__name__] = cls.model_validate(args)
                    except Exception as e:
                        print(f"Failed to parse {cls.__name__}: {e}")

    return results
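
The nested loop in `multi_function_calling` can also be written as a lookup table. The sketch below mimics the shape of a tool-call response with `SimpleNamespace` objects so that it runs offline. The `fake_tool_calls` data and the `HANDLERS` registry are assumptions made for illustration; a real handler would call `model_validate` on the parsed arguments.

```python
import json
from types import SimpleNamespace


def make_call(name: str, arguments: dict) -> SimpleNamespace:
    """Build an object shaped like one tool call: arguments arrive as a JSON string."""
    return SimpleNamespace(
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments))
    )


# Hypothetical registry: tool name -> callable that turns parsed arguments into a result.
HANDLERS = {
    "SentimentAnalysis": lambda args: ("sentiment", args["sentiment"]),
    "EntityExtraction": lambda args: ("entities", len(args["entities"])),
}


def dispatch(tool_calls) -> dict:
    results = {}
    for call in tool_calls:
        handler = HANDLERS.get(call.function.name)
        if handler is None:
            continue  # the model named a tool that was never registered
        try:
            args = json.loads(call.function.arguments)
            results[call.function.name] = handler(args)
        except (json.JSONDecodeError, KeyError) as exc:
            print(f"Failed to handle {call.function.name}: {exc}")
    return results


fake_tool_calls = [
    make_call("SentimentAnalysis", {"sentiment": "positive"}),
    make_call("EntityExtraction", {"entities": [{"name": "OpenAI"}]}),
    make_call("UnknownTool", {}),
]
results = dispatch(fake_tool_calls)
print(results)
```

A dictionary lookup keeps the dispatch cost flat as the number of tools grows and makes unknown tool names an explicit case.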

Strict Mode with response_format (Structured Outputs)

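import json
from typing import Optional
from pydantic import BaseModel
from openai import AsyncOpenAI


async def strict_structured_output(
    text: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o-2024-08-06"
) -> Optional[BaseModel]:
    client = AsyncOpenAI()

    schema = model_class.model_json_schema()
    # Strict mode requires every object to set additionalProperties to false
    # and to list all of its properties as required. This handles a flat
    # model; nested objects under "$defs" need the same treatment.
    schema["additionalProperties"] = False
    schema["required"] = list(schema.get("properties", {}))

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract structured data"},
            {"role": "user", "content": text}
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": model_class.__name__,
                "strict": True,
                "schema": schema,
            }
        }
    )

    raw = response.choices[0].message.content
    if raw:
        try:
            return model_class.model_validate(json.loads(raw))
        except ValueError as e:  # covers JSONDecodeError and ValidationError
            print(f"Strict mode parse failed: {e}")
    return None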
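
OpenAI's strict mode documents two object-level requirements that a schema from `model_json_schema()` does not meet by default: every object must set `additionalProperties` to `false`, and every property must appear in `required`. The helper below is a minimal sketch of that preprocessing on a plain dict schema. `make_strict` is a hypothetical name, and the sketch does not handle other keywords that strict mode may reject.

```python
import copy


def make_strict(schema: dict) -> dict:
    """Return a copy of a JSON Schema with strict-mode object rules applied recursively."""
    schema = copy.deepcopy(schema)  # never mutate the caller's schema

    def visit(node):
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(schema)
    return schema


# Example schema with a nested definition, shaped like Pydantic's output.
example = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "required": ["name"],
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        }
    },
}
strict = make_strict(example)
print(strict["required"], strict["$defs"]["Address"])
```

Because every field becomes required, a field that is genuinely optional has to be modeled as nullable instead of being omitted.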

Pattern 3: Instructor Library Auto-Retry

Instructor is a popular Python library for LLM structured output: it generates the Schema from a Pydantic model, validates the model's output, and retries automatically when validation fails.

import instructor
from pydantic import BaseModel, Field, field_validator
from openai import AsyncOpenAI


class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(gt=0, description="Price, must be greater than 0")
    category: str = Field(description="Product category")
    features: list[str] = Field(description="Product features list", min_length=1, max_length=5)
    in_stock: bool = Field(description="Whether in stock")

    @field_validator("price")
    @classmethod
    def round_price(cls, v: float) -> float:
        return round(v, 2)

    @field_validator("category")
    @classmethod
    def normalize_category(cls, v: str) -> str:
        return v.strip().lower()


class ArticleMetadata(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author")
    publish_date: str = Field(description="Publish date, format YYYY-MM-DD")
    tags: list[str] = Field(description="Tag list")
    word_count: int = Field(ge=0, description="Word count")
    reading_time_minutes: int = Field(ge=1, description="Estimated reading time (minutes)")

    @field_validator("publish_date")
    @classmethod
    def validate_date_format(cls, v: str) -> str:
        import re
        if not re.match(r'^\d{4}-\d{2}-\d{2}$', v):
            raise ValueError(f"Invalid date format: {v}, expected YYYY-MM-DD")
        return v


async def instructor_extract_product(text: str) -> ProductInfo:
    client = instructor.from_openai(AsyncOpenAI())

    result = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract product info from the following text:\n\n{text}"}
        ],
        response_model=ProductInfo,
        max_retries=3,
        temperature=0.1,
    )

    return result


async def instructor_extract_with_mode(
    text: str,
    mode: instructor.Mode = instructor.Mode.TOOLS
) -> ArticleMetadata:
    client = instructor.from_openai(AsyncOpenAI(), mode=mode)

    result = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract article metadata:\n\n{text}"}
        ],
        response_model=ArticleMetadata,
        max_retries=3,
    )

    return result


async def instructor_partial_streaming(text: str):
    client = instructor.from_openai(AsyncOpenAI())

    # On the async client, create_partial returns an async generator: iterate it, do not await it.
    article = client.chat.completions.create_partial(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract article metadata:\n\n{text}"}
        ],
        response_model=ArticleMetadata,
        max_retries=3,
    )

    async for partial in article:
        print(f"Partial result: {partial.model_dump_json(exclude_none=True)}")

Instructor Mode Selection

Mode.JSON_SCHEMA    → OpenAI's response_format=json_schema (recommended, most reliable)
Mode.TOOLS          → OpenAI's function calling (good compatibility)
Mode.JSON           → Request JSON output in prompt (most universal, least reliable)
Mode.ANTHROPIC_TOOLS→ Anthropic's tool_use
Mode.GEMINI_JSON    → Gemini's JSON mode

Instructor Retry Strategy Deep Dive

import instructor
from typing import Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
from tenacity import AsyncRetrying, stop_after_attempt, wait_exponential


class StrictUser(BaseModel):
    name: str = Field(min_length=2, max_length=50)
    age: int = Field(ge=0, le=150)
    email: str = Field(pattern=r'^[\w\.-]+@[\w\.-]+\.\w+$')


async def instructor_with_custom_retry(text: str) -> StrictUser:
    client = instructor.from_openai(
        AsyncOpenAI(),
        mode=instructor.Mode.JSON_SCHEMA,
    )

    result, completion = await client.chat.completions.create_with_completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        response_model=StrictUser,
        # A tenacity retrying object can replace the integer to add
        # exponential backoff between validation retries.
        max_retries=AsyncRetrying(
            stop=stop_after_attempt(3),
            wait=wait_exponential(multiplier=1, min=1, max=10),
        ),
        validation_context={"strict": True},
    )

    print(f"Token usage: prompt={completion.usage.prompt_tokens}, "
          f"completion={completion.usage.completion_tokens}")

    return result


async def instructor_batch_extract(
    texts: list[str],
) -> list[Optional[StrictUser]]:
    client = instructor.from_openai(AsyncOpenAI())

    # One entry per input text; None marks a text whose extraction failed.
    results: list[Optional[StrictUser]] = []
    for text in texts:
        try:
            result = await client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": text}],
                response_model=StrictUser,
                max_retries=2,
            )
            results.append(result)
        except instructor.exceptions.InstructorRetryException as e:
            print(f"Batch extraction failed, skipping: {e}")
            results.append(None)

    return results
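
Conceptually, validation-driven retry is a loop that feeds the validation error back to the model and asks again. The sketch below shows that loop with a scripted stand-in for the model so that it runs offline. `fake_llm`, `validate_user`, and `extract_with_feedback` are hypothetical names; this illustrates the idea and is not Instructor's actual implementation.

```python
import json


def validate_user(data: dict) -> dict:
    """Minimal stand-in for a Pydantic model: raise ValueError on bad input."""
    age = data.get("age")
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError(f"age must be an integer 0-150, got {age!r}")
    return data


def fake_llm(messages: list[dict]) -> str:
    """Scripted model: wrong on the first call, corrected once it sees an error message."""
    saw_error = any("Validation error" in m["content"] for m in messages)
    if saw_error:
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada", "age": "thirty-six"}'


def extract_with_feedback(prompt: str, max_attempts: int = 3) -> tuple[dict, int]:
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(1, max_attempts + 1):
        raw = fake_llm(messages)
        try:
            return validate_user(json.loads(raw)), attempt
        except ValueError as exc:  # JSONDecodeError is a ValueError subclass
            # Keep the bad answer and the error in the conversation for the next attempt.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Validation error: {exc}. Please fix and resend."}
            )
    raise RuntimeError(f"No valid output after {max_attempts} attempts")


user, attempts = extract_with_feedback("Extract the user: Ada, thirty-six years old")
print(user, attempts)
```

Each retry resends the growing conversation, so token cost rises with every attempt; that is the reason to keep `max_retries` small.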

Pattern 4: Multi-Model Structured Output Adaptation

Different LLM vendors have different structured output APIs, requiring an adaptation layer for unified handling.

import json
from abc import ABC, abstractmethod
from typing import Optional, TypeVar
from pydantic import BaseModel
from openai import AsyncOpenAI

T = TypeVar("T", bound=BaseModel)


class StructuredOutputAdapter(ABC):
    @abstractmethod
    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        pass


class OpenAIStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "gpt-4o"):
        self.client = AsyncOpenAI()
        self.model = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import instructor
        client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)

        return await client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": text}],
            response_model=model_class,
            max_retries=3,
        )


class AnthropicStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        try:
            import anthropic
            self.client = anthropic.AsyncAnthropic()
        except ImportError:
            raise ImportError("Install anthropic: pip install anthropic")
        self.model = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import anthropic
        import instructor

        client = instructor.from_anthropic(
            self.client,
            mode=instructor.Mode.ANTHROPIC_TOOLS,
        )

        return await client.chat.completions.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": text}],
            response_model=model_class,
            max_retries=3,
        )


class GeminiStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "gemini-2.0-flash"):
        self.model_name = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import google.generativeai as genai
        import os

        genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))
        model = genai.GenerativeModel(self.model_name)

        schema = model_class.model_json_schema()
        prompt = f"""Extract structured data from the following text.
Return results strictly following the JSON Schema. Do not add any extra content.

JSON Schema:
{json.dumps(schema, ensure_ascii=False, indent=2)}

Text:
{text}"""

        response = await model.generate_content_async(prompt)
        raw = response.text.strip()

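        # Strip a Markdown code fence, with or without a "json" language tag.
        if raw.startswith("```"):
            raw = raw.removeprefix("```json").removeprefix("```")
        if raw.endswith("```"):
            raw = raw[:-3]
        raw = raw.strip()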

        try:
            return model_class.model_validate(json.loads(raw))
        except Exception as e:
            print(f"Gemini parse failed: {e}")
            return None


class MultiModelStructuredExtractor:
    def __init__(self):
        self.adapters: dict[str, StructuredOutputAdapter] = {}

    def register(self, name: str, adapter: StructuredOutputAdapter):
        self.adapters[name] = adapter

    async def extract(
        self,
        text: str,
        model_class: type[T],
        preferred: str = "openai",
        fallback: bool = True,
    ) -> Optional[T]:
        order = [preferred]
        if fallback:
            order.extend([k for k in self.adapters if k != preferred])

        for model_name in order:
            adapter = self.adapters.get(model_name)
            if adapter is None:
                continue
            try:
                result = await adapter.extract(text, model_class)
                if result is not None:
                    return result
            except Exception as e:
                print(f"[{model_name}] Extraction failed: {e}")
                continue

        return None

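    async def extract_consensus(
        self,
        text: str,
        model_class: type[T],
        min_agreement: int = 2,
    ) -> Optional[T]:
        import asyncio
        from collections import Counter

        names = list(self.adapters)
        results = await asyncio.gather(
            *(self.adapters[name].extract(text, model_class) for name in names),
            return_exceptions=True,
        )

        valid_results = []
        for name, result in zip(names, results):
            if isinstance(result, Exception):
                print(f"[{name}] Exception: {result}")
                continue
            if result is not None:
                valid_results.append(result)

        if not valid_results:
            return None

        # Count identical outputs by their serialized form and accept the most
        # common one only if at least min_agreement models produced it.
        # Exact matching is strict: free-text fields rarely agree verbatim.
        votes = Counter(r.model_dump_json() for r in valid_results)
        winner, count = votes.most_common(1)[0]
        if count < min_agreement:
            return None
        return next(r for r in valid_results if r.model_dump_json() == winner)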

Multi-Model Adaptation Architecture

                  ┌───────────────────────────────┐
                  │ MultiModelStructuredExtractor │
                  │      (Unified Interface)      │
                  └───────────────┬───────────────┘
                                  │
           ┌──────────────────────┼──────────────────────┐
           │                      │                      │
 ┌─────────▼─────────┐  ┌─────────▼─────────┐  ┌─────────▼─────────┐
 │ OpenAI Adapter    │  │ Anthropic Adapter │  │ Gemini Adapter    │
 │ JSON_SCHEMA       │  │ ANTHROPIC_TOOLS   │  │ Prompt + Parse    │
 │ Instructor        │  │ Instructor        │  │ Manual parse      │
 └───────────────────┘  └───────────────────┘  └───────────────────┘
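
The fallback order used by `MultiModelStructuredExtractor.extract` can be exercised offline with stub adapters. In the sketch below the adapters are plain async functions that fail or succeed on purpose; all names (`flaky_openai`, `empty_gemini`, `working_anthropic`, `extract_with_fallback`) are hypothetical.

```python
import asyncio


async def flaky_openai(text: str) -> dict:
    raise TimeoutError("simulated provider outage")


async def empty_gemini(text: str):
    return None  # simulated parse failure


async def working_anthropic(text: str) -> dict:
    return {"source": "anthropic", "length": len(text)}


# Registration order doubles as the fallback order after the preferred adapter.
ADAPTERS = {
    "openai": flaky_openai,
    "gemini": empty_gemini,
    "anthropic": working_anthropic,
}


async def extract_with_fallback(text: str, preferred: str = "openai"):
    order = [preferred] + [name for name in ADAPTERS if name != preferred]
    attempted = []
    for name in order:
        attempted.append(name)
        try:
            result = await ADAPTERS[name](text)
        except Exception as exc:
            print(f"[{name}] failed: {exc}")
            continue
        if result is not None:
            return result, attempted
    return None, attempted


result, attempted = asyncio.run(extract_with_fallback("hello"))
print(result, attempted)
```

Both an exception and a `None` result move on to the next adapter, so a provider outage and a parse failure are handled by the same path.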

Pattern 5: Streaming Structured Output

Combining LLM structured output with streaming for real-time parsing and progressive display.

import json
import asyncio
from typing import AsyncIterator, Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI


class StreamingJsonParser:
    def __init__(self):
        self.buffer = ""
        self.depth = 0
        self.in_string = False
        self.escape_next = False
        self.started = False

    def feed(self, chunk: str) -> list[dict]:
        results = []

        for char in chunk:
            # Start buffering at the first top-level '{'; skip any text before it.
            if not self.started:
                if char != '{':
                    continue
                self.started = True
            # Append character by character so the buffer never contains
            # text beyond the position currently being scanned.
            self.buffer += char

            if self.escape_next:
                self.escape_next = False
                continue

            if self.in_string:
                if char == '\\':
                    self.escape_next = True
                elif char == '"':
                    self.in_string = False
                continue

            if char == '"':
                self.in_string = True
            elif char == '{':
                self.depth += 1
            elif char == '}':
                self.depth -= 1
                if self.depth == 0:
                    try:
                        results.append(json.loads(self.buffer))
                    except json.JSONDecodeError:
                        pass
                    # Reset so a following object in the same stream is parsed too.
                    self.buffer = ""
                    self.started = False

        return results


class PartialModelBuilder:
    def __init__(self, model_class: type[BaseModel]):
        self.model_class = model_class
        self.current_json = {}
        self.last_valid = None

    def update(self, json_data: dict) -> Optional[BaseModel]:
        self.current_json.update(json_data)
        try:
            self.last_valid = self.model_class.model_validate(self.current_json)
            return self.last_valid
        except Exception:
            return self.last_valid


class StreamingEvent(BaseModel):
    event_type: str = Field(description="Event type")
    data: dict = Field(description="Event data")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence")


async def stream_structured_output(
    prompt: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o",
) -> AsyncIterator[BaseModel]:
    import instructor

    client = instructor.from_openai(AsyncOpenAI())

    # On the async client, create_partial returns an async generator: iterate it, do not await it.
    stream = client.chat.completions.create_partial(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_model=model_class,
        max_retries=2,
    )

    async for partial in stream:
        yield partial


async def stream_with_raw_parser(
    prompt: str,
    model: str = "gpt-4o",
) -> AsyncIterator[dict]:
    client = AsyncOpenAI()

    schema_prompt = f"""Return results in JSON format. Only return JSON, nothing else.
{prompt}"""

    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": schema_prompt}],
        stream=True,
        temperature=0.1,
    )

    parser = StreamingJsonParser()
    full_content = ""
    yielded_any = False

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_content += content
            for result in parser.feed(content):
                yielded_any = True
                yield result

    # Fall back to parsing the whole response only if the incremental
    # parser never produced an object.
    if not yielded_any and full_content:
        try:
            yield json.loads(full_content)
        except json.JSONDecodeError:
            pass


async def stream_sse_structured(
    prompt: str,
    model_class: type[BaseModel],
):
    # Builds the SSE response; call this from inside a FastAPI route handler
    # and return its result.
    from fastapi.responses import StreamingResponse

    async def generate():
        async for partial in stream_structured_output(prompt, model_class):
            data = partial.model_dump_json(exclude_none=True)
            # One SSE event per partial model, terminated by a blank line.
            yield f"data: {data}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},
    )

Streaming Structured Output Architecture

Client Request
    │
    ▼
FastAPI SSE Endpoint
    │
    ▼
Instructor create_partial()
    │
    ├──→ chunk1: {"name": "Product A"...}
    ├──→ chunk2: {"name": "Product A", "price": 99...}
    ├──→ chunk3: {"name": "Product A", "price": 99.0, "category": "electronics"...}
    └──→ Final: Complete Pydantic model instance

Each chunk pushed to client via SSE
Client renders UI progressively
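
Progressive rendering depends on turning an incomplete JSON prefix into something parseable. One simple way to do that, sketched below with only the standard library, is to close any open string and brackets and then try to parse the result. `complete_partial_json` is a hypothetical helper; it is not how Instructor implements `create_partial`, and it deliberately gives up (returns `None`) on prefixes that end mid-key or right after a comma.

```python
import json
from typing import Optional


def complete_partial_json(prefix: str) -> Optional[dict]:
    stack = []           # closing brackets still owed, innermost last
    in_string = False
    escape_next = False
    for char in prefix:
        if escape_next:
            escape_next = False
        elif in_string:
            if char == "\\":
                escape_next = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "{":
            stack.append("}")
        elif char == "[":
            stack.append("]")
        elif char in "}]" and stack:
            stack.pop()
    # Close an unterminated string first, then brackets from the inside out.
    candidate = prefix + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None  # not repairable yet; wait for more characters
    return parsed if isinstance(parsed, dict) else None


stream = ['{"name": "Prod', 'uct A", "pri', 'ce": 99, "tags": ["new"', ']}']
buffer = ""
snapshots = []
for chunk in stream:
    buffer += chunk
    snapshot = complete_partial_json(buffer)
    if snapshot is not None:
        snapshots.append(snapshot)
        print(snapshot)
```

String values appear truncated in early snapshots (here `"Prod"`), so a UI should treat every snapshot except the last as provisional.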

Pattern 6: Production-Grade Reliability

Integrating all patterns into a production-ready structured output service.

import json
import time
import logging
from typing import Optional
from dataclasses import dataclass, field
from enum import Enum
from pydantic import BaseModel, Field
from openai import AsyncOpenAI

logger = logging.getLogger(__name__)


class OutputStatus(str, Enum):
    SUCCESS = "success"
    VALIDATION_FAILED = "validation_failed"
    PARSE_FAILED = "parse_failed"
    LLM_ERROR = "llm_error"
    TIMEOUT = "timeout"
    RETRY_EXHAUSTED = "retry_exhausted"


@dataclass
class ExtractionResult:
    data: Optional[BaseModel] = None
    status: OutputStatus = OutputStatus.SUCCESS
    attempts: int = 0
    latency_ms: float = 0.0
    error_message: str = ""
    model_used: str = ""
    tokens_used: dict = field(default_factory=dict)


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time: Optional[float] = None
        self.is_open = False

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.is_open = True

    def record_success(self):
        self.failure_count = 0
        self.is_open = False

    def can_execute(self) -> bool:
        if not self.is_open:
            return True
        if self.last_failure_time and \
           time.time() - self.last_failure_time > self.recovery_timeout:
            self.is_open = False
            self.failure_count = 0
            return True
        return False


class StructuredOutputService:
    def __init__(
        self,
        max_retries: int = 3,
        timeout: float = 30.0,
        fallback_models: Optional[list[str]] = None,
    ):
        self.client = AsyncOpenAI()
        self.max_retries = max_retries
        self.timeout = timeout
        self.fallback_models = fallback_models or ["gpt-4o", "gpt-4o-mini"]
        self.circuit_breakers: dict[str, CircuitBreaker] = {}

    def _get_breaker(self, model: str) -> CircuitBreaker:
        if model not in self.circuit_breakers:
            self.circuit_breakers[model] = CircuitBreaker()
        return self.circuit_breakers[model]

    async def extract(
        self,
        text: str,
        model_class: type[BaseModel],
        preferred_model: Optional[str] = None,
    ) -> ExtractionResult:
        import instructor

        models = [preferred_model] if preferred_model else self.fallback_models
        models = [m for m in models if self._get_breaker(m).can_execute()]

        if not models:
            return ExtractionResult(
                status=OutputStatus.RETRY_EXHAUSTED,
                error_message="All model circuit breakers are open",
            )

        for model in models:
            result = await self._try_extract(text, model_class, model)
            if result.status == OutputStatus.SUCCESS:
                self._get_breaker(model).record_success()
                return result
            else:
                self._get_breaker(model).record_failure()
                logger.warning(f"Model {model} extraction failed: {result.error_message}")

        return result

    async def _try_extract(
        self,
        text: str,
        model_class: type[BaseModel],
        model: str,
    ) -> ExtractionResult:
        import instructor

        start_time = time.time()
        client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)

        for attempt in range(1, self.max_retries + 1):
            try:
                result = await client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": text}],
                    response_model=model_class,
                    max_retries=0,
                    timeout=self.timeout,
                )

                latency = (time.time() - start_time) * 1000
                return ExtractionResult(
                    data=result,
                    status=OutputStatus.SUCCESS,
                    attempts=attempt,
                    latency_ms=latency,
                    model_used=model,
                )

            except instructor.exceptions.InstructorRetryException as e:
                logger.warning(f"Attempt {attempt} validation failed: {e}")
                continue

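            except Exception as e:
                from openai import APITimeoutError

                error_msg = str(e)
                # Check the SDK's timeout exception type first: its message is
                # not guaranteed to contain the word "timeout".
                if isinstance(e, APITimeoutError) or "timeout" in error_msg.lower():
                    return ExtractionResult(
                        status=OutputStatus.TIMEOUT,
                        attempts=attempt,
                        latency_ms=(time.time() - start_time) * 1000,
                        error_message=error_msg,
                        model_used=model,
                    )
                logger.error(f"Attempt {attempt} LLM error: {e}")
                continue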

        return ExtractionResult(
            status=OutputStatus.RETRY_EXHAUSTED,
            attempts=self.max_retries,
            latency_ms=(time.time() - start_time) * 1000,
            error_message=f"Failed after {self.max_retries} retries",
            model_used=model,
        )


class StructuredOutputCache:
    def __init__(self, ttl: float = 3600.0, max_size: int = 1000):
        self.ttl = ttl
        self.max_size = max_size
        self._cache: dict[str, tuple[float, BaseModel]] = {}

    def _make_key(self, text: str, model_class: type[BaseModel]) -> str:
        import hashlib
        content_hash = hashlib.sha256(text.encode()).hexdigest()[:16]
        return f"{model_class.__name__}:{content_hash}"

    def get(self, text: str, model_class: type[BaseModel]) -> Optional[BaseModel]:
        key = self._make_key(text, model_class)
        if key in self._cache:
            timestamp, data = self._cache[key]
            if time.time() - timestamp < self.ttl:
                return data
            del self._cache[key]
        return None

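    def set(self, text: str, model_class: type[BaseModel], data: BaseModel):
        key = self._make_key(text, model_class)
        # Evict the oldest entry only when a new key is added to a full cache;
        # refreshing an existing key must not push out an unrelated entry
        if key not in self._cache and len(self._cache) >= self.max_size:
            oldest_key = min(self._cache, key=lambda k: self._cache[k][0])
            del self._cache[oldest_key]

        self._cache[key] = (time.time(), data)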
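
One caveat with the key above: it depends only on the class name, so editing a model's fields without renaming it can return entries built for the old schema. Below is a minimal sketch of a schema-aware key. It assumes the schema is available as a plain dict (with Pydantic v2 that would be `model_class.model_json_schema()`); `make_cache_key` and the two sample schemas are hypothetical and not part of the class above.

```python
import hashlib
import json


def make_cache_key(text: str, model_name: str, schema: dict) -> str:
    """Build a cache key from the input text plus a fingerprint of the schema."""
    # sort_keys makes the fingerprint independent of dict ordering
    schema_json = json.dumps(schema, sort_keys=True)
    schema_hash = hashlib.sha256(schema_json.encode()).hexdigest()[:8]
    content_hash = hashlib.sha256(text.encode()).hexdigest()[:16]
    return f"{model_name}:{schema_hash}:{content_hash}"


# Hand-written schemas standing in for two versions of the same model
schema_v1 = {"type": "object", "properties": {"name": {"type": "string"}}}
schema_v2 = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
}

key_v1 = make_cache_key("Product A costs 99", "Product", schema_v1)
key_v2 = make_cache_key("Product A costs 99", "Product", schema_v2)
```

The same text produces different keys for the two schema versions, so a schema change behaves like a cache miss instead of returning an outdated object.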

Production Architecture Overview

                    ┌────────────────────────────┐
                    │  StructuredOutputService    │
                    │  (Unified Entry Point)      │
                    └──────────┬─────────────────┘
                               │
                    ┌──────────▼─────────────────┐
                    │  CircuitBreaker             │
                    │  (Per-model circuit breaker) │
                    └──────────┬─────────────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
    ┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
    │  gpt-4o        │ │  gpt-4o-mini │ │  fallback    │
    │  JSON_SCHEMA   │ │  JSON_SCHEMA │ │  Prompt+Parse│
    │  +Instructor   │ │  +Instructor │ │  +retry      │
    └────────────────┘ └──────────────┘ └──────────────┘
              │                │                │
              └────────────────┼────────────────┘
                               │
                    ┌──────────▼─────────────────┐
                    │  StructuredOutputCache      │
                    │  (Result cache)             │
                    └────────────────────────────┘
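
The flow in the diagram can be sketched as a loop over model tiers. This is an illustration under stated assumptions, not the service's actual implementation: `call_with_fallback`, the `is_open` callback, and the fake extractor functions are hypothetical stand-ins for the circuit breaker and the Instructor calls.

```python
from typing import Any, Callable


def call_with_fallback(
    text: str,
    tiers: list[tuple[str, Callable[[str], Any]]],
    is_open: Callable[[str], bool],
) -> tuple[str, Any]:
    """Try each (model_name, extractor) tier in order, skipping open circuits."""
    errors: list[str] = []
    for model_name, extractor in tiers:
        if is_open(model_name):
            errors.append(f"{model_name}: circuit open")
            continue
        try:
            return model_name, extractor(text)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{model_name}: {exc}")
    raise RuntimeError("All tiers failed: " + "; ".join(errors))


# Fake extractors standing in for real LLM calls
def primary(text: str) -> dict:
    raise TimeoutError("request timed out")


def secondary(text: str) -> dict:
    return {"name": text.strip()}


used_model, result = call_with_fallback(
    " Product A ",
    tiers=[("gpt-4o", primary), ("gpt-4o-mini", secondary)],
    is_open=lambda name: False,
)
```

Here the first tier fails, so the call falls through to the second one. Returning the model name alongside the data makes it easy to fill a field such as `model_used` in the result object.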

5 Common Pitfalls and Solutions

Pitfall 1: LLM Returns JSON with Comments

import json
import re


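# Match string literals first so that "//" inside a value (for example a URL)
# is never mistaken for the start of a comment
_STRING_OR_COMMENT = re.compile(
    r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/',
    flags=re.DOTALL,
)


def strip_json_comments(text: str) -> str:
    # Keep string literals as-is; drop // line comments and /* block */ comments
    return _STRING_OR_COMMENT.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text
    )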


raw = '''{
    "name": "Product A",  // this is a comment
    "price": 99.0
    /* multi-line
       comment */
}'''

clean = strip_json_comments(raw)
data = json.loads(clean)

Pitfall 2: Nested Schema Causes Output Truncation

from pydantic import BaseModel, Field


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class PersonFlat(BaseModel):
    name: str
    street: str = Field(description="Street address")
    city: str = Field(description="City")
    zip_code: str = Field(description="ZIP code")


class PersonNested(BaseModel):
    name: str
    address: Address


# Recommendation: Keep nesting to at most 2 levels; flatten anything deeper
# Not recommended: Person → Address → GeoLocation → Coordinates (4 levels)
# Recommended: PersonFlat (all fields at the same level)
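
To check the nesting rule automatically, one option is to measure the depth of the generated JSON Schema before sending it to the model. The sketch below works on a plain dict; the sample schemas are hand-written to resemble what Pydantic v2's `model_json_schema()` emits for `PersonNested` and `PersonFlat`. `nesting_depth` is a hypothetical helper and assumes the schema has no recursive references.

```python
from typing import Optional


def nesting_depth(schema: dict, root: Optional[dict] = None) -> int:
    """Count object levels, where a flat model has depth 1."""
    root = root if root is not None else schema
    # Resolve a local reference such as "#/$defs/Address"
    if "$ref" in schema:
        name = schema["$ref"].split("/")[-1]
        schema = root.get("$defs", {})[name]
    # Optional/union fields show up as anyOf/oneOf/allOf variants
    variants = [v for key in ("anyOf", "oneOf", "allOf") for v in schema.get(key, [])]
    if variants:
        return max(nesting_depth(v, root) for v in variants)
    if schema.get("type") == "array":
        return nesting_depth(schema.get("items", {}), root)
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((nesting_depth(child, root) for child in children), default=0)


address_schema = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "zip_code": {"type": "string"},
    },
}
person_nested_schema = {
    "$defs": {"Address": address_schema},
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
}
person_flat_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, **address_schema["properties"]},
}

nested_depth = nesting_depth(person_nested_schema)
flat_depth = nesting_depth(person_flat_schema)
```

A check like `nesting_depth(schema) <= 2` could run in a unit test so that an overly deep model is caught before it reaches production.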

Pitfall 3: LLM Doesn't Respect Enum Values

from enum import Enum
from pydantic import BaseModel, Field, field_validator


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class ReviewWithEnum(BaseModel):
    text: str
    sentiment: Sentiment

    @field_validator("sentiment", mode="before")
    @classmethod
    def normalize_sentiment(cls, v):
        if isinstance(v, str):
            v = v.strip().lower()
            mapping = {
                "great": "positive", "good": "positive", "excellent": "positive",
                "bad": "negative", "poor": "negative", "terrible": "negative",
                "okay": "neutral", "average": "neutral", "so-so": "neutral",
            }
            return mapping.get(v, v)
        return v
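
A fixed synonym mapping cannot anticipate every variation, for example a typo such as "postive". As a complement, the sketch below falls back to fuzzy matching with the standard library's `difflib`. `closest_enum_value` and the 0.8 cutoff are assumptions chosen for illustration; a loose cutoff can silently map a wrong value, so tune it against real outputs.

```python
import difflib
from typing import Optional

ALLOWED = ["positive", "negative", "neutral"]


def closest_enum_value(
    raw: str, allowed: list[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the allowed value most similar to raw, or None if nothing is close."""
    cleaned = raw.strip().lower()
    if cleaned in allowed:
        return cleaned
    # get_close_matches only returns candidates whose similarity >= cutoff
    matches = difflib.get_close_matches(cleaned, allowed, n=1, cutoff=cutoff)
    return matches[0] if matches else None


fixed_typo = closest_enum_value(" Postive ", ALLOWED)
exact = closest_enum_value("NEUTRAL", ALLOWED)
unknown = closest_enum_value("banana", ALLOWED)
```

Inside a `mode="before"` validator, a `None` result would be a reasonable point to return the original value unchanged so that Pydantic raises its usual enum error and the retry logic can take over.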

Pitfall 4: Uncontrollable List Length

from pydantic import BaseModel, Field, field_validator


class TaggedContent(BaseModel):
    content: str
    tags: list[str] = Field(min_length=1, max_length=5)

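    @field_validator("tags", mode="before")
    @classmethod
    def deduplicate_tags(cls, v):
        # Runs before the length constraints, so an over-long list is
        # deduplicated and truncated instead of failing max_length
        if not isinstance(v, list):
            return v
        seen = set()
        result = []
        for tag in v:
            if not isinstance(tag, str):
                result.append(tag)  # let Pydantic report the type error
                continue
            normalized = tag.strip().lower()
            if normalized not in seen:
                seen.add(normalized)
                result.append(tag.strip())
        return result[:5]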

Pitfall 5: Strict Mode Doesn't Support All Schema Features

from pydantic import BaseModel, Field


# Strict Mode constraints and unsupported features (details vary by provider and model):
# 1. additionalProperties: false must be explicitly set
# 2. Optional fields must have default values
# 3. Union types not supported (some models)
# 4. Complex regex patterns not supported

# Solution: Simplify Schema + compensate with field_validator

class SimpleProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    category: str = Field(default="other")
    tags: list[str] = Field(default_factory=list)
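
When the schema is sent to the API as a raw dict, the first constraint can be applied mechanically. The sketch below sets `additionalProperties: false` on every object and, as an assumption based on OpenAI's strict mode documentation, also lists every property in `required`; verify both rules against the current docs of the provider you use. `to_strict_schema` is a hypothetical helper, and it does not remove `default` values or rewrite unions.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy where every object forbids extra keys and requires all properties."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"].keys())
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(strict)
    return strict


product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
    "required": ["name"],
}

strict_schema = to_strict_schema(product_schema)
```

Because the walk visits every nested dict and list, objects inside `$defs` or array `items` are handled the same way as the root.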

10 Common Error Troubleshooting

#	Error Message	Cause	Solution
1	`json.decoder.JSONDecodeError`	LLM returned invalid JSON	Use `extract_json_from_response` for fault-tolerant extraction
2	`ValidationError: field required`	LLM omitted required field	Add default values or use Instructor auto-retry
3	`InstructorRetryException: max retries`	Still failing validation after 3 retries	Check if Schema is too complex, simplify nesting
4	`TypeError: 'NoneType' object`	tool_calls is empty, LLM didn't call function	Check tool_choice setting, confirm model supports function calling
5	`RateLimitError: 429`	API call rate exceeded	Add exponential backoff retry, reduce concurrency
6	`Timeout: request timed out`	LLM inference timeout	Reduce Schema complexity, increase timeout parameter
7	`BadRequestError: Invalid schema`	Schema doesn't meet model requirements	Check strict mode limitations, simplify Schema
8	`ValidationError: string too long`	LLM returned overly long string	Add `max_length` constraint
9	`KeyError: 'tool_calls'`	Model doesn't support function calling	Switch to JSON Schema mode or Prompt mode
10	`RecursionError: maximum depth`	Schema nesting too deep	Flatten nested structure, max 2 levels
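
The "exponential backoff retry" suggested for error 5 can be sketched as follows. `retry_with_backoff` is a hypothetical helper, and the delay values and 10% jitter are illustrative defaults. With the OpenAI SDK you would likely pass `retry_on=(openai.RateLimitError,)`; the demo uses `ConnectionError` and a fake function so that it runs without network access.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call func, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Small random jitter keeps concurrent clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Fake API call that fails twice with a rate-limit-like error, then succeeds
calls = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return "ok"


# Recording delays instead of sleeping keeps the demo instant
result = retry_with_backoff(flaky, sleep=delays.append, retry_on=(ConnectionError,))
```

Injecting `sleep` makes the helper testable, and restricting `retry_on` avoids retrying errors that will never succeed, such as an invalid schema.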

Advanced Optimization Tips

Tip 1: Few-shot Examples Improve Accuracy

from pydantic import BaseModel, Field
from openai import AsyncOpenAI
import instructor


class Classification(BaseModel):
    category: str = Field(description="Category")
    confidence: float = Field(ge=0.0, le=1.0)


async def few_shot_extract(text: str) -> Classification:
    client = instructor.from_openai(AsyncOpenAI())

    examples = [
        {"role": "user", "content": "This product is amazing, highly recommended!"},
        {"role": "assistant", "content": '{"category": "positive", "confidence": 0.95}'},
        {"role": "user", "content": "Quality is average, price is too high"},
        {"role": "assistant", "content": '{"category": "neutral", "confidence": 0.7}'},
    ]

    return await client.chat.completions.create(
        model="gpt-4o",
        messages=examples + [{"role": "user", "content": text}],
        response_model=Classification,
        max_retries=2,
    )

Tip 2: Schema Description Optimization

from pydantic import BaseModel, Field


class BadSchema(BaseModel):
    type: str
    value: str


class GoodSchema(BaseModel):
    type: str = Field(
        description="Entity type, must be one of: person, organization, location, date"
    )
    value: str = Field(
        description="Standardized entity value. Use full name for person, "
                    "official name for organization, YYYY-MM-DD for date, "
                    "city+country for location"
    )

Tip 3: Progressive Extraction for Complex Structures

from pydantic import BaseModel, Field
from openai import AsyncOpenAI
import instructor


class BasicInfo(BaseModel):
    title: str
    summary: str


class DetailedInfo(BasicInfo):
    key_points: list[str]
    entities: list[str]
    sentiment: str


async def progressive_extract(text: str) -> DetailedInfo:
    client = instructor.from_openai(AsyncOpenAI())

    basic = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract basic info:\n{text}"}],
        response_model=BasicInfo,
        max_retries=2,
    )

    detailed = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Based on the basic info below, extract detailed analysis:\n"
                                        f"Title: {basic.title}\nSummary: {basic.summary}\n\nOriginal text:\n{text}"}
        ],
        response_model=DetailedInfo,
        max_retries=2,
    )

    return detailed

Tip 4: Output Quality Self-Check

from pydantic import BaseModel, Field, model_validator


class SelfValidatingOutput(BaseModel):
    question: str
    answer: str
    sources: list[str] = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0)

    @model_validator(mode="after")
    def check_answer_quality(self):
        if len(self.answer) < 10:
            raise ValueError("Answer too short, may be incomplete")
        if self.confidence > 0.9 and len(self.sources) < 2:
            raise ValueError("High confidence but insufficient sources, please re-verify")
        return self

Comparison: 3 Structured Output Approaches

Dimension	Prompt+JSON Parse	Function Calling Protocol	Instructor Library
Reliability	⭐⭐ 60-80%	⭐⭐⭐⭐ 90-95%	⭐⭐⭐⭐⭐ 95-99%
Implementation complexity	Low	Medium	Low (after wrapping)
Model compatibility	All models	OpenAI/some models	OpenAI/Anthropic/Gemini
Auto-retry	❌ Manual	❌ Manual	✅ Built-in
Streaming support	❌ Difficult	⚠️ Limited	✅ create_partial
Schema validation	❌ Manual	⚠️ Partial	✅ Pydantic auto
Debugging difficulty	High	Medium	Low
Production recommendation	⭐ Not recommended	⭐⭐⭐ Recommended	⭐⭐⭐⭐⭐ Strongly recommended
Token overhead	Low	Medium (+tool definition)	Medium (+tool definition)
Nesting depth	Unlimited	Limited	Limited

Decision Tree

Need structured output?
  ├── No → Use Chat Completion directly
  └── Yes → What models are you using?
       ├── OpenAI only → Instructor + Mode.JSON_SCHEMA
       ├── OpenAI + Anthropic → Instructor + Adapter pattern
       ├── Any model → Prompt+JSON parse + strict validation
       └── Need streaming → Instructor + create_partial

Recommended Online Tools

JSON Formatter & Validator: /en/json/format
JSONPath Query: /en/json/jsonpath
cURL to Code: /en/dev/curl-to-code

Summary: Structured output is core infrastructure for LLM applications in Python. The 6 patterns build from simple to complex: JSON Schema constraints → function calling protocol → Instructor auto-retry → multi-model adaptation → streaming structured output → production reliability. For production, prefer the Instructor library with Pydantic validation and auto-retry, which the comparison above puts at 95%+ reliability. Key points: 1) keep nested Schemas to at most 2 levels, 2) use field_validator to normalize enum values, 3) use circuit breakers to protect downstream models, 4) use caching to reduce duplicate calls. For multi-model scenarios, use the adapter pattern for a unified interface; for streaming scenarios, use create_partial for progressive output.