Structured LLM Output in Python: 6 Production Patterns from JSON Schema to Function Calling

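When an LLM returns a blob of free text, your downstream systems break

You ask GPT for JSON and get JSON with comments. You declare a field as an integer and get the string "42". You ask for a list of 3 items and get 5. Structured output is one of the core building blocks of AI engineering in 2026: without it, your RAG pipeline, agent tool calls, and data-extraction pipeline are all ticking time bombs.

This article starts from JSON Schema constraints and walks through 6 production patterns, from concept to deployment: JSON Schema validation → OpenAI function calling → automatic retries with Instructor → multi-model adaptation → streaming structured output → production reliability.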

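Key takeaways

Understand the 3 core mechanisms behind structured LLM output: prompt constraints, JSON Schema, and function calling
Learn 6 Python structured output patterns, from simple to complex
Use Instructor's automatic retry and validation strategies
Adapt structured output across models (OpenAI/Anthropic/Gemini)
Build a production-grade reliability layer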

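Core concepts of structured LLM output
Pattern 1: JSON Schema-constrained output
Pattern 2: OpenAI function calling
Pattern 3: Automatic retries with Instructor
Pattern 4: Multi-model structured output adapters
Pattern 5: Streaming structured output
Pattern 6: Production-grade reliability
5 common pitfalls and how to fix them
Troubleshooting 10 common errors
Advanced optimization techniques
Comparison: 3 structured output approaches
Recommended online tools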

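Core concepts of structured LLM output

Concept	Description
Structured Output	LLM output that conforms to a predefined schema (JSON/XML)
JSON Schema	A specification for describing JSON data structures, used to constrain and validate LLM output
Function Calling	A protocol introduced by OpenAI that has the LLM emit JSON matching a function's parameter schema
Tool Use	Anthropic's and Gemini's counterpart to function calling, with equivalent semantics
Constrained Decoding	Restricts token choices at inference time so the output is guaranteed to be structurally valid against the schema
Instructor	A Python library that derives the schema from a Pydantic model, validates the output, and retries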

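Why structured LLM output matters

Traditional LLM output flow:
  User prompt → free-form generation → string → regex/JSON parsing → may fail → retry

Structured output flow:
  User prompt + schema → constrained generation → valid JSON → Pydantic validation → success

Key differences:
  1. Traditional: unpredictable output, fragile parsing, expensive retries
  2. Structured: predictable output, reliable validation, retries that are guided by validation errors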

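Comparing the 3 structured output mechanisms

Mechanism	How it works	Reliability	Latency	Compatibility
Prompt constraints	Describe the output format in the prompt	⭐ Low	None added	All models
JSON Schema	Constrain the output structure with a schema	⭐⭐ Medium	Slight	Some models
Function calling	Dedicated API channel (plus constrained decoding when strict mode is enabled)	⭐⭐⭐ High	Slight	Specific models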
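
To make the table concrete, the sketch below builds the request payload for each of the three mechanisms as plain dictionaries, without calling any API. `REVIEW_SCHEMA` and the three helper functions are hypothetical names introduced for illustration; the `response_format` and `tools` shapes follow the OpenAI chat completions calls used later in this article.

```python
import json

# Hypothetical schema used only for this illustration.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "rating": {"type": "integer"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    },
    "required": ["rating", "sentiment"],
    "additionalProperties": False,
}


def prompt_only_request(text: str) -> dict:
    """Mechanism 1: the schema lives only in the prompt text."""
    instructions = "Return JSON matching this schema:\n" + json.dumps(REVIEW_SCHEMA)
    return {"messages": [{"role": "user", "content": f"{instructions}\n\n{text}"}]}


def json_schema_request(text: str) -> dict:
    """Mechanism 2: the schema is passed through response_format."""
    return {
        "messages": [{"role": "user", "content": text}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "Review", "strict": True, "schema": REVIEW_SCHEMA},
        },
    }


def function_call_request(text: str) -> dict:
    """Mechanism 3: the schema is the parameter spec of a forced tool call."""
    return {
        "messages": [{"role": "user", "content": text}],
        "tools": [{
            "type": "function",
            "function": {"name": "Review", "parameters": REVIEW_SCHEMA},
        }],
        "tool_choice": {"type": "function", "function": {"name": "Review"}},
    }
```

Only the first variant relies on the model reading and obeying instructions; the other two hand the schema to the API itself, which is why they rank higher on reliability.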

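Pattern 1: JSON Schema-constrained output

The most basic approach: describe the required format in the prompt, then validate the result against the schema (here through a Pydantic model).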

import json
import re
from typing import Optional
from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: int = Field(ge=1, le=10, description="Rating from 1 to 10")
    sentiment: str = Field(pattern="^(positive|negative|neutral)$")
    summary: str = Field(max_length=200, description="Short review summary")
    recommended: bool = Field(description="Whether the movie is recommended")


MOVIE_REVIEW_SCHEMA = MovieReview.model_json_schema()

STRUCTURED_PROMPT = """You are a professional movie review analyzer.

Analyze the review below and return the result strictly according to the JSON Schema.

JSON Schema:
{schema}

Review:
{review}

Requirements:
1. Return valid JSON
2. rating must be an integer from 1 to 10
3. sentiment must be one of positive/negative/neutral
4. Do not add anything outside the JSON
5. Do not wrap the output in ```json```
"""


def extract_json_from_response(text: str) -> Optional[dict]:
    patterns = [
        r'```json\s*(.*?)\s*```',
        r'```\s*(.*?)\s*```',
        r'(\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\})',
    ]
    for pattern in patterns:
        match = re.search(pattern, text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                continue
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        return None


def parse_structured_output(raw_text: str) -> Optional[MovieReview]:
    parsed_json = extract_json_from_response(raw_text)
    if parsed_json is None:
        return None
    try:
        return MovieReview.model_validate(parsed_json)
    except ValidationError as e:
        print(f"Validation failed: {e}")
        return None


async def call_llm_with_schema(prompt: str) -> Optional[MovieReview]:
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    formatted_prompt = STRUCTURED_PROMPT.format(
        schema=json.dumps(MOVIE_REVIEW_SCHEMA, ensure_ascii=False, indent=2),
        review=prompt
    )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": formatted_prompt}],
        temperature=0.1,
    )

    raw_text = response.choices[0].message.content or ""
    return parse_structured_output(raw_text)

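Limitations of JSON Schema validation

Problem 1: The LLM may return invalid JSON
  → extract_json_from_response is needed for tolerant extraction

Problem 2: The LLM may ignore schema constraints
  → rating comes back as "9分" ("9 points") instead of 9
  → sentiment comes back as "很積極" ("very positive") instead of positive

Problem 3: Nested structures are error-prone
  → list lengths are not under control
  → optional fields may be missing

Problem 4: The prompt has to be written by hand every time
  → high maintenance cost, easy to miss something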
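
One more limitation is worth noting for Problem 1: the brace-matching regex in `extract_json_from_response` only handles one level of nesting. A possible alternative, sketched below with the standard library only, is to let `json.JSONDecoder.raw_decode` find where an object ends. `find_first_json_object` is a hypothetical helper name, not part of the original code.

```python
import json
from typing import Optional


def find_first_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This "{" did not start a valid object; try the next one.
        start = text.find("{", start + 1)
    return None
```

Because the real JSON parser decides where the object ends, arbitrarily deep nesting and braces inside string values are handled without extra regex work.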

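Pattern 2: OpenAI function calling

OpenAI's function calling protocol is a widely used approach to structured output: a dedicated API channel has the model emit JSON that matches a function's parameter schema.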

import json
from typing import Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI


class SentimentAnalysis(BaseModel):
    text: str = Field(description="The text being analyzed")
    sentiment: str = Field(description="Sentiment: positive/negative/neutral")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence from 0 to 1")
    keywords: list[str] = Field(description="List of keywords")
    language: str = Field(description="Detected language")


class EntityExtraction(BaseModel):
    entities: list[dict] = Field(description="List of extracted entities")
    relationships: list[dict] = Field(default_factory=list, description="Relationships between entities")
    summary: str = Field(description="Summary of the text")


def pydantic_to_function_schema(model_class: type[BaseModel]) -> dict:
    # Pass the full schema through: copying only "properties" and "required"
    # drops "$defs", which leaves the "$ref"s of nested models unresolved.
    schema = model_class.model_json_schema()
    return {
        "type": "function",
        "function": {
            "name": model_class.__name__,
            "description": model_class.__doc__ or f"Extract {model_class.__name__}",
            "parameters": schema,
        }
    }


async def function_calling_extract(
    text: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o"
) -> Optional[BaseModel]:
    client = AsyncOpenAI()

    function_schema = pydantic_to_function_schema(model_class)

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are a precise data extraction assistant. Use the provided function to structure the output."
            },
            {
                "role": "user",
                "content": text
            }
        ],
        tools=[function_schema],
        tool_choice={"type": "function", "function": {"name": model_class.__name__}},
    )

    message = response.choices[0].message

    if message.tool_calls and len(message.tool_calls) > 0:
        tool_call = message.tool_calls[0]
        try:
            args = json.loads(tool_call.function.arguments)
            return model_class.model_validate(args)
        except ValueError as e:  # covers json.JSONDecodeError and pydantic's ValidationError
            print(f"Failed to parse function call result: {e}")
            return None

    return None


async def multi_function_calling(
    text: str,
    model_classes: list[type[BaseModel]],
    model: str = "gpt-4o"
) -> dict[str, BaseModel]:
    client = AsyncOpenAI()

    tool_schemas = [pydantic_to_function_schema(cls) for cls in model_classes]

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a multi-task data extraction assistant."},
            {"role": "user", "content": text}
        ],
        tools=tool_schemas,
        tool_choice="auto",
    )

    results = {}
    message = response.choices[0].message

    if message.tool_calls:
        for tool_call in message.tool_calls:
            for cls in model_classes:
                if tool_call.function.name == cls.__name__:
                    try:
                        args = json.loads(tool_call.function.arguments)
                        results[cls.__name__] = cls.model_validate(args)
                    except Exception as e:
                        print(f"Failed to parse {cls.__name__}: {e}")

    return results

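Strict Mode with response_format

import json
from typing import Optional

from openai import AsyncOpenAI
from pydantic import BaseModel


async def strict_structured_output(
    text: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o-2024-08-06"
) -> Optional[BaseModel]:
    client = AsyncOpenAI()

    # Strict mode rejects a schema unless every object sets
    # additionalProperties to false and lists all of its properties as required.
    # Only the top-level object is patched here; nested models need the same
    # treatment (for example model_config = ConfigDict(extra="forbid")).
    schema = model_class.model_json_schema()
    schema["additionalProperties"] = False
    schema["required"] = list(schema.get("properties", {}))

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract structured data"},
            {"role": "user", "content": text}
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": model_class.__name__,
                "strict": True,
                "schema": schema,
            }
        }
    )

    raw = response.choices[0].message.content
    if raw:
        try:
            return model_class.model_validate(json.loads(raw))
        except ValueError as e:  # bad JSON or failed Pydantic validation
            print(f"Failed to parse strict mode output: {e}")
    return None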
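
Strict mode is picky about the schema it accepts. The sketch below is one way to prepare a schema generated by `model_json_schema()`, assuming the two requirements OpenAI documents for strict mode: every object sets `additionalProperties` to false and lists all of its properties as required. `make_strict_schema` is a hypothetical helper, and it does not cover unsupported keywords.

```python
import copy


def make_strict_schema(schema: dict) -> dict:
    """Return a copy of schema in which every object is closed and fully required."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            # Recurse into properties, $defs, items, anyOf, and so on.
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(strict)
    return strict
```

Marking every property as required changes the meaning of fields that had defaults, so fields that are truly optional are better modeled as nullable (for example `str | None`) before the schema is generated.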

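Pattern 3: Automatic retries with Instructor

Instructor is a popular Python library for structured LLM output: it generates the schema from a Pydantic model, validates the output, and retries automatically when validation fails.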

import instructor
from pydantic import BaseModel, Field, field_validator
from openai import AsyncOpenAI


class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(gt=0, description="Price, must be greater than 0")
    category: str = Field(description="Product category")
    features: list[str] = Field(description="List of product features", min_length=1, max_length=5)
    in_stock: bool = Field(description="Whether the product is in stock")

    @field_validator("price")
    @classmethod
    def round_price(cls, v: float) -> float:
        return round(v, 2)

    @field_validator("category")
    @classmethod
    def normalize_category(cls, v: str) -> str:
        return v.strip().lower()


class ArticleMetadata(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author")
    publish_date: str = Field(description="Publication date in YYYY-MM-DD format")
    tags: list[str] = Field(description="List of tags")
    word_count: int = Field(ge=0, description="Word count")
    reading_time_minutes: int = Field(ge=1, description="Estimated reading time in minutes")

    @field_validator("publish_date")
    @classmethod
    def validate_date_format(cls, v: str) -> str:
        import re
        if not re.match(r'^\d{4}-\d{2}-\d{2}$', v):
            raise ValueError(f"Invalid date format: {v}, expected YYYY-MM-DD")
        return v


async def instructor_extract_product(text: str) -> ProductInfo:
    client = instructor.from_openai(AsyncOpenAI())

    result = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract the product information from the following text:\n\n{text}"}
        ],
        response_model=ProductInfo,
        max_retries=3,
        temperature=0.1,
    )

    return result


async def instructor_extract_with_mode(
    text: str,
    mode: instructor.Mode = instructor.Mode.TOOLS
) -> ArticleMetadata:
    client = instructor.from_openai(AsyncOpenAI(), mode=mode)

    result = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract the article metadata:\n\n{text}"}
        ],
        response_model=ArticleMetadata,
        max_retries=3,
    )

    return result


async def instructor_partial_streaming(text: str):
    client = instructor.from_openai(AsyncOpenAI())

    # With an async client, create_partial returns an async generator,
    # so iterate over it directly instead of awaiting it.
    article_stream = client.chat.completions.create_partial(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Extract the article metadata:\n\n{text}"}
        ],
        response_model=ArticleMetadata,
        max_retries=3,
    )

    async for partial in article_stream:
        print(f"Partial result: {partial.model_dump_json(exclude_none=True)}")

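Choosing an Instructor Mode

Mode.JSON_SCHEMA    → schema passed through response_format (most reliable where the provider enforces it; check how your Instructor version maps this mode)
Mode.TOOLS          → OpenAI function calling (broad compatibility)
Mode.JSON           → JSON requested in the prompt (most portable, least reliable)
Mode.ANTHROPIC_TOOLS→ Anthropic tool_use
Mode.GEMINI_JSON    → Gemini JSON mode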

Instructor retry strategies in detail

import instructor
from pydantic import BaseModel, Field
from openai import AsyncOpenAI
from tenacity import AsyncRetrying, stop_after_attempt, wait_exponential


class StrictUser(BaseModel):
    name: str = Field(min_length=2, max_length=50)
    age: int = Field(ge=0, le=150)
    email: str = Field(pattern=r'^[\w\.-]+@[\w\.-]+\.\w+$')


async def instructor_with_custom_retry(text: str) -> StrictUser:
    client = instructor.from_openai(
        AsyncOpenAI(),
        mode=instructor.Mode.JSON_SCHEMA,
    )

    # max_retries accepts either an int or a tenacity retry object;
    # the latter adds exponential backoff between attempts.
    result, completion = await client.chat.completions.create_with_completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
        response_model=StrictUser,
        max_retries=AsyncRetrying(
            stop=stop_after_attempt(3),
            wait=wait_exponential(multiplier=1, min=1, max=10),
        ),
        validation_context={"strict": True},
    )

    print(f"Token usage: prompt={completion.usage.prompt_tokens}, "
          f"completion={completion.usage.completion_tokens}")

    return result


async def instructor_batch_extract(
    texts: list[str],
) -> list[StrictUser | None]:  # requires Python 3.10+ for the "|" syntax
    client = instructor.from_openai(AsyncOpenAI())

    # One entry per input text; None marks an item that failed after all retries.
    results: list[StrictUser | None] = []
    for text in texts:
        try:
            result = await client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": text}],
                response_model=StrictUser,
                max_retries=2,
            )
            results.append(result)
        except instructor.exceptions.InstructorRetryException as e:
            print(f"Batch extraction failed, recording None: {e}")
            results.append(None)

    return results
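
Conceptually, a validation-driven retry sends the validation error back to the model so the next attempt can correct it. The sketch below illustrates that idea with the standard library only; it is not Instructor's actual implementation. `call_llm`, `validate_user`, and `extract_with_feedback` are hypothetical names, and `validate_user` stands in for Pydantic validation.

```python
import json
from typing import Callable

Message = dict[str, str]


def validate_user(payload) -> dict:
    """Hypothetical stand-in for Pydantic validation of a user record."""
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    age = payload.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    if not 0 <= age <= 150:
        raise ValueError("age must be between 0 and 150")
    return payload


def extract_with_feedback(
    call_llm: Callable[[list[Message]], str],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, and on failure re-ask with the error appended."""
    messages: list[Message] = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(messages)
        try:
            return validate_user(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            # Show the model its own answer and what was wrong with it.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Validation failed: {exc}. Please return corrected JSON.",
            })
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")
```

Each retry grows the conversation by two messages, which is why retries cost more tokens than the first attempt and why `max_retries` should stay small.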

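Pattern 4: Multi-model structured output adapters

Each LLM vendor exposes a different structured output API, so an adapter layer is needed to handle them uniformly.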

import json
from abc import ABC, abstractmethod
from typing import Optional, TypeVar
from pydantic import BaseModel
from openai import AsyncOpenAI

T = TypeVar("T", bound=BaseModel)


class StructuredOutputAdapter(ABC):
    @abstractmethod
    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        pass


class OpenAIStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "gpt-4o"):
        self.client = AsyncOpenAI()
        self.model = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import instructor
        client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)

        return await client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": text}],
            response_model=model_class,
            max_retries=3,
        )


class AnthropicStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        try:
            import anthropic
            self.client = anthropic.AsyncAnthropic()
        except ImportError as exc:
            raise ImportError("Please install anthropic: pip install anthropic") from exc
        self.model = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import anthropic
        import instructor

        client = instructor.from_anthropic(
            self.client,
            mode=instructor.Mode.ANTHROPIC_TOOLS,
        )

        return await client.chat.completions.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": text}],
            response_model=model_class,
            max_retries=3,
        )


class GeminiStructuredAdapter(StructuredOutputAdapter):
    def __init__(self, model: str = "gemini-2.0-flash"):
        self.model_name = model

    async def extract(self, text: str, model_class: type[T]) -> Optional[T]:
        import google.generativeai as genai
        import os

        genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))
        model = genai.GenerativeModel(self.model_name)

        schema = model_class.model_json_schema()
        prompt = f"""Extract structured data from the text below.
Return the result strictly according to the JSON Schema, with no extra content.

JSON Schema:
{json.dumps(schema, ensure_ascii=False, indent=2)}

Text:
{text}"""

        response = await model.generate_content_async(prompt)
        raw = response.text.strip()

        if raw.startswith("```json"):
            raw = raw[7:]
        if raw.endswith("```"):
            raw = raw[:-3]
        raw = raw.strip()

        try:
            return model_class.model_validate(json.loads(raw))
        except Exception as e:
            print(f"Failed to parse Gemini output: {e}")
            return None


class MultiModelStructuredExtractor:
    def __init__(self):
        self.adapters: dict[str, StructuredOutputAdapter] = {}

    def register(self, name: str, adapter: StructuredOutputAdapter):
        self.adapters[name] = adapter

    async def extract(
        self,
        text: str,
        model_class: type[T],
        preferred: str = "openai",
        fallback: bool = True,
    ) -> Optional[T]:
        order = [preferred]
        if fallback:
            order.extend([k for k in self.adapters if k != preferred])

        for model_name in order:
            adapter = self.adapters.get(model_name)
            if adapter is None:
                continue
            try:
                result = await adapter.extract(text, model_class)
                if result is not None:
                    return result
            except Exception as e:
                print(f"[{model_name}] extraction failed: {e}")
                continue

        return None

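    async def extract_consensus(
        self,
        text: str,
        model_class: type[T],
        min_agreement: int = 2,
    ) -> Optional[T]:
        import asyncio
        from collections import Counter

        names = list(self.adapters)
        results = await asyncio.gather(
            *(self.adapters[name].extract(text, model_class) for name in names),
            return_exceptions=True,
        )

        valid_results = []
        for name, result in zip(names, results):
            if isinstance(result, BaseException):
                print(f"[{name}] error: {result}")
                continue
            if result is not None:
                valid_results.append(result)

        if not valid_results:
            return None

        # Group identical outputs by their serialized form and pick the most common one.
        votes = Counter(r.model_dump_json() for r in valid_results)
        winner_json, count = votes.most_common(1)[0]
        if count >= min_agreement:
            return next(r for r in valid_results if r.model_dump_json() == winner_json)

        # No answer reached min_agreement: fall back to the first valid result.
        return valid_results[0]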

Multi-model adapter architecture

                    ┌─────────────────────┐
                    │  MultiModelExtractor │
                    │ (unified interface) │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
    ┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
    │ OpenAI Adapter │ │Anthropic Adp │ │ Gemini Adptr │
    │ JSON_SCHEMA    │ │ANTHROPIC_TOOLS│ │ Prompt+Parse │
    │ Instructor     │ │ Instructor   │ │ manual parse │
    └────────────────┘ └──────────────┘ └──────────────┘

Pattern 5: Streaming structured output

Combining structured output with streaming enables real-time parsing and progressive rendering.

import json
import asyncio
from typing import AsyncIterator, Optional
from pydantic import BaseModel, Field
from openai import AsyncOpenAI


class StreamingJsonParser:
    def __init__(self):
        self.buffer = ""
        self.depth = 0
        self.in_string = False
        self.escape_next = False
        self.started = False

    def feed(self, chunk: str) -> list[dict]:
        # Consume the chunk one character at a time so the buffer never holds
        # text before the opening "{" or after the closing "}" of an object.
        results = []

        for char in chunk:
            if not self.started:
                if char != '{':
                    continue  # skip anything before a top-level object
                self.started = True
                self.depth = 1
                self.buffer = char
                continue

            self.buffer += char

            if self.escape_next:
                self.escape_next = False
                continue

            if self.in_string:
                if char == '\\':
                    self.escape_next = True
                elif char == '"':
                    self.in_string = False
                continue

            if char == '"':
                self.in_string = True
            elif char == '{':
                self.depth += 1
            elif char == '}':
                self.depth -= 1
                if self.depth == 0:
                    try:
                        results.append(json.loads(self.buffer))
                    except json.JSONDecodeError:
                        pass  # malformed object: drop it and wait for the next one
                    self.buffer = ""
                    self.started = False

        return results


class PartialModelBuilder:
    def __init__(self, model_class: type[BaseModel]):
        self.model_class = model_class
        self.current_json = {}
        self.last_valid = None

    def update(self, json_data: dict) -> Optional[BaseModel]:
        self.current_json.update(json_data)
        try:
            self.last_valid = self.model_class.model_validate(self.current_json)
            return self.last_valid
        except Exception:
            return self.last_valid


class StreamingEvent(BaseModel):
    event_type: str = Field(description="Event type")
    data: dict = Field(description="Event data")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence")


async def stream_structured_output(
    prompt: str,
    model_class: type[BaseModel],
    model: str = "gpt-4o",
) -> AsyncIterator[BaseModel]:
    import instructor

    client = instructor.from_openai(AsyncOpenAI())

    # create_partial on an async client is an async generator:
    # iterate over it directly, do not await it.
    stream = client.chat.completions.create_partial(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_model=model_class,
        max_retries=2,
    )

    async for partial in stream:
        yield partial


async def stream_with_raw_parser(
    prompt: str,
    model: str = "gpt-4o",
) -> AsyncIterator[dict]:
    client = AsyncOpenAI()

    schema_prompt = f"""Return the result as JSON. Return only JSON and nothing else.
{prompt}"""

    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": schema_prompt}],
        stream=True,
        temperature=0.1,
    )

    parser = StreamingJsonParser()
    full_content = ""
    yielded_any = False

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_content += content
            for result in parser.feed(content):
                yielded_any = True
                yield result

    # Fall back to parsing the whole response only if the incremental
    # parser never produced an object (this also covers an empty stream).
    if not yielded_any and full_content:
        try:
            yield json.loads(full_content)
        except json.JSONDecodeError:
            pass


async def stream_sse_structured(
    prompt: str,
    model_class: type[BaseModel],
):
    # Call this from a FastAPI route handler and return its result.
    from fastapi.responses import StreamingResponse

    async def generate():
        async for partial in stream_structured_output(prompt, model_class):
            data = partial.model_dump_json(exclude_none=True)
            yield f"data: {data}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},  # ask nginx not to buffer the stream
    )

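Streaming structured output architecture

User request
    │
    ▼
FastAPI SSE endpoint
    │
    ▼
Instructor create_partial()
    │
    ├──→ chunk1: {"name": "Product A"...}
    ├──→ chunk2: {"name": "Product A", "price": 99...}
    ├──→ chunk3: {"name": "Product A", "price": 99.0, "category": "electronics"...}
    └──→ final: the complete Pydantic model instance

Each chunk is pushed to the client over SSE
The client renders the UI progressively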
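
The wire format in this flow is simple enough to show end to end. The sketch below frames each partial result as a Server-Sent Events message and parses it back on the receiving side, using only the standard library. `to_sse` and `parse_sse` are hypothetical helpers, and the parser handles only `data:` lines, which is all the server code in this pattern emits.

```python
import json
from typing import Iterable, Iterator


def to_sse(payload: dict) -> str:
    """Frame one partial result as an SSE message (a data line plus a blank line)."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded payload per SSE message from an iterable of lines."""
    data_lines: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            # A blank line terminates the current message.
            yield json.loads("\n".join(data_lines))
            data_lines = []
```

Since every message carries the full partial object so far, a client can simply replace its current state with the latest payload instead of merging deltas.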

Pattern 6: Production-grade reliability

This pattern combines the previous ones into a production-ready structured output service.

import json
import time
import logging
from typing import Optional
from dataclasses import dataclass, field
from enum import Enum
from pydantic import BaseModel, Field
from openai import AsyncOpenAI

logger = logging.getLogger(__name__)


class OutputStatus(str, Enum):
    SUCCESS = "success"
    VALIDATION_FAILED = "validation_failed"
    PARSE_FAILED = "parse_failed"
    LLM_ERROR = "llm_error"
    TIMEOUT = "timeout"
    RETRY_EXHAUSTED = "retry_exhausted"


@dataclass
class ExtractionResult:
    data: Optional[BaseModel] = None
    status: OutputStatus = OutputStatus.SUCCESS
    attempts: int = 0
    latency_ms: float = 0.0
    error_message: str = ""
    model_used: str = ""
    tokens_used: dict = field(default_factory=dict)


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time: Optional[float] = None
        self.is_open = False

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.is_open = True

    def record_success(self):
        self.failure_count = 0
        self.is_open = False

    def can_execute(self) -> bool:
        if not self.is_open:
            return True
        if self.last_failure_time and \
           time.time() - self.last_failure_time > self.recovery_timeout:
            self.is_open = False
            self.failure_count = 0
            return True
        return False


class StructuredOutputService:
    def __init__(
        self,
        max_retries: int = 3,
        timeout: float = 30.0,
        fallback_models: Optional[list[str]] = None,
    ):
        self.client = AsyncOpenAI()
        self.max_retries = max_retries
        self.timeout = timeout
        self.fallback_models = fallback_models or ["gpt-4o", "gpt-4o-mini"]
        self.circuit_breakers: dict[str, CircuitBreaker] = {}

    def _get_breaker(self, model: str) -> CircuitBreaker:
        if model not in self.circuit_breakers:
            self.circuit_breakers[model] = CircuitBreaker()
        return self.circuit_breakers[model]

    async def extract(
        self,
        text: str,
        model_class: type[BaseModel],
        preferred_model: Optional[str] = None,
    ) -> ExtractionResult:
        import instructor

        models = [preferred_model] if preferred_model else self.fallback_models
        models = [m for m in models if self._get_breaker(m).can_execute()]

        if not models:
            return ExtractionResult(
                status=OutputStatus.RETRY_EXHAUSTED,
                error_message="Circuit breakers are open for all models",
            )

        for model in models:
            result = await self._try_extract(text, model_class, model)
            if result.status == OutputStatus.SUCCESS:
                self._get_breaker(model).record_success()
                return result
            else:
                self._get_breaker(model).record_failure()
                logger.warning(f"Extraction failed on model {model}: {result.error_message}")

        return result

    async def _try_extract(
        self,
        text: str,
        model_class: type[BaseModel],
        model: str,
    ) -> ExtractionResult:
        import instructor

        start_time = time.time()
        client = instructor.from_openai(self.client, mode=instructor.Mode.JSON_SCHEMA)

        for attempt in range(1, self.max_retries + 1):
            try:
                result = await client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": text}],
                    response_model=model_class,
                    max_retries=0,
                    timeout=self.timeout,
                )

                latency = (time.time() - start_time) * 1000
                return ExtractionResult(
                    data=result,
                    status=OutputStatus.SUCCESS,
                    attempts=attempt,
                    latency_ms=latency,
                    model_used=model,
                )

            except instructor.exceptions.InstructorRetryException as e:
                logger.warning(f"Attempt {attempt} failed validation: {e}")
                continue

            except Exception as e:
                error_msg = str(e)
                if "timeout" in error_msg.lower():
                    return ExtractionResult(
                        status=OutputStatus.TIMEOUT,
                        attempts=attempt,
                        latency_ms=(time.time() - start_time) * 1000,
                        error_message=error_msg,
                        model_used=model,
                    )
                logger.error(f"Attempt {attempt} hit an LLM error: {e}")
                continue

        return ExtractionResult(
            status=OutputStatus.RETRY_EXHAUSTED,
            attempts=self.max_retries,
            latency_ms=(time.time() - start_time) * 1000,
            error_message=f"Still failing after {self.max_retries} attempts",
            model_used=model,
        )


class StructuredOutputCache:
    def __init__(self, ttl: float = 3600.0, max_size: int = 1000):
        self.ttl = ttl
        self.max_size = max_size
        self._cache: dict[str, tuple[float, BaseModel]] = {}

    def _make_key(self, text: str, model_class: type[BaseModel]) -> str:
        import hashlib
        content_hash = hashlib.sha256(text.encode()).hexdigest()[:16]
        return f"{model_class.__name__}:{content_hash}"

    def get(self, text: str, model_class: type[BaseModel]) -> Optional[BaseModel]:
        key = self._make_key(text, model_class)
        if key in self._cache:
            timestamp, data = self._cache[key]
            if time.time() - timestamp < self.ttl:
                return data
            del self._cache[key]
        return None

    def set(self, text: str, model_class: type[BaseModel], data: BaseModel):
        if len(self._cache) >= self.max_size:
            oldest_key = min(self._cache, key=lambda k: self._cache[k][0])
            del self._cache[oldest_key]

        key = self._make_key(text, model_class)
        self._cache[key] = (time.time(), data)

Production architecture overview

                    ┌────────────────────────────┐
                    │  StructuredOutputService    │
                    │  (single entry point)      │
                    └──────────┬─────────────────┘
                               │
                    ┌──────────▼─────────────────┐
                    │  CircuitBreaker             │
                    │  (per-model breaker)       │
                    └──────────┬─────────────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
    ┌─────────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐
    │  gpt-4o        │ │  gpt-4o-mini │ │  fallback    │
    │  JSON_SCHEMA   │ │  JSON_SCHEMA │ │  Prompt+Parse│
    │  +Instructor   │ │  +Instructor │ │  +retries    │
    └────────────────┘ └──────────────┘ └──────────────┘
              │                │                │
              └────────────────┼────────────────┘
                               │
                    ┌──────────▼─────────────────┐
                    │  StructuredOutputCache      │
                    │  (result cache)            │
                    └────────────────────────────┘
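
The diagram shows the cache next to the service, but the code above never connects the two. One possible wiring is sketched below: check the cache, call the extractor on a miss, and store only successful results. `TTLCache` is a simplified stand-in for `StructuredOutputCache`, and `cached_extract` and its `extract` parameter are hypothetical names; the LLM call is replaced by any async callable.

```python
import asyncio
import hashlib
import time
from typing import Awaitable, Callable, Optional


class TTLCache:
    """Tiny in-memory cache; a simplified stand-in for StructuredOutputCache."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at >= self.ttl:
            del self._store[key]  # expired
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (time.monotonic(), value)


async def cached_extract(
    text: str,
    schema_name: str,
    extract: Callable[[str], Awaitable[Optional[dict]]],
    cache: TTLCache,
) -> Optional[dict]:
    """Return a cached result if present, otherwise extract and cache it."""
    key = f"{schema_name}:{hashlib.sha256(text.encode()).hexdigest()}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = await extract(text)
    if result is not None:  # never cache failures
        cache.set(key, result)
    return result
```

Keying on the schema name as well as the text matters: the same input extracted into two different models must not share a cache entry.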

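5 common pitfalls and how to fix them

Pitfall 1: The LLM returns JSON with comments

import json
import re

# Match string literals first so that "//" or "/*" inside a string (a URL, for
# example) is left untouched; only the non-string matches are removed.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_json_comments(text: str) -> str:
    return _STRING_OR_COMMENT.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else '',
        text,
    )


raw = '''{
    "name": "Product A",  // this is a comment
    "price": 99.0
    /* multi-line
       comment */
}'''

clean = strip_json_comments(raw)
data = json.loads(clean)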

Pitfall 2: Nested schemas cause truncated output

from pydantic import BaseModel, Field


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class PersonFlat(BaseModel):
    name: str
    street: str = Field(description="Street address")
    city: str = Field(description="City")
    zip_code: str = Field(description="Postal code")


class PersonNested(BaseModel):
    name: str
    address: Address


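# Recommendation: keep nesting to at most 2 levels and flatten anything deeper
# Not recommended: Person → Address → GeoLocation → Coordinates
# Recommended: PersonFlat (all fields at the same level)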
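
Flattening the schema for the model does not force the rest of the codebase to give up the nested shape. A minimal sketch, assuming the flat output has already been parsed into a dict: regroup the address fields in plain Python after extraction. `nest_fields` is a hypothetical helper name.

```python
def nest_fields(flat: dict, group: str, keys: list[str]) -> dict:
    """Move the given keys of a flat dict under a single nested key."""
    nested = {k: v for k, v in flat.items() if k not in keys}
    nested[group] = {k: flat[k] for k in keys}
    return nested


flat_person = {
    "name": "Ann",
    "street": "1 Main St",
    "city": "Taipei",
    "zip_code": "100",
}
# The result has the same shape as PersonNested: {"name": ..., "address": {...}}
nested_person = nest_fields(flat_person, "address", ["street", "city", "zip_code"])
```

The resulting dict can then be validated against the nested model, so the model sees the simple schema while downstream code keeps the richer one.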

Pitfall 3: The LLM ignores enum values

from enum import Enum
from pydantic import BaseModel, Field, field_validator


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class ReviewWithEnum(BaseModel):
    text: str
    sentiment: Sentiment

    @field_validator("sentiment", mode="before")
    @classmethod
    def normalize_sentiment(cls, v):
        if isinstance(v, str):
            v = v.strip().lower()
            mapping = {
                "積極": "positive", "正面": "positive", "好": "positive",
                "消極": "negative", "負面": "negative", "差": "negative",
                "中性": "neutral", "一般": "neutral",
            }
            return mapping.get(v, v)
        return v

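Pitfall 4: List length is not under control

from pydantic import BaseModel, Field, field_validator


class TaggedContent(BaseModel):
    content: str
    tags: list[str] = Field(min_length=1, max_length=5)

    # mode="before" runs ahead of the max_length check, so an over-long list
    # is deduplicated and truncated instead of being rejected outright.
    @field_validator("tags", mode="before")
    @classmethod
    def deduplicate_tags(cls, v):
        if not isinstance(v, list) or not all(isinstance(t, str) for t in v):
            return v  # let Pydantic report the type error
        seen = set()
        result = []
        for tag in v:
            normalized = tag.strip().lower()
            if normalized not in seen:
                seen.add(normalized)
                result.append(tag.strip())
        return result[:5]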

坑5：Strict Mode不支援所有Schema特性

from pydantic import BaseModel, Field


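# Strict Mode restrictions:
# 1. additionalProperties: false must be set explicitly
# 2. Optional fields must have a default value
# 3. Union types are not supported (on some models)
# 4. Complex regex patterns are not supported

# Workaround: simplify the schema and compensate with field_validator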

class SimpleProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    category: str = Field(default="other")
    tags: list[str] = Field(default_factory=list)
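The first and fourth restrictions can be applied to a schema mechanically. The following is a minimal sketch, not an official API: `harden_schema` and `drop_keywords` are hypothetical names, and the exact set of keywords a given provider rejects is an assumption you should check against its documentation. The function returns a copy of a JSON Schema dict with `additionalProperties: false` on every object and the listed keywords removed. Any constraint dropped here still needs to be enforced afterwards, for example with a `field_validator`.

```python
from typing import Any


def harden_schema(
    schema: dict[str, Any],
    drop_keywords: tuple[str, ...] = ("pattern",),
) -> dict[str, Any]:
    """Return a copy of `schema` adjusted for strict-mode style restrictions."""

    def visit(node: Any, is_name_map: bool = False) -> Any:
        if isinstance(node, list):
            return [visit(item) for item in node]
        if not isinstance(node, dict):
            return node
        if is_name_map:
            # Keys here are property/definition names, not schema keywords,
            # so a property that happens to be called "pattern" is kept.
            return {name: visit(sub) for name, sub in node.items()}

        out: dict[str, Any] = {}
        for key, value in node.items():
            if key in drop_keywords:
                continue  # assumed unsupported; re-check in a validator
            out[key] = visit(value, is_name_map=key in ("properties", "$defs"))
        if out.get("type") == "object":
            out["additionalProperties"] = False
        return out

    return visit(schema)


example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "pattern": "^[A-Z].*$"},
        "pattern": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
strict_schema = harden_schema(example_schema)
```

Because the function builds new dicts instead of mutating its input, the original schema stays available for local validation with the full set of constraints.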

Troubleshooting 10 Common Errors

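#	Error message	Cause	Fix
1	`json.decoder.JSONDecodeError`	The LLM returned invalid JSON	Use `extract_json_from_response` for fault-tolerant extraction
2	`ValidationError: field required`	The LLM omitted a required field	Add a default value or use Instructor's automatic retries
3	`InstructorRetryException: max retries`	Validation still fails after 3 retries	Check whether the schema is too complex; simplify nesting
4	`TypeError: 'NoneType' object`	tool_calls is empty; the LLM did not call the function	Check the tool_choice setting and confirm the model supports function calling
5	`RateLimitError: 429`	API rate limit exceeded	Add retries with exponential backoff and lower concurrency
6	`Timeout: request timed out`	LLM inference timed out	Reduce schema complexity and raise the timeout parameter
7	`BadRequestError: Invalid schema`	The schema does not meet the model's requirements	Check strict mode restrictions; simplify the schema
8	`ValidationError: string too long`	The LLM returned an overly long string	State the `max_length` limit in the field description or prompt so the model sees it
9	`KeyError: 'tool_calls'`	The model does not support function calling	Switch to JSON Schema mode or prompt mode
10	`RecursionError: maximum depth`	The schema is nested too deeply	Flatten the nested structure to at most 2 levels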
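Row 5 recommends exponential backoff. A minimal stdlib sketch of that idea follows; `call_with_backoff` and its parameters are hypothetical names, and the delay values are placeholders rather than tuned recommendations. It retries a callable on the given exception types, doubling the delay cap on each attempt and sleeping a random amount below the cap (often called "full jitter") so that parallel workers do not retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying with exponential backoff and jitter on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap grows 1x, 2x, 4x, ... of base_delay, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))

    raise AssertionError("unreachable")


# Simulated flaky call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


recorded_delays: list[float] = []
result = call_with_backoff(
    flaky_call, retry_on=(TimeoutError,), sleep=recorded_delays.append
)
```

In real use, `retry_on` would name the SDK's transient errors (for example the rate-limit exception of the client library in use), and `sleep` would be left at its default. Injecting `sleep` here only keeps the demonstration instant.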

Advanced Optimization Techniques

Technique 1: Few-Shot Examples Improve Accuracy

from pydantic import BaseModel, Field
from openai import AsyncOpenAI
import instructor


class Classification(BaseModel):
    category: str = Field(description="Category")
    confidence: float = Field(ge=0.0, le=1.0)


async def few_shot_extract(text: str) -> Classification:
    client = instructor.from_openai(AsyncOpenAI())

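    # Each user/assistant pair shows the model one input and the exact JSON expected.
    examples = [
        {"role": "user", "content": "This product is fantastic, highly recommended!"},
        {"role": "assistant", "content": '{"category": "positive", "confidence": 0.95}'},
        {"role": "user", "content": "Average quality, and a bit overpriced"},
        {"role": "assistant", "content": '{"category": "neutral", "confidence": 0.7}'},
    ]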

    return await client.chat.completions.create(
        model="gpt-4o",
        messages=examples + [{"role": "user", "content": text}],
        response_model=Classification,
        max_retries=2,
    )
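Hand-written JSON strings in few-shot examples are easy to get subtly wrong (a trailing comma, a single quote), which would teach the model the very mistakes this article is trying to avoid. One possible approach, sketched below with a hypothetical helper `build_few_shot_messages`, is to keep the examples as Python dicts and serialize them with `json.dumps`, so every assistant turn is guaranteed to be valid JSON.

```python
import json
from typing import Any


def build_few_shot_messages(
    examples: list[tuple[str, dict[str, Any]]],
    text: str,
) -> list[dict[str, str]]:
    """Turn (input, expected_output) pairs plus a new input into chat messages."""
    messages: list[dict[str, str]] = []
    for user_text, expected in examples:
        messages.append({"role": "user", "content": user_text})
        # ensure_ascii=False keeps non-ASCII text readable for the model.
        messages.append(
            {"role": "assistant", "content": json.dumps(expected, ensure_ascii=False)}
        )
    # The real query always goes last.
    messages.append({"role": "user", "content": text})
    return messages


few_shot_pairs = [
    ("This product is fantastic!", {"category": "positive", "confidence": 0.95}),
    ("Average quality, a bit overpriced", {"category": "neutral", "confidence": 0.7}),
]
few_shot_messages = build_few_shot_messages(few_shot_pairs, "Arrived broken.")
```

The resulting list can be passed as the `messages` argument in place of `examples + [...]`.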

Technique 2: Optimize Schema Descriptions

from pydantic import BaseModel, Field


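# Vague: the model has to guess what each field should contain.
class BadSchema(BaseModel):
    type: str
    value: str


# Specific: each description states the allowed values and the expected format.
class GoodSchema(BaseModel):
    type: str = Field(
        description="Entity type; must be one of: person, organization, location, date"
    )
    value: str = Field(
        description="Normalized entity value. Use the full name for person, "
                    "the official name for organization, YYYY-MM-DD for date, "
                    "and city + country for location"
    )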

Technique 3: Extract Complex Structures Step by Step

from pydantic import BaseModel, Field
from openai import AsyncOpenAI
import instructor


class BasicInfo(BaseModel):
    title: str
    summary: str


class DetailedInfo(BasicInfo):
    key_points: list[str]
    entities: list[str]
    sentiment: str


async def progressive_extract(text: str) -> DetailedInfo:
    client = instructor.from_openai(AsyncOpenAI())

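    # Step 1: extract the small, easy structure first.
    basic = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract the basic information:\n{text}"}],
        response_model=BasicInfo,
        max_retries=2,
    )

    # Step 2: feed the step 1 result back in as context for the detailed extraction.
    detailed = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Using the basic information below, extract a detailed analysis:\n"
                                        f"Title: {basic.title}\nSummary: {basic.summary}\n\nSource text:\n{text}"}
        ],
        response_model=DetailedInfo,
        max_retries=2,
    )

    return detailed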

Technique 4: Self-Check Output Quality

from pydantic import BaseModel, Field, model_validator


class SelfValidatingOutput(BaseModel):
    question: str
    answer: str
    sources: list[str] = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0)

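    # Cross-field checks run after every individual field has been validated.
    @model_validator(mode="after")
    def check_answer_quality(self):
        if len(self.answer) < 10:
            raise ValueError("Answer is too short and may be incomplete")
        if self.confidence > 0.9 and len(self.sources) < 2:
            raise ValueError("High confidence but too few sources; please re-verify")
        return self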

Comparison: 3 Structured Output Approaches

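Dimension	Prompt + JSON parsing	Function calling protocol	Instructor library
Reliability	⭐⭐ 60-80%	⭐⭐⭐⭐ 90-95%	⭐⭐⭐⭐⭐ 95-99%
Implementation complexity	Low	Medium	Low (once wrapped)
Model compatibility	All models	OpenAI / some models	OpenAI/Anthropic/Gemini
Automatic retries	❌ Manual	❌ Manual	✅ Built in
Streaming support	❌ Difficult	⚠️ Limited	✅ create_partial
Schema validation	❌ Manual	⚠️ Partial	✅ Automatic via Pydantic
Debugging difficulty	High	Medium	Low
Production recommendation	⭐ Not recommended	⭐⭐⭐ Recommended	⭐⭐⭐⭐⭐ Strongly recommended
Token overhead	Low	Medium (+ tool definitions)	Medium (+ tool definitions)
Nesting depth	Unlimited	Limited	Limited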

Selection Decision Tree

Is structured output needed?
  ├── No → Use Chat Completion directly
  └── Yes → Which models are used?
       ├── OpenAI only → Instructor + Mode.JSON_SCHEMA
       ├── OpenAI + Anthropic → Instructor + adapter pattern
       ├── Any model → Prompt + JSON parsing + strict validation
       └── Streaming required → Instructor + create_partial
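The tree can also be written as a small function, which makes the branch order explicit. This is only a sketch under stated assumptions: `choose_strategy` is a hypothetical helper, the provider names are plain lowercase strings chosen for illustration, and the streaming check is placed first because the tree does not say which branch wins when several apply.

```python
def choose_strategy(
    providers: set[str],
    needs_streaming: bool = False,
    needs_structure: bool = True,
) -> str:
    """Map requirements to one of the approaches named in the decision tree."""
    if not needs_structure:
        return "Chat Completion"
    # Assumption: streaming takes priority over the provider branches.
    if needs_streaming:
        return "Instructor + create_partial"
    if providers == {"openai"}:
        return "Instructor + Mode.JSON_SCHEMA"
    # Assumption: any non-empty mix limited to OpenAI and Anthropic.
    if providers and providers <= {"openai", "anthropic"}:
        return "Instructor + adapter pattern"
    # Everything else, including unknown or local models.
    return "Prompt + JSON parsing + strict validation"


openai_only = choose_strategy({"openai"})
mixed = choose_strategy({"openai", "anthropic"})
local_model = choose_strategy({"some-local-model"})
streaming = choose_strategy({"openai"}, needs_streaming=True)
```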

Recommended Online Tools

JSON formatting and validation: /zh-TW/json/format
JSONPath queries: /zh-TW/json/jsonpath
cURL to code: /zh-TW/dev/curl-to-code

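Summary: Structured LLM output in Python is core infrastructure for AI engineering. The 6 patterns run from simple to complex: JSON Schema constraints → function calling protocol → Instructor automatic retries → multi-model adaptation → streaming structured output → production reliability safeguards. For production, prefer the Instructor library combined with Pydantic validation and automatic retries, which reaches a reliability of 95% or higher. Key points: 1) keep nested schemas to at most 2 levels, 2) normalize enum values with field_validator, 3) protect downstream models with a circuit breaker, 4) cache to reduce repeated calls. For multi-model scenarios, unify the interface with the adapter pattern; for streaming scenarios, use create_partial for incremental output.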