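Next.js 15 Streaming AI Chat: 6 Production Patterns from SSE to React Server Actions
Why AI chat so often feels sluggish
You open an AI chat page, type a question, and stare at an empty message box for 8 seconds. The LLM finally returns a wall of text, but by then the user has assumed the page is broken. In the traditional request-response model, perceived latency = network latency + LLM time to first token + full generation time, which is why it feels slow. Streaming breaks the wait into tokens that are pushed as they are generated: perceived latency shrinks to roughly network latency + time to first token, so the first characters can appear within about 0.5 seconds and perceived latency can drop by around 80%.
Next.js 15 offers a complete toolchain for streaming AI chat, from low-level SSE to high-level React Server Actions. This article walks through six production patterns: SSE streaming responses, React Server Actions + AI, Vercel AI SDK integration, multi-model routing and fallback, conversation state and context management, and production deployment and performance optimization, covering everything from the protocol to go-live.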
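Key takeaways
- SSE is the best-fit transport for streamed LLM output, and Next.js Route Handlers support it natively
- React Server Actions let you call the model without hand-writing an API endpoint, with type safety and minimal boilerplate
- The Vercel AI SDK unifies the streaming interfaces of OpenAI, Anthropic, Google, and other LLM providers
- Multi-model routing gives you cost optimization and failover: when GPT-4o is down, traffic switches to Claude automatically
- Conversation state management has to separate short-term context from long-term memory to avoid token blow-up
- Production deployments have to handle concurrent connections, timeouts, backpressure, and error recovery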
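Table of contents
- Streaming AI chat architecture overview
- Pattern 1: SSE streaming responses
- Pattern 2: React Server Actions + AI
- Pattern 3: Vercel AI SDK integration
- Pattern 4: Multi-model routing and fallback
- Pattern 5: Conversation state and context management
- Pattern 6: Production deployment and performance optimization
- 5 common pitfalls and how to fix them
- Troubleshooting 10 common errors
- Advanced optimization techniques
- Comparison: SSE vs WebSocket vs long polling
- Recommended online tools
- Summary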
Streaming AI chat architecture overview
┌─────────────────────────────────────────────────────────────┐
│               Next.js 15 AI Chat Architecture               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────┐   SSE/Stream    ┌──────────────────────┐      │
│  │  Client  │ ◄────────────── │  Route Handler /     │      │
│  │  Chat UI │                 │  Server Action       │      │
│  │          │ ──────────────► │                      │      │
│  └──────────┘   POST request  └──────────┬───────────┘      │
│                                          │                  │
│                            ┌─────────────┼──────────┐       │
│                            │             │          │       │
│                            ▼             ▼          ▼       │
│                       ┌──────────┐ ┌──────────┐ ┌────────┐  │
│                       │  OpenAI  │ │ Anthropic│ │ Local  │  │
│                       │  GPT-4o  │ │  Claude  │ │ Ollama │  │
│                       └──────────┘ └──────────┘ └────────┘  │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Conversation State Layer               │    │
│  │  ┌─────────┐  ┌──────────┐  ┌──────────────────┐    │    │
│  │  │ Context │  │ History  │  │ Long-term Memory │    │    │
│  │  │ Window  │  │  Store   │  │   (Vector DB)    │    │    │
│  │  └─────────┘  └──────────┘  └──────────────────┘    │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
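Technology decision tree
Need streaming AI chat?
├── Rapid prototype → Vercel AI SDK (Pattern 3)
├── Full control → SSE + Route Handler (Pattern 1)
├── Type safety first → React Server Actions (Pattern 2)
├── Multiple models → Multi-model routing (Pattern 4)
└── Enterprise production → Combine all of them + state management (Pattern 5+6)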
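Pattern 1: SSE streaming responses
SSE (Server-Sent Events) is the de facto transport for streamed LLM output. Route Handlers in Next.js 15 support ReadableStream natively, so they can return an SSE-formatted streaming response directly.
Basic SSE Route Handler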
// app/api/chat/sse/route.ts
import { NextRequest } from 'next/server';
export const runtime = 'edge';
export const maxDuration = 60;
interface ChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
}
export async function POST(request: NextRequest) {
const { messages }: { messages: ChatMessage[] } = await request.json();
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
try {
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
model: 'gpt-4o',
messages,
stream: true,
}),
});
if (!response.ok) {
const errorData = await response.text();
controller.enqueue(
encoder.encode(`data: ${JSON.stringify({ error: errorData })}\n\n`)
);
controller.close();
return;
}
const reader = response.body!.getReader();
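const decoder = new TextDecoder();
// Carry incomplete lines across reads: an SSE event can be split between network chunks
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const parts = buffer.split('\n');
buffer = parts.pop() ?? '';
const lines = parts.filter((line) => line.startsWith('data: '));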
for (const line of lines) {
const data = line.slice(6);
if (data === '[DONE]') {
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
continue;
}
try {
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) {
controller.enqueue(
encoder.encode(`data: ${JSON.stringify({ content })}\n\n`)
);
}
} catch {
// skip malformed chunks
}
}
}
controller.close();
} catch (error) {
controller.enqueue(
encoder.encode(`data: ${JSON.stringify({ error: String(error) })}\n\n`)
);
controller.close();
}
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
Connection: 'keep-alive',
},
});
}
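Because an SSE event can arrive split across two network chunks, splitting each chunk on newlines in isolation can drop or corrupt events. One way to make the parsing reusable is a small line-buffering parser. The sketch below is a minimal illustration, not part of any library: `createSSEParser` and `SSEParser` are hypothetical names, and it assumes the server always writes `data: ` with a trailing space, as the handler above does.

```typescript
// Hypothetical helper: incremental parser for SSE "data: " lines that
// tolerates events split across chunk boundaries.
type SSEParser = {
  push(chunk: string): string[]; // returns the complete data payloads seen so far
};

function createSSEParser(): SSEParser {
  let buffer = '';
  return {
    push(chunk: string): string[] {
      buffer += chunk;
      const lines = buffer.split('\n');
      // The last element is either '' (the chunk ended on a newline) or an
      // incomplete line; keep it for the next push.
      buffer = lines.pop() ?? '';
      const payloads: string[] = [];
      for (const rawLine of lines) {
        // Tolerate CRLF line endings.
        const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
        if (line.startsWith('data: ')) {
          payloads.push(line.slice(6));
        }
      }
      return payloads;
    },
  };
}
```

The same parser instance can be fed the output of `decoder.decode(value, { stream: true })` on either the server or the client, so both sides share one parsing rule.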
Consuming SSE on the client
// components/ChatSSE.tsx
'use client';
import { useState, useRef, useCallback } from 'react';
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
}
export default function ChatSSE() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState('');
const [isStreaming, setIsStreaming] = useState(false);
const abortControllerRef = useRef<AbortController | null>(null);
const sendMessage = useCallback(async () => {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
id: crypto.randomUUID(),
role: 'user',
content: input.trim(),
};
const assistantMessage: Message = {
id: crypto.randomUUID(),
role: 'assistant',
content: '',
};
setMessages((prev) => [...prev, userMessage, assistantMessage]);
setInput('');
setIsStreaming(true);
const abortController = new AbortController();
abortControllerRef.current = abortController;
try {
const response = await fetch('/api/chat/sse', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [...messages, userMessage].map((m) => ({
role: m.role,
content: m.content,
})),
}),
signal: abortController.signal,
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const reader = response.body!.getReader();
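const decoder = new TextDecoder();
// Carry incomplete lines across reads: an SSE event can be split between network chunks
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const parts = buffer.split('\n');
buffer = parts.pop() ?? '';
const lines = parts.filter((line) => line.startsWith('data: '));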
for (const line of lines) {
const data = line.slice(6);
if (data === '[DONE]') continue;
try {
const parsed = JSON.parse(data);
if (parsed.error) {
console.error('SSE error:', parsed.error);
continue;
}
if (parsed.content) {
setMessages((prev) =>
prev.map((m) =>
m.id === assistantMessage.id
? { ...m, content: m.content + parsed.content }
: m
)
);
}
} catch {
// skip malformed data
}
}
}
} catch (error) {
if ((error as Error).name !== 'AbortError') {
console.error('Stream error:', error);
}
} finally {
setIsStreaming(false);
abortControllerRef.current = null;
}
}, [input, messages, isStreaming]);
const stopStreaming = useCallback(() => {
abortControllerRef.current?.abort();
}, []);
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto p-4">
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((msg) => (
<div
key={msg.id}
className={`p-3 rounded-lg ${
msg.role === 'user'
? 'bg-blue-600 text-white ml-auto max-w-[80%]'
: 'bg-gray-100 text-gray-900 mr-auto max-w-[80%]'
}`}
>
<pre className="whitespace-pre-wrap font-sans text-sm">{msg.content}</pre>
</div>
))}
</div>
<div className="flex gap-2">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && !e.shiftKey && sendMessage()}
placeholder="Type a message..."
className="flex-1 px-4 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
disabled={isStreaming}
/>
<button
onClick={isStreaming ? stopStreaming : sendMessage}
className={`px-4 py-2 rounded-lg font-medium ${
isStreaming
? 'bg-red-500 hover:bg-red-600 text-white'
: 'bg-blue-600 hover:bg-blue-700 text-white'
}`}
>
{isStreaming ? 'Stop' : 'Send'}
</button>
</div>
</div>
);
}
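Pattern 2: React Server Actions + AI
React Server Actions let you call the model without hand-writing an API endpoint: you define an async function in a file marked 'use server' and call it directly from a client component, with argument and return types shared end to end. The example uses createStreamableValue from the AI SDK to stream the result back to the caller.
Streaming Server Action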
// app/actions/streaming-chat-action.ts
'use server';
import { createStreamableValue } from 'ai/rsc';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
export async function streamingChatAction(messages: Array<{ role: 'user' | 'assistant'; content: string }>) {
const streamableValue = createStreamableValue('');
(async () => {
try {
const result = await streamText({
model: openai('gpt-4o'),
messages,
});
for await (const chunk of result.textStream) {
streamableValue.update(chunk);
}
streamableValue.done();
} catch (error) {
streamableValue.error(String(error));
}
})();
return streamableValue.value;
}
Consuming the streaming Server Action on the client
// components/ChatServerAction.tsx
'use client';
import { readStreamableValue } from 'ai/rsc';
import { streamingChatAction } from '@/app/actions/streaming-chat-action';
import { useState, useCallback } from 'react';
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
}
export default function ChatServerAction() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState('');
const [isStreaming, setIsStreaming] = useState(false);
const handleSubmit = useCallback(async () => {
if (!input.trim() || isStreaming) return;
const userMessage: Message = {
id: crypto.randomUUID(),
role: 'user',
content: input.trim(),
};
const assistantMessage: Message = {
id: crypto.randomUUID(),
role: 'assistant',
content: '',
};
const updatedMessages = [...messages, userMessage];
setMessages([...updatedMessages, assistantMessage]);
setInput('');
setIsStreaming(true);
try {
const streamValue = await streamingChatAction(
updatedMessages.map((m) => ({ role: m.role, content: m.content }))
);
for await (const chunk of readStreamableValue(streamValue)) {
if (chunk) {
setMessages((prev) =>
prev.map((m) =>
m.id === assistantMessage.id
? { ...m, content: m.content + chunk }
: m
)
);
}
}
} catch (error) {
console.error('Server Action error:', error);
} finally {
setIsStreaming(false);
}
}, [input, messages, isStreaming]);
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto p-4">
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((msg) => (
<div
key={msg.id}
className={`p-3 rounded-lg ${
msg.role === 'user'
? 'bg-blue-600 text-white ml-auto max-w-[80%]'
: 'bg-gray-100 text-gray-900 mr-auto max-w-[80%]'
}`}
>
<pre className="whitespace-pre-wrap font-sans text-sm">{msg.content}</pre>
</div>
))}
</div>
<div className="flex gap-2">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && !e.shiftKey && handleSubmit()}
placeholder="Type a message..."
className="flex-1 px-4 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
disabled={isStreaming}
/>
<button
onClick={handleSubmit}
disabled={isStreaming}
className="px-4 py-2 bg-blue-600 hover:bg-blue-700 text-white rounded-lg font-medium disabled:opacity-50"
>
{isStreaming ? 'Generating...' : 'Send'}
</button>
</div>
</div>
);
}
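Pattern 3: Vercel AI SDK integration
The Vercel AI SDK (the ai package) is Vercel's recommended way to build streaming AI chat on Next.js. It unifies the streaming interfaces of multiple LLM providers and ships ready-to-use hooks such as useChat. The examples in this article use the AI SDK 4.x API surface (maxTokens, toDataStreamResponse, ai/rsc), so the install command pins that major version.
Installation and setup
npm install ai@4 @ai-sdk/react@1 @ai-sdk/openai@1 @ai-sdk/anthropic@1 @ai-sdk/google@1
Route Handler (AI SDK version)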
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
export const runtime = 'edge';
export const maxDuration = 60;
export async function POST(request: Request) {
const { messages } = await request.json();
const result = streamText({
model: openai('gpt-4o'),
system: 'You are a helpful AI assistant. Answer concisely and accurately.',
messages,
maxTokens: 4096,
temperature: 0.7,
});
return result.toDataStreamResponse();
}
useChat Hook (minimal implementation)
// components/ChatAISDK.tsx
'use client';
import { useChat } from '@ai-sdk/react';
export default function ChatAISDK() {
const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
useChat({
api: '/api/chat',
onError: (error) => {
console.error('Chat error:', error);
},
onFinish: (message) => {
console.log('Finished:', message.content.length, 'chars');
},
});
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto p-4">
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((msg) => (
<div
key={msg.id}
className={`p-3 rounded-lg ${
msg.role === 'user'
? 'bg-blue-600 text-white ml-auto max-w-[80%]'
: 'bg-gray-100 text-gray-900 mr-auto max-w-[80%]'
}`}
>
<div className="whitespace-pre-wrap text-sm">{msg.content}</div>
</div>
))}
{isLoading && messages[messages.length - 1]?.role === 'user' && (
<div className="bg-gray-100 text-gray-900 mr-auto max-w-[80%] p-3 rounded-lg">
<div className="flex items-center gap-2 text-sm text-gray-500">
<div className="animate-pulse">●</div>
<div className="animate-pulse delay-75">●</div>
<div className="animate-pulse delay-150">●</div>
</div>
</div>
)}
</div>
<form onSubmit={handleSubmit} className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Type a message..."
className="flex-1 px-4 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
/>
<button
type={isLoading ? 'button' : 'submit'}
onClick={isLoading ? stop : undefined}
className={`px-4 py-2 rounded-lg font-medium ${
isLoading
? 'bg-red-500 hover:bg-red-600 text-white'
: 'bg-blue-600 hover:bg-blue-700 text-white'
}`}
>
{isLoading ? 'Stop' : 'Send'}
</button>
</form>
</div>
);
}
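Pattern 4: Multi-model routing and fallback
A production system cannot depend on a single LLM provider. Multi-model routing provides both cost optimization and failover: when GPT-4o is down, traffic switches to Claude automatically, and simple questions go to GPT-4o-mini to save money.
Model router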
// lib/ai/model-router.ts
import { LanguageModelV1 } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
type ModelTier = 'fast' | 'standard' | 'premium';
interface ModelConfig {
model: LanguageModelV1;
name: string;
tier: ModelTier;
maxRetries: number;
timeoutMs: number;
costPer1kTokens: number;
}
const MODEL_REGISTRY: Record<ModelTier, ModelConfig[]> = {
fast: [
{
model: openai('gpt-4o-mini'),
name: 'gpt-4o-mini',
tier: 'fast',
maxRetries: 2,
timeoutMs: 15000,
costPer1kTokens: 0.00015,
},
{
model: anthropic('claude-3-5-haiku-20241022'),
name: 'claude-3.5-haiku',
tier: 'fast',
maxRetries: 2,
timeoutMs: 15000,
costPer1kTokens: 0.00025,
},
],
standard: [
{
model: openai('gpt-4o'),
name: 'gpt-4o',
tier: 'standard',
maxRetries: 2,
timeoutMs: 30000,
costPer1kTokens: 0.005,
},
{
model: anthropic('claude-sonnet-4-20250514'),
name: 'claude-sonnet-4',
tier: 'standard',
maxRetries: 2,
timeoutMs: 30000,
costPer1kTokens: 0.003,
},
],
premium: [
{
model: openai('o3'),
name: 'o3',
tier: 'premium',
maxRetries: 1,
timeoutMs: 60000,
costPer1kTokens: 0.03,
},
{
model: anthropic('claude-opus-4-20250514'),
name: 'claude-opus-4',
tier: 'premium',
maxRetries: 1,
timeoutMs: 60000,
costPer1kTokens: 0.015,
},
],
};
interface RouteDecision {
model: LanguageModelV1;
modelName: string;
tier: ModelTier;
fallbackChain: string[];
}
export function routeModel(
complexity: 'simple' | 'medium' | 'complex',
preferredProvider?: 'openai' | 'anthropic'
): RouteDecision {
const tierMap: Record<string, ModelTier> = {
simple: 'fast',
medium: 'standard',
complex: 'premium',
};
const tier = tierMap[complexity];
const models = MODEL_REGISTRY[tier];
const preferred = preferredProvider
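// Match on the SDK's provider id (e.g. 'openai.chat'); model names never start with the provider name
? models.find((m) => m.model.provider.startsWith(preferredProvider))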
: models[0];
const selected = preferred || models[0];
const fallbackChain = models
.filter((m) => m.name !== selected.name)
.map((m) => m.name);
return {
model: selected.model,
modelName: selected.name,
tier,
fallbackChain,
};
}
export { MODEL_REGISTRY };
export type { ModelConfig, ModelTier };
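The router takes a `complexity` argument, but how a request gets classified is left open. As a rough illustration only, a server-side heuristic could look at prompt length and a few keywords. The keyword list, the thresholds, and the name `estimateComplexity` below are all assumptions made for this sketch; a real system would tune them against its own traffic or use a classifier.

```typescript
type Complexity = 'simple' | 'medium' | 'complex';

// Hypothetical keyword hints; adjust to the product's domain.
const COMPLEX_HINTS = ['prove', 'refactor', 'architecture', 'step by step', 'debug'];

// Classifies a prompt with two cheap signals: keyword hits and length.
function estimateComplexity(prompt: string): Complexity {
  const text = prompt.toLowerCase();
  const hintCount = COMPLEX_HINTS.filter((hint) => text.includes(hint)).length;
  if (hintCount >= 2 || text.length > 2000) return 'complex';
  if (hintCount === 1 || text.length > 300) return 'medium';
  return 'simple';
}
```

The result can be passed straight to `routeModel`, which keeps the routing decision on the server instead of trusting a value sent by the client.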
Streaming Route Handler with fallback
// app/api/chat/routed/route.ts
import { streamText } from 'ai';
import { routeModel, MODEL_REGISTRY } from '@/lib/ai/model-router';
import { NextRequest } from 'next/server';
export const runtime = 'edge';
export const maxDuration = 60;
export async function POST(request: NextRequest) {
const body = await request.json();
const { messages, complexity = 'medium', provider } = body;
const route = routeModel(complexity, provider);
try {
const result = streamText({
model: route.model,
system: 'You are a helpful AI assistant.',
messages,
maxTokens: 4096,
abortSignal: request.signal,
});
return result.toDataStreamResponse({
headers: {
'X-Model-Name': route.modelName,
'X-Model-Tier': route.tier,
'X-Fallback-Chain': route.fallbackChain.join(','),
},
});
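} catch (error) {
// Caveat: streamText returns before the provider responds, so errors raised mid-stream
// surface inside the stream (or via onError), not here. This catch only covers synchronous setup failures.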
const fallbackModelName = route.fallbackChain[0];
const allModels = Object.values(MODEL_REGISTRY).flat();
const fallback = allModels.find((m) => m.name === fallbackModelName);
if (!fallback) {
return new Response(
JSON.stringify({ error: 'All models are unavailable' }),
{ status: 503, headers: { 'Content-Type': 'application/json' } }
);
}
const result = streamText({
model: fallback.model,
system: 'You are a helpful AI assistant.',
messages,
maxTokens: 4096,
});
return result.toDataStreamResponse({
headers: {
'X-Model-Name': fallback.name,
'X-Model-Tier': fallback.tier,
'X-Fallback-Used': 'true',
},
});
}
}
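Fallback only helps if the failure is detected before the response has been committed to the client. A provider-agnostic way to express "try each candidate in order" is sketched below. `withFallback` and `Candidate` are hypothetical names, and how `run` decides that a provider has failed (for example by waiting for the first chunk before resolving) is deliberately left to the caller.

```typescript
interface Candidate<T> {
  name: string;
  run: () => Promise<T>;
}

// Tries each candidate in order and returns the first success,
// together with the names of the candidates that failed before it.
async function withFallback<T>(
  candidates: Candidate<T>[]
): Promise<{ name: string; value: T; failed: string[] }> {
  const failed: string[] = [];
  for (const candidate of candidates) {
    try {
      const value = await candidate.run();
      return { name: candidate.name, value, failed };
    } catch {
      failed.push(candidate.name);
    }
  }
  throw new Error(`All candidates failed: ${failed.join(', ')}`);
}
```

Returning the `failed` list makes it easy to set a header such as `X-Fallback-Used` and to log which providers were skipped.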
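Pattern 5: Conversation state and context management
Context management is one of the core challenges of AI chat. Every request resends the history, so the tokens per request grow linearly with conversation length and the cumulative cost of a conversation grows roughly quadratically. You need to separate the short-term context window from long-term memory.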
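To make the cost growth concrete: if every request resends the whole history, request k carries about k turns of tokens, so the running total over n turns is proportional to n(n+1)/2. The small calculation below is a sketch under a simplifying assumption (every turn adds the same number of tokens); both function names are made up for this illustration.

```typescript
// Total prompt tokens sent over a conversation when every request
// resends the full history. Assumes each turn adds `tokensPerTurn` tokens.
function cumulativePromptTokens(turns: number, tokensPerTurn: number): number {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    total += k * tokensPerTurn; // request k carries k turns of history
  }
  return total;
}

// The same conversation with the history capped at a fixed token budget.
function cappedPromptTokens(turns: number, tokensPerTurn: number, budget: number): number {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    total += Math.min(k * tokensPerTurn, budget);
  }
  return total;
}
```

With 200 tokens per turn, 10 turns send 11,000 prompt tokens in total and 20 turns send 42,000. Capping the window at 2,000 tokens brings the 20-turn total down to 31,000, and the gap widens as the conversation gets longer.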
Conversation state management
// lib/ai/conversation-state.ts
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.REDIS_URL!,
token: process.env.REDIS_TOKEN!,
});
interface ConversationMessage {
id: string;
role: 'system' | 'user' | 'assistant';
content: string;
timestamp: number;
tokenCount: number;
}
interface ConversationState {
id: string;
userId: string;
title: string;
messages: ConversationMessage[];
summary?: string;
totalTokens: number;
createdAt: number;
updatedAt: number;
}
const MAX_CONTEXT_TOKENS = 8000;
const SUMMARY_THRESHOLD = 6000;
function estimateTokenCount(text: string): number {
return Math.ceil(text.length / 3.5);
}
export async function getConversation(conversationId: string): Promise<ConversationState | null> {
const data = await redis.get<ConversationState>(
`conversation:${conversationId}`
);
return data;
}
export async function saveConversation(state: ConversationState): Promise<void> {
state.updatedAt = Date.now();
await redis.set(`conversation:${state.id}`, JSON.stringify(state), {
ex: 86400 * 30,
});
}
export async function createContextWindow(
conversationId: string
): Promise<ConversationMessage[]> {
const conversation = await getConversation(conversationId);
if (!conversation) return [];
const messages = conversation.messages;
const totalTokens = messages.reduce((sum, m) => sum + m.tokenCount, 0);
if (totalTokens <= MAX_CONTEXT_TOKENS) {
return messages;
}
if (conversation.summary) {
const summaryMessage: ConversationMessage = {
id: 'summary',
role: 'system',
content: `Summary of the earlier conversation:\n${conversation.summary}`,
timestamp: Date.now(),
tokenCount: estimateTokenCount(conversation.summary),
};
const recentMessages: ConversationMessage[] = [];
let currentTokens = summaryMessage.tokenCount;
for (let i = messages.length - 1; i >= 0; i--) {
if (currentTokens + messages[i].tokenCount > MAX_CONTEXT_TOKENS) break;
recentMessages.unshift(messages[i]);
currentTokens += messages[i].tokenCount;
}
return [summaryMessage, ...recentMessages];
}
const result: ConversationMessage[] = [];
let currentTokens = 0;
for (let i = messages.length - 1; i >= 0; i--) {
if (currentTokens + messages[i].tokenCount > MAX_CONTEXT_TOKENS) break;
result.unshift(messages[i]);
currentTokens += messages[i].tokenCount;
}
return result;
}
export async function generateSummary(
conversationId: string,
messages: ConversationMessage[]
): Promise<string> {
const conversationText = messages
.map((m) => `${m.role}: ${m.content}`)
.join('\n');
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: 'Summarize the key information and conclusions of the following conversation in 2-3 sentences.',
},
{ role: 'user', content: conversationText },
],
max_tokens: 200,
}),
});
const data = await response.json();
const summary = data.choices?.[0]?.message?.content || '';
const conversation = await getConversation(conversationId);
if (conversation) {
conversation.summary = summary;
await saveConversation(conversation);
}
return summary;
}
export { estimateTokenCount, MAX_CONTEXT_TOKENS, SUMMARY_THRESHOLD };
export type { ConversationMessage, ConversationState };
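A character-count estimate such as `text.length / 3.5` is tuned for Latin-script text and tends to undercount Chinese, Japanese, and Korean text, where a single character often costs about one token or more. The sketch below shows one possible refinement. The ratios (one token per CJK character, four characters per token otherwise) are rough assumptions rather than tokenizer facts, and `estimateTokensMixed` is a hypothetical name; when exact budgets matter, the provider's tokenizer is the reliable source.

```typescript
// Matches common CJK ranges: kana, CJK Extension A, CJK Unified Ideographs, Hangul.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g;

// Rough heuristic: count CJK characters one by one and
// treat the remaining text as about 4 characters per token.
function estimateTokensMixed(text: string): number {
  const cjkCount = (text.match(CJK_PATTERN) ?? []).length;
  const otherCount = text.length - cjkCount;
  return cjkCount + Math.ceil(otherCount / 4);
}
```

Erring on the high side is the safer direction here, because an underestimate is what leads to context_length_exceeded errors.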
Pattern 6: Production deployment and performance optimization
Concurrent connection management
// lib/ai/connection-pool.ts
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.REDIS_URL!,
token: process.env.REDIS_TOKEN!,
});
// Applied per user: the Redis key below is scoped by userId, so this is not a global cap
const MAX_CONCURRENT_CONNECTIONS = 100;
const CONNECTION_TTL_SECONDS = 120;
export async function acquireConnection(userId: string): Promise<boolean> {
const key = `conn:${userId}`;
const current = await redis.incr(key);
if (current === 1) {
await redis.expire(key, CONNECTION_TTL_SECONDS);
}
if (current > MAX_CONCURRENT_CONNECTIONS) {
await redis.decr(key);
return false;
}
return true;
}
export async function releaseConnection(userId: string): Promise<void> {
const key = `conn:${userId}`;
const current = await redis.decr(key);
if (current <= 0) {
await redis.del(key);
}
}
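A counter like this only stays accurate if every successful acquire is paired with a release, including when the stream throws. The pairing is easiest to see with an in-memory stand-in. `InMemoryLimiter` and `withSlot` below are hypothetical names used for illustration; a multi-instance deployment would still need a shared store such as Redis.

```typescript
// In-memory stand-in for the Redis counter, for illustration only.
class InMemoryLimiter {
  private counts = new Map<string, number>();
  private maxPerUser: number;

  constructor(maxPerUser: number) {
    this.maxPerUser = maxPerUser;
  }

  acquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }

  active(userId: string): number {
    return this.counts.get(userId) ?? 0;
  }
}

// Runs `task` only if a slot is free and always releases the slot afterwards,
// whether the task resolves or rejects.
async function withSlot<T>(
  limiter: InMemoryLimiter,
  userId: string,
  task: () => Promise<T>
): Promise<T | null> {
  if (!limiter.acquire(userId)) return null;
  try {
    return await task();
  } finally {
    limiter.release(userId);
  }
}
```

For a streaming response the release has to happen when the stream ends or is cancelled, not when the handler returns, since the handler returns as soon as the stream starts.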
Rate-limiting middleware
// lib/ai/rate-limiter.ts
import { Redis } from '@upstash/redis';
import { NextRequest, NextResponse } from 'next/server';
const redis = new Redis({
url: process.env.REDIS_URL!,
token: process.env.REDIS_TOKEN!,
});
interface RateLimitConfig {
windowMs: number;
maxRequests: number;
}
const RATE_LIMITS: Record<string, RateLimitConfig> = {
free: { windowMs: 60000, maxRequests: 10 },
pro: { windowMs: 60000, maxRequests: 60 },
enterprise: { windowMs: 60000, maxRequests: 300 },
};
export async function rateLimitMiddleware(
request: NextRequest,
userId: string,
tier: string = 'free'
): Promise<NextResponse | null> {
const config = RATE_LIMITS[tier] || RATE_LIMITS.free;
const key = `rate:${userId}:${Math.floor(Date.now() / config.windowMs)}`;
const current = await redis.incr(key);
if (current === 1) {
await redis.expire(key, Math.ceil(config.windowMs / 1000));
}
if (current > config.maxRequests) {
return NextResponse.json(
{
error: 'Too many requests. Please try again later.',
retryAfter: config.windowMs / 1000,
},
{
status: 429,
headers: {
'Retry-After': String(Math.ceil(config.windowMs / 1000)),
'X-RateLimit-Limit': String(config.maxRequests),
'X-RateLimit-Remaining': '0',
},
}
);
}
return null;
}
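With a fixed window keyed by `Math.floor(Date.now() / windowMs)`, the time until the counter resets is the remainder of the current window, which is usually shorter than a full window. A client-friendly `Retry-After` can be derived from that. The helper below is a sketch; `describeWindow` and `RateLimitInfo` are hypothetical names.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  retryAfterSeconds: number;
}

// Derives header values for a fixed-window limiter whose windows
// start at multiples of `windowMs` since the epoch.
function describeWindow(
  limit: number,
  current: number,
  windowMs: number,
  nowMs: number
): RateLimitInfo {
  const elapsedInWindow = nowMs % windowMs;
  return {
    limit,
    remaining: Math.max(0, limit - current),
    retryAfterSeconds: Math.ceil((windowMs - elapsedInWindow) / 1000),
  };
}
```

For a 60-second window, a request rejected 5 seconds into the window gets `retryAfterSeconds` of 55 rather than 60.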
Health checks and monitoring
// app/api/health/ai/route.ts
import { NextResponse } from 'next/server';
interface HealthCheck {
service: string;
status: 'healthy' | 'degraded' | 'down';
latencyMs: number;
error?: string;
}
async function checkOpenAI(): Promise<HealthCheck> {
const start = Date.now();
try {
const res = await fetch('https://api.openai.com/v1/models', {
headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
signal: AbortSignal.timeout(5000),
});
return {
service: 'openai',
status: res.ok ? 'healthy' : 'degraded',
latencyMs: Date.now() - start,
};
} catch (error) {
return {
service: 'openai',
status: 'down',
latencyMs: Date.now() - start,
error: String(error),
};
}
}
export async function GET() {
const checks = await Promise.all([checkOpenAI()]);
const overallStatus = checks.every((c) => c.status === 'healthy')
? 'healthy'
: checks.some((c) => c.status === 'healthy')
? 'degraded'
: 'down';
return NextResponse.json({
status: overallStatus,
timestamp: new Date().toISOString(),
checks,
});
}
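5 common pitfalls and how to fix them
Pitfall 1: SSE connections are buffered at the Nginx/CDN layer
Symptom: the streamed response arrives all at once; the user waits a long time and then the whole text appears.
Cause: Nginx enables proxy_buffering by default, and CDNs may also buffer SSE responses.
Fix: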
location /api/chat {
proxy_pass http://nextjs_backend;
proxy_buffering off;
proxy_cache off;
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding on;
proxy_read_timeout 300s;
}
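When the proxy configuration is not under your control, the application can also ask Nginx to skip buffering for a single response by sending the `X-Accel-Buffering: no` response header, which Nginx honors for proxied responses. The helper below is a sketch of a shared header set for SSE responses. `sseHeaders` is a hypothetical name, and whether a particular CDN respects `no-transform` depends on that CDN.

```typescript
// Builds response headers for an SSE stream. `extra` lets callers add
// their own headers (for example model metadata) on top of the defaults.
function sseHeaders(extra: Record<string, string> = {}): Record<string, string> {
  return {
    'Content-Type': 'text/event-stream; charset=utf-8',
    'Cache-Control': 'no-cache, no-transform',
    'X-Accel-Buffering': 'no', // tells Nginx not to buffer this response
    ...extra,
  };
}
```

A handler can then return `new Response(stream, { headers: sseHeaders() })` so every streaming route sends the same set.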
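Pitfall 2: Node.js built-in modules are unavailable in the Edge Runtime
Symptom: deploying to Vercel Edge Functions fails with Module not found.
Cause: the Edge Runtime does not support Node.js modules such as net, fs, and child_process.
Fix: use Edge-compatible alternatives, for example @upstash/redis instead of ioredis.
Pitfall 3: useChat messages drift out of sync with external state
Symptom: a separate copy of messages is kept outside useChat, and the two diverge.
Fix: sync state in the onFinish callback of useChat, or update the hook's own state with its setMessages method.
Pitfall 4: an interrupted stream cannot be resumed
Symptom: network jitter drops the SSE connection and the content received so far is lost.
Fix: cache the received content on the client and, on reconnect, send it back as context in a new request.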
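One way to turn the cached partial reply into a resume request is to append it as an assistant message and then ask the model to continue. The sketch below shows that message construction only. `buildResumeMessages`, `ChatTurn`, and the wording of the continuation instruction are assumptions made for this example, and models will not always continue seamlessly.

```typescript
interface ChatTurn {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Builds the message list for a resume request from the original history
// and whatever part of the reply the client had already received.
function buildResumeMessages(history: ChatTurn[], partialReply: string): ChatTurn[] {
  if (partialReply.trim() === '') return history.slice(); // nothing received: plain retry
  return [
    ...history,
    { role: 'assistant', content: partialReply },
    {
      role: 'user',
      content: 'Continue exactly where the previous reply stopped. Do not repeat earlier text.',
    },
  ];
}
```

On the client, the new stream's tokens are appended to the cached partial reply, so the user sees one continuous message.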
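Pitfall 5: many concurrent SSE connections leak memory
Symptom: server memory keeps growing until the process runs out of memory (OOM).
Fix: make sure every stream has a timeout and a cleanup path, for example by cancelling the upstream reader when the client aborts.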
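Troubleshooting 10 common errors
| # | Error message | Cause | Fix |
|---|---|---|---|
| 1 | TypeError: response.body is null | The Route Handler did not return a streaming response | Return a ReadableStream or use toDataStreamResponse() |
| 2 | AI_APICallError: 429 Too Many Requests | The LLM API is rate limiting you | Retry with exponential backoff, or switch to a fallback model |
| 3 | AI_APICallError: context_length_exceeded | The conversation history exceeds the model's context window | Trim the context window or compress it with a summary |
| 4 | Error: Invalid SSE data | The SSE payload is malformed | Check the data: prefix and the \n\n separator |
| 5 | AbortError: The operation was aborted | The user cancelled or the request timed out | Handle AbortSignal correctly and clean up resources |
| 6 | Error: Cannot read properties of undefined (reading 'delta') | The LLM returned a chunk in a non-standard format | Parse defensively and skip malformed chunks |
| 7 | RuntimeError: Edge Runtime does not support Node.js API | A Node.js API was used in the Edge Runtime | Switch to the Node.js runtime or use an Edge-compatible library |
| 8 | Error: Maximum call stack size exceeded | Recursive processing of stream data overflowed the stack | Process chunks iteratively instead of recursively |
| 9 | TypeError: Failed to execute 'fetch' on 'Window' | Browser CORS restriction | Serve the API route and the frontend from the same origin, or set CORS headers |
| 10 | Error: Stream ended unexpectedly | The server closed the stream early | Add a heartbeat to detect connection state and reconnect automatically |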
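For the 429 case, exponential backoff is usually combined with random jitter so that many clients do not retry at the same instant. The delay calculation can be sketched as follows. `backoffDelayMs` and its default values are assumptions for this example, and the `random` parameter exists only so the function can be exercised deterministically.

```typescript
// Returns the delay before retry number `attempt` (0-based), using
// capped exponential growth with "full jitter".
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 8000,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}
```

If the provider sends a Retry-After header, that value is generally a better lower bound than any locally computed delay.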
Advanced optimization techniques
1. Streaming Markdown rendering
// components/StreamMarkdown.tsx
'use client';
import { memo, useMemo } from 'react';
interface StreamMarkdownProps {
content: string;
isStreaming: boolean;
}
const StreamMarkdown = memo(function StreamMarkdown({
content,
isStreaming,
}: StreamMarkdownProps) {
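const html = useMemo(() => {
// Escape HTML first: model output is untrusted and is injected via dangerouslySetInnerHTML
const escaped = content
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;');
return escaped
.replace(/```(\w*)\n([\s\S]*?)```/g, '<pre><code class="language-$1">$2</code></pre>')
.replace(/`([^`]+)`/g, '<code>$1</code>')
.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
.replace(/\*([^*]+)\*/g, '<em>$1</em>')
.replace(/^### (.+)$/gm, '<h3>$1</h3>')
.replace(/^## (.+)$/gm, '<h2>$1</h2>')
.replace(/^# (.+)$/gm, '<h1>$1</h1>')
.replace(/\n/g, '<br/>');
}, [content]);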
return (
<div className="prose prose-sm max-w-none">
<div dangerouslySetInnerHTML={{ __html: html }} />
{isStreaming && (
<span className="inline-block w-2 h-4 bg-blue-600 animate-pulse ml-0.5" />
)}
</div>
);
});
export default StreamMarkdown;
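While a reply is still streaming, a code fence may have been opened but not yet closed, so a renderer that only recognizes closed fences shows the raw backticks until the block completes. A common workaround is to close a dangling fence temporarily before rendering. The sketch below illustrates the idea; `closeOpenFence` is a hypothetical helper, and it ignores edge cases such as fences longer than three backticks.

```typescript
// Built with repeat() so the source does not contain a literal fence.
const FENCE = '`'.repeat(3);

// If the partial markdown contains an odd number of fences, append a
// closing fence so the unfinished code block renders as code.
function closeOpenFence(partial: string): string {
  const fenceCount = partial.split(FENCE).length - 1;
  if (fenceCount % 2 === 0) return partial;
  const needsNewline = !partial.endsWith('\n');
  return partial + (needsNewline ? '\n' : '') + FENCE;
}
```

The function would be applied only while `isStreaming` is true, so the final rendered message reflects exactly what the model produced.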
2. Streaming response cache
// lib/ai/stream-cache.ts
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.REDIS_URL!,
token: process.env.REDIS_TOKEN!,
});
export async function getCachedStream(queryHash: string): Promise<string | null> {
return redis.get<string>(`stream-cache:${queryHash}`);
}
export async function cacheStreamResponse(
queryHash: string,
response: string,
ttlSeconds: number = 3600
): Promise<void> {
await redis.set(`stream-cache:${queryHash}`, response, { ex: ttlSeconds });
}
export function computeQueryHash(
messages: Array<{ role: string; content: string }>,
model: string
): string {
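const raw = JSON.stringify({ messages, model });
// 32-bit non-cryptographic hash: collisions are possible, so prefer a SHA-256 digest for cache keys in production
let hash = 0;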
for (let i = 0; i < raw.length; i++) {
const char = raw.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash |= 0;
}
return hash.toString(36);
}
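A cache hit returns the full text at once, while the client expects a stream of SSE frames. One way to keep the client code unchanged is to replay the cached text as small chunks in the same frame format the live stream uses. The sketch below assumes the `data: {"content": ...}` frame shape from Pattern 1; `chunkForReplay` and `toSSEFrames` are hypothetical names.

```typescript
// Splits cached text into fixed-size pieces. Array.from iterates by code
// point, so surrogate pairs (for example emoji) are never cut in half.
function chunkForReplay(text: string, chunkSize: number = 16): string[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const chars = Array.from(text);
  const chunks: string[] = [];
  for (let i = 0; i < chars.length; i += chunkSize) {
    chunks.push(chars.slice(i, i + chunkSize).join(''));
  }
  return chunks;
}

// Wraps each chunk in an SSE frame and appends the [DONE] marker.
function toSSEFrames(chunks: string[]): string[] {
  const frames = chunks.map((content) => `data: ${JSON.stringify({ content })}\n\n`);
  frames.push('data: [DONE]\n\n');
  return frames;
}
```

The frames can be enqueued into a ReadableStream one by one, optionally with a short delay between them if a typing effect is wanted.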
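Comparison: SSE vs WebSocket vs long polling
| Dimension | SSE | WebSocket | Long polling |
|---|---|---|---|
| Direction | One-way (server → client) | Two-way | One-way |
| Protocol | HTTP/1.1+ | ws/wss | HTTP |
| Auto-reconnect | Built into browsers (EventSource) | Must be implemented manually | Every poll is a new request |
| Proxies/CDNs | Friendly (standard HTTP) | May be blocked | Fully compatible |
| Binary data | Not supported | Supported | Not supported |
| Frame overhead | Higher (text format) | Low (as little as 2 bytes) | High (HTTP headers on every request) |
| Browser support | Not supported in IE | Widely supported | Widely supported |
| Connection limit | 6 per domain (HTTP/1.1) | Not bound by the 6-per-domain cap | No practical limit |
| Fit for LLM output | ★★★★★ | ★★★ | ★★ |
| Implementation complexity | ★★ | ★★★★ | ★ |
Conclusion: streamed LLM output is a textbook one-way push scenario, so SSE is the best choice. Consider WebSocket only when you need two-way real-time interaction, such as voice conversations or real-time collaborative editing.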
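Recommended online tools
Related articles
- Next.js Streaming SSR in Practice: 5 Production Patterns from Suspense to Progressive Rendering - a deep dive into how Next.js Streaming SSR works
- Next.js App Router Performance Optimization Guide - a complete guide to App Router performance
- Streaming LLM Output with SSE in Python - a backend SSE implementation reference
External resources
- Vercel AI SDK official documentation - complete AI SDK API reference
- MDN: Server-Sent Events - SSE protocol reference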
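Summary
Each of the six production patterns for streaming AI chat on Next.js 15 has its own use case:
| Pattern | Use case | Complexity | Recommendation |
|---|---|---|---|
| SSE streaming responses | Full control over the streaming protocol | ★★★ | ★★★★ |
| React Server Actions | Type safety first, fast development | ★★ | ★★★★ |
| Vercel AI SDK | Rapid prototyping, multi-LLM support | ★ | ★★★★★ |
| Multi-model routing | Production-grade failover, cost optimization | ★★★★ | ★★★★ |
| Conversation state management | Long, context-sensitive conversations | ★★★★ | ★★★★ |
| Production deployment optimization | Enterprise go-live | ★★★★★ | ★★★★★ |
Core advice: start with the Vercel AI SDK to validate quickly, introduce multi-model routing and state management step by step, and harden the production deployment last. Streaming AI chat on Next.js 15 is no longer a hard technical problem; what matters is picking the right pattern and avoiding the common pitfalls.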