Nuxt4 + AI Streaming SSR: Optimizing LLM App First Paint from 3s to 300ms in 2026


Does your AI chat app take more than 3 seconds to reach first paint? Do users stare at a blank page until the LLM has finished its reply? With traditional SSR the HTML is only sent once the complete AI response is available, so the wait feels endless. This guide shows how Nuxt4 combined with streaming responses can bring first paint down to roughly 300ms: the page shell appears immediately and the AI output streams in as it is generated.


Background

The Dilemma of Traditional SSR in AI Applications

| Dimension | Traditional SSR | Streaming SSR |
| --- | --- | --- |
| Rendering Mode | Render all at once after all data is ready | Render each part as data becomes available |
| First Paint Time | Wait for complete AI response (3-30s) | 300ms for first paint skeleton |
| User Experience | Long blank screen | Progressive content display |
| TTFB | Extremely high (waiting for AI response) | Extremely low (HTML header sent immediately) |
| Hydration | Full hydration | Island / Progressive hydration |
| Server Resources | Long connection occupation | Streaming release |

Nuxt4 Core New Features

  • Streaming Rendering: output can be sent while it is still being rendered; Vue's server renderer offers streaming APIs such as renderToWebStream alongside renderToString
  • Server Components: .server.vue components render on the server, no JS sent to client
  • Edge SSR: Native support for Cloudflare Workers / Vercel Edge / Deno Deploy
  • Hybrid Rendering: Configure SSR/SSG/SWR strategy per route
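
As a rough illustration of per-route hybrid rendering, the object below uses the `prerender`, `swr`, and `ssr` route rule options. The route patterns are hypothetical, and in a real project the object would be assigned to `routeRules` inside `nuxt.config.ts` rather than declared as a standalone constant.

```typescript
// Hypothetical route patterns; in a Nuxt project this object is the value of
// `routeRules` in nuxt.config.ts.
const routeRules = {
  '/': { prerender: true },        // SSG: rendered once at build time
  '/docs/**': { swr: 3600 },       // SWR: cached response, revalidated after an hour
  '/chat/**': { ssr: true },       // SSR: rendered on every request
  '/settings/**': { ssr: false },  // client-side rendering only
} as const

console.log(Object.keys(routeRules).length, 'route rules defined')
```

Keeping the AI chat routes on `ssr: true` while prerendering or caching everything else limits the expensive per-request work to the pages that actually need it.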

Problem Analysis

Root Causes of Slow SSR in AI Applications

  1. Serial Waiting: SSR must wait for the AI API to return the complete response before generating HTML
  2. Full Hydration: Client re-executes all component logic, including AI calls
  3. Blocking Render: One slow component blocks the entire page render
  4. No Cache Strategy: AI responses are not cacheable, every request re-invokes the API
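
The difference between serial waiting and streaming can be sketched with the Web Streams API. The helper below is a simplified illustration, not Nuxt's internal renderer: `streamPage` and its parameters are made-up names, and it assumes a runtime with global `ReadableStream` and `TextEncoder` (Node.js 18+ or an edge runtime).

```typescript
// Sketch: send the page shell immediately, then append slow parts as they resolve.
function streamPage(
  shell: string,
  slowParts: Promise<string>[],
  tail: string,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder()
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      // The shell is queued before any slow data is awaited,
      // so the browser can paint it right away.
      controller.enqueue(encoder.encode(shell))
      for (const part of slowParts) {
        controller.enqueue(encoder.encode(await part))
      }
      controller.enqueue(encoder.encode(tail))
      controller.close()
    },
  })
}
```

A handler could return `new Response(streamPage(shellHtml, [aiHtmlPromise], '</body></html>'))`, where the argument names are again placeholders. With the serial approach, nothing is sent until `aiHtmlPromise` settles.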

Step-by-Step Guide

Step 1: Create a Nuxt4 Project

npx nuxi@latest init ai-chat-app --template v4-compat
cd ai-chat-app
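npm install

// nuxt.config.ts
export default defineNuxtConfig({
  future: {
    compatibilityVersion: 4,
  },
  // Server-only runtime config; set the value with the NUXT_AI_API_KEY environment variable.
  runtimeConfig: {
    aiApiKey: '',
  },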
  experimental: {
    componentIslands: true,
    viewTransition: true,
    renderJsonPayloads: true,
  },
  routeRules: {
    '/': { ssr: true },
    '/chat/**': { ssr: true },
    '/static/**': { ssr: false },
    '/api/ai/**': { cors: true },
  },
  nitro: {
    preset: 'cloudflare-pages',
    compressPublicAssets: true,
  },
})

Step 2: Implement Streaming AI Server Component

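<!-- components/ChatStream.server.vue -->
<script lang="ts" setup>
const props = defineProps<{
  messageId: string
  prompt: string
}>()

// Read the key from private runtime config (assumed to be defined as runtimeConfig.aiApiKey).
// Anything under `public` is shipped to the browser and must not hold secrets.
const { aiApiKey } = useRuntimeConfig()

// Yields the text deltas parsed from the upstream SSE stream.
async function* aiStreamResponse(prompt: string) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${aiApiKey}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  })
  if (!response.ok || !response.body) {
    throw new Error(`AI request failed with status ${response.status}`)
  }

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    // A network chunk can end mid-line, so keep the unfinished tail for the next read.
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue
      const data = line.slice(6).trim()
      if (data === '[DONE]') return
      try {
        const content = JSON.parse(data).choices?.[0]?.delta?.content
        if (content) yield content
      } catch {
        // Skip payloads that are not valid JSON.
      }
    }
  }
}

// v-for cannot iterate an async generator, so collect the segments before rendering.
async function collect(prompt: string) {
  let text = ''
  for await (const segment of aiStreamResponse(prompt)) text += segment
  return text
}

const text = await collect(props.prompt)
</script>

<template>
  <div class="chat-stream">
    <div class="message-content">
      <!-- renderMarkdown is assumed to be a project helper that returns sanitized HTML -->
      <div v-html="renderMarkdown(text)" />
    </div>
  </div>
</template>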
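
The SSE parsing step deserves a closer look, because a single `data:` line can be split across two network chunks. The following is a minimal sketch of an incremental parser for OpenAI-style chat completion chunks; `createSseParser` is an illustrative name, not part of Nuxt or any SDK.

```typescript
// Incremental parser: feed it decoded text chunks, get back the text deltas
// that became complete with that chunk.
function createSseParser() {
  let buffer = ''

  return function push(chunk: string): string[] {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is either '' or an unfinished line; keep it for later.
    buffer = lines.pop() ?? ''

    const deltas: string[] = []
    for (const raw of lines) {
      const line = raw.trim()
      if (!line.startsWith('data:')) continue
      const data = line.slice(5).trim()
      if (data === '[DONE]') continue
      try {
        const content = JSON.parse(data)?.choices?.[0]?.delta?.content
        if (typeof content === 'string' && content) deltas.push(content)
      } catch {
        // Ignore payloads that are not valid JSON.
      }
    }
    return deltas
  }
}

// Usage: the first chunk ends in the middle of a JSON payload.
const push = createSseParser()
const partial = push('data: {"choices":[{"delta":{"content":"Hel')
const complete = push('lo"}}]}\n\ndata: [DONE]\n\n')
console.log(partial, complete)
```

Splitting each chunk on newlines without a buffer, by contrast, silently drops any token whose JSON payload straddles a chunk boundary.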

Step 3: Chat Page Implementation

<!-- pages/chat/[id].vue -->
<script lang="ts" setup>
const route = useRoute()
const chatId = route.params.id as string

const { data: messages, refresh } = await useFetch(`/api/chat/${chatId}/messages`)

const newMessage = ref('')
const pendingPrompt = ref('')
const isStreaming = ref(false)

async function sendMessage() {
  const prompt = newMessage.value.trim()
  if (!prompt || isStreaming.value) return

  // Keep the sent text in its own ref: the textarea is cleared right away,
  // but the streaming component still needs the prompt.
  pendingPrompt.value = prompt
  newMessage.value = ''
  isStreaming.value = true

  try {
    await $fetch('/api/chat/send', {
      method: 'POST',
      body: { chatId, content: prompt },
    })
    await refresh()
  } finally {
    // Reset the flag even if the request fails, so the input is not locked.
    isStreaming.value = false
  }
}
</script>

<template>
  <div class="chat-container">
    <div class="messages">
      <div v-for="msg in messages" :key="msg.id" :class="['message', msg.role]">
        <div class="message-text">{{ msg.content }}</div>
      </div>
      <LazyChatStream v-if="isStreaming" :message-id="chatId" :prompt="pendingPrompt" />
    </div>
    <div class="input-area">
      <textarea v-model="newMessage" @keydown.enter.exact.prevent="sendMessage" placeholder="Type a message..." />
      <button :disabled="isStreaming" @click="sendMessage">Send</button>
    </div>
  </div>
</template>

Step 4: API Route Streaming Response Implementation

// server/api/chat/stream.get.ts
export default defineEventHandler(async (event) => {
  const prompt = String(getQuery(event).prompt ?? '')
  if (!prompt) {
    throw createError({ statusCode: 400, statusMessage: 'Missing prompt' })
  }

  // Use runtime config instead of process.env, which may not exist on edge runtimes.
  // Assumes runtimeConfig.aiApiKey is defined in nuxt.config.ts.
  const { aiApiKey } = useRuntimeConfig(event)
  const stream = await callAIStream(prompt, aiApiKey)

  // Set the SSE headers only after the upstream call succeeded,
  // so error responses keep their normal content type.
  setResponseHeader(event, 'content-type', 'text/event-stream')
  setResponseHeader(event, 'cache-control', 'no-cache')
  setResponseHeader(event, 'connection', 'keep-alive')

  return sendStream(event, stream)
})

// Forwards the upstream SSE body unchanged; the client parses the `data:` lines.
async function callAIStream(prompt: string, apiKey: string): Promise<ReadableStream<Uint8Array>> {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  })

  if (!response.ok || !response.body) {
    throw createError({ statusCode: 502, statusMessage: `AI request failed (${response.status})` })
  }

  // response.body is already a ReadableStream, so no manual re-wrapping is needed.
  return response.body
}

Step 5: Edge SSR Deployment Configuration

// nuxt.config.ts - Edge Configuration
export default defineNuxtConfig({
  nitro: {
    preset: 'cloudflare-pages',
    // Cloudflare Pages routing options live under nitro.cloudflare.pages.
    cloudflare: {
      pages: {
        routes: {
          exclude: ['/assets/*', '/_nuxt/*'],
        },
      },
    },
  },
  experimental: {
    asyncContext: true,
  },
})

# Build and deploy to Cloudflare Pages
# (the cloudflare-pages preset writes its build output to dist/)
npm run build
npx wrangler pages deploy dist

Complete Code: Production-Grade AI Chat Application

// composables/useAIChat.ts
export function useAIChat(chatId: string) {
  const messages = ref<ChatMessage[]>([])
  const isStreaming = ref(false)
  const currentStreamContent = ref('')

  // $fetch is used here because useFetch is meant to be called during setup,
  // not from a function that runs later.
  async function loadMessages() {
    messages.value = await $fetch<ChatMessage[]>(`/api/chat/${chatId}/messages`)
  }

  async function sendMessage(content: string) {
    if (isStreaming.value) return

    messages.value.push({
      id: crypto.randomUUID(),
      role: 'user',
      content,
      createdAt: new Date().toISOString(),
    })

    isStreaming.value = true
    currentStreamContent.value = ''

    try {
      const response = await fetch(`/api/chat/stream?prompt=${encodeURIComponent(content)}`, {
        headers: { 'Accept': 'text/event-stream' },
      })
      if (!response.ok || !response.body) {
        throw new Error(`Stream request failed with status ${response.status}`)
      }

      const reader = response.body.getReader()
      const decoder = new TextDecoder()
      let buffer = ''

      while (true) {
        const { done, value } = await reader.read()
        if (done) break

        // Keep any unfinished line in the buffer until the next chunk arrives.
        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split('\n')
        buffer = lines.pop() ?? ''

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue
          const data = line.slice(6).trim()
          if (data === '[DONE]') continue
          try {
            const parsed = JSON.parse(data)
            currentStreamContent.value += parsed.choices?.[0]?.delta?.content || ''
          } catch {
            // Ignore payloads that are not valid JSON.
          }
        }
      }

      messages.value.push({
        id: crypto.randomUUID(),
        role: 'assistant',
        content: currentStreamContent.value,
        createdAt: new Date().toISOString(),
      })
    } catch (error) {
      console.error('Stream error:', error)
    } finally {
      isStreaming.value = false
      currentStreamContent.value = ''
    }
  }

  return { messages, isStreaming, currentStreamContent, loadMessages, sendMessage }
}

interface ChatMessage {
  id: string
  role: 'user' | 'assistant' | 'system'
  content: string
  createdAt: string
}

// server/api/chat/[id]/messages.get.ts
// `kv` and `db` are assumed to be project helpers (a key-value store and a database client).
import { kv } from '~/server/utils/kv'

export default defineEventHandler(async (event) => {
  const chatId = getRouterParam(event, 'id')!
  const cached = await kv.get(`chat:${chatId}:messages`)
  if (cached) return cached

  const messages = await db.chatMessage.findMany({
    where: { chatId },
    orderBy: { createdAt: 'asc' },
  })

  await kv.set(`chat:${chatId}:messages`, messages, { ttl: 60 })
  return messages
})

// server/middleware/cache.ts
import type { H3Event } from 'h3'
import { kv } from '~/server/utils/kv'

export default defineEventHandler(async (event) => {
  // Never serve streaming or non-GET requests from the cache.
  if (event.method !== 'GET' || event.path.startsWith('/api/chat/stream')) return

  const cached = await getResponseCache(event)
  if (cached) {
    return cached
  }
})

async function getResponseCache(event: H3Event) {
  const key = `cache:${event.path}`
  return await kv.get(key)
}
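
The `kv` helper used above is not shown in this article. As a rough stand-in for local development, a time-to-live cache can be sketched in a few lines; `TtlCache` is a hypothetical in-memory class and does not reflect the API of Cloudflare KV or any other store.

```typescript
// Minimal in-memory cache with per-entry expiry. The clock is injectable so
// expiry can be tested without waiting.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  private now: () => number

  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 })
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      // Expired entries are removed lazily on read.
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }
}

// Usage mirroring the 60-second TTL of the messages route.
const messageCache = new TtlCache<string[]>()
messageCache.set('chat:42:messages', ['hello'], 60)
console.log(messageCache.get('chat:42:messages'))
```

An in-memory map is local to one server instance, so on edge deployments a shared store is still needed for the cache to be effective across instances.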

Common Pitfalls

Pitfall 1: Using Client APIs in Server Components

Symptom: Interactive code in a .server.vue component, such as click handlers or state that is expected to update in the browser, causes build errors or simply does nothing on the client.

Solution: Server Components render only on the server and ship no client-side JavaScript, so they cannot rely on client APIs. Extract interactive parts into separate client components, wrapped with <ClientOnly> or used as island components.

Pitfall 2: Hydration Mismatch During Streaming Render

Symptom: Console shows Hydration mismatch warnings, streaming content is inconsistent with server-rendered content.

Solution: Wrap streaming content with <ClientOnly>, or use useAsyncData's lazy: true option to avoid blocking hydration. Ensure client and server use the same data source.

Pitfall 3: Edge Runtime Does Not Support Node.js API

Symptom: After deploying to Cloudflare Workers, errors like process is not defined or Buffer is not defined appear.

Solution: Edge Runtime does not include Node.js APIs. Use import { H3Event } from 'h3' instead of Node.js's IncomingMessage, use native fetch for AI API calls, and avoid Node.js dependencies like axios.
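
As one concrete case, code that relies on `Buffer` for base64 can usually be rewritten with Web-standard APIs. The helpers below are a small sketch using `TextEncoder`, `TextDecoder`, `btoa`, and `atob`, which are available in edge runtimes and in Node.js 18+; the function names are illustrative.

```typescript
// Base64-encode a UTF-8 string without Node's Buffer.
function toBase64(text: string): string {
  const bytes = new TextEncoder().encode(text)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}

// Decode base64 back into a UTF-8 string.
function fromBase64(encoded: string): string {
  const binary = atob(encoded)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return new TextDecoder().decode(bytes)
}

console.log(toBase64('hello')) // "aGVsbG8="
```

Going through `TextEncoder` first matters because `btoa` only accepts characters in the Latin-1 range and throws on other input.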

Pitfall 4: SSE Connections Buffered by CDN

Symptom: Streaming output appears all at once on the client side, not character by character.

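Solution: Send the X-Accel-Buffering: no response header from the streaming route so that nginx-style reverse proxies do not buffer it, and make sure no compression or caching layer sits in front of the event stream. Cloudflare supports SSE passthrough by default. On other hosts such as Vercel, confirm in the platform's documentation that response streaming is enabled for your runtime (next.config.js settings apply only to Next.js projects, not to Nuxt).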
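
A small sketch of the response side: the helper names below are made up for illustration, and the header set is a common combination rather than a guaranteed fix for every proxy or CDN.

```typescript
// Headers commonly sent with an SSE response to discourage buffering.
function sseHeaders(): Record<string, string> {
  return {
    'Content-Type': 'text/event-stream; charset=utf-8',
    'Cache-Control': 'no-cache, no-transform',
    'X-Accel-Buffering': 'no', // honored by nginx-style reverse proxies
  }
}

// Format a payload as one SSE event; multi-line data needs one "data:" line per line.
function formatSseEvent(data: string): string {
  return data.split('\n').map((line) => `data: ${line}`).join('\n') + '\n\n'
}

// Usage with the standard Response constructor.
const sseResponse = new Response(formatSseEvent('hello'), { headers: sseHeaders() })
console.log(sseResponse.headers.get('x-accel-buffering'))
```

If output still arrives in one burst, checking the raw response with a streaming-aware client helps to tell whether the buffering happens at the origin or at an intermediary.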

Pitfall 5: High Concurrent AI Requests Causing Server OOM

Symptom: Under high concurrency, Node.js process memory keeps growing until OOM.

Solution: Implement request queues and concurrency limits, use AbortController for timeouts, and promptly clean up completed streaming connections.
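
One possible shape for the queue and timeout is sketched below. `createLimiter` and `withTimeout` are hypothetical helpers written for this example; a production setup would likely add queue length limits and metrics on top.

```typescript
// Allows at most `maxConcurrent` tasks to run at once; the rest wait in FIFO order.
function createLimiter(maxConcurrent: number) {
  let active = 0
  const waiting: Array<() => void> = []

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve))
    } else {
      active++
    }
    try {
      return await task()
    } finally {
      const next = waiting.shift()
      if (next) next() // pass the slot on; `active` stays unchanged
      else active--
    }
  }
}

// Runs a task with an AbortSignal that fires after `ms` milliseconds.
async function withTimeout<T>(
  ms: number,
  task: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  try {
    return await task(controller.signal)
  } finally {
    clearTimeout(timer)
  }
}
```

A streaming route could then wrap its upstream call as `limit(() => withTimeout(30_000, (signal) => fetch(url, { signal })))`, where `limit` is the function returned by `createLimiter` and the 30-second budget is an arbitrary example value.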


Error Troubleshooting

| # | Error Message | Cause | Solution |
| --- | --- | --- | --- |
| 1 | Hydration mismatch | Server and client rendered content is inconsistent | Wrap dynamic content with ClientOnly, check data consistency |
| 2 | process is not defined | Edge Runtime does not support Node.js API | Use Web standard APIs instead, add nitro polyfill |
| 3 | Server Component cannot use client APIs | Using ref/onClick in .server.vue | Extract interactive parts as client components |
| 4 | fetch is not a function | Server fetch not configured | Ensure Node.js 18+ or configure nitro.nodeCompat |
| 5 | ReadableStream is not supported | Runtime does not support streams | Upgrade to Node.js 18+ or use polyfill |
| 6 | CORS error on SSE | Cross-origin SSE request blocked | Configure routeRules cors option |
| 7 | KV storage not available | Edge environment has no persistent storage | Use Cloudflare KV or Vercel KV |
| 8 | Maximum call stack exceeded | Recursive component render overflow | Check component nesting, limit recursion depth |
| 9 | 429 Too Many Requests | AI API rate limiting | Implement request queue and backoff retry |
| 10 | Worker exceeded CPU time limit | Edge function CPU time exceeded | Reduce server computation, stream AI responses |
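
For the 429 case, a backoff retry might look like the sketch below. The function names, the 500ms base delay, and the 8-second cap are illustrative assumptions, not values recommended by any particular AI provider. Adding random jitter to the delay is a common refinement that is left out here to keep the example deterministic.

```typescript
// Exponential backoff: 500ms, 1s, 2s, ... capped at capMs.
function backoffDelay(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}

// Retries `request` while it reports HTTP 429, up to `maxAttempts` tries in total.
async function retryOn429<R extends { status: number }>(
  request: () => Promise<R>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    const response = await request()
    const isLastAttempt = attempt >= maxAttempts - 1
    if (response.status !== 429 || isLastAttempt) return response
    await sleep(backoffDelay(attempt))
  }
}
```

Because the helper only needs a `status` field, it can wrap a plain `fetch` call, for example `retryOn429(() => fetch(url, options))`.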

Advanced Optimization

1. Island Architecture to Reduce JS Bundle Size

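<!-- components/islands/AIChat.vue -->
<script lang="ts" setup>
// With experimental.componentIslands enabled, components placed in
// components/islands/ (or named *.server.vue) are rendered on the server only.
</script>

<template>
  <div class="ai-chat">
    <slot />
  </div>
</template>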

Island components only render on the server; the client doesn't download the corresponding JS, significantly reducing hydration overhead.

2. SWR Cache for AI Responses

const { data } = await useFetch(`/api/chat/${chatId}/messages`, {
  key: `chat-${chatId}`,
  // Stamp the payload so getCachedData can tell how old it is.
  transform(messages) {
    return { messages, fetchedAt: Date.now() }
  },
  getCachedData(key, nuxtApp) {
    const cached = nuxtApp.payload.data[key]
    if (!cached) return undefined
    // Treat entries older than 5 minutes as stale and refetch.
    const isExpired = cached.fetchedAt + 5 * 60 * 1000 < Date.now()
    return isExpired ? undefined : cached
  },
})

3. Pre-render Skeleton Screen

<!-- components/ChatSkeleton.vue -->
<template>
  <div class="chat-skeleton animate-pulse">
    <div class="h-4 bg-gray-200 rounded w-3/4 mb-2" />
    <div class="h-4 bg-gray-200 rounded w-1/2 mb-2" />
    <div class="h-4 bg-gray-200 rounded w-5/6" />
  </div>
</template>

4. Progressive Hydration Strategy

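// nuxt.config.ts
export default defineNuxtConfig({
  experimental: {
    componentIslands: {
      // Lets components marked with the nuxt-client attribute inside a
      // server component load their JavaScript and become interactive.
      selectiveClient: true,
    },
  },
})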

Comparison Analysis

| Dimension | Nuxt3 SSR | Nuxt4 Streaming SSR | Next.js App Router | Remix |
| --- | --- | --- | --- | --- |
| Streaming Render | Manual implementation | Native support | Native support | defer support |
| Server Components | None | Native support | RSC | None |
| Edge SSR | Experimental | Native support | Native support | Needs adaptation |
| Island Architecture | Experimental | Behind the experimental componentIslands flag | None | None |
| AI Streaming Integration | Manual | Manual (SSE route plus composable, as in this guide) | Vercel AI SDK | Manual |
| Hydration Strategy | Full | Progressive / Island | Selective | Full |
| Learning Curve | Medium | Medium | High | Medium |
| Ecosystem Maturity | Mature | Mature in 2026 | Mature | Medium |

Summary & Outlook

Summary: Nuxt4 with streaming SSR can take an AI application from a 3-second blank screen to a first paint at around 300ms. The core optimization strategies are: Server Components to reduce the JS bundle size, streaming to show AI responses as they are generated, Edge SSR to reduce latency, and island architecture for on-demand hydration. For new AI applications, start with Nuxt4 directly; for existing Nuxt3 projects, upgrade incrementally and convert the AI interaction pages to streaming first.



#Nuxt4 #AI #SSR #StreamingRendering #ServerComponents #PerformanceOptimization #LLM #Vue