Vue3 + Nuxt 4 Full-Stack AI App: 7 Production Patterns from SSR to Edge Inference

Still building AI apps with a separate frontend and backend, fighting CORS and deploying two systems? Nuxt 4's Server Routes, SSR, and Edge Runtime support let a Vue 3 app cover the full stack: one codebase, one deployment, and AI content rendered on the server. This article walks through 7 production-oriented patterns, from Server Route proxying to edge inference.


Key Takeaways

  • Master Nuxt 4 Server Routes for building AI API proxies
  • Implement SSR + AI streaming render first-paint optimization
  • Build production-grade streaming chat UI components
  • Deploy edge inference to Cloudflare Workers
  • Design RAG frontend interaction experiences
  • Pinia persistent conversation state with cross-page recovery
  • Production performance optimization and deployment best practices

Table of Contents

  1. Nuxt 4 Full-Stack AI Architecture Overview
  2. Pattern 1: Server Routes + AI API Proxy
  3. Pattern 2: SSR + AI Streaming Render
  4. Pattern 3: Streaming Chat UI Component
  5. Pattern 4: Edge Inference with Cloudflare Workers
  6. Pattern 5: RAG Frontend Interaction Design
  7. Pattern 6: Conversation State with Pinia Persistence
  8. Pattern 7: Production Deployment & Performance Optimization
  9. 5 Common Pitfalls and Solutions
  10. 10 Common Error Troubleshooting
  11. Advanced Optimization Techniques
  12. Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit
  13. Recommended Online Tools
  14. Summary

Nuxt 4 Full-Stack AI Architecture Overview

Nuxt 4 in 2026 brings critical capabilities for full-stack AI applications: Server Routes as backend API layer, SSR streaming rendering, Edge Runtime support, and native TypeScript end-to-end type safety.

┌──────────────────────────────────────────────────────────┐
│                   Nuxt 4 Full-Stack AI Architecture      │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────┐  │
│  │   Browser   │───▶│  Nuxt SSR    │───▶│  Edge /    │  │
│  │   Client    │◀───│  Server      │◀───│  Node      │  │
│  └──────┬──────┘    └──────┬───────┘    └─────┬──────┘  │
│         │                  │                   │         │
│  ┌──────▼──────┐    ┌──────▼───────┐    ┌─────▼──────┐  │
│  │  Vue3       │    │ Server       │    │ AI Models  │  │
│  │  Composables│    │ Routes       │    │ OpenAI     │  │
│  │  Pinia      │    │ /api/chat    │    │ Anthropic  │  │
│  │  Components │    │ /api/embed   │    │ Local LLM  │  │
│  └─────────────┘    │ /api/rag     │    └────────────┘  │
│                     └──────────────┘                     │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  Shared Layer: Types / Utils / Constants            │ │
│  └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Nuxt 4 Full-Stack AI Core Capabilities

Capability    | Description                               | Use Case
Server Routes | File-based API routing, no Express needed | AI proxy, webhooks, BFF
SSR Streaming | Server-side streaming HTML output         | SEO + AI first paint
Edge Runtime  | Cloudflare/Deno edge deployment           | Low-latency inference
Nitro Engine  | Cross-platform server engine              | Unified multi-environment deployment
Shared Types  | Types shared between frontend and backend | End-to-end type safety
useAsyncData  | SSR data fetching                         | AI data preloading

Pattern 1: Server Routes + AI API Proxy

Nuxt 4 Server Routes let you write APIs directly in your Nuxt project without a separate backend. This is the foundation of full-stack AI—all AI requests are proxied through Server Routes, avoiding API Key exposure, unifying error handling, and implementing rate limiting.

Basic Server Route Proxy

// server/api/chat.post.ts
import { defineEventHandler, readBody, createError } from 'h3'
import { z } from 'zod'

const chatRequestSchema = z.object({
  messages: z.array(z.object({
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string().max(4000),
  })).min(1).max(50),
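  // Note: this route forwards to the OpenAI endpoint only. A Claude model id passes
  // validation here but is rejected upstream unless the request is routed to Anthropic (see Pattern 4)
  model: z.enum(['gpt-4o', 'gpt-4o-mini', 'claude-sonnet-4-20250514']).default('gpt-4o-mini'),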
  temperature: z.number().min(0).max(2).default(0.7),
  maxTokens: z.number().min(1).max(4096).default(2048),
})

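// In-memory counters live per server instance and are lost on restart. On serverless or
// edge runtimes each isolate keeps its own Map, so use a shared store there
const RATE_LIMIT_WINDOW = 60_000
const RATE_LIMIT_MAX = 20
const requestCounts = new Map<string, { count: number; resetAt: number }>()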

function checkRateLimit(ip: string): boolean {
  const now = Date.now()
  const record = requestCounts.get(ip)
  if (!record || now > record.resetAt) {
    requestCounts.set(ip, { count: 1, resetAt: now + RATE_LIMIT_WINDOW })
    return true
  }
  if (record.count >= RATE_LIMIT_MAX) {
    return false
  }
  record.count++
  return true
}

export default defineEventHandler(async (event) => {
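  // x-forwarded-for can hold a comma-separated proxy chain. getRequestIP (an h3 helper,
  // auto-imported in Nitro server routes) returns the first entry, i.e. the client address
  const clientIp = getRequestIP(event, { xForwardedFor: true }) || 'unknown'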
  if (!checkRateLimit(clientIp)) {
    throw createError({
      statusCode: 429,
      statusMessage: 'Rate limit exceeded. Please try again later.',
    })
  }

  const body = await readBody(event)
  const parsed = chatRequestSchema.safeParse(body)
  if (!parsed.success) {
    throw createError({
      statusCode: 400,
      statusMessage: `Validation error: ${parsed.error.message}`,
    })
  }

  const { messages, model, temperature, maxTokens } = parsed.data

  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    throw createError({
      statusCode: 500,
      statusMessage: 'AI service not configured',
    })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages,
      temperature,
      max_tokens: maxTokens,
    }),
  })

  if (!response.ok) {
    const errorData = await response.json().catch(() => ({}))
    throw createError({
      statusCode: response.status,
      statusMessage: errorData.error?.message || 'AI service error',
    })
  }

  return response.json()
})
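
The `requestCounts` map above never drops expired entries, so it grows with every new client address. The following is a minimal sketch of the same fixed-window idea with a pruning step and an injectable clock. `createRateLimiter` and its methods are hypothetical names for illustration, not part of Nuxt or h3.

```typescript
// Hypothetical helper (not a Nuxt/h3 API): fixed-window rate limiter with pruning.
interface WindowRecord {
  count: number
  resetAt: number
}

function createRateLimiter(max: number, windowMs: number, now: () => number = Date.now) {
  const records = new Map<string, WindowRecord>()

  // Allows the request and counts it, or returns false once the window is full.
  function check(key: string): boolean {
    const t = now()
    const record = records.get(key)
    if (!record || t > record.resetAt) {
      records.set(key, { count: 1, resetAt: t + windowMs })
      return true
    }
    if (record.count >= max) return false
    record.count++
    return true
  }

  // Removes every record whose window has already ended.
  function prune(): void {
    const t = now()
    records.forEach((record, key) => {
      if (t > record.resetAt) records.delete(key)
    })
  }

  return { check, prune, size: () => records.size }
}
```

In a long-running Node process, `prune` could be called from a timer or every N requests. The injectable `now` exists only to make the window logic easy to test.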

Streaming Server Route

// server/api/chat/stream.post.ts
import { defineEventHandler, readBody, createError, setResponseHeader, sendStream } from 'h3'

export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  const { messages, model = 'gpt-4o-mini' } = body ?? {}
  if (!Array.isArray(messages) || messages.length === 0) {
    throw createError({ statusCode: 400, statusMessage: 'messages must be a non-empty array' })
  }

  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    throw createError({ statusCode: 500, statusMessage: 'AI service not configured' })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages,
      stream: true,
    }),
  })

  if (!response.ok || !response.body) {
    throw createError({
      statusCode: response.ok ? 502 : response.status,
      statusMessage: 'AI streaming error',
    })
  }

  // Set the SSE headers only after the upstream call succeeded,
  // so error responses are not labelled text/event-stream
  setResponseHeader(event, 'Content-Type', 'text/event-stream')
  setResponseHeader(event, 'Cache-Control', 'no-cache')
  setResponseHeader(event, 'Connection', 'keep-alive')
  setResponseHeader(event, 'X-Accel-Buffering', 'no')

  const decoder = new TextDecoder()
  const encoder = new TextEncoder()
  let buffer = ''

  const transformStream = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // An SSE line can be split across network chunks: keep the trailing partial line for the next call
      buffer += decoder.decode(chunk, { stream: true })
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? ''

      for (const rawLine of lines) {
        const line = rawLine.trim()
        if (!line.startsWith('data: ')) continue
        const data = line.slice(6)
        if (data === '[DONE]') {
          controller.enqueue(encoder.encode('data: [DONE]\n\n'))
          continue
        }
        try {
          const parsed = JSON.parse(data)
          const content = parsed.choices?.[0]?.delta?.content
          if (content) {
            // Re-emit only the text delta in the { content } shape the client expects
            controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`))
          }
        } catch {
          // skip malformed chunks
        }
      }
    },
  })

  const readableStream = response.body.pipeThrough(transformStream)
  return sendStream(event, readableStream)
})
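
One detail worth isolating from the streaming route: network chunks do not respect SSE line boundaries, so a `data:` line can arrive in two pieces. The sketch below shows the buffering step on its own, as a small standalone function. `createSSELineBuffer` is a hypothetical helper name, and it assumes the caller has already decoded bytes to text.

```typescript
// Hypothetical helper: accumulates text chunks and returns the payloads
// of all complete `data:` lines seen so far.
function createSSELineBuffer() {
  let buffer = ''

  return {
    push(text: string): string[] {
      buffer += text
      const lines = buffer.split('\n')
      // The last element is either '' or an incomplete line: keep it for the next push.
      buffer = lines.pop() ?? ''

      const payloads: string[] = []
      for (const rawLine of lines) {
        const line = rawLine.replace(/\r$/, '')
        if (line.startsWith('data: ')) {
          payloads.push(line.slice(6))
        }
      }
      return payloads
    },
  }
}
```

The same helper works on either side of the connection: in a `TransformStream` on the server, or in the `reader.read()` loop on the client.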

Shared Type Definitions

// shared/types/ai.ts
export interface ChatMessage {
  id: string
  role: 'user' | 'assistant' | 'system'
  content: string
  timestamp: number
  metadata?: MessageMetadata
}

export interface MessageMetadata {
  model: string
  tokens: number
  latency: number
  finishReason: string
}

export interface ChatRequest {
  messages: Pick<ChatMessage, 'role' | 'content'>[]
  model: AIModel
  temperature?: number
  maxTokens?: number
  stream?: boolean
}

export type AIModel = 'gpt-4o' | 'gpt-4o-mini' | 'claude-sonnet-4-20250514'

export interface StreamChunk {
  content: string
  done: boolean
}

export interface RAGQuery {
  question: string
  topK?: number
  threshold?: number
}

export interface RAGResult {
  answer: string
  sources: RAGSource[]
  confidence: number
}

export interface RAGSource {
  content: string
  metadata: Record<string, unknown>
  score: number
}

Pattern 2: SSR + AI Streaming Render

The core value of combining SSR with AI: search engines can index AI-generated content, and users see AI answers on first paint. Nuxt 4's useAsyncData + Server Routes make SSR AI rendering straightforward.

SSR Data Preloading

// composables/useAIContent.ts
import { computed } from 'vue'
import { useAsyncData, useHead } from '#imports'

interface AIContentOptions {
  prompt: string
  model?: AIModel
  ttl?: number // seconds
}

interface CachedAIContent {
  text: string
  expiresAt: number // epoch milliseconds
}

export function useAIContent(options: AIContentOptions) {
  const { prompt, model = 'gpt-4o-mini', ttl = 3600 } = options

  const { data, pending, error, refresh } = useAsyncData<CachedAIContent>(
    // Key on the full prompt: a truncated prefix collides when prompts share a template
    `ai-content:${model}:${prompt}`,
    async () => {
      const text = await $fetch<string>('/api/ai/generate', {
        method: 'POST',
        body: { prompt, model },
      })
      // Store the expiry next to the text so getCachedData has something to check
      return { text, expiresAt: Date.now() + ttl * 1000 }
    },
    {
      server: true,
      lazy: false,
      getCachedData(key, nuxtApp) {
        const cached = nuxtApp.payload.data[key] as CachedAIContent | undefined
        // Returning undefined triggers a fresh fetch
        return cached && cached.expiresAt > Date.now() ? cached : undefined
      },
    }
  )

  const content = computed(() => data.value?.text ?? '')

  useHead({
    meta: [
      { name: 'description', content: () => content.value.slice(0, 160) },
    ],
  })

  return { content, pending, error, refresh }
}

SSR AI Page Component

<!-- pages/ai-insights/[topic].vue -->
<script setup lang="ts">
const route = useRoute()
const topic = route.params.topic as string

const { content, pending, error } = useAIContent({
  prompt: `Generate a comprehensive technical insight about ${topic} for developers in 2026`,
  model: 'gpt-4o-mini',
})

useHead({
  title: () => `AI Insights: ${topic} | ToolsKu`,
})
</script>

<template>
  <div class="mx-auto max-w-4xl px-4 py-8">
    <header class="mb-8">
      <h2 class="text-3xl font-bold text-gray-900">
        AI Insights: {{ topic }}
      </h2>
      <p class="mt-2 text-gray-500">
        AI-generated analysis, verified and curated for developers
      </p>
    </header>

    <div v-if="pending" class="space-y-4">
      <div class="h-8 w-3/4 animate-pulse rounded bg-gray-200" />
      <div class="h-8 w-1/2 animate-pulse rounded bg-gray-200" />
      <div class="h-8 w-2/3 animate-pulse rounded bg-gray-200" />
    </div>

    <div v-else-if="error" class="rounded-lg bg-red-50 p-4">
      <p class="text-red-700">Failed to generate AI content. Please try again.</p>
    </div>

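    <article v-else class="prose prose-lg max-w-none">
      <!-- v-html injects raw markup: /api/ai/generate must return sanitized HTML, never raw model output -->
      <div v-html="content" />
    </article>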
  </div>
</template>
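
Because the page renders the AI response with `v-html`, any markup in the model output reaches the DOM unescaped. As a minimal sketch, assuming the endpoint returns plain text rather than HTML, the text can be escaped and wrapped in paragraphs before rendering. `escapeHtml` and `textToSafeHtml` are hypothetical helper names, not Nuxt APIs.

```typescript
// Escapes the five characters that are significant in HTML text and attributes.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Turns plain text into escaped <p> blocks: blank lines separate paragraphs,
// single newlines become <br>.
function textToSafeHtml(text: string): string {
  return text
    .split(/\n{2,}/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph) => `<p>${escapeHtml(paragraph).replace(/\n/g, '<br>')}</p>`)
    .join('\n')
}
```

If the endpoint returns Markdown or HTML instead, escaping alone is not enough and a dedicated sanitizer is the safer choice.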

SSR Cache Middleware

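// server/middleware/ai-cache.ts
import { defineEventHandler, readBody, setResponseHeader } from 'h3'
import { useStorage } from '#imports'

const aiCache = useStorage('ai-cache')

export default defineEventHandler(async (event) => {
  if (!event.path.startsWith('/api/ai/')) return

  // readBody throws for requests without a body (e.g. GET), hence the fallback to {}
  const body = await readBody(event).catch(() => ({}))
  const cacheKey = `ssr-ai:${event.path}:${JSON.stringify(body)}`
  const cached = await aiCache.getItem<{ data: string; expiresAt: number }>(cacheKey)

  if (cached && cached.expiresAt > Date.now()) {
    setResponseHeader(event, 'X-AI-Cache', 'HIT')
    // Returning a value from middleware ends the request here
    return cached.data
  }

  // This middleware only reads. The /api/ai/* handlers have to write { data, expiresAt }
  // under the same key with aiCache.setItem, otherwise every request is a MISS
  setResponseHeader(event, 'X-AI-Cache', 'MISS')
})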
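
The middleware covers the read side of the cache. The write side follows the usual read-through pattern: return a fresh entry if one exists, otherwise produce the value and store it with an expiry. The sketch below shows that logic with an in-memory `Map` and a synchronous producer so it can run standalone. `createTtlCache` and `readThrough` are hypothetical names, and a real handler would use the async storage calls instead.

```typescript
interface CacheEntry<T> {
  data: T
  expiresAt: number // epoch milliseconds
}

interface TtlCache<T> {
  get(key: string): T | undefined
  set(key: string, data: T, ttlMs: number): void
}

// In-memory stand-in for a storage driver, with an injectable clock for testing.
function createTtlCache<T>(now: () => number = Date.now): TtlCache<T> {
  const store = new Map<string, CacheEntry<T>>()
  return {
    get(key) {
      const entry = store.get(key)
      if (!entry) return undefined
      if (entry.expiresAt <= now()) {
        store.delete(key) // expired: drop it and report a miss
        return undefined
      }
      return entry.data
    },
    set(key, data, ttlMs) {
      store.set(key, { data, expiresAt: now() + ttlMs })
    },
  }
}

// Read-through: serve from cache when fresh, otherwise produce and store.
function readThrough<T>(
  cache: TtlCache<T>,
  key: string,
  ttlMs: number,
  produce: () => T,
): { value: T; hit: boolean } {
  const cached = cache.get(key)
  if (cached !== undefined) return { value: cached, hit: true }
  const value = produce()
  cache.set(key, value, ttlMs)
  return { value, hit: false }
}
```

The `hit` flag maps directly onto the `X-AI-Cache` header values used above.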

Pattern 3: Streaming Chat UI Component

Streaming chat is the core interaction of AI applications. This pattern implements a complete, production-grade streaming chat component with Markdown rendering, code highlighting, interrupt generation, and message retry.

Streaming Chat Composable

// composables/useStreamingChat.ts
import { ref, computed } from 'vue'
import type { ChatMessage, StreamChunk, AIModel } from '#shared/types/ai'

interface UseStreamingChatOptions {
  apiEndpoint?: string
  defaultModel?: AIModel
  maxRetries?: number
}

export function useStreamingChat(options: UseStreamingChatOptions = {}) {
  const {
    apiEndpoint = '/api/chat/stream',
    defaultModel = 'gpt-4o-mini',
    maxRetries = 2,
  } = options

  const messages = ref<ChatMessage[]>([])
  const currentStreamContent = ref('')
  const isStreaming = ref(false)
  const error = ref<string | null>(null)
  const selectedModel = ref<AIModel>(defaultModel)
  let abortController: AbortController | null = null

  const displayedMessages = computed(() => {
    const base = [...messages.value]
    if (isStreaming.value && currentStreamContent.value) {
      base.push({
        id: 'streaming',
        role: 'assistant',
        content: currentStreamContent.value,
        timestamp: Date.now(),
      })
    }
    return base
  })

  async function sendMessage(content: string) {
    const userMessage: ChatMessage = {
      id: crypto.randomUUID(),
      role: 'user',
      content,
      timestamp: Date.now(),
    }
    messages.value.push(userMessage)
    error.value = null
    currentStreamContent.value = ''
    isStreaming.value = true
    abortController = new AbortController()

    let retryCount = 0

    const attemptStream = async (): Promise<void> => {
      // Start every attempt from a clean slate so a retry does not duplicate partial output
      currentStreamContent.value = ''
      try {
        const response = await fetch(apiEndpoint, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            messages: messages.value.map((m) => ({
              role: m.role,
              content: m.content,
            })),
            model: selectedModel.value,
          }),
          signal: abortController!.signal,
        })

        if (!response.ok || !response.body) {
          throw new Error(`HTTP ${response.status}: ${response.statusText}`)
        }

        const reader = response.body.getReader()
        const decoder = new TextDecoder()
        let buffer = ''

        while (true) {
          const { done, value } = await reader.read()
          if (done) break

          // Keep the trailing partial line in the buffer until the next chunk completes it
          buffer += decoder.decode(value, { stream: true })
          const lines = buffer.split('\n')
          buffer = lines.pop() || ''

          for (const line of lines) {
            if (!line.startsWith('data: ')) continue
            const data = line.slice(6)
            if (data === '[DONE]') {
              finalizeStream()
              return
            }
            try {
              const chunk: StreamChunk = JSON.parse(data)
              // Guard against events without text so "undefined" is never appended
              if (typeof chunk.content === 'string') {
                currentStreamContent.value += chunk.content
              }
            } catch {
              // skip malformed chunks
            }
          }
        }

        finalizeStream()
      } catch (err: any) {
        // A user-initiated stop is not an error; stopStreaming() finalizes the message
        if (err.name === 'AbortError') return
        if (retryCount < maxRetries) {
          retryCount++
          return attemptStream()
        }
        error.value = err.message
        isStreaming.value = false
      }
    }

    await attemptStream()
  }

  function finalizeStream() {
    if (currentStreamContent.value) {
      messages.value.push({
        id: crypto.randomUUID(),
        role: 'assistant',
        content: currentStreamContent.value,
        timestamp: Date.now(),
      })
    }
    currentStreamContent.value = ''
    isStreaming.value = false
  }

  function stopStreaming() {
    abortController?.abort()
    finalizeStream()
  }

  function retryLastMessage() {
    const lastUserIndex = messages.value.findLastIndex((m) => m.role === 'user')
    if (lastUserIndex === -1) return
    const lastUserContent = messages.value[lastUserIndex].content
    messages.value = messages.value.slice(0, lastUserIndex)
    sendMessage(lastUserContent)
  }

  function clearMessages() {
    messages.value = []
    currentStreamContent.value = ''
    error.value = null
  }

  return {
    messages: displayedMessages,
    isStreaming,
    error,
    selectedModel,
    sendMessage,
    stopStreaming,
    retryLastMessage,
    clearMessages,
  }
}
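
The composable retries a failed request immediately, which can pile extra load onto an upstream that is already struggling. A common alternative is exponential backoff with jitter. The sketch below shows only the delay calculation; `backoffDelay` and `withJitter` are hypothetical helpers, and the base and cap values are assumptions rather than recommendations from the article.

```typescript
// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
// `attempt` is zero-based (0 = first retry).
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

// "Full jitter": pick a uniform delay in [0, delayMs) so that many clients
// retrying at once do not hit the server in lockstep.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs)
}
```

Inside `attemptStream`, the retry branch could await a timer for `withJitter(backoffDelay(retryCount))` milliseconds before calling itself again.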

Chat UI Component

<!-- components/AIChatWindow.vue -->
<script setup lang="ts">
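import MarkdownIt from 'markdown-it'
import { useStreamingChat } from '~/composables/useStreamingChat'
import { useConversationStore } from '~/stores/conversation'

// The template calls renderMarkdown, so it has to be defined here. This version assumes the
// markdown-it package is installed; html: false makes it escape raw HTML found in model output
const md = new MarkdownIt({ html: false, linkify: true })
const renderMarkdown = (source: string) => md.render(source)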

const props = defineProps<{
  conversationId?: string
}>()

const {
  messages,
  isStreaming,
  error,
  selectedModel,
  sendMessage,
  stopStreaming,
  retryLastMessage,
  clearMessages,
} = useStreamingChat()

const conversationStore = useConversationStore()
const inputText = ref('')
const messagesContainer = ref<HTMLElement>()

const modelOptions = [
  { label: 'GPT-4o', value: 'gpt-4o' as const },
  { label: 'GPT-4o Mini', value: 'gpt-4o-mini' as const },
  { label: 'Claude Sonnet 4', value: 'claude-sonnet-4-20250514' as const },
]

async function handleSubmit() {
  const text = inputText.value.trim()
  if (!text || isStreaming.value) return
  inputText.value = ''
  await sendMessage(text)
  if (props.conversationId) {
    conversationStore.saveConversation(props.conversationId, messages.value)
  }
  scrollToBottom()
}

function scrollToBottom() {
  nextTick(() => {
    if (messagesContainer.value) {
      messagesContainer.value.scrollTop = messagesContainer.value.scrollHeight
    }
  })
}

watch(messages, () => scrollToBottom(), { deep: true })
</script>

<template>
  <div class="flex h-full flex-col rounded-xl border border-gray-200 bg-white shadow-sm">
    <header class="flex items-center justify-between border-b border-gray-200 px-4 py-3">
      <div class="flex items-center gap-3">
        <span class="text-sm font-medium text-gray-700">AI Chat</span>
        <select
          v-model="selectedModel"
          class="rounded-md border border-gray-300 px-2 py-1 text-xs text-gray-600"
          :disabled="isStreaming"
        >
          <option v-for="opt in modelOptions" :key="opt.value" :value="opt.value">
            {{ opt.label }}
          </option>
        </select>
      </div>
      <div class="flex gap-2">
        <button
          class="rounded-md px-2 py-1 text-xs text-gray-500 hover:bg-gray-100"
          @click="retryLastMessage"
          :disabled="isStreaming || messages.length === 0"
        >
          Retry
        </button>
        <button
          class="rounded-md px-2 py-1 text-xs text-red-500 hover:bg-red-50"
          @click="clearMessages"
          :disabled="isStreaming"
        >
          Clear
        </button>
      </div>
    </header>

    <div ref="messagesContainer" class="flex-1 overflow-y-auto p-4 space-y-4">
      <div
        v-for="msg in messages"
        :key="msg.id"
        :class="[
          'max-w-[80%] rounded-lg px-4 py-2.5 text-sm',
          msg.role === 'user'
            ? 'ml-auto bg-blue-600 text-white'
            : 'mr-auto bg-gray-100 text-gray-900',
        ]"
      >
        <div v-if="msg.role === 'assistant'" class="prose prose-sm max-w-none" v-html="renderMarkdown(msg.content)" />
        <p v-else>{{ msg.content }}</p>
      </div>

      <div v-if="error" class="mx-auto max-w-md rounded-lg bg-red-50 p-3 text-center text-sm text-red-600">
        {{ error }}
        <button class="ml-2 underline" @click="retryLastMessage">Retry</button>
      </div>
    </div>

    <footer class="border-t border-gray-200 p-3">
      <div class="flex gap-2">
        <input
          v-model="inputText"
          type="text"
          placeholder="Type your message..."
          class="flex-1 rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
          @keydown.enter="handleSubmit"
          :disabled="isStreaming"
        />
        <button
          v-if="isStreaming"
          class="rounded-lg bg-red-500 px-4 py-2 text-sm font-medium text-white hover:bg-red-600"
          @click="stopStreaming"
        >
          Stop
        </button>
        <button
          v-else
          class="rounded-lg bg-blue-600 px-4 py-2 text-sm font-medium text-white hover:bg-blue-700"
          @click="handleSubmit"
          :disabled="!inputText.trim()"
        >
          Send
        </button>
      </div>
    </footer>
  </div>
</template>

Pattern 4: Edge Inference with Cloudflare Workers

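Edge inference is a key trend for AI applications in 2026. Running the request-handling logic on the edge node nearest to the user shortens the network round trip for the first hop, which is where the drop from hundreds of milliseconds to single digits applies. Model latency itself does not change while the Worker only proxies to a hosted API such as OpenAI; it moves to the edge too only when the model runs there. With Nitro's Cloudflare preset, deploying a Nuxt 4 app to Cloudflare Workers is mostly a configuration change.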

Nitro Edge Configuration

// nuxt.config.ts
export default defineNuxtConfig({
  future: {
    compatibilityVersion: 4,
  },
  // runtimeConfig is a top-level Nuxt option. Keys outside `public` stay on the server and
  // can be overridden at runtime with NUXT_AI_API_KEY and NUXT_AI_BASE_URL
  runtimeConfig: {
    aiApiKey: process.env.OPENAI_API_KEY || '',
    aiBaseUrl: process.env.AI_BASE_URL || 'https://api.openai.com/v1',
  },
  nitro: {
    preset: 'cloudflare-module',
    routeRules: {
      '/api/ai/**': {
        cors: true,
        headers: {
          'cache-control': 'no-cache',
        },
      },
    },
  },
})

Edge Inference Server Route

// server/api/ai/edge-chat.post.ts
import { defineEventHandler, readBody, createError, getHeader, setResponseHeader, sendStream } from 'h3'

interface EdgeAIConfig {
  provider: 'openai' | 'anthropic' | 'local'
  baseUrl: string
  apiKey: string
}

function getAIConfig(event: any): EdgeAIConfig {
  const config = useRuntimeConfig(event)
  const provider = getHeader(event, 'x-ai-provider') || 'openai'

  const configs: Record<string, EdgeAIConfig> = {
    openai: {
      provider: 'openai',
      // aiBaseUrl is defined as a private runtimeConfig key, not under `public`
      baseUrl: config.aiBaseUrl || 'https://api.openai.com/v1',
      apiKey: config.aiApiKey,
    },
    anthropic: {
      provider: 'anthropic',
      baseUrl: 'https://api.anthropic.com/v1',
      apiKey: process.env.ANTHROPIC_API_KEY || '',
    },
    local: {
      provider: 'local',
      baseUrl: process.env.LOCAL_AI_URL || 'http://localhost:11434/v1',
      apiKey: 'local',
    },
  }

  return configs[provider] || configs.openai
}

export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  const { messages, model = 'gpt-4o-mini' } = body
  const aiConfig = getAIConfig(event)

  const endpoint = aiConfig.provider === 'anthropic'
    ? `${aiConfig.baseUrl}/messages`
    : `${aiConfig.baseUrl}/chat/completions`

  const requestHeaders: Record<string, string> = {
    'Content-Type': 'application/json',
  }

  if (aiConfig.provider === 'anthropic') {
    requestHeaders['x-api-key'] = aiConfig.apiKey
    requestHeaders['anthropic-version'] = '2023-06-01'
  } else {
    requestHeaders['Authorization'] = `Bearer ${aiConfig.apiKey}`
  }

  // Only role and content are forwarded upstream
  const turns = messages.map((m: any) => ({ role: m.role, content: m.content }))

  // Anthropic's Messages API accepts only user/assistant turns in `messages`;
  // system prompts go in the top-level `system` field
  const requestBody = aiConfig.provider === 'anthropic'
    ? {
        model,
        system: turns.filter((m: any) => m.role === 'system').map((m: any) => m.content).join('\n\n') || undefined,
        messages: turns.filter((m: any) => m.role !== 'system'),
        max_tokens: 2048,
        stream: true,
      }
    : {
        model,
        messages: turns,
        stream: true,
      }

  const response = await fetch(endpoint, {
    method: 'POST',
    headers: requestHeaders,
    body: JSON.stringify(requestBody),
  })

  if (!response.ok || !response.body) {
    throw createError({
      statusCode: response.ok ? 502 : response.status,
      statusMessage: `Edge AI error: ${response.statusText}`,
    })
  }

  // Set the SSE headers only after the upstream call succeeded,
  // so errors are not sent as text/event-stream
  setResponseHeader(event, 'Content-Type', 'text/event-stream')
  setResponseHeader(event, 'Cache-Control', 'no-cache')
  setResponseHeader(event, 'Connection', 'keep-alive')

  // The upstream SSE is forwarded unchanged: clients receive provider-specific events
  // (OpenAI `choices[].delta`, Anthropic `content_block_delta`), not the { content }
  // chunks emitted by /api/chat/stream
  return sendStream(event, response.body)
})
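
Because this route forwards the provider's stream as-is, whoever consumes it has to understand two event formats. The sketch below shows one way to normalize them, plus a request-body builder that moves system prompts into Anthropic's top-level `system` field. `buildAnthropicBody` and `extractDelta` are hypothetical helper names; the payload shapes follow the public OpenAI Chat Completions and Anthropic Messages streaming formats.

```typescript
type Provider = 'openai' | 'anthropic'

interface ChatTurn {
  role: 'user' | 'assistant' | 'system'
  content: string
}

interface AnthropicBody {
  model: string
  max_tokens: number
  stream: boolean
  messages: ChatTurn[]
  system?: string
}

// Anthropic accepts only user/assistant turns in `messages`,
// so system prompts are joined into the top-level `system` field.
function buildAnthropicBody(turns: ChatTurn[], model: string, maxTokens = 2048): AnthropicBody {
  const system = turns
    .filter((t) => t.role === 'system')
    .map((t) => t.content)
    .join('\n\n')
  const body: AnthropicBody = {
    model,
    max_tokens: maxTokens,
    stream: true,
    messages: turns.filter((t) => t.role !== 'system'),
  }
  if (system) body.system = system
  return body
}

// Returns the text delta carried by one parsed SSE payload, or '' if there is none.
function extractDelta(provider: Provider, payload: unknown): string {
  if (typeof payload !== 'object' || payload === null) return ''
  const p = payload as Record<string, any>
  if (provider === 'anthropic') {
    const isTextDelta = p.type === 'content_block_delta' && p.delta?.type === 'text_delta'
    return isTextDelta && typeof p.delta.text === 'string' ? p.delta.text : ''
  }
  const content = p.choices?.[0]?.delta?.content
  return typeof content === 'string' ? content : ''
}
```

Running `extractDelta` inside a server-side transform would let this route emit the same `{ content }` chunks as `/api/chat/stream`, so one client parser could serve both.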

Wrangler Deployment Configuration

# wrangler.toml
name = "toolsku-ai-edge"
main = ".output/server/index.mjs"
compatibility_date = "2026-06-01"
compatibility_flags = ["nodejs_compat"]

[vars]
AI_BASE_URL = "https://api.openai.com/v1"

[ai]
binding = "AI"

[[r2_buckets]]
binding = "AI_CACHE"
bucket_name = "toolsku-ai-cache"

[observability]
enabled = true

Pattern 5: RAG Frontend Interaction Design

RAG (Retrieval-Augmented Generation) is one of the most practical patterns for AI applications. The frontend needs to handle the complete interaction chain: document upload, vector search, and result display.

RAG Query Composable

// composables/useRAG.ts
import { ref, computed } from 'vue'
import type { RAGQuery, RAGResult, RAGSource } from '#shared/types/ai'

interface DocumentChunk {
  id: string
  content: string
  metadata: {
    source: string
    page: number
    section: string
  }
  score: number
}

export function useRAG() {
  const query = ref('')
  const isSearching = ref(false)
  const isGenerating = ref(false)
  const searchResults = ref<DocumentChunk[]>([])
  const ragAnswer = ref('')
  const ragSources = ref<RAGSource[]>([])
  const error = ref<string | null>(null)

  const hasResults = computed(() => searchResults.value.length > 0)
  const isProcessing = computed(() => isSearching.value || isGenerating.value)

  async function searchDocuments(searchQuery: string, topK = 5) {
    isSearching.value = true
    error.value = null
    searchResults.value = []

    try {
      const results = await $fetch<DocumentChunk[]>('/api/rag/search', {
        method: 'POST',
        body: { query: searchQuery, topK },
      })
      searchResults.value = results
    } catch (err: any) {
      error.value = err.data?.message || 'Search failed'
    } finally {
      isSearching.value = false
    }
  }

  async function generateAnswer(searchQuery: string) {
    isGenerating.value = true
    // Clear any error left over from a previous run, as searchDocuments does
    error.value = null
    ragAnswer.value = ''
    ragSources.value = []

    try {
      const result = await $fetch<RAGResult>('/api/rag/generate', {
        method: 'POST',
        body: {
          question: searchQuery,
          topK: 5,
          threshold: 0.7,
        } satisfies RAGQuery,
      })
      ragAnswer.value = result.answer
      ragSources.value = result.sources
    } catch (err: any) {
      error.value = err.data?.message || 'Generation failed'
    } finally {
      isGenerating.value = false
    }
  }

  async function fullRAGPipeline(searchQuery: string) {
    await searchDocuments(searchQuery)
    if (searchResults.value.length > 0) {
      await generateAnswer(searchQuery)
    }
  }

  return {
    query,
    isSearching,
    isGenerating,
    isProcessing,
    searchResults,
    ragAnswer,
    ragSources,
    hasResults,
    error,
    searchDocuments,
    generateAnswer,
    fullRAGPipeline,
  }
}
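
The composable passes `topK` and `threshold` to `/api/rag/generate`, but the article does not show how the server applies them. One plausible reading, sketched below, is to drop sources under the score threshold, sort the rest by score, and keep the best `topK`. `selectSources` is a hypothetical helper, and it assumes scores are similarity values where higher means more relevant.

```typescript
interface ScoredSource {
  content: string
  score: number
}

// Keeps the topK highest-scoring sources at or above the threshold.
// filter() returns a new array, so the caller's input is not reordered by sort().
function selectSources<T extends ScoredSource>(sources: T[], topK = 5, threshold = 0.7): T[] {
  return sources
    .filter((source) => source.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```

The same function can also run on the client, for example to hide low-relevance entries in the sources list without another request.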

RAG Interaction Component

<!-- components/RAGSearchPanel.vue -->
<script setup lang="ts">
import { useRAG } from '~/composables/useRAG'

const {
  query,
  isProcessing,
  searchResults,
  ragAnswer,
  ragSources,
  error,
  fullRAGPipeline,
} = useRAG()

const showSources = ref(false)

async function handleSearch() {
  if (!query.value.trim()) return
  await fullRAGPipeline(query.value)
}
</script>

<template>
  <div class="mx-auto max-w-3xl space-y-6">
    <div class="flex gap-2">
      <input
        v-model="query"
        type="text"
        placeholder="Ask about your documents..."
        class="flex-1 rounded-lg border border-gray-300 px-4 py-2.5 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
        @keydown.enter="handleSearch"
        :disabled="isProcessing"
      />
      <button
        class="rounded-lg bg-blue-600 px-6 py-2.5 text-sm font-medium text-white hover:bg-blue-700 disabled:opacity-50"
        @click="handleSearch"
        :disabled="isProcessing || !query.trim()"
      >
        {{ isProcessing ? 'Searching...' : 'Search' }}
      </button>
    </div>

    <div v-if="error" class="rounded-lg bg-red-50 p-4 text-sm text-red-600">
      {{ error }}
    </div>

    <div v-if="ragAnswer" class="rounded-lg border border-gray-200 bg-white p-6 shadow-sm">
      <h3 class="mb-3 text-sm font-semibold text-gray-500 uppercase tracking-wide">AI Answer</h3>
      <!-- v-html renders raw markup: ragAnswer must be sanitized HTML from /api/rag/generate -->
      <div class="prose prose-sm max-w-none" v-html="ragAnswer" />
      <button
        class="mt-4 text-xs text-blue-600 hover:underline"
        @click="showSources = !showSources"
      >
        {{ showSources ? 'Hide' : 'Show' }} Sources ({{ ragSources.length }})
      </button>
    </div>

    <div v-if="showSources && ragSources.length" class="space-y-3">
      <h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Sources</h3>
      <div
        v-for="(source, index) in ragSources"
        :key="index"
        class="rounded-lg border border-gray-200 bg-gray-50 p-4"
      >
        <div class="mb-2 flex items-center justify-between">
          <span class="text-xs font-medium text-gray-500">
            Relevance: {{ (source.score * 100).toFixed(1) }}%
          </span>
        </div>
        <p class="text-sm text-gray-700 line-clamp-3">{{ source.content }}</p>
      </div>
    </div>

    <div v-if="searchResults.length && !ragAnswer" class="space-y-3">
      <h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Matching Chunks</h3>
      <div
        v-for="chunk in searchResults"
        :key="chunk.id"
        class="rounded-lg border border-gray-200 bg-white p-4"
      >
        <div class="mb-2 flex items-center gap-2">
          <span class="rounded bg-blue-100 px-2 py-0.5 text-xs font-medium text-blue-700">
            {{ chunk.metadata.source }}
          </span>
          <span class="text-xs text-gray-400">Page {{ chunk.metadata.page }}</span>
        </div>
        <p class="text-sm text-gray-700">{{ chunk.content }}</p>
      </div>
    </div>
  </div>
</template>

Pattern 6: Conversation State with Pinia Persistence

One of the core challenges of AI chat applications is state management—conversation history, model selection, and user preferences all need persistence. Pinia + Nuxt 4's SSR-compatible solution makes this straightforward.

Conversation Store

// stores/conversation.ts
import { defineStore } from 'pinia'
import type { ChatMessage, AIModel } from '#shared/types/ai'

interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  model: AIModel
  createdAt: number
  updatedAt: number
}

interface ConversationState {
  conversations: Map<string, Conversation>
  activeConversationId: string | null
  preferences: {
    defaultModel: AIModel
    temperature: number
    systemPrompt: string
    streamByDefault: boolean
  }
}

export const useConversationStore = defineStore('conversation', {
  state: (): ConversationState => ({
    conversations: new Map(),
    activeConversationId: null,
    preferences: {
      defaultModel: 'gpt-4o-mini',
      temperature: 0.7,
      systemPrompt: 'You are a helpful assistant.',
      streamByDefault: true,
    },
  }),

  getters: {
    activeConversation(state): Conversation | undefined {
      if (!state.activeConversationId) return undefined
      return state.conversations.get(state.activeConversationId)
    },
    conversationList(state): Conversation[] {
      return Array.from(state.conversations.values())
        .sort((a, b) => b.updatedAt - a.updatedAt)
    },
    // Returns a lookup function so callers can pass a conversation id
    messageCount(state): (id: string) => number {
      return (id: string) => state.conversations.get(id)?.messages.length || 0
    },
  },

  actions: {
    createConversation(title?: string): string {
      const id = crypto.randomUUID()
      const conversation: Conversation = {
        id,
        title: title || `Chat ${this.conversations.size + 1}`,
        messages: [],
        model: this.preferences.defaultModel,
        createdAt: Date.now(),
        updatedAt: Date.now(),
      }
      this.conversations.set(id, conversation)
      this.activeConversationId = id
      this.persist()
      return id
    },

    saveConversation(id: string, messages: ChatMessage[]) {
      const conversation = this.conversations.get(id)
      if (!conversation) return
      conversation.messages = messages
      conversation.updatedAt = Date.now()
      if (messages.length > 0 && messages[0].role === 'user') {
        conversation.title = messages[0].content.slice(0, 50)
      }
      this.persist()
    },

    deleteConversation(id: string) {
      this.conversations.delete(id)
      if (this.activeConversationId === id) {
        const remaining = this.conversationList
        this.activeConversationId = remaining.length > 0 ? remaining[0].id : null
      }
      this.persist()
    },

    setActiveConversation(id: string) {
      this.activeConversationId = id
      this.persist() // keep the active selection across reloads
    },

    updatePreferences(prefs: Partial<ConversationState['preferences']>) {
      this.preferences = { ...this.preferences, ...prefs }
      this.persist()
    },

    persist() {
      if (import.meta.client) {
        const data = {
          conversations: Object.fromEntries(this.conversations),
          activeConversationId: this.activeConversationId,
          preferences: this.preferences,
        }
        localStorage.setItem('toolsku-ai-conversations', JSON.stringify(data))
      }
    },

    hydrate() {
      if (import.meta.client) {
        const stored = localStorage.getItem('toolsku-ai-conversations')
        if (stored) {
          try {
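            const data = JSON.parse(stored)
            this.conversations = new Map(Object.entries(data.conversations ?? {}))
            this.activeConversationId = data.activeConversationId ?? null
            // Merge over defaults so preference keys added later are not lost
            this.preferences = { ...this.preferences, ...data.preferences }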
          } catch {
            localStorage.removeItem('toolsku-ai-conversations')
          }
        }
      }
    },
  },
})
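
The `persist()` and `hydrate()` actions go through a plain object because `JSON.stringify` serializes a `Map` as `{}`. The round trip can be sketched on its own, without Pinia. The `Conv` type and the `serialize`/`deserialize` helpers are illustrative names, not part of the store above:

```typescript
// Minimal stand-in for the Conversation type (illustrative only)
interface Conv {
  id: string
  title: string
  updatedAt: number
}

// Map -> JSON string: convert to a plain object first, because a Map has
// no enumerable own properties and JSON.stringify would emit "{}"
function serialize(conversations: Map<string, Conv>): string {
  return JSON.stringify(Object.fromEntries(conversations))
}

// JSON string -> Map: rebuild the entries from the parsed plain object
function deserialize(json: string): Map<string, Conv> {
  const parsed = JSON.parse(json) as Record<string, Conv>
  return new Map(Object.entries(parsed))
}

const original = new Map<string, Conv>([
  ['a1', { id: 'a1', title: 'First chat', updatedAt: 1 }],
])

const naive = JSON.stringify(original) // "{}": the entries are lost
const restored = deserialize(serialize(original))
```

If the store ever holds values that JSON cannot represent (for example `Date` objects), they would need the same kind of explicit conversion.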

SSR-Safe Initialization Plugin

// plugins/conversation-init.client.ts
import { useConversationStore } from '~/stores/conversation'

export default defineNuxtPlugin(() => {
  const store = useConversationStore()
  store.hydrate()
})

Pattern 7: Production Deployment & Performance Optimization

From development to production, Nuxt 4 full-stack AI applications need attention to performance, security, and observability.

Production Configuration

// nuxt.config.ts (production)
export default defineNuxtConfig({
  future: {
    compatibilityVersion: 4,
  },

  nitro: {
    compressPublicAssets: true,
    minify: true,

    routeRules: {
      '/api/ai/**': {
        cors: false,
        headers: {
          'strict-transport-security': 'max-age=31536000; includeSubDomains',
          'x-content-type-options': 'nosniff',
          'x-frame-options': 'DENY',
        },
      },
      '/api/chat/stream': {
        headers: {
          'cache-control': 'no-cache, no-store, must-revalidate',
          'x-accel-buffering': 'no',
        },
      },
    },

    rollupConfig: {
      external: ['sharp', 'canvas'],
    },
  },

  app: {
    head: {
      meta: [
        { 'http-equiv': 'X-UA-Compatible', content: 'IE=edge' },
        { name: 'viewport', content: 'width=device-width, initial-scale=1' },
      ],
    },
  },

  experimental: {
    payloadExtraction: true,
    renderJsonPayloads: true,
  },

  vite: {
    build: {
      rollupOptions: {
        output: {
          manualChunks: {
            'ai-vendor': ['openai'],
            'markdown': ['marked', 'highlight.js'],
          },
        },
      },
    },
  },
})

Performance Monitoring Composable

// composables/useAIPerformance.ts
import { ref, computed } from 'vue'

interface PerformanceMetric {
  name: string
  startTime: number
  endTime: number
  duration: number
  metadata?: Record<string, unknown>
}

export function useAIPerformance() {
  const metrics = ref<PerformanceMetric[]>([])
  const activeTimers = new Map<string, number>()

  function startTimer(name: string) {
    activeTimers.set(name, performance.now())
  }

  function endTimer(name: string, metadata?: Record<string, unknown>) {
    const startTime = activeTimers.get(name)
    if (startTime === undefined) return
    const endTime = performance.now()
    metrics.value.push({
      name,
      startTime,
      endTime,
      duration: endTime - startTime,
      metadata,
    })
    activeTimers.delete(name)
  }

  const averageLatency = computed(() => {
    const chatMetrics = metrics.value.filter((m) => m.name === 'ai-response')
    if (chatMetrics.length === 0) return 0
    return chatMetrics.reduce((sum, m) => sum + m.duration, 0) / chatMetrics.length
  })

  const p95Latency = computed(() => {
    const chatMetrics = metrics.value
      .filter((m) => m.name === 'ai-response')
      .sort((a, b) => a.duration - b.duration)
    // A single sample is its own p95; only an empty set has no value
    if (chatMetrics.length === 0) return 0
    const index = Math.ceil(chatMetrics.length * 0.95) - 1
    return chatMetrics[index].duration
  })

  function getReport() {
    // Count requests and errors over the same set of 'ai-response' metrics
    const requests = metrics.value.filter((m) => m.name === 'ai-response')
    return {
      totalRequests: requests.length,
      averageLatency: averageLatency.value,
      p95Latency: p95Latency.value,
      errorRate: requests.filter((m) => m.metadata?.error).length / Math.max(requests.length, 1),
    }
  }

  return { metrics, startTimer, endTimer, averageLatency, p95Latency, getReport }
}
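
The p95 figure uses the nearest-rank method: sort the samples and take the value at position `ceil(n * 0.95)`. A standalone sketch of that calculation, with made-up latency samples and a hypothetical `percentile` helper, shows why one slow request dominates the tail:

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p * 100 percent of all samples are less than or equal to it
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil(sorted.length * p) - 1
  const index = Math.min(sorted.length - 1, Math.max(0, rank))
  return sorted[index]
}

// Illustrative latencies in milliseconds, including one slow outlier
const latencies = [120, 340, 95, 410, 180, 220, 1500, 260, 300, 150]
const p50 = percentile(latencies, 0.5) // 220
const p95 = percentile(latencies, 0.95) // 1500
const single = percentile([42], 0.95) // 42
```

With only ten samples the p95 is simply the maximum, so the metric becomes meaningful only once a reasonable number of requests has been recorded.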

Health Check Endpoint

// server/api/health.get.ts
import { defineEventHandler } from 'h3'

export default defineEventHandler(async () => {
  const checks: Record<string, { status: 'ok' | 'error'; latency?: number; error?: string }> = {}

  const aiStart = Date.now()
  try {
    const apiKey = process.env.OPENAI_API_KEY
    if (!apiKey) throw new Error('API key not configured')
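    const res = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${apiKey}` },
      signal: AbortSignal.timeout(5000),
    })
    // fetch only rejects on network failure; treat 4xx/5xx as unhealthy too
    if (!res.ok) throw new Error(`Upstream responded with ${res.status}`)
    checks.ai = { status: 'ok', latency: Date.now() - aiStart }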
  } catch (err: any) {
    checks.ai = { status: 'error', error: err.message }
  }

  const overallStatus = Object.values(checks).every((c) => c.status === 'ok') ? 'ok' : 'degraded'

  return {
    status: overallStatus,
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || 'unknown',
    checks,
  }
})
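
Load balancers and uptime monitors usually look at the HTTP status code rather than the JSON body. One possible refinement, sketched here with a hypothetical `httpStatusFor` helper, is to map the check results to a status code:

```typescript
type CheckResult = { status: 'ok' | 'error'; latency?: number; error?: string }

// 200 when every dependency is healthy, 503 otherwise, so monitors can
// react without parsing the response body
function httpStatusFor(checks: Record<string, CheckResult>): 200 | 503 {
  return Object.values(checks).every((c) => c.status === 'ok') ? 200 : 503
}

const healthy = httpStatusFor({ ai: { status: 'ok', latency: 120 } })
const degraded = httpStatusFor({
  ai: { status: 'error', error: 'timeout' },
  cache: { status: 'ok' },
})
```

Inside the handler, the result could be applied with h3's `setResponseStatus(event, httpStatusFor(checks))`, which assumes the handler is changed to accept the `event` argument.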

5 Common Pitfalls and Solutions

Pitfall 1: Accessing localStorage During SSR Causes Hydration Mismatch

Problem: Pinia persistence reads localStorage during SSR, causing data inconsistency during client hydration.

Solution:

// Only execute persistence logic on the client
if (import.meta.client) {
  store.hydrate()
}

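// Or use useCookie instead of localStorage. Cookies are limited to about 4 KB,
// so store only small values such as preferences or the active conversation ID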
const savedData = useCookie('ai-conversations', {
  maxAge: 60 * 60 * 24 * 30,
  sameSite: 'lax',
})

Pitfall 2: Streaming Responses Buffered by Nginx/CDN

Problem: SSE streaming responses are buffered by intermediate layers, so users see the answer arrive all at once instead of character by character.

Solution:

# nginx.conf
location /api/chat/stream {
    proxy_pass http://nuxt_backend;
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header Connection '';
    proxy_http_version 1.1;
    chunked_transfer_encoding off;
}
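
The application should also mark the response as an uncacheable, unbuffered stream. The sketch below assumes a plain SSE wire format; `SSE_HEADERS` and `formatSSE` are illustrative helpers, not part of Nuxt or h3:

```typescript
// Headers that ask proxies not to cache, transform, or buffer the stream.
// X-Accel-Buffering is honoured by nginx; other proxies may ignore it.
const SSE_HEADERS: Record<string, string> = {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache, no-transform',
  'X-Accel-Buffering': 'no',
}

// Format one SSE frame: every payload line gets its own "data:" field,
// and a blank line terminates the event
function formatSSE(data: string, event?: string): string {
  const lines = data.split('\n').map((line) => `data: ${line}`)
  if (event) lines.unshift(`event: ${event}`)
  return lines.join('\n') + '\n\n'
}

const frame = formatSSE(JSON.stringify({ delta: 'Hel' }))
const done = formatSSE('[DONE]', 'end')
```

Each frame should be written to the response as soon as the token arrives. Collecting frames and sending them in batches reintroduces the buffering the proxy settings were meant to remove.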

Pitfall 3: Exposing API Key in Server Routes

Problem: The AI API key leaks into frontend code or error responses.

Solution:

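// server/api/chat.post.ts
export default defineEventHandler(async (event) => {
  // Private runtimeConfig keys are only available on the server
  const { aiApiKey } = useRuntimeConfig(event)

  try {
    // ... call the AI provider with aiApiKey; never return the key in a response
  } catch (err: unknown) {
    // Log details on the server; send the client a generic message only
    console.error('[chat] upstream AI error', err)
    throw createError({
      statusCode: 500,
      statusMessage: 'AI service error', // Don't expose err.message
    })
  }
})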

Pitfall 4: Edge Runtime Doesn't Support Node.js APIs

Problem: Cloudflare Workers do not support fs, path, and other Node.js modules out of the box.

Solution:

// Use Nitro's auto-import detection
// server/api/ai/edge-chat.post.ts
// Avoid Node.js APIs, use Web standard APIs

// Wrong: import { readFile } from 'fs'
// Right: Use env variables or KV storage

export default defineEventHandler(async (event) => {
  const config = useRuntimeConfig(event)
  // Use config instead of reading files
})
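
As a concrete case of swapping a Node.js API for a Web standard one, the sketch below replaces `Buffer`-based Base64 encoding with `TextEncoder` and `btoa`. The `toBase64` helper name is an assumption for illustration:

```typescript
// Node-only version (fails on runtimes that do not provide Buffer):
//   Buffer.from(text, 'utf8').toString('base64')

// Web-standard version: TextEncoder and btoa are available in browsers,
// Cloudflare Workers, Deno, and recent Node.js releases
function toBase64(text: string): string {
  const bytes = new TextEncoder().encode(text) // UTF-8 bytes
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}

const ascii = toBase64('hello') // "aGVsbG8="
const accented = toBase64('é') // "w6k="
```

Encoding to UTF-8 bytes first matters because `btoa` throws on characters outside the Latin-1 range.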

Pitfall 5: Excessive Conversation History Exceeds Token Limits

Problem: Sending the full conversation history to the AI API can exceed the model's token limit.

Solution:

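// utils/message-trimmer.ts
import type { ChatMessage } from '~/shared/types/ai'

const MAX_CONTEXT_TOKENS = 4096
// Rough heuristic for English text; use a real tokenizer when accuracy matters
const CHARS_PER_TOKEN = 4

const estimateTokens = (m: ChatMessage) => Math.ceil(m.content.length / CHARS_PER_TOKEN)

export function trimMessages(messages: ChatMessage[], maxTokens = MAX_CONTEXT_TOKENS): ChatMessage[] {
  // Always keep a leading system prompt and charge it against the budget first
  const system = messages[0]?.role === 'system' ? messages[0] : undefined
  const rest = system ? messages.slice(1) : messages
  let totalTokens = system ? estimateTokens(system) : 0
  const trimmed: ChatMessage[] = []

  // Walk backwards so the most recent messages are kept
  for (let i = rest.length - 1; i >= 0; i--) {
    const estimatedTokens = estimateTokens(rest[i])
    if (totalTokens + estimatedTokens > maxTokens) break
    totalTokens += estimatedTokens
    trimmed.unshift(rest[i])
  }

  return system ? [system, ...trimmed] : trimmed
}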

10 Common Error Troubleshooting

| # | Error Message | Cause | Solution |
|---|---------------|-------|----------|
| 1 | Hydration mismatch | SSR/CSR data inconsistency | Ensure localStorage operations are in import.meta.client |
| 2 | 429 Too Many Requests | AI API rate limiting | Implement request queue and exponential backoff retry |
| 3 | fetch failed in Server Routes | Cannot access external API during SSR | Check server network and DNS configuration |
| 4 | CORS error | Cross-origin request blocked | Use Server Routes proxy instead of direct calls |
| 5 | Stream interrupted | Connection timeout or interruption | Implement resumption and auto-reconnect |
| 6 | context_length_exceeded | Conversation history too long | Use trimMessages to trim context |
| 7 | Invalid API Key | Environment variable not set | Check .env and runtimeConfig configuration |
| 8 | Worker exceeded CPU time limit | Edge runtime timeout | Optimize inference logic, use streaming responses |
| 9 | Module not found: fs | Edge environment doesn't support Node modules | Use Web standard APIs instead of Node.js APIs |
| 10 | Pinia store not initialized | Store not ready during SSR | Use callOnce or onNuxtReady to initialize |
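
For error 2, the retry half of the fix can be sketched as capped exponential backoff with jitter. This is a minimal sketch with hypothetical helper names (`backoffDelay`, `retryWithBackoff`) and assumed defaults, not a complete request queue:

```typescript
// Delay ceiling for retry number `attempt` (0-based): base * 2^attempt, capped
function backoffDelay(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}

// Retry `fn` while `shouldRetry` accepts the error and attempts remain
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix
      if (attempt >= maxAttempts - 1 || !shouldRetry(err)) throw err
      // Full jitter: a random delay in [0, ceiling) avoids synchronized retries
      const delay = Math.random() * backoffDelay(attempt)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

A caller would pass a predicate that checks for a 429 status, for example `retryWithBackoff(() => callAI(), (err) => (err as { statusCode?: number }).statusCode === 429)`, where `callAI` stands in for the actual provider call.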

Advanced Optimization Techniques

1. AI Response Caching Strategy

// server/utils/ai-cache.ts
import { useStorage } from '#imports'

const cache = useStorage('redis')

interface CacheEntry<T> {
  data: T
  expiresAt: number
  hitCount: number
}

export async function getCachedAIResponse<T>(
  key: string,
  generator: () => Promise<T>,
  ttl = 3600
): Promise<T> {
  const cached = await cache.getItem<CacheEntry<T>>(`ai:${key}`)
  if (cached && cached.expiresAt > Date.now()) {
    cached.hitCount++
    await cache.setItem(`ai:${key}`, cached)
    return cached.data
  }

  const data = await generator()
  await cache.setItem(`ai:${key}`, {
    data,
    expiresAt: Date.now() + ttl * 1000,
    hitCount: 0,
  })
  return data
}
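
`getCachedAIResponse` takes a `key`, but how that key is built decides whether the cache is correct. One approach, sketched here with hypothetical `canonicalize` and `buildCacheKey` helpers, is to hash every input that affects the output:

```typescript
interface KeyInput {
  model: string
  temperature: number
  messages: { role: string; content: string }[]
}

// Serialize only the fields that influence the model output, in a fixed
// order, so identical requests always produce the same string
function canonicalize(input: KeyInput): string {
  return JSON.stringify([
    input.model,
    input.temperature,
    input.messages.map((m) => [m.role, m.content]),
  ])
}

// Hash the canonical string with SHA-256 through Web Crypto, which is also
// available on edge runtimes; the hex digest keeps storage keys short
async function buildCacheKey(input: KeyInput): Promise<string> {
  const bytes = new TextEncoder().encode(canonicalize(input))
  const digest = await crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}

const sample: KeyInput = {
  model: 'gpt-4o-mini',
  temperature: 0.7,
  messages: [{ role: 'user', content: 'hi' }],
}
const canonical = canonicalize(sample)
```

Caching is most defensible for deterministic requests. With a non-zero temperature, a cache hit returns the first sampled answer rather than a fresh one, which may or may not be acceptable for the use case.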

2. Multi-Model Intelligent Routing

// server/utils/model-router.ts
import type { AIModel } from '~/shared/types/ai'

interface ModelRoute {
  model: AIModel
  condition: (messages: any[]) => boolean
  priority: number
}

const routes: ModelRoute[] = [
  {
    model: 'gpt-4o',
    condition: (msgs) => msgs.length > 20 || msgs.some((m) => m.content.length > 2000),
    priority: 10,
  },
  {
    model: 'gpt-4o-mini',
    condition: () => true,
    priority: 1,
  },
]

export function selectModel(messages: any[]): AIModel {
  const matched = routes
    .filter((r) => r.condition(messages))
    .sort((a, b) => b.priority - a.priority)
  return matched[0]?.model || 'gpt-4o-mini'
}

3. Request Deduplication

// server/utils/request-dedup.ts
const pendingRequests = new Map<string, Promise<any>>()

export async function deduplicatedFetch<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const pending = pendingRequests.get(key)
  if (pending) return pending as Promise<T>

  const promise = fetcher().finally(() => {
    pendingRequests.delete(key)
  })
  pendingRequests.set(key, promise)
  return promise
}

Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit

| Dimension | Nuxt 4 | Next.js 15 | SvelteKit |
|-----------|--------|------------|-----------|
| Full-Stack AI | Server Routes + Nitro | Route Handlers + Edge | Server Endpoints |
| SSR Streaming | Native support | App Router support | Supported but smaller ecosystem |
| Edge Deployment | Cloudflare/Vercel/Deno | Vercel Edge first | Cloudflare adapter |
| Type Safety | Shared Types across stack | Manual configuration needed | Built-in but different mechanism |
| State Management | Pinia (official) | Third-party required | Built-in Stores |
| Learning Curve | Vue developer friendly | React ecosystem | Svelte syntax unique |
| AI Ecosystem | Vercel AI SDK compatible | Vercel AI SDK native | Community adaptation |
| Streaming UI | Custom Composable | useChat/useCompletion | Community solutions |
| Bundle Size | Medium | Larger | Smallest |
| Developer Experience | Auto-import + HMR | Turbopack HMR | Vite HMR |

Selection Guide:

  • Vue teams → Nuxt 4: Zero learning curve, complete full-stack capabilities
  • React teams → Next.js 15: Most mature AI SDK ecosystem
  • Pursuing extreme performance → SvelteKit: Smallest bundle, but AI ecosystem needs improvement


Summary

Vue3 + Nuxt 4 full-stack AI applications are fully mature in 2026. The 7 production patterns cover the complete chain from API proxying to edge inference:

  1. Server Routes are the foundation of full-stack AI: no separate backend is needed, and the API key stays on the server
  2. SSR + AI lets search engines index AI-generated content, so SEO and AI work together
  3. Streaming Chat is the core interaction of AI apps and must support interruption and retry
  4. Edge Inference cuts network round-trip latency by running close to users; Cloudflare Workers is the first choice
  5. RAG Frontend needs a complete search-generate-display interaction chain
  6. Pinia Persistence keeps conversation state across pages and sessions
  7. Production Deployment focuses on security, performance, and observability

Nuxt 4 gives Vue developers full-stack AI capabilities that rival Next.js for the first time—one codebase, one deployment, full-stack AI.

