Vue3 + Nuxt 4 Full-Stack AI App: 7 Production Patterns from SSR to Edge Inference

Still building AI apps with a separate frontend and backend, fighting CORS, and deploying two systems? Nuxt 4's Server Routes, SSR, and Edge Runtime support let you build a full-stack Vue 3 AI app as one codebase and one deployment, with AI content rendered on the server. This article walks through 7 production-oriented patterns, from Server Route proxying to edge inference, with code you can adapt to your own project.

Key Takeaways

Master Nuxt 4 Server Routes for building AI API proxies
Optimize first paint with SSR and AI streaming rendering
Build production-grade streaming chat UI components
Deploy edge inference to Cloudflare Workers
Design RAG frontend interaction experiences
Persist conversation state with Pinia and restore it across pages
Production performance optimization and deployment best practices

Nuxt 4 Full-Stack AI Architecture Overview
Pattern 1: Server Routes + AI API Proxy
Pattern 2: SSR + AI Streaming Render
Pattern 3: Streaming Chat UI Component
Pattern 4: Edge Inference with Cloudflare Workers
Pattern 5: RAG Frontend Interaction Design
Pattern 6: Conversation State with Pinia Persistence
Pattern 7: Production Deployment & Performance Optimization
5 Common Pitfalls and Solutions
10 Common Error Troubleshooting
Advanced Optimization Techniques
Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit
Recommended Online Tools
Summary

Nuxt 4 Full-Stack AI Architecture Overview

Nuxt 4 in 2026 brings critical capabilities for full-stack AI applications: Server Routes as backend API layer, SSR streaming rendering, Edge Runtime support, and native TypeScript end-to-end type safety.

┌──────────────────────────────────────────────────────────┐
│                   Nuxt 4 Full-Stack AI Architecture      │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────┐  │
│  │   Browser   │───▶│  Nuxt SSR    │───▶│  Edge /    │  │
│  │   Client    │◀───│  Server      │◀───│  Node      │  │
│  └──────┬──────┘    └──────┬───────┘    └─────┬──────┘  │
│         │                  │                   │         │
│  ┌──────▼──────┐    ┌──────▼───────┐    ┌─────▼──────┐  │
│  │  Vue3       │    │ Server       │    │ AI Models  │  │
│  │  Composables│    │ Routes       │    │ OpenAI     │  │
│  │  Pinia      │    │ /api/chat    │    │ Anthropic  │  │
│  │  Components │    │ /api/embed   │    │ Local LLM  │  │
│  └─────────────┘    │ /api/rag     │    └────────────┘  │
│                     └──────────────┘                     │
│                                                          │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  Shared Layer: Types / Utils / Constants            │ │
│  └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Nuxt 4 Full-Stack AI Core Capabilities

Capability	Description	Use Case
Server Routes	File-based API routing, no Express needed	AI proxy, Webhooks, BFF
SSR Streaming	Server-side streaming HTML output	SEO + AI first paint
Edge Runtime	Cloudflare/Deno edge deployment	Low-latency inference
Nitro Engine	Cross-platform server engine	Multi-environment unified deployment
Shared Types	Frontend-backend type sharing	End-to-end type safety
useAsyncData	SSR data fetching	AI data preloading

Pattern 1: Server Routes + AI API Proxy

Nuxt 4 Server Routes let you write APIs directly in your Nuxt project without a separate backend. This is the foundation of full-stack AI—all AI requests are proxied through Server Routes, avoiding API Key exposure, unifying error handling, and implementing rate limiting.

Basic Server Route Proxy

// server/api/chat.post.ts
import { defineEventHandler, readBody, createError, getRequestHeader } from 'h3'
import { z } from 'zod'

const chatRequestSchema = z.object({
  messages: z.array(z.object({
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string().max(4000),
  })).min(1).max(50),
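  // This route always calls the OpenAI endpoint below, so a Claude model ID passes
  // validation here but is rejected upstream. Send Claude requests through a
  // provider-aware route such as the one in Pattern 4.
  model: z.enum(['gpt-4o', 'gpt-4o-mini', 'claude-sonnet-4-20250514']).default('gpt-4o-mini'),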
  temperature: z.number().min(0).max(2).default(0.7),
  maxTokens: z.number().min(1).max(4096).default(2048),
})

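// In-memory, per-instance limiter: counts reset on restart and are not shared
// across server instances or edge isolates, and expired entries are never pruned.
const RATE_LIMIT_WINDOW = 60_000
const RATE_LIMIT_MAX = 20
const requestCounts = new Map<string, { count: number; resetAt: number }>()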

function checkRateLimit(ip: string): boolean {
  const now = Date.now()
  const record = requestCounts.get(ip)
  if (!record || now > record.resetAt) {
    requestCounts.set(ip, { count: 1, resetAt: now + RATE_LIMIT_WINDOW })
    return true
  }
  if (record.count >= RATE_LIMIT_MAX) {
    return false
  }
  record.count++
  return true
}

export default defineEventHandler(async (event) => {
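  // x-forwarded-for can hold a comma-separated proxy chain; the first entry is the client.
  const clientIp = getRequestHeader(event, 'x-forwarded-for')?.split(',')[0]?.trim() || 'unknown'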
  if (!checkRateLimit(clientIp)) {
    throw createError({
      statusCode: 429,
      statusMessage: 'Rate limit exceeded. Please try again later.',
    })
  }

  const body = await readBody(event)
  const parsed = chatRequestSchema.safeParse(body)
  if (!parsed.success) {
    throw createError({
      statusCode: 400,
      statusMessage: 'Validation error',
      // statusMessage should stay a short single line; structured details go in data.
      data: parsed.error.issues,
    })
  }

  const { messages, model, temperature, maxTokens } = parsed.data

  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    throw createError({
      statusCode: 500,
      statusMessage: 'AI service not configured',
    })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages,
      temperature,
      max_tokens: maxTokens,
    }),
  })

  if (!response.ok) {
    const errorData = await response.json().catch(() => ({}))
    throw createError({
      statusCode: response.status,
      statusMessage: errorData.error?.message || 'AI service error',
    })
  }

  return response.json()
})
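
The limiter in the route above is a fixed-window counter keyed by IP. The sketch below isolates that idea as a standalone helper and adds two things the route does not have: pruning of expired windows and an injectable clock for testing. The names `createRateLimiter`, `prune`, and `size` are illustrative and not part of Nuxt or h3.

```typescript
// Fixed-window rate limiter. All names here are illustrative assumptions.
interface WindowRecord {
  count: number
  resetAt: number
}

function createRateLimiter(max: number, windowMs: number, now: () => number = Date.now) {
  const records = new Map<string, WindowRecord>()

  // Drop expired windows so the Map does not grow without bound.
  function prune(): void {
    const t = now()
    records.forEach((record, key) => {
      if (t > record.resetAt) records.delete(key)
    })
  }

  // Returns true when the request is allowed, false when the limit is reached.
  function check(key: string): boolean {
    const t = now()
    const record = records.get(key)
    if (!record || t > record.resetAt) {
      records.set(key, { count: 1, resetAt: t + windowMs })
      return true
    }
    if (record.count >= max) return false
    record.count++
    return true
  }

  return { check, prune, size: () => records.size }
}
```

In a route, `check(clientIp)` would take the place of `checkRateLimit(clientIp)`, with `prune()` called on a timer or every N requests. The state is still per instance, so a multi-instance or edge deployment would need a shared store instead.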

Streaming Server Route

// server/api/chat/stream.post.ts
import { defineEventHandler, readBody, createError, setResponseHeader, sendStream } from 'h3'

export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  const { messages, model = 'gpt-4o-mini' } = body

  setResponseHeader(event, 'Content-Type', 'text/event-stream')
  setResponseHeader(event, 'Cache-Control', 'no-cache')
  setResponseHeader(event, 'Connection', 'keep-alive')
  setResponseHeader(event, 'X-Accel-Buffering', 'no')

  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    throw createError({ statusCode: 500, statusMessage: 'AI service not configured' })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages,
      stream: true,
    }),
  })

  if (!response.ok) {
    throw createError({
      statusCode: response.status,
      statusMessage: 'AI streaming error',
    })
  }

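  // Reuse one decoder/encoder and keep the trailing partial line between chunks:
  // an SSE event can be split across network chunks, so a chunk is not
  // guaranteed to end on a line break.
  const decoder = new TextDecoder()
  const encoder = new TextEncoder()
  let buffer = ''

  const transformStream = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true })
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? ''

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue
        const data = line.slice(6).trim()
        if (data === '[DONE]') {
          controller.enqueue(encoder.encode('data: [DONE]\n\n'))
          continue
        }
        try {
          const parsed = JSON.parse(data)
          const content = parsed.choices?.[0]?.delta?.content
          if (content) {
            controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`))
          }
        } catch {
          // skip malformed chunks
        }
      }
    },
  })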

  const readableStream = response.body!.pipeThrough(transformStream)
  return sendStream(event, readableStream)
})
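
Both the server transform and the client reader depend on splitting an SSE stream into `data: ` lines. Network chunks do not align with line boundaries, so a parser has to carry the unfinished tail of one chunk into the next. The sketch below shows that buffering as a small pure helper; `createSSEParser` is an illustrative name, not an h3 or Nuxt API.

```typescript
// Incremental parser for SSE "data: " lines. The name is an illustrative assumption.
function createSSEParser(onData: (payload: string) => void) {
  let buffer = ''

  return {
    // Feed decoded text. Complete lines are parsed; the trailing partial line is kept.
    push(text: string): void {
      buffer += text
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? ''
      for (const rawLine of lines) {
        const line = rawLine.replace(/\r$/, '') // tolerate CRLF line endings
        if (line.startsWith('data: ')) onData(line.slice(6))
      }
    },
  }
}
```

Feeding `'data: {"a"'` followed by `':1}\n'` yields a single payload `{"a":1}`, whereas parsing each chunk on its own would drop both halves.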

Shared Type Definitions

// shared/types/ai.ts
export interface ChatMessage {
  id: string
  role: 'user' | 'assistant' | 'system'
  content: string
  timestamp: number
  metadata?: MessageMetadata
}

export interface MessageMetadata {
  model: string
  tokens: number
  latency: number
  finishReason: string
}

export interface ChatRequest {
  messages: Pick<ChatMessage, 'role' | 'content'>[]
  model: AIModel
  temperature?: number
  maxTokens?: number
  stream?: boolean
}

export type AIModel = 'gpt-4o' | 'gpt-4o-mini' | 'claude-sonnet-4-20250514'

export interface StreamChunk {
  content: string
  done: boolean
}

export interface RAGQuery {
  question: string
  topK?: number
  threshold?: number
}

export interface RAGResult {
  answer: string
  sources: RAGSource[]
  confidence: number
}

export interface RAGSource {
  content: string
  metadata: Record<string, unknown>
  score: number
}
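
One practical use of these shared types is keeping client-only fields out of the request payload. The sketch below redeclares trimmed copies of `ChatMessage`, `ChatRequest`, and `AIModel` so it is self-contained, and adds a hypothetical helper, `toChatRequest`, that is not part of the article's code.

```typescript
// Trimmed local copies of the shared types so this block stands alone.
type AIModel = 'gpt-4o' | 'gpt-4o-mini' | 'claude-sonnet-4-20250514'

interface ChatMessage {
  id: string
  role: 'user' | 'assistant' | 'system'
  content: string
  timestamp: number
}

interface ChatRequest {
  messages: Pick<ChatMessage, 'role' | 'content'>[]
  model: AIModel
}

// Hypothetical helper: strips id/timestamp so only role and content reach the server.
function toChatRequest(history: ChatMessage[], model: AIModel): ChatRequest {
  return {
    model,
    messages: history.map(({ role, content }) => ({ role, content })),
  }
}
```

Because the server schema and the client builder refer to the same `ChatRequest` shape, a renamed field shows up as a compile error on both sides rather than as a runtime 400.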

Pattern 2: SSR + AI Streaming Render

The core value of combining SSR with AI: search engines can index AI-generated content, and users see AI answers on first paint. Nuxt 4's useAsyncData + Server Routes make SSR AI rendering straightforward.

SSR Data Preloading

// composables/useAIContent.ts
import { useAsyncData, useHead } from '#imports'

interface AIContentOptions {
  prompt: string
  model?: AIModel
  ttl?: number
}

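export function useAIContent(options: AIContentOptions) {
  // ttl is treated as seconds here (assumption based on the 3600 default).
  const { prompt, model = 'gpt-4o-mini', ttl = 3600 } = options

  const { data, pending, error, refresh } = useAsyncData(
    // Prompts that share their first 32 characters also share this key.
    `ai-content-${prompt.slice(0, 32)}`,
    async () => {
      const text = await $fetch<string>('/api/ai/generate', {
        method: 'POST',
        body: { prompt, model },
      })
      // Store the expiry next to the text so getCachedData has something to check.
      return { text, expiresAt: Date.now() + ttl * 1000 }
    },
    {
      server: true,
      lazy: false,
      getCachedData(key, nuxtApp) {
        const cached = nuxtApp.payload.data[key]
        // Reuse the SSR payload until it expires; returning undefined triggers a fetch.
        return cached && cached.expiresAt > Date.now() ? cached : undefined
      },
    }
  )

  const content = computed(() => data.value?.text ?? '')

  useHead({
    meta: [
      { name: 'description', content: () => content.value.slice(0, 160) },
    ],
  })

  return { content, pending, error, refresh }
}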
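
The `useAsyncData` key above is built from the first 32 characters of the prompt, so two prompts with the same opening share one cache entry. A minimal sketch of an alternative is to hash the whole prompt. `hashPrompt` and `aiContentKey` are hypothetical names, and the hash is an FNV-1a-style function over UTF-16 code units, chosen only because it is short and dependency-free.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex characters.
function hashPrompt(prompt: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < prompt.length; i++) {
    hash ^= prompt.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return ('0000000' + (hash >>> 0).toString(16)).slice(-8)
}

// Hypothetical key builder: includes the model so different models do not share entries.
function aiContentKey(prompt: string, model: string): string {
  return `ai-content-${model}-${hashPrompt(prompt)}`
}
```

The key stays short and deterministic across server and client, which `useAsyncData` needs for payload reuse. A 32-bit hash can still collide, so a longer digest would be the safer choice for large prompt sets.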

SSR AI Page Component

<!-- pages/ai-insights/[topic].vue -->
<script setup lang="ts">
const route = useRoute()
const topic = route.params.topic as string

const { content, pending, error } = useAIContent({
  prompt: `Generate a comprehensive technical insight about ${topic} for developers in 2026`,
  model: 'gpt-4o-mini',
})

useHead({
  title: () => `AI Insights: ${topic} | ToolsKu`,
})
</script>

<template>
  <div class="mx-auto max-w-4xl px-4 py-8">
    <header class="mb-8">
      <h2 class="text-3xl font-bold text-gray-900">
        AI Insights: {{ topic }}
      </h2>
      <p class="mt-2 text-gray-500">
        AI-generated analysis, verified and curated for developers
      </p>
    </header>

    <div v-if="pending" class="space-y-4">
      <div class="h-8 w-3/4 animate-pulse rounded bg-gray-200" />
      <div class="h-8 w-1/2 animate-pulse rounded bg-gray-200" />
      <div class="h-8 w-2/3 animate-pulse rounded bg-gray-200" />
    </div>

    <div v-else-if="error" class="rounded-lg bg-red-50 p-4">
      <p class="text-red-700">Failed to generate AI content. Please try again.</p>
    </div>

    <article v-else class="prose prose-lg max-w-none">
      <!-- v-html injects raw HTML: sanitize the model output on the server before rendering it here -->
      <div v-html="content" />
    </article>
  </div>
</template>

SSR Cache Middleware

// server/middleware/ai-cache.ts
import { defineEventHandler, readBody, setResponseHeader } from 'h3'
import { useStorage } from '#imports'

export default defineEventHandler(async (event) => {
  if (!event.path.startsWith('/api/ai/')) return

  // Resolve storage inside the handler. This middleware only reads: the
  // /api/ai/* handlers are expected to write { data, expiresAt } under the same key.
  const aiCache = useStorage('ai-cache')

  const body = await readBody(event).catch(() => ({}))
  const cacheKey = `ssr-ai:${event.path}:${JSON.stringify(body)}`
  const cached = await aiCache.getItem<{ data: string; expiresAt: number }>(cacheKey)

  if (cached && cached.expiresAt > Date.now()) {
    setResponseHeader(event, 'X-AI-Cache', 'HIT')
    return cached.data
  }

  setResponseHeader(event, 'X-AI-Cache', 'MISS')
})
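
The middleware above only reads from the cache, so a route handler has to write entries under the same key and in the same `{ data, expiresAt }` shape. The sketch below covers the pure parts of that contract: building a key that does not depend on property order, creating an entry with a TTL, and checking freshness. All function names are illustrative, and the storage call itself is left out.

```typescript
interface CacheEntry<T> {
  data: T
  expiresAt: number
}

// Serialize with sorted object keys so { a, b } and { b, a } produce the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value) ?? 'null'
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  const record = value as Record<string, unknown>
  const keys = Object.keys(record).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(record[k])}`).join(',')}}`
}

// Same key layout as the middleware: prefix, request path, serialized body.
function buildCacheKey(path: string, body: unknown): string {
  return `ssr-ai:${path}:${stableStringify(body)}`
}

function makeEntry<T>(data: T, ttlMs: number, now: number = Date.now()): CacheEntry<T> {
  return { data, expiresAt: now + ttlMs }
}

function isFresh<T>(entry: CacheEntry<T> | null | undefined, now: number = Date.now()): boolean {
  return !!entry && entry.expiresAt > now
}
```

A handler would call something like `setItem(buildCacheKey(path, body), makeEntry(result, 3_600_000))` on the same storage after a successful generation. The middleware builds its key with plain `JSON.stringify`, so both sides would have to switch to `buildCacheKey` for lookups to match.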

Pattern 3: Streaming Chat UI Component

Streaming chat is the core interaction of AI applications. This pattern implements a complete streaming chat component with interruptible generation and message retry; Markdown rendering and code highlighting plug in through the renderMarkdown helper the component calls.

Streaming Chat Composable

// composables/useStreamingChat.ts
import { ref, computed } from 'vue'
import type { ChatMessage, StreamChunk, AIModel } from '~/shared/types/ai'

interface UseStreamingChatOptions {
  apiEndpoint?: string
  defaultModel?: AIModel
  maxRetries?: number
}

export function useStreamingChat(options: UseStreamingChatOptions = {}) {
  const {
    apiEndpoint = '/api/chat/stream',
    defaultModel = 'gpt-4o-mini',
    maxRetries = 2,
  } = options

  const messages = ref<ChatMessage[]>([])
  const currentStreamContent = ref('')
  const isStreaming = ref(false)
  const error = ref<string | null>(null)
  const selectedModel = ref<AIModel>(defaultModel)
  let abortController: AbortController | null = null

  const displayedMessages = computed(() => {
    const base = [...messages.value]
    if (isStreaming.value && currentStreamContent.value) {
      base.push({
        id: 'streaming',
        role: 'assistant',
        content: currentStreamContent.value,
        timestamp: Date.now(),
      })
    }
    return base
  })

  async function sendMessage(content: string) {
    const userMessage: ChatMessage = {
      id: crypto.randomUUID(),
      role: 'user',
      content,
      timestamp: Date.now(),
    }
    messages.value.push(userMessage)
    error.value = null
    currentStreamContent.value = ''
    isStreaming.value = true
    abortController = new AbortController()

    let retryCount = 0

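    const attemptStream = async (): Promise<void> => {
      // Reset partial output so a retry does not append to a half-finished answer.
      currentStreamContent.value = ''
      try {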
        const response = await fetch(apiEndpoint, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            messages: messages.value.map((m) => ({
              role: m.role,
              content: m.content,
            })),
            model: selectedModel.value,
          }),
          signal: abortController!.signal,
        })

        if (!response.ok) {
          throw new Error(`HTTP ${response.status}: ${response.statusText}`)
        }

        const reader = response.body!.getReader()
        const decoder = new TextDecoder()
        let buffer = ''

        while (true) {
          const { done, value } = await reader.read()
          if (done) break

          buffer += decoder.decode(value, { stream: true })
          const lines = buffer.split('\n')
          buffer = lines.pop() || ''

          for (const line of lines) {
            if (!line.startsWith('data: ')) continue
            const data = line.slice(6)
            if (data === '[DONE]') {
              finalizeStream()
              return
            }
            try {
              const chunk: StreamChunk = JSON.parse(data)
              currentStreamContent.value += chunk.content
            } catch {
              // skip malformed chunks
            }
          }
        }

        finalizeStream()
      } catch (err: any) {
        if (err.name === 'AbortError') return
        if (retryCount < maxRetries) {
          retryCount++
          return attemptStream()
        }
        error.value = err.message
        isStreaming.value = false
      }
    }

    await attemptStream()
  }

  function finalizeStream() {
    if (currentStreamContent.value) {
      messages.value.push({
        id: crypto.randomUUID(),
        role: 'assistant',
        content: currentStreamContent.value,
        timestamp: Date.now(),
      })
    }
    currentStreamContent.value = ''
    isStreaming.value = false
  }

  function stopStreaming() {
    abortController?.abort()
    finalizeStream()
  }

  function retryLastMessage() {
    const lastUserIndex = messages.value.findLastIndex((m) => m.role === 'user')
    if (lastUserIndex === -1) return
    const lastUserContent = messages.value[lastUserIndex].content
    messages.value = messages.value.slice(0, lastUserIndex)
    sendMessage(lastUserContent)
  }

  function clearMessages() {
    messages.value = []
    currentStreamContent.value = ''
    error.value = null
  }

  return {
    messages: displayedMessages,
    isStreaming,
    error,
    selectedModel,
    sendMessage,
    stopStreaming,
    retryLastMessage,
    clearMessages,
  }
}
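
The composable above retries immediately and on any failure, including 4xx responses that will fail again. A common refinement is to retry only transient statuses and to wait longer between attempts. The helpers below are a minimal sketch of that policy; `backoffDelay` and `isRetryableStatus` are hypothetical names, and the 500 ms base and 8 s cap are arbitrary defaults.

```typescript
// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

// Retry rate limits and server errors; other 4xx responses will not succeed on retry.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600)
}
```

Inside `attemptStream`, the retry branch could first check `isRetryableStatus(response.status)` and then wait `backoffDelay(retryCount)` milliseconds before calling itself again.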

Chat UI Component

<!-- components/AIChatWindow.vue -->
<script setup lang="ts">
import { useStreamingChat } from '~/composables/useStreamingChat'
import { useConversationStore } from '~/stores/conversation'

const props = defineProps<{
  conversationId?: string
}>()

const {
  messages,
  isStreaming,
  error,
  selectedModel,
  sendMessage,
  stopStreaming,
  retryLastMessage,
  clearMessages,
} = useStreamingChat()

const conversationStore = useConversationStore()
const inputText = ref('')
const messagesContainer = ref<HTMLElement>()

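const modelOptions = [
  { label: 'GPT-4o', value: 'gpt-4o' as const },
  { label: 'GPT-4o Mini', value: 'gpt-4o-mini' as const },
  { label: 'Claude Sonnet 4', value: 'claude-sonnet-4-20250514' as const },
]

// Fallback (assumption): the template calls renderMarkdown, which is not defined
// anywhere in this article. This version escapes HTML and keeps line breaks so the
// component runs safely; replace it with a Markdown parser plus an HTML sanitizer.
function renderMarkdown(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\n/g, '<br>')
}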

async function handleSubmit() {
  const text = inputText.value.trim()
  if (!text || isStreaming.value) return
  inputText.value = ''
  await sendMessage(text)
  if (props.conversationId) {
    conversationStore.saveConversation(props.conversationId, messages.value)
  }
  scrollToBottom()
}

function scrollToBottom() {
  nextTick(() => {
    if (messagesContainer.value) {
      messagesContainer.value.scrollTop = messagesContainer.value.scrollHeight
    }
  })
}

watch(messages, () => scrollToBottom(), { deep: true })
</script>

<template>
  <div class="flex h-full flex-col rounded-xl border border-gray-200 bg-white shadow-sm">
    <header class="flex items-center justify-between border-b border-gray-200 px-4 py-3">
      <div class="flex items-center gap-3">
        <span class="text-sm font-medium text-gray-700">AI Chat</span>
        <select
          v-model="selectedModel"
          class="rounded-md border border-gray-300 px-2 py-1 text-xs text-gray-600"
          :disabled="isStreaming"
        >
          <option v-for="opt in modelOptions" :key="opt.value" :value="opt.value">
            {{ opt.label }}
          </option>
        </select>
      </div>
      <div class="flex gap-2">
        <button
          class="rounded-md px-2 py-1 text-xs text-gray-500 hover:bg-gray-100"
          @click="retryLastMessage"
          :disabled="isStreaming || messages.length === 0"
        >
          Retry
        </button>
        <button
          class="rounded-md px-2 py-1 text-xs text-red-500 hover:bg-red-50"
          @click="clearMessages"
          :disabled="isStreaming"
        >
          Clear
        </button>
      </div>
    </header>

    <div ref="messagesContainer" class="flex-1 overflow-y-auto p-4 space-y-4">
      <div
        v-for="msg in messages"
        :key="msg.id"
        :class="[
          'max-w-[80%] rounded-lg px-4 py-2.5 text-sm',
          msg.role === 'user'
            ? 'ml-auto bg-blue-600 text-white'
            : 'mr-auto bg-gray-100 text-gray-900',
        ]"
      >
        <div v-if="msg.role === 'assistant'" class="prose prose-sm max-w-none" v-html="renderMarkdown(msg.content)" />
        <p v-else>{{ msg.content }}</p>
      </div>

      <div v-if="error" class="mx-auto max-w-md rounded-lg bg-red-50 p-3 text-center text-sm text-red-600">
        {{ error }}
        <button class="ml-2 underline" @click="retryLastMessage">Retry</button>
      </div>
    </div>

    <footer class="border-t border-gray-200 p-3">
      <div class="flex gap-2">
        <input
          v-model="inputText"
          type="text"
          placeholder="Type your message..."
          class="flex-1 rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
          @keydown.enter="handleSubmit"
          :disabled="isStreaming"
        />
        <button
          v-if="isStreaming"
          class="rounded-lg bg-red-500 px-4 py-2 text-sm font-medium text-white hover:bg-red-600"
          @click="stopStreaming"
        >
          Stop
        </button>
        <button
          v-else
          class="rounded-lg bg-blue-600 px-4 py-2 text-sm font-medium text-white hover:bg-blue-700"
          @click="handleSubmit"
          :disabled="!inputText.trim()"
        >
          Send
        </button>
      </div>
    </footer>
  </div>
</template>

Pattern 4: Edge Inference with Cloudflare Workers

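Edge inference is a key trend for AI applications in 2026: running the request-handling logic on the edge node nearest the user can cut the network round trip to that first hop from hundreds of milliseconds to single digits. The upstream model call still adds its own latency, so the gain shows up mainly in time to first byte. With Nuxt 4 and Nitro, deploying to Cloudflare Workers takes little more than a preset change.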

Nitro Edge Configuration

// nuxt.config.ts
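export default defineNuxtConfig({
  future: {
    compatibilityVersion: 4,
  },
  // runtimeConfig is a top-level Nuxt option, not a Nitro option. Keys outside
  // `public` stay on the server and can be overridden at runtime with
  // NUXT_AI_API_KEY and NUXT_AI_BASE_URL.
  runtimeConfig: {
    aiApiKey: process.env.OPENAI_API_KEY,
    aiBaseUrl: process.env.AI_BASE_URL || 'https://api.openai.com/v1',
  },
  nitro: {
    preset: 'cloudflare-module',
    routeRules: {
      '/api/ai/**': {
        cors: true,
        headers: {
          'cache-control': 'no-cache',
        },
      },
    },
  },
})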

Edge Inference Server Route

// server/api/ai/edge-chat.post.ts
import { defineEventHandler, readBody, setResponseHeader, sendStream, createError, getHeader } from 'h3'

interface EdgeAIConfig {
  provider: 'openai' | 'anthropic' | 'local'
  baseUrl: string
  apiKey: string
}

function getAIConfig(event: any): EdgeAIConfig {
  const config = useRuntimeConfig(event)
  const provider = getHeader(event, 'x-ai-provider') || 'openai'

  const configs: Record<string, EdgeAIConfig> = {
    openai: {
      provider: 'openai',
      // aiBaseUrl is declared as a private runtimeConfig key, so it is not under config.public.
      baseUrl: config.aiBaseUrl || 'https://api.openai.com/v1',
      apiKey: config.aiApiKey,
    },
    anthropic: {
      provider: 'anthropic',
      baseUrl: 'https://api.anthropic.com/v1',
      apiKey: process.env.ANTHROPIC_API_KEY || '',
    },
    local: {
      provider: 'local',
      baseUrl: process.env.LOCAL_AI_URL || 'http://localhost:11434/v1',
      apiKey: 'local',
    },
  }

  return configs[provider] || configs.openai
}

export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  const { messages, model = 'gpt-4o-mini' } = body
  const aiConfig = getAIConfig(event)

  setResponseHeader(event, 'Content-Type', 'text/event-stream')
  setResponseHeader(event, 'Cache-Control', 'no-cache')
  setResponseHeader(event, 'Connection', 'keep-alive')

  const endpoint = aiConfig.provider === 'anthropic'
    ? `${aiConfig.baseUrl}/messages`
    : `${aiConfig.baseUrl}/chat/completions`

  const requestHeaders: Record<string, string> = {
    'Content-Type': 'application/json',
  }

  if (aiConfig.provider === 'anthropic') {
    requestHeaders['x-api-key'] = aiConfig.apiKey
    requestHeaders['anthropic-version'] = '2023-06-01'
  } else {
    requestHeaders['Authorization'] = `Bearer ${aiConfig.apiKey}`
  }

  const requestBody = aiConfig.provider === 'anthropic'
    ? {
        model,
        messages: messages.map((m: any) => ({ role: m.role, content: m.content })),
        max_tokens: 2048,
        stream: true,
      }
    : {
        model,
        messages: messages.map((m: any) => ({ role: m.role, content: m.content })),
        stream: true,
      }

  const response = await fetch(endpoint, {
    method: 'POST',
    headers: requestHeaders,
    body: JSON.stringify(requestBody),
  })

  if (!response.ok) {
    throw createError({
      statusCode: response.status,
      statusMessage: `Edge AI error: ${response.statusText}`,
    })
  }

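  // The upstream SSE body is passed through unchanged, so clients receive
  // provider-specific events (OpenAI and Anthropic formats differ) rather than
  // the normalized { content } chunks emitted by /api/chat/stream.
  return sendStream(event, response.body!)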
})
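
The route above sends the same `messages` array to both providers. Anthropic's Messages API expects the system prompt as a top-level `system` field and accepts only `user` and `assistant` roles inside `messages`, so a conversation that starts with a system message needs reshaping first. The sketch below pulls the provider-specific request building into one pure function. `buildProviderRequest` and its types are illustrative, and the fixed `max_tokens: 2048` mirrors the value used in the route.

```typescript
type Provider = 'openai' | 'anthropic'

interface Msg {
  role: 'user' | 'assistant' | 'system'
  content: string
}

interface ProviderRequest {
  url: string
  headers: Record<string, string>
  body: Record<string, unknown>
}

// Illustrative helper: returns the URL, headers, and JSON body for a streaming call.
function buildProviderRequest(
  provider: Provider,
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: Msg[],
): ProviderRequest {
  if (provider === 'anthropic') {
    // Collect system messages into the top-level `system` field.
    const system = messages
      .filter((m) => m.role === 'system')
      .map((m) => m.content)
      .join('\n\n')
    const body: Record<string, unknown> = {
      model,
      max_tokens: 2048,
      stream: true,
      messages: messages.filter((m) => m.role !== 'system'),
    }
    if (system) body.system = system
    return {
      url: `${baseUrl}/messages`,
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
      },
      body,
    }
  }
  // OpenAI-compatible endpoints accept system messages inline.
  return {
    url: `${baseUrl}/chat/completions`,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: { model, messages, stream: true },
  }
}
```

The handler would then reduce to a single `fetch(req.url, { method: 'POST', headers: req.headers, body: JSON.stringify(req.body) })`, with the provider differences kept in a function that can be unit tested without a network.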

Wrangler Deployment Configuration

# wrangler.toml
name = "toolsku-ai-edge"
main = ".output/server/index.mjs"
compatibility_date = "2026-06-01"
compatibility_flags = ["nodejs_compat"]

[vars]
AI_BASE_URL = "https://api.openai.com/v1"

[ai]
binding = "AI"

[[r2_buckets]]
binding = "AI_CACHE"
bucket_name = "toolsku-ai-cache"

[observability]
enabled = true

Pattern 5: RAG Frontend Interaction Design

RAG (Retrieval-Augmented Generation) is one of the most practical patterns for AI applications. The frontend needs to handle the complete interaction chain: document upload, vector search, and result display.

RAG Query Composable

// composables/useRAG.ts
import { ref, computed } from 'vue'
import type { RAGQuery, RAGResult, RAGSource } from '~/shared/types/ai'

interface DocumentChunk {
  id: string
  content: string
  metadata: {
    source: string
    page: number
    section: string
  }
  score: number
}

export function useRAG() {
  const query = ref('')
  const isSearching = ref(false)
  const isGenerating = ref(false)
  const searchResults = ref<DocumentChunk[]>([])
  const ragAnswer = ref('')
  const ragSources = ref<RAGSource[]>([])
  const error = ref<string | null>(null)

  const hasResults = computed(() => searchResults.value.length > 0)
  const isProcessing = computed(() => isSearching.value || isGenerating.value)

  async function searchDocuments(searchQuery: string, topK = 5) {
    isSearching.value = true
    error.value = null
    searchResults.value = []

    try {
      const results = await $fetch<DocumentChunk[]>('/api/rag/search', {
        method: 'POST',
        body: { query: searchQuery, topK },
      })
      searchResults.value = results
    } catch (err: any) {
      error.value = err.data?.message || 'Search failed'
    } finally {
      isSearching.value = false
    }
  }

  async function generateAnswer(searchQuery: string) {
    isGenerating.value = true
    error.value = null
    ragAnswer.value = ''
    ragSources.value = []

    try {
      const result = await $fetch<RAGResult>('/api/rag/generate', {
        method: 'POST',
        body: {
          question: searchQuery,
          topK: 5,
          threshold: 0.7,
        } satisfies RAGQuery,
      })
      ragAnswer.value = result.answer
      ragSources.value = result.sources
    } catch (err: any) {
      error.value = err.data?.message || 'Generation failed'
    } finally {
      isGenerating.value = false
    }
  }

  async function fullRAGPipeline(searchQuery: string) {
    await searchDocuments(searchQuery)
    if (searchResults.value.length > 0) {
      await generateAnswer(searchQuery)
    }
  }

  return {
    query,
    isSearching,
    isGenerating,
    isProcessing,
    searchResults,
    ragAnswer,
    ragSources,
    hasResults,
    error,
    searchDocuments,
    generateAnswer,
    fullRAGPipeline,
  }
}
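
The composable sends `topK` and `threshold` to the server, but the article does not show how they are applied. A plausible reading, sketched below as an assumption, is that sources scoring below `threshold` are dropped and the `topK` highest-scoring ones are kept. `selectSources` is a hypothetical helper, and `RAGSource` is redeclared so the block stands alone.

```typescript
interface RAGSource {
  content: string
  metadata: Record<string, unknown>
  score: number
}

// Keep sources at or above the threshold, best first, limited to topK entries.
function selectSources(sources: RAGSource[], topK: number = 5, threshold: number = 0.7): RAGSource[] {
  return sources
    .filter((s) => s.score >= threshold) // filter returns a copy, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```

The same function can run on the client to trim the list shown in the Sources panel, for example when the user raises the relevance threshold without issuing a new search.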

RAG Interaction Component

<!-- components/RAGSearchPanel.vue -->
<script setup lang="ts">
import { useRAG } from '~/composables/useRAG'

const {
  query,
  isProcessing,
  searchResults,
  ragAnswer,
  ragSources,
  error,
  fullRAGPipeline,
} = useRAG()

const showSources = ref(false)

async function handleSearch() {
  if (!query.value.trim()) return
  await fullRAGPipeline(query.value)
}
</script>

<template>
  <div class="mx-auto max-w-3xl space-y-6">
    <div class="flex gap-2">
      <input
        v-model="query"
        type="text"
        placeholder="Ask about your documents..."
        class="flex-1 rounded-lg border border-gray-300 px-4 py-2.5 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
        @keydown.enter="handleSearch"
        :disabled="isProcessing"
      />
      <button
        class="rounded-lg bg-blue-600 px-6 py-2.5 text-sm font-medium text-white hover:bg-blue-700 disabled:opacity-50"
        @click="handleSearch"
        :disabled="isProcessing || !query.trim()"
      >
        {{ isProcessing ? 'Searching...' : 'Search' }}
      </button>
    </div>

    <div v-if="error" class="rounded-lg bg-red-50 p-4 text-sm text-red-600">
      {{ error }}
    </div>

    <div v-if="ragAnswer" class="rounded-lg border border-gray-200 bg-white p-6 shadow-sm">
      <h3 class="mb-3 text-sm font-semibold text-gray-500 uppercase tracking-wide">AI Answer</h3>
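      <!-- v-html injects raw HTML: the answer is model output built from uploaded documents, so sanitize it before rendering -->
      <div class="prose prose-sm max-w-none" v-html="ragAnswer" />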
      <button
        class="mt-4 text-xs text-blue-600 hover:underline"
        @click="showSources = !showSources"
      >
        {{ showSources ? 'Hide' : 'Show' }} Sources ({{ ragSources.length }})
      </button>
    </div>

    <div v-if="showSources && ragSources.length" class="space-y-3">
      <h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Sources</h3>
      <div
        v-for="(source, index) in ragSources"
        :key="index"
        class="rounded-lg border border-gray-200 bg-gray-50 p-4"
      >
        <div class="mb-2 flex items-center justify-between">
          <span class="text-xs font-medium text-gray-500">
            Relevance: {{ (source.score * 100).toFixed(1) }}%
          </span>
        </div>
        <p class="text-sm text-gray-700 line-clamp-3">{{ source.content }}</p>
      </div>
    </div>

    <div v-if="searchResults.length && !ragAnswer" class="space-y-3">
      <h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Matching Chunks</h3>
      <div
        v-for="chunk in searchResults"
        :key="chunk.id"
        class="rounded-lg border border-gray-200 bg-white p-4"
      >
        <div class="mb-2 flex items-center gap-2">
          <span class="rounded bg-blue-100 px-2 py-0.5 text-xs font-medium text-blue-700">
            {{ chunk.metadata.source }}
          </span>
          <span class="text-xs text-gray-400">Page {{ chunk.metadata.page }}</span>
        </div>
        <p class="text-sm text-gray-700">{{ chunk.content }}</p>
      </div>
    </div>
  </div>
</template>

Pattern 6: Conversation State with Pinia Persistence

One of the core challenges of AI chat applications is state management—conversation history, model selection, and user preferences all need persistence. Pinia + Nuxt 4's SSR-compatible solution makes this straightforward.

Conversation Store

// stores/conversation.ts
import { defineStore } from 'pinia'
import type { ChatMessage, AIModel } from '~/shared/types/ai'

interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  model: AIModel
  createdAt: number
  updatedAt: number
}

interface ConversationState {
  conversations: Map<string, Conversation>
  activeConversationId: string | null
  preferences: {
    defaultModel: AIModel
    temperature: number
    systemPrompt: string
    streamByDefault: boolean
  }
}

export const useConversationStore = defineStore('conversation', {
  state: (): ConversationState => ({
    conversations: new Map(),
    activeConversationId: null,
    preferences: {
      defaultModel: 'gpt-4o-mini',
      temperature: 0.7,
      systemPrompt: 'You are a helpful assistant.',
      streamByDefault: true,
    },
  }),

  getters: {
    activeConversation(state): Conversation | undefined {
      if (!state.activeConversationId) return undefined
      return state.conversations.get(state.activeConversationId)
    },
    conversationList(state): Conversation[] {
      return Array.from(state.conversations.values())
        .sort((a, b) => b.updatedAt - a.updatedAt)
    },
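    // A getter that takes an argument returns a function, so the return type is a function type.
    messageCount(state): (id: string) => number {
      return (id: string) => state.conversations.get(id)?.messages.length || 0
    },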
  },

  actions: {
    createConversation(title?: string): string {
      const id = crypto.randomUUID()
      const conversation: Conversation = {
        id,
        title: title || `Chat ${this.conversations.size + 1}`,
        messages: [],
        model: this.preferences.defaultModel,
        createdAt: Date.now(),
        updatedAt: Date.now(),
      }
      this.conversations.set(id, conversation)
      this.activeConversationId = id
      this.persist()
      return id
    },

    saveConversation(id: string, messages: ChatMessage[]) {
      const conversation = this.conversations.get(id)
      if (!conversation) return
      conversation.messages = messages
      conversation.updatedAt = Date.now()
      if (messages.length > 0 && messages[0].role === 'user') {
        conversation.title = messages[0].content.slice(0, 50)
      }
      this.persist()
    },

    deleteConversation(id: string) {
      this.conversations.delete(id)
      if (this.activeConversationId === id) {
        const remaining = this.conversationList
        this.activeConversationId = remaining.length > 0 ? remaining[0].id : null
      }
      this.persist()
    },

    setActiveConversation(id: string) {
      this.activeConversationId = id
      // Persist so the active conversation survives a reload
      this.persist()
    },

    updatePreferences(prefs: Partial<ConversationState['preferences']>) {
      this.preferences = { ...this.preferences, ...prefs }
      this.persist()
    },

    persist() {
      if (import.meta.client) {
        const data = {
          conversations: Object.fromEntries(this.conversations),
          activeConversationId: this.activeConversationId,
          preferences: this.preferences,
        }
        localStorage.setItem('toolsku-ai-conversations', JSON.stringify(data))
      }
    },

    hydrate() {
      if (import.meta.client) {
        const stored = localStorage.getItem('toolsku-ai-conversations')
        if (stored) {
          try {
            const data = JSON.parse(stored)
            // Fall back to current defaults when older or partial data lacks a field
            this.conversations = new Map(Object.entries(data.conversations ?? {}))
            this.activeConversationId = data.activeConversationId ?? null
            this.preferences = { ...this.preferences, ...data.preferences }
          } catch {
            // Corrupt JSON: discard it rather than fail on every page load
            localStorage.removeItem('toolsku-ai-conversations')
          }
        }
      }
    },
  },
})
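
The persist/hydrate pair depends on converting the Map to a plain object and back. As a minimal, framework-free sketch of that round trip, the helpers below add a shape check before stored data is trusted. The names `StoredConversation`, `serializeConversations`, and `deserializeConversations` are hypothetical and not part of the store; the record type is deliberately smaller than the article's `Conversation`.

```typescript
// Hypothetical, simplified record; the article's Conversation type has more fields.
interface StoredConversation {
  id: string
  title: string
  updatedAt: number
}

// Map -> JSON string. JSON.stringify cannot serialize a Map directly,
// so it is first converted to a plain object keyed by conversation id.
function serializeConversations(map: Map<string, StoredConversation>): string {
  return JSON.stringify(Object.fromEntries(map))
}

// Narrow an unknown value to StoredConversation before trusting it.
function isStoredConversation(value: unknown): value is StoredConversation {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return typeof v.id === 'string' && typeof v.title === 'string' && typeof v.updatedAt === 'number'
}

// JSON string -> Map. Corrupt JSON or malformed entries are dropped
// instead of throwing, so a bad stored value cannot break startup.
function deserializeConversations(raw: string): Map<string, StoredConversation> {
  const result = new Map<string, StoredConversation>()
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return result
  }
  if (typeof parsed !== 'object' || parsed === null) return result
  for (const [id, value] of Object.entries(parsed as Record<string, unknown>)) {
    if (isStoredConversation(value)) result.set(id, value)
  }
  return result
}
```

Validating each entry costs a few lines but means a schema change or a hand-edited localStorage value degrades to an empty list rather than a runtime error deep inside a component.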

SSR-Safe Initialization Plugin

// plugins/conversation-init.client.ts
import { useConversationStore } from '~/stores/conversation'

export default defineNuxtPlugin(() => {
  const store = useConversationStore()
  store.hydrate()
})

Pattern 7: Production Deployment & Performance Optimization

From development to production, Nuxt 4 full-stack AI applications need attention to performance, security, and observability.

Production Configuration

// nuxt.config.ts (production)
export default defineNuxtConfig({
  future: {
    compatibilityVersion: 4,
  },

  nitro: {
    compressPublicAssets: true,
    minify: true,

    routeRules: {
      '/api/ai/**': {
        cors: false,
        headers: {
          'strict-transport-security': 'max-age=31536000; includeSubDomains',
          'x-content-type-options': 'nosniff',
          'x-frame-options': 'DENY',
        },
      },
      '/api/chat/stream': {
        headers: {
          'cache-control': 'no-cache, no-store, must-revalidate',
          'x-accel-buffering': 'no',
        },
      },
    },

    rollupConfig: {
      external: ['sharp', 'canvas'],
    },
  },

  app: {
    head: {
      meta: [
        { 'http-equiv': 'X-UA-Compatible', content: 'IE=edge' },
        { name: 'viewport', content: 'width=device-width, initial-scale=1' },
      ],
    },
  },

  experimental: {
    payloadExtraction: true,
    renderJsonPayloads: true,
  },

  vite: {
    build: {
      rollupOptions: {
        output: {
          manualChunks: {
            // Client-side chunks only; keep server-only SDKs such as 'openai' out of this list
            'markdown': ['marked', 'highlight.js'],
          },
        },
      },
    },
  },
})

Performance Monitoring Composable

// composables/useAIPerformance.ts
import { ref, computed } from 'vue'

interface PerformanceMetric {
  name: string
  startTime: number
  endTime: number
  duration: number
  metadata?: Record<string, unknown>
}

export function useAIPerformance() {
  const metrics = ref<PerformanceMetric[]>([])
  const activeTimers = new Map<string, number>()

  function startTimer(name: string) {
    activeTimers.set(name, performance.now())
  }

  function endTimer(name: string, metadata?: Record<string, unknown>) {
    const startTime = activeTimers.get(name)
    if (startTime === undefined) return
    const endTime = performance.now()
    metrics.value.push({
      name,
      startTime,
      endTime,
      duration: endTime - startTime,
      metadata,
    })
    activeTimers.delete(name)
  }

  const averageLatency = computed(() => {
    const chatMetrics = metrics.value.filter((m) => m.name === 'ai-response')
    if (chatMetrics.length === 0) return 0
    return chatMetrics.reduce((sum, m) => sum + m.duration, 0) / chatMetrics.length
  })

  const p95Latency = computed(() => {
    const chatMetrics = metrics.value
      .filter((m) => m.name === 'ai-response')
      .sort((a, b) => a.duration - b.duration)
    if (chatMetrics.length === 0) return 0
    // Nearest-rank percentile: the smallest sample with at least 95% of samples at or below it
    const index = Math.ceil(chatMetrics.length * 0.95) - 1
    return chatMetrics[index].duration
  })

  function getReport() {
    const chatMetrics = metrics.value.filter((m) => m.name === 'ai-response')
    return {
      totalRequests: chatMetrics.length,
      averageLatency: averageLatency.value,
      p95Latency: p95Latency.value,
      // Error rate over the same ai-response samples that totalRequests counts
      errorRate: chatMetrics.filter((m) => m.metadata?.error).length / Math.max(chatMetrics.length, 1),
    }
  }

  return { metrics, startTimer, endTimer, averageLatency, p95Latency, getReport }
}
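
The p95 figure in the composable is a nearest-rank percentile. The sketch below isolates that calculation as a plain function so the indexing is easier to follow. `percentile` is a hypothetical helper, not part of the composable, and it returns the single sample when only one exists.

```typescript
// Nearest-rank percentile over a list of durations (milliseconds).
// p is a fraction in (0, 1], e.g. 0.95 for p95.
function percentile(durations: number[], p: number): number {
  if (durations.length === 0) return 0
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...durations].sort((a, b) => a - b)
  const rank = Math.ceil(sorted.length * p) // 1-based rank
  const index = Math.min(Math.max(rank, 1), sorted.length) - 1
  return sorted[index]
}

const samples = [120, 95, 300, 110, 105, 98, 102, 2500, 115, 101]
console.log(percentile(samples, 0.5))  // 105
console.log(percentile(samples, 0.95)) // 2500
```

With only ten samples, p95 is simply the slowest request, which is why the median and p95 are worth reporting together: a single 2500 ms outlier dominates the tail but leaves the median untouched.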

Health Check Endpoint

// server/api/health.get.ts
import { defineEventHandler } from 'h3'

export default defineEventHandler(async () => {
  const checks: Record<string, { status: 'ok' | 'error'; latency?: number; error?: string }> = {}

  const aiStart = Date.now()
  try {
    const apiKey = process.env.OPENAI_API_KEY
    if (!apiKey) throw new Error('API key not configured')
    const res = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${apiKey}` },
      signal: AbortSignal.timeout(5000),
    })
    // fetch resolves on 4xx/5xx responses, so check the status explicitly
    if (!res.ok) throw new Error(`Upstream responded with ${res.status}`)
    checks.ai = { status: 'ok', latency: Date.now() - aiStart }
  } catch (err: unknown) {
    checks.ai = { status: 'error', error: err instanceof Error ? err.message : String(err) }
  }

  const overallStatus = Object.values(checks).every((c) => c.status === 'ok') ? 'ok' : 'degraded'

  return {
    status: overallStatus,
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || 'unknown',
    checks,
  }
})

5 Common Pitfalls and Solutions

Pitfall 1: Accessing localStorage During SSR Causes Hydration Mismatch

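Problem: localStorage does not exist on the server, so persistence code that touches it during SSR throws. State restored only on the client can also differ from the server-rendered HTML, which triggers a hydration mismatch.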

Solution:

// Only execute persistence logic on the client
if (import.meta.client) {
  store.hydrate()
}

// Or use useCookie for small values (cookies are limited to roughly 4 KB, so store preferences or ids, not full histories)
const savedData = useCookie('ai-conversations', {
  maxAge: 60 * 60 * 24 * 30,
  sameSite: 'lax',
})
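
The same guard can be expressed as a small helper that takes the storage object as a parameter, which makes the "no storage on the server" case explicit and easy to test. This is a sketch under stated assumptions: `StorageLike` and `readJson` are hypothetical names, and in a browser the first argument would be `localStorage`.

```typescript
// Minimal subset of the Web Storage interface that this sketch needs.
interface StorageLike {
  getItem(key: string): string | null
}

// Read and parse a JSON value, returning the fallback when storage is
// unavailable (server render), the key is missing, or the JSON is corrupt.
function readJson<T>(storage: StorageLike | undefined, key: string, fallback: T): T {
  if (!storage) return fallback
  const raw = storage.getItem(key)
  if (raw === null) return fallback
  try {
    return JSON.parse(raw) as T
  } catch {
    return fallback
  }
}

// Simulated server render: no storage is available, so the fallback is used.
const onServer = readJson(undefined, 'ai-preferences', { temperature: 0.7 })

// Simulated browser: an in-memory stand-in for localStorage.
const fakeStorage: StorageLike = {
  getItem: (key) => (key === 'ai-preferences' ? '{"temperature":0.2}' : null),
}
const onClient = readJson(fakeStorage, 'ai-preferences', { temperature: 0.7 })

console.log(onServer.temperature, onClient.temperature) // 0.7 0.2
```

Because the server always yields the fallback, the first client render should also start from the fallback and apply the stored value after mount; otherwise the two renders disagree.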

Pitfall 2: Streaming Responses Buffered by Nginx/CDN

Problem: Intermediate layers such as Nginx or a CDN buffer SSE streaming responses, so users receive the output in large delayed chunks instead of token by token.

Solution:

# nginx.conf
location /api/chat/stream {
    proxy_pass http://nuxt_backend;
    proxy_buffering off;
    proxy_cache off;
    proxy_set_header Connection '';
    proxy_http_version 1.1;
    chunked_transfer_encoding off;
}

Pitfall 3: Exposing API Key in Server Routes

Problem: The AI API key leaks through frontend code or error responses.

Solution:

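// server/api/chat.post.ts
export default defineEventHandler(async (event) => {
  // Server-side private config; never return the API key in a response
  const apiKey = useRuntimeConfig(event).aiApiKey

  try {
    // ... call the AI provider with apiKey and return its result ...
  } catch (err: unknown) {
    // Log details on the server only; error responses must not leak sensitive information
    console.error('AI request failed:', err)
    throw createError({
      statusCode: 500,
      statusMessage: 'AI service error', // Don't expose err.message
    })
  }
})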

Pitfall 4: Edge Runtime Doesn't Support Node.js APIs

Problem: Cloudflare Workers do not expose Node.js built-ins such as fs and path by default; some are available only through the nodejs_compat compatibility flag.

Solution:

// Use Nitro's auto-import detection
// server/api/ai/edge-chat.post.ts
// Avoid Node.js APIs, use Web standard APIs

// Wrong: import { readFile } from 'fs'
// Right: Use env variables or KV storage

export default defineEventHandler(async (event) => {
  const config = useRuntimeConfig(event)
  // Use config instead of reading files
})

Pitfall 5: Excessive Conversation History Exceeds Token Limits

Problem: Sending the full conversation history to the AI API can exceed the model's context window.

Solution:

// utils/message-trimmer.ts
import type { ChatMessage } from '~/shared/types/ai'

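const MAX_CONTEXT_TOKENS = 4096
// Rough heuristic for English text; use a real tokenizer when exact counts matter
const CHARS_PER_TOKEN = 4

const estimateTokens = (m: ChatMessage) => Math.ceil(m.content.length / CHARS_PER_TOKEN)

export function trimMessages(messages: ChatMessage[], maxTokens = MAX_CONTEXT_TOKENS): ChatMessage[] {
  // Reserve budget for a leading system prompt so it is never trimmed away
  const system = messages[0]?.role === 'system' ? messages[0] : undefined
  const history = system ? messages.slice(1) : messages
  let totalTokens = system ? estimateTokens(system) : 0
  const trimmed: ChatMessage[] = []

  // Walk backwards so the most recent messages are kept first
  for (let i = history.length - 1; i >= 0; i--) {
    const estimatedTokens = estimateTokens(history[i])
    if (totalTokens + estimatedTokens > maxTokens) break
    totalTokens += estimatedTokens
    trimmed.unshift(history[i])
  }

  return system ? [system, ...trimmed] : trimmed
}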

10 Common Error Troubleshooting

#	Error Message	Cause	Solution
1	`Hydration mismatch`	SSR/CSR data inconsistency	Guard localStorage access with `import.meta.client`
2	`429 Too Many Requests`	AI API rate limiting	Implement request queue and exponential backoff retry
3	`fetch failed` in Server Routes	Cannot access external API during SSR	Check server network and DNS configuration
4	`CORS error`	Cross-origin request blocked	Use Server Routes proxy instead of direct calls
5	`Stream interrupted`	Connection timeout or interruption	Implement resumption and auto-reconnect
6	`context_length_exceeded`	Conversation history too long	Use `trimMessages` to trim context
7	`Invalid API Key`	Environment variable not set	Check `.env` and `runtimeConfig` configuration
8	`Worker exceeded CPU time limit`	Edge runtime timeout	Optimize inference logic, use streaming responses
9	`Module not found: fs`	Edge environment doesn't support Node modules	Use Web standard APIs instead of Node.js APIs
10	`Pinia store not initialized`	Store not ready during SSR	Use `callOnce` or `onNuxtReady` to initialize
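
For row 2, the retry half of the fix can be sketched as follows (the request queue is not shown). This is a minimal illustration under stated assumptions: `backoffDelay`, `withRetry`, and the `HttpError` shape are hypothetical names, and the base delay, cap, and retry count are placeholder values rather than recommendations from any provider.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs)
}

// Hypothetical error shape: the caller attaches the HTTP status to the thrown error.
interface HttpError extends Error {
  status?: number
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Retry `task` on 429 responses only; other errors are rethrown immediately.
async function withRetry<T>(task: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task()
    } catch (err) {
      const status = (err as HttpError | undefined)?.status
      if (status !== 429 || attempt >= maxRetries) throw err
      // Full jitter: wait a random time up to the backoff delay so that
      // many clients do not retry in lockstep.
      await sleep(Math.random() * backoffDelay(attempt))
    }
  }
}
```

A Server Route could wrap its upstream call as `withRetry(() => callProvider(body))`, where `callProvider` stands for whatever function performs the request and throws an error carrying `status`. If the provider returns a `Retry-After` header, honoring it is usually preferable to a computed delay.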

Advanced Optimization Techniques

1. AI Response Caching Strategy

// server/utils/ai-cache.ts
import { useStorage } from '#imports'

// Assumes a storage mount named 'redis' is configured under nitro.storage in nuxt.config.ts
const cache = useStorage('redis')

interface CacheEntry<T> {
  data: T
  expiresAt: number
  hitCount: number
}

export async function getCachedAIResponse<T>(
  key: string,
  generator: () => Promise<T>,
  ttl = 3600
): Promise<T> {
  const cached = await cache.getItem<CacheEntry<T>>(`ai:${key}`)
  if (cached && cached.expiresAt > Date.now()) {
    cached.hitCount++
    await cache.setItem(`ai:${key}`, cached)
    return cached.data
  }

  const data = await generator()
  await cache.setItem(`ai:${key}`, {
    data,
    expiresAt: Date.now() + ttl * 1000,
    hitCount: 0,
  })
  return data
}
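
`getCachedAIResponse` takes a `key` but leaves its derivation to the caller. One possible approach, sketched below, is to hash the request content so that identical prompts map to the same entry. `buildCacheKey`, `fnv1a`, and `KeyMessage` are hypothetical names. FNV-1a is used only because it is synchronous and dependency-free; a production cache would more likely use SHA-256 to make collisions negligible.

```typescript
// Hypothetical message shape mirroring the role/content pairs used in this article.
interface KeyMessage {
  role: string
  content: string
}

// FNV-1a, 32-bit, over UTF-16 code units. Non-cryptographic and
// collision-prone at scale; shown here for illustration only.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16).padStart(8, '0')
}

// Build a deterministic key: the model stays readable as a prefix,
// and the request content is reduced to a short hash.
function buildCacheKey(model: string, messages: KeyMessage[], temperature = 0): string {
  const payload = JSON.stringify({
    temperature,
    messages: messages.map((m) => [m.role, m.content]),
  })
  return `${model}:${fnv1a(payload)}`
}

const key = buildCacheKey('gpt-4o-mini', [{ role: 'user', content: 'What is SSR?' }])
console.log(key.startsWith('gpt-4o-mini:')) // true
```

Including the temperature in the payload keeps responses generated with different sampling settings in separate entries. Keeping the model as a plain prefix also makes it possible to invalidate one model's entries by key prefix.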

2. Multi-Model Intelligent Routing

// server/utils/model-router.ts
import type { AIModel } from '~/shared/types/ai'

interface ModelRoute {
  model: AIModel
  condition: (messages: any[]) => boolean
  priority: number
}

const routes: ModelRoute[] = [
  {
    model: 'gpt-4o',
    condition: (msgs) => msgs.length > 20 || msgs.some((m) => m.content.length > 2000),
    priority: 10,
  },
  {
    model: 'gpt-4o-mini',
    condition: () => true,
    priority: 1,
  },
]

export function selectModel(messages: any[]): AIModel {
  const matched = routes
    .filter((r) => r.condition(messages))
    .sort((a, b) => b.priority - a.priority)
  return matched[0]?.model || 'gpt-4o-mini'
}

3. Request Deduplication and Batching

// server/utils/request-dedup.ts
const pendingRequests = new Map<string, Promise<any>>()

export async function deduplicatedFetch<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const pending = pendingRequests.get(key)
  if (pending) return pending as Promise<T>

  const promise = fetcher().finally(() => {
    pendingRequests.delete(key)
  })
  pendingRequests.set(key, promise)
  return promise
}

Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit

Dimension	Nuxt 4	Next.js 15	SvelteKit
Full-Stack AI	Server Routes + Nitro	Route Handlers + Edge	Server Endpoints
SSR Streaming	Native support	App Router support	Supported but smaller ecosystem
Edge Deployment	Cloudflare/Vercel/Deno	Vercel Edge first	Cloudflare adapter
Type Safety	Shared Types across stack	Manual configuration needed	Built-in but different mechanism
State Management	Pinia (official)	Third-party required	Built-in Stores
Learning Curve	Vue developer friendly	React ecosystem	Svelte syntax unique
AI Ecosystem	Vercel AI SDK compatible	Vercel AI SDK native	Community adaptation
Streaming UI	Custom Composable	useChat/useCompletion	Community solutions
Bundle Size	Medium	Larger	Smallest
Developer Experience	Auto-import + HMR	Turbopack HMR	Vite HMR

Selection Guide:

Vue teams → Nuxt 4: Minimal learning curve for existing Vue developers, complete full-stack capabilities
React teams → Next.js 15: Most mature AI SDK ecosystem
Pursuing extreme performance → SvelteKit: Smallest bundle, but AI ecosystem needs improvement

Recommended Online Tools

When developing Vue3 + Nuxt 4 full-stack AI applications, these online tools can help boost your productivity:

JSON Formatter - Format JSON data when debugging AI API responses
Base64 Encode/Decode - Handle Base64 encoded data in AI APIs
Code Formatter - Format Vue3/TypeScript code

Summary

Vue3 + Nuxt 4 full-stack AI applications are fully mature in 2026. The 7 production patterns cover the complete chain from API proxying to edge inference:

Server Routes are the foundation of full-stack AI: no separate backend is needed, and the API key stays on the server
SSR + AI lets search engines index AI-generated content, so SEO and AI work together
Streaming Chat is the core interaction of AI apps and must support interrupt and retry
Edge Inference cuts network latency by running close to users, with Cloudflare Workers as the deployment target used here
RAG Frontend needs a complete search-generate-display interaction chain
Pinia Persistence keeps conversation state across pages and sessions
Production Deployment focuses on security, performance, and observability

Nuxt 4 gives Vue developers full-stack AI capabilities that rival Next.js for the first time—one codebase, one deployment, full-stack AI.
