Vue3 + Nuxt 4 Full-Stack AI App: 7 Production Patterns from SSR to Edge Inference
Still building AI apps with a separate frontend and backend, fighting CORS and deploying two systems? Nuxt 4's Server Routes, SSR, and Edge Runtime let a Vue3 app go full-stack: one codebase, one deployment, and AI content rendered on the server. This article walks through 7 production-oriented patterns, from Server Routes proxying to edge inference, with code you can adapt to your own project.
Key Takeaways
- Master Nuxt 4 Server Routes for building AI API proxies
- Optimize first paint by combining SSR with AI streaming render
- Build production-grade streaming chat UI components
- Deploy edge inference to Cloudflare Workers
- Design RAG frontend interaction experiences
- Persist conversation state with Pinia and restore it across pages
- Apply production performance optimization and deployment best practices
Table of Contents
- Nuxt 4 Full-Stack AI Architecture Overview
- Pattern 1: Server Routes + AI API Proxy
- Pattern 2: SSR + AI Streaming Render
- Pattern 3: Streaming Chat UI Component
- Pattern 4: Edge Inference with Cloudflare Workers
- Pattern 5: RAG Frontend Interaction Design
- Pattern 6: Conversation State with Pinia Persistence
- Pattern 7: Production Deployment & Performance Optimization
- 5 Common Pitfalls and Solutions
- 10 Common Error Troubleshooting
- Advanced Optimization Techniques
- Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit
- Recommended Online Tools
- Summary
Nuxt 4 Full-Stack AI Architecture Overview
Nuxt 4 in 2026 brings critical capabilities for full-stack AI applications: Server Routes as backend API layer, SSR streaming rendering, Edge Runtime support, and native TypeScript end-to-end type safety.
┌──────────────────────────────────────────────────────────┐
│ Nuxt 4 Full-Stack AI Architecture │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Browser │───▶│ Nuxt SSR │───▶│ Edge / │ │
│ │ Client │◀───│ Server │◀───│ Node │ │
│ └──────┬──────┘ └──────┬───────┘ └─────┬──────┘ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌──────▼───────┐ ┌─────▼──────┐ │
│ │ Vue3 │ │ Server │ │ AI Models │ │
│ │ Composables│ │ Routes │ │ OpenAI │ │
│ │ Pinia │ │ /api/chat │ │ Anthropic │ │
│ │ Components │ │ /api/embed │ │ Local LLM │ │
│ └─────────────┘ │ /api/rag │ └────────────┘ │
│ └──────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Shared Layer: Types / Utils / Constants │ │
│ └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
Nuxt 4 Full-Stack AI Core Capabilities
| Capability | Description | Use Case |
|---|---|---|
| Server Routes | File-based API routing, no Express needed | AI proxy, Webhooks, BFF |
| SSR Streaming | Server-side streaming HTML output | SEO + AI first paint |
| Edge Runtime | Cloudflare/Deno edge deployment | Low-latency inference |
| Nitro Engine | Cross-platform server engine | Multi-environment unified deployment |
| Shared Types | Frontend-backend type sharing | End-to-end type safety |
| useAsyncData | SSR data fetching | AI data preloading |
Pattern 1: Server Routes + AI API Proxy
Nuxt 4 Server Routes let you write APIs directly in your Nuxt project without a separate backend. This is the foundation of full-stack AI—all AI requests are proxied through Server Routes, avoiding API Key exposure, unifying error handling, and implementing rate limiting.
Basic Server Route Proxy
// server/api/chat.post.ts
import { defineEventHandler, readBody, createError, getRequestHeader } from 'h3'
import { z } from 'zod'
const chatRequestSchema = z.object({
messages: z.array(z.object({
role: z.enum(['user', 'assistant', 'system']),
content: z.string().max(4000),
})).min(1).max(50),
model: z.enum(['gpt-4o', 'gpt-4o-mini', 'claude-sonnet-4-20250514']).default('gpt-4o-mini'),
temperature: z.number().min(0).max(2).default(0.7),
maxTokens: z.number().min(1).max(4096).default(2048),
})
const RATE_LIMIT_WINDOW = 60_000
const RATE_LIMIT_MAX = 20
const requestCounts = new Map<string, { count: number; resetAt: number }>()
function checkRateLimit(ip: string): boolean {
const now = Date.now()
const record = requestCounts.get(ip)
if (!record || now > record.resetAt) {
requestCounts.set(ip, { count: 1, resetAt: now + RATE_LIMIT_WINDOW })
return true
}
if (record.count >= RATE_LIMIT_MAX) {
return false
}
record.count++
return true
}
export default defineEventHandler(async (event) => {
const clientIp = getRequestHeader(event, 'x-forwarded-for') || 'unknown'
if (!checkRateLimit(clientIp)) {
throw createError({
statusCode: 429,
statusMessage: 'Rate limit exceeded. Please try again later.',
})
}
const body = await readBody(event)
const parsed = chatRequestSchema.safeParse(body)
if (!parsed.success) {
throw createError({
statusCode: 400,
statusMessage: `Validation error: ${parsed.error.message}`,
})
}
const { messages, model, temperature, maxTokens } = parsed.data
const apiKey = process.env.OPENAI_API_KEY
if (!apiKey) {
throw createError({
statusCode: 500,
statusMessage: 'AI service not configured',
})
}
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model,
messages,
temperature,
max_tokens: maxTokens,
}),
})
if (!response.ok) {
const errorData = await response.json().catch(() => ({}))
throw createError({
statusCode: response.status,
statusMessage: errorData.error?.message || 'AI service error',
})
}
return response.json()
})
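The limiter above keys on the raw x-forwarded-for header, which can hold a comma-separated proxy chain, and its Map never evicts old entries. A minimal sketch of two helpers that address both points follows. The names firstForwardedIp and pruneExpired are hypothetical (they are not part of Nuxt or h3), and the sketch assumes the app sits behind a trusted proxy that sets the header.

```typescript
type RateRecord = { count: number; resetAt: number }

// Hypothetical helper: take the first (client-most) address from an
// x-forwarded-for value such as "203.0.113.7, 10.0.0.1".
function firstForwardedIp(header: string | undefined): string {
  const first = header?.split(',')[0]?.trim()
  return first ? first : 'unknown'
}

// Hypothetical helper: drop records whose window has ended so the Map
// cannot grow without bound. Returns the number of removed entries.
function pruneExpired(records: Map<string, RateRecord>, now: number): number {
  let removed = 0
  for (const [ip, record] of records) {
    if (now > record.resetAt) {
      records.delete(ip)
      removed++
    }
  }
  return removed
}
```

Keep in mind that in-memory counters live per server instance. On serverless or edge deployments each instance keeps its own Map, so a shared store is the safer choice there.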
Streaming Server Route
// server/api/chat/stream.post.ts
import { defineEventHandler, readBody, createError, setResponseHeader, sendStream } from 'h3'
export default defineEventHandler(async (event) => {
const body = await readBody(event)
const { messages, model = 'gpt-4o-mini' } = body
setResponseHeader(event, 'Content-Type', 'text/event-stream')
setResponseHeader(event, 'Cache-Control', 'no-cache')
setResponseHeader(event, 'Connection', 'keep-alive')
setResponseHeader(event, 'X-Accel-Buffering', 'no')
const apiKey = process.env.OPENAI_API_KEY
if (!apiKey) {
throw createError({ statusCode: 500, statusMessage: 'AI service not configured' })
}
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model,
messages,
stream: true,
}),
})
if (!response.ok) {
throw createError({
statusCode: response.status,
statusMessage: 'AI streaming error',
})
}
const decoder = new TextDecoder()
const encoder = new TextEncoder()
let buffer = ''
const transformStream = new TransformStream<Uint8Array, Uint8Array>({
transform(chunk, controller) {
// Buffer partial lines: one SSE event can be split across network chunks
buffer += decoder.decode(chunk, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
if (!line.startsWith('data: ')) continue
const data = line.slice(6).trim()
if (data === '[DONE]') {
controller.enqueue(encoder.encode('data: [DONE]\n\n'))
continue
}
try {
const parsed = JSON.parse(data)
const content = parsed.choices?.[0]?.delta?.content
if (content) {
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`))
}
} catch {
// skip malformed chunks
}
}
},
})
const readableStream = response.body!.pipeThrough(transformStream)
return sendStream(event, readableStream)
})
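Network chunks do not align with SSE event boundaries, so a data line can arrive in two pieces. One way to make that logic easy to unit-test is to factor it into a small stateful parser. The sketch below uses a hypothetical helper named createSseDataParser and, like the route above, only recognizes lines that start with "data: " (the SSE format also allows other field names, which this sketch ignores).

```typescript
// Hypothetical helper: returns a stateful function that accepts decoded text
// chunks and returns the payloads of every complete "data: " line seen so far.
function createSseDataParser(): (chunk: string) => string[] {
  let buffer = ''
  return (chunk: string): string[] => {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is an incomplete line (or ''); keep it for the next call
    buffer = lines.pop() ?? ''
    const payloads: string[] = []
    for (const line of lines) {
      const trimmed = line.replace(/\r$/, '')
      if (trimmed.startsWith('data: ')) payloads.push(trimmed.slice(6))
    }
    return payloads
  }
}
```

A transform step can then call the parser on each decoded chunk and forward only the payloads it returns.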
Shared Type Definitions
// shared/types/ai.ts
export interface ChatMessage {
id: string
role: 'user' | 'assistant' | 'system'
content: string
timestamp: number
metadata?: MessageMetadata
}
export interface MessageMetadata {
model: string
tokens: number
latency: number
finishReason: string
}
export interface ChatRequest {
messages: Pick<ChatMessage, 'role' | 'content'>[]
model: AIModel
temperature?: number
maxTokens?: number
stream?: boolean
}
export type AIModel = 'gpt-4o' | 'gpt-4o-mini' | 'claude-sonnet-4-20250514'
export interface StreamChunk {
content: string
done: boolean
}
export interface RAGQuery {
question: string
topK?: number
threshold?: number
}
export interface RAGResult {
answer: string
sources: RAGSource[]
confidence: number
}
export interface RAGSource {
content: string
metadata: Record<string, unknown>
score: number
}
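The StreamChunk type declares a done flag, while the streaming route emits only a content field and signals completion with a [DONE] sentinel. A small runtime parser can reconcile the two and reject malformed payloads. This is a sketch under that assumption; parseStreamChunk is a hypothetical helper, and StreamChunk is redeclared here only to keep the example self-contained.

```typescript
interface StreamChunk {
  content: string
  done: boolean
}

// Hypothetical helper: convert one SSE payload into a StreamChunk, or null
// when the payload is not valid JSON or lacks a string content field.
function parseStreamChunk(payload: string): StreamChunk | null {
  if (payload === '[DONE]') return { content: '', done: true }
  try {
    const value: unknown = JSON.parse(payload)
    if (typeof value !== 'object' || value === null) return null
    const content = (value as Record<string, unknown>).content
    if (typeof content !== 'string') return null
    return { content, done: false }
  } catch {
    return null
  }
}
```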
Pattern 2: SSR + AI Streaming Render
The core value of combining SSR with AI: search engines can index AI-generated content, and users see AI answers on first paint. Nuxt 4's useAsyncData + Server Routes make SSR AI rendering straightforward.
SSR Data Preloading
// composables/useAIContent.ts
import { useAsyncData, useHead } from '#imports'
interface AIContentOptions {
prompt: string
model?: AIModel
ttl?: number
}
export function useAIContent(options: AIContentOptions) {
const { prompt, model = 'gpt-4o-mini', ttl = 3600 } = options
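const { data, pending, error, refresh } = useAsyncData(
// Key on the full prompt and model: a 32-character prefix collides when prompts share an opening
`ai-content-${model}-${prompt}`,
() => $fetch<string>('/api/ai/generate', {
method: 'POST',
body: { prompt, model },
}),
{
server: true,
lazy: false,
// Stamp each result with an expiry so getCachedData can honor ttl (seconds)
transform: (html) => ({ html, expiresAt: Date.now() + ttl * 1000 }),
getCachedData(key, nuxtApp) {
const cached = nuxtApp.payload.data[key]
// Reuse the payload entry only while it is still fresh
if (cached && cached.expiresAt > Date.now()) {
return cached
}
return undefined
},
}
)
// Expose the generated HTML as a plain string for templates
const content = computed(() => data.value?.html ?? '')
useHead({
meta: [
{ name: 'description', content: () => content.value.slice(0, 160) },
],
})
return { content, pending, error, refresh }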
}
SSR AI Page Component
<!-- pages/ai-insights/[topic].vue -->
<script setup lang="ts">
const route = useRoute()
const topic = route.params.topic as string
const { content, pending, error } = useAIContent({
prompt: `Generate a comprehensive technical insight about ${topic} for developers in 2026`,
model: 'gpt-4o-mini',
})
useHead({
title: () => `AI Insights: ${topic} | ToolsKu`,
})
</script>
<template>
<div class="mx-auto max-w-4xl px-4 py-8">
<header class="mb-8">
<h2 class="text-3xl font-bold text-gray-900">
AI Insights: {{ topic }}
</h2>
<p class="mt-2 text-gray-500">
AI-generated analysis, verified and curated for developers
</p>
</header>
<div v-if="pending" class="space-y-4">
<div class="h-8 w-3/4 animate-pulse rounded bg-gray-200" />
<div class="h-8 w-1/2 animate-pulse rounded bg-gray-200" />
<div class="h-8 w-2/3 animate-pulse rounded bg-gray-200" />
</div>
<div v-else-if="error" class="rounded-lg bg-red-50 p-4">
<p class="text-red-700">Failed to generate AI content. Please try again.</p>
</div>
<article v-else class="prose prose-lg max-w-none">
<div v-html="content" />
</article>
</div>
</template>
SSR Cache Middleware
// server/middleware/ai-cache.ts
import { defineEventHandler, readBody, setResponseHeader } from 'h3'
import { useStorage } from '#imports'
const aiCache = useStorage('ai-cache')
export default defineEventHandler(async (event) => {
if (!event.path.startsWith('/api/ai/')) return
const cacheKey = `ssr-ai:${event.path}:${JSON.stringify(await readBody(event).catch(() => ({})))}`
const cached = await aiCache.getItem<{ data: string; expiresAt: number }>(cacheKey)
if (cached && cached.expiresAt > Date.now()) {
setResponseHeader(event, 'X-AI-Cache', 'HIT')
return cached.data
}
setResponseHeader(event, 'X-AI-Cache', 'MISS')
})
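The middleware above only reads from the ai-cache storage, so something else has to write entries carrying an expiresAt timestamp after a successful generation. The following in-memory sketch shows that read/write contract. TtlCache and cacheKeyFor are hypothetical names, and a real deployment would presumably back the same logic with the storage layer rather than a Map.

```typescript
interface CacheEntry<T> {
  data: T
  expiresAt: number
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>()
  private ttlMs: number
  private now: () => number

  // The clock is injectable so expiry can be tested without waiting
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key) // evict stale entries lazily on read
      return undefined
    }
    return entry.data
  }

  set(key: string, data: T): void {
    this.entries.set(key, { data, expiresAt: this.now() + this.ttlMs })
  }
}

// Build a key from the path and request body, mirroring the middleware above
function cacheKeyFor(path: string, body: unknown): string {
  return `ssr-ai:${path}:${JSON.stringify(body ?? {})}`
}
```

The generation route would call set after a successful upstream response, and the middleware would call get before the route runs.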
Pattern 3: Streaming Chat UI Component
Streaming chat is the core interaction of AI applications. This pattern implements a complete, production-grade streaming chat component with Markdown rendering, code highlighting, interrupt generation, and message retry.
Streaming Chat Composable
// composables/useStreamingChat.ts
import { ref, computed } from 'vue'
import type { ChatMessage, StreamChunk, AIModel } from '#shared/types/ai'
interface UseStreamingChatOptions {
apiEndpoint?: string
defaultModel?: AIModel
maxRetries?: number
}
export function useStreamingChat(options: UseStreamingChatOptions = {}) {
const {
apiEndpoint = '/api/chat/stream',
defaultModel = 'gpt-4o-mini',
maxRetries = 2,
} = options
const messages = ref<ChatMessage[]>([])
const currentStreamContent = ref('')
const isStreaming = ref(false)
const error = ref<string | null>(null)
const selectedModel = ref<AIModel>(defaultModel)
let abortController: AbortController | null = null
const displayedMessages = computed(() => {
const base = [...messages.value]
if (isStreaming.value && currentStreamContent.value) {
base.push({
id: 'streaming',
role: 'assistant',
content: currentStreamContent.value,
timestamp: Date.now(),
})
}
return base
})
async function sendMessage(content: string) {
const userMessage: ChatMessage = {
id: crypto.randomUUID(),
role: 'user',
content,
timestamp: Date.now(),
}
messages.value.push(userMessage)
error.value = null
currentStreamContent.value = ''
isStreaming.value = true
abortController = new AbortController()
let retryCount = 0
const attemptStream = async (): Promise<void> => {
try {
const response = await fetch(apiEndpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: messages.value.map((m) => ({
role: m.role,
content: m.content,
})),
model: selectedModel.value,
}),
signal: abortController!.signal,
})
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`)
}
const reader = response.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
for (const line of lines) {
if (!line.startsWith('data: ')) continue
const data = line.slice(6)
if (data === '[DONE]') {
finalizeStream()
return
}
try {
const chunk: StreamChunk = JSON.parse(data)
currentStreamContent.value += chunk.content
} catch {
// skip malformed chunks
}
}
}
finalizeStream()
} catch (err: any) {
if (err.name === 'AbortError') return
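if (retryCount < maxRetries) {
retryCount++
// Discard partial output so the retried stream does not duplicate it
currentStreamContent.value = ''
return attemptStream()
}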
error.value = err.message
isStreaming.value = false
}
}
await attemptStream()
}
function finalizeStream() {
if (currentStreamContent.value) {
messages.value.push({
id: crypto.randomUUID(),
role: 'assistant',
content: currentStreamContent.value,
timestamp: Date.now(),
})
}
currentStreamContent.value = ''
isStreaming.value = false
}
function stopStreaming() {
abortController?.abort()
finalizeStream()
}
function retryLastMessage() {
const lastUserIndex = messages.value.findLastIndex((m) => m.role === 'user')
if (lastUserIndex === -1) return
const lastUserContent = messages.value[lastUserIndex].content
messages.value = messages.value.slice(0, lastUserIndex)
sendMessage(lastUserContent)
}
function clearMessages() {
messages.value = []
currentStreamContent.value = ''
error.value = null
}
return {
messages: displayedMessages,
isStreaming,
error,
selectedModel,
sendMessage,
stopStreaming,
retryLastMessage,
clearMessages,
}
}
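The composable retries immediately after a failure, which can hammer an upstream that is already struggling. A common refinement is to wait longer before each successive attempt. The sketch below computes such a delay; backoffDelay is a hypothetical helper, and the 500 ms base and 8 s cap are illustrative assumptions rather than values from this article.

```typescript
// Hypothetical helper: delay in milliseconds before retry number `attempt`
// (1-based), doubling from baseMs on each attempt and capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  const exponent = Math.max(0, attempt - 1)
  return Math.min(maxMs, baseMs * 2 ** exponent)
}
```

Inside the retry branch, the composable could await a timer for backoffDelay(retryCount) milliseconds before calling attemptStream again.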
Chat UI Component
<!-- components/AIChatWindow.vue -->
<script setup lang="ts">
import { useStreamingChat } from '~/composables/useStreamingChat'
import { useConversationStore } from '~/stores/conversation'
const props = defineProps<{
conversationId?: string
}>()
const {
messages,
isStreaming,
error,
selectedModel,
sendMessage,
stopStreaming,
retryLastMessage,
clearMessages,
} = useStreamingChat()
const conversationStore = useConversationStore()
const inputText = ref('')
const messagesContainer = ref<HTMLElement>()
const modelOptions = [
{ label: 'GPT-4o', value: 'gpt-4o' as const },
{ label: 'GPT-4o Mini', value: 'gpt-4o-mini' as const },
{ label: 'Claude Sonnet 4', value: 'claude-sonnet-4-20250514' as const },
]
async function handleSubmit() {
const text = inputText.value.trim()
if (!text || isStreaming.value) return
inputText.value = ''
await sendMessage(text)
if (props.conversationId) {
conversationStore.saveConversation(props.conversationId, messages.value)
}
scrollToBottom()
}
function scrollToBottom() {
nextTick(() => {
if (messagesContainer.value) {
messagesContainer.value.scrollTop = messagesContainer.value.scrollHeight
}
})
}
watch(messages, () => scrollToBottom(), { deep: true })
</script>
<template>
<div class="flex h-full flex-col rounded-xl border border-gray-200 bg-white shadow-sm">
<header class="flex items-center justify-between border-b border-gray-200 px-4 py-3">
<div class="flex items-center gap-3">
<span class="text-sm font-medium text-gray-700">AI Chat</span>
<select
v-model="selectedModel"
class="rounded-md border border-gray-300 px-2 py-1 text-xs text-gray-600"
:disabled="isStreaming"
>
<option v-for="opt in modelOptions" :key="opt.value" :value="opt.value">
{{ opt.label }}
</option>
</select>
</div>
<div class="flex gap-2">
<button
class="rounded-md px-2 py-1 text-xs text-gray-500 hover:bg-gray-100"
@click="retryLastMessage"
:disabled="isStreaming || messages.length === 0"
>
Retry
</button>
<button
class="rounded-md px-2 py-1 text-xs text-red-500 hover:bg-red-50"
@click="clearMessages"
:disabled="isStreaming"
>
Clear
</button>
</div>
</header>
<div ref="messagesContainer" class="flex-1 overflow-y-auto p-4 space-y-4">
<div
v-for="msg in messages"
:key="msg.id"
:class="[
'max-w-[80%] rounded-lg px-4 py-2.5 text-sm',
msg.role === 'user'
? 'ml-auto bg-blue-600 text-white'
: 'mr-auto bg-gray-100 text-gray-900',
]"
>
<div v-if="msg.role === 'assistant'" class="prose prose-sm max-w-none" v-html="renderMarkdown(msg.content)" />
<p v-else>{{ msg.content }}</p>
</div>
<div v-if="error" class="mx-auto max-w-md rounded-lg bg-red-50 p-3 text-center text-sm text-red-600">
{{ error }}
<button class="ml-2 underline" @click="retryLastMessage">Retry</button>
</div>
</div>
<footer class="border-t border-gray-200 p-3">
<div class="flex gap-2">
<input
v-model="inputText"
type="text"
placeholder="Type your message..."
class="flex-1 rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
@keydown.enter="handleSubmit"
:disabled="isStreaming"
/>
<button
v-if="isStreaming"
class="rounded-lg bg-red-500 px-4 py-2 text-sm font-medium text-white hover:bg-red-600"
@click="stopStreaming"
>
Stop
</button>
<button
v-else
class="rounded-lg bg-blue-600 px-4 py-2 text-sm font-medium text-white hover:bg-blue-700"
@click="handleSubmit"
:disabled="!inputText.trim()"
>
Send
</button>
</div>
</footer>
</div>
</template>
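The template calls renderMarkdown, but the component never defines or imports it, and its output goes straight into v-html. Whatever implementation is used, model output must not be able to inject markup. As a stand-in, here is a minimal escape-first renderer for a tiny Markdown subset (inline code, bold, line breaks). It is a sketch only: full Markdown with fenced code and highlighting would need a real parser combined with an HTML sanitizer.

```typescript
// Escape the HTML-significant characters so model output cannot inject markup
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Hypothetical stand-in for renderMarkdown: escape first, then re-introduce
// a small, fixed set of tags for inline code, bold text, and line breaks.
function renderMarkdown(source: string): string {
  return escapeHtml(source)
    .replace(/`([^`\n]+)`/g, '<code>$1</code>')
    .replace(/\*\*([^*\n]+)\*\*/g, '<strong>$1</strong>')
    .replace(/\n/g, '<br>')
}
```

Because escaping happens before any tag is added, the only HTML in the result is the markup this function inserts itself.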
Pattern 4: Edge Inference with Cloudflare Workers
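Edge inference is a key trend for AI applications in 2026: handling requests on the edge node nearest the user can, in favorable cases, reduce the network latency between user and server from hundreds of milliseconds to single digits. Note that a route which forwards prompts to a hosted model API still pays that provider's latency; only the hop to your own server gets shorter unless the model itself runs at the edge. Nuxt 4 + Nitro makes deploying to Cloudflare Workers simple.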
Nitro Edge Configuration
// nuxt.config.ts
export default defineNuxtConfig({
future: {
compatibilityVersion: 4,
},
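// Private runtime config belongs at the top level of the Nuxt config;
// override at runtime with NUXT_AI_API_KEY / NUXT_AI_BASE_URL
runtimeConfig: {
aiApiKey: process.env.OPENAI_API_KEY,
aiBaseUrl: process.env.AI_BASE_URL || 'https://api.openai.com/v1',
},
nitro: {
preset: 'cloudflare-module',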
routeRules: {
'/api/ai/**': {
cors: true,
headers: {
'cache-control': 'no-cache',
},
},
},
},
})
Edge Inference Server Route
// server/api/ai/edge-chat.post.ts
import { defineEventHandler, readBody, createError, getHeader, setResponseHeader, sendStream } from 'h3'
interface EdgeAIConfig {
provider: 'openai' | 'anthropic' | 'local'
baseUrl: string
apiKey: string
}
function getAIConfig(event: any): EdgeAIConfig {
const config = useRuntimeConfig(event)
const provider = getHeader(event, 'x-ai-provider') || 'openai'
const configs: Record<string, EdgeAIConfig> = {
openai: {
provider: 'openai',
baseUrl: config.aiBaseUrl || 'https://api.openai.com/v1',
apiKey: config.aiApiKey,
},
anthropic: {
provider: 'anthropic',
baseUrl: 'https://api.anthropic.com/v1',
apiKey: process.env.ANTHROPIC_API_KEY || '',
},
local: {
provider: 'local',
baseUrl: process.env.LOCAL_AI_URL || 'http://localhost:11434/v1',
apiKey: 'local',
},
}
return configs[provider] || configs.openai
}
export default defineEventHandler(async (event) => {
const body = await readBody(event)
const { messages, model = 'gpt-4o-mini' } = body
const aiConfig = getAIConfig(event)
setResponseHeader(event, 'Content-Type', 'text/event-stream')
setResponseHeader(event, 'Cache-Control', 'no-cache')
setResponseHeader(event, 'Connection', 'keep-alive')
const endpoint = aiConfig.provider === 'anthropic'
? `${aiConfig.baseUrl}/messages`
: `${aiConfig.baseUrl}/chat/completions`
const requestHeaders: Record<string, string> = {
'Content-Type': 'application/json',
}
if (aiConfig.provider === 'anthropic') {
requestHeaders['x-api-key'] = aiConfig.apiKey
requestHeaders['anthropic-version'] = '2023-06-01'
} else {
requestHeaders['Authorization'] = `Bearer ${aiConfig.apiKey}`
}
const requestBody = aiConfig.provider === 'anthropic'
? {
model,
messages: messages.map((m: any) => ({ role: m.role, content: m.content })),
max_tokens: 2048,
stream: true,
}
: {
model,
messages: messages.map((m: any) => ({ role: m.role, content: m.content })),
stream: true,
}
const response = await fetch(endpoint, {
method: 'POST',
headers: requestHeaders,
body: JSON.stringify(requestBody),
})
if (!response.ok) {
throw createError({
statusCode: response.status,
statusMessage: `Edge AI error: ${response.statusText}`,
})
}
return sendStream(event, response.body!)
})
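The two providers differ in more than URL and headers. Anthropic's Messages API takes the system prompt as a top-level system field instead of a message with role system, and it requires max_tokens. A sketch of a request builder that captures those differences follows; buildProviderRequest is a hypothetical helper, and the 2048 token limit simply mirrors the route above.

```typescript
type Role = 'user' | 'assistant' | 'system'
interface Msg { role: Role; content: string }
interface ProviderRequest {
  url: string
  headers: Record<string, string>
  body: Record<string, unknown>
}

// Hypothetical helper: build the upstream request for each provider
function buildProviderRequest(
  provider: 'openai' | 'anthropic',
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: Msg[],
): ProviderRequest {
  if (provider === 'anthropic') {
    // Lift system messages out of the list into the top-level field
    const system = messages
      .filter((m) => m.role === 'system')
      .map((m) => m.content)
      .join('\n')
    const body: Record<string, unknown> = {
      model,
      max_tokens: 2048,
      stream: true,
      messages: messages.filter((m) => m.role !== 'system'),
    }
    if (system) body.system = system
    return {
      url: `${baseUrl}/messages`,
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
      },
      body,
    }
  }
  // OpenAI-compatible APIs accept system messages inline
  return {
    url: `${baseUrl}/chat/completions`,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: { model, stream: true, messages },
  }
}
```

The streamed response formats also differ between providers, so passing the upstream body through unchanged means the client has to understand each provider's event shape.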
Wrangler Deployment Configuration
# wrangler.toml
name = "toolsku-ai-edge"
main = ".output/server/index.mjs"
compatibility_date = "2026-06-01"
compatibility_flags = ["nodejs_compat"]
[vars]
AI_BASE_URL = "https://api.openai.com/v1"
[ai]
binding = "AI"
[[r2_buckets]]
binding = "AI_CACHE"
bucket_name = "toolsku-ai-cache"
[observability]
enabled = true
Pattern 5: RAG Frontend Interaction Design
RAG (Retrieval-Augmented Generation) is one of the most practical patterns for AI applications. The frontend needs to handle the complete interaction chain: document upload, vector search, and result display.
RAG Query Composable
// composables/useRAG.ts
import { ref, computed } from 'vue'
import type { RAGQuery, RAGResult, RAGSource } from '#shared/types/ai'
interface DocumentChunk {
id: string
content: string
metadata: {
source: string
page: number
section: string
}
score: number
}
export function useRAG() {
const query = ref('')
const isSearching = ref(false)
const isGenerating = ref(false)
const searchResults = ref<DocumentChunk[]>([])
const ragAnswer = ref('')
const ragSources = ref<RAGSource[]>([])
const error = ref<string | null>(null)
const hasResults = computed(() => searchResults.value.length > 0)
const isProcessing = computed(() => isSearching.value || isGenerating.value)
async function searchDocuments(searchQuery: string, topK = 5) {
isSearching.value = true
error.value = null
searchResults.value = []
try {
const results = await $fetch<DocumentChunk[]>('/api/rag/search', {
method: 'POST',
body: { query: searchQuery, topK },
})
searchResults.value = results
} catch (err: any) {
error.value = err.data?.message || 'Search failed'
} finally {
isSearching.value = false
}
}
async function generateAnswer(searchQuery: string) {
isGenerating.value = true
ragAnswer.value = ''
ragSources.value = []
try {
const result = await $fetch<RAGResult>('/api/rag/generate', {
method: 'POST',
body: {
question: searchQuery,
topK: 5,
threshold: 0.7,
} satisfies RAGQuery,
})
ragAnswer.value = result.answer
ragSources.value = result.sources
} catch (err: any) {
error.value = err.data?.message || 'Generation failed'
} finally {
isGenerating.value = false
}
}
async function fullRAGPipeline(searchQuery: string) {
await searchDocuments(searchQuery)
if (searchResults.value.length > 0) {
await generateAnswer(searchQuery)
}
}
return {
query,
isSearching,
isGenerating,
isProcessing,
searchResults,
ragAnswer,
ragSources,
hasResults,
error,
searchDocuments,
generateAnswer,
fullRAGPipeline,
}
}
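The composable sends topK and threshold to the server but does not show how they might be applied. Assuming the server ranks chunks by cosine similarity between a query embedding and stored chunk embeddings (an assumption, since the article does not show the /api/rag routes), the selection step could be sketched as follows. The names cosineSimilarity and topKAboveThreshold are hypothetical.

```typescript
interface EmbeddedChunk { id: string; embedding: number[] }
interface ScoredChunk { id: string; score: number }

// Cosine similarity of two equal-length vectors; 0 when either has zero norm
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

// Keep chunks scoring at or above threshold, best first, at most topK of them
function topKAboveThreshold(
  query: number[],
  chunks: EmbeddedChunk[],
  topK = 5,
  threshold = 0.7,
): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}
```

Filtering by threshold before truncating to topK means a query with no sufficiently similar chunk returns an empty list, which the frontend can surface as "no relevant documents" instead of generating an answer from weak matches.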
RAG Interaction Component
<!-- components/RAGSearchPanel.vue -->
<script setup lang="ts">
import { useRAG } from '~/composables/useRAG'
const {
query,
isProcessing,
searchResults,
ragAnswer,
ragSources,
error,
fullRAGPipeline,
} = useRAG()
const showSources = ref(false)
async function handleSearch() {
if (!query.value.trim()) return
await fullRAGPipeline(query.value)
}
</script>
<template>
<div class="mx-auto max-w-3xl space-y-6">
<div class="flex gap-2">
<input
v-model="query"
type="text"
placeholder="Ask about your documents..."
class="flex-1 rounded-lg border border-gray-300 px-4 py-2.5 text-sm focus:border-blue-500 focus:outline-none focus:ring-1 focus:ring-blue-500"
@keydown.enter="handleSearch"
:disabled="isProcessing"
/>
<button
class="rounded-lg bg-blue-600 px-6 py-2.5 text-sm font-medium text-white hover:bg-blue-700 disabled:opacity-50"
@click="handleSearch"
:disabled="isProcessing || !query.trim()"
>
{{ isProcessing ? 'Searching...' : 'Search' }}
</button>
</div>
<div v-if="error" class="rounded-lg bg-red-50 p-4 text-sm text-red-600">
{{ error }}
</div>
<div v-if="ragAnswer" class="rounded-lg border border-gray-200 bg-white p-6 shadow-sm">
<h3 class="mb-3 text-sm font-semibold text-gray-500 uppercase tracking-wide">AI Answer</h3>
<div class="prose prose-sm max-w-none" v-html="ragAnswer" />
<button
class="mt-4 text-xs text-blue-600 hover:underline"
@click="showSources = !showSources"
>
{{ showSources ? 'Hide' : 'Show' }} Sources ({{ ragSources.length }})
</button>
</div>
<div v-if="showSources && ragSources.length" class="space-y-3">
<h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Sources</h3>
<div
v-for="(source, index) in ragSources"
:key="index"
class="rounded-lg border border-gray-200 bg-gray-50 p-4"
>
<div class="mb-2 flex items-center justify-between">
<span class="text-xs font-medium text-gray-500">
Relevance: {{ (source.score * 100).toFixed(1) }}%
</span>
</div>
<p class="text-sm text-gray-700 line-clamp-3">{{ source.content }}</p>
</div>
</div>
<div v-if="searchResults.length && !ragAnswer" class="space-y-3">
<h3 class="text-sm font-semibold text-gray-500 uppercase tracking-wide">Matching Chunks</h3>
<div
v-for="chunk in searchResults"
:key="chunk.id"
class="rounded-lg border border-gray-200 bg-white p-4"
>
<div class="mb-2 flex items-center gap-2">
<span class="rounded bg-blue-100 px-2 py-0.5 text-xs font-medium text-blue-700">
{{ chunk.metadata.source }}
</span>
<span class="text-xs text-gray-400">Page {{ chunk.metadata.page }}</span>
</div>
<p class="text-sm text-gray-700">{{ chunk.content }}</p>
</div>
</div>
</div>
</template>
Pattern 6: Conversation State with Pinia Persistence
One of the core challenges of AI chat applications is state management—conversation history, model selection, and user preferences all need persistence. Pinia + Nuxt 4's SSR-compatible solution makes this straightforward.
Conversation Store
// stores/conversation.ts
import { defineStore } from 'pinia'
import type { ChatMessage, AIModel } from '#shared/types/ai'
interface Conversation {
id: string
title: string
messages: ChatMessage[]
model: AIModel
createdAt: number
updatedAt: number
}
interface ConversationState {
conversations: Map<string, Conversation>
activeConversationId: string | null
preferences: {
defaultModel: AIModel
temperature: number
systemPrompt: string
streamByDefault: boolean
}
}
export const useConversationStore = defineStore('conversation', {
state: (): ConversationState => ({
conversations: new Map(),
activeConversationId: null,
preferences: {
defaultModel: 'gpt-4o-mini',
temperature: 0.7,
systemPrompt: 'You are a helpful assistant.',
streamByDefault: true,
},
}),
getters: {
activeConversation(state): Conversation | undefined {
if (!state.activeConversationId) return undefined
return state.conversations.get(state.activeConversationId)
},
conversationList(state): Conversation[] {
return Array.from(state.conversations.values())
.sort((a, b) => b.updatedAt - a.updatedAt)
},
messageCount(state): (id: string) => number {
return (id: string) => state.conversations.get(id)?.messages.length || 0
},
},
actions: {
createConversation(title?: string): string {
const id = crypto.randomUUID()
const conversation: Conversation = {
id,
title: title || `Chat ${this.conversations.size + 1}`,
messages: [],
model: this.preferences.defaultModel,
createdAt: Date.now(),
updatedAt: Date.now(),
}
this.conversations.set(id, conversation)
this.activeConversationId = id
this.persist()
return id
},
saveConversation(id: string, messages: ChatMessage[]) {
const conversation = this.conversations.get(id)
if (!conversation) return
conversation.messages = messages
conversation.updatedAt = Date.now()
if (messages.length > 0 && messages[0].role === 'user') {
conversation.title = messages[0].content.slice(0, 50)
}
this.persist()
},
deleteConversation(id: string) {
this.conversations.delete(id)
if (this.activeConversationId === id) {
const remaining = this.conversationList
this.activeConversationId = remaining.length > 0 ? remaining[0].id : null
}
this.persist()
},
setActiveConversation(id: string) {
this.activeConversationId = id
},
updatePreferences(prefs: Partial<ConversationState['preferences']>) {
this.preferences = { ...this.preferences, ...prefs }
this.persist()
},
persist() {
if (import.meta.client) {
const data = {
conversations: Object.fromEntries(this.conversations),
activeConversationId: this.activeConversationId,
preferences: this.preferences,
}
localStorage.setItem('toolsku-ai-conversations', JSON.stringify(data))
}
},
hydrate() {
if (import.meta.client) {
const stored = localStorage.getItem('toolsku-ai-conversations')
if (stored) {
try {
const data = JSON.parse(stored)
this.conversations = new Map(Object.entries(data.conversations))
this.activeConversationId = data.activeConversationId
this.preferences = data.preferences
} catch {
localStorage.removeItem('toolsku-ai-conversations')
}
}
}
},
},
})
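JSON.stringify turns a Map into an empty object, which is why persist converts with Object.fromEntries first. The hydrate action, however, trusts whatever it finds in localStorage. A sketch of a round trip that also validates the stored shape follows. StoredConversation is a trimmed-down, hypothetical version of the Conversation interface, and both function names are illustrative.

```typescript
interface StoredConversation { id: string; title: string; updatedAt: number }

function isStoredConversation(value: unknown): value is StoredConversation {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return typeof v.id === 'string' && typeof v.title === 'string' && typeof v.updatedAt === 'number'
}

// Map -> JSON string (a Map passed directly to JSON.stringify yields "{}")
function serializeConversations(map: Map<string, StoredConversation>): string {
  return JSON.stringify(Object.fromEntries(map))
}

// JSON string -> Map, dropping entries that do not match the expected shape
function deserializeConversations(raw: string | null): Map<string, StoredConversation> {
  if (!raw) return new Map()
  try {
    const parsed: unknown = JSON.parse(raw)
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return new Map()
    const valid = Object.entries(parsed as Record<string, unknown>).filter(
      (entry): entry is [string, StoredConversation] => isStoredConversation(entry[1]),
    )
    return new Map(valid)
  } catch {
    return new Map()
  }
}
```

Validating on read keeps a stale or hand-edited localStorage entry from putting malformed conversations into the store.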
SSR-Safe Initialization Plugin
// plugins/conversation-init.client.ts
import { useConversationStore } from '~/stores/conversation'
export default defineNuxtPlugin(() => {
const store = useConversationStore()
store.hydrate()
})
Pattern 7: Production Deployment & Performance Optimization
From development to production, Nuxt 4 full-stack AI applications need attention to performance, security, and observability.
Production Configuration
// nuxt.config.ts (production)
export default defineNuxtConfig({
future: {
compatibilityVersion: 4,
},
nitro: {
compressPublicAssets: true,
minify: true,
routeRules: {
'/api/ai/**': {
cors: false,
headers: {
'strict-transport-security': 'max-age=31536000; includeSubDomains',
'x-content-type-options': 'nosniff',
'x-frame-options': 'DENY',
},
},
'/api/chat/stream': {
headers: {
'cache-control': 'no-cache, no-store, must-revalidate',
'x-accel-buffering': 'no',
},
},
},
rollupConfig: {
external: ['sharp', 'canvas'],
},
},
app: {
head: {
meta: [
{ 'http-equiv': 'X-UA-Compatible', content: 'IE=edge' },
{ name: 'viewport', content: 'width=device-width, initial-scale=1' },
],
},
},
experimental: {
payloadExtraction: true,
renderJsonPayloads: true,
},
vite: {
build: {
rollupOptions: {
output: {
manualChunks: {
'ai-vendor': ['openai'],
'markdown': ['marked', 'highlight.js'],
},
},
},
},
},
})
Performance Monitoring Composable
// composables/useAIPerformance.ts
import { ref, computed } from 'vue'
interface PerformanceMetric {
name: string
startTime: number
endTime: number
duration: number
metadata?: Record<string, unknown>
}
export function useAIPerformance() {
const metrics = ref<PerformanceMetric[]>([])
const activeTimers = new Map<string, number>()
function startTimer(name: string) {
activeTimers.set(name, performance.now())
}
function endTimer(name: string, metadata?: Record<string, unknown>) {
const startTime = activeTimers.get(name)
if (startTime === undefined) return
const endTime = performance.now()
metrics.value.push({
name,
startTime,
endTime,
duration: endTime - startTime,
metadata,
})
activeTimers.delete(name)
}
const averageLatency = computed(() => {
const chatMetrics = metrics.value.filter((m) => m.name === 'ai-response')
if (chatMetrics.length === 0) return 0
return chatMetrics.reduce((sum, m) => sum + m.duration, 0) / chatMetrics.length
})
const p95Latency = computed(() => {
const chatMetrics = metrics.value
.filter((m) => m.name === 'ai-response')
.sort((a, b) => a.duration - b.duration)
if (chatMetrics.length === 0) return 0
const index = Math.ceil(chatMetrics.length * 0.95) - 1
return chatMetrics[index].duration
})
function getReport() {
// Use the same request set for the count and the error rate
const requests = metrics.value.filter((m) => m.name === 'ai-response')
return {
totalRequests: requests.length,
averageLatency: averageLatency.value,
p95Latency: p95Latency.value,
errorRate: requests.filter((m) => m.metadata?.error).length / Math.max(requests.length, 1),
}
}
return { metrics, startTimer, endTimer, averageLatency, p95Latency, getReport }
}
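The p95 figure in the composable uses the nearest-rank method: sort the samples, then read the value at index `ceil(n × 0.95) − 1`. A minimal standalone sketch of that calculation is shown here; the `percentile` helper and the sample durations are illustrative assumptions, not part of the composable.

```typescript
// Nearest-rank percentile over a list of durations (ms).
// `percentile` is a hypothetical helper, not part of useAIPerformance.
export function percentile(durations: number[], p: number): number {
  if (durations.length === 0) return 0
  if (p <= 0 || p > 1) throw new RangeError('p must be in (0, 1]')
  // Copy before sorting so the caller's array is left untouched
  const sorted = [...durations].sort((a, b) => a - b)
  const index = Math.ceil(sorted.length * p) - 1
  return sorted[index]
}

// Made-up latencies with one slow outlier
const samples = [120, 80, 300, 95, 110, 150, 90, 100, 105, 2000]
console.log(percentile(samples, 0.5)) // 105
console.log(percentile(samples, 0.95)) // 2000
```

With only ten samples, p95 is simply the slowest request, which is why percentile figures are more meaningful once a session has collected a larger number of measurements.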
Health Check Endpoint
// server/api/health.get.ts
import { defineEventHandler } from 'h3'
export default defineEventHandler(async () => {
const checks: Record<string, { status: 'ok' | 'error'; latency?: number; error?: string }> = {}
const aiStart = Date.now()
try {
const apiKey = process.env.OPENAI_API_KEY
if (!apiKey) throw new Error('API key not configured')
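const res = await fetch('https://api.openai.com/v1/models', {
headers: { Authorization: `Bearer ${apiKey}` },
signal: AbortSignal.timeout(5000),
})
// fetch only rejects on network errors, so treat non-2xx responses as failures too
if (!res.ok) throw new Error(`AI provider responded with HTTP ${res.status}`)
checks.ai = { status: 'ok', latency: Date.now() - aiStart }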
} catch (err: any) {
checks.ai = { status: 'error', error: err.message }
}
const overallStatus = Object.values(checks).every((c) => c.status === 'ok') ? 'ok' : 'degraded'
return {
status: overallStatus,
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || 'unknown',
checks,
}
})
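Load balancers and uptime monitors usually look at the HTTP status code rather than the response body. The sketch below derives both the overall status and a matching status code from the `checks` map; the `summarizeHealth` helper and the choice of 503 for a degraded service are assumptions for illustration, not something the endpoint above does.

```typescript
type CheckResult = { status: 'ok' | 'error'; latency?: number; error?: string }

// Hypothetical helper: reduces individual checks to an overall status,
// a suggested HTTP status code, and the names of the failing checks.
export function summarizeHealth(checks: Record<string, CheckResult>) {
  const failed = Object.entries(checks)
    .filter(([, check]) => check.status !== 'ok')
    .map(([name]) => name)
  const status: 'ok' | 'degraded' = failed.length === 0 ? 'ok' : 'degraded'
  // Assumption: report 503 so probes treat a degraded service as unhealthy
  return { status, httpStatus: status === 'ok' ? 200 : 503, failed }
}
```

Inside the handler, the result could be applied with h3's `setResponseStatus(event, summary.httpStatus)` before returning the JSON body.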
5 Common Pitfalls and Solutions
Pitfall 1: Accessing localStorage During SSR Causes Hydration Mismatch
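Problem: localStorage does not exist on the server, so persisted Pinia state is missing from the SSR output. When the client restores that state during hydration, the rendered HTML no longer matches and Vue reports a hydration mismatch.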
Solution:
// Only execute persistence logic on the client
if (import.meta.client) {
store.hydrate()
}
// Or use useCookie instead of localStorage
const savedData = useCookie('ai-conversations', {
maxAge: 60 * 60 * 24 * 30,
sameSite: 'lax',
})
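Even on the client, a persisted value can be missing or corrupt. A minimal sketch of a defensive read is shown below; the `readPersisted` helper and the `StorageLike` interface are hypothetical names introduced for illustration.

```typescript
// Minimal subset of the Web Storage API, so the helper can be tested
// without a browser.
interface StorageLike {
  getItem(key: string): string | null
}

// Hypothetical helper: returns the parsed value, or `fallback` when storage
// is unavailable (SSR), the key is missing, or the stored JSON is corrupt.
export function readPersisted<T>(
  storage: StorageLike | undefined,
  key: string,
  fallback: T,
): T {
  if (!storage) return fallback
  try {
    const raw = storage.getItem(key)
    return raw === null ? fallback : (JSON.parse(raw) as T)
  } catch {
    return fallback
  }
}
```

In a Nuxt app this might be called as `readPersisted(import.meta.client ? localStorage : undefined, 'ai-conversations', [])`, so the server and the first client render both start from the same fallback.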
Pitfall 2: Streaming Responses Buffered by Nginx/CDN
Problem: Intermediate layers such as Nginx or a CDN buffer SSE responses, so users receive the answer in one burst instead of token by token.
Solution:
# nginx.conf
location /api/chat/stream {
proxy_pass http://nuxt_backend;
proxy_buffering off;
proxy_cache off;
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding off;
}
Pitfall 3: Exposing API Key in Server Routes
Problem: AI API Key leaked in frontend code or error responses.
Solution:
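// server/api/chat.post.ts
export default defineEventHandler(async (event) => {
// Server-side private config; never include the key in a response
const { aiApiKey } = useRuntimeConfig(event)
try {
// Call the AI provider with aiApiKey here and return its result
} catch (err) {
// Log details on the server only; error responses must not leak them
console.error('AI request failed:', err)
throw createError({
statusCode: 500,
statusMessage: 'AI service error', // Don't expose err.message
})
}
})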
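A single generic 500 hides everything, including rate limits the client could reasonably react to. The sketch below maps upstream failures to a small set of client-safe errors; the `toPublicError` helper, the status mapping, and the messages are assumptions for illustration, and the upstream error is only assumed to carry a numeric `status` field.

```typescript
interface PublicError {
  statusCode: number
  statusMessage: string
}

// Hypothetical helper: converts any internal error into a payload that is
// safe to send to the browser. The original message is never copied over.
export function toPublicError(err: unknown): PublicError {
  const status =
    typeof err === 'object' && err !== null && 'status' in err
      ? Number((err as { status: unknown }).status)
      : 500
  if (status === 429) {
    return { statusCode: 429, statusMessage: 'AI service is busy, please retry later' }
  }
  if (status === 401 || status === 403) {
    // An upstream auth failure is a server misconfiguration; hide it from clients
    return { statusCode: 502, statusMessage: 'AI service unavailable' }
  }
  return { statusCode: 500, statusMessage: 'AI service error' }
}
```

In a server route this could be used as `throw createError(toPublicError(err))`, with the full error logged on the server beforehand.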
Pitfall 4: Edge Runtime Doesn't Support Node.js APIs
Problem: By default, Cloudflare Workers don't expose Node.js built-in modules such as fs and path, so code that imports them fails at build time or at runtime.
Solution:
// Use Nitro's auto-import detection
// server/api/ai/edge-chat.post.ts
// Avoid Node.js APIs, use Web standard APIs
// Wrong: import { readFile } from 'fs'
// Right: Use env variables or KV storage
export default defineEventHandler(async (event) => {
const config = useRuntimeConfig(event)
// Use config instead of reading files
})
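Many Node-only calls have Web-standard equivalents that run unchanged on the edge. As one illustration that is not taken from the original project, Base64 encoding via `Buffer` can be replaced with `TextEncoder` and `btoa`; the `toBase64` helper name is hypothetical.

```typescript
// Web-standard replacement for Buffer.from(text).toString('base64').
// TextEncoder and btoa are available in Workers, browsers, and modern Node.js.
export function toBase64(text: string): string {
  // Encode to UTF-8 first: btoa alone only accepts Latin-1 characters
  const bytes = new TextEncoder().encode(text)
  let binary = ''
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte)
  })
  return btoa(binary)
}

console.log(toBase64('hello')) // aGVsbG8=
```

Building the string byte by byte is fine for short values such as tokens or small payloads; large binary data is better handled as streams or `ArrayBuffer`s.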
Pitfall 5: Excessive Conversation History Exceeds Token Limits
Problem: Sending the full conversation history on every request eventually exceeds the model's context window.
Solution:
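// utils/message-trimmer.ts
import type { ChatMessage } from '~/shared/types/ai'
const MAX_CONTEXT_TOKENS = 4096
// Rough heuristic for English text, not a real tokenizer
const CHARS_PER_TOKEN = 4
const estimateTokens = (message: ChatMessage) => Math.ceil(message.content.length / CHARS_PER_TOKEN)
export function trimMessages(messages: ChatMessage[], maxTokens = MAX_CONTEXT_TOKENS): ChatMessage[] {
// Always keep a leading system prompt and charge it against the budget first
const system = messages[0]?.role === 'system' ? messages[0] : undefined
const history = system ? messages.slice(1) : messages
let totalTokens = system ? estimateTokens(system) : 0
const trimmed: ChatMessage[] = []
// Walk backwards so the most recent messages are kept
for (let i = history.length - 1; i >= 0; i--) {
const estimatedTokens = estimateTokens(history[i])
if (totalTokens + estimatedTokens > maxTokens) break
totalTokens += estimatedTokens
trimmed.unshift(history[i])
}
return system ? [system, ...trimmed] : trimmed
}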
10 Common Error Troubleshooting
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | Hydration mismatch | SSR/CSR data inconsistency | Ensure localStorage operations are in import.meta.client |
| 2 | 429 Too Many Requests | AI API rate limiting | Implement request queue and exponential backoff retry |
| 3 | fetch failed in Server Routes | Cannot access external API during SSR | Check server network and DNS configuration |
| 4 | CORS error | Cross-origin request blocked | Use Server Routes proxy instead of direct calls |
| 5 | Stream interrupted | Connection timeout or interruption | Implement resumption and auto-reconnect |
| 6 | context_length_exceeded | Conversation history too long | Use trimMessages to trim context |
| 7 | Invalid API Key | Environment variable not set | Check .env and runtimeConfig configuration |
| 8 | Worker exceeded CPU time limit | Edge runtime timeout | Optimize inference logic, use streaming responses |
| 9 | Module not found: fs | Edge environment doesn't support Node modules | Use Web standard APIs instead of Node.js APIs |
| 10 | Pinia store not initialized | Store not ready during SSR | Use callOnce or onNuxtReady to initialize |
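For error 2, exponential backoff means doubling the wait after each failed attempt, up to a cap. The following is a minimal sketch of that idea; `backoffDelay`, `retryWithBackoff`, and the default timings are hypothetical and not part of any Nuxt or OpenAI API.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
export function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs)
}

// Hypothetical wrapper: retries `fn` only when `shouldRetry` approves
// (for example on HTTP 429), waiting longer after each failure.
export async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      // Give up when retries are exhausted or the error is not retryable
      if (attempt >= maxRetries || !shouldRetry(err)) throw err
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)))
    }
  }
}
```

With the defaults, the waits are 500 ms, 1 s, and 2 s before the final attempt. Production setups often add random jitter to these delays so that many clients do not retry at the same instant.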
Advanced Optimization Techniques
1. AI Response Caching Strategy
// server/utils/ai-cache.ts
import { useStorage } from '#imports'
const cache = useStorage('redis')
interface CacheEntry<T> {
data: T
expiresAt: number
hitCount: number
}
export async function getCachedAIResponse<T>(
key: string,
generator: () => Promise<T>,
ttl = 3600
): Promise<T> {
const cached = await cache.getItem<CacheEntry<T>>(`ai:${key}`)
if (cached && cached.expiresAt > Date.now()) {
cached.hitCount++
await cache.setItem(`ai:${key}`, cached)
return cached.data
}
const data = await generator()
await cache.setItem(`ai:${key}`, {
data,
expiresAt: Date.now() + ttl * 1000,
hitCount: 0,
})
return data
}
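`getCachedAIResponse` leaves the choice of `key` to the caller. One possible approach, sketched below under the assumption that identical prompts should share a cache entry, is to hash the model name together with the normalized messages. The `buildCacheKey` helper and the `KeyMessage` type are hypothetical.

```typescript
interface KeyMessage {
  role: string
  content: string
}

// FNV-1a style 32-bit hash over UTF-16 code units: small, synchronous, and
// dependency-free. It is not collision-resistant; a cryptographic hash such
// as SHA-256 is safer when two prompts must never share an entry.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  // Left-pad to a fixed width of 8 hex digits
  return ('0000000' + hash.toString(16)).slice(-8)
}

// Hypothetical helper: the same model and messages always yield the same key.
export function buildCacheKey(model: string, messages: KeyMessage[]): string {
  // Trim content so insignificant whitespace does not cause cache misses
  const normalized = messages.map((m) => [m.role, m.content.trim()])
  return `${model}:${fnv1a(JSON.stringify(normalized))}`
}
```

Including the model name in the key keeps responses from different models apart, which matters once requests are routed between models.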
2. Multi-Model Intelligent Routing
// server/utils/model-router.ts
import type { AIModel } from '~/shared/types/ai'
interface ModelRoute {
model: AIModel
condition: (messages: any[]) => boolean
priority: number
}
const routes: ModelRoute[] = [
{
model: 'gpt-4o',
condition: (msgs) => msgs.length > 20 || msgs.some((m) => m.content.length > 2000),
priority: 10,
},
{
model: 'gpt-4o-mini',
condition: () => true,
priority: 1,
},
]
export function selectModel(messages: any[]): AIModel {
const matched = routes
.filter((r) => r.condition(messages))
.sort((a, b) => b.priority - a.priority)
return matched[0]?.model || 'gpt-4o-mini'
}
3. Request Deduplication and Batching
// server/utils/request-dedup.ts
const pendingRequests = new Map<string, Promise<any>>()
export async function deduplicatedFetch<T>(
key: string,
fetcher: () => Promise<T>
): Promise<T> {
const pending = pendingRequests.get(key)
if (pending) return pending as Promise<T>
const promise = fetcher().finally(() => {
pendingRequests.delete(key)
})
pendingRequests.set(key, promise)
return promise
}
Comparison: Nuxt 4 vs Next.js 15 vs SvelteKit
| Dimension | Nuxt 4 | Next.js 15 | SvelteKit |
|---|---|---|---|
| Full-Stack AI | Server Routes + Nitro | Route Handlers + Edge | Server Endpoints |
| SSR Streaming | Native support | App Router support | Supported but smaller ecosystem |
| Edge Deployment | Cloudflare/Vercel/Deno | Vercel Edge first | Cloudflare adapter |
| Type Safety | Shared Types across stack | Manual configuration needed | Built-in but different mechanism |
| State Management | Pinia (official) | Third-party required | Built-in Stores |
| Learning Curve | Vue developer friendly | React ecosystem | Svelte syntax unique |
| AI Ecosystem | Vercel AI SDK compatible | Vercel AI SDK native | Community adaptation |
| Streaming UI | Custom Composable | useChat/useCompletion | Community solutions |
| Bundle Size | Medium | Larger | Smallest |
| Developer Experience | Auto-import + HMR | Turbopack HMR | Vite HMR |
Selection Guide:
- Vue teams → Nuxt 4: Minimal learning curve, complete full-stack capabilities
- React teams → Next.js 15: Most mature AI SDK ecosystem
- Pursuing extreme performance → SvelteKit: Smallest bundle, but AI ecosystem needs improvement
Recommended Online Tools
When developing Vue3 + Nuxt 4 full-stack AI applications, these online tools can help boost your productivity:
- JSON Formatter - Format JSON data when debugging AI API responses
- Base64 Encode/Decode - Handle Base64 encoded data in AI APIs
- Code Formatter - Format Vue3/TypeScript code
Summary
Vue3 + Nuxt 4 is a production-ready stack for full-stack AI applications in 2026. The 7 production patterns cover the complete chain from API proxying to edge inference:
- Server Routes are the foundation of full-stack AI: no separate backend is needed, and the API key never leaves the server
- SSR + AI lets search engines index AI-generated content, so SEO and AI work together
- Streaming Chat is the core interaction of AI apps and must support interrupt and retry
- Edge Inference cuts network latency by running close to users, and Cloudflare Workers is a strong first choice
- RAG Frontend needs a complete search-generate-display interaction chain
- Pinia Persistence keeps conversation state across pages and sessions
- Production Deployment focuses on security, performance, and observability
Nuxt 4 gives Vue developers full-stack AI capabilities that rival Next.js for the first time—one codebase, one deployment, full-stack AI.
Related Posts:
- Vue3 AI Integration: 5 LLM Interaction Patterns
- Nuxt4 + AI Streaming SSR Optimization
- Vue3 Pinia State Management Deep Guide