Vue3 AI Integration: 5 LLM Interaction Patterns and Streaming Response Solutions in 2026
Still using fetch to wait for complete LLM responses before rendering? Users staring at blank screens for 10 seconds? In 2026, AI application UX standards have evolved—streaming output, real-time feedback, and intelligent interaction are now the baseline. This article walks you through 5 Vue3 LLM interaction patterns, each ready to copy into your project.
Background: Evolution of Frontend LLM Interaction
| Stage | Interaction Mode | User Experience | Technical Implementation |
| --- | --- | --- | --- |
| 1.0 | Request-Wait-Response | Long blank wait | HTTP fetch |
| 2.0 | Streaming Response | Character-by-character output, like typing | SSE / WebSocket |
| 3.0 | Function Calling | AI proactively invokes tools | Structured output + frontend routing |
| 4.0 | Multi-Model Routing | Select optimal model per scenario | Intelligent routing layer |
| 5.0 | Agent Autonomous Interaction | AI plans and executes autonomously | Multi-turn dialogue + tool chain |
Problem Analysis: Why Traditional Approaches Fall Short
Three major pain points with traditional frontend LLM integration:
- Waiting Anxiety: No feedback before the complete response arrives
- Timeout Crashes: LLM response times are unpredictable; long text generation may exceed 30 seconds
- Feature Disconnect: AI capabilities are decoupled from UI interaction, preventing intelligent user experiences
Pattern 1: SSE Streaming Response
Composable Implementation
// composables/useSSEChat.ts
import { ref, onUnmounted } from 'vue'

interface ChatMessage {
  role: 'user' | 'assistant' | 'system'
  content: string
  timestamp: number
}

interface UseSSEChatOptions {
  apiUrl: string
  model?: string
  onToken?: (token: string) => void
  onError?: (error: Error) => void
  onComplete?: (fullText: string) => void
}
export function useSSEChat(options: UseSSEChatOptions) {
  const messages = ref<ChatMessage[]>([])
  const currentResponse = ref('')
  const isLoading = ref(false)
  const error = ref<string | null>(null)
  let abortController: AbortController | null = null

  async function sendMessage(content: string) {
    isLoading.value = true
    error.value = null
    currentResponse.value = ''
    abortController = new AbortController()
    messages.value.push({
      role: 'user',
      content,
      timestamp: Date.now(),
    })
    try {
      const response = await fetch(options.apiUrl, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          // WARNING: VITE_* variables are inlined into the client bundle, so this key is
          // visible to every user. In production, drop this header and let the backend
          // proxy behind apiUrl attach the real credential.
          Authorization: `Bearer ${import.meta.env.VITE_AI_API_KEY}`,
        },
        body: JSON.stringify({
          model: options.model || 'gpt-4',
          // Strip the local-only timestamp field before sending the history.
          messages: messages.value.map(({ role, content: c }) => ({ role, content: c })),
          stream: true,
        }),
        signal: abortController.signal,
      })
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`)
      }
      const reader = response.body?.getReader()
      if (!reader) throw new Error('ReadableStream not available')
      const decoder = new TextDecoder()
      let buffer = ''
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split('\n')
        buffer = lines.pop() || ''
        for (const line of lines) {
          const trimmed = line.trim()
          if (!trimmed || trimmed === 'data: [DONE]') continue
          if (!trimmed.startsWith('data: ')) continue
          try {
            const json = JSON.parse(trimmed.slice(6))
            const token = json.choices?.[0]?.delta?.content || ''
            if (token) {
              currentResponse.value += token
              options.onToken?.(token)
            }
          } catch {
            // Ignore parse errors
          }
        }
      }
      messages.value.push({
        role: 'assistant',
        content: currentResponse.value,
        timestamp: Date.now(),
      })
      options.onComplete?.(currentResponse.value)
    } catch (e: any) {
      if (e.name !== 'AbortError') {
        error.value = e.message
        options.onError?.(e)
      }
    } finally {
      isLoading.value = false
      abortController = null
    }
  }
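  function stopGeneration() {
    // Nothing in flight: return early so the previous answer is not pushed a second time.
    if (!abortController) return
    abortController.abort()
    abortController = null
    // Keep whatever was streamed so far as the assistant's (partial) reply.
    if (currentResponse.value) {
      messages.value.push({
        role: 'assistant',
        content: currentResponse.value,
        timestamp: Date.now(),
      })
    }
    isLoading.value = false
  }

  function clearMessages() {
    messages.value = []
    currentResponse.value = ''
    error.value = null
  }

  // Cancel any in-flight stream when the owning component is destroyed.
  onUnmounted(() => {
    abortController?.abort()
  })

  return {
    messages,
    currentResponse,
    isLoading,
    error,
    sendMessage,
    stopGeneration,
    clearMessages,
  }
}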
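The line-splitting logic inside `sendMessage` is easier to test when it lives outside the fetch loop. The sketch below is one possible way to factor it out; `createSSEParser`, `push`, and `flush` are hypothetical names, not part of the composable above. It assumes an OpenAI-style stream in which each event is a single `data:` line, and it also accepts `data:` without the optional space and recovers a final line that arrives without a trailing newline.

```typescript
// Hypothetical helper: a small stateful parser for "data:" lines in an SSE stream.
interface SSEParser {
  /** Feed one decoded text chunk; returns the data payloads completed by it. */
  push(chunk: string): string[]
  /** Call once when the stream ends to recover a last line without a newline. */
  flush(): string[]
}

export function createSSEParser(): SSEParser {
  let buffer = ''

  function extract(lines: string[]): string[] {
    const payloads: string[] = []
    for (const raw of lines) {
      const line = raw.replace(/\r$/, '')
      // Skip blank separators, comments (": keep-alive"), and fields such as "event:".
      if (!line.startsWith('data:')) continue
      // The SSE format allows one optional space after the colon.
      const payload = line.slice(5).replace(/^ /, '')
      // "[DONE]" is the end-of-stream sentinel used by OpenAI-compatible APIs.
      if (payload === '[DONE]') continue
      payloads.push(payload)
    }
    return payloads
  }

  return {
    push(chunk) {
      buffer += chunk
      const lines = buffer.split('\n')
      // The last element is an incomplete line; keep it for the next chunk.
      buffer = lines.pop() ?? ''
      return extract(lines)
    },
    flush() {
      const rest = buffer
      buffer = ''
      return rest ? extract([rest]) : []
    },
  }
}
```

Inside the read loop, each decoded chunk would go through `push`, every returned payload through `JSON.parse`, and `flush` would run once after `done` is true.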
Component Usage
<!-- components/AIChat.vue -->
<template>
  <div class="ai-chat">
    <div class="messages">
      <div v-for="msg in messages" :key="msg.timestamp" :class="['message', msg.role]">
        <div class="content">{{ msg.content }}</div>
      </div>
      <div v-if="isLoading" class="message assistant streaming">
        <div class="content">{{ currentResponse }}<span class="cursor">▊</span></div>
      </div>
    </div>
    <div class="input-area">
      <input
        v-model="inputText"
        placeholder="Type a message..."
        @keydown.enter="handleSend"
        :disabled="isLoading"
      />
      <button v-if="!isLoading" @click="handleSend" :disabled="!inputText.trim()">Send</button>
      <button v-else @click="stopGeneration" class="stop">Stop</button>
    </div>
  </div>
</template>
<script setup lang="ts">
import { ref } from 'vue'
import { useSSEChat } from '../composables/useSSEChat'

const inputText = ref('')
const { messages, currentResponse, isLoading, sendMessage, stopGeneration } = useSSEChat({
  apiUrl: '/api/v1/chat/completions',
  model: 'gpt-4',
})

function handleSend() {
  const text = inputText.value.trim()
  if (!text) return
  inputText.value = ''
  sendMessage(text)
}
</script>
Pattern 2: Function Calling Frontend Integration
// composables/useFunctionCalling.ts
import { ref } from 'vue'

interface FunctionDefinition {
  name: string
  description: string
  parameters: Record<string, any>
  execute: (args: any) => Promise<any>
}

export function useFunctionCalling() {
  const functions = ref<Map<string, FunctionDefinition>>(new Map())
  const executionLog = ref<Array<{ name: string; args: any; result: any; timestamp: number }>>([])

  function registerFunction(fn: FunctionDefinition) {
    functions.value.set(fn.name, fn)
  }

  function getToolsSchema() {
    return Array.from(functions.value.values()).map(fn => ({
      type: 'function',
      function: {
        name: fn.name,
        description: fn.description,
        parameters: fn.parameters,
      },
    }))
  }

  async function handleToolCalls(toolCalls: any[]) {
    const results = []
    for (const call of toolCalls) {
      const fn = functions.value.get(call.function.name)
      if (!fn) {
        results.push({
          tool_call_id: call.id,
          role: 'tool',
          content: JSON.stringify({ error: `Unknown function: ${call.function.name}` }),
        })
        continue
      }
      try {
        const args = JSON.parse(call.function.arguments)
        const result = await fn.execute(args)
        executionLog.value.push({ name: call.function.name, args, result, timestamp: Date.now() })
        results.push({
          tool_call_id: call.id,
          role: 'tool',
          content: JSON.stringify(result),
        })
      } catch (e: any) {
        results.push({
          tool_call_id: call.id,
          role: 'tool',
          content: JSON.stringify({ error: e.message }),
        })
      }
    }
    return results
  }

  return { functions, executionLog, registerFunction, getToolsSchema, handleToolCalls }
}
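`handleToolCalls` expects complete tool calls, but in a streamed OpenAI-style response they arrive in pieces: each chunk's `delta.tool_calls` carries an `index` plus a fragment of the function name or of the JSON argument string. The following is a minimal sketch of how those fragments might be merged before being handed to `handleToolCalls`. `createToolCallAccumulator` is a hypothetical helper, and the delta shape is an assumption based on the OpenAI-compatible chat completions format; other providers may differ.

```typescript
// Assumed shape of one entry in `choices[0].delta.tool_calls` (OpenAI-compatible streams).
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}

// Hypothetical helper: merges streamed fragments into complete tool calls.
export function createToolCallAccumulator() {
  const calls: ToolCall[] = []

  return {
    add(deltas: ToolCallDelta[] | undefined): void {
      for (const d of deltas ?? []) {
        let call = calls[d.index]
        if (!call) {
          call = { id: '', type: 'function', function: { name: '', arguments: '' } }
          calls[d.index] = call
        }
        if (d.id) call.id = d.id
        // Name and arguments are string fragments that must be concatenated in order.
        if (d.function?.name) call.function.name += d.function.name
        if (d.function?.arguments) call.function.arguments += d.function.arguments
      }
    },
    result(): ToolCall[] {
      // Drop holes in case an index was skipped.
      return calls.filter(Boolean)
    },
  }
}
```

The argument string is only valid JSON once the stream has finished, which is why parsing is left to `handleToolCalls` rather than attempted per chunk.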
Pattern 3: Multi-Model Intelligent Routing
// composables/useModelRouter.ts
import { ref } from 'vue'

interface ModelConfig {
  id: string
  name: string
  maxTokens: number
  costPer1k: number
  latencyMs: number
  capabilities: string[]
}

// Context sizes, input prices (USD per 1K tokens), and latencies are illustrative;
// check each provider's current documentation before relying on them.
const MODEL_REGISTRY: ModelConfig[] = [
  { id: 'gpt-4', name: 'GPT-4', maxTokens: 8192, costPer1k: 0.03, latencyMs: 2000, capabilities: ['reasoning', 'code', 'writing'] },
  { id: 'gpt-4o-mini', name: 'GPT-4o Mini', maxTokens: 128000, costPer1k: 0.00015, latencyMs: 500, capabilities: ['chat', 'summary'] },
  { id: 'claude-3.5-sonnet', name: 'Claude 3.5 Sonnet', maxTokens: 200000, costPer1k: 0.003, latencyMs: 1500, capabilities: ['reasoning', 'code', 'analysis'] },
  { id: 'deepseek-v3', name: 'DeepSeek V3', maxTokens: 128000, costPer1k: 0.00027, latencyMs: 800, capabilities: ['code', 'math', 'reasoning'] },
]
export function useModelRouter() {
  // Look models up by id rather than array position, so reordering the registry cannot misroute.
  const byId = (id: string) => MODEL_REGISTRY.find(m => m.id === id) ?? MODEL_REGISTRY[0]
  const currentModel = ref<ModelConfig>(byId('gpt-4o-mini'))
  const routingLog = ref<Array<{ input: string; model: string; reason: string }>>([])

  function route(input: string, id: string, reason: string) {
    currentModel.value = byId(id)
    routingLog.value.push({ input: input.slice(0, 50), model: currentModel.value.id, reason })
    return currentModel.value
  }

  function selectModel(input: string, options?: { preferSpeed?: boolean; preferQuality?: boolean }) {
    const lower = input.toLowerCase()
    // Explicit preferences win over heuristics; otherwise a short prompt would override preferQuality.
    if (options?.preferSpeed) return route(input, 'gpt-4o-mini', 'Speed priority')
    if (options?.preferQuality) return route(input, 'gpt-4', 'Quality priority')
    // Check for code tasks before the length heuristic so "debug this code" reaches the code model.
    if (lower.includes('code') || lower.includes('debug')) return route(input, 'deepseek-v3', 'Code task')
    if (lower.length < 50) return route(input, 'gpt-4o-mini', 'Short input')
    if (lower.length > 500) return route(input, 'gpt-4', 'Long input')
    return route(input, 'claude-3.5-sonnet', 'Default reasoning')
  }

  return { currentModel, routingLog, selectModel, MODEL_REGISTRY }
}
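The registry carries `capabilities` and `costPer1k` fields that the keyword heuristics never read. As one possible alternative, the sketch below picks the cheapest model that advertises a required capability and estimates the cost of a request. `cheapestModelWith` and `estimateCost` are hypothetical helpers, and the sample entries simply repeat the illustrative prices from the registry above.

```typescript
// Minimal subset of ModelConfig needed for this sketch.
interface RoutableModel {
  id: string
  costPer1k: number
  capabilities: string[]
}

// Hypothetical helper: cheapest model that lists the requested capability.
export function cheapestModelWith(models: RoutableModel[], capability: string): RoutableModel | undefined {
  return models
    .filter(m => m.capabilities.includes(capability)) // filter() copies, so sort() leaves the input untouched
    .sort((a, b) => a.costPer1k - b.costPer1k)[0]
}

// Hypothetical helper: cost in the registry's currency for a given token count.
export function estimateCost(model: RoutableModel, tokens: number): number {
  return (tokens / 1000) * model.costPer1k
}

// Sample data mirroring the registry (illustrative values only).
const sampleModels: RoutableModel[] = [
  { id: 'gpt-4', costPer1k: 0.03, capabilities: ['reasoning', 'code', 'writing'] },
  { id: 'gpt-4o-mini', costPer1k: 0.00015, capabilities: ['chat', 'summary'] },
  { id: 'claude-3.5-sonnet', costPer1k: 0.003, capabilities: ['reasoning', 'code', 'analysis'] },
  { id: 'deepseek-v3', costPer1k: 0.00027, capabilities: ['code', 'math', 'reasoning'] },
]

const codeModel = cheapestModelWith(sampleModels, 'code')
```

A capability lookup like this could replace the `includes('code')` keyword check once the task type is known, for example from a UI mode switch.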
Pattern 4: AI Chat Component with Markdown Rendering
<!-- components/AIMarkdownRenderer.vue -->
<template>
  <div class="ai-markdown" v-html="renderedContent"></div>
</template>
<script setup lang="ts">
import { computed } from 'vue'
import { marked } from 'marked'
import DOMPurify from 'dompurify'

const props = defineProps<{ content: string }>()

const renderedContent = computed(() => {
  if (!props.content) return ''
  const html = marked.parse(props.content, { async: false }) as string
  // Model output is untrusted input: sanitize it before handing it to v-html.
  return DOMPurify.sanitize(html)
})
</script>
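When this renderer is fed from a token stream, the computed property re-parses the whole Markdown string on every token. One way to reduce that work is to coalesce tokens and update the bound content once per frame. The sketch below assumes a hypothetical `createTokenBatcher` helper with an injectable scheduler, so it can be driven by `requestAnimationFrame` in the browser and by a manual queue in tests.

```typescript
// A scheduler receives a callback and decides when to run it (e.g. next animation frame).
type Schedule = (run: () => void) => void

// Hypothetical helper: collects tokens and flushes them in one batch per scheduled tick.
export function createTokenBatcher(onFlush: (text: string) => void, schedule: Schedule) {
  let pending = ''
  let scheduled = false

  return {
    push(token: string): void {
      pending += token
      // A flush is already queued; this token will ride along with it.
      if (scheduled) return
      scheduled = true
      schedule(() => {
        scheduled = false
        const text = pending
        pending = ''
        if (text) onFlush(text)
      })
    },
  }
}
```

In a component this might be wired as `createTokenBatcher(t => { content.value += t }, cb => requestAnimationFrame(cb))`, with `push` passed as the `onToken` option of `useSSEChat`; `content` here is an assumed local ref.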
Pattern 5: Agent Autonomous Interaction
// composables/useAIAgent.ts
import { ref } from 'vue'
import { useSSEChat } from './useSSEChat'
import { useFunctionCalling } from './useFunctionCalling'

interface AgentStep {
  type: 'thinking' | 'tool_call' | 'tool_result' | 'response'
  content: string
  timestamp: number
}

export function useAIAgent(apiUrl: string) {
  const steps = ref<AgentStep[]>([])
  const isRunning = ref(false)
  const maxIterations = 5
  const chat = useSSEChat({ apiUrl })
  const fc = useFunctionCalling()

  fc.registerFunction({
    name: 'search_web',
    description: 'Search the internet for latest information',
    parameters: {
      type: 'object',
      properties: { query: { type: 'string', description: 'Search keywords' } },
      required: ['query'],
    },
    execute: async (args) => {
      const resp = await fetch(`/api/search?q=${encodeURIComponent(args.query)}`)
      return resp.json()
    },
  })

  fc.registerFunction({
    name: 'run_code',
    description: 'Execute code and return results',
    parameters: {
      type: 'object',
      properties: { code: { type: 'string', description: 'Code to execute' }, language: { type: 'string', description: 'Programming language' } },
      required: ['code'],
    },
    execute: async (args) => {
      const resp = await fetch('/api/execute', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(args),
      })
      return resp.json()
    },
  })
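  async function run(task: string) {
    isRunning.value = true
    steps.value = []
    let iteration = 0
    let currentInput = task
    try {
      while (iteration < maxIterations) {
        iteration++
        steps.value.push({ type: 'thinking', content: `Iteration ${iteration}...`, timestamp: Date.now() })
        await chat.sendMessage(currentInput)
        const lastAssistantMsg = chat.messages.value[chat.messages.value.length - 1]
        // sendMessage swallows errors, so a missing assistant reply means the request failed.
        if (!lastAssistantMsg || lastAssistantMsg.role !== 'assistant') break
        steps.value.push({ type: 'response', content: lastAssistantMsg.content, timestamp: Date.now() })
        // NOTE: useSSEChat only accumulates delta.content and never sends a tools schema,
        // so structured tool calls do not reach this point; the string match is a placeholder.
        const hasToolCall = lastAssistantMsg.content.includes('function_call') || lastAssistantMsg.content.includes('tool_call')
        if (!hasToolCall) break
        steps.value.push({ type: 'tool_call', content: 'Executing tool call...', timestamp: Date.now() })
        // Placeholder: a complete agent would call fc.handleToolCalls(...) here, assign the
        // results to currentInput, and continue the loop instead of stopping.
        break
      }
    } finally {
      // Reset the flag even if something above throws.
      isRunning.value = false
    }
  }

  return { steps, isRunning, run, fc }
}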
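The composable above stops after recording a tool-call step. For comparison, here is a minimal sketch of the full plan-act loop, decoupled from Vue and from the network so it can be tested with a fake model. `runAgentLoop`, `CallModel`, and `RunTools` are hypothetical names; the message shape assumes an OpenAI-style API in which an assistant reply may carry `tool_calls` and each tool result is sent back as a `role: 'tool'` message.

```typescript
interface AgentToolCall {
  id: string
  function: { name: string; arguments: string }
}

interface AgentMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | null
  tool_calls?: AgentToolCall[]
  tool_call_id?: string
}

// Injected dependencies: one model round-trip, and a tool executor such as handleToolCalls.
type CallModel = (messages: AgentMessage[]) => Promise<AgentMessage>
type RunTools = (calls: AgentToolCall[]) => Promise<AgentMessage[]>

export async function runAgentLoop(
  task: string,
  callModel: CallModel,
  runTools: RunTools,
  maxIterations = 5,
): Promise<string> {
  const messages: AgentMessage[] = [{ role: 'user', content: task }]
  for (let i = 0; i < maxIterations; i++) {
    const reply = await callModel(messages)
    messages.push(reply)
    // No tool calls means the model considers the task finished.
    if (!reply.tool_calls?.length) return reply.content ?? ''
    // Execute the requested tools and feed the results back for the next round.
    messages.push(...(await runTools(reply.tool_calls)))
  }
  throw new Error(`Agent did not finish within ${maxIterations} iterations`)
}
```

Throwing when the cap is hit is a design choice for this sketch; returning the last partial answer would be an equally reasonable policy.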
Pitfall Guide
| # | Pitfall | Symptom | Solution |
| --- | --- | --- | --- |
| 1 | SSE connection not properly closed | Data keeps arriving after the component unmounts, causing a memory leak | Call abortController.abort() in onUnmounted |
| 2 | Incomplete streaming parse buffer | Last line of data lost | Keep the buffer remainder and prepend it to the next chunk |
| 3 | Function Calling argument parse failure | JSON.parse(arguments) throws | Wrap in try-catch and fall back to a plain text response |
| 4 | Multi-model routing infinite loop | Model A recommends B, B recommends A | Set a maxIterations limit and force the default model when it is exceeded |
| 5 | Markdown XSS injection | v-html renders malicious scripts | Sanitize the rendered HTML with DOMPurify before passing it to v-html (current versions of marked no longer provide a sanitize option) |
Error Troubleshooting
| Error Message | Cause | Solution |
| --- | --- | --- |
| ReadableStream is not supported | Browser doesn't support the streaming API | Add the web-streams-polyfill polyfill, or fall back to polling |
| net::ERR_INCOMPLETE_CHUNKED_ENCODING | Stream cut off before completion, often by a proxy timeout or a malformed SSE response | Confirm Content-Type: text/event-stream and that each message ends with \n\n; check proxy timeout and buffering settings |
| AbortError: The user aborted a request | User manually cancelled | Normal behavior; check e.name === 'AbortError' |
| 429 Too Many Requests | API rate limit exceeded | Implement exponential backoff retry, add a request queue |
| JSON.parse: unexpected character | SSE data line format error | Check the data: prefix, filter empty lines and non-data lines |
| CORS policy: No 'Access-Control-Allow-Origin' | Cross-origin request rejected | Add CORS headers on the server, or use the Vite proxy |
| Cannot read property 'delta' of undefined | SSE response structure changed | Add optional chaining: json.choices?.[0]?.delta?.content |
| Maximum call stack size exceeded | Agent recursive calls too deep | Limit maxIterations, add a recursion depth check |
| TypeError: Failed to fetch | Network disconnected | Add network status detection, show an offline prompt, and reconnect automatically |
| TypeError: response.body is null | Response body is null | Check whether the API endpoint supports streaming, add a response.body null check |
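The table recommends exponential backoff for 429 responses. A minimal sketch of that idea follows; `backoffDelay` and `retryOn429` are hypothetical helpers, the 500 ms base and 8 s cap are arbitrary assumptions, and jitter plus `Retry-After` handling are left out for brevity. The `sleep` function is injectable so the logic can be tested without real timers.

```typescript
// Delay doubles with each attempt (500, 1000, 2000, ...) up to a ceiling.
export function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

interface HasStatus {
  status: number
}

// Re-issues the request while the server answers 429, waiting longer each time.
export async function retryOn429<T extends HasStatus>(
  request: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(r => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await request()
    // Return on any non-429 status, or when the retry budget is used up.
    if (res.status !== 429 || attempt >= maxRetries) return res
    await sleep(backoffDelay(attempt))
  }
}
```

Because `Response` has a numeric `status`, a call such as `retryOn429(() => fetch(url, init))` type-checks directly; the caller still has to inspect the final status, since the last 429 is returned rather than thrown.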
Advanced Optimization
1. Request Queue and Rate Limiting
// utils/rateLimiter.ts
export class RequestQueue {
  private queue: Array<() => Promise<void>> = []
  private running = 0
  private maxConcurrent: number
  private minInterval: number
  private nextStart = 0 // earliest timestamp at which the next request may start

  constructor(maxConcurrent = 3, minInterval = 1000) {
    this.maxConcurrent = maxConcurrent
    this.minInterval = minInterval
  }

  add<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push(async () => {
        try {
          // Reserve a start slot synchronously. Reading and updating the timestamp around
          // an await would let tasks dequeued together compute the same wait and fire at once.
          const now = Date.now()
          const startAt = Math.max(now, this.nextStart)
          this.nextStart = startAt + this.minInterval
          if (startAt > now) await new Promise(r => setTimeout(r, startAt - now))
          resolve(await fn())
        } catch (e) {
          reject(e)
        }
      })
      this.process()
    })
  }

  private process() {
    while (this.queue.length > 0 && this.running < this.maxConcurrent) {
      this.running++
      const task = this.queue.shift()!
      task().finally(() => {
        this.running--
        this.process()
      })
    }
  }
}
2. Response Caching
// composables/useCachedChat.ts
const responseCache = new Map<string, { content: string; timestamp: number }>()
const CACHE_TTL = 5 * 60 * 1000

export function getCachedResponse(input: string): string | null {
  const cached = responseCache.get(input)
  if (!cached) return null
  if (Date.now() - cached.timestamp > CACHE_TTL) {
    responseCache.delete(input)
    return null
  }
  return cached.content
}

export function setCachedResponse(input: string, content: string) {
  responseCache.set(input, { content, timestamp: Date.now() })
}
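The cache above is keyed by the raw input only and grows without bound. The sketch below shows one way both points might be addressed: a key that also includes the model and the conversation history, and a size cap with least-recently-used eviction built on `Map` insertion order. `cacheKey` and `BoundedCache` are hypothetical names, and the defaults of 100 entries and a 5 minute TTL are assumptions.

```typescript
interface CacheEntry {
  content: string
  timestamp: number
}

// The same question can deserve a different answer for another model or history.
export function cacheKey(model: string, messages: Array<{ role: string; content: string }>): string {
  return JSON.stringify([model, messages.map(m => [m.role, m.content])])
}

export class BoundedCache {
  private entries = new Map<string, CacheEntry>()
  private maxEntries: number
  private ttlMs: number

  constructor(maxEntries = 100, ttlMs = 5 * 60 * 1000) {
    this.maxEntries = maxEntries
    this.ttlMs = ttlMs
  }

  get(key: string, now = Date.now()): string | null {
    const hit = this.entries.get(key)
    if (!hit) return null
    if (now - hit.timestamp > this.ttlMs) {
      this.entries.delete(key)
      return null
    }
    // Re-insert so this key becomes the most recently used one.
    this.entries.delete(key)
    this.entries.set(key, hit)
    return hit.content
  }

  set(key: string, content: string, now = Date.now()): void {
    this.entries.delete(key)
    this.entries.set(key, { content, timestamp: now })
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }
}
```

Caching only makes sense for deterministic or low-temperature requests; for creative generation a cache hit would repeat the same text.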
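3. Virtual Scrolling for Long Message Lists
// Assumes `messages` is the ref returned by useSSEChat.
import { useVirtualList } from '@vueuse/core'

const { list, containerProps, wrapperProps } = useVirtualList(messages, {
  itemHeight: 80, // estimated row height in px
  overscan: 10, // extra rows rendered outside the viewport
})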
Comparison Analysis
| Interaction Pattern | Real-time | Complexity | Use Case | Perceived Latency |
| --- | --- | --- | --- | --- |
| SSE Streaming | ★★★★★ | Low | General chat, text generation | <100ms |
| Function Calling | ★★★★ | Medium | Tool invocation, data analysis | 200-500ms |
| Multi-Model Routing | ★★★ | Medium | Cost-sensitive, multi-scenario | 100-2000ms |
| Markdown Rendering | ★★★★★ | Low | Code display, rich text | <50ms |
| Agent Autonomous | ★★★ | High | Complex tasks, automation | 1-10s |

| Frontend AI Solution | Bundle Size | Streaming | SSR Compatible | Vue3 Integration |
| --- | --- | --- | --- | --- |
| Custom Composable | 0KB | ★★★★★ | ★★★★★ | ★★★★★ |
| Vercel AI SDK | 12KB | ★★★★ | ★★★★ | ★★★ |
| LangChain.js | 200KB+ | ★★★ | ★★★ | ★★ |
| OpenAI SDK | 50KB | ★★★ | ★★ | ★★ |
Summary: Vue3 with composables is a strong fit for frontend AI integration: SSE streaming relieves waiting anxiety, Function Calling links AI output to UI actions, multi-model routing balances cost against quality, and Agent mode lets the AI plan and execute multi-step tasks. The 5 patterns are composable rather than mutually exclusive; a production-grade AI app often needs streaming output, tool invocation, and intelligent routing at the same time. In 2026, frontend engineers are expected to build AI interactions as well as UI.