BFF and AI Gateway Architecture in 2026: Unified LLM Access Layer
Why an AI Gateway in the AI Era?
When a system integrates OpenAI, Claude, Gemini, Qwen, and DeepSeek, each provider brings its own API format, billing model, and rate-limiting rules. Without a unified access layer, business code ends up tightly coupled to individual LLM vendors.
Real case: a company migrating from OpenAI to Claude had to change 47 files over two weeks because the API formats differ. With an AI Gateway in place, the same migration can come down to a one-line configuration change.
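To make the one-line migration idea concrete, here is a minimal sketch of the adapter pattern behind a unified access layer. Every name in it (`ChatProvider`, `StubProvider`, `ProviderRegistry`, the `ai.provider` config key) is hypothetical, and the adapters return canned strings instead of calling real vendor APIs.

```java
import java.util.Map;

/** Hypothetical vendor-neutral contract; business code depends only on this. */
interface ChatProvider {
    String complete(String prompt);
}

/** Stub adapter. A real adapter would translate to the vendor's request format. */
final class StubProvider implements ChatProvider {
    private final String vendor;

    StubProvider(String vendor) {
        this.vendor = vendor;
    }

    @Override
    public String complete(String prompt) {
        return "[" + vendor + "] " + prompt;
    }
}

/** Picks the active provider from configuration (assumed key: ai.provider). */
final class ProviderRegistry {
    private final Map<String, ChatProvider> providers;

    ProviderRegistry(Map<String, ChatProvider> providers) {
        this.providers = Map.copyOf(providers);
    }

    ChatProvider resolve(Map<String, String> config) {
        String name = config.getOrDefault("ai.provider", "openai");
        ChatProvider provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("Unknown provider: " + name);
        }
        return provider;
    }
}
```

With this shape, switching vendors means changing the value of `ai.provider`; callers of `ChatProvider.complete` stay untouched.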
Three Evolutions of the BFF Pattern
Traditional BFF (2018)
- Aggregates backend APIs for different frontends
- Solves over-fetching problems
AI-Enhanced BFF (2024)
- The BFF layer adds AI capabilities: summarization, translation, content generation
- AI is just one more downstream service of the BFF
AI Gateway (2026)
- AI becomes the core, and the BFF is restructured around it
- Unified access to multiple LLMs, with routing, billing, and security managed in one place
- Business code talks only to the AI Gateway and never calls LLMs directly
AI Gateway Core Capabilities
┌──────────────────────────────────────────────────────┐
│                Business Service Layer                │
│     OrderService │ UserService │ ContentService      │
├──────────────────────────────────────────────────────┤
│                      AI Gateway                      │
│  ┌──────────┬──────────┬──────────┬───────────────┐  │
│  │ Routing  │ Rate     │ Cache    │ Fallback      │  │
│  │ Strategy │ Limiting │ Mgmt     │ Strategy      │  │
│  ├──────────┼──────────┼──────────┼───────────────┤  │
│  │ Prompt   │ Token    │ Audit    │ Security      │  │
│  │ Mgmt     │ Billing  │ Logging  │ Protection    │  │
│  ├──────────┴──────────┴──────────┴───────────────┤  │
│  │       Streaming Response Proxy (SSE/WS)        │  │
│  ├────────────────────────────────────────────────┤  │
│  │           Multi-Model Adapter Layer            │  │
│  └──┬─────────┬─────────┬─────────┬───────────────┘  │
├─────┼─────────┼─────────┼─────────┼──────────────────┤
│  OpenAI  │  Claude  │  Gemini  │   Qwen   │ DeepSeek │
└──────────────────────────────────────────────────────┘
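The diagram lists cache management among the gateway's capabilities. As a rough illustration only, the sketch below keeps an in-memory LRU cache keyed by model and prompt. The class name `ResponseCache` and the key layout are assumptions; a production gateway would more likely use a shared store and cache only deterministic requests.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory LRU cache for LLM responses. */
final class ResponseCache {
    private final Map<String, String> entries;

    ResponseCache(int maxEntries) {
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Model names contain no newline, so this separator keeps keys unambiguous.
    private static String key(String model, String prompt) {
        return model + "\n" + prompt;
    }

    synchronized Optional<String> get(String model, String prompt) {
        return Optional.ofNullable(entries.get(key(model, prompt)));
    }

    synchronized void put(String model, String prompt, String response) {
        entries.put(key(model, prompt), response);
    }
}
```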
Multi-Model Routing Strategy
Smart LLM Selection by Cost/Latency/Quality
import java.util.Map;

@Configuration
public class AiGatewayConfig {

    // One strategy per optimization goal, plus a fallback.
    @Bean
    public ModelRouter modelRouter() {
        return ModelRouter.builder()
                .addStrategy(new CostOptimizedStrategy())
                .addStrategy(new LatencyOptimizedStrategy())
                .addStrategy(new QualityOptimizedStrategy())
                .addStrategy(new FallbackStrategy())
                .build();
    }
}

public class CostOptimizedStrategy implements RoutingStrategy {

    // ModelPricing(input, output): the values read as USD per 1K tokens.
    // Vendor prices change often, so treat them as illustrative.
    private static final Map<String, ModelPricing> PRICING = Map.of(
            "gpt-4o", new ModelPricing(0.005, 0.015),
            "gpt-4o-mini", new ModelPricing(0.00015, 0.0006),
            "claude-3.5-sonnet", new ModelPricing(0.003, 0.015),
            "deepseek-v3", new ModelPricing(0.00027, 0.0011)
    );

    // Maps each task type to a model; unknown task types default to gpt-4o.
    @Override
    public ModelSelection select(RoutingContext context) {
        String taskType = context.getTaskType();
        int estimatedTokens = context.getEstimatedTokens();
        return switch (taskType) {
            case "simple_qa" -> selectModel("gpt-4o-mini", estimatedTokens);
            case "code_review" -> selectModel("claude-3.5-sonnet", estimatedTokens);
            case "creative" -> selectModel("gpt-4o", estimatedTokens);
            case "chinese_nlp" -> selectModel("deepseek-v3", estimatedTokens);
            default -> selectModel("gpt-4o", estimatedTokens);
        };
    }

    // selectModel(model, estimatedTokens) is not shown in the original; it
    // presumably looks up PRICING and wraps the choice in a ModelSelection.
}
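The router above registers a `FallbackStrategy` whose body is not shown. One plausible reading is an ordered fallback chain: try the preferred model, and on failure move to the next. The sketch below illustrates that idea under stated assumptions; `FallbackChain` and the `invoke` callback are hypothetical names, not the article's implementation.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical ordered fallback: first successful model wins. */
final class FallbackChain {
    private final List<String> modelsInPriorityOrder;

    FallbackChain(List<String> modelsInPriorityOrder) {
        if (modelsInPriorityOrder.isEmpty()) {
            throw new IllegalArgumentException("At least one model is required");
        }
        this.modelsInPriorityOrder = List.copyOf(modelsInPriorityOrder);
    }

    /**
     * Calls invoke(model, prompt) for each model in order and returns the
     * first response that does not throw.
     */
    String call(String prompt, BiFunction<String, String, String> invoke) {
        RuntimeException last = null;
        for (String model : modelsInPriorityOrder) {
            try {
                return invoke.apply(model, prompt);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next model
            }
        }
        throw new IllegalStateException("All models failed", last);
    }
}
```

A real gateway would probably distinguish retryable failures (timeouts, 429, 5xx) from errors that will fail on every model, such as an invalid request.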
Prompt Template Management and Version Control
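import java.util.Map;

@Service
public class PromptTemplateService {

    private final PromptTemplateRepository templateRepo;

    public PromptTemplateService(PromptTemplateRepository templateRepo) {
        this.templateRepo = templateRepo;
    }

    // Renders the latest version of a template by substituting {{name}} placeholders.
    public PromptRenderResult render(String templateId, Map<String, String> variables) {
        PromptTemplate template = templateRepo.findLatestVersion(templateId);
        String renderedPrompt = template.getContent();
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            // Note: String.replace throws NullPointerException if the value is null.
            renderedPrompt = renderedPrompt.replace("{{" + entry.getKey() + "}}", entry.getValue());
        }
        // Recording the version ties each rendered prompt to a template revision.
        return PromptRenderResult.builder()
                .templateId(templateId)
                .version(template.getVersion())
                .renderedPrompt(renderedPrompt)
                .estimatedTokens(estimateTokens(renderedPrompt)) // estimateTokens is not shown in the original
                .build();
    }
}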
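The loop-based `replace` above leaves any placeholder without a matching variable in the prompt, and a value that itself contains `{{...}}` may be substituted again on a later iteration. A stricter variant is sketched below as one possible alternative: a single regex pass that fails fast on missing variables. `StrictTemplateRenderer` is a hypothetical name, and restricting placeholder names to word characters is an assumption.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-pass renderer for {{name}} placeholders. */
final class StrictTemplateRenderer {
    // Assumes placeholder names consist of letters, digits, and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = variables.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing template variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because the template is scanned once, substituted values are never re-expanded, which also closes one small avenue for user input to smuggle in extra placeholders.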
Token Billing and Usage Tracking
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.List;
import java.util.stream.Collectors;

@Service
public class TokenBillingService {

    private final UsageRepository usageRepo;

    public TokenBillingService(UsageRepository usageRepo) {
        this.usageRepo = usageRepo;
    }

    // Persists one usage record per LLM call, including its computed cost.
    public UsageRecord recordUsage(UsageRequest request) {
        // calculateCost is not shown in the original.
        BigDecimal cost = calculateCost(
                request.getModel(),
                request.getInputTokens(),
                request.getOutputTokens()
        );
        UsageRecord record = UsageRecord.builder()
                .tenantId(request.getTenantId())
                .model(request.getModel())
                .inputTokens(request.getInputTokens())
                .outputTokens(request.getOutputTokens())
                .cost(cost)
                .promptTemplateId(request.getPromptTemplateId())
                .latencyMs(request.getLatencyMs())
                .build();
        return usageRepo.save(record);
    }

    // Aggregates a tenant's usage for one month: totals, per-model tokens, mean latency.
    public BillingSummary getMonthlySummary(String tenantId, YearMonth month) {
        List<UsageRecord> records = usageRepo.findByTenantIdAndMonth(tenantId, month);
        return BillingSummary.builder()
                .tenantId(tenantId)
                .month(month)
                .totalTokens(records.stream()
                        .mapToLong(r -> r.getInputTokens() + r.getOutputTokens())
                        .sum())
                .totalCost(records.stream()
                        .map(UsageRecord::getCost)
                        .reduce(BigDecimal.ZERO, BigDecimal::add))
                .byModel(records.stream().collect(Collectors.groupingBy(
                        UsageRecord::getModel,
                        Collectors.summingLong(r -> r.getInputTokens() + r.getOutputTokens()))))
                .avgLatencyMs(records.stream()
                        .mapToLong(UsageRecord::getLatencyMs)
                        .average()
                        .orElse(0))
                .build();
    }
}
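`recordUsage` relies on a `calculateCost` method that the article does not show. The sketch below is one way it could work, assuming prices are quoted in USD per 1,000 tokens (which is how the pricing table earlier in the article reads). `ModelPrice` and `CostCalculator` are hypothetical names, and the six-decimal rounding is an arbitrary choice.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

/** Hypothetical price entry; assumed unit is USD per 1,000 tokens. */
final class ModelPrice {
    final BigDecimal inputPer1k;
    final BigDecimal outputPer1k;

    ModelPrice(String inputPer1k, String outputPer1k) {
        // String constructors avoid binary floating-point rounding in prices.
        this.inputPer1k = new BigDecimal(inputPer1k);
        this.outputPer1k = new BigDecimal(outputPer1k);
    }
}

final class CostCalculator {
    private static final BigDecimal TOKENS_PER_UNIT = BigDecimal.valueOf(1000);
    private final Map<String, ModelPrice> prices;

    CostCalculator(Map<String, ModelPrice> prices) {
        this.prices = Map.copyOf(prices);
    }

    /** cost = (inputTokens * inputPrice + outputTokens * outputPrice) / 1000 */
    BigDecimal calculateCost(String model, long inputTokens, long outputTokens) {
        ModelPrice price = prices.get(model);
        if (price == null) {
            throw new IllegalArgumentException("No pricing for model: " + model);
        }
        BigDecimal input = price.inputPer1k.multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = price.outputPer1k.multiply(BigDecimal.valueOf(outputTokens));
        return input.add(output).divide(TOKENS_PER_UNIT, 6, RoundingMode.HALF_UP);
    }
}
```

For example, at 0.005 (input) and 0.015 (output) per 1,000 tokens, a call with 1,000 input tokens and 500 output tokens costs (1000 × 0.005 + 500 × 0.015) / 1000 = 0.0125 USD.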
Streaming Response Proxy: SSE Passthrough
@RestController
@RequestMapping("/api/ai")
public class StreamingAiController {

    private final AiGatewayService gatewayService;

    public StreamingAiController(AiGatewayService gatewayService) {
        this.gatewayService = gatewayService;
    }

    // Relays each upstream chunk as a "delta" event, then emits a final "done" event.
    @PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> streamChat(@RequestBody ChatRequest request) {
        return gatewayService.streamChat(request)
                .map(chunk -> ServerSentEvent.<String>builder()
                        .id(chunk.getId())
                        .event("delta")
                        .data(chunk.getContent())
                        .build())
                // concatWith runs only after the upstream completes normally.
                .concatWith(Flux.just(
                        ServerSentEvent.<String>builder()
                                .event("done")
                                .data("[DONE]")
                                .build()
                ));
    }
}
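To show roughly what a client receives from this endpoint, the sketch below formats events in the Server-Sent Events text format: optional `id:` and `event:` fields, one `data:` line per payload line, and a blank line to end the event. `SseFrames` is a hypothetical helper for illustration; Spring's own encoder may differ in whitespace details.

```java
/** Hypothetical helper that renders one SSE event as wire text. */
final class SseFrames {
    static String frame(String id, String event, String data) {
        StringBuilder sb = new StringBuilder();
        if (id != null) {
            sb.append("id: ").append(id).append('\n');
        }
        if (event != null) {
            sb.append("event: ").append(event).append('\n');
        }
        // Each line of a multi-line payload needs its own "data:" field.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the event.
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(frame("1", "delta", "Hello"));
        System.out.print(frame(null, "done", "[DONE]"));
    }
}
```

A streaming client reads `delta` events until it sees the `done` event, then closes the connection.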
Spring Cloud Gateway + AI Extension
spring:
  cloud:
    gateway:
      routes:
        - id: openai-route
          uri: https://api.openai.com
          predicates:
            - Path=/api/ai/openai/**
          filters:
            # "AiGateway" resolves to a custom AiGatewayGatewayFilterFactory bean
            # (not shown); the args below are that filter's own settings.
            - name: AiGateway
              args:
                provider: openai
                model: gpt-4o
                rateLimit: 100/s
                timeout: 30s
        - id: claude-route
          uri: https://api.anthropic.com
          predicates:
            - Path=/api/ai/claude/**
          filters:
            - name: AiGateway
              args:
                provider: anthropic
                model: claude-3.5-sonnet
                rateLimit: 50/s
                timeout: 60s
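@Component
public class AiGatewayFilter implements GlobalFilter, Ordered {

    // rateLimiter, auditService, billingService, and fallbackService are injected
    // collaborators whose declarations the original omits.
    // As a GlobalFilter, this class applies to every route, independently of the
    // per-route "AiGateway" filter named in the YAML above.

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // May be null when the header is absent; the rate limiter must handle that case.
        String provider = exchange.getRequest().getHeaders().getFirst("X-AI-Provider");
        if (!rateLimiter.tryAcquire(provider)) {
            exchange.getResponse().setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
            return exchange.getResponse().setComplete();
        }
        auditService.log(exchange.getRequest());
        return chain.filter(exchange)
                // Bill only after the downstream call completes without error.
                .doOnSuccess(v -> billingService.record(exchange))
                .onErrorResume(e -> fallbackService.handle(exchange, e));
    }

    @Override
    public int getOrder() {
        // Lower values run earlier in the filter chain.
        return -1;
    }
}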
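The filter calls `rateLimiter.tryAcquire(provider)` without showing the limiter. A token bucket per provider is one common way to implement such a call; the sketch below is an in-memory version under that assumption. `ProviderRateLimiter`, its constructor parameters, and the injectable clock are all hypothetical, and a multi-instance gateway would need a shared store instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-provider token bucket rate limiter. */
final class ProviderRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final double permitsPerSecond;
    private final double burst;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    ProviderRateLimiter(double permitsPerSecond, double burst, LongSupplier nanoClock) {
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
        this.nanoClock = nanoClock;
    }

    boolean tryAcquire(String provider) {
        // ConcurrentHashMap rejects null keys, so map a missing provider to one bucket.
        String key = provider == null ? "unknown" : provider;
        Bucket bucket = buckets.computeIfAbsent(key, k -> new Bucket(burst, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            double elapsedSeconds = Math.max(0, now - bucket.lastRefillNanos) / 1_000_000_000.0;
            // Refill proportionally to elapsed time, capped at the burst size.
            bucket.tokens = Math.min(burst, bucket.tokens + elapsedSeconds * permitsPerSecond);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

Under these assumptions, `new ProviderRateLimiter(100, 100, System::nanoTime)` would roughly correspond to the `100/s` setting in the route configuration.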
Security: Prompt Injection Protection and Output Filtering
import java.util.List;
import java.util.regex.Pattern;

@Service
public class AiSecurityService {

    // Known injection phrasings. Regex matching catches only these wordings,
    // so treat it as a first line of defense rather than complete protection.
    private static final List<Pattern> INJECTION_PATTERNS = List.of(
            Pattern.compile("(?i)ignore\\s+(all\\s+)?previous\\s+instructions"),
            Pattern.compile("(?i)system\\s*:\\s*you\\s+are"),
            Pattern.compile("(?i)forget\\s+everything"),
            Pattern.compile("(?i)pretend\\s+you\\s+are")
    );

    private static final List<Pattern> SENSITIVE_PATTERNS = List.of(
            Pattern.compile("\\b\\d{16}\\b"),            // 16-digit number, e.g. a bank card
            Pattern.compile("\\b\\d{17}[\\dXx]\\b"),     // 18-character ID number (17 digits + check character)
            Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+")  // email address
    );

    // Blocks the prompt if any injection pattern matches.
    public SecurityCheckResult checkInput(String prompt) {
        for (Pattern pattern : INJECTION_PATTERNS) {
            if (pattern.matcher(prompt).find()) {
                return SecurityCheckResult.blocked("Suspected prompt injection attack");
            }
        }
        return SecurityCheckResult.passed();
    }

    // Replaces every sensitive match in the model output with [REDACTED].
    public String sanitizeOutput(String output) {
        String sanitized = output;
        for (Pattern pattern : SENSITIVE_PATTERNS) {
            sanitized = pattern.matcher(sanitized).replaceAll("[REDACTED]");
        }
        return sanitized;
    }
}
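The 16-digit pattern above redacts any 16-digit number, including order numbers or timestamps that are not card numbers. One optional refinement, sketched here as an assumption rather than part of the original design, is to redact only matches that pass the Luhn checksum used by payment card numbers. `LuhnRedactor` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical redactor that masks only Luhn-valid 16-digit numbers. */
final class LuhnRedactor {
    private static final Pattern SIXTEEN_DIGITS = Pattern.compile("\\b\\d{16}\\b");

    /** Luhn check: double every second digit from the right, sum, test mod 10. */
    static boolean isLuhnValid(String digits) {
        if (digits.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (d < 0 || d > 9) {
                return false;
            }
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    static String redactCardNumbers(String text) {
        return SIXTEEN_DIGITS.matcher(text).replaceAll(match ->
                isLuhnValid(match.group())
                        ? Matcher.quoteReplacement("[REDACTED]")
                        : match.group());
    }
}
```

The trade-off is that a mistyped card number fails the checksum and passes through unmasked, so stricter environments may prefer the original redact-everything behavior.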
Summary
- An AI Gateway is core infrastructure for the AI era: it gives unified access to multiple LLMs and keeps business code decoupled from any single vendor
- Multi-model routing can reduce costs, by 40% in the estimate given here: the gateway selects the most suitable model for each task type
- Security is the baseline: prompt injection protection, output filtering, and sensitive data masking are essential
- Spring Cloud Gateway plus an AI extension is a practical way to implement this: control is unified at the gateway layer and transparent to the business layer
An AI Gateway is not optional architecture; it is a prerequisite in the AI era. The earlier you build it, the earlier you escape vendor lock-in.