BFF and AI Gateway Architecture: Designing a Unified LLM Access Layer for 2026

Why Do We Need an AI Gateway in the AI Era?

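When a system integrates OpenAI, Claude, Gemini, Qwen, DeepSeek, and others at the same time, it must cope with a different API format, billing scheme, and rate-limiting policy for each LLM. Without a unified access layer, business code ends up tightly coupled to individual LLM vendors.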

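A real-world case: when one company migrated from OpenAI to Claude, differences in API format forced changes to 47 files and took two weeks. With an AI Gateway in place, the same migration ideally comes down to a one-line configuration change.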

Three Stages in the Evolution of the BFF Pattern

Traditional BFF (2018)
  Aggregates backend APIs for each type of frontend
  Solves the over-fetching problem

AI-Enhanced BFF (2024)
  Adds AI features such as summarization, translation, and content generation to the BFF layer
  AI is just one of the BFF's downstream services

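AI Gateway (2026)
  AI becomes the core, and the BFF is rebuilt around it
  Manages unified access, routing, billing, and security across multiple LLMs
  Business code talks only to the AI Gateway and never calls an LLM directly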
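To make "business code never calls an LLM directly" concrete, here is a minimal sketch of an adapter layer. Everything in it is hypothetical: the `UnifiedChatRequest` record, the `ProviderAdapter` interface, and a registry keyed by a provider name that would come from configuration. The adapters only build request bodies as plain `Map`s, in the shapes of the OpenAI Chat Completions API (system prompt as the first message, with role `system`) and the Anthropic Messages API (system prompt as a top-level `system` field, `max_tokens` required); HTTP calls, authentication, and response parsing are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vendor-neutral request that business code builds
record UnifiedChatRequest(String model, String system, String user, int maxTokens) {}

// Hypothetical adapter interface: one implementation per LLM vendor
interface ProviderAdapter {
    Map<String, Object> toPayload(UnifiedChatRequest req);
}

// OpenAI Chat Completions shape: the system prompt is the first message
class OpenAiAdapter implements ProviderAdapter {
    @Override
    public Map<String, Object> toPayload(UnifiedChatRequest req) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", req.model());
        body.put("messages", List.of(
            Map.of("role", "system", "content", req.system()),
            Map.of("role", "user", "content", req.user())));
        // Output-length limit omitted: it is optional in this API
        return body;
    }
}

// Anthropic Messages shape: system is a top-level field and max_tokens is required
class AnthropicAdapter implements ProviderAdapter {
    @Override
    public Map<String, Object> toPayload(UnifiedChatRequest req) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", req.model());
        body.put("system", req.system());
        body.put("messages", List.of(Map.of("role", "user", "content", req.user())));
        body.put("max_tokens", req.maxTokens());
        return body;
    }
}

class AdapterRegistry {
    private static final Map<String, ProviderAdapter> ADAPTERS = Map.of(
        "openai", new OpenAiAdapter(),
        "anthropic", new AnthropicAdapter());

    // The provider name comes from configuration, so switching vendors is a config change
    static ProviderAdapter forProvider(String provider) {
        ProviderAdapter adapter = ADAPTERS.get(provider);
        if (adapter == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return adapter;
    }
}
```

Business code builds one `UnifiedChatRequest` and calls `AdapterRegistry.forProvider(provider)`; moving from one vendor to another changes the configured provider name, not the call sites.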

Core Capabilities of an AI Gateway

┌────────────────────────────────────────────────────────────┐
│                   Business Service Layer                   │
│        OrderService │ UserService │ ContentService         │
├────────────────────────────────────────────────────────────┤
│                         AI Gateway                         │
│  ┌───────────┬───────────┬───────────┬──────────────────┐  │
│  │ Routing   │ Rate      │ Response  │ Fallback         │  │
│  │ strategy  │ limiting  │ caching   │ strategy         │  │
│  ├───────────┼───────────┼───────────┼──────────────────┤  │
│  │ Prompt    │ Token     │ Audit     │ Security         │  │
│  │ templates │ billing   │ logging   │ protection       │  │
│  ├───────────┴───────────┴───────────┴──────────────────┤  │
│  │         Streaming response proxy (SSE/WS)            │  │
│  ├──────────────────────────────────────────────────────┤  │
│  │         Multi-model adapter layer                    │  │
│  └────┬──────────┬──────────┬──────────┬──────────┬─────┘  │
├───────┼──────────┼──────────┼──────────┼──────────┼────────┤
│     OpenAI     Claude     Gemini      Qwen     DeepSeek    │
└────────────────────────────────────────────────────────────┘
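The fallback strategy in the diagram can be sketched as a priority-ordered chain: try each provider in turn and return the first successful response. This is a minimal sketch assuming hypothetical `LlmProvider` and `FallbackResult` records, with each provider call modeled as a plain `Function<String, String>` that throws on failure. A production version would fall back only on retryable errors such as timeouts, HTTP 429, or 5xx responses, rather than on every exception.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical provider handle: a name plus a call that throws on failure
record LlmProvider(String name, Function<String, String> call) {}

// Which provider answered, and what it returned
record FallbackResult(String provider, String text) {}

class FallbackChain {
    private final List<LlmProvider> providers;

    FallbackChain(List<LlmProvider> providers) {
        this.providers = List.copyOf(providers); // priority order: first entry is preferred
    }

    FallbackResult complete(String prompt) {
        RuntimeException lastError = null;
        for (LlmProvider provider : providers) {
            try {
                return new FallbackResult(provider.name(), provider.call().apply(prompt));
            } catch (RuntimeException e) {
                lastError = e; // remember the failure and try the next provider
            }
        }
        throw new IllegalStateException("All providers failed", lastError);
    }
}
```

Returning the provider name along with the text lets the gateway log which model actually served each request, which matters for billing and auditing.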

Multi-Model Routing Strategies

Smart LLM selection by cost, latency, and quality

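import java.util.Map;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// ModelRouter, RoutingStrategy, RoutingContext, ModelSelection and the strategy
// classes are the application's own types, not part of Spring
@Configuration
public class AiGatewayConfig {

    @Bean
    public ModelRouter modelRouter() {
        // Strategies are registered in order; FallbackStrategy covers provider failures
        return ModelRouter.builder()
            .addStrategy(new CostOptimizedStrategy())
            .addStrategy(new LatencyOptimizedStrategy())
            .addStrategy(new QualityOptimizedStrategy())
            .addStrategy(new FallbackStrategy())
            .build();
    }
}

// Price per 1K tokens in USD (input, output)
record ModelPricing(double inputPer1K, double outputPer1K) {}

public class CostOptimizedStrategy implements RoutingStrategy {

    // Keys are gateway-level model aliases (vendor API model IDs differ, e.g. Anthropic
    // uses dated IDs). Prices are illustrative and change often; load them from config.
    private static final Map<String, ModelPricing> PRICING = Map.of(
        "gpt-4o",            new ModelPricing(0.005, 0.015),
        "gpt-4o-mini",       new ModelPricing(0.00015, 0.0006),
        "claude-3.5-sonnet", new ModelPricing(0.003, 0.015),
        "deepseek-v3",       new ModelPricing(0.00027, 0.0011)
    );

    @Override
    public ModelSelection select(RoutingContext context) {
        String taskType = context.getTaskType();
        int estimatedTokens = context.getEstimatedTokens();

        // Map each task type to the cheapest model judged good enough for it
        return switch (taskType) {
            case "simple_qa"   -> selectModel("gpt-4o-mini", estimatedTokens);
            case "code_review" -> selectModel("claude-3.5-sonnet", estimatedTokens);
            case "creative"    -> selectModel("gpt-4o", estimatedTokens);
            case "chinese_nlp" -> selectModel("deepseek-v3", estimatedTokens);
            default            -> selectModel("gpt-4o", estimatedTokens);
        };
    }

    // Attach an input-side cost estimate so the router can log or compare selections
    private ModelSelection selectModel(String model, int estimatedTokens) {
        double estimatedCost = estimatedTokens / 1000.0 * PRICING.get(model).inputPer1K();
        return new ModelSelection(model, estimatedCost); // ModelSelection constructor assumed
    }
}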
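The per-1K-token prices in the table make a request's cost easy to estimate: cost = input tokens / 1000 × input price + output tokens / 1000 × output price. For 2,000 input and 500 output tokens, gpt-4o comes to 2 × 0.005 + 0.5 × 0.015 = 0.0175 USD, while gpt-4o-mini comes to 2 × 0.00015 + 0.5 × 0.0006 = 0.0006 USD. The sketch below reuses the table's illustrative numbers in a hypothetical `CostEstimator` that picks the cheapest model among the candidates a task allows; it is a standalone illustration, not the `ModelRouter` API configured above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Price per 1K tokens in USD (input, output), using the illustrative table values
record Price(double inputPer1K, double outputPer1K) {
    double cost(int inputTokens, int outputTokens) {
        return inputTokens / 1000.0 * inputPer1K + outputTokens / 1000.0 * outputPer1K;
    }
}

// Hypothetical helper, separate from the routing classes above
class CostEstimator {
    static final Map<String, Price> PRICES = Map.of(
        "gpt-4o",            new Price(0.005, 0.015),
        "gpt-4o-mini",       new Price(0.00015, 0.0006),
        "claude-3.5-sonnet", new Price(0.003, 0.015),
        "deepseek-v3",       new Price(0.00027, 0.0011));

    static double cost(String model, int inputTokens, int outputTokens) {
        Price price = PRICES.get(model);
        if (price == null) {
            throw new IllegalArgumentException("No price for " + model);
        }
        return price.cost(inputTokens, outputTokens);
    }

    // Cheapest model among the candidates that are acceptable for the task
    static String cheapest(List<String> candidates, int inputTokens, int outputTokens) {
        return candidates.stream()
            .min(Comparator.comparingDouble((String m) -> cost(m, inputTokens, outputTokens)))
            .orElseThrow();
    }
}
```

Restricting the candidate list per task type is what keeps cost optimization from sacrificing quality: a code-review task might only allow `gpt-4o` and `claude-3.5-sonnet`, in which case the cheaper of the two (claude-3.5-sonnet, at 0.0135 USD for the request above) wins.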

Prompt Template and Version Management

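import java.util.Map;

import org.springframework.stereotype.Service;

@Service
public class PromptTemplateService {

    private final PromptTemplateRepository templateRepo;

    // Constructor injection initializes the final field
    public PromptTemplateService(PromptTemplateRepository templateRepo) {
        this.templateRepo = templateRepo;
    }

    public PromptRenderResult render(String templateId, Map<String, String> variables) {
        PromptTemplate template = templateRepo.findLatestVersion(templateId);
        if (template == null) {
            throw new IllegalArgumentException("Unknown template: " + templateId);
        }

        // Substitute each {{key}} placeholder with the supplied value
        String renderedPrompt = template.getContent();
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            renderedPrompt = renderedPrompt.replace("{{" + entry.getKey() + "}}", entry.getValue());
        }

        // Record the version used so every output can be traced to its exact template
        return PromptRenderResult.builder()
            .templateId(templateId)
            .version(template.getVersion())
            .renderedPrompt(renderedPrompt)
            .estimatedTokens(estimateTokens(renderedPrompt))
            .build();
    }

    // Rough heuristic (about 4 characters per token for English text); use the
    // target model's tokenizer when accurate counts matter
    private int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }
}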
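The service above always renders the latest version, and `String.replace` silently leaves any placeholder without a value in the prompt. The following minimal sketch uses a hypothetical in-memory store in place of `PromptTemplateRepository` to show two refinements: pinning a specific version, so editing a template does not silently change production prompts, and failing fast when a `{{variable}}` has no value. It substitutes placeholders in a single regex pass, so values that themselves contain `{{...}}` are not expanded again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory stand-in for PromptTemplateRepository
class InMemoryPromptStore {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)}}");
    private final Map<String, TreeMap<Integer, String>> templates = new HashMap<>();

    void save(String id, int version, String content) {
        templates.computeIfAbsent(id, k -> new TreeMap<>()).put(version, content);
    }

    // version == null means "latest"; a pinned version keeps results reproducible
    String render(String id, Integer version, Map<String, String> vars) {
        TreeMap<Integer, String> versions = templates.get(id);
        if (versions == null || versions.isEmpty()) {
            throw new IllegalArgumentException("No template: " + id);
        }
        String content = (version == null) ? versions.lastEntry().getValue() : versions.get(version);
        if (content == null) {
            throw new IllegalArgumentException("No version " + version + " of " + id);
        }

        // Single pass over the template: each placeholder is replaced exactly once
        Matcher m = PLACEHOLDER.matcher(content);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```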

Token Billing and Usage Tracking

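import java.math.BigDecimal;

import org.springframework.stereotype.Service;

@Service
public class TokenBillingService {

    private final UsageRepository usageRepo;

    public TokenBillingService(UsageRepository usageRepo) {
        this.usageRepo = usageRepo;
    }

    public UsageRecord recordUsage(UsageRequest request) {
        // calculateCost (omitted here) applies the model's per-token input and output
        // prices; BigDecimal keeps billing amounts free of floating-point rounding
        BigDecimal cost = calculateCost(
            request.getModel(),
            request.getInputTokens(),
            request.getOutputTokens()
        );

        // Storing the template ID and latency lets cost be attributed per tenant and per prompt
        UsageRecord record = UsageRecord.builder()
            .tenantId(request.getTenantId())
            .model(request.getModel())
            .inputTokens(request.getInputTokens())
            .outputTokens(request.getOutputTokens())
            .cost(cost)
            .promptTemplateId(request.getPromptTemplateId())
            .latencyMs(request.getLatencyMs())
            .build();

        return usageRepo.save(record);
    }
}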
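Usage records become most useful when they feed a per-tenant budget check. The minimal sketch below is hypothetical: `TenantBudget` keeps running totals in memory (a real system would persist them, for example alongside `UsageRecord`), and its static `cost` helper computes (input tokens × input price + output tokens × output price) / 1000 from per-1K-token prices using `BigDecimal`. The division is exact because dividing by 1000 always terminates.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-tenant budget guard fed by usage records; totals live in memory here
class TenantBudget {
    private static final BigDecimal THOUSAND = BigDecimal.valueOf(1000);

    private final Map<String, BigDecimal> limits;
    private final Map<String, BigDecimal> spent = new ConcurrentHashMap<>();

    TenantBudget(Map<String, BigDecimal> limits) {
        this.limits = Map.copyOf(limits);
    }

    // (inputTokens x input price + outputTokens x output price) / 1000, prices per 1K tokens
    static BigDecimal cost(long inputTokens, long outputTokens,
                           BigDecimal inputPer1K, BigDecimal outputPer1K) {
        return inputPer1K.multiply(BigDecimal.valueOf(inputTokens))
            .add(outputPer1K.multiply(BigDecimal.valueOf(outputTokens)))
            .divide(THOUSAND); // exact: dividing by 1000 always terminates
    }

    // Adds the cost to the tenant's running total; returns false once the limit is exceeded
    boolean record(String tenantId, BigDecimal cost) {
        BigDecimal total = spent.merge(tenantId, cost, BigDecimal::add);
        BigDecimal limit = limits.getOrDefault(tenantId, BigDecimal.ZERO);
        return total.compareTo(limit) <= 0;
    }

    BigDecimal totalSpent(String tenantId) {
        return spent.getOrDefault(tenantId, BigDecimal.ZERO);
    }
}
```

A gateway would typically call `record` after each response and reject or downgrade (for example, route to a cheaper model) the tenant's next requests once it returns `false`.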

Streaming Response Proxy: SSE Pass-Through

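import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/api/ai")
public class StreamingAiController {

    private final AiGatewayService gatewayService;

    public StreamingAiController(AiGatewayService gatewayService) {
        this.gatewayService = gatewayService;
    }

    @PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> streamChat(@RequestBody ChatRequest request) {
        // Forward each upstream chunk to the client as a "delta" event as soon as it arrives
        return gatewayService.streamChat(request)
            .map(chunk -> ServerSentEvent.<String>builder()
                .id(chunk.getId())
                .event("delta")
                .data(chunk.getContent())
                .build())
            // After the upstream stream completes, send a terminal "done" event
            // ([DONE] mirrors the end-of-stream sentinel used by OpenAI's streaming API)
            .concatWith(Flux.just(
                ServerSentEvent.<String>builder()
                    .event("done")
                    .data("[DONE]")
                    .build()
            ));
    }
}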
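On the wire, each `ServerSentEvent` becomes a few `field: value` lines followed by a blank line, as defined by the Server-Sent Events (`text/event-stream`) format. The hypothetical `SseFormat.event` helper below is not Spring's encoder; it only illustrates the format, including one detail that matters for LLM output: a chunk that contains newlines has to be sent as several `data:` lines, which the client joins back together with `\n`.

```java
// Hypothetical helper that renders one event in text/event-stream format
class SseFormat {
    static String event(String id, String event, String data) {
        StringBuilder sb = new StringBuilder();
        if (id != null) {
            sb.append("id: ").append(id).append('\n');
        }
        if (event != null) {
            sb.append("event: ").append(event).append('\n');
        }
        // A newline inside the payload would end the field early, so each line
        // becomes its own "data:" field; clients rejoin them with '\n'
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line marks the end of the event (the space after ':' is optional)
        return sb.append('\n').toString();
    }
}
```

For example, `SseFormat.event("1", "delta", "Hello")` produces the three lines `id: 1`, `event: delta`, and `data: Hello`, followed by an empty line.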

Spring Cloud Gateway with AI Extensions in Practice

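spring:
  cloud:
    gateway:
      routes:
        - id: openai-route
          uri: https://api.openai.com
          predicates:
            - Path=/api/ai/openai/**
          filters:
            # Drop the gateway prefix: /api/ai/openai/v1/chat/completions -> /v1/chat/completions
            - StripPrefix=3
            # AiGateway is a custom GatewayFilterFactory (AiGatewayGatewayFilterFactory);
            # provider, model, rateLimit and timeout are its own application-defined args
            - name: AiGateway
              args:
                provider: openai
                model: gpt-4o
                rateLimit: 100/s
                timeout: 30s

        - id: claude-route
          uri: https://api.anthropic.com
          predicates:
            - Path=/api/ai/claude/**
          filters:
            # /api/ai/claude/v1/messages -> /v1/messages
            - StripPrefix=3
            - name: AiGateway
              args:
                provider: anthropic
                model: claude-3.5-sonnet
                rateLimit: 50/s
                timeout: 60s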
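A setting such as `rateLimit: 100/s` is typically enforced with a token bucket: the bucket holds up to `capacity` tokens (the allowed burst), refills at a fixed rate, and each request spends one token. Spring Cloud Gateway's built-in `RequestRateLimiter` filter with its Redis rate limiter (`redis-rate-limiter.replenishRate`, `redis-rate-limiter.burstCapacity`) works on the same principle and shares state across gateway instances. The single-process `TokenBucket` below is a hypothetical sketch; it uses integer nanosecond arithmetic and takes the current time as a parameter so its behavior is deterministic and easy to test.

```java
// Hypothetical single-process token bucket for a limit such as "100/s"
class TokenBucket {
    private final long capacity;      // maximum burst size
    private final long nanosPerToken; // time needed to earn one token
    private long tokens;
    private long lastRefill;

    TokenBucket(long capacity, long tokensPerSecond, long nowNanos) {
        if (capacity <= 0 || tokensPerSecond <= 0 || tokensPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("invalid bucket parameters");
        }
        this.capacity = capacity;
        this.nanosPerToken = 1_000_000_000L / tokensPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefill = nowNanos;
    }

    // Pass System.nanoTime() in production; returns false when the request should get HTTP 429
    synchronized boolean tryAcquire(long nowNanos) {
        long earned = (nowNanos - lastRefill) / nanosPerToken;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            lastRefill += earned * nanosPerToken; // keep the unused fraction of a token interval
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

Because LLM vendors also limit tokens per minute, not just requests, a gateway may need a second bucket that is charged with token counts rather than with one unit per request.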

Security: Prompt Injection Defense and Output Filtering

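import java.util.List;
import java.util.regex.Pattern;

import org.springframework.stereotype.Service;

// SecurityCheckResult is an application-defined result type (blocked / passed)
@Service
public class AiSecurityService {

    // Common injection phrasings. Keyword matching is easy to evade (paraphrases,
    // other languages), so treat it as a first filter rather than a complete defense
    private static final List<Pattern> INJECTION_PATTERNS = List.of(
        Pattern.compile("(?i)ignore\\s+(all\\s+)?previous\\s+instructions"),
        Pattern.compile("(?i)system\\s*:\\s*you\\s+are"),
        Pattern.compile("(?i)forget\\s+everything"),
        Pattern.compile("(?i)pretend\\s+you\\s+are")
    );

    private static final List<Pattern> SENSITIVE_PATTERNS = List.of(
        Pattern.compile("\\b\\d{16}\\b"),            // 16-digit card number without separators
        Pattern.compile("\\b\\d{17}[\\dXx]\\b"),     // 18-character Chinese resident ID number
        Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+")  // email address
    );

    // Reject the prompt if any known injection pattern appears
    public SecurityCheckResult checkInput(String prompt) {
        for (Pattern pattern : INJECTION_PATTERNS) {
            if (pattern.matcher(prompt).find()) {
                return SecurityCheckResult.blocked("Suspected prompt injection attack");
            }
        }
        return SecurityCheckResult.passed();
    }

    // Mask sensitive data in model output before it reaches the client
    public String sanitizeOutput(String output) {
        String sanitized = output;
        for (Pattern pattern : SENSITIVE_PATTERNS) {
            sanitized = pattern.matcher(sanitized).replaceAll("[REDACTED]");
        }
        return sanitized;
    }
}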
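The pattern `\b\d{16}\b` masks any 16-digit run, including order numbers and other internal IDs. A common way to cut such false positives is to mask only numbers that pass the Luhn checksum, which payment card numbers satisfy. The sketch below shows this refinement in a hypothetical `CardMasker` class; like the original pattern, it still misses card numbers written with spaces or hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical refinement: mask 16-digit runs only when they pass the Luhn checksum
class CardMasker {
    private static final Pattern SIXTEEN_DIGITS = Pattern.compile("\\b\\d{16}\\b");

    // Luhn: from the rightmost digit, double every second digit, subtract 9 if > 9, sum mod 10
    static boolean luhnValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    static String mask(String text) {
        Matcher m = SIXTEEN_DIGITS.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = luhnValid(m.group()) ? "[REDACTED]" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this check, the well-known test card number 4111111111111111 is masked, while an arbitrary identifier such as 1234567812345678 (which fails the checksum) passes through unchanged.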

Summary

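An AI Gateway is core infrastructure for the AI era: unified access to multiple LLMs, with business code no longer coupled to any single vendor
Multi-model routing lowers cost: sending each task type to the best-fit model can cut spend by around 40%, although the actual saving depends on the workload mix
Security is the baseline: prompt-injection defense, output filtering, and sensitive-data masking are indispensable
Spring Cloud Gateway with AI extensions is a recommended approach: manage everything centrally at the gateway layer so it stays transparent to the business layer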

An AI Gateway is not an optional architectural extra; it is a must-have in the AI era. The sooner you build one, the sooner you break free of LLM vendor lock-in.