K8s Gateway API gRPC Routing: 7 Production Patterns from Traffic Management to Canary Deployment

Your microservices have fully adopted gRPC, but is traffic management still stuck in the HTTP Ingress era? Does every gRPC canary deployment require EnvoyFilter or Istio VirtualService hacks? Are gRPC method matching, traffic splitting, retries, and circuit breaking out of reach with Ingress? As of 2026, Kubernetes Gateway API's GRPCRoute is GA (it graduated in Gateway API v1.1), so it's time to manage gRPC traffic declaratively.


Key Takeaways

  • GRPCRoute is the native Gateway API resource for gRPC routing with method/service-level matching
  • 7 production patterns cover everything from basic routing to multi-cluster and observability
  • Traffic splitting and canary deployment with native weight support — no annotation hacks needed
  • BackendTrafficPolicy decouples retry, circuit breaker, and timeout strategies from routes
  • Multi-cluster gRPC routing via ServiceImport + MultiClusterService

Table of Contents

  1. Gateway API gRPC Core Concepts
  2. Pattern 1: GRPCRoute Basic Configuration
  3. Pattern 2: Traffic Splitting and Weighted Routing
  4. Pattern 3: Canary Deployment and Rollback
  5. Pattern 4: Header/Method Conditional Routing
  6. Pattern 5: Retry and Circuit Breaker Strategies
  7. Pattern 6: Multi-Cluster gRPC Routing
  8. Pattern 7: Observability and Distributed Tracing
  9. 5 Common Pitfalls and Solutions
  10. 10 Common Error Troubleshooting
  11. Advanced Optimization Tips
  12. Comparison: Gateway API vs Istio vs Ingress
  13. Recommended Online Tools

Gateway API gRPC Core Concepts

Why Does gRPC Routing Need a Dedicated Resource?

gRPC is built on HTTP/2, but its routing semantics are fundamentally different from HTTP:

HTTP routing:  GET /api/v1/users/123
gRPC routing:  package.Service/Method
               com.example.api.UserService/GetUser

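On the wire, a gRPC call is an HTTP/2 POST to the path /package.Service/Method, so Ingress path matching can only treat it as an opaque string. What you actually need is to match on the gRPC service and method, and Gateway API's GRPCRoute is designed specifically for this.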
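
To make the difference concrete, here is a minimal Go sketch of how a gRPC request path decomposes into the two values that GRPCRoute matches on. The helper name splitFullMethod is hypothetical and is not part of any gateway implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFullMethod breaks a gRPC request path of the form
// "/package.Service/Method" into its service and method parts.
// The final result is false when the path does not have exactly that shape.
func splitFullMethod(path string) (string, string, bool) {
	if !strings.HasPrefix(path, "/") {
		return "", "", false
	}
	service, method, found := strings.Cut(path[1:], "/")
	if !found || service == "" || method == "" || strings.Contains(method, "/") {
		return "", "", false
	}
	return service, method, true
}

func main() {
	service, method, ok := splitFullMethod("/com.example.api.UserService/GetUser")
	fmt.Println(service, method, ok) // prints: com.example.api.UserService GetUser true

	// A REST-style path has more than two segments and is rejected.
	_, _, ok = splitFullMethod("/api/v1/users/123")
	fmt.Println(ok) // prints: false
}
```

The service part always carries the protobuf package prefix, which is why the service field in a GRPCRoute match must be fully qualified.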

GRPCRoute Architecture

┌─────────────────────────────────────────────────────────┐
│                     Gateway API gRPC                     │
│                                                          │
│  ┌──────────┐    ┌────────────┐    ┌──────────────────┐ │
│  │GatewayClass│──▶│  Gateway   │──▶│   GRPCRoute      │ │
│  │(Infra)    │    │ (Listeners)│    │ (Routing Rules)  │ │
│  └──────────┘    └────────────┘    └────────┬─────────┘ │
│                                              │           │
│                    ┌─────────────────────────┘           │
│                    ▼                                     │
│  ┌──────────────────────────────────────────────────┐   │
│  │              GRPCRoute Rule                       │   │
│  │  ┌─────────────┐  ┌─────────────┐               │   │
│  │  │  Matches    │  │  Filters    │               │   │
│  │  │  - service  │  │  - Header   │               │   │
│  │  │  - method   │  │  modifier   │               │   │
│  │  │  - headers  │  │  - Request  │               │   │
│  │  └─────────────┘  │  mirror     │               │   │
│  │                    └─────────────┘               │   │
│  │  ┌──────────────────────────────────────────┐   │   │
│  │  │  BackendRefs                              │   │   │
│  │  │  - name: user-svc, port: 9090, weight: 80│   │   │
│  │  │  - name: user-svc-v2, port: 9090, weight:20│  │   │
│  │  └──────────────────────────────────────────┘   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Core Resource Relationships

Resource              | Role                 | Responsibility
----------------------+----------------------+---------------------------------------------------------
GatewayClass          | Infrastructure Admin | Define gateway implementation type (Istio/Envoy Gateway)
Gateway               | Cluster Operator     | Define listeners, TLS, ports
GRPCRoute             | App Developer        | Define gRPC routing rules, traffic splitting
BackendTrafficPolicy  | Platform Engineer    | Define retry, circuit breaker, timeout policies

Pattern 1: GRPCRoute Basic Configuration

Protobuf Definition

syntax = "proto3";

package com.example.api;

option go_package = "github.com/example/api/gen/go;apipb";

service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
  rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);
}

service OrderService {
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
  rpc ListOrders(ListOrdersRequest) returns (ListOrdersResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message GetUserResponse {
  string user_id = 1;
  string name = 2;
  string email = 3;
}

Gateway and GRPCRoute Configuration

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: grpc-gateway
  namespace: infra
spec:
  gatewayClassName: istio
  listeners:
    - name: grpc
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: grpc-cert
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute
---
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "GetUser"
      backendRefs:
        - name: user-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.OrderService"
      backendRefs:
        - name: order-service
          port: 9090
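
The route above has two rules that both match UserService/GetUser: one names the method, one matches the whole service. The following Go sketch illustrates the idea that the more specific match wins. It is a simplified model under stated assumptions: it compares only service and method length and ignores hostnames, header matches, and tie-breakers such as creation time. The rule type and function names are hypothetical.

```go
package main

import "fmt"

// rule mirrors the method match of a GRPCRoute rule. An empty method
// means "any method of this service".
type rule struct {
	service string
	method  string
	backend string
}

// matches reports whether the rule applies to the given call.
func (r rule) matches(service, method string) bool {
	if r.service != "" && r.service != service {
		return false
	}
	if r.method != "" && r.method != method {
		return false
	}
	return true
}

// moreSpecific prefers the longer service match, then the longer method match.
func moreSpecific(a, b rule) bool {
	if len(a.service) != len(b.service) {
		return len(a.service) > len(b.service)
	}
	return len(a.method) > len(b.method)
}

// selectBackend returns the backend of the most specific matching rule.
// On a tie, the earlier rule is kept.
func selectBackend(rules []rule, service, method string) (string, bool) {
	best := -1
	for i, r := range rules {
		if !r.matches(service, method) {
			continue
		}
		if best < 0 || moreSpecific(r, rules[best]) {
			best = i
		}
	}
	if best < 0 {
		return "", false
	}
	return rules[best].backend, true
}

func main() {
	// Same matches as the GRPCRoute above; backends are replaced by rule
	// labels so the winning rule is visible.
	rules := []rule{
		{service: "com.example.api.UserService", method: "GetUser", backend: "rule-1"},
		{service: "com.example.api.UserService", backend: "rule-2"},
		{service: "com.example.api.OrderService", backend: "rule-3"},
	}
	fmt.Println(selectBackend(rules, "com.example.api.UserService", "GetUser"))   // prints: rule-1 true
	fmt.Println(selectBackend(rules, "com.example.api.UserService", "ListUsers")) // prints: rule-2 true
	fmt.Println(selectBackend(rules, "com.example.api.OrderService", "GetOrder")) // prints: rule-3 true
}
```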

Go Server Implementation

package main

import (
	"context"
	"log"
	"net"
	"os"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/reflection"
)

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func (s *userServiceServer) ListUsers(ctx context.Context, req *pb.ListUsersRequest) (*pb.ListUsersResponse, error) {
	return &pb.ListUsersResponse{}, nil
}

func (s *userServiceServer) CreateUser(ctx context.Context, req *pb.CreateUserRequest) (*pb.CreateUserResponse, error) {
	return &pb.CreateUserResponse{}, nil
}

func main() {
	port := os.Getenv("GRPC_PORT")
	if port == "" {
		port = "9090"
	}

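	// The backend serves TLS itself. Because the Gateway listener above
	// terminates TLS, the gateway must re-encrypt traffic to this port (for
	// example via a BackendTLSPolicy); otherwise serve plaintext h2c here.
	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}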

	srv := grpc.NewServer(grpc.Creds(creds))
	pb.RegisterUserServiceServer(srv, &userServiceServer{})

	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)

	reflection.Register(srv)

	lis, err := net.Listen("tcp", ":"+port)
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	log.Printf("gRPC server listening on :%s", port)
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}

Verify Routing

kubectl get grpcroute -n app
# NAME                  PARENTREFS         HOSTNAMES              AGE
# user-service-route    grpc-gateway       ["grpc.example.com"]   5m

kubectl get grpcroute user-service-route -n app -o yaml | grep -A5 "accepted"
#     conditions:
#     - lastTransitionTime: "2026-06-16T10:00:00Z"
#       message: Route was accepted
#       reason: Accepted
#       status: "True"

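# Flags go before the address. grpcurl relies on server reflection unless
# -proto or -protoset is given, so the reflection service must be routable too.
grpcurl -insecure -d '{"user_id": "123"}' \
  grpc.example.com:443 com.example.api.UserService/GetUser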

Pattern 2: Traffic Splitting and Weighted Routing

Scenario: UserService v1/v2 Weighted Split

┌──────────────┐
│  GRPCRoute   │
│  weight: 80  │──────────▶ UserService v1 (9090)
│  weight: 20  │──────────▶ UserService v2 (9090)
└──────────────┘

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-weighted
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-v1
          port: 9090
          weight: 80
        - name: user-service-v2
          port: 9090
          weight: 20
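
Weights are proportions, not percentages: each backend receives weight divided by the sum of all weights. A minimal Go sketch of that selection follows, assuming a simple random draw per request. Real data planes may use other algorithms, and the type and function names here are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

type backend struct {
	name   string
	weight int
}

// totalWeight sums the weights of all backends.
func totalWeight(backends []backend) int {
	sum := 0
	for _, b := range backends {
		sum += b.weight
	}
	return sum
}

// pick maps a number n in [0, totalWeight) onto a backend, so each backend
// owns a slice of the range proportional to its weight. Backends with
// weight 0 own an empty slice and are never picked.
func pick(backends []backend, n int) (string, bool) {
	for _, b := range backends {
		if n < b.weight {
			return b.name, true
		}
		n -= b.weight
	}
	return "", false
}

func main() {
	backends := []backend{
		{name: "user-service-v1", weight: 80},
		{name: "user-service-v2", weight: 20},
	}
	total := totalWeight(backends)

	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		name, _ := pick(backends, rand.Intn(total))
		counts[name]++
	}
	fmt.Println(counts) // roughly an 80/20 split; exact counts vary per run
}
```

If every weight is 0, the total is 0 and no backend can be picked, which is why at least one backendRef needs a positive weight.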

Dynamic Weight Adjustment

kubectl patch grpcroute user-service-weighted -n app --type='json' \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 60},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 40}]'

kubectl patch grpcroute user-service-weighted -n app --type='json' \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 0},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 100}]'
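
Hand-editing JSON Patch strings is error-prone once a rollout has several steps. As a sketch, the patch body can be generated instead. The helper name weightPatch is hypothetical; it assumes the backendRefs keep the order used in the route above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value int    `json:"value"`
}

// weightPatch builds a JSON Patch that replaces the weight of each
// backendRef of the given rule, in order.
func weightPatch(rule int, weights ...int) (string, error) {
	ops := make([]patchOp, 0, len(weights))
	for i, w := range weights {
		if w < 0 {
			return "", fmt.Errorf("weight %d for backendRef %d is negative", w, i)
		}
		ops = append(ops, patchOp{
			Op:    "replace",
			Path:  fmt.Sprintf("/spec/rules/%d/backendRefs/%d/weight", rule, i),
			Value: w,
		})
	}
	out, err := json.Marshal(ops)
	return string(out), err
}

func main() {
	patch, err := weightPatch(0, 60, 40)
	if err != nil {
		panic(err)
	}
	fmt.Println(patch)
}
```

The printed string can be passed to kubectl patch with --type=json in place of the inline -p argument.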

Weight Routing Verification Script

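# Sends $total requests and tallies which backend answered.
# Assumes each backend identifies itself in the response (for example, a
# header or field containing "v1" or "v2"); adjust the patterns to match.
v1Count=0
v2Count=0
otherCount=0
total=100

for i in $(seq 1 "$total"); do
  # -v also prints response headers and trailers
  response=$(grpcurl -v -insecure -d '{"user_id": "123"}' \
    grpc.example.com:443 com.example.api.UserService/GetUser 2>&1)
  if echo "$response" | grep -q "v1"; then
    v1Count=$((v1Count + 1))
  elif echo "$response" | grep -q "v2"; then
    v2Count=$((v2Count + 1))
  else
    otherCount=$((otherCount + 1))  # failed call or unidentified backend
  fi
done

echo "v1: $v1Count/$total ($((v1Count * 100 / total))%)"
echo "v2: $v2Count/$total ($((v2Count * 100 / total))%)"
echo "errors/unknown: $otherCount/$total"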
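
With only 100 requests, the observed split will rarely be exactly 80/20. As a rough guide, the following Go sketch computes the range that is consistent with a target share, assuming requests are routed independently (a binomial model). The function name expectedRange is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// expectedRange returns mean ± k standard deviations for the number of
// requests hitting a backend with share p out of n independent requests.
func expectedRange(n int, p, k float64) (float64, float64) {
	mean := float64(n) * p
	sd := math.Sqrt(float64(n) * p * (1 - p))
	return mean - k*sd, mean + k*sd
}

func main() {
	// 100 requests, 80% target share for v1, 3 standard deviations.
	lo, hi := expectedRange(100, 0.8, 3)
	fmt.Printf("v1 count between %.0f and %.0f is consistent with weight 80\n", lo, hi)
}
```

A v1 count of 74 or 85 is therefore not evidence of a misconfigured route; increase the sample size for a tighter check.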

Pattern 3: Canary Deployment and Rollback

Complete Canary Deployment Flow

┌───────────────────────────────────────────────────────────┐
│                gRPC Canary Deployment Flow                 │
│                                                            │
│  Step 1        Step 2        Step 3        Step 4         │
│  ┌──────┐     ┌──────┐     ┌──────┐     ┌──────┐        │
│  │100%v1│────▶│95%v1 │────▶│80%v1 │────▶│50%v1 │        │
│  │  0%v2│     │ 5%v2 │     │20%v2 │     │50%v2 │        │
│  └──────┘     └──────┘     └──────┘     └──────┘        │
│                                              │            │
│                              ┌───────────────┘            │
│                              ▼                            │
│  Step 5 (Success)  Step 5 (Failure)                      │
│  ┌──────┐          ┌──────┐                               │
│  │ 0%v1 │          │100%v1│◀── Immediate rollback        │
│  │100%v2│          │  0%v2│                               │
│  └──────┘          └──────┘                               │
└───────────────────────────────────────────────────────────┘
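
The flow in the diagram is a small state machine: advance to the next weight when the health gate passes, and fall back to 0% canary traffic when it fails. A minimal Go sketch, with the hypothetical helper nextWeight and the step values taken from the diagram:

```go
package main

import "fmt"

// nextWeight returns the canary weight for the next step. A failed health
// gate always rolls back to 0; a passed gate advances to the next larger
// step, or stays put when the last step has been reached.
func nextWeight(steps []int, current int, healthy bool) int {
	if !healthy {
		return 0
	}
	for _, s := range steps {
		if s > current {
			return s
		}
	}
	return current
}

func main() {
	steps := []int{5, 20, 50, 100}

	// Happy path: every gate passes.
	w := 0
	for w < 100 {
		w = nextWeight(steps, w, true)
		fmt.Printf("canary weight: %d%%\n", w)
	}

	// Failure at 50% triggers an immediate rollback.
	fmt.Printf("after failed gate: %d%%\n", nextWeight(steps, 50, false)) // prints: after failed gate: 0%
}
```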

Canary Deployment GRPCRoute

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-canary
  namespace: app
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
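      backendRefs:
        - name: user-service-v1
          port: 9090
          weight: 95
        - name: user-service-v2
          port: 9090
          weight: 5
          # Attached to the v2 backendRef so that only canary responses are
          # tagged; a rule-level filter would tag v1 responses as well.
          # Support for backendRef-level filters varies by implementation.
          filters:
            - type: ResponseHeaderModifier
              responseHeaderModifier:
                add:
                  - name: X-Backend-Version
                    value: "canary-v2"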

Argo Rollouts Integration

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: user-service-rollout
  namespace: app
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: user-service-v2
      stableService: user-service-v1
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            grpcRoute: user-service-canary
            namespace: app
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - setWeight: 20
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 15m }
        - setWeight: 80
        - pause: { duration: 10m }
        - setWeight: 100
        - pause: {}
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: example/user-service:v2.0.0
          ports:
            - containerPort: 9090
          readinessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:9090"]
            initialDelaySeconds: 5
            periodSeconds: 10

One-Click Rollback

kubectl patch grpcroute user-service-canary -n app --type='json' \
  -p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 100},
       {"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 0}]'

# Argo Rollouts are managed through the kubectl plugin, not "kubectl rollout"
kubectl argo rollouts undo user-service-rollout -n app

Pattern 4: Header/Method Conditional Routing

gRPC Header-Based Routing

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-header-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
          headers:
            - type: Exact
              name: x-env
              value: "staging"
      backendRefs:
        - name: user-service-staging
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
          headers:
            - type: Exact
              name: x-env
              value: "canary"
      backendRefs:
        - name: user-service-v2
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-v1
          port: 9090

Method-Based Fine-Grained Routing

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-method-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "GetUser"
      backendRefs:
        - name: user-read-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "CreateUser"
      backendRefs:
        - name: user-write-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "ListUsers"
      backendRefs:
        - name: user-read-service
          port: 9090
          weight: 90
        - name: user-read-service-v2
          port: 9090
          weight: 10

Header Modification Filter

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-header-modify
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: x-gateway-source
                value: "gateway-api"
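            set:
              # Header modifiers only write static values; they cannot generate
              # a per-request ID, so this sets the literal string below.
              - name: x-trace-id
                value: "auto-generated"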
            remove:
              - "x-internal-token"
      backendRefs:
        - name: user-service
          port: 9090
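
The three operations differ in how they treat a header that already exists: set overwrites it, add appends another value, and remove deletes it. The Go sketch below models those semantics on a plain map with lowercase keys, as HTTP/2 uses. The types are hypothetical, and the order in which the operations are applied (set, add, remove) is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

type header struct {
	name  string
	value string
}

// modifier mirrors the set/add/remove lists of a RequestHeaderModifier.
type modifier struct {
	set    []header
	add    []header
	remove []string
}

// apply returns a modified copy of the headers and leaves the input untouched.
func (m modifier) apply(in map[string][]string) map[string][]string {
	out := make(map[string][]string, len(in))
	for name, values := range in {
		out[strings.ToLower(name)] = append([]string(nil), values...)
	}
	for _, h := range m.set { // overwrite any existing values
		out[strings.ToLower(h.name)] = []string{h.value}
	}
	for _, h := range m.add { // append to existing values
		key := strings.ToLower(h.name)
		out[key] = append(out[key], h.value)
	}
	for _, name := range m.remove {
		delete(out, strings.ToLower(name))
	}
	return out
}

func main() {
	m := modifier{
		add:    []header{{name: "x-gateway-source", value: "gateway-api"}},
		set:    []header{{name: "x-trace-id", value: "auto-generated"}},
		remove: []string{"x-internal-token"},
	}
	out := m.apply(map[string][]string{
		"x-trace-id":       {"from-client"},
		"x-internal-token": {"secret"},
	})
	fmt.Println(out) // prints: map[x-gateway-source:[gateway-api] x-trace-id:[auto-generated]]
}
```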

Go Client with Custom Headers

package main

import (
	"context"
	"log"
	"time"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/metadata"
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("/etc/certs/ca.crt", "grpc.example.com")
	if err != nil {
		log.Fatalf("failed to load TLS creds: %v", err)
	}

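	// grpc.NewClient connects lazily; grpc.Dial and grpc.WithTimeout are deprecated.
	conn, err := grpc.NewClient("grpc.example.com:443",
		grpc.WithTransportCredentials(creds),
	)
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer conn.Close()

	client := pb.NewUserServiceClient(conn)

	// Bound the call with a per-RPC deadline instead of a dial timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Outgoing metadata becomes HTTP/2 headers that GRPCRoute can match on.
	ctx = metadata.AppendToOutgoingContext(ctx,
		"x-env", "canary",
		"x-request-id", "req-12345",
	)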

	resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "123"})
	if err != nil {
		log.Fatalf("GetUser failed: %v", err)
	}

	log.Printf("Response: %+v", resp)
}

Pattern 5: Retry and Circuit Breaker Strategies

BackendTrafficPolicy Configuration

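# BackendTrafficPolicy is not a core Gateway API (gateway.networking.k8s.io)
# resource. Envoy Gateway, for example, ships a kind of this name under
# gateway.envoyproxy.io/v1alpha1 with its own schema, so check the apiVersion
# and field names below against the implementation you run.
apiVersion: gateway.networking.k8s.io/v1alpha3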
kind: BackendTrafficPolicy
metadata:
  name: user-service-retry
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-route
  retry:
    retryOn:
      - 5xx
      - connect-failure
      - retriable-status-codes
    retryOnStatusCodes:
      - 503
      - 14
    attempts: 3
    backoff:
      defaultDuration: "100ms"
      maxDuration: "5s"
  connectionPool:
    maxConnections: 1000
    maxPendingRequests: 500
    maxRequestsPerConnection: 100
  circuitBreaker:
    consecutiveFailures: 5
    consecutiveGatewayFailures: 3
    interval: "30s"
    baseEjectionTime: "30s"
    maxEjectionPercent: 50
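
The backoff settings above describe an exponential schedule: start at 100ms, double on each retry, and never wait longer than 5s. A minimal Go sketch of that schedule follows. Real proxies typically add random jitter, which this sketch omits, and the names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseInterval = 100 * time.Millisecond // backoff.defaultDuration above
	maxInterval  = 5 * time.Second        // backoff.maxDuration above
)

// backoff returns the wait before the given retry (0-based): the base
// interval doubled once per previous retry, capped at max.
func backoff(retry int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < retry; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// attempts: 3 in the policy above
	for retry := 0; retry < 3; retry++ {
		fmt.Printf("retry %d waits %v\n", retry+1, backoff(retry, baseInterval, maxInterval))
	}
	// prints:
	// retry 1 waits 100ms
	// retry 2 waits 200ms
	// retry 3 waits 400ms
}
```

With three attempts the cap is never reached; it only matters once the retry count grows past five.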

Timeout Policy

apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTrafficPolicy
metadata:
  name: user-service-timeout
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-route
  timeout:
    tcp:
      connectTimeout: "5s"
    http:
      requestTimeout: "30s"
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-api-key
          limit:
            requestsPerUnit: 1000
            unit: Minute

Go Server Health Check and Graceful Shutdown

package main

import (
	"context"
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
)

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func main() {
	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}

	srv := grpc.NewServer(grpc.Creds(creds))
	pb.RegisterUserServiceServer(srv, &userServiceServer{})

	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)

	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

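	// Mark the service as serving once registration is complete.
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)

	done := make(chan struct{})
	go func() {
		defer close(done)
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
		<-sigCh

		// Fail health checks first so the gateway stops sending new requests.
		hs.Shutdown()

		// Give the gateway a moment to observe NOT_SERVING before draining.
		time.Sleep(5 * time.Second)

		log.Println("graceful shutdown: draining connections...")
		// Force-stop if in-flight RPCs do not finish within 15 seconds.
		timer := time.AfterFunc(15*time.Second, srv.Stop)
		defer timer.Stop()
		srv.GracefulStop()
	}()

	log.Printf("gRPC server listening on :9090")
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}

	// Serve returns once GracefulStop begins; wait for draining to finish.
	<-done
	log.Println("server shutdown complete")
}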

Circuit Breaker State Monitoring

kubectl get backendtrafficpolicy -n app
# NAME                      TARGETKIND   TARGETNAME               AGE
# user-service-retry        GRPCRoute    user-service-route       5m
# user-service-timeout      GRPCRoute    user-service-route       5m

istioctl dashboard envoy user-service-v1-xxxx.app
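
When reading the proxy state, it helps to have a mental model of what the consecutiveFailures and baseEjectionTime settings do. The Go sketch below models one backend host: five consecutive failures eject it for 30 seconds, and any success resets the streak. This is a simplified illustration with hypothetical types, not the proxy's actual algorithm.

```go
package main

import (
	"fmt"
	"time"
)

const ejectionTime = 30 * time.Second // baseEjectionTime in the policy

// epoch is a fixed reference time used to keep the example deterministic.
var epoch = time.Unix(0, 0)

// host tracks the failure streak and ejection deadline of one backend.
type host struct {
	failures     int
	ejectedUntil time.Time
}

// available reports whether the host may receive traffic at time now.
func (h *host) available(now time.Time) bool {
	return !now.Before(h.ejectedUntil)
}

// ejector ejects a host after threshold consecutive failures.
type ejector struct {
	threshold int
	ejection  time.Duration
}

// record updates the host after one request outcome.
func (e ejector) record(h *host, ok bool, now time.Time) {
	if ok {
		h.failures = 0 // any success resets the streak
		return
	}
	h.failures++
	if h.failures >= e.threshold {
		h.ejectedUntil = now.Add(e.ejection)
		h.failures = 0
	}
}

func main() {
	e := ejector{threshold: 5, ejection: ejectionTime}
	h := &host{}
	for i := 0; i < 5; i++ {
		e.record(h, false, epoch)
	}
	fmt.Println(h.available(epoch))                   // prints: false
	fmt.Println(h.available(epoch.Add(ejectionTime))) // prints: true
}
```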

Pattern 6: Multi-Cluster gRPC Routing

Multi-Cluster Architecture

┌─────────────────────────────────────────────────────────────┐
│                Multi-Cluster gRPC Routing Architecture      │
│                                                              │
│  ┌─────────────┐         ┌─────────────┐                   │
│  │  Cluster A  │         │  Cluster B  │                   │
│  │  us-west-1  │         │  us-east-1  │                   │
│  │             │         │             │                   │
│  │ ┌─────────┐│         │┌─────────┐  │                   │
│  │ │Gateway  ││◀──────▶││Gateway  │  │                   │
│  │ │(East-West)│       ││(East-West)│ │                   │
│  │ └─────────┘│         │└─────────┘  │                   │
│  │             │         │             │                   │
│  │ ┌─────────┐│         │┌─────────┐  │                   │
│  │ │UserSvc  ││         ││UserSvc  │  │                   │
│  │ │weight:60││         ││weight:40│  │                   │
│  │ └─────────┘│         │└─────────┘  │                   │
│  └─────────────┘         └─────────────┘                   │
│         ▲                       ▲                           │
│         │                       │                           │
│         └───────┬───────────────┘                           │
│                 │                                           │
│          ┌──────┴──────┐                                   │
│          │  GRPCRoute  │                                   │
│          │  MultiCluster│                                  │
│          │  Service     │                                  │
│          └─────────────┘                                   │
└─────────────────────────────────────────────────────────────┘

ServiceImport Configuration

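# ServiceImport belongs to the Multi-Cluster Services API, not Gateway API.
# It is normally created by the MCS controller from a ServiceExport.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: user-service-import
  namespace: app
spec:
  type: ClusterSetIP
  ports:
    - port: 9090
      protocol: TCP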
---
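# MultiClusterService is not defined by Gateway API or the MCS API; treat this
# kind, its apiVersion, and the per-cluster weights as provider-specific.
apiVersion: gateway.networking.k8s.io/v1beta1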
kind: MultiClusterService
metadata:
  name: user-service-global
  namespace: app
spec:
  serviceImport:
    name: user-service-import
    namespace: app
  clusterBackends:
    - cluster: us-west-1
      weight: 60
    - cluster: us-east-1
      weight: 40

Cross-Cluster GRPCRoute

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-multi-cluster
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - group: multicluster.x-k8s.io
          kind: ServiceImport
          name: user-service-import
          port: 9090
          weight: 100

Failover Configuration

apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTrafficPolicy
metadata:
  name: user-service-failover
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-multi-cluster
  failover:
    strategy: RegionFailover
    regionFailover:
      - from: us-west-1
        to: us-east-1
      - from: us-east-1
        to: us-west-1
  retry:
    retryOn:
      - 5xx
      - connect-failure
    attempts: 2
    backoff:
      defaultDuration: "200ms"
      maxDuration: "3s"
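
The failover mapping reads as: serve from the local region while it is healthy, otherwise send traffic to its configured partner. A minimal Go sketch of that decision, with a hypothetical pickRegion helper and health reported as a simple map:

```go
package main

import "fmt"

// pickRegion returns the region that should serve a request originating in
// local. It prefers the local region and falls back to its failover target.
// The second result is false when neither is healthy.
func pickRegion(local string, healthy map[string]bool, failover map[string]string) (string, bool) {
	if healthy[local] {
		return local, true
	}
	if target, ok := failover[local]; ok && healthy[target] {
		return target, true
	}
	return "", false
}

func main() {
	failover := map[string]string{
		"us-west-1": "us-east-1",
		"us-east-1": "us-west-1",
	}

	allUp := map[string]bool{"us-west-1": true, "us-east-1": true}
	fmt.Println(pickRegion("us-west-1", allUp, failover)) // prints: us-west-1 true

	westDown := map[string]bool{"us-west-1": false, "us-east-1": true}
	fmt.Println(pickRegion("us-west-1", westDown, failover)) // prints: us-east-1 true
}
```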

Pattern 7: Observability and Distributed Tracing

OpenTelemetry Integration

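# ObservabilityPolicy is not a core Gateway API kind; tracing and access
# logging are configured through implementation-specific resources, so verify
# this apiVersion and schema against your gateway.
apiVersion: gateway.networking.k8s.io/v1alpha3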
kind: ObservabilityPolicy
metadata:
  name: grpc-observability
  namespace: infra
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: grpc-gateway
  tracing:
    provider:
      type: OTel
      backendRef:
        group: ""
        kind: Service
        name: otel-collector
        port: 4317
    samplingRate: 10
    customTags:
      - name: cluster
        literal:
          value: "us-west-1"
      - name: environment
        env:
          name: DEPLOY_ENV
  accessLog:
    type: OpenTelemetry
    backendRef:
      group: ""
      kind: Service
      name: otel-collector
      port: 4317
    format:
      type: JSON

gRPC Metrics Collection

apiVersion: v1
kind: ConfigMap
metadata:
  name: grpc-metrics-config
  namespace: infra
data:
  otel-collector-config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024
      filter:
        error_mode: ignore
        traces:
          span:
            # The filter processor drops spans that match a condition,
            # so negate it to keep only gRPC spans.
            - 'attributes["rpc.system"] != "grpc"'
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
        namespace: grpc_gateway
      otlp:
        endpoint: "jaeger-collector:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [filter, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: infra
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          ports:
            - containerPort: 4317
            - containerPort: 4318
            - containerPort: 8889
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib/config.yaml
              subPath: otel-collector-config
      volumes:
        - name: config
          configMap:
            name: grpc-metrics-config

Prometheus gRPC Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: grpc-gateway-rules
  namespace: infra
spec:
  groups:
    - name: grpc_gateway.rules
      rules:
        - alert: GRPCHighErrorRate
          expr: |
            sum by (grpc_method) (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
            /
            sum by (grpc_method) (rate(grpc_server_handled_total[5m]))
            > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "gRPC error rate exceeds 5%"
            description: "gRPC method {{ $labels.grpc_method }} error rate is {{ $value | humanizePercentage }}"
        - alert: GRPCHighLatency
          expr: |
            histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket[5m])) by (le, grpc_method))
            > 1.0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "gRPC P99 latency exceeds 1s"
        - alert: GRPCCircuitBreakerOpen
          expr: |
            envoy_cluster_circuit_breakers_default_rq_open > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "gRPC circuit breaker is open"
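
The GRPCHighErrorRate expression divides the rate of non-OK responses by the rate of all responses. The same arithmetic as a Go sketch, using a hypothetical errorRatio function over per-code counts:

```go
package main

import "fmt"

// errorRatio returns the share of handled RPCs whose status code is not "OK".
// The map is keyed by gRPC status code name, like the grpc_code label.
func errorRatio(handled map[string]float64) float64 {
	var total, errs float64
	for code, n := range handled {
		total += n
		if code != "OK" {
			errs += n
		}
	}
	if total == 0 {
		return 0 // no traffic, no error rate
	}
	return errs / total
}

func main() {
	handled := map[string]float64{"OK": 900, "Unavailable": 80, "Internal": 20}
	ratio := errorRatio(handled)
	fmt.Printf("error ratio %.2f, alert fires: %v\n", ratio, ratio > 0.05)
	// prints: error ratio 0.10, alert fires: true
}
```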

Go Server OpenTelemetry Instrumentation

package main

import (
	"context"
	"log"
	"net"
	"net/http"

	pb "github.com/example/api/gen/go"
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
)

func initTracer() func(context.Context) error {
	exporter, err := otlptracegrpc.New(context.Background(),
		otlptracegrpc.WithEndpoint("otel-collector.infra:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("failed to create OTel exporter: %v", err)
	}

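	tp := sdktrace.NewTracerProvider(
		// The SDK's default resource is used; set service.name through
		// OTEL_SERVICE_NAME or sdktrace.WithResource in real deployments.
		sdktrace.WithBatcher(exporter),
	)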

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))

	return tp.Shutdown
}

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func main() {
	shutdown := initTracer()
	defer shutdown(context.Background())

	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}

	srv := grpc.NewServer(
		grpc.Creds(creds),
		grpc.StatsHandler(otelgrpc.NewServerHandler()),
	)

	pb.RegisterUserServiceServer(srv, &userServiceServer{})

	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)

	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

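	// Side HTTP listener on the default mux (for example, for pprof or
	// metrics handlers registered elsewhere); log if it stops.
	go func() {
		if err := http.ListenAndServe(":2222", nil); err != nil {
			log.Printf("http listener stopped: %v", err)
		}
	}()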

	log.Printf("gRPC server with OTel tracing listening on :9090")
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}

5 Common Pitfalls and Solutions

Pitfall 1: GRPCRoute Method Match Format Error

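Symptom: The GRPCRoute is created, but gRPC requests never match its rules and clients receive UNIMPLEMENTED.

Cause: The match is split into two fields. service must be the fully qualified package.Service name and method the bare RPC name; writing a URL-style path such as /com.example.api.UserService/GetUser into either field never matches.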

rules:
  - matches:
      - method:
          service: "com.example.api.UserService"
          method: "GetUser"

Solution: Check the protobuf package and service definitions. Ensure the service field in GRPCRoute matches the proto exactly. Use grpcurl list to verify service names.
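
The UNIMPLEMENTED status is worth understanding, because the backend never produced it. Assuming the gateway answers an unmatched request with a plain HTTP 404 and no grpc-status, gRPC clients translate the HTTP status according to the gRPC HTTP-to-status mapping. A Go sketch of that mapping, with a hypothetical function name:

```go
package main

import "fmt"

// grpcCodeFromHTTP returns the gRPC status that clients synthesize when a
// response carries an HTTP error status but no grpc-status.
func grpcCodeFromHTTP(status int) string {
	switch status {
	case 400:
		return "INTERNAL"
	case 401:
		return "UNAUTHENTICATED"
	case 403:
		return "PERMISSION_DENIED"
	case 404:
		return "UNIMPLEMENTED"
	case 429, 502, 503, 504:
		return "UNAVAILABLE"
	default:
		return "UNKNOWN"
	}
}

func main() {
	fmt.Println(grpcCodeFromHTTP(404)) // prints: UNIMPLEMENTED (no route matched)
	fmt.Println(grpcCodeFromHTTP(503)) // prints: UNAVAILABLE (no healthy backend)
}
```

So an UNIMPLEMENTED error through the gateway usually points at routing, while the same call made directly against the Pod succeeds.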

Pitfall 2: Gateway Listener Not Allowing GRPCRoute

Symptom: GRPCRoute status shows Accepted: False with reason NotAllowedByListeners.

Cause: Gateway's allowedRoutes.kinds does not include GRPCRoute.

listeners:
  - name: grpc
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
      kinds:
        - group: gateway.networking.k8s.io
          kind: GRPCRoute

Solution: Explicitly declare kinds to allow GRPCRoute in the Gateway listener.

Pitfall 3: gRPC Service Uses HTTP/2 but Gateway Listens on HTTP

Symptom: gRPC client connection fails with protocol error.

Cause: gRPC requires HTTP/2, but the Gateway listener protocol is set to HTTP.

Solution: Use HTTPS protocol for gRPC over TLS, or configure h2c appropriately.

listeners:
  - name: grpc-plaintext
    port: 80
    protocol: HTTP  # cleartext gRPC (h2c); a tls block is only valid on TLS-based listeners
  - name: grpc-tls
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
        - name: grpc-cert

Pitfall 4: BackendTrafficPolicy Naming Conflict with GRPCRoute

Symptom: Multiple BackendTrafficPolicies target the same GRPCRoute, but only one of them takes effect.

Cause: Each GRPCRoute can be associated with only one BackendTrafficPolicy.

Solution: Merge the settings into a single BackendTrafficPolicy, or split the GRPCRoute so that each policy targets its own route.

Pitfall 5: gRPC Health Check and Gateway Readiness Probe Mismatch

Symptom: The Pod is Running, but traffic routed through the Gateway is rejected.

Cause: The gRPC service starts slowly. The K8s readiness probe passes before the gRPC health check reports SERVING, so the Pod receives traffic it cannot handle yet.

Solution: Point the K8s readiness probe at the gRPC health service on the serving port, so that both checks use the same port and logic.

readinessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:9090", "-service=com.example.api.UserService"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:9090"]
  initialDelaySeconds: 30
  periodSeconds: 10
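
As an alternative sketch, Kubernetes also has built-in gRPC probes (stable since v1.27), which avoid shipping the grpc_health_probe binary in the image. This assumes the server implements grpc.health.v1.Health on port 9090 and registers the service name used below.

```yaml
readinessProbe:
  grpc:
    port: 9090
    # Service name passed to the gRPC Health Check request.
    service: com.example.api.UserService
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  grpc:
    port: 9090
  initialDelaySeconds: 30
  periodSeconds: 10
```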

10 Common Error Troubleshooting

| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | GRPCRoute not accepted: NoMatchingParent | parentRefs references non-existent Gateway | Check Gateway name, namespace, sectionName |
| 2 | UNIMPLEMENTED: unknown method | GRPCRoute method match format error | Use grpcurl list to verify that the service name has the form package.Service |
| 3 | protocol error: HTTP/2 required | Gateway listener protocol mismatch | Use HTTPS for gRPC over TLS, HTTP for h2c |
| 4 | connection refused: backend unhealthy | Backend gRPC service not ready | Check Pod status and health probe, confirm gRPC health probe passes |
| 5 | GRPCRoute condition ResolvedRefs=False | backendRef Service doesn't exist | Check Service name, namespace, port |
| 6 | certificate not found for TLS | TLS cert missing or cross-namespace | Ensure cert is in Gateway's namespace, use cert-manager |
| 7 | weight sum is zero | All backendRef weights are 0 | At least one backendRef weight must be > 0 |
| 8 | BackendTrafficPolicy conflict | Multiple policies target the same Route | Merge policies or split GRPCRoute |
| 9 | ServiceImport DNS resolution failed | Multi-cluster control plane not connected | Check east-west gateway, cross-cluster network connectivity |
| 10 | circuit breaker open: ejection threshold exceeded | Backend consecutive failures triggered circuit breaker | Check backend health, adjust circuit breaker thresholds |
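
For error 6, when the certificate Secret has to stay in a different namespace from the Gateway, a ReferenceGrant in the Secret's namespace can permit the cross-namespace reference. The sketch below assumes a hypothetical certs namespace holding grpc-cert; the Gateway's certificateRefs entry would then also need namespace: certs.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-cert    # hypothetical name
  namespace: certs            # hypothetical namespace that holds the Secret
spec:
  # Who is allowed to reference: Gateways in the infra namespace.
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: infra
  # What may be referenced: the TLS Secret only.
  to:
    - group: ""
      kind: Secret
      name: grpc-cert
```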

Advanced Optimization Tips

1. gRPC Keepalive Optimization

apiVersion: v1
kind: ConfigMap
metadata:
  name: grpc-keepalive-config
  namespace: app
data:
  keepalive.json: |
    {
      "keepalive": {
        "maxConnectionIdle": "300s",
        "maxConnectionAge": "1800s",
        "maxConnectionAgeGrace": "30s",
        "time": "60s",
        "timeout": "20s"
      }
    }

2. Health-Based Traffic Switching

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-health-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
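      backendRefs:
        # All traffic goes to the primary while it is healthy.
        - name: user-service-primary
          port: 9090
          weight: 100
        # weight: 0 means no traffic is forwarded here. The core API does not
        # shift weights by health on its own; a controller or operator must
        # raise this weight to fail over.
        - name: user-service-secondary
          port: 9090
          weight: 0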
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: x-failover-enabled
                value: "true"

3. Gateway API gRPC and HTTPRoute Coexistence

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: multi-protocol-gateway
  namespace: infra
spec:
  gatewayClassName: istio
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: HTTPRoute
    - name: grpc-https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute
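    - name: grpc-internal
      port: 15443
      protocol: HTTPS
      tls:
        # GRPCRoute needs TLS terminated at the Gateway. Passthrough applies
        # only to TLS listeners that serve TLSRoute.
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert   # assumption: reuses the certificate above
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              internal-grpc: "true"
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute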

4. gRPC Reflection and Debugging

# Port 443 is the TLS listener, so -plaintext must not be used here.
# Add -insecure only when testing against a self-signed certificate.
# These commands require server reflection to be enabled on the backend.
grpcurl grpc.example.com:443 list
# com.example.api.UserService
# com.example.api.OrderService
# grpc.health.v1.Health
# grpc.reflection.v1.ServerReflection

grpcurl grpc.example.com:443 list com.example.api.UserService
# com.example.api.UserService.GetUser
# com.example.api.UserService.ListUsers
# com.example.api.UserService.CreateUser

grpcurl grpc.example.com:443 describe com.example.api.UserService.GetUser
# com.example.api.UserService.GetUser is a method:
# rpc GetUser(.com.example.api.GetUserRequest) returns (.com.example.api.GetUserResponse) {}

Comparison: Gateway API vs Istio vs Ingress

| Dimension | Ingress + nginx-ingress | Istio VirtualService | Gateway API GRPCRoute |
|---|---|---|---|
| gRPC method matching | Not supported, annotation hack | Native support | Native GRPCRoute.method |
| Traffic splitting | Annotation canary-weight | weight field | Native weight field |
| Canary deployment | Annotation, controller-incompatible | VirtualService + DestinationRule | GRPCRoute + BackendTrafficPolicy |
| Header routing | Limited annotation support | Complete match conditions | Native headers matching |
| Retry/Circuit breaker | Not supported | DestinationRule | BackendTrafficPolicy |
| Multi-cluster | Not supported | ServiceEntry + WorkloadEntry | ServiceImport + MultiClusterService |
| Observability | External integration needed | Native telemetry | ObservabilityPolicy |
| Role separation | None | Partial | Complete three-role model |
| Standardization | Varies by controller | Istio-specific | Kubernetes standard |
| Learning curve | Low | High | Medium |
| Production readiness | Mature | Mature | GA (2026), mature |




#Kubernetes #Gateway API #gRPC #Traffic Management #Canary Deployment #Service Mesh #2026 #DevOps