K8s Gateway API gRPC Routing: 7 Production Patterns from Traffic Management to Canary Deployment
Your microservices have fully adopted gRPC, but traffic management is still stuck in the HTTP Ingress era? Every gRPC canary deployment requires EnvoyFilter or Istio VirtualService hacks? gRPC method matching, traffic splitting, and retries or circuit breaking simply can't be expressed with Ingress? GRPCRoute graduated to GA (v1) in Gateway API v1.1, so in 2026 it's time to manage gRPC traffic declaratively.
Key Takeaways
- GRPCRoute is the native Gateway API resource for gRPC routing with method/service-level matching
- 7 production patterns cover everything from basic routing to multi-cluster and observability
- Traffic splitting and canary deployment with native weight support — no annotation hacks needed
- BackendTrafficPolicy, a policy attachment supplied by the gateway implementation, decouples retry, circuit breaker, and timeout strategies from routes
- Multi-cluster gRPC routing via ServiceImport + MultiClusterService
Table of Contents
- Gateway API gRPC Core Concepts
- Pattern 1: GRPCRoute Basic Configuration
- Pattern 2: Traffic Splitting and Weighted Routing
- Pattern 3: Canary Deployment and Rollback
- Pattern 4: Header/Method Conditional Routing
- Pattern 5: Retry and Circuit Breaker Strategies
- Pattern 6: Multi-Cluster gRPC Routing
- Pattern 7: Observability and Distributed Tracing
- 5 Common Pitfalls and Solutions
- 10 Common Error Troubleshooting
- Advanced Optimization Tips
- Comparison: Gateway API vs Istio vs Ingress
- Recommended Online Tools
Gateway API gRPC Core Concepts
Why Does gRPC Routing Need a Dedicated Resource?
gRPC is built on HTTP/2, but its routing semantics are fundamentally different from HTTP:
HTTP routing: GET /api/v1/users/123
gRPC routing: package.Service/Method
com.example.api.UserService/GetUser
On the wire, a gRPC call is an HTTP/2 POST to /package.Service/Method, so Ingress path rules can only treat it as an opaque string. What you actually want is to match on the gRPC service and method, and Gateway API's GRPCRoute is designed specifically for this.
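To make the naming concrete, here is a minimal Go sketch that splits a request path into the service and method that a GRPCRoute match refers to. The function name SplitFullMethod is hypothetical and only illustrates the convention; it is not part of any gateway implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitFullMethod splits an HTTP/2 :path such as
// "/com.example.api.UserService/GetUser" into gRPC service and method.
// The boolean is false when the path does not have exactly that shape.
func SplitFullMethod(path string) (string, string, bool) {
	if !strings.HasPrefix(path, "/") {
		return "", "", false
	}
	svc, method, found := strings.Cut(path[1:], "/")
	if !found || svc == "" || method == "" || strings.Contains(method, "/") {
		return "", "", false
	}
	return svc, method, true
}

func main() {
	svc, method, ok := SplitFullMethod("/com.example.api.UserService/GetUser")
	fmt.Println(svc, method, ok)
}
```

A REST-style path such as /api/v1/users/123 is rejected by this helper, which is the practical difference between the two routing models.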
GRPCRoute Architecture
┌─────────────────────────────────────────────────────────┐
│ Gateway API gRPC │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │GatewayClass│──▶│ Gateway │──▶│ GRPCRoute │ │
│ │(Infra) │ │ (Listeners)│ │ (Routing Rules) │ │
│ └──────────┘ └────────────┘ └────────┬─────────┘ │
│ │ │
│ ┌─────────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ GRPCRoute Rule │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Matches │ │ Filters │ │ │
│ │ │ - service │ │ - Header │ │ │
│ │ │ - method │ │ modifier │ │ │
│ │ │ - headers │ │ - Request │ │ │
│ │ └─────────────┘ │ mirror │ │ │
│ │ └─────────────┘ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ BackendRefs │ │ │
│ │ │ - name: user-svc, port: 9090, weight: 80│ │ │
│ │ │ - name: user-svc-v2, port: 9090, weight:20│ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Core Resource Relationships
| Resource | Role | Responsibility |
|---|---|---|
| GatewayClass | Infrastructure Provider | Define gateway implementation type (Istio/Envoy Gateway) |
| Gateway | Cluster Operator | Define listeners, TLS, ports |
| GRPCRoute | App Developer | Define gRPC routing rules, traffic splitting |
| BackendTrafficPolicy | Platform Engineer | Define retry, circuit breaker, timeout policies |
Pattern 1: GRPCRoute Basic Configuration
Protobuf Definition
syntax = "proto3";
package com.example.api;
option go_package = "github.com/example/api/gen/go;apipb";
service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
  rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);
}
service OrderService {
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
  rpc ListOrders(ListOrdersRequest) returns (ListOrdersResponse);
}
message GetUserRequest {
  string user_id = 1;
}
message GetUserResponse {
  string user_id = 1;
  string name = 2;
  string email = 3;
}
Gateway and GRPCRoute Configuration
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: grpc-gateway
  namespace: infra
spec:
  gatewayClassName: istio
  listeners:
    - name: grpc
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: grpc-cert
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute
---
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "GetUser"
      backendRefs:
        - name: user-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.OrderService"
      backendRefs:
        - name: order-service
          port: 9090
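When several rules match the same call, as the GetUser rule and the service-wide rule above both do, the more specific match is expected to win. The Go sketch below is a simplified model of that idea, assuming one match per rule and ignoring hostnames and headers; the Rule type and the backend names in main are hypothetical.

```go
package main

import "fmt"

// Rule is a simplified stand-in for one GRPCRoute rule with a single match.
// An empty Method means "any method of this service".
type Rule struct {
	Service string
	Method  string
	Backend string
}

// Route returns the backend of the most specific rule matching the call.
// A rule naming both service and method beats a service-only rule.
func Route(rules []Rule, service, method string) (string, bool) {
	best, bestScore := "", -1
	for _, r := range rules {
		if r.Service != service || (r.Method != "" && r.Method != method) {
			continue // rule does not match this call
		}
		score := 0
		if r.Method != "" {
			score = 1
		}
		if score > bestScore {
			best, bestScore = r.Backend, score
		}
	}
	return best, bestScore >= 0
}

func main() {
	rules := []Rule{
		{Service: "com.example.api.UserService", Backend: "user-service"},
		{Service: "com.example.api.UserService", Method: "GetUser", Backend: "user-read"},
	}
	fmt.Println(Route(rules, "com.example.api.UserService", "GetUser"))
	fmt.Println(Route(rules, "com.example.api.UserService", "ListUsers"))
}
```

Real implementations also weigh hostname and header matches, so treat this only as a way to reason about which rule a request lands on.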
Go Server Implementation
package main

import (
	"context"
	"log"
	"net"
	"os"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/reflection"
)

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func (s *userServiceServer) ListUsers(ctx context.Context, req *pb.ListUsersRequest) (*pb.ListUsersResponse, error) {
	return &pb.ListUsersResponse{}, nil
}

func (s *userServiceServer) CreateUser(ctx context.Context, req *pb.CreateUserRequest) (*pb.CreateUserResponse, error) {
	return &pb.CreateUserResponse{}, nil
}

func main() {
	port := os.Getenv("GRPC_PORT")
	if port == "" {
		port = "9090"
	}
	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}
	srv := grpc.NewServer(grpc.Creds(creds))
	pb.RegisterUserServiceServer(srv, &userServiceServer{})
	// Standard gRPC health service, used by the readiness probes shown later.
	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)
	// Reflection lets grpcurl list and describe services without .proto files.
	reflection.Register(srv)
	lis, err := net.Listen("tcp", ":"+port)
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	log.Printf("gRPC server listening on :%s", port)
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
Verify Routing
kubectl get grpcroute -n app
# NAME PARENTREFS HOSTNAMES AGE
# user-service-route grpc-gateway ["grpc.example.com"] 5m
kubectl get grpcroute user-service-route -n app -o yaml | grep -A5 "accepted"
# conditions:
# - lastTransitionTime: "2026-06-16T10:00:00Z"
# message: Route was accepted
# reason: Accepted
# status: "True"
grpcurl -insecure -d '{"user_id": "123"}' \
  grpc.example.com:443 com.example.api.UserService/GetUser
Pattern 2: Traffic Splitting and Weighted Routing
Scenario: UserService v1/v2 Weighted Split
┌──────────────┐
│ GRPCRoute │
│ weight: 80 │──────────▶ UserService v1 (9090)
│ weight: 20 │──────────▶ UserService v2 (9090)
└──────────────┘
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-weighted
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-v1
          port: 9090
          weight: 80
        - name: user-service-v2
          port: 9090
          weight: 20
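Weights are proportional shares rather than percentages, so 80 and 20 behave the same as 4 and 1. A minimal Go sketch of proportional selection follows, assuming the gateway picks a number in the range [0, total weight) for each request; the Backend type and Pick function are illustrative names, not gateway code.

```go
package main

import "fmt"

// Backend pairs a backendRef name with its weight.
type Backend struct {
	Name   string
	Weight int
}

// Pick maps a number n in [0, total weight) onto a backend, so each backend
// owns a slice of the range proportional to its weight. It returns "" when n
// falls outside the range.
func Pick(backends []Backend, n int) string {
	for _, b := range backends {
		if n < b.Weight {
			return b.Name
		}
		n -= b.Weight
	}
	return ""
}

func main() {
	backends := []Backend{
		{Name: "user-service-v1", Weight: 80},
		{Name: "user-service-v2", Weight: 20},
	}
	counts := map[string]int{}
	// Walk the whole range once instead of sampling randomly.
	for n := 0; n < 100; n++ {
		counts[Pick(backends, n)]++
	}
	fmt.Println(counts["user-service-v1"], counts["user-service-v2"])
}
```

A backend with weight 0 never owns any part of the range, which is why setting a weight to 0 drains it without removing the backendRef.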
Dynamic Weight Adjustment
kubectl patch grpcroute user-service-weighted -n app --type='json' \
-p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 60},
{"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 40}]'
kubectl patch grpcroute user-service-weighted -n app --type='json' \
-p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 0},
{"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 100}]'
Weight Routing Verification Script
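#!/usr/bin/env bash
# Assumes each backend reports its version somewhere in the response body;
# the stub GetUser handler shown earlier does not do this.
v1Count=0
v2Count=0
errCount=0
total=100
for i in $(seq 1 "$total"); do
  if ! response=$(grpcurl -insecure -d '{"user_id": "123"}' \
      grpc.example.com:443 com.example.api.UserService/GetUser 2>&1); then
    errCount=$((errCount + 1))  # failed calls are not attributed to v2
  elif echo "$response" | grep -q "v1"; then
    v1Count=$((v1Count + 1))
  else
    v2Count=$((v2Count + 1))
  fi
done
echo "v1: $v1Count/$total ($((v1Count * 100 / total))%)"
echo "v2: $v2Count/$total ($((v2Count * 100 / total))%)"
echo "errors: $errCount/$total"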
Pattern 3: Canary Deployment and Rollback
Complete Canary Deployment Flow
┌───────────────────────────────────────────────────────────┐
│ gRPC Canary Deployment Flow │
│ │
│ Step 1 Step 2 Step 3 Step 4 │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │100%v1│────▶│95%v1 │────▶│80%v1 │────▶│50%v1 │ │
│ │ 0%v2│ │ 5%v2 │ │20%v2 │ │50%v2 │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ │
│ │ │
│ ┌───────────────┘ │
│ ▼ │
│ Step 5 (Success) Step 5 (Failure) │
│ ┌──────┐ ┌──────┐ │
│ │ 0%v1 │ │100%v1│◀── Immediate rollback │
│ │100%v2│ │ 0%v2│ │
│ └──────┘ └──────┘ │
└───────────────────────────────────────────────────────────┘
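The flow above boils down to one decision per step: advance the canary weight, or roll back. The Go sketch below models that decision under the assumption that a single error-rate threshold drives it; NextWeight and the threshold value are hypothetical and stand in for whatever analysis your rollout tooling performs.

```go
package main

import "fmt"

// steps mirrors the canary (v2) percentages in the flow above.
var steps = []int{0, 5, 20, 50, 100}

// NextWeight returns the canary weight for the next step. If the observed
// error rate exceeds maxErrorRate the canary is rolled back to 0; once the
// final step is reached the weight stays at 100.
func NextWeight(current int, errorRate, maxErrorRate float64) int {
	if errorRate > maxErrorRate {
		return 0 // immediate rollback: all traffic returns to v1
	}
	for _, s := range steps {
		if s > current {
			return s
		}
	}
	return 100
}

func main() {
	w := 0
	for i := 0; i < 4; i++ {
		w = NextWeight(w, 0.01, 0.05)
		fmt.Print(w, " ")
	}
	fmt.Println(NextWeight(50, 0.20, 0.05))
}
```

The stable weight is always 100 minus the returned value, which is what the weight patches in this pattern write into the GRPCRoute.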
Canary Deployment GRPCRoute
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-canary
  namespace: app
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-v1
          port: 9090
          weight: 95
        - name: user-service-v2
          port: 9090
          weight: 5
          # Per-backend filter: only responses served by the canary get the header.
          # A rule-level filter would tag v1 responses as canary-v2 as well.
          filters:
            - type: ResponseHeaderModifier
              responseHeaderModifier:
                add:
                  - name: X-Backend-Version
                    value: "canary-v2"
Argo Rollouts Integration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: user-service-rollout
namespace: app
spec:
replicas: 10
strategy:
canary:
canaryService: user-service-v2
stableService: user-service-v1
trafficRouting:
plugins:
argoproj-labs/gatewayAPI:
grpcRoute:
name: user-service-canary
namespace: app
steps:
- setWeight: 5
- pause: { duration: 5m }
- setWeight: 20
- pause: { duration: 10m }
- setWeight: 50
- pause: { duration: 15m }
- setWeight: 80
- pause: { duration: 10m }
- setWeight: 100
- pause: {}
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
spec:
containers:
- name: user-service
image: example/user-service:v2.0.0
ports:
- containerPort: 9090
readinessProbe:
exec:
command: ["/bin/grpc_health_probe", "-addr=:9090"]
initialDelaySeconds: 5
periodSeconds: 10
One-Click Rollback
kubectl patch grpcroute user-service-canary -n app --type='json' \
-p='[{"op": "replace", "path": "/spec/rules/0/backendRefs/0/weight", "value": 100},
{"op": "replace", "path": "/spec/rules/0/backendRefs/1/weight", "value": 0}]'
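# Requires the Argo Rollouts kubectl plugin; plain "kubectl rollout undo" does not handle Rollout resources
kubectl argo rollouts abort user-service-rollout -n app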
Pattern 4: Header/Method Conditional Routing
gRPC Header-Based Routing
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-header-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
          headers:
            - type: Exact
              name: x-env
              value: "staging"
      backendRefs:
        - name: user-service-staging
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
          headers:
            - type: Exact
              name: x-env
              value: "canary"
      backendRefs:
        - name: user-service-v2
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-v1
          port: 9090
Method-Based Fine-Grained Routing
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-method-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "GetUser"
      backendRefs:
        - name: user-read-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "CreateUser"
      backendRefs:
        - name: user-write-service
          port: 9090
    - matches:
        - method:
            service: "com.example.api.UserService"
            method: "ListUsers"
      backendRefs:
        - name: user-read-service
          port: 9090
          weight: 90
        - name: user-read-service-v2
          port: 9090
          weight: 10
Header Modification Filter
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-header-modify
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: x-gateway-source
                value: "gateway-api"
            # set writes this literal string on every request; it does not generate a trace ID
            set:
              - name: x-trace-id
                value: "auto-generated"
            remove:
              - "x-internal-token"
      backendRefs:
        - name: user-service
          port: 9090
Go Client with Custom Headers
package main

import (
	"context"
	"log"
	"time"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/metadata"
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("/etc/certs/ca.crt", "grpc.example.com")
	if err != nil {
		log.Fatalf("failed to load TLS creds: %v", err)
	}
	// grpc.NewClient (grpc-go v1.63+) replaces the deprecated grpc.Dial and connects lazily.
	conn, err := grpc.NewClient("grpc.example.com:443", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer conn.Close()
	client := pb.NewUserServiceClient(conn)
	// Bound the call with a context deadline instead of the deprecated grpc.WithTimeout.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	// Outgoing metadata travels as HTTP/2 headers, which GRPCRoute header matches inspect.
	ctx = metadata.AppendToOutgoingContext(ctx,
		"x-env", "canary",
		"x-request-id", "req-12345",
	)
	resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "123"})
	if err != nil {
		log.Fatalf("GetUser failed: %v", err)
	}
	log.Printf("Response: %+v", resp)
}
Pattern 5: Retry and Circuit Breaker Strategies
BackendTrafficPolicy Configuration
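# As of Gateway API v1.3 there is no BackendTrafficPolicy kind in the
# gateway.networking.k8s.io group. Policies like this are supplied by the gateway
# implementation (Envoy Gateway, for example, serves its own under
# gateway.envoyproxy.io/v1alpha1), so check the apiVersion and field names
# against the CRDs installed in your cluster.
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTrafficPolicy
metadata:
  name: user-service-retry
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-route
  retry:
    retryOn:
      - 5xx
      - connect-failure
      - retriable-status-codes
    # 503 is an HTTP status; 14 is the gRPC status code UNAVAILABLE
    retryOnStatusCodes:
      - 503
      - 14
    attempts: 3
    backoff:
      defaultDuration: "100ms"
      maxDuration: "5s"
  connectionPool:
    maxConnections: 1000
    maxPendingRequests: 500
    maxRequestsPerConnection: 100
  circuitBreaker:
    consecutiveFailures: 5
    consecutiveGatewayFailures: 3
    interval: "30s"
    baseEjectionTime: "30s"
    maxEjectionPercent: 50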
Timeout Policy
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTrafficPolicy
metadata:
  name: user-service-timeout
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-route
  timeout:
    tcp:
      connectTimeout: "5s"
    http:
      requestTimeout: "30s"
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-api-key
          limit:
            requestsPerUnit: 1000
            unit: Minute
Go Server Health Check and Graceful Shutdown
package main

import (
	"context"
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	pb "github.com/example/api/gen/go"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
)

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func main() {
	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}
	srv := grpc.NewServer(grpc.Creds(creds))
	pb.RegisterUserServiceServer(srv, &userServiceServer{})
	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)
	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// done is closed once draining has finished, so main can wait for it.
	done := make(chan struct{})
	go func() {
		defer close(done)
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
		<-sigCh
		// Fail health checks first so the gateway stops sending new requests.
		hs.SetServingStatus("", grpc_health_v1.HealthCheckResponse_NOT_SERVING)
		hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_NOT_SERVING)
		log.Println("graceful shutdown: draining connections...")
		// Force-stop if in-flight RPCs have not finished within 15 seconds.
		timer := time.AfterFunc(15*time.Second, srv.Stop)
		defer timer.Stop()
		srv.GracefulStop()
	}()
	log.Printf("gRPC server listening on :9090")
	// Serve returns nil once GracefulStop or Stop has been called.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
	<-done
	log.Println("server shutdown complete")
}
Circuit Breaker State Monitoring
kubectl get backendtrafficpolicy -n app
# NAME TARGETKIND TARGETNAME AGE
# user-service-retry GRPCRoute user-service-route 5m
# user-service-timeout GRPCRoute user-service-route 5m
istioctl dashboard envoy user-service-v1-xxxx.app
Pattern 6: Multi-Cluster gRPC Routing
Multi-Cluster Architecture
┌─────────────────────────────────────────────────────────────┐
│ Multi-Cluster gRPC Routing Architecture │
│ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Cluster A │ │ Cluster B │ │
│ │ us-west-1 │ │ us-east-1 │ │
│ │ │ │ │ │
│ │ ┌─────────┐│ │┌─────────┐ │ │
│ │ │Gateway ││◀──────▶││Gateway │ │ │
│ │ │(East-West)│ ││(East-West)│ │ │
│ │ └─────────┘│ │└─────────┘ │ │
│ │ │ │ │ │
│ │ ┌─────────┐│ │┌─────────┐ │ │
│ │ │UserSvc ││ ││UserSvc │ │ │
│ │ │weight:60││ ││weight:40│ │ │
│ │ └─────────┘│ │└─────────┘ │ │
│ └─────────────┘ └─────────────┘ │
│ ▲ ▲ │
│ │ │ │
│ └───────┬───────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ GRPCRoute │ │
│ │ MultiCluster│ │
│ │ Service │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
ServiceImport Configuration
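# ServiceImport belongs to the Multi-Cluster Services API (multicluster.x-k8s.io),
# not to Gateway API. It is normally generated by the MCS controller from a
# ServiceExport; it is shown here for reference.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: user-service-import
  namespace: app
spec:
  type: ClusterSetIP
  ports:
    - port: 9090
      protocol: TCP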
---
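# MultiClusterService is not defined by upstream Gateway API or the MCS API.
# Treat this resource as implementation-specific and check which CRD, if any,
# your multi-cluster controller provides for per-cluster weights.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: MultiClusterService
metadata:
  name: user-service-global
  namespace: app
spec:
  serviceImport:
    name: user-service-import
    namespace: app
  clusterBackends:
    - cluster: us-west-1
      weight: 60
    - cluster: us-east-1
      weight: 40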
Cross-Cluster GRPCRoute
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-multi-cluster
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        # ServiceImport lives in the MCS API group, not in gateway.networking.k8s.io
        - group: multicluster.x-k8s.io
          kind: ServiceImport
          name: user-service-import
          port: 9090
          weight: 100
Failover Configuration
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTrafficPolicy
metadata:
  name: user-service-failover
  namespace: app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: GRPCRoute
    name: user-service-multi-cluster
  failover:
    strategy: RegionFailover
    regionFailover:
      - from: us-west-1
        to: us-east-1
      - from: us-east-1
        to: us-west-1
  retry:
    retryOn:
      - 5xx
      - connect-failure
    attempts: 2
    backoff:
      defaultDuration: "200ms"
      maxDuration: "3s"
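The from/to pairs above describe a simple lookup: stay in the preferred region while it is healthy, otherwise move to its configured partner. A Go sketch of that lookup follows, assuming region health is already known; PickRegion and the failover map are hypothetical names used only for illustration.

```go
package main

import "fmt"

// failover mirrors the regionFailover pairs in the policy above.
var failover = map[string]string{
	"us-west-1": "us-east-1",
	"us-east-1": "us-west-1",
}

// PickRegion returns the preferred region when it is healthy, otherwise its
// configured failover target. The boolean is false when neither region can
// take traffic.
func PickRegion(preferred string, healthy map[string]bool) (string, bool) {
	if healthy[preferred] {
		return preferred, true
	}
	if target, found := failover[preferred]; found && healthy[target] {
		return target, true
	}
	return "", false
}

func main() {
	healthy := map[string]bool{"us-west-1": false, "us-east-1": true}
	fmt.Println(PickRegion("us-west-1", healthy))
	fmt.Println(PickRegion("us-east-1", healthy))
}
```

Because the pairs are symmetric, an outage in both regions leaves no target at all, which is the case the retry settings cannot help with.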
Pattern 7: Observability and Distributed Tracing
OpenTelemetry Integration
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: ObservabilityPolicy
metadata:
name: grpc-observability
namespace: infra
spec:
targetRef:
group: gateway.networking.k8s.io
kind: Gateway
name: grpc-gateway
tracing:
provider:
type: OTel
backendRef:
group: ""
kind: Service
name: otel-collector
port: 4317
samplingRate: 10
customTags:
- name: cluster
literal:
value: "us-west-1"
- name: environment
env:
name: DEPLOY_ENV
accessLog:
type: OpenTelemetry
backendRef:
group: ""
kind: Service
name: otel-collector
port: 4317
format:
type: JSON
gRPC Metrics Collection
apiVersion: v1
kind: ConfigMap
metadata:
  name: grpc-metrics-config
  namespace: infra
data:
  otel-collector-config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024
      filter:
        error_mode: ignore
        traces:
          span:
            # The filter processor drops spans that match, so this keeps only gRPC spans
            - 'attributes["rpc.system"] != "grpc"'
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
        namespace: grpc_gateway
      otlp:
        endpoint: "jaeger-collector:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          # Filter before batching so dropped spans are never batched
          processors: [filter, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: infra
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          ports:
            - containerPort: 4317
            - containerPort: 4318
            - containerPort: 8889
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib/config.yaml
              subPath: otel-collector-config
      volumes:
        - name: config
          configMap:
            name: grpc-metrics-config
Prometheus gRPC Alerting Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: grpc-gateway-rules
  namespace: infra
spec:
  groups:
    - name: grpc_gateway.rules
      rules:
        - alert: GRPCHighErrorRate
          # Aggregate by grpc_method so the label used in the description exists
          expr: |
            sum by (grpc_method) (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
              /
            sum by (grpc_method) (rate(grpc_server_handled_total[5m]))
              > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "gRPC error rate exceeds 5%"
            description: "gRPC method {{ $labels.grpc_method }} error rate is {{ $value | humanizePercentage }}"
        - alert: GRPCHighLatency
          expr: |
            histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket[5m])) by (le, grpc_method))
              > 1.0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "gRPC P99 latency exceeds 1s"
        - alert: GRPCCircuitBreakerOpen
          # Envoy reports circuit breaker state as gauges (1 while open), so compare the gauge directly
          expr: |
            envoy_cluster_circuit_breakers_default_rq_open > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "gRPC circuit breaker is open"
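The error-rate alert divides failed calls by all calls over the window and compares the result with 0.05. The same arithmetic in Go, as a sketch with made-up sample counts; ErrorRatio is a hypothetical helper and the code names are only examples of non-OK gRPC statuses.

```go
package main

import "fmt"

// ErrorRatio returns the share of failed calls given per-code counts of
// handled RPCs, treating every code other than "OK" as an error.
func ErrorRatio(handled map[string]float64) float64 {
	var total, failed float64
	for code, n := range handled {
		total += n
		if code != "OK" {
			failed += n
		}
	}
	if total == 0 {
		return 0 // no traffic: nothing to alert on
	}
	return failed / total
}

func main() {
	// Sample counts for one 5-minute window (illustrative values).
	window := map[string]float64{"OK": 940, "Unavailable": 50, "Internal": 10}
	ratio := ErrorRatio(window)
	fmt.Printf("%.2f firing=%v\n", ratio, ratio > 0.05)
}
```

Here 60 of 1000 calls failed, so the ratio is 0.06 and the condition holds; the alert itself only fires after the condition has held for the full 5 minutes.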
Go Server OpenTelemetry Instrumentation
package main

import (
	"context"
	"log"
	"net"
	"net/http"

	pb "github.com/example/api/gen/go"
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	"google.golang.org/grpc/health/grpc_health_v1"
)

func initTracer() func(context.Context) error {
	exporter, err := otlptracegrpc.New(context.Background(),
		otlptracegrpc.WithEndpoint("otel-collector.infra:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("failed to create OTel exporter: %v", err)
	}
	// No WithResource option: the SDK then uses its default resource, and the
	// service name can be supplied through the OTEL_SERVICE_NAME variable.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	// Propagate W3C trace context and baggage across gRPC hops.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
	return tp.Shutdown
}

type userServiceServer struct {
	pb.UnimplementedUserServiceServer
}

func (s *userServiceServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
	return &pb.GetUserResponse{
		UserId: req.UserId,
		Name:   "Zhang San",
		Email:  "zhang@example.com",
	}, nil
}

func main() {
	shutdown := initTracer()
	defer shutdown(context.Background())
	creds, err := credentials.NewServerTLSFromFile("/etc/certs/tls.crt", "/etc/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS certs: %v", err)
	}
	srv := grpc.NewServer(
		grpc.Creds(creds),
		// The stats handler creates a server span for every RPC.
		grpc.StatsHandler(otelgrpc.NewServerHandler()),
	)
	pb.RegisterUserServiceServer(srv, &userServiceServer{})
	hs := health.NewServer()
	grpc_health_v1.RegisterHealthServer(srv, hs)
	hs.SetServingStatus("com.example.api.UserService", grpc_health_v1.HealthCheckResponse_SERVING)
	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	go func() {
		// Side HTTP listener; this example registers no handlers on the default mux.
		if err := http.ListenAndServe(":2222", nil); err != nil {
			log.Printf("http listener stopped: %v", err)
		}
	}()
	log.Printf("gRPC server with OTel tracing listening on :9090")
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
5 Common Pitfalls and Solutions
Pitfall 1: GRPCRoute Method Match Format Error
Symptom: GRPCRoute created, but gRPC requests never match routing rules, returning UNIMPLEMENTED.
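Cause: The service field must be the fully qualified name (package.Service) and the method field the bare RPC name. A URL-style path such as /com.example.api.UserService/GetUser in either field never matches.
rules:
  - matches:
      - method:
          service: "com.example.api.UserService"
          method: "GetUser"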
Solution: Check the protobuf package and service definitions. Ensure the service field in GRPCRoute matches the proto exactly. Use grpcurl list to verify service names.
Pitfall 2: Gateway Listener Not Allowing GRPCRoute
Symptom: GRPCRoute status shows Accepted: False with reason NotAllowedByListeners.
Cause: Gateway's allowedRoutes.kinds does not include GRPCRoute.
listeners:
  - name: grpc
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
      kinds:
        - group: gateway.networking.k8s.io
          kind: GRPCRoute
Solution: Explicitly declare kinds to allow GRPCRoute in the Gateway listener.
Pitfall 3: gRPC Service Uses HTTP/2 but Gateway Listens on HTTP
Symptom: gRPC client connection fails with protocol error.
Cause: gRPC requires HTTP/2, and the listener protocol does not match how the client connects (TLS versus plaintext h2c).
Solution: Use an HTTPS listener for gRPC over TLS, or an HTTP listener without a tls block for plaintext h2c.
listeners:
  # Plaintext gRPC (h2c): an HTTP listener takes no tls block
  - name: grpc-plaintext
    port: 80
    protocol: HTTP
  - name: grpc-tls
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
        - name: grpc-cert
Pitfall 4: BackendTrafficPolicy Naming Conflict with GRPCRoute
Symptom: Multiple BackendTrafficPolicies target the same GRPCRoute, only the last one takes effect.
Cause: Each GRPCRoute can only be associated with one BackendTrafficPolicy.
Solution: Merge policies into a single BackendTrafficPolicy, or split GRPCRoutes by routing.
Pitfall 5: gRPC Health Check and Gateway Readiness Probe Mismatch
Symptom: Pod is Running but Gateway routing doesn't work, traffic is rejected.
Cause: gRPC service starts slowly, health check hasn't passed, but K8s readiness probe has already passed.
Solution: Ensure gRPC health check service uses the same port and logic as the K8s readiness probe.
readinessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:9090", "-service=com.example.api.UserService"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:9090"]
  initialDelaySeconds: 30
  periodSeconds: 10
10 Common Error Troubleshooting
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | GRPCRoute not accepted: NoMatchingParent | parentRefs references non-existent Gateway | Check Gateway name, namespace, sectionName |
| 2 | UNIMPLEMENTED: unknown method | GRPCRoute method match format error | Use grpcurl list to verify service name format as package.Service |
| 3 | protocol error: HTTP/2 required | Gateway listener protocol mismatch | Use HTTPS for gRPC over TLS, HTTP for h2c |
| 4 | connection refused: backend unhealthy | Backend gRPC service not ready | Check Pod status and health probe, confirm gRPC health probe passes |
| 5 | GRPCRoute condition ResolvedRefs=False | backendRef Service doesn't exist | Check Service name, namespace, port |
| 6 | certificate not found for TLS | TLS cert missing or cross-namespace | Ensure cert is in Gateway's namespace, use cert-manager |
| 7 | weight sum is zero | All backendRef weights are 0 | At least one backendRef weight must be > 0 |
| 8 | BackendTrafficPolicy conflict | Multiple policies target the same Route | Merge policies or split GRPCRoute |
| 9 | ServiceImport DNS resolution failed | Multi-cluster control plane not connected | Check east-west gateway, cross-cluster network connectivity |
| 10 | circuit breaker open: ejection threshold exceeded | Backend consecutive failures triggered circuit breaker | Check backend health, adjust circuit breaker thresholds |
Advanced Optimization Tips
1. gRPC Keepalive Optimization
apiVersion: v1
kind: ConfigMap
metadata:
  name: grpc-keepalive-config
  namespace: app
data:
  keepalive.json: |
    {
      "keepalive": {
        "maxConnectionIdle": "300s",
        "maxConnectionAge": "1800s",
        "maxConnectionAgeGrace": "30s",
        "time": "60s",
        "timeout": "20s"
      }
    }
2. Health-Based Traffic Switching
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: user-service-health-route
  namespace: app
spec:
  parentRefs:
    - name: grpc-gateway
      namespace: infra
      sectionName: grpc
  hostnames:
    - "grpc.example.com"
  rules:
    - matches:
        - method:
            service: "com.example.api.UserService"
      backendRefs:
        - name: user-service-primary
          port: 9090
          weight: 100
        # Weight 0 keeps the secondary on standby; switching traffic still means updating the weights
        - name: user-service-secondary
          port: 9090
          weight: 0
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: x-failover-enabled
                value: "true"
3. Gateway API gRPC and HTTPRoute Coexistence
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: multi-protocol-gateway
  namespace: infra
spec:
  gatewayClassName: istio
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: HTTPRoute
    - name: grpc-https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute
    - name: grpc-internal
      port: 15443
      protocol: HTTPS
      tls:
        # HTTPS listeners terminate TLS; Passthrough is only valid on TLS listeners with TLSRoute
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              internal-grpc: "true"
        kinds:
          - group: gateway.networking.k8s.io
            kind: GRPCRoute
4. gRPC Reflection and Debugging
# Port 443 terminates TLS, so -plaintext would fail; -insecure skips certificate verification
grpcurl -insecure grpc.example.com:443 list
# com.example.api.UserService
# com.example.api.OrderService
# grpc.health.v1.Health
# grpc.reflection.v1.ServerReflection
grpcurl -insecure grpc.example.com:443 list com.example.api.UserService
# com.example.api.UserService.GetUser
# com.example.api.UserService.ListUsers
# com.example.api.UserService.CreateUser
grpcurl -insecure grpc.example.com:443 describe com.example.api.UserService.GetUser
# com.example.api.UserService.GetUser is a method:
# rpc GetUser(.com.example.api.GetUserRequest) returns (.com.example.api.GetUserResponse) {}
Comparison: Gateway API vs Istio vs Ingress
| Dimension | Ingress + nginx-ingress | Istio VirtualService | Gateway API GRPCRoute |
|---|---|---|---|
| gRPC method matching | Not supported, annotation hack | Native support | Native GRPCRoute.method |
| Traffic splitting | Annotation canary-weight | weight field | Native weight field |
| Canary deployment | Annotation-based, not portable across controllers | VirtualService + DestinationRule | GRPCRoute + BackendTrafficPolicy |
| Header routing | Limited annotation support | Complete match conditions | Native headers matching |
| Retry/Circuit breaker | Not supported | DestinationRule | BackendTrafficPolicy |
| Multi-cluster | Not supported | ServiceEntry + WorkloadEntry | ServiceImport + MultiClusterService |
| Observability | External integration needed | Native telemetry | ObservabilityPolicy |
| Role separation | None | Partial | Complete three-role model |
| Standardization | Varies by controller | Istio-specific | Kubernetes standard |
| Learning curve | Low | High | Medium |
| Production readiness | Mature | Mature | GA since Gateway API v1.1 |
Recommended Online Tools
- JSON/YAML Formatter: /en/json/format
- Base64 Encode/Decode (certificate handling): /en/encode/base64
- curl to Code (API testing): /en/dev/curl-to-code