Go GraphQL Federation: 6 Production Patterns from Subgraph to Supergraph
When Monolithic GraphQL Schema Meets Team Boundaries: The Microservice GraphQL Dilemma
An e-commerce platform's GraphQL Schema balloons to 3,000 lines. The user team, product team, and order team all modify the same schema. Every release requires coordination — one team's breaking change takes down the entire API. Worse, cross-service N+1 queries push response times from 50ms to 3s — a user queries order lists, each order needs product details, each product needs inventory status, and 100 orders means 300 downstream calls.
This isn't hypothetical. When your microservice architecture is already split but the GraphQL layer remains monolithic, team autonomy and API performance become irreconcilable conflicts. GraphQL Federation exists to solve this — each service owns its GraphQL Schema (subgraph), composed into a unified API (supergraph) via a gateway, while preventing N+1 queries and cross-team coupling.
Core Concepts Reference
| Concept | Purpose | Key Features | Typical Use Case |
|---|---|---|---|
| Federation | Compose multiple GraphQL services into a unified API | Transparent to clients, each service deploys independently | Unified API layer in microservice architecture |
| Subgraph | A single service's GraphQL Schema | Owns independent types and resolvers, declares entities via @key | User service, product service, order service |
| Supergraph | Complete schema composed from all subgraphs | Auto-synthesized by gateway, clients only see the supergraph | Unified API entry point |
| Entity | Type shared across subgraphs | Identified by @key, multiple subgraphs can contribute fields | User, Product, Order and other core domain objects |
| @key | Declares unique identifier fields for an entity | Supports composite keys, multiple @keys for alternate identifiers | @key(fields: "id") or @key(fields: "sku warehouseId") |
| Gateway | Federation query routing and execution engine | Query planning, batch entity resolution, caching | Apollo Router, Apollo Gateway |
| Schema Stitching | Manually composing multiple GraphQL Schemas | More flexible but requires manual conflict resolution | Custom composition logic, non-standard federation scenarios |
5 Challenges of GraphQL Federation Architecture
Challenge 1: Unclear Entity Boundary Definition
The user service has User's name and email, while the order service also has User but only cares about id and order list. If all User fields live in the user service, the order service must make cross-service calls every time. If scattered across services, entity ownership and consistency become problematic.
Challenge 2: N+1 Queries Amplified at the Federation Layer
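A client queries { orders { user { name } } }. The gateway first fetches orders from the order service, then resolves the User entity for each order's userId from the user service. A federation gateway normally sends those references to the user service in one batched _entities request, but inside the subgraph the entity resolver still runs once per reference. With a naive resolver, 100 orders means 100 separate user lookups: a performance disaster.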
Challenge 3: Schema Evolution and Compatibility
The product service wants to add a required field to Product, but the order service's Product reference may not be compatible. A subgraph's breaking change can affect the entire supergraph, but who performs global compatibility checks?
Challenge 4: Authentication and Authorization Passthrough
JWT tokens need to propagate from the gateway to every subgraph. Different subgraphs may have different permission models. The user service needs user:read permission, the order service needs order:read — how to handle this uniformly at the gateway layer?
Challenge 5: Observability and Error Tracing
A single query may involve 3 subgraphs. When a query fails, which subgraph produced the error? Where is the latency bottleneck? How does distributed tracing propagate correctly through the GraphQL layer?
6 Production-Grade Federation Patterns
Pattern 1: Subgraph Service Definition — gqlgen Foundation
The subgraph is the fundamental unit of federation. Use gqlgen to generate the GraphQL service, declare federation directives, and define entity types.
GraphQL Schema (users.graphqls):
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type User @key(fields: "id") @key(fields: "email") {
  id: ID!
  email: String!
  name: String!
  avatar: String
  createdAt: String!
}

# Order, OrderItem, and OrderStatus are owned by the orders subgraph (Pattern 3),
# which also contributes User.orders. Redefining them here with different fields
# or mismatched @shareable declarations would fail composition.

type Query {
  user(id: ID!): User
  users(limit: Int!, offset: Int!): [User!]!
}
Go Resolver Implementation:
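package graph

import (
	"context"
	"fmt"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/99designs/gqlgen/graphql/handler/transport"
)

// UserRepository and OrderRepository list the data-access methods that the
// resolvers and loaders in this article call.
type UserRepository interface {
	FindByID(ctx context.Context, id string) (*User, error)
	FindByEmail(ctx context.Context, email string) (*User, error)
	FindByIDs(ctx context.Context, ids []string) ([]*User, error)
	List(ctx context.Context, limit, offset int) ([]*User, error)
}

type OrderRepository interface {
	FindByIDs(ctx context.Context, ids []string) ([]*Order, error)
}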
type User struct {
ID string `json:"id"`
Email string `json:"email"`
Name string `json:"name"`
Avatar string `json:"avatar,omitempty"`
CreatedAt string `json:"createdAt"`
}
type Order struct {
ID string `json:"id"`
UserID string `json:"userId"`
Items []OrderItem `json:"items"`
Total float64 `json:"total"`
Status string `json:"status"`
}
type OrderItem struct {
ProductID string `json:"productId"`
Quantity int `json:"quantity"`
Price float64 `json:"price"`
}
type Resolver struct {
userRepo UserRepository
orderRepo OrderRepository
}
func NewResolver(userRepo UserRepository, orderRepo OrderRepository) *Resolver {
return &Resolver{userRepo: userRepo, orderRepo: orderRepo}
}
func (r *Resolver) User(ctx context.Context, id string) (*User, error) {
user, err := r.userRepo.FindByID(ctx, id)
if err != nil {
return nil, fmt.Errorf("user not found: %w", err)
}
return user, nil
}
func (r *Resolver) Users(ctx context.Context, limit int, offset int) ([]*User, error) {
return r.userRepo.List(ctx, limit, offset)
}
func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
return r.userRepo.FindByID(ctx, id)
}
func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
return r.userRepo.FindByEmail(ctx, email)
}
func NewGraphQLServer(resolver *Resolver) *handler.Server {
srv := handler.New(NewExecutableSchema(Config{Resolvers: resolver}))
srv.AddTransport(transport.POST{})
srv.AddTransport(transport.GET{})
srv.Use(extension.Introspection{})
return srv
}
gqlgen Configuration (gqlgen.yml):
schema:
  - users.graphqls
exec:
  filename: graph/generated.go
model:
  filename: graph/model/models_gen.go
resolver:
  filename: graph/resolver.go
  type: Resolver
federation:
  filename: graph/federation.go
  package: graph
  version: 2
Pattern 2: Entity Resolution with @key Directive — Cross-Service Type Stitching
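@key declares an entity's identifier fields. The gateway resolves entities across subgraphs by sending each subgraph an _entities query that carries the key fields of the objects it needs. Apollo Server answers that query through __resolveReference functions; gqlgen instead generates one entity resolver method per key, such as FindProductByID. This is the core mechanism of federation.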
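To make the mechanism concrete, here is a minimal sketch of what a subgraph does with an _entities request: each representation carries `__typename` plus the fields of one @key, and the subgraph maps it to a lookup. The names `resolveEntities`, `findUser`, and `userStore` are hypothetical, and the data is in memory; with gqlgen the real dispatch code is generated for you.

```go
package main

import (
	"errors"
	"fmt"
)

// User is a simplified entity; the real model is generated by gqlgen.
type User struct {
	ID, Email, Name string
}

// userStore is a hypothetical in-memory stand-in for the user repository.
var userStore = []User{
	{ID: "u1", Email: "ada@example.com", Name: "Ada"},
	{ID: "u2", Email: "bob@example.com", Name: "Bob"},
}

func findUser(match func(User) bool) (*User, error) {
	for i := range userStore {
		if match(userStore[i]) {
			return &userStore[i], nil
		}
	}
	return nil, errors.New("user not found")
}

// resolveEntities imitates the handling of an _entities query: every
// representation names a type and carries the fields of one @key.
func resolveEntities(reps []map[string]any) ([]*User, error) {
	out := make([]*User, 0, len(reps))
	for _, rep := range reps {
		if rep["__typename"] != "User" {
			return nil, fmt.Errorf("unknown entity type %v", rep["__typename"])
		}
		var (
			u   *User
			err error
		)
		switch {
		case rep["id"] != nil: // matches @key(fields: "id")
			id, _ := rep["id"].(string)
			u, err = findUser(func(c User) bool { return c.ID == id })
		case rep["email"] != nil: // matches @key(fields: "email")
			email, _ := rep["email"].(string)
			u, err = findUser(func(c User) bool { return c.Email == email })
		default:
			err = errors.New("representation has no known key field")
		}
		if err != nil {
			return nil, err
		}
		out = append(out, u)
	}
	return out, nil
}

func main() {
	users, err := resolveEntities([]map[string]any{
		{"__typename": "User", "id": "u2"},
		{"__typename": "User", "email": "ada@example.com"},
	})
	if err != nil {
		panic(err)
	}
	for _, u := range users {
		fmt.Println(u.ID, u.Name)
	}
}
```

The loop calls one lookup per representation, which is exactly the shape that turns into N+1 database queries unless the lookups are batched.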
Product Subgraph Schema (products.graphqls):
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])

type Product @key(fields: "id") @key(fields: "sku") {
  id: ID!
  sku: String!
  name: String!
  description: String
  price: Float!
  inventory: Int!
  category: Category!
  # @provides only applies to fields that are @external in this subgraph.
  # Review is defined locally, so no directive is needed here.
  reviews: [Review!]!
}

type Category @key(fields: "id") {
  id: ID!
  name: String!
  parent: Category
}

type Review {
  id: ID!
  userId: ID!
  productId: ID!
  rating: Int!
  comment: String
}

type Query {
  product(id: ID!): Product
  products(categoryId: ID, limit: Int, offset: Int): [Product!]!
  category(id: ID!): Category
}
Go Entity Resolver:
package graph
import (
"context"
"fmt"
)
type Product struct {
ID string `json:"id"`
SKU string `json:"sku"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
Price float64 `json:"price"`
Inventory int `json:"inventory"`
CategoryID string `json:"categoryId"`
}
type Category struct {
ID string `json:"id"`
Name string `json:"name"`
ParentID string `json:"parentId,omitempty"`
}
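type ProductRepository interface {
	FindByID(ctx context.Context, id string) (*Product, error)
	FindBySKU(ctx context.Context, sku string) (*Product, error)
	FindByIDs(ctx context.Context, ids []string) ([]*Product, error)
	List(ctx context.Context, limit, offset int) ([]*Product, error)
	ListByCategory(ctx context.Context, categoryID string, limit, offset int) ([]*Product, error)
}

type CategoryRepository interface {
	FindByID(ctx context.Context, id string) (*Category, error)
}

// WarehouseStockRepository backs the composite-key entity shown further down.
type WarehouseStockRepository interface {
	FindBySKUAndWarehouse(ctx context.Context, sku, warehouseID string) (*WarehouseStock, error)
}

// Resolver is the root resolver for the products subgraph.
type Resolver struct {
	productRepo  ProductRepository
	categoryRepo CategoryRepository
	stockRepo    WarehouseStockRepository
}

// entityResolver embeds *Resolver, as in gqlgen's generated resolver stubs,
// so entity lookups share the same repositories as the query resolvers.
type entityResolver struct{ *Resolver }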
func (r *entityResolver) FindProductByID(ctx context.Context, id string) (*Product, error) {
product, err := r.productRepo.FindByID(ctx, id)
if err != nil {
return nil, fmt.Errorf("product entity resolution failed for id=%s: %w", id, err)
}
return product, nil
}
func (r *entityResolver) FindProductBySKU(ctx context.Context, sku string) (*Product, error) {
product, err := r.productRepo.FindBySKU(ctx, sku)
if err != nil {
return nil, fmt.Errorf("product entity resolution failed for sku=%s: %w", sku, err)
}
return product, nil
}
func (r *entityResolver) FindCategoryByID(ctx context.Context, id string) (*Category, error) {
category, err := r.categoryRepo.FindByID(ctx, id)
if err != nil {
return nil, fmt.Errorf("category entity resolution failed for id=%s: %w", id, err)
}
return category, nil
}
func (r *Resolver) Product(ctx context.Context, id string) (*Product, error) {
return r.productRepo.FindByID(ctx, id)
}
func (r *Resolver) Products(ctx context.Context, categoryId *string, limit *int, offset *int) ([]*Product, error) {
lim := 20
off := 0
if limit != nil {
lim = *limit
}
if offset != nil {
off = *offset
}
if categoryId != nil {
return r.productRepo.ListByCategory(ctx, *categoryId, lim, off)
}
return r.productRepo.List(ctx, lim, off)
}
Composite Key Entity:
type WarehouseStock @key(fields: "sku warehouseId") {
sku: String!
warehouseId: ID!
quantity: Int!
reservedQuantity: Int!
location: String!
}
type WarehouseStock struct {
SKU string `json:"sku"`
WarehouseID string `json:"warehouseId"`
Quantity int `json:"quantity"`
ReservedQuantity int `json:"reservedQuantity"`
Location string `json:"location"`
}
type WarehouseStockRef struct {
SKU string `json:"sku"`
WarehouseID string `json:"warehouseId"`
}
func (r *entityResolver) FindWarehouseStockBySkuAndWarehouseId(
ctx context.Context,
sku string,
warehouseId string,
) (*WarehouseStock, error) {
stock, err := r.stockRepo.FindBySKUAndWarehouse(ctx, sku, warehouseId)
if err != nil {
return nil, fmt.Errorf("warehouse stock resolution failed: %w", err)
}
return stock, nil
}
Pattern 3: Apollo Federation v2 Composition — From Subgraph to Supergraph
Federation v2 introduces @link, @shareable, @override and other new directives for more flexible schema composition. Use the rover CLI for schema checking and publishing.
Supergraph Configuration (supergraph.yaml):
federation_version: =2.8.0
subgraphs:
  users:
    routing_url: http://users-service:4001/graphql
    schema:
      file: ./schemas/users.graphqls
  products:
    routing_url: http://products-service:4002/graphql
    schema:
      file: ./schemas/products.graphqls
  orders:
    routing_url: http://orders-service:4003/graphql
    schema:
      file: ./schemas/orders.graphqls
  reviews:
    routing_url: http://reviews-service:4004/graphql
    schema:
      file: ./schemas/reviews.graphqls
Order Subgraph Schema (orders.graphqls):
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type Order @key(fields: "id") {
  id: ID!
  userId: ID!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  shippingAddress: Address
  createdAt: String!
  # The resolver returns a User reference built from userId. @requires is
  # reserved for @external fields owned by another subgraph, so it is not used here.
  user: User
}

type OrderItem {
  productId: ID!
  quantity: Int!
  unitPrice: Float!
  product: Product
}

type Address @shareable {
  street: String!
  city: String!
  state: String!
  zipCode: String!
  country: String!
}

# Entity stubs: this subgraph contributes fields to User and Product, whose
# other fields live in the users and products subgraphs. In Federation 2 the
# key field needs neither @external nor @shareable.
type User @key(fields: "id") {
  id: ID!
  orders: [Order!]!
}

type Product @key(fields: "id") {
  id: ID!
  orderItems: [OrderItem!]!
}

enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}

type Query {
  order(id: ID!): Order
  orders(userId: ID, status: OrderStatus, limit: Int, offset: Int): [Order!]!
}
Schema Check and Publish:
# Check subgraph schema compatibility
rover subgraph check my-graph \
--name users \
--schema ./schemas/users.graphqls
# Publish subgraph schema
rover subgraph publish my-graph@production \
--name users \
--schema ./schemas/users.graphqls \
--routing-url http://users-service:4001/graphql
# Compose supergraph
rover supergraph compose --config supergraph.yaml > supergraph.graphqls
Go Subgraph HTTP Service:
package main

import (
	"log"
	"net/http"
	"os"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"

	"myapp/graph"
	"myapp/repository"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "4001"
	}

	// The repository constructors return an error (see Pattern 5), so check it.
	userRepo, err := repository.NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("user repository: %v", err)
	}
	orderRepo, err := repository.NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("order repository: %v", err)
	}

	resolver := graph.NewResolver(userRepo, orderRepo)
	srv := handler.NewDefaultServer(graph.NewExecutableSchema(
		graph.Config{Resolvers: resolver},
	))

	router := chi.NewRouter()
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/graphql", srv)

	log.Printf("🚀 Users subgraph running on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, router))
}
Pattern 4: Gateway Router with Query Planning — Apollo Router
Apollo Router is a high-performance gateway written in Rust, supporting query planning, batch entity resolution, caching, and observability.
Router Configuration (router.yaml):
supergraph:
  listen: 0.0.0.0:4000
  path: /graphql
  introspection: true
health_check:
  listen: 0.0.0.0:8088
cors:
  origins:
    - https://app.example.com
    - http://localhost:3000
  methods:
    - GET
    - POST
  allow_headers:
    - Authorization
    - Content-Type
    - X-Request-ID
headers:
  all:
    request:
      - propagate:
          matching: "^X-.*"
      - propagate:
          named: Authorization
  subgraphs:
    users:
      request:
        - propagate:
            named: Authorization
        - insert:
            name: X-User-Service-Key
            value: "${env.USERS_SERVICE_KEY}"
    orders:
      request:
        - propagate:
            named: Authorization
traffic_shaping:
  all:
    global_rate_limit:
      capacity: 1000
      interval: 1s
  subgraphs:
    users:
      timeout: 5s
      global_rate_limit:
        capacity: 500
        interval: 1s
    products:
      timeout: 3s
    orders:
      timeout: 10s
telemetry:
  exporters:
    tracing:
      common:
        service_name: apollo-router
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
        protocol: grpc
    metrics:
      common:
        service_name: apollo-router
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
        protocol: grpc
    logging:
      stdout:
        enabled: true
        format: json
Docker Compose Deployment:
version: "3.9"
services:
  router:
    image: ghcr.io/apollographql/router:v1.45.0
    # The files are mounted outside the image's default locations, so pass them explicitly.
    command:
      - --config
      - /dist/configuration/router.yaml
      - --supergraph
      - /dist/schema/supergraph.graphqls
    ports:
      - "4000:4000"
      - "8088:8088"
    volumes:
      - ./router.yaml:/dist/configuration/router.yaml:ro
      - ./supergraph.graphqls:/dist/schema/supergraph.graphqls:ro
    environment:
      - USERS_SERVICE_KEY=${USERS_SERVICE_KEY}
      - APOLLO_KEY=${APOLLO_KEY}
      - APOLLO_GRAPH_REF=${APOLLO_GRAPH_REF}
    depends_on:
      - users-service
      - products-service
      - orders-service
      - reviews-service
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8088/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  users-service:
    build:
      context: ./services/users
      dockerfile: Dockerfile
    ports:
      - "4001:4001"
    environment:
      - DATABASE_URL=postgres://users:password@postgres:5432/users?sslmode=disable
      - PORT=4001
    depends_on:
      - postgres
  products-service:
    build:
      context: ./services/products
      dockerfile: Dockerfile
    ports:
      - "4002:4002"
    environment:
      - DATABASE_URL=postgres://products:password@postgres:5432/products?sslmode=disable
      - PORT=4002
  orders-service:
    build:
      context: ./services/orders
      dockerfile: Dockerfile
    ports:
      - "4003:4003"
    environment:
      - DATABASE_URL=postgres://orders:password@postgres:5432/orders?sslmode=disable
      - PORT=4003
  reviews-service:
    build:
      context: ./services/reviews
      dockerfile: Dockerfile
    ports:
      - "4004:4004"
    environment:
      - DATABASE_URL=postgres://reviews:password@postgres:5432/reviews?sslmode=disable
      - PORT=4004
  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      # Not a built-in variable of the official image: it only takes effect with a
      # custom init script mounted into /docker-entrypoint-initdb.d that creates
      # these databases and their users.
      - POSTGRES_MULTIPLE_DATABASES=users,products,orders,reviews
      - POSTGRES_PASSWORD=password
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
Query Planning Example:
query GetUserWithOrders {
  user(id: "user-123") {
    name
    email
    orders {
      id
      total
      status
      items {
        product {
          name
          price
        }
        quantity
      }
    }
  }
}
The gateway's query planner generates the following execution plan:
- Fetch the user's name and email from the users subgraph
- Fetch the orders of User(id="user-123") from the orders subgraph with an _entities query keyed on the user's id
- Batch-resolve the Product entities referenced by the order items in a single _entities query to the products subgraph
- Merge the results and return them to the client
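The batch step in the plan above can be pictured as collecting one representation per distinct product before calling the products subgraph. The sketch below shows only that collection step. `productRepresentations` and the sample orders are hypothetical, and how a particular gateway deduplicates references is an implementation detail, not something this code reproduces.

```go
package main

import "fmt"

type OrderItem struct {
	ProductID string
	Quantity  int
}

type Order struct {
	ID    string
	Items []OrderItem
}

// productRepresentations collects one representation per distinct product,
// preserving first-seen order, so that a single request covers all items.
func productRepresentations(orders []Order) []map[string]string {
	seen := make(map[string]bool)
	var reps []map[string]string
	for _, o := range orders {
		for _, it := range o.Items {
			if seen[it.ProductID] {
				continue
			}
			seen[it.ProductID] = true
			reps = append(reps, map[string]string{
				"__typename": "Product",
				"id":         it.ProductID,
			})
		}
	}
	return reps
}

func main() {
	orders := []Order{
		{ID: "o1", Items: []OrderItem{{"p1", 2}, {"p2", 1}}},
		{ID: "o2", Items: []OrderItem{{"p2", 3}, {"p3", 1}}},
	}
	for _, r := range productRepresentations(orders) {
		fmt.Println(r["__typename"], r["id"])
	}
}
```

Four order items referencing three products yield three representations, which would travel in one request instead of four.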
Pattern 5: Cross-Service Data Fetching and N+1 Prevention — DataLoader Pattern
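N+1 is the most severe performance issue in GraphQL Federation. The gateway batches entity references into one request per subgraph, but inside the subgraph each entity and field resolver still runs once per object. DataLoader closes that gap through batch loading and deduplication, merging N entity resolutions into 1 batch query.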
Go DataLoader Implementation:
package dataloader

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// BatchFunc loads many keys at once and returns the values it found, keyed by input key.
type BatchFunc[K comparable, V any] func(ctx context.Context, keys []K) (map[K]V, error)

// call is the shared result slot for one key. done is closed once value and
// err are set, so any number of goroutines can wait on the same key.
type call[V any] struct {
	done  chan struct{}
	value V
	err   error
}

// Loader batches and caches lookups. Create one per request: the cache is
// never invalidated, and the batch runs with the context of the caller that
// triggered it.
type Loader[K comparable, V any] struct {
	batchFn  BatchFunc[K, V]
	mu       sync.Mutex
	cache    map[K]V
	pending  map[K]*call[V]
	maxBatch int
	wait     time.Duration
}

func NewLoader[K comparable, V any](batchFn BatchFunc[K, V], opts ...Option[K, V]) *Loader[K, V] {
	l := &Loader[K, V]{
		batchFn:  batchFn,
		cache:    make(map[K]V),
		pending:  make(map[K]*call[V]),
		maxBatch: 100,
		wait:     10 * time.Millisecond,
	}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

type Option[K comparable, V any] func(*Loader[K, V])

func WithMaxBatch[K comparable, V any](n int) Option[K, V] {
	return func(l *Loader[K, V]) { l.maxBatch = n }
}

func WithWait[K comparable, V any](d time.Duration) Option[K, V] {
	return func(l *Loader[K, V]) { l.wait = d }
}

// Load returns the value for key, joining the current batch if the key is not cached.
func (l *Loader[K, V]) Load(ctx context.Context, key K) (V, error) {
	l.mu.Lock()
	if v, ok := l.cache[key]; ok {
		l.mu.Unlock()
		return v, nil
	}
	c, ok := l.pending[key]
	if !ok {
		c = &call[V]{done: make(chan struct{})}
		l.pending[key] = c
		switch n := len(l.pending); {
		case n >= l.maxBatch:
			// Batch is full: flush now instead of waiting for the timer.
			go l.dispatch(ctx)
		case n == 1:
			// First key of a new batch: schedule one flush for the whole window.
			time.AfterFunc(l.wait, func() { l.dispatch(ctx) })
		}
	}
	l.mu.Unlock()

	select {
	case <-c.done:
		return c.value, c.err
	case <-ctx.Done():
		var zero V
		return zero, ctx.Err()
	}
}

// dispatch takes ownership of the pending batch, runs batchFn once, and
// publishes the outcome to every waiter.
func (l *Loader[K, V]) dispatch(ctx context.Context) {
	l.mu.Lock()
	batch := l.pending
	l.pending = make(map[K]*call[V])
	l.mu.Unlock()
	if len(batch) == 0 {
		return
	}

	keys := make([]K, 0, len(batch))
	for k := range batch {
		keys = append(keys, k)
	}
	results, err := l.batchFn(ctx, keys)

	l.mu.Lock()
	defer l.mu.Unlock()
	for k, c := range batch {
		switch v, ok := results[k]; {
		case err != nil:
			c.err = err
		case ok:
			c.value = v
			l.cache[k] = v
		default:
			c.err = fmt.Errorf("key not found: %v", k)
		}
		close(c.done) // wakes every goroutine waiting on this key
	}
}

// LoadMany issues the loads concurrently so that they land in the same batch;
// calling Load in a plain loop would wait out one batch window per key.
func (l *Loader[K, V]) LoadMany(ctx context.Context, keys []K) ([]V, error) {
	values := make([]V, len(keys))
	errs := make([]error, len(keys))
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		go func(i int, key K) {
			defer wg.Done()
			values[i], errs[i] = l.Load(ctx, key)
		}(i, key)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return values, err
		}
	}
	return values, nil
}
Using DataLoader in Resolvers:
package graph
import (
	"context"
	"fmt"
	"time"

	"myapp/dataloader"
)
type Loaders struct {
UserByID *dataloader.Loader[string, *User]
ProductByID *dataloader.Loader[string, *Product]
OrderByID *dataloader.Loader[string, *Order]
}
func NewLoaders(userRepo UserRepository, productRepo ProductRepository, orderRepo OrderRepository) *Loaders {
return &Loaders{
UserByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*User, error) {
users, err := userRepo.FindByIDs(ctx, ids)
if err != nil {
return nil, fmt.Errorf("batch user load failed: %w", err)
}
result := make(map[string]*User, len(users))
for _, u := range users {
result[u.ID] = u
}
return result, nil
}, dataloader.WithMaxBatch[string, *User](200), dataloader.WithWait[string, *User](5*time.Millisecond)),
ProductByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Product, error) {
products, err := productRepo.FindByIDs(ctx, ids)
if err != nil {
return nil, fmt.Errorf("batch product load failed: %w", err)
}
result := make(map[string]*Product, len(products))
for _, p := range products {
result[p.ID] = p
}
return result, nil
}, dataloader.WithMaxBatch[string, *Product](200), dataloader.WithWait[string, *Product](5*time.Millisecond)),
OrderByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Order, error) {
orders, err := orderRepo.FindByIDs(ctx, ids)
if err != nil {
return nil, fmt.Errorf("batch order load failed: %w", err)
}
result := make(map[string]*Order, len(orders))
for _, o := range orders {
result[o.ID] = o
}
return result, nil
}, dataloader.WithMaxBatch[string, *Order](200), dataloader.WithWait[string, *Order](5*time.Millisecond)),
}
}
func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
loader := ctx.Value(loaderKey).(*Loaders)
return loader.UserByID.Load(ctx, obj.UserID)
}
func (r *orderItemResolver) Product(ctx context.Context, obj *OrderItem) (*Product, error) {
loader := ctx.Value(loaderKey).(*Loaders)
return loader.ProductByID.Load(ctx, obj.ProductID)
}
Batch Query Repository:
package repository
import (
"context"
"database/sql"
"fmt"
"strings"
_ "github.com/lib/pq"
)
type PostgresUserRepository struct {
db *sql.DB
}
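func NewPostgresUserRepository(dbURL string) (*PostgresUserRepository, error) {
	// sql.Open only validates its arguments; Ping forces a real connection.
	db, err := sql.Open("postgres", dbURL)
	if err != nil {
		return nil, fmt.Errorf("invalid database configuration: %w", err)
	}
	if err := db.Ping(); err != nil {
		db.Close()
		return nil, fmt.Errorf("failed to connect to database: %w", err)
	}
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(10)
	return &PostgresUserRepository{db: db}, nil
}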
func (r *PostgresUserRepository) FindByIDs(ctx context.Context, ids []string) ([]*User, error) {
if len(ids) == 0 {
return nil, nil
}
placeholders := make([]string, len(ids))
args := make([]interface{}, len(ids))
for i, id := range ids {
placeholders[i] = fmt.Sprintf("$%d", i+1)
args[i] = id
}
query := fmt.Sprintf(
"SELECT id, email, name, avatar, created_at FROM users WHERE id IN (%s)",
strings.Join(placeholders, ","),
)
rows, err := r.db.QueryContext(ctx, query, args...)
if err != nil {
return nil, fmt.Errorf("batch query users failed: %w", err)
}
defer rows.Close()
users := make([]*User, 0, len(ids))
for rows.Next() {
var u User
var avatar sql.NullString
if err := rows.Scan(&u.ID, &u.Email, &u.Name, &avatar, &u.CreatedAt); err != nil {
return nil, fmt.Errorf("scan user row failed: %w", err)
}
if avatar.Valid {
u.Avatar = avatar.String
}
users = append(users, &u)
}
return users, rows.Err()
}
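The IN (...) query above binds one parameter per ID, and PostgreSQL accepts at most 65535 bind parameters per statement. If batches can grow large, one option is to split the IDs into chunks and run the batch query once per chunk. The helper below is a sketch of that splitting step only; `chunkIDs` is a hypothetical name, not part of the repository shown above.

```go
package main

import "fmt"

// chunkIDs splits ids into consecutive slices of at most size elements.
// The returned slices share ids' backing array; treat them as read-only.
func chunkIDs(ids []string, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	chunks := make([][]string, 0, (len(ids)+size-1)/size)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		chunks = append(chunks, ids[start:end])
	}
	return chunks
}

func main() {
	ids := []string{"u1", "u2", "u3", "u4", "u5"}
	for _, c := range chunkIDs(ids, 2) {
		fmt.Println(c)
	}
}
```

With the loader's batch size capped at 200, as in NewLoaders, the limit is never reached; chunking matters mainly for callers that bypass the loader.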
Pattern 6: Production Federation Architecture — Auth, Monitoring, and High Availability
Production federation architecture requires handling authentication passthrough, distributed tracing, rate limiting, circuit breaking, and graceful degradation.
Authentication Middleware:
package middleware
import (
	"context"
	"fmt"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)
type contextKey string
const (
userIDKey contextKey = "userID"
userRoleKey contextKey = "userRole"
authHeaderKey = "Authorization"
)
type Claims struct {
UserID string `json:"sub"`
Role string `json:"role"`
jwt.RegisteredClaims
}
func AuthMiddleware(jwtSecret string) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
authHeader := r.Header.Get(authHeaderKey)
if authHeader == "" {
next.ServeHTTP(w, r)
return
}
tokenStr := strings.TrimPrefix(authHeader, "Bearer ")
if tokenStr == authHeader {
next.ServeHTTP(w, r)
return
}
token, err := jwt.ParseWithClaims(tokenStr, &Claims{}, func(t *jwt.Token) (interface{}, error) {
if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
}
return []byte(jwtSecret), nil
})
if err != nil || !token.Valid {
http.Error(w, "invalid token", http.StatusUnauthorized)
return
}
claims, ok := token.Claims.(*Claims)
if !ok {
http.Error(w, "invalid claims", http.StatusUnauthorized)
return
}
ctx := context.WithValue(r.Context(), userIDKey, claims.UserID)
ctx = context.WithValue(ctx, userRoleKey, claims.Role)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
}
func GetUserID(ctx context.Context) string {
if v, ok := ctx.Value(userIDKey).(string); ok {
return v
}
return ""
}
func GetUserRole(ctx context.Context) string {
if v, ok := ctx.Value(userRoleKey).(string); ok {
return v
}
return ""
}
OpenTelemetry Tracing Integration:
package telemetry
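import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// newResource describes this service to the tracing backend. The service name
// is a placeholder; set it per subgraph.
func newResource() *resource.Resource {
	return resource.NewSchemaless(attribute.String("service.name", "users-subgraph"))
}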
func InitTracer(ctx context.Context, endpoint string) (func(context.Context) error, error) {
exporter, err := otlptracegrpc.New(ctx,
otlptracegrpc.WithEndpoint(endpoint),
otlptracegrpc.WithInsecure(),
)
if err != nil {
return nil, fmt.Errorf("failed to create OTLP exporter: %w", err)
}
provider := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(newResource()),
)
otel.SetTracerProvider(provider)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
))
return provider.Shutdown, nil
}
func StartSpan(ctx context.Context, name string) (context.Context, trace.Span) {
tracer := otel.Tracer("graphql-federation")
return tracer.Start(ctx, name)
}
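With the TraceContext propagator configured, the trace travels between gateway and subgraph in the W3C `traceparent` header. When a trace appears broken, logging that header in the subgraph is a quick way to see whether the context arrived. The parser below is a debugging sketch using only the standard library; `parseTraceparent` is a hypothetical helper, it handles only the four-field version 00 layout, and it is more lenient than the specification (it accepts uppercase hex and all-zero IDs).

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// parseTraceparent splits a version-00 traceparent header of the form
// version-traceid-parentid-flags into its trace ID, span ID, and sampled flag.
func parseTraceparent(header string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(header, "-")
	wantLen := []int{2, 32, 16, 2} // hex digits per field
	if len(parts) != len(wantLen) {
		return "", "", false, fmt.Errorf("traceparent %q: want 4 fields, got %d", header, len(parts))
	}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return "", "", false, fmt.Errorf("traceparent %q: field %d has length %d", header, i, len(p))
		}
		if _, err := hex.DecodeString(p); err != nil {
			return "", "", false, fmt.Errorf("traceparent %q: field %d is not hex: %w", header, i, err)
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return parts[1], parts[2], flags[0]&0x01 != 0, nil
}

func main() {
	traceID, spanID, sampled, err := parseTraceparent(
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(traceID, spanID, sampled)
}
```

If the trace ID logged by the subgraph differs from the one the gateway reports for the same request, the header is being dropped or regenerated somewhere in between.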
Rate Limiting and Circuit Breaking:
package middleware
import (
	"fmt"
	"net/http"
	"sync"
	"time"
)
type RateLimiter struct {
mu sync.Mutex
clients map[string]*clientBucket
rate int
interval time.Duration
}
type clientBucket struct {
tokens int
lastSeen time.Time
}
func NewRateLimiter(rate int, interval time.Duration) *RateLimiter {
rl := &RateLimiter{
clients: make(map[string]*clientBucket),
rate: rate,
interval: interval,
}
go rl.cleanup()
return rl
}
func (rl *RateLimiter) Allow(key string) bool {
rl.mu.Lock()
defer rl.mu.Unlock()
now := time.Now()
bucket, ok := rl.clients[key]
if !ok {
rl.clients[key] = &clientBucket{tokens: rl.rate - 1, lastSeen: now}
return true
}
elapsed := now.Sub(bucket.lastSeen)
if elapsed >= rl.interval {
bucket.tokens = rl.rate - 1
bucket.lastSeen = now
return true
}
if bucket.tokens <= 0 {
return false
}
bucket.tokens--
return true
}
func (rl *RateLimiter) cleanup() {
ticker := time.NewTicker(time.Minute)
for range ticker.C {
rl.mu.Lock()
for key, bucket := range rl.clients {
if time.Since(bucket.lastSeen) > 3*rl.interval {
delete(rl.clients, key)
}
}
rl.mu.Unlock()
}
}
func RateLimitMiddleware(limiter *RateLimiter) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
clientID := r.Header.Get("X-Client-ID")
if clientID == "" {
clientID = r.RemoteAddr
}
if !limiter.Allow(clientID) {
http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
}
type CircuitBreaker struct {
mu sync.Mutex
failureCount int
threshold int
timeout time.Duration
state string
lastFailure time.Time
}
func NewCircuitBreaker(threshold int, timeout time.Duration) *CircuitBreaker {
return &CircuitBreaker{
threshold: threshold,
timeout: timeout,
state: "closed",
}
}
func (cb *CircuitBreaker) Execute(fn func() error) error {
cb.mu.Lock()
if cb.state == "open" {
if time.Since(cb.lastFailure) > cb.timeout {
cb.state = "half-open"
cb.mu.Unlock()
} else {
cb.mu.Unlock()
return fmt.Errorf("circuit breaker is open")
}
} else {
cb.mu.Unlock()
}
err := fn()
cb.mu.Lock()
defer cb.mu.Unlock()
if err != nil {
cb.failureCount++
cb.lastFailure = time.Now()
if cb.failureCount >= cb.threshold {
cb.state = "open"
}
return err
}
cb.failureCount = 0
cb.state = "closed"
return nil
}
Complete Service Startup:
package main
import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"
	chimw "github.com/go-chi/chi/v5/middleware"

	"myapp/graph"
	"myapp/middleware"
	"myapp/repository"
	"myapp/telemetry"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	shutdown, err := telemetry.InitTracer(ctx, os.Getenv("OTEL_ENDPOINT"))
	if err != nil {
		log.Printf("⚠️ Tracer init failed: %v", err)
	} else {
		defer shutdown(ctx)
	}

	// The repository constructors live in the repository package (see Batch Query Repository).
	userRepo, err := repository.NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to user database: %v", err)
	}
	orderRepo, err := repository.NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to order database: %v", err)
	}
resolver := graph.NewResolver(userRepo, orderRepo)
loaders := graph.NewLoaders(userRepo, nil, orderRepo)
srv := handler.NewDefaultServer(graph.NewExecutableSchema(
graph.Config{Resolvers: resolver},
))
limiter := middleware.NewRateLimiter(100, time.Second)
breaker := middleware.NewCircuitBreaker(5, 30*time.Second)
router := chi.NewRouter()
router.Use(chimw.RequestID)
router.Use(chimw.RealIP)
router.Use(chimw.Logger)
router.Use(chimw.Recoverer)
router.Use(chimw.Timeout(30 * time.Second))
router.Use(middleware.AuthMiddleware(os.Getenv("JWT_SECRET")))
router.Use(middleware.RateLimitMiddleware(limiter))
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/graphql", srv)
router.Get("/health", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("ok"))
})
port := os.Getenv("PORT")
if port == "" {
port = "4001"
}
server := &http.Server{
Addr: ":" + port,
Handler: router,
ReadTimeout: 15 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 60 * time.Second,
}
go func() {
log.Printf("🚀 Subgraph running on :%s", port)
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Fatalf("Server error: %v", err)
}
}()
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("Shutting down gracefully...")
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 15*time.Second)
defer shutdownCancel()
if err := server.Shutdown(shutdownCtx); err != nil {
log.Fatalf("Forced shutdown: %v", err)
}
log.Println("Server stopped")
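	// breaker and loaders are constructed above but not wired in here. In a real
	// service, build the loaders per request in middleware and wrap downstream
	// calls in breaker.Execute.
	_ = breaker
	_ = loaders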
}
5 Common Pitfalls
Pitfall 1: Forgetting to Implement Entity Resolvers (the gqlgen Counterpart of __resolveReference) in Subgraphs
❌ Wrong:
// Only defined @key but didn't implement entity resolution
type Resolver struct{}
// Missing this method — gateway cannot resolve cross-subgraph entities
// func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) { ... }
✅ Correct:
func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
return r.userRepo.FindByID(ctx, id)
}
func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
return r.userRepo.FindByEmail(ctx, email)
}
Pitfall 2: @shareable Overuse Causing Data Inconsistency
❌ Wrong:
type User @key(fields: "id") @shareable {
id: ID!
name: String!
email: String!
orderCount: Int! # Multiple subgraphs provide this field with different logic
}
✅ Correct:
type User @key(fields: "id") {
id: ID!
name: String!
email: String!
}
type UserOrderStats @key(fields: "userId") {
userId: ID!
orderCount: Int!
totalSpent: Float!
}
Pitfall 3: N+1 Queries Without DataLoader
❌ Wrong:
func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
// Each Order triggers a separate HTTP request to user service
resp, err := http.Get(fmt.Sprintf("http://users-service/users/%s", obj.UserID))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var user User
json.NewDecoder(resp.Body).Decode(&user)
return &user, nil
}
✅ Correct:
func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
loader := ctx.Value(loaderKey).(*Loaders)
return loader.UserByID.Load(ctx, obj.UserID)
}
Pitfall 4: Subgraph Schema Changes Without Compatibility Checks
❌ Wrong:
# Publish directly without checking
rover subgraph publish my-graph@production \
--name users \
--schema ./schemas/users.graphqls
✅ Correct:
# Check compatibility first
rover subgraph check my-graph@production \
--name users \
--schema ./schemas/users.graphqls
# Confirm no breaking changes, then publish
rover subgraph publish my-graph@production \
--name users \
--schema ./schemas/users.graphqls \
--routing-url http://users-service:4001/graphql
Pitfall 5: Missing Timeout and Retry Configuration at Gateway
❌ Wrong:
# router.yaml - no timeout or retry configuration
supergraph:
  listen: 0.0.0.0:4000
✅ Correct:
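supergraph:
  listen: 0.0.0.0:4000
traffic_shaping:
  all:                    # defaults applied to every subgraph
    timeout: 30s
    global_rate_limit:    # Apollo Router's key for rate limiting
      capacity: 1000
      interval: 1s
  subgraphs:              # per-subgraph overrides
    users:
      timeout: 5s
    products:
      timeout: 3s
    orders:
      timeout: 10s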
Error Troubleshooting Reference
| Error Message | Cause | Solution |
|---|---|---|
| ENCORE_UNKNOWN_DIRECTIVE | Subgraph uses federation directive not imported | Add missing directive import in @link |
| KEY_FIELDS_MISSING_ON_BASE | @key references fields that don't exist on the type | Ensure @key-specified fields are declared in the type definition |
| EXTERNAL_TYPE_MISMATCH | @external declared type doesn't match the owning subgraph | Verify @external field types match the original definition |
| SHAREABLE_MISMATCH | Same type has inconsistent @shareable declarations across subgraphs | All subgraphs sharing a type must mark it @shareable |
| RESOLVE_REFERENCE_FAILED | __resolveReference implementation returns an error | Check entity resolver database queries and error handling |
| QUERY_PLAN_TIMEOUT | Query planning timeout — too many subgraphs or query too deep | Limit query depth, optimize schema structure |
| SUBGRAPH_UNREACHABLE | Subgraph service unreachable | Check subgraph health status and network connectivity |
| COMPOSITION_ERROR | Schema composition failure due to type conflicts | Use rover subgraph check to verify compatibility |
| N+1_DETECTED | Gateway detects N+1 query pattern | Add DataLoader batch loading for entity resolution |
| CIRCULAR_DEPENDENCY | Circular dependencies between subgraphs | Refactor entity boundaries, use @requires instead of direct references |
Advanced Optimization
Query Complexity Analysis and Limiting
Clients can abuse query complexity: a deeply nested query can fan out into an exponentially growing result set. Complexity analysis assigns each query a cost and rejects queries whose cost exceeds a limit.
package middleware

import (
	"context"
	"fmt"

	"github.com/99designs/gqlgen/graphql"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/vektah/gqlparser/v2/ast"
)

// ComplexityLimit wraps gqlgen's built-in complexity limit extension.
type ComplexityLimit struct {
	maxComplexity int
}

func NewComplexityLimit(max int) *ComplexityLimit {
	return &ComplexityLimit{maxComplexity: max}
}

// Extension returns the handler extension to register with srv.Use(...).
func (cl *ComplexityLimit) Extension() graphql.HandlerExtension {
	return extension.FixedComplexityLimit(cl.maxComplexity)
}

type fieldComplexity struct {
	complexity int
	details    map[string]int // accumulated cost per field name
}

// CalculateQueryComplexity computes a depth-weighted cost for a parsed operation.
// opCtx is typically obtained via graphql.GetOperationContext(ctx).
func CalculateQueryComplexity(_ context.Context, opCtx *graphql.OperationContext) (*fieldComplexity, error) {
	complexity := 0
	details := make(map[string]int)
	for _, op := range opCtx.Doc.Operations {
		for _, sel := range op.SelectionSet {
			calcSelectionComplexity(sel, &complexity, details, 1)
		}
	}
	if complexity > 500 {
		return nil, fmt.Errorf("query complexity %d exceeds limit 500", complexity)
	}
	return &fieldComplexity{complexity: complexity, details: details}, nil
}

func calcSelectionComplexity(sel ast.Selection, total *int, details map[string]int, depth int) {
	switch s := sel.(type) {
	case *ast.Field:
		// Leaf fields cost 1; fields with sub-selections cost their nesting depth.
		fieldCost := 1
		if len(s.SelectionSet) > 0 {
			fieldCost = depth
		}
		*total += fieldCost
		details[s.Name] += fieldCost
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth+1)
		}
	case *ast.InlineFragment:
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	case *ast.FragmentSpread:
		// Definition is filled in during validation; skip unresolved spreads.
		if s.Definition == nil {
			return
		}
		for _, child := range s.Definition.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	}
}
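The depth-weighted rule is easier to follow on a concrete query. The sketch below applies the same rule (leaf fields cost 1, fields with sub-selections cost their depth) to a hand-built tree; the `field` type and `exampleQuery` are hypothetical stand-ins for the parsed AST.

```go
package main

import "fmt"

// field is a hypothetical stand-in for a parsed GraphQL field.
type field struct {
	name     string
	children []field
}

// cost applies the depth-weighted rule: leaves cost 1,
// fields with sub-selections cost their nesting depth.
func cost(f field, depth int) int {
	c := 1
	if len(f.children) > 0 {
		c = depth
	}
	for _, child := range f.children {
		c += cost(child, depth+1)
	}
	return c
}

// exampleQuery models: { orders { id product { name inventory { status } } } }
func exampleQuery() field {
	return field{name: "orders", children: []field{
		{name: "id"},
		{name: "product", children: []field{
			{name: "name"},
			{name: "inventory", children: []field{
				{name: "status"},
			}},
		}},
	}}
}

func main() {
	// orders(1) + id(1) + product(2) + name(1) + inventory(3) + status(1) = 9
	fmt.Println(cost(exampleQuery(), 1))
}
```

Each extra level of nesting raises the cost of every composite field beneath it, so deep queries hit the limit well before wide, shallow ones do.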
Persisted Queries and Query Registration
Production environments should use Persisted Queries — clients only send query hashes, avoiding transmitting full query text and preventing unknown query execution.
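package persistedquery

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"sync"

	"github.com/99designs/gqlgen/graphql"
	"github.com/vektah/gqlparser/v2/gqlerror"
)

type PersistedQueryManager struct {
	mu      sync.RWMutex
	queries map[string]string // SHA-256 hex hash -> full query text
	strict  bool              // reject anything that is not registered
}

func NewPersistedQueryManager(strict bool) *PersistedQueryManager {
	return &PersistedQueryManager{
		queries: make(map[string]string),
		strict:  strict,
	}
}

func (pqm *PersistedQueryManager) Register(hash, query string) {
	pqm.mu.Lock()
	defer pqm.mu.Unlock()
	pqm.queries[hash] = query
}

// ExtensionName and Validate satisfy graphql.HandlerExtension,
// so the manager can be registered with srv.Use(pqm).
func (pqm *PersistedQueryManager) ExtensionName() string { return "PersistedQueryManager" }

func (pqm *PersistedQueryManager) Validate(graphql.ExecutableSchema) error { return nil }

// MutateOperationParameters runs before the query is parsed, which is the only
// point where swapping the hash for the query text takes effect. As in the
// original design, the client sends the hash in the query field.
func (pqm *PersistedQueryManager) MutateOperationParameters(ctx context.Context, params *graphql.RawParams) *gqlerror.Error {
	pqm.mu.RLock()
	query, ok := pqm.queries[params.Query]
	pqm.mu.RUnlock()
	switch {
	case ok:
		params.Query = query
	case pqm.strict:
		// Return an error instead of panicking; strict mode also rejects raw queries.
		return gqlerror.Errorf("unknown persisted query")
	}
	return nil
}

func HashQuery(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}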
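The register-then-resolve flow can be illustrated without any GraphQL server. This is a minimal sketch using only the standard library; the `registry` type and its `register` and `resolve` methods are hypothetical names, and it assumes queries are registered at build or deploy time.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// hashQuery returns the SHA-256 hex digest (64 characters) of the query text.
func hashQuery(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}

var errUnknownQuery = errors.New("unknown persisted query")

// registry maps hash -> query text.
type registry map[string]string

// register stores the query and returns the hash the client will send.
func (r registry) register(query string) string {
	h := hashQuery(query)
	r[h] = query
	return h
}

// resolve returns the registered query text, or an error for unknown hashes.
func (r registry) resolve(hash string) (string, error) {
	q, ok := r[hash]
	if !ok {
		return "", errUnknownQuery
	}
	return q, nil
}

func main() {
	reg := registry{}
	hash := reg.register(`query { me { id name } }`)
	fmt.Println("client sends:", hash)

	q, err := reg.resolve(hash)
	fmt.Println("server runs:", q, err)

	// A query that was never registered is rejected.
	_, err = reg.resolve(hashQuery(`query { secrets }`))
	fmt.Println("unregistered:", err)
}
```

Because the hash is derived from the exact query text, any change to whitespace or field order produces a different hash, so the client build and the server registry must be generated from the same source.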
Subgraph Caching Strategy
Subgraph-level caching significantly reduces repeated queries, especially for hot entities.
package cache

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// ErrCacheMiss lets callers tell a miss apart from a Redis failure.
var ErrCacheMiss = errors.New("cache miss")

type EntityCache struct {
	rdb    *redis.Client
	prefix string
	ttl    time.Duration
}

func NewEntityCache(redisURL, prefix string, ttl time.Duration) (*EntityCache, error) {
	opts, err := redis.ParseURL(redisURL)
	if err != nil {
		return nil, fmt.Errorf("invalid redis URL: %w", err)
	}
	return &EntityCache{
		rdb:    redis.NewClient(opts),
		prefix: prefix,
		ttl:    ttl,
	}, nil
}

// Get loads a cached entity into dest. It returns an error wrapping
// ErrCacheMiss when the key is absent; check with errors.Is.
func (c *EntityCache) Get(ctx context.Context, entityType, id string, dest interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	val, err := c.rdb.Get(ctx, key).Result()
	if errors.Is(err, redis.Nil) {
		return fmt.Errorf("%w for %s:%s", ErrCacheMiss, entityType, id)
	}
	if err != nil {
		return fmt.Errorf("cache read error: %w", err)
	}
	return json.Unmarshal([]byte(val), dest)
}

func (c *EntityCache) Set(ctx context.Context, entityType, id string, val interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	data, err := json.Marshal(val)
	if err != nil {
		return fmt.Errorf("cache marshal error: %w", err)
	}
	return c.rdb.Set(ctx, key, data, c.ttl).Err()
}

func (c *EntityCache) Invalidate(ctx context.Context, entityType, id string) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	return c.rdb.Del(ctx, key).Err()
}

// InvalidatePattern deletes every key under prefix:pattern:*.
// SCAN walks the keyspace incrementally, so it does not block Redis like KEYS.
func (c *EntityCache) InvalidatePattern(ctx context.Context, pattern string) error {
	iter := c.rdb.Scan(ctx, 0, fmt.Sprintf("%s:%s:*", c.prefix, pattern), 100).Iterator()
	var keys []string
	for iter.Next(ctx) {
		keys = append(keys, iter.Val())
	}
	if err := iter.Err(); err != nil {
		return fmt.Errorf("cache scan error: %w", err)
	}
	if len(keys) > 0 {
		return c.rdb.Del(ctx, keys...).Err()
	}
	return nil
}
Technology Comparison
| Dimension | Apollo Federation | Schema Stitching | REST API | gRPC | tRPC |
|---|---|---|---|---|---|
| Learning Curve | Medium, requires federation concepts | High, manual conflict resolution | Low | Medium, requires Proto | Low (TypeScript only) |
| Schema Management | Auto composition, rover CLI | Manual stitching, custom resolvers | No unified schema | Proto definition, auto-generated | TypeScript type inference |
| Cross-team Collaboration | Excellent, subgraphs evolve independently | Fair, conflicts need manual resolution | Poor, API docs easily outdated | Good, Proto as contract | TS full-stack only |
| Performance | Good, query planning + batch resolution | Fair, N+1 needs manual handling | Poor, multiple requests | Excellent, binary + HTTP/2 | Good, end-to-end type safety |
| N+1 Prevention | Gateway batches entity lookups; subgraphs still need DataLoader | Manual implementation required | None | None | None |
| Ecosystem Maturity | High, Apollo full-stack | Medium, community solutions | High | High | Medium |
| Language Support | All languages, Go/Java/TS etc. | All languages | All languages | All languages | TypeScript only |
| Real-time Subscriptions | Supported | Supported | Requires WebSocket | Requires bidirectional stream | Supported |
| Observability | Apollo Studio integration | Self-built | Self-built | OpenTelemetry | Self-built |
| Use Case | Large-scale microservice APIs | Custom composition logic | Simple CRUD | Internal high-performance communication | TS full-stack projects |
Summary: GraphQL Federation isn't a silver bullet, but it is one of the most mature options for a microservice GraphQL architecture today. Core principles: split subgraphs by domain boundaries, declare entities with @key, prevent N+1 with DataLoader, handle query planning and rate limiting at the gateway, and always add auth, tracing, and caching in production. Start with 2 subgraphs and split incrementally rather than going all-in at once. Enforce schema checks in CI, or a breaking change will eventually take down the entire supergraph.