Go GraphQL Federation: 6 Production Patterns from Subgraph to Supergraph

Technical Architecture

When Monolithic GraphQL Schema Meets Team Boundaries: The Microservice GraphQL Dilemma

An e-commerce platform's GraphQL Schema balloons to 3,000 lines. The user team, product team, and order team all modify the same schema. Every release requires coordination — one team's breaking change takes down the entire API. Worse, cross-service N+1 queries push response times from 50ms to 3s — a user queries order lists, each order needs product details, each product needs inventory status, and 100 orders means 300 downstream calls.

This isn't hypothetical. When your microservice architecture is already split but the GraphQL layer remains monolithic, team autonomy and API performance become irreconcilable conflicts. GraphQL Federation exists to solve this — each service owns its GraphQL Schema (subgraph), composed into a unified API (supergraph) via a gateway, while preventing N+1 queries and cross-team coupling.


Core Concepts Reference

| Concept | Purpose | Key Features | Typical Use Case |
| --- | --- | --- | --- |
| Federation | Compose multiple GraphQL services into a unified API | Transparent to clients; each service deploys independently | Unified API layer in a microservice architecture |
| Subgraph | A single service's GraphQL schema | Owns its own types and resolvers; declares entities via @key | User service, product service, order service |
| Supergraph | Complete schema composed from all subgraphs | Produced by composition (for example with the rover CLI) and loaded by the gateway; clients only see the supergraph | Unified API entry point |
| Entity | Type shared across subgraphs | Identified by @key; multiple subgraphs can contribute fields | User, Product, Order, and other core domain objects |
| @key | Declares unique identifier fields for an entity | Supports composite keys; multiple @key directives declare alternate identifiers | @key(fields: "id") or @key(fields: "sku warehouseId") |
| Gateway | Federation query routing and execution engine | Query planning, batch entity resolution, caching | Apollo Router, Apollo Gateway |
| Schema Stitching | Manually composing multiple GraphQL schemas | More flexible, but conflicts must be resolved by hand | Custom composition logic, non-standard federation scenarios |

5 Challenges of GraphQL Federation Architecture

Challenge 1: Unclear Entity Boundary Definition

The user service has User's name and email, while the order service also has User but only cares about id and order list. If all User fields live in the user service, the order service must make cross-service calls every time. If scattered across services, entity ownership and consistency become problematic.

Challenge 2: N+1 Queries Amplified at the Federation Layer

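A client queries { orders { user { name } } }. The gateway first fetches orders from the order service, then asks the user service to resolve a User entity for each order's userId. The gateway sends those references in a batched _entities request, but if the subgraph resolves each reference with its own lookup, 100 orders still mean 100 separate user queries, which is a performance disaster.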
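To make the arithmetic concrete, here is a minimal sketch that counts round trips for the two strategies. The userService type and its Get and GetMany methods are hypothetical stand-ins for a users backend, not part of any federation library.

```go
package main

import "fmt"

// userService is a hypothetical stand-in for the users backend; it only
// counts how many round trips it receives.
type userService struct{ calls int }

// Get simulates resolving one User per round trip.
func (s *userService) Get(id string) string {
	s.calls++
	return "user:" + id
}

// GetMany simulates resolving a whole batch of User keys in one round trip.
func (s *userService) GetMany(ids []string) map[string]string {
	s.calls++
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		out[id] = "user:" + id
	}
	return out
}

// naiveCalls resolves the user of every order one at a time.
func naiveCalls(orderUserIDs []string) int {
	svc := &userService{}
	for _, id := range orderUserIDs {
		svc.Get(id)
	}
	return svc.calls
}

// batchedCalls deduplicates the user IDs and resolves them in one round trip.
func batchedCalls(orderUserIDs []string) int {
	svc := &userService{}
	seen := make(map[string]bool)
	var unique []string
	for _, id := range orderUserIDs {
		if !seen[id] {
			seen[id] = true
			unique = append(unique, id)
		}
	}
	svc.GetMany(unique)
	return svc.calls
}

func main() {
	ids := make([]string, 100)
	for i := range ids {
		ids[i] = fmt.Sprintf("u%d", i%25) // 100 orders placed by 25 users
	}
	fmt.Println("naive:", naiveCalls(ids), "batched:", batchedCalls(ids))
}
```

The naive path costs one call per order, while the batched path costs one call regardless of how many orders share a user.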

Challenge 3: Schema Evolution and Compatibility

The product service wants to add a required field to Product, but the order service's Product reference may not be compatible. A subgraph's breaking change can affect the entire supergraph, but who performs global compatibility checks?

Challenge 4: Authentication and Authorization Passthrough

JWT tokens need to propagate from the gateway to every subgraph. Different subgraphs may have different permission models. The user service needs user:read permission, the order service needs order:read — how to handle this uniformly at the gateway layer?
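One possible arrangement is for the gateway to forward the token unchanged and for each subgraph to enforce its own required permission. The sketch below assumes the scopes have already been decoded from the JWT; the requiredScope map and the authorize function are illustrative names, not an existing API.

```go
package main

import "fmt"

// requiredScope maps a subgraph name to the permission it enforces.
// The scopes mirror the example in the text; the map itself is an
// illustrative assumption.
var requiredScope = map[string]string{
	"users":  "user:read",
	"orders": "order:read",
}

// authorize reports whether the scopes carried by the forwarded token
// satisfy the permission required by the given subgraph.
func authorize(subgraph string, granted []string) error {
	need, ok := requiredScope[subgraph]
	if !ok {
		return fmt.Errorf("unknown subgraph %q", subgraph)
	}
	for _, s := range granted {
		if s == need {
			return nil
		}
	}
	return fmt.Errorf("subgraph %q requires scope %q", subgraph, need)
}

func main() {
	granted := []string{"user:read"} // scopes decoded from the JWT
	fmt.Println(authorize("users", granted))
	fmt.Println(authorize("orders", granted))
}
```

Keeping the check inside each subgraph means the gateway does not need to know every team's permission model; it only has to propagate the Authorization header.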

Challenge 5: Observability and Error Tracing

A single query may involve 3 subgraphs. When a query fails, which subgraph produced the error? Where is the latency bottleneck? How does distributed tracing propagate correctly through the GraphQL layer?


6 Production-Grade Federation Patterns

Pattern 1: Subgraph Service Definition — gqlgen Foundation

The subgraph is the fundamental unit of federation. Use gqlgen to generate the GraphQL service, declare federation directives, and define entity types.

GraphQL Schema (users.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type User @key(fields: "id") @key(fields: "email") {
  id: ID!
  email: String!
  name: String!
  avatar: String
  createdAt: String!
  orders: [Order!]!
}

type Order @key(fields: "id") @shareable {
  id: ID!
  userId: ID!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}

type OrderItem {
  productId: ID!
  quantity: Int!
  price: Float!
}

Go Resolver Implementation:

package graph

import (
	"context"
	"fmt"

	// The bare gqlgen/graphql package is not used in this file, and Go
	// rejects unused imports, so only the handler packages are imported.
	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/99designs/gqlgen/graphql/handler/transport"
)

type User struct {
	ID        string `json:"id"`
	Email     string `json:"email"`
	Name      string `json:"name"`
	Avatar    string `json:"avatar,omitempty"`
	CreatedAt string `json:"createdAt"`
}

type Order struct {
	ID     string     `json:"id"`
	UserID string     `json:"userId"`
	Items  []OrderItem `json:"items"`
	Total  float64    `json:"total"`
	Status string     `json:"status"`
}

type OrderItem struct {
	ProductID string  `json:"productId"`
	Quantity  int     `json:"quantity"`
	Price     float64 `json:"price"`
}

type Resolver struct {
	userRepo  UserRepository
	orderRepo OrderRepository
}

func NewResolver(userRepo UserRepository, orderRepo OrderRepository) *Resolver {
	return &Resolver{userRepo: userRepo, orderRepo: orderRepo}
}

func (r *Resolver) User(ctx context.Context, id string) (*User, error) {
	user, err := r.userRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("user not found: %w", err)
	}
	return user, nil
}

func (r *Resolver) Users(ctx context.Context, limit int, offset int) ([]*User, error) {
	return r.userRepo.List(ctx, limit, offset)
}

func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
	return r.userRepo.FindByID(ctx, id)
}

func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
	return r.userRepo.FindByEmail(ctx, email)
}

func NewGraphQLServer(resolver *Resolver) *handler.Server {
	srv := handler.New(NewExecutableSchema(Config{Resolvers: resolver}))
	srv.AddTransport(transport.POST{})
	srv.AddTransport(transport.GET{})
	srv.Use(extension.Introspection{})
	return srv
}

gqlgen Configuration (gqlgen.yml):

schema:
  - users.graphqls
exec:
  filename: graph/generated.go
model:
  filename: graph/model/models_gen.go
resolver:
  filename: graph/resolver.go
  type: Resolver
federation:
  filename: graph/federation.go
  package: graph
  version: 2

Pattern 2: Entity Resolution with @key Directive — Cross-Service Type Stitching

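@key declares an entity's identifier fields. The gateway resolves entities across subgraphs by sending entity representations (the __typename plus the @key fields) to a subgraph's _entities query field. In gqlgen, that field dispatches to generated entity resolver methods such as FindProductByID, which play the role that __resolveReference plays in Apollo Server. This is the core mechanism of federation.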
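The following is a simplified sketch of how entity representations can be dispatched to per-key lookups. It is not gqlgen's generated code: the Representation type, the in-memory usersByID store, and resolveEntities are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Representation is the object the gateway sends for each entity:
// the type name plus the fields of one of the type's @key directives.
type Representation map[string]any

type User struct{ ID, Email, Name string }

// usersByID is a hypothetical in-memory store standing in for a repository.
var usersByID = map[string]*User{
	"1": {ID: "1", Email: "ada@example.com", Name: "Ada"},
	"2": {ID: "2", Email: "alan@example.com", Name: "Alan"},
}

func findUserByID(id string) (*User, error) {
	if u, ok := usersByID[id]; ok {
		return u, nil
	}
	return nil, fmt.Errorf("user id %q not found", id)
}

func findUserByEmail(email string) (*User, error) {
	for _, u := range usersByID {
		if u.Email == email {
			return u, nil
		}
	}
	return nil, fmt.Errorf("user email %q not found", email)
}

// resolveEntities picks a lookup based on which @key fields are present
// and returns the entities in the same order as the representations.
func resolveEntities(reps []Representation) ([]*User, error) {
	out := make([]*User, 0, len(reps))
	for _, rep := range reps {
		if typename, _ := rep["__typename"].(string); typename != "User" {
			return nil, fmt.Errorf("unknown entity type %q", typename)
		}
		var (
			u   *User
			err error
		)
		if id, ok := rep["id"].(string); ok {
			u, err = findUserByID(id)
		} else if email, ok := rep["email"].(string); ok {
			u, err = findUserByEmail(email)
		} else {
			err = errors.New("representation matches no @key of User")
		}
		if err != nil {
			return nil, err
		}
		out = append(out, u)
	}
	return out, nil
}

func main() {
	users, err := resolveEntities([]Representation{
		{"__typename": "User", "id": "1"},
		{"__typename": "User", "email": "alan@example.com"},
	})
	fmt.Println(len(users), err)
}
```

Each @key on a type corresponds to one lookup path, which is why a type with two @key directives needs two entity resolver methods.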

Product Subgraph Schema (products.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])

type Product @key(fields: "id") @key(fields: "sku") {
  id: ID!
  sku: String!
  name: String!
  description: String
  price: Float!
  inventory: Int!
  category: Category!
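  # No @provides here: it only applies to @external fields of an entity
  # owned elsewhere, and Review.rating is defined in this subgraph.
  reviews: [Review!]!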
}

type Category @key(fields: "id") {
  id: ID!
  name: String!
  parent: Category
}

type Review {
  id: ID!
  userId: ID!
  productId: ID!
  rating: Int!
  comment: String
}

type Query {
  product(id: ID!): Product
  products(categoryId: ID, limit: Int, offset: Int): [Product!]!
  category(id: ID!): Category
}

Go Entity Resolver:

package graph

import (
	"context"
	"fmt"
)

type Product struct {
	ID          string  `json:"id"`
	SKU         string  `json:"sku"`
	Name        string  `json:"name"`
	Description string  `json:"description,omitempty"`
	Price       float64 `json:"price"`
	Inventory   int     `json:"inventory"`
	CategoryID  string  `json:"categoryId"`
}

type Category struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	ParentID string `json:"parentId,omitempty"`
}

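type ProductRepository interface {
	FindByID(ctx context.Context, id string) (*Product, error)
	FindBySKU(ctx context.Context, sku string) (*Product, error)
	// FindByIDs backs the batch loader shown in Pattern 5.
	FindByIDs(ctx context.Context, ids []string) ([]*Product, error)
	// List is called by the Products query resolver when no category is given.
	List(ctx context.Context, limit, offset int) ([]*Product, error)
	ListByCategory(ctx context.Context, categoryID string, limit, offset int) ([]*Product, error)
}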

type entityResolver struct {
	productRepo  ProductRepository
	categoryRepo CategoryRepository
}

func (r *entityResolver) FindProductByID(ctx context.Context, id string) (*Product, error) {
	product, err := r.productRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("product entity resolution failed for id=%s: %w", id, err)
	}
	return product, nil
}

func (r *entityResolver) FindProductBySKU(ctx context.Context, sku string) (*Product, error) {
	product, err := r.productRepo.FindBySKU(ctx, sku)
	if err != nil {
		return nil, fmt.Errorf("product entity resolution failed for sku=%s: %w", sku, err)
	}
	return product, nil
}

func (r *entityResolver) FindCategoryByID(ctx context.Context, id string) (*Category, error) {
	category, err := r.categoryRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("category entity resolution failed for id=%s: %w", id, err)
	}
	return category, nil
}

func (r *Resolver) Product(ctx context.Context, id string) (*Product, error) {
	return r.productRepo.FindByID(ctx, id)
}

func (r *Resolver) Products(ctx context.Context, categoryId *string, limit *int, offset *int) ([]*Product, error) {
	lim := 20
	off := 0
	if limit != nil {
		lim = *limit
	}
	if offset != nil {
		off = *offset
	}
	if categoryId != nil {
		return r.productRepo.ListByCategory(ctx, *categoryId, lim, off)
	}
	return r.productRepo.List(ctx, lim, off)
}

Composite Key Entity:

type WarehouseStock @key(fields: "sku warehouseId") {
  sku: String!
  warehouseId: ID!
  quantity: Int!
  reservedQuantity: Int!
  location: String!
}

Go types and entity resolver for the composite key:

type WarehouseStock struct {
	SKU              string `json:"sku"`
	WarehouseID      string `json:"warehouseId"`
	Quantity         int    `json:"quantity"`
	ReservedQuantity int    `json:"reservedQuantity"`
	Location         string `json:"location"`
}

type WarehouseStockRef struct {
	SKU         string `json:"sku"`
	WarehouseID string `json:"warehouseId"`
}

func (r *entityResolver) FindWarehouseStockBySkuAndWarehouseId(
	ctx context.Context,
	sku string,
	warehouseId string,
) (*WarehouseStock, error) {
	stock, err := r.stockRepo.FindBySKUAndWarehouse(ctx, sku, warehouseId)
	if err != nil {
		return nil, fmt.Errorf("warehouse stock resolution failed: %w", err)
	}
	return stock, nil
}
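A composite key maps naturally onto a Go struct, because structs whose fields are all comparable can be used as map keys. The sketch below shows one way to index rows by (sku, warehouseId), for example when assembling the result of a batch lookup. StockKey and indexStock are illustrative names, and the trimmed-down WarehouseStock is a local copy for the example.

```go
package main

import "fmt"

// StockKey mirrors @key(fields: "sku warehouseId"). Both fields are
// comparable, so the struct can be used directly as a map key.
type StockKey struct {
	SKU         string
	WarehouseID string
}

// WarehouseStock is a reduced copy of the entity for this example.
type WarehouseStock struct {
	SKU         string
	WarehouseID string
	Quantity    int
}

// indexStock keys each row by its composite identifier so a batch of
// entity references can be answered with map lookups.
func indexStock(rows []WarehouseStock) map[StockKey]WarehouseStock {
	idx := make(map[StockKey]WarehouseStock, len(rows))
	for _, row := range rows {
		idx[StockKey{SKU: row.SKU, WarehouseID: row.WarehouseID}] = row
	}
	return idx
}

func main() {
	idx := indexStock([]WarehouseStock{
		{SKU: "sku-1", WarehouseID: "wh-1", Quantity: 5},
		{SKU: "sku-1", WarehouseID: "wh-2", Quantity: 9},
	})
	stock, ok := idx[StockKey{SKU: "sku-1", WarehouseID: "wh-2"}]
	fmt.Println(stock.Quantity, ok)
}
```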

Pattern 3: Apollo Federation v2 Composition — From Subgraph to Supergraph

Federation v2 introduces @link, @shareable, @override and other new directives for more flexible schema composition. Use the rover CLI for schema checking and publishing.

Supergraph Configuration (supergraph.yaml):

federation_version: =2.8.0
subgraphs:
  users:
    routing_url: http://users-service:4001/graphql
    schema:
      file: ./schemas/users.graphqls
  products:
    routing_url: http://products-service:4002/graphql
    schema:
      file: ./schemas/products.graphqls
  orders:
    routing_url: http://orders-service:4003/graphql
    schema:
      file: ./schemas/orders.graphqls
  reviews:
    routing_url: http://reviews-service:4004/graphql
    schema:
      file: ./schemas/reviews.graphqls

Order Subgraph Schema (orders.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type Order @key(fields: "id") {
  id: ID!
  userId: ID!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  shippingAddress: Address
  createdAt: String!
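  # Resolved locally from userId. @requires is only valid for @external
  # fields owned by another subgraph, so it is not used here.
  user: User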
}

type OrderItem {
  productId: ID!
  quantity: Int!
  unitPrice: Float!
  product: Product
}

type Address @shareable {
  street: String!
  city: String!
  state: String!
  zipCode: String!
  country: String!
}

# Entity stubs. Federation 2 does not require @external on key fields.
type User @key(fields: "id") @shareable {
  id: ID!
  orders: [Order!]!
}

type Product @key(fields: "id") @shareable {
  id: ID!
  orderItems: [OrderItem!]!
}

enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}

type Query {
  order(id: ID!): Order
  orders(userId: ID, status: OrderStatus, limit: Int, offset: Int): [Order!]!
}

Schema Check and Publish:

# Check subgraph schema compatibility
rover subgraph check my-graph \
  --name users \
  --schema ./schemas/users.graphqls

# Publish subgraph schema
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls \
  --routing-url http://users-service:4001/graphql

# Compose supergraph
rover supergraph compose --config supergraph.yaml > supergraph.graphqls

Go Subgraph HTTP Service:

package main

import (
	"log"
	"net/http"
	"os"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"

	"myapp/graph"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "4001"
	}

	router := chi.NewRouter()

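	// The repository constructors return an error, as in the
	// Batch Query Repository listing in Pattern 5.
	userRepo, err := NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("user repository: %v", err)
	}
	orderRepo, err := NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("order repository: %v", err)
	}
	resolver := graph.NewResolver(userRepo, orderRepo)

	srv := handler.NewDefaultServer(graph.NewExecutableSchema(
		graph.Config{Resolvers: resolver},
	))

	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	router.Handle("/graphql", srv)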

	log.Printf("🚀 Users subgraph running on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, router))
}

Pattern 4: Gateway Router with Query Planning — Apollo Router

Apollo Router is a high-performance gateway written in Rust, supporting query planning, batch entity resolution, caching, and observability.

Router Configuration (router.yaml):

supergraph:
  listen: 0.0.0.0:4000
  path: /graphql
  introspection: true

health_check:
  listen: 0.0.0.0:8088

cors:
  origins:
    - https://app.example.com
    - http://localhost:3000
  methods:
    - GET
    - POST
  headers:
    - Authorization
    - Content-Type
    - X-Request-ID

headers:
  all:
    request:
      - propagate:
          matching: "^X-.*"
      - propagate:
          named: Authorization
  subgraphs:
    users:
      request:
        - propagate:
            named: Authorization
        - set:
            name: X-User-Service-Key
            value: "${USERS_SERVICE_KEY}"
    orders:
      request:
        - propagate:
            named: Authorization

traffic_shaping:
  all:
    # Apollo Router names this option global_rate_limit.
    global_rate_limit:
      capacity: 1000
      interval: 1s
  subgraphs:
    users:
      timeout: 5s
      global_rate_limit:
        capacity: 500
        interval: 1s
    products:
      timeout: 3s
    orders:
      timeout: 10s

telemetry:
  tracing:
    common:
      service_name: apollo-router
    otlp:
      endpoint: http://otel-collector:4317
      protocol: grpc
  metrics:
    common:
      service_name: apollo-router
    otlp:
      endpoint: http://otel-collector:4317
      protocol: grpc
  logging:
    format: json

Docker Compose Deployment:

version: "3.9"

services:
  router:
    image: ghcr.io/apollographql/router:v1.45.0
    ports:
      - "4000:4000"
      - "8088:8088"
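    # Tell the router explicitly where the mounted config and supergraph live.
    command:
      - --config
      - /dist/configuration/router.yaml
      - --supergraph
      - /dist/schema/supergraph.graphqls
    volumes:
      - ./router.yaml:/dist/configuration/router.yaml:ro
      - ./supergraph.graphqls:/dist/schema/supergraph.graphqls:ro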
    environment:
      - USERS_SERVICE_KEY=${USERS_SERVICE_KEY}
      - APOLLO_KEY=${APOLLO_KEY}
      - APOLLO_GRAPH_REF=${APOLLO_GRAPH_REF}
    depends_on:
      - users-service
      - products-service
      - orders-service
      - reviews-service
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8088/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  users-service:
    build:
      context: ./services/users
      dockerfile: Dockerfile
    ports:
      - "4001:4001"
    environment:
      - DATABASE_URL=postgres://users:password@postgres:5432/users?sslmode=disable
      - PORT=4001
    depends_on:
      - postgres

  products-service:
    build:
      context: ./services/products
      dockerfile: Dockerfile
    ports:
      - "4002:4002"
    environment:
      - DATABASE_URL=postgres://products:password@postgres:5432/products?sslmode=disable
      - PORT=4002

  orders-service:
    build:
      context: ./services/orders
      dockerfile: Dockerfile
    ports:
      - "4003:4003"
    environment:
      - DATABASE_URL=postgres://orders:password@postgres:5432/orders?sslmode=disable
      - PORT=4003

  reviews-service:
    build:
      context: ./services/reviews
      dockerfile: Dockerfile
    ports:
      - "4004:4004"
    environment:
      - DATABASE_URL=postgres://reviews:password@postgres:5432/reviews?sslmode=disable
      - PORT=4004

  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
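      # The official postgres image ignores POSTGRES_MULTIPLE_DATABASES unless a
      # custom init script in /docker-entrypoint-initdb.d creates the databases.
      - POSTGRES_MULTIPLE_DATABASES=users,products,orders,reviews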
      - POSTGRES_PASSWORD=password
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Query Planning Example:

query GetUserWithOrders {
  user(id: "user-123") {
    name
    email
    orders {
      id
      total
      status
      items {
        product {
          name
          price
        }
        quantity
      }
    }
  }
}

The gateway's query planner generates the following execution plan:

  1. Fetch User's name and email from the users subgraph
  2. Fetch User(id="user-123")'s orders from the orders subgraph
  3. Batch resolve Product entities in OrderItems from the products subgraph
  4. Merge results and return to the client
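Step 3 of the plan can be sketched as follows: the product references found across all order items are deduplicated into a single list of representations, which would then be sent as the variables of one _entities request. The Order and OrderItem types are reduced copies for the example, and productRepresentations is an illustrative name rather than Apollo Router's internal logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type OrderItem struct {
	ProductID string
	Quantity  int
}

type Order struct {
	ID    string
	Items []OrderItem
}

// productRepresentations walks every order item and emits one entity
// representation per distinct product, in order of first appearance.
func productRepresentations(orders []Order) []map[string]any {
	seen := make(map[string]bool)
	reps := make([]map[string]any, 0)
	for _, o := range orders {
		for _, item := range o.Items {
			if seen[item.ProductID] {
				continue
			}
			seen[item.ProductID] = true
			reps = append(reps, map[string]any{
				"__typename": "Product",
				"id":         item.ProductID,
			})
		}
	}
	return reps
}

func main() {
	orders := []Order{
		{ID: "o1", Items: []OrderItem{{ProductID: "p1", Quantity: 1}, {ProductID: "p2", Quantity: 2}}},
		{ID: "o2", Items: []OrderItem{{ProductID: "p1", Quantity: 3}}},
	}
	body, err := json.Marshal(map[string]any{"representations": productRepresentations(orders)})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Three order items referencing two products produce two representations, so the products subgraph receives one request instead of three.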

Pattern 5: Cross-Service Data Fetching and N+1 Prevention — DataLoader Pattern

N+1 is the most severe performance issue in GraphQL Federation. DataLoader prevents it through batch loading and deduplication, merging N entity resolutions into 1 batch query.

Go DataLoader Implementation:

package dataloader

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// BatchFunc loads many keys in one round trip and returns the values it
// found, keyed by K. Keys missing from the map are reported as not found.
type BatchFunc[K comparable, V any] func(ctx context.Context, keys []K) (map[K]V, error)

// Loader batches and deduplicates lookups. Create one Loader per request:
// the cache never expires, so a long-lived Loader would serve stale data.
type Loader[K comparable, V any] struct {
	batchFn  BatchFunc[K, V]
	mu       sync.Mutex
	cache    map[K]V
	pending  map[K]*call[V]
	maxBatch int
	wait     time.Duration
}

// call is one in-flight lookup, shared by every caller waiting on the same key.
type call[V any] struct {
	done  chan struct{} // closed after value and err are set
	value V
	err   error
}

func NewLoader[K comparable, V any](batchFn BatchFunc[K, V], opts ...Option[K, V]) *Loader[K, V] {
	l := &Loader[K, V]{
		batchFn:  batchFn,
		cache:    make(map[K]V),
		pending:  make(map[K]*call[V]),
		maxBatch: 100,
		wait:     10 * time.Millisecond,
	}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

type Option[K comparable, V any] func(*Loader[K, V])

func WithMaxBatch[K comparable, V any](n int) Option[K, V] {
	return func(l *Loader[K, V]) { l.maxBatch = n }
}

func WithWait[K comparable, V any](d time.Duration) Option[K, V] {
	return func(l *Loader[K, V]) { l.wait = d }
}

// Load returns the value for key, joining the current batch when the key is
// not cached. Callers asking for the same key share a single lookup.
func (l *Loader[K, V]) Load(ctx context.Context, key K) (V, error) {
	l.mu.Lock()
	if v, ok := l.cache[key]; ok {
		l.mu.Unlock()
		return v, nil
	}

	c, inFlight := l.pending[key]
	full := false
	if !inFlight {
		c = &call[V]{done: make(chan struct{})}
		l.pending[key] = c
		full = len(l.pending) >= l.maxBatch
		if !full && len(l.pending) == 1 {
			// First key of a new batch: flush it after the wait window.
			// The batch runs with this caller's context.
			time.AfterFunc(l.wait, func() { l.dispatch(ctx) })
		}
	}
	l.mu.Unlock()

	if full {
		l.dispatch(ctx)
	}

	// Waiting on a closed channel lets any number of callers observe the result.
	select {
	case <-c.done:
		return c.value, c.err
	case <-ctx.Done():
		var zero V
		return zero, ctx.Err()
	}
}

// dispatch takes ownership of all pending calls, runs one batch query,
// and publishes the outcome to every waiter.
func (l *Loader[K, V]) dispatch(ctx context.Context) {
	l.mu.Lock()
	batch := l.pending
	if len(batch) == 0 {
		l.mu.Unlock()
		return
	}
	l.pending = make(map[K]*call[V])
	l.mu.Unlock()

	keys := make([]K, 0, len(batch))
	for k := range batch {
		keys = append(keys, k)
	}

	results, err := l.batchFn(ctx, keys)

	l.mu.Lock()
	defer l.mu.Unlock()
	for k, c := range batch {
		if err != nil {
			c.err = err
		} else if v, ok := results[k]; ok {
			c.value = v
			l.cache[k] = v
		} else {
			c.err = fmt.Errorf("key not found: %v", k)
		}
		close(c.done)
	}
}

// LoadMany loads all keys concurrently so they land in the same batch,
// and returns the first error encountered, if any.
func (l *Loader[K, V]) LoadMany(ctx context.Context, keys []K) ([]V, error) {
	values := make([]V, len(keys))
	errs := make([]error, len(keys))
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		go func(i int, key K) {
			defer wg.Done()
			values[i], errs[i] = l.Load(ctx, key)
		}(i, key)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return values, err
		}
	}
	return values, nil
}

Using DataLoader in Resolvers:

package graph

import (
	"context"
	"fmt"
	"time"

	"myapp/dataloader"
)

type Loaders struct {
	UserByID    *dataloader.Loader[string, *User]
	ProductByID *dataloader.Loader[string, *Product]
	OrderByID   *dataloader.Loader[string, *Order]
}

func NewLoaders(userRepo UserRepository, productRepo ProductRepository, orderRepo OrderRepository) *Loaders {
	return &Loaders{
		UserByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*User, error) {
			users, err := userRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch user load failed: %w", err)
			}
			result := make(map[string]*User, len(users))
			for _, u := range users {
				result[u.ID] = u
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *User](200), dataloader.WithWait[string, *User](5*time.Millisecond)),

		ProductByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Product, error) {
			products, err := productRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch product load failed: %w", err)
			}
			result := make(map[string]*Product, len(products))
			for _, p := range products {
				result[p.ID] = p
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *Product](200), dataloader.WithWait[string, *Product](5*time.Millisecond)),

		OrderByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Order, error) {
			orders, err := orderRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch order load failed: %w", err)
			}
			result := make(map[string]*Order, len(orders))
			for _, o := range orders {
				result[o.ID] = o
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *Order](200), dataloader.WithWait[string, *Order](5*time.Millisecond)),
	}
}

func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
	loader := ctx.Value(loaderKey).(*Loaders)
	return loader.UserByID.Load(ctx, obj.UserID)
}

func (r *orderItemResolver) Product(ctx context.Context, obj *OrderItem) (*Product, error) {
	loader := ctx.Value(loaderKey).(*Loaders)
	return loader.ProductByID.Load(ctx, obj.ProductID)
}
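The resolvers above read the loaders from the request context through loaderKey, which the listing does not define. One common arrangement, sketched here under that assumption, is an HTTP middleware that builds a fresh set of loaders for every request so that cached entries never leak between requests. Loaders is reduced to a placeholder struct, and LoaderMiddleware, LoadersFrom, and demo are illustrative names.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Loaders is a placeholder for the real struct of DataLoaders; Seq only
// exists so the example can tell instances apart.
type Loaders struct{ Seq int }

// ctxKey is unexported so no other package can collide with this key.
type ctxKey struct{}

var loaderKey = ctxKey{}

// LoaderMiddleware stores a new *Loaders in the context of every request.
func LoaderMiddleware(newLoaders func() *Loaders) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			ctx := context.WithValue(r.Context(), loaderKey, newLoaders())
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

// LoadersFrom returns the request-scoped loaders, or false when the
// middleware was not installed, instead of panicking on a failed assertion.
func LoadersFrom(ctx context.Context) (*Loaders, bool) {
	l, ok := ctx.Value(loaderKey).(*Loaders)
	return l, ok
}

// demo sends two requests through the middleware and reports whether the
// loaders are stable within a request and distinct across requests.
func demo() (sameWithinRequest, freshPerRequest bool) {
	seq := 0
	newLoaders := func() *Loaders { seq++; return &Loaders{Seq: seq} }

	var seen []*Loaders
	sameWithinRequest = true
	h := LoaderMiddleware(newLoaders)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		a, okA := LoadersFrom(r.Context())
		b, okB := LoadersFrom(r.Context())
		if !okA || !okB || a != b {
			sameWithinRequest = false
		}
		seen = append(seen, a)
	}))

	for i := 0; i < 2; i++ {
		req := httptest.NewRequest(http.MethodPost, "/graphql", nil)
		h.ServeHTTP(httptest.NewRecorder(), req)
	}
	freshPerRequest = len(seen) == 2 && seen[0] != seen[1]
	return sameWithinRequest, freshPerRequest
}

func main() {
	fmt.Println(demo())
}
```

With a helper like LoadersFrom, a resolver can return an error when the loaders are missing rather than panicking on the type assertion.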

Batch Query Repository:

package repository

import (
	"context"
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/lib/pq"
)

type PostgresUserRepository struct {
	db *sql.DB
}

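func NewPostgresUserRepository(dbURL string) (*PostgresUserRepository, error) {
	db, err := sql.Open("postgres", dbURL)
	if err != nil {
		return nil, fmt.Errorf("invalid database configuration: %w", err)
	}
	// sql.Open does not dial the database; Ping verifies the connection.
	if err := db.Ping(); err != nil {
		db.Close()
		return nil, fmt.Errorf("failed to connect to database: %w", err)
	}
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(10)
	return &PostgresUserRepository{db: db}, nil
}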

func (r *PostgresUserRepository) FindByIDs(ctx context.Context, ids []string) ([]*User, error) {
	if len(ids) == 0 {
		return nil, nil
	}

	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}

	query := fmt.Sprintf(
		"SELECT id, email, name, avatar, created_at FROM users WHERE id IN (%s)",
		strings.Join(placeholders, ","),
	)

	rows, err := r.db.QueryContext(ctx, query, args...)
	if err != nil {
		return nil, fmt.Errorf("batch query users failed: %w", err)
	}
	defer rows.Close()

	users := make([]*User, 0, len(ids))
	for rows.Next() {
		var u User
		var avatar sql.NullString
		if err := rows.Scan(&u.ID, &u.Email, &u.Name, &avatar, &u.CreatedAt); err != nil {
			return nil, fmt.Errorf("scan user row failed: %w", err)
		}
		if avatar.Valid {
			u.Avatar = avatar.String
		}
		users = append(users, &u)
	}
	return users, rows.Err()
}

Pattern 6: Production Federation Architecture — Auth, Monitoring, and High Availability

Production federation architecture requires handling authentication passthrough, distributed tracing, rate limiting, circuit breaking, and graceful degradation.

Authentication Middleware:

package middleware

import (
	"context"
	"fmt"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

type contextKey string

const (
	userIDKey    contextKey = "userID"
	userRoleKey  contextKey = "userRole"
	authHeaderKey           = "Authorization"
)

type Claims struct {
	UserID string `json:"sub"`
	Role   string `json:"role"`
	jwt.RegisteredClaims
}

func AuthMiddleware(jwtSecret string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			authHeader := r.Header.Get(authHeaderKey)
			if authHeader == "" {
				next.ServeHTTP(w, r)
				return
			}

			tokenStr := strings.TrimPrefix(authHeader, "Bearer ")
			if tokenStr == authHeader {
				next.ServeHTTP(w, r)
				return
			}

			token, err := jwt.ParseWithClaims(tokenStr, &Claims{}, func(t *jwt.Token) (interface{}, error) {
				if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
					return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
				}
				return []byte(jwtSecret), nil
			})

			if err != nil || !token.Valid {
				http.Error(w, "invalid token", http.StatusUnauthorized)
				return
			}

			claims, ok := token.Claims.(*Claims)
			if !ok {
				http.Error(w, "invalid claims", http.StatusUnauthorized)
				return
			}

			ctx := context.WithValue(r.Context(), userIDKey, claims.UserID)
			ctx = context.WithValue(ctx, userRoleKey, claims.Role)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

func GetUserID(ctx context.Context) string {
	if v, ok := ctx.Value(userIDKey).(string); ok {
		return v
	}
	return ""
}

func GetUserRole(ctx context.Context) string {
	if v, ok := ctx.Value(userRoleKey).(string); ok {
		return v
	}
	return ""
}

OpenTelemetry Tracing Integration:

package telemetry

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

func InitTracer(ctx context.Context, endpoint string) (func(context.Context) error, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(endpoint),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create OTLP exporter: %w", err)
	}

	provider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(newResource()),
	)

	otel.SetTracerProvider(provider)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))

	return provider.Shutdown, nil
}

func StartSpan(ctx context.Context, name string) (context.Context, trace.Span) {
	tracer := otel.Tracer("graphql-federation")
	return tracer.Start(ctx, name)
}

Rate Limiting and Circuit Breaking:

package middleware

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type RateLimiter struct {
	mu       sync.Mutex
	clients  map[string]*clientBucket
	rate     int
	interval time.Duration
}

type clientBucket struct {
	tokens   int
	lastSeen time.Time
}

func NewRateLimiter(rate int, interval time.Duration) *RateLimiter {
	rl := &RateLimiter{
		clients:  make(map[string]*clientBucket),
		rate:     rate,
		interval: interval,
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) Allow(key string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	now := time.Now()
	bucket, ok := rl.clients[key]
	if !ok {
		rl.clients[key] = &clientBucket{tokens: rl.rate - 1, lastSeen: now}
		return true
	}

	elapsed := now.Sub(bucket.lastSeen)
	if elapsed >= rl.interval {
		bucket.tokens = rl.rate - 1
		bucket.lastSeen = now
		return true
	}

	if bucket.tokens <= 0 {
		return false
	}

	bucket.tokens--
	return true
}

func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(time.Minute)
	for range ticker.C {
		rl.mu.Lock()
		for key, bucket := range rl.clients {
			if time.Since(bucket.lastSeen) > 3*rl.interval {
				delete(rl.clients, key)
			}
		}
		rl.mu.Unlock()
	}
}

func RateLimitMiddleware(limiter *RateLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			clientID := r.Header.Get("X-Client-ID")
			if clientID == "" {
				clientID = r.RemoteAddr
			}

			if !limiter.Allow(clientID) {
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}

type CircuitBreaker struct {
	mu           sync.Mutex
	failureCount int
	threshold    int
	timeout      time.Duration
	state        string
	lastFailure  time.Time
}

func NewCircuitBreaker(threshold int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		threshold: threshold,
		timeout:   timeout,
		state:     "closed",
	}
}

func (cb *CircuitBreaker) Execute(fn func() error) error {
	cb.mu.Lock()
	if cb.state == "open" {
		if time.Since(cb.lastFailure) > cb.timeout {
			cb.state = "half-open"
			cb.mu.Unlock()
		} else {
			cb.mu.Unlock()
			return fmt.Errorf("circuit breaker is open")
		}
	} else {
		cb.mu.Unlock()
	}

	err := fn()
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.failureCount++
		cb.lastFailure = time.Now()
		if cb.failureCount >= cb.threshold {
			cb.state = "open"
		}
		return err
	}

	cb.failureCount = 0
	cb.state = "closed"
	return nil
}

Complete Service Startup:

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"
	chimw "github.com/go-chi/chi/v5/middleware"

	"myapp/graph"
	"myapp/middleware"
	"myapp/telemetry"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	shutdown, err := telemetry.InitTracer(ctx, os.Getenv("OTEL_ENDPOINT"))
	if err != nil {
		log.Printf("⚠️ Tracer init failed: %v", err)
	} else {
		defer shutdown(ctx)
	}

	userRepo, err := NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to user database: %v", err)
	}
	orderRepo, err := NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to order database: %v", err)
	}

	resolver := graph.NewResolver(userRepo, orderRepo)
	loaders := graph.NewLoaders(userRepo, nil, orderRepo)

	srv := handler.NewDefaultServer(graph.NewExecutableSchema(
		graph.Config{Resolvers: resolver},
	))

	limiter := middleware.NewRateLimiter(100, time.Second)
	breaker := middleware.NewCircuitBreaker(5, 30*time.Second)

	router := chi.NewRouter()
	router.Use(chimw.RequestID)
	router.Use(chimw.RealIP)
	router.Use(chimw.Logger)
	router.Use(chimw.Recoverer)
	router.Use(chimw.Timeout(30 * time.Second))
	router.Use(middleware.AuthMiddleware(os.Getenv("JWT_SECRET")))
	router.Use(middleware.RateLimitMiddleware(limiter))

	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	router.Handle("/graphql", srv)
	router.Get("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	})

	port := os.Getenv("PORT")
	if port == "" {
		port = "4001"
	}

	server := &http.Server{
		Addr:         ":" + port,
		Handler:      router,
		ReadTimeout:  15 * time.Second,
		WriteTimeout: 30 * time.Second,
		IdleTimeout:  60 * time.Second,
	}

	go func() {
		log.Printf("🚀 Subgraph running on :%s", port)
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()

	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit

	log.Println("Shutting down gracefully...")
	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer shutdownCancel()

	if err := server.Shutdown(shutdownCtx); err != nil {
		log.Fatalf("Forced shutdown: %v", err)
	}
	log.Println("Server stopped")
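	// breaker and loaders are constructed above but not wired into the handler
	// chain in this listing; the blank assignments only keep the compiler quiet.
	_ = breaker
	_ = loaders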
}

5 Common Pitfalls

Pitfall 1: Forgetting to Implement __resolveReference in Subgraphs

Wrong:

// Only defined @key but didn't implement entity resolution
type Resolver struct{}

// Missing this method — gateway cannot resolve cross-subgraph entities
// func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) { ... }

Correct:

func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
	return r.userRepo.FindByID(ctx, id)
}

func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
	return r.userRepo.FindByEmail(ctx, email)
}

Pitfall 2: @shareable Overuse Causing Data Inconsistency

Wrong:

type User @key(fields: "id") @shareable {
  id: ID!
  name: String!
  email: String!
  orderCount: Int!  # Multiple subgraphs provide this field with different logic
}

Correct:

type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

type UserOrderStats @key(fields: "userId") {
  userId: ID!
  orderCount: Int!
  totalSpent: Float!
}

Pitfall 3: N+1 Queries Without DataLoader

Wrong:

func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
    // Each Order triggers a separate HTTP request to user service
    resp, err := http.Get(fmt.Sprintf("http://users-service/users/%s", obj.UserID))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    var user User
    json.NewDecoder(resp.Body).Decode(&user)
    return &user, nil
}

Correct:

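func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
    // Use the checked form of the type assertion so a missing loader
    // middleware produces an error instead of a panic.
    loaders, ok := ctx.Value(loaderKey).(*Loaders)
    if !ok {
        return nil, errors.New("dataloaders missing from request context")
    }
    return loaders.UserByID.Load(ctx, obj.UserID)
}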

Pitfall 4: Subgraph Schema Changes Without Compatibility Checks

Wrong:

# Publish directly without checking
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls

Correct:

# Check compatibility first
rover subgraph check my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls

# Confirm no breaking changes, then publish
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls \
  --routing-url http://users-service:4001/query

Pitfall 5: Missing Timeout and Rate-Limit Configuration at Gateway

Wrong:

# router.yaml - no timeout or rate-limit configuration
supergraph:
  listen: 0.0.0.0:4000

Correct:

supergraph:
  listen: 0.0.0.0:4000

traffic_shaping:
  all:
    timeout: 30s
    global_rate_limit:
      capacity: 1000
      interval: 1s
  subgraphs:
    users:
      timeout: 5s
    products:
      timeout: 3s
    orders:
      timeout: 10s

Error Troubleshooting Reference

| Error Message | Cause | Solution |
|---|---|---|
| ENCORE_UNKNOWN_DIRECTIVE | Subgraph uses federation directive not imported | Add missing directive import in @link |
| KEY_FIELDS_MISSING_ON_BASE | @key references fields that don't exist on the type | Ensure @key-specified fields are declared in the type definition |
| EXTERNAL_TYPE_MISMATCH | @external declared type doesn't match the owning subgraph | Verify @external field types match the original definition |
| SHAREABLE_MISMATCH | Same type has inconsistent @shareable declarations across subgraphs | All subgraphs sharing a type must mark it @shareable |
| RESOLVE_REFERENCE_FAILED | __resolveReference implementation returns an error | Check entity resolver database queries and error handling |
| QUERY_PLAN_TIMEOUT | Query planning timeout: too many subgraphs or query too deep | Limit query depth, optimize schema structure |
| SUBGRAPH_UNREACHABLE | Subgraph service unreachable | Check subgraph health status and network connectivity |
| COMPOSITION_ERROR | Schema composition failure due to type conflicts | Use rover subgraph check to verify compatibility |
| N+1_DETECTED | Gateway detects N+1 query pattern | Add DataLoader batch loading for entity resolution |
| CIRCULAR_DEPENDENCY | Circular dependencies between subgraphs | Refactor entity boundaries, use @requires instead of direct references |

Advanced Optimization

Query Complexity Analysis and Limiting

GraphQL query complexity can be exploited — a deeply nested query can produce exponential data volume. Use complexity analysis to limit query cost.

package middleware

import (
	"fmt"

	"github.com/99designs/gqlgen/graphql"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/vektah/gqlparser/v2/ast"
)

type ComplexityLimit struct {
	maxComplexity int
}

func NewComplexityLimit(max int) *ComplexityLimit {
	return &ComplexityLimit{maxComplexity: max}
}

// Extension returns gqlgen's built-in fixed complexity limit.
// Register it on the server with srv.Use(limit.Extension()).
func (cl *ComplexityLimit) Extension() graphql.HandlerExtension {
	return extension.FixedComplexityLimit(cl.maxComplexity)
}

type fieldComplexity struct {
	complexity int
	details    map[string]int
}

// CalculateQueryComplexity walks a parsed and validated query document and
// returns a depth-weighted cost. Inside a gqlgen middleware the document is
// available as graphql.GetOperationContext(ctx).Doc.
func CalculateQueryComplexity(doc *ast.QueryDocument) (*fieldComplexity, error) {
	complexity := 0
	details := make(map[string]int)

	for _, op := range doc.Operations {
		for _, sel := range op.SelectionSet {
			calcSelectionComplexity(sel, &complexity, details, 1)
		}
	}

	if complexity > 500 {
		return nil, fmt.Errorf("query complexity %d exceeds limit 500", complexity)
	}

	return &fieldComplexity{complexity: complexity, details: details}, nil
}

func calcSelectionComplexity(sel ast.Selection, total *int, details map[string]int, depth int) {
	switch s := sel.(type) {
	case *ast.Field:
		// Leaf fields cost 1; fields with a sub-selection cost their depth.
		fieldCost := 1
		if len(s.SelectionSet) > 0 {
			fieldCost = depth
		}
		*total += fieldCost
		details[s.Name] += fieldCost
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth+1)
		}
	case *ast.InlineFragment:
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	case *ast.FragmentSpread:
		// Definition is only populated after validation; skip unresolved spreads.
		if s.Definition == nil {
			return
		}
		for _, child := range s.Definition.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	}
}

Persisted Queries and Query Registration

Production environments should use Persisted Queries — clients only send query hashes, avoiding transmitting full query text and preventing unknown query execution.

package persistedquery

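import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"sync"

	"github.com/99designs/gqlgen/graphql"
	"github.com/vektah/gqlparser/v2/gqlerror"
)

type PersistedQueryManager struct {
	mu      sync.RWMutex
	queries map[string]string
	strict  bool
}

func NewPersistedQueryManager(strict bool) *PersistedQueryManager {
	return &PersistedQueryManager{
		queries: make(map[string]string),
		strict:  strict,
	}
}

func (pqm *PersistedQueryManager) Register(hash, query string) {
	pqm.mu.Lock()
	defer pqm.mu.Unlock()
	pqm.queries[hash] = query
}

// The manager is registered with srv.Use(pqm). Implementing
// OperationParameterMutator lets the lookup run before the query is parsed.
var (
	_ graphql.HandlerExtension          = (*PersistedQueryManager)(nil)
	_ graphql.OperationParameterMutator = (*PersistedQueryManager)(nil)
)

func (pqm *PersistedQueryManager) ExtensionName() string { return "PersistedQueryManager" }

func (pqm *PersistedQueryManager) Validate(graphql.ExecutableSchema) error { return nil }

// MutateOperationParameters swaps a registered hash (sent in the query field)
// for its query text. In strict mode anything that is not a registered hash
// is rejected with a GraphQL error instead of a panic.
func (pqm *PersistedQueryManager) MutateOperationParameters(ctx context.Context, params *graphql.RawParams) *gqlerror.Error {
	pqm.mu.RLock()
	query, ok := pqm.queries[params.Query]
	pqm.mu.RUnlock()

	if ok {
		params.Query = query
		return nil
	}
	if pqm.strict {
		return gqlerror.Errorf("query is not a registered persisted query")
	}
	return nil
}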

func HashQuery(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}

Subgraph Caching Strategy

Subgraph-level caching significantly reduces repeated queries, especially for hot entities.

package cache

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type EntityCache struct {
	rdb    *redis.Client
	prefix string
	ttl    time.Duration
}

func NewEntityCache(redisURL, prefix string, ttl time.Duration) (*EntityCache, error) {
	opts, err := redis.ParseURL(redisURL)
	if err != nil {
		return nil, fmt.Errorf("invalid redis URL: %w", err)
	}

	return &EntityCache{
		rdb:    redis.NewClient(opts),
		prefix: prefix,
		ttl:    ttl,
	}, nil
}

func (c *EntityCache) Get(ctx context.Context, entityType, id string, dest interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	val, err := c.rdb.Get(ctx, key).Result()
	if err == redis.Nil {
		// Wrap redis.Nil so callers can detect a miss with errors.Is(err, redis.Nil).
		return fmt.Errorf("cache miss for %s:%s: %w", entityType, id, redis.Nil)
	}
	if err != nil {
		return fmt.Errorf("cache read error: %w", err)
	}
	return json.Unmarshal([]byte(val), dest)
}

func (c *EntityCache) Set(ctx context.Context, entityType, id string, val interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	data, err := json.Marshal(val)
	if err != nil {
		return fmt.Errorf("cache marshal error: %w", err)
	}
	return c.rdb.Set(ctx, key, data, c.ttl).Err()
}

func (c *EntityCache) Invalidate(ctx context.Context, entityType, id string) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	return c.rdb.Del(ctx, key).Err()
}

func (c *EntityCache) InvalidatePattern(ctx context.Context, pattern string) error {
	iter := c.rdb.Scan(ctx, 0, fmt.Sprintf("%s:%s:*", c.prefix, pattern), 100).Iterator()
	var keys []string
	for iter.Next(ctx) {
		keys = append(keys, iter.Val())
	}
	if err := iter.Err(); err != nil {
		return fmt.Errorf("cache scan error: %w", err)
	}
	if len(keys) > 0 {
		return c.rdb.Del(ctx, keys...).Err()
	}
	return nil
}

Technology Comparison

| Dimension | Apollo Federation | Schema Stitching | REST API | gRPC | tRPC |
|---|---|---|---|---|---|
| Learning Curve | Medium, requires federation concepts | High, manual conflict resolution | Low | Medium, requires Proto | Low (TypeScript only) |
| Schema Management | Auto composition, rover CLI | Manual stitching, custom resolvers | No unified schema | Proto definition, auto-generated | TypeScript type inference |
| Cross-team Collaboration | Excellent, subgraphs evolve independently | Fair, conflicts need manual resolution | Poor, API docs easily outdated | Good, Proto as contract | TS full-stack only |
| Performance | Good, query planning + batch resolution | Fair, N+1 needs manual handling | Poor, multiple requests | Excellent, binary + HTTP/2 | Good, end-to-end type safety |
| N+1 Prevention | Built-in DataLoader support | Manual implementation required | None | None | None |
| Ecosystem Maturity | High, Apollo full-stack | Medium, community solutions | High | High | Medium |
| Language Support | All languages, Go/Java/TS etc. | All languages | All languages | All languages | TypeScript only |
| Real-time Subscriptions | Supported | Supported | Requires WebSocket | Requires bidirectional stream | Supported |
| Observability | Apollo Studio integration | Self-built | Self-built | OpenTelemetry | Self-built |
| Use Case | Large-scale microservice APIs | Custom composition logic | Simple CRUD | Internal high-performance communication | TS full-stack projects |

Summary: GraphQL Federation isn't a silver bullet, but it's the most mature solution for microservice GraphQL architecture today. Core principles: split subgraphs by domain boundaries, declare entities with @key, prevent N+1 with DataLoader, handle query planning and rate limiting at the gateway, and always add auth, tracing, and caching in production. Start with 2 subgraphs and incrementally split — don't go all-in at once. Schema checks must be enforced in CI, or a breaking change will eventually take down the entire supergraph.

