Go GraphQL Federation: 6 Production Patterns from Subgraph to Supergraph

When Monolithic GraphQL Schema Meets Team Boundaries: The Microservice GraphQL Dilemma

An e-commerce platform's GraphQL Schema balloons to 3,000 lines. The user team, product team, and order team all modify the same schema. Every release requires coordination — one team's breaking change takes down the entire API. Worse, cross-service N+1 queries push response times from 50ms to 3s — a user queries order lists, each order needs product details, each product needs inventory status, and 100 orders means 300 downstream calls.

This isn't hypothetical. When the microservice architecture is already split but the GraphQL layer remains monolithic, team autonomy and API performance pull against each other. GraphQL Federation addresses this: each service owns its own GraphQL schema (a subgraph), and a gateway composes the subgraphs into a unified API (the supergraph). That removes the cross-team coupling, and batched entity resolution gives each subgraph the hook it needs to avoid N+1 queries.

Core Concepts Reference

Concept	Purpose	Key Features	Typical Use Case
`Federation`	Compose multiple GraphQL services into a unified API	Transparent to clients, each service deploys independently	Unified API layer in microservice architecture
`Subgraph`	A single service's GraphQL Schema	Owns independent types and resolvers, declares entities via @key	User service, product service, order service
`Supergraph`	Complete schema composed from all subgraphs	Auto-synthesized by gateway, clients only see the supergraph	Unified API entry point
`Entity`	Type shared across subgraphs	Identified by @key, multiple subgraphs can contribute fields	User, Product, Order and other core domain objects
`@key`	Declares unique identifier fields for an entity	Supports composite keys, multiple @keys for alternate identifiers	`@key(fields: "id")` or `@key(fields: "sku warehouseId")`
`Gateway`	Federation query routing and execution engine	Query planning, batch entity resolution, caching	Apollo Router, Apollo Gateway
`Schema Stitching`	Manually composing multiple GraphQL Schemas	More flexible but requires manual conflict resolution	Custom composition logic, non-standard federation scenarios

5 Challenges of GraphQL Federation Architecture

Challenge 1: Unclear Entity Boundary Definition

The user service has User's name and email, while the order service also has User but only cares about id and order list. If all User fields live in the user service, the order service must make cross-service calls every time. If scattered across services, entity ownership and consistency become problematic.

Challenge 2: N+1 Queries Amplified at the Federation Layer

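A client queries { orders { user { name } } }. The gateway first fetches orders from the order service, then sends the user service the User representations for those orders in a batched _entities request. Batching at the gateway does not help if the subgraph resolves each representation with its own lookup: 100 orders still mean 100 database or RPC calls inside the user service, and a resolver that calls another service per item turns that into 100 network round trips.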

Challenge 3: Schema Evolution and Compatibility

The product service wants to add a required field to Product, but the order service's Product reference may not be compatible. A subgraph's breaking change can affect the entire supergraph, but who performs global compatibility checks?

Challenge 4: Authentication and Authorization Passthrough

JWT tokens need to propagate from the gateway to every subgraph. Different subgraphs may have different permission models. The user service needs user:read permission, the order service needs order:read — how to handle this uniformly at the gateway layer?

Challenge 5: Observability and Error Tracing

A single query may involve 3 subgraphs. When a query fails, which subgraph produced the error? Where is the latency bottleneck? How does distributed tracing propagate correctly through the GraphQL layer?

6 Production-Grade Federation Patterns

Pattern 1: Subgraph Service Definition — gqlgen Foundation

The subgraph is the fundamental unit of federation. Use gqlgen to generate the GraphQL service, declare federation directives, and define entity types.

GraphQL Schema (users.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type User @key(fields: "id") @key(fields: "email") {
  id: ID!
  email: String!
  name: String!
  avatar: String
  createdAt: String!
  orders: [Order!]!
}

type Order @key(fields: "id") @shareable {
  id: ID!
  userId: ID!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}

type OrderItem {
  productId: ID!
  quantity: Int!
  price: Float!
}

Go Resolver Implementation:

package graph

import (
	"context"
	"fmt"

	// The unused "github.com/99designs/gqlgen/graphql" import was dropped:
	// Go refuses to compile a file with unused imports.
	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/99designs/gqlgen/graphql/handler/transport"
)

type User struct {
	ID        string `json:"id"`
	Email     string `json:"email"`
	Name      string `json:"name"`
	Avatar    string `json:"avatar,omitempty"`
	CreatedAt string `json:"createdAt"`
}

type Order struct {
	ID     string     `json:"id"`
	UserID string     `json:"userId"`
	Items  []OrderItem `json:"items"`
	Total  float64    `json:"total"`
	Status string     `json:"status"`
}

type OrderItem struct {
	ProductID string  `json:"productId"`
	Quantity  int     `json:"quantity"`
	Price     float64 `json:"price"`
}

type Resolver struct {
	userRepo  UserRepository
	orderRepo OrderRepository
}

func NewResolver(userRepo UserRepository, orderRepo OrderRepository) *Resolver {
	return &Resolver{userRepo: userRepo, orderRepo: orderRepo}
}

func (r *Resolver) User(ctx context.Context, id string) (*User, error) {
	user, err := r.userRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("user not found: %w", err)
	}
	return user, nil
}

func (r *Resolver) Users(ctx context.Context, limit int, offset int) ([]*User, error) {
	return r.userRepo.List(ctx, limit, offset)
}

func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
	return r.userRepo.FindByID(ctx, id)
}

func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
	return r.userRepo.FindByEmail(ctx, email)
}

func NewGraphQLServer(resolver *Resolver) *handler.Server {
	srv := handler.New(NewExecutableSchema(Config{Resolvers: resolver}))
	srv.AddTransport(transport.POST{})
	srv.AddTransport(transport.GET{})
	srv.Use(extension.Introspection{})
	return srv
}

gqlgen Configuration (gqlgen.yml):

schema:
  - users.graphqls
exec:
  filename: graph/generated.go
model:
  filename: graph/model/models_gen.go
resolver:
  filename: graph/resolver.go
  type: Resolver
federation:
  filename: graph/federation.go
  package: graph
  version: 2

Pattern 2: Entity Resolution with @key Directive — Cross-Service Type Stitching

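@key declares an entity's identifier fields. The gateway resolves entities across subgraphs by sending representations (the type name plus the key fields) to each subgraph's _entities field. In Apollo Server that lookup is the __resolveReference function; gqlgen instead generates one entity resolver method per key, such as FindProductByID. This is the core mechanism of federation.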
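To make the dispatch concrete, here is a minimal sketch of what happens to a batch of representations inside a subgraph. It is not gqlgen's generated code: the Representation type, the in-memory users slice, and the findUserByID / findUserByEmail helpers are assumptions made up for illustration. The only point is that the key fields present in each representation decide which finder runs.

```go
package main

import (
	"errors"
	"fmt"
)

// Representation is the shape the gateway sends: __typename plus key fields.
type Representation map[string]any

type User struct {
	ID    string
	Email string
}

// users is a hypothetical in-memory store standing in for a repository.
var users = []User{
	{ID: "u1", Email: "ada@example.com"},
	{ID: "u2", Email: "bob@example.com"},
}

func findUserByID(id string) (*User, error) {
	for i := range users {
		if users[i].ID == id {
			return &users[i], nil
		}
	}
	return nil, fmt.Errorf("user id=%s not found", id)
}

func findUserByEmail(email string) (*User, error) {
	for i := range users {
		if users[i].Email == email {
			return &users[i], nil
		}
	}
	return nil, fmt.Errorf("user email=%s not found", email)
}

// ResolveEntities picks a finder for each representation based on which
// @key fields it carries, and returns entities in request order.
func ResolveEntities(reps []Representation) ([]*User, error) {
	out := make([]*User, 0, len(reps))
	for _, rep := range reps {
		if rep["__typename"] != "User" {
			return nil, fmt.Errorf("unknown type %v", rep["__typename"])
		}
		var (
			u   *User
			err error
		)
		if id, ok := rep["id"].(string); ok {
			u, err = findUserByID(id)
		} else if email, ok := rep["email"].(string); ok {
			u, err = findUserByEmail(email)
		} else {
			err = errors.New("representation matches no @key of User")
		}
		if err != nil {
			return nil, err
		}
		out = append(out, u)
	}
	return out, nil
}

func main() {
	got, err := ResolveEntities([]Representation{
		{"__typename": "User", "id": "u2"},
		{"__typename": "User", "email": "ada@example.com"},
	})
	if err != nil {
		panic(err)
	}
	for _, u := range got {
		fmt.Println(u.ID, u.Email)
	}
}
```

Note that this loop still performs one lookup per representation, which is exactly where the N+1 problem from Challenge 2 reappears unless the finders are backed by a batching layer.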

Product Subgraph Schema (products.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])

type Product @key(fields: "id") @key(fields: "sku") {
  id: ID!
  sku: String!
  name: String!
  description: String
  price: Float!
  inventory: Int!
  category: Category!
  # @provides applies only to @external fields of an entity; Review is defined locally here.
  reviews: [Review!]!
}

type Category @key(fields: "id") {
  id: ID!
  name: String!
  parent: Category
}

type Review {
  id: ID!
  userId: ID!
  productId: ID!
  rating: Int!
  comment: String
}

type Query {
  product(id: ID!): Product
  products(categoryId: ID, limit: Int, offset: Int): [Product!]!
  category(id: ID!): Category
}

Go Entity Resolver:

package graph

import (
	"context"
	"fmt"
)

type Product struct {
	ID          string  `json:"id"`
	SKU         string  `json:"sku"`
	Name        string  `json:"name"`
	Description string  `json:"description,omitempty"`
	Price       float64 `json:"price"`
	Inventory   int     `json:"inventory"`
	CategoryID  string  `json:"categoryId"`
}

type Category struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	ParentID string `json:"parentId,omitempty"`
}

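type ProductRepository interface {
	FindByID(ctx context.Context, id string) (*Product, error)
	FindBySKU(ctx context.Context, sku string) (*Product, error)
	// List is called by the Products resolver when no category filter is given.
	List(ctx context.Context, limit, offset int) ([]*Product, error)
	ListByCategory(ctx context.Context, categoryID string, limit, offset int) ([]*Product, error)
}

// CategoryRepository backs Category entity resolution.
type CategoryRepository interface {
	FindByID(ctx context.Context, id string) (*Category, error)
}

// In a gqlgen project the generated code declares entityResolver as a thin
// wrapper around *Resolver; it is spelled out here to show its dependencies.
type entityResolver struct {
	productRepo  ProductRepository
	categoryRepo CategoryRepository
}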

func (r *entityResolver) FindProductByID(ctx context.Context, id string) (*Product, error) {
	product, err := r.productRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("product entity resolution failed for id=%s: %w", id, err)
	}
	return product, nil
}

func (r *entityResolver) FindProductBySKU(ctx context.Context, sku string) (*Product, error) {
	product, err := r.productRepo.FindBySKU(ctx, sku)
	if err != nil {
		return nil, fmt.Errorf("product entity resolution failed for sku=%s: %w", sku, err)
	}
	return product, nil
}

func (r *entityResolver) FindCategoryByID(ctx context.Context, id string) (*Category, error) {
	category, err := r.categoryRepo.FindByID(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("category entity resolution failed for id=%s: %w", id, err)
	}
	return category, nil
}

func (r *Resolver) Product(ctx context.Context, id string) (*Product, error) {
	return r.productRepo.FindByID(ctx, id)
}

func (r *Resolver) Products(ctx context.Context, categoryId *string, limit *int, offset *int) ([]*Product, error) {
	lim := 20
	off := 0
	if limit != nil {
		lim = *limit
	}
	if offset != nil {
		off = *offset
	}
	if categoryId != nil {
		return r.productRepo.ListByCategory(ctx, *categoryId, lim, off)
	}
	return r.productRepo.List(ctx, lim, off)
}

Composite Key Entity:

type WarehouseStock @key(fields: "sku warehouseId") {
  sku: String!
  warehouseId: ID!
  quantity: Int!
  reservedQuantity: Int!
  location: String!
}

type WarehouseStock struct {
	SKU              string `json:"sku"`
	WarehouseID      string `json:"warehouseId"`
	Quantity         int    `json:"quantity"`
	ReservedQuantity int    `json:"reservedQuantity"`
	Location         string `json:"location"`
}

type WarehouseStockRef struct {
	SKU         string `json:"sku"`
	WarehouseID string `json:"warehouseId"`
}

func (r *entityResolver) FindWarehouseStockBySkuAndWarehouseId(
	ctx context.Context,
	sku string,
	warehouseId string,
) (*WarehouseStock, error) {
	stock, err := r.stockRepo.FindBySKUAndWarehouse(ctx, sku, warehouseId)
	if err != nil {
		return nil, fmt.Errorf("warehouse stock resolution failed: %w", err)
	}
	return stock, nil
}
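A composite key also works as a batch key, because a Go struct made only of comparable fields can be used directly as a map key. The sketch below assumes a hypothetical in-memory stockTable in place of the stock repository and a StockRef type that mirrors WarehouseStockRef; it shows one way to resolve several (sku, warehouseId) pairs in a single pass.

```go
package main

import "fmt"

// StockRef mirrors the composite @key(fields: "sku warehouseId").
// Both fields are comparable, so the struct can be a map key.
type StockRef struct {
	SKU         string
	WarehouseID string
}

type WarehouseStock struct {
	StockRef
	Quantity int
}

// stockTable is a hypothetical stand-in for the stock repository.
var stockTable = map[StockRef]WarehouseStock{
	{SKU: "SKU-1", WarehouseID: "WH-A"}: {StockRef: StockRef{SKU: "SKU-1", WarehouseID: "WH-A"}, Quantity: 12},
	{SKU: "SKU-1", WarehouseID: "WH-B"}: {StockRef: StockRef{SKU: "SKU-1", WarehouseID: "WH-B"}, Quantity: 0},
}

// FindStocks resolves many composite keys in one pass. Refs that were
// already found are skipped; refs with no row are reported as missing.
func FindStocks(refs []StockRef) (found map[StockRef]WarehouseStock, missing []StockRef) {
	found = make(map[StockRef]WarehouseStock, len(refs))
	for _, ref := range refs {
		if _, dup := found[ref]; dup {
			continue
		}
		if s, ok := stockTable[ref]; ok {
			found[ref] = s
		} else {
			missing = append(missing, ref)
		}
	}
	return found, missing
}

func main() {
	found, missing := FindStocks([]StockRef{
		{SKU: "SKU-1", WarehouseID: "WH-A"},
		{SKU: "SKU-1", WarehouseID: "WH-A"}, // duplicate, resolved once
		{SKU: "SKU-9", WarehouseID: "WH-A"}, // unknown
	})
	fmt.Println("found:", len(found), "missing:", len(missing))
}
```

The same struct type could serve as the key type K of a generic loader, so composite-key entities can be batched like single-key ones.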

Pattern 3: Apollo Federation v2 Composition — From Subgraph to Supergraph

Federation v2 introduces @link, @shareable, @override and other new directives for more flexible schema composition. Use the rover CLI for schema checking and publishing.

Supergraph Configuration (supergraph.yaml):

federation_version: =2.8.0
subgraphs:
  users:
    routing_url: http://users-service:4001/graphql
    schema:
      file: ./schemas/users.graphqls
  products:
    routing_url: http://products-service:4002/graphql
    schema:
      file: ./schemas/products.graphqls
  orders:
    routing_url: http://orders-service:4003/graphql
    schema:
      file: ./schemas/orders.graphqls
  reviews:
    routing_url: http://reviews-service:4004/graphql
    schema:
      file: ./schemas/reviews.graphqls

Order Subgraph Schema (orders.graphqls):

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.0",
        import: ["@key", "@shareable", "@external", "@requires"])

type Order @key(fields: "id") {
  id: ID!
  userId: ID!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  shippingAddress: Address
  createdAt: String!
  # userId is a local field, so @requires is not needed (it applies only to @external fields).
  user: User
}

type OrderItem {
  productId: ID!
  quantity: Int!
  unitPrice: Float!
  product: Product
}

type Address @shareable {
  street: String!
  city: String!
  state: String!
  zipCode: String!
  country: String!
}

# Entity stubs: Federation v2 no longer needs @external on key fields.
type User @key(fields: "id") @shareable {
  id: ID!
  orders: [Order!]!
}

type Product @key(fields: "id") @shareable {
  id: ID!
  orderItems: [OrderItem!]!
}

enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}

type Query {
  order(id: ID!): Order
  orders(userId: ID, status: OrderStatus, limit: Int, offset: Int): [Order!]!
}

Schema Check and Publish:

# Check subgraph schema compatibility
rover subgraph check my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls

# Publish subgraph schema
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls \
  --routing-url http://users-service:4001/graphql

# Compose supergraph
rover supergraph compose --config supergraph.yaml > supergraph.graphqls

Go Subgraph HTTP Service:

package main

import (
	"log"
	"net/http"
	"os"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"

	"myapp/graph" // generated schema and resolvers
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "4001"
	}

	router := chi.NewRouter()

	// The repository constructors return an error, so check it before use.
	userRepo, err := NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("user repository: %v", err)
	}
	orderRepo, err := NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("order repository: %v", err)
	}
	resolver := graph.NewResolver(userRepo, orderRepo)

	srv := handler.NewDefaultServer(graph.NewExecutableSchema(
		graph.Config{Resolvers: resolver},
	))

	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	router.Handle("/graphql", srv)

	log.Printf("🚀 Users subgraph running on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, router))
}

Pattern 4: Gateway Router with Query Planning — Apollo Router

Apollo Router is a high-performance gateway written in Rust, supporting query planning, batch entity resolution, caching, and observability.

Router Configuration (router.yaml):

supergraph:
  listen: 0.0.0.0:4000
  path: /graphql
  introspection: true

health_check:
  listen: 0.0.0.0:8088

cors:
  origins:
    - https://app.example.com
    - http://localhost:3000
  methods:
    - GET
    - POST
  headers:
    - Authorization
    - Content-Type
    - X-Request-ID

headers:
  all:
    request:
      - propagate:
          matching: "^X-.*"
      - propagate:
          named: Authorization
  subgraphs:
    users:
      request:
        - propagate:
            named: Authorization
        - set:
            name: X-User-Service-Key
            value: "${USERS_SERVICE_KEY}"
    orders:
      request:
        - propagate:
            named: Authorization

traffic_shaping:
  all:
    global_rate_limit:
      capacity: 1000
      interval: 1s
  subgraphs:
    users:
      timeout: 5s
      global_rate_limit:
        capacity: 500
        interval: 1s
    products:
      timeout: 3s
    orders:
      timeout: 10s

telemetry:
  tracing:
    common:
      service_name: apollo-router
    otlp:
      endpoint: http://otel-collector:4317
      protocol: grpc
  metrics:
    common:
      service_name: apollo-router
    otlp:
      endpoint: http://otel-collector:4317
      protocol: grpc
  logging:
    format: json

Docker Compose Deployment:

version: "3.9"

services:
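  router:
    image: ghcr.io/apollographql/router:v1.45.0
    # Point the router at the mounted files; these are not the image's default paths.
    command: ["--config", "/dist/configuration/router.yaml", "--supergraph", "/dist/schema/supergraph.graphqls"]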
    ports:
      - "4000:4000"
      - "8088:8088"
    volumes:
      - ./router.yaml:/dist/configuration/router.yaml:ro
      - ./supergraph.graphqls:/dist/schema/supergraph.graphqls:ro
    environment:
      - USERS_SERVICE_KEY=${USERS_SERVICE_KEY}
      - APOLLO_KEY=${APOLLO_KEY}
      - APOLLO_GRAPH_REF=${APOLLO_GRAPH_REF}
    depends_on:
      - users-service
      - products-service
      - orders-service
      - reviews-service
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8088/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  users-service:
    build:
      context: ./services/users
      dockerfile: Dockerfile
    ports:
      - "4001:4001"
    environment:
      - DATABASE_URL=postgres://users:password@postgres:5432/users?sslmode=disable
      - PORT=4001
    depends_on:
      - postgres

  products-service:
    build:
      context: ./services/products
      dockerfile: Dockerfile
    ports:
      - "4002:4002"
    environment:
      - DATABASE_URL=postgres://products:password@postgres:5432/products?sslmode=disable
      - PORT=4002

  orders-service:
    build:
      context: ./services/orders
      dockerfile: Dockerfile
    ports:
      - "4003:4003"
    environment:
      - DATABASE_URL=postgres://orders:password@postgres:5432/orders?sslmode=disable
      - PORT=4003

  reviews-service:
    build:
      context: ./services/reviews
      dockerfile: Dockerfile
    ports:
      - "4004:4004"
    environment:
      - DATABASE_URL=postgres://reviews:password@postgres:5432/reviews?sslmode=disable
      - PORT=4004

  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
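      # Not read by the official postgres image itself; it needs a custom script
      # in /docker-entrypoint-initdb.d that creates these databases and users.
      - POSTGRES_MULTIPLE_DATABASES=users,products,orders,reviews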
      - POSTGRES_PASSWORD=password
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Query Planning Example:

query GetUserWithOrders {
  user(id: "user-123") {
    name
    email
    orders(limit: 10) {
      id
      total
      status
      items {
        product {
          name
          price
        }
        quantity
      }
    }
  }
}

The gateway's query planner generates the following execution plan:

1. Fetch the User's name and email from the users subgraph.
2. Fetch the orders of User(id="user-123") from the orders subgraph.
3. Batch-resolve the Product entities referenced by the OrderItems from the products subgraph.
4. Merge the results and return them to the client.
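The last two steps of the plan above can be sketched in a few lines of Go. This is a toy model, not the router's planner: ProductsSubgraph, its Entities method, and ExecutePlan are hypothetical names, and the subgraph is an in-memory map. What it illustrates is that product IDs are deduplicated first, fetched in a single batched call, and only then merged back into the order items.

```go
package main

import (
	"fmt"
	"sort"
)

type Item struct {
	ProductID string
	Quantity  int
}

type Order struct {
	ID    string
	Items []Item
}

type Product struct {
	Name  string
	Price float64
}

// ProductsSubgraph is a hypothetical stand-in for the products service.
// Calls counts how many entity fetches it received.
type ProductsSubgraph struct {
	Data  map[string]Product
	Calls int
}

// Entities resolves a whole batch of product IDs in one call.
func (s *ProductsSubgraph) Entities(ids []string) map[string]Product {
	s.Calls++
	out := make(map[string]Product, len(ids))
	for _, id := range ids {
		if p, ok := s.Data[id]; ok {
			out[id] = p
		}
	}
	return out
}

// ExecutePlan performs one batched entity fetch for all order items,
// then merges the products back into per-item result lines.
func ExecutePlan(orders []Order, products *ProductsSubgraph) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, o := range orders {
		for _, it := range o.Items {
			if !seen[it.ProductID] {
				seen[it.ProductID] = true
				ids = append(ids, it.ProductID)
			}
		}
	}
	sort.Strings(ids) // deterministic request order
	byID := products.Entities(ids)

	var lines []string
	for _, o := range orders {
		for _, it := range o.Items {
			p := byID[it.ProductID]
			lines = append(lines, fmt.Sprintf("%s: %d x %s @ %.2f", o.ID, it.Quantity, p.Name, p.Price))
		}
	}
	return lines
}

func main() {
	products := &ProductsSubgraph{Data: map[string]Product{
		"p1": {Name: "Keyboard", Price: 49.90},
		"p2": {Name: "Mouse", Price: 19.50},
	}}
	orders := []Order{
		{ID: "o1", Items: []Item{{ProductID: "p1", Quantity: 2}, {ProductID: "p2", Quantity: 1}}},
		{ID: "o2", Items: []Item{{ProductID: "p1", Quantity: 1}}},
	}
	for _, line := range ExecutePlan(orders, products) {
		fmt.Println(line)
	}
	fmt.Println("entity fetches:", products.Calls)
}
```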

Pattern 5: Cross-Service Data Fetching and N+1 Prevention — DataLoader Pattern

N+1 is the most severe performance issue in GraphQL Federation. DataLoader prevents it through batch loading and deduplication, merging N entity resolutions into 1 batch query.

Go DataLoader Implementation:

package dataloader

import (
	"context"
	"fmt"
	"sync"
	"time"
)

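// BatchFunc loads the values for a set of keys in one round trip.
type BatchFunc[K comparable, V any] func(ctx context.Context, keys []K) (map[K]V, error)

// call is one in-flight key. Closing done broadcasts the result to every
// waiter, so any number of Load calls for the same key are released together.
type call[V any] struct {
	done  chan struct{}
	value V
	err   error
}

// Loader batches and caches lookups. Create one per request: the cache is
// never invalidated, and the batch runs with the context of the caller that
// triggered the dispatch.
type Loader[K comparable, V any] struct {
	batchFn  BatchFunc[K, V]
	mu       sync.Mutex
	cache    map[K]V
	pending  map[K]*call[V]
	maxBatch int
	wait     time.Duration
}

func NewLoader[K comparable, V any](batchFn BatchFunc[K, V], opts ...Option[K, V]) *Loader[K, V] {
	l := &Loader[K, V]{
		batchFn:  batchFn,
		cache:    make(map[K]V),
		pending:  make(map[K]*call[V]),
		maxBatch: 100,
		wait:     10 * time.Millisecond,
	}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

type Option[K comparable, V any] func(*Loader[K, V])

func WithMaxBatch[K comparable, V any](n int) Option[K, V] {
	return func(l *Loader[K, V]) { l.maxBatch = n }
}

func WithWait[K comparable, V any](d time.Duration) Option[K, V] {
	return func(l *Loader[K, V]) { l.wait = d }
}

// enqueue registers key in the current batch without waiting for the result.
// A nil call means the value was served from the cache.
func (l *Loader[K, V]) enqueue(ctx context.Context, key K) (*call[V], V) {
	var zero V
	l.mu.Lock()
	if v, ok := l.cache[key]; ok {
		l.mu.Unlock()
		return nil, v
	}
	if c, ok := l.pending[key]; ok {
		l.mu.Unlock()
		return c, zero // join the in-flight call for this key
	}
	c := &call[V]{done: make(chan struct{})}
	l.pending[key] = c
	n := len(l.pending)
	l.mu.Unlock()

	switch {
	case n >= l.maxBatch:
		l.dispatch(ctx) // batch is full: flush now
	case n == 1:
		// The first key opens the batch window; later keys ride along.
		time.AfterFunc(l.wait, func() { l.dispatch(ctx) })
	}
	return c, zero
}

// Load returns the value for key, batching it with concurrent Load calls.
func (l *Loader[K, V]) Load(ctx context.Context, key K) (V, error) {
	c, v := l.enqueue(ctx, key)
	if c == nil {
		return v, nil
	}
	select {
	case <-c.done:
		return c.value, c.err
	case <-ctx.Done():
		var zero V
		return zero, ctx.Err()
	}
}

// dispatch takes ownership of all pending keys and resolves them in one batch.
func (l *Loader[K, V]) dispatch(ctx context.Context) {
	l.mu.Lock()
	if len(l.pending) == 0 {
		l.mu.Unlock()
		return
	}
	batch := l.pending
	l.pending = make(map[K]*call[V])
	l.mu.Unlock()

	keys := make([]K, 0, len(batch))
	for k := range batch {
		keys = append(keys, k)
	}
	results, err := l.batchFn(ctx, keys)

	l.mu.Lock()
	defer l.mu.Unlock()
	for k, c := range batch {
		if err != nil {
			c.err = err
		} else if v, ok := results[k]; ok {
			c.value = v
			l.cache[k] = v
		} else {
			c.err = fmt.Errorf("key not found: %v", k)
		}
		close(c.done) // wake every waiter for this key
	}
}

// LoadMany enqueues every key before waiting, so all keys land in the same
// batch instead of being dispatched one at a time.
func (l *Loader[K, V]) LoadMany(ctx context.Context, keys []K) ([]V, error) {
	values := make([]V, len(keys))
	calls := make([]*call[V], len(keys))
	for i, key := range keys {
		calls[i], values[i] = l.enqueue(ctx, key)
	}
	var firstErr error
	for i, c := range calls {
		if c == nil {
			continue // cache hit, value already set
		}
		<-c.done
		values[i] = c.value
		if c.err != nil && firstErr == nil {
			firstErr = c.err
		}
	}
	return values, firstErr
}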

Using DataLoader in Resolvers:

package graph

import (
	"context"
	"fmt"
	"time"

	"myapp/dataloader"
)

type Loaders struct {
	UserByID    *dataloader.Loader[string, *User]
	ProductByID *dataloader.Loader[string, *Product]
	OrderByID   *dataloader.Loader[string, *Order]
}

func NewLoaders(userRepo UserRepository, productRepo ProductRepository, orderRepo OrderRepository) *Loaders {
	return &Loaders{
		UserByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*User, error) {
			users, err := userRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch user load failed: %w", err)
			}
			result := make(map[string]*User, len(users))
			for _, u := range users {
				result[u.ID] = u
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *User](200), dataloader.WithWait[string, *User](5*time.Millisecond)),

		ProductByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Product, error) {
			products, err := productRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch product load failed: %w", err)
			}
			result := make(map[string]*Product, len(products))
			for _, p := range products {
				result[p.ID] = p
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *Product](200), dataloader.WithWait[string, *Product](5*time.Millisecond)),

		OrderByID: dataloader.NewLoader(func(ctx context.Context, ids []string) (map[string]*Order, error) {
			orders, err := orderRepo.FindByIDs(ctx, ids)
			if err != nil {
				return nil, fmt.Errorf("batch order load failed: %w", err)
			}
			result := make(map[string]*Order, len(orders))
			for _, o := range orders {
				result[o.ID] = o
			}
			return result, nil
		}, dataloader.WithMaxBatch[string, *Order](200), dataloader.WithWait[string, *Order](5*time.Millisecond)),
	}
}

func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
	loader := ctx.Value(loaderKey).(*Loaders)
	return loader.UserByID.Load(ctx, obj.UserID)
}

func (r *orderItemResolver) Product(ctx context.Context, obj *OrderItem) (*Product, error) {
	loader := ctx.Value(loaderKey).(*Loaders)
	return loader.ProductByID.Load(ctx, obj.ProductID)
}

Batch Query Repository:

package repository

import (
	"context"
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/lib/pq"
)

type PostgresUserRepository struct {
	db *sql.DB
}

func NewPostgresUserRepository(dbURL string) (*PostgresUserRepository, error) {
	db, err := sql.Open("postgres", dbURL)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to database: %w", err)
	}
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(10)
	return &PostgresUserRepository{db: db}, nil
}

func (r *PostgresUserRepository) FindByIDs(ctx context.Context, ids []string) ([]*User, error) {
	if len(ids) == 0 {
		return nil, nil
	}

	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}

	query := fmt.Sprintf(
		"SELECT id, email, name, avatar, created_at FROM users WHERE id IN (%s)",
		strings.Join(placeholders, ","),
	)

	rows, err := r.db.QueryContext(ctx, query, args...)
	if err != nil {
		return nil, fmt.Errorf("batch query users failed: %w", err)
	}
	defer rows.Close()

	users := make([]*User, 0, len(ids))
	for rows.Next() {
		var u User
		var avatar sql.NullString
		if err := rows.Scan(&u.ID, &u.Email, &u.Name, &avatar, &u.CreatedAt); err != nil {
			return nil, fmt.Errorf("scan user row failed: %w", err)
		}
		if avatar.Valid {
			u.Avatar = avatar.String
		}
		users = append(users, &u)
	}
	return users, rows.Err()
}

Pattern 6: Production Federation Architecture — Auth, Monitoring, and High Availability

Production federation architecture requires handling authentication passthrough, distributed tracing, rate limiting, circuit breaking, and graceful degradation.

Authentication Middleware:

package middleware

import (
	"context"
	"fmt"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

type contextKey string

const (
	userIDKey    contextKey = "userID"
	userRoleKey  contextKey = "userRole"
	authHeaderKey           = "Authorization"
)

type Claims struct {
	UserID string `json:"sub"`
	Role   string `json:"role"`
	jwt.RegisteredClaims
}

func AuthMiddleware(jwtSecret string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			authHeader := r.Header.Get(authHeaderKey)
			if authHeader == "" {
				next.ServeHTTP(w, r)
				return
			}

			tokenStr := strings.TrimPrefix(authHeader, "Bearer ")
			if tokenStr == authHeader {
				next.ServeHTTP(w, r)
				return
			}

			token, err := jwt.ParseWithClaims(tokenStr, &Claims{}, func(t *jwt.Token) (interface{}, error) {
				if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
					return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
				}
				return []byte(jwtSecret), nil
			})

			if err != nil || !token.Valid {
				http.Error(w, "invalid token", http.StatusUnauthorized)
				return
			}

			claims, ok := token.Claims.(*Claims)
			if !ok {
				http.Error(w, "invalid claims", http.StatusUnauthorized)
				return
			}

			ctx := context.WithValue(r.Context(), userIDKey, claims.UserID)
			ctx = context.WithValue(ctx, userRoleKey, claims.Role)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

func GetUserID(ctx context.Context) string {
	if v, ok := ctx.Value(userIDKey).(string); ok {
		return v
	}
	return ""
}

func GetUserRole(ctx context.Context) string {
	if v, ok := ctx.Value(userRoleKey).(string); ok {
		return v
	}
	return ""
}

OpenTelemetry Tracing Integration:

package telemetry

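import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// newResource describes this service in exported spans.
// The service name is a placeholder; set it per subgraph.
func newResource() *resource.Resource {
	return resource.NewSchemaless(attribute.String("service.name", "users-subgraph"))
}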

func InitTracer(ctx context.Context, endpoint string) (func(context.Context) error, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(endpoint),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to create OTLP exporter: %w", err)
	}

	provider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(newResource()),
	)

	otel.SetTracerProvider(provider)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))

	return provider.Shutdown, nil
}

func StartSpan(ctx context.Context, name string) (context.Context, trace.Span) {
	tracer := otel.Tracer("graphql-federation")
	return tracer.Start(ctx, name)
}

Rate Limiting and Circuit Breaking:

package middleware

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

type RateLimiter struct {
	mu       sync.Mutex
	clients  map[string]*clientBucket
	rate     int
	interval time.Duration
}

type clientBucket struct {
	tokens   int
	lastSeen time.Time
}

func NewRateLimiter(rate int, interval time.Duration) *RateLimiter {
	rl := &RateLimiter{
		clients:  make(map[string]*clientBucket),
		rate:     rate,
		interval: interval,
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) Allow(key string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	now := time.Now()
	bucket, ok := rl.clients[key]
	if !ok {
		rl.clients[key] = &clientBucket{tokens: rl.rate - 1, lastSeen: now}
		return true
	}

	elapsed := now.Sub(bucket.lastSeen)
	if elapsed >= rl.interval {
		bucket.tokens = rl.rate - 1
		bucket.lastSeen = now
		return true
	}

	if bucket.tokens <= 0 {
		return false
	}

	bucket.tokens--
	return true
}

func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(time.Minute)
	for range ticker.C {
		rl.mu.Lock()
		for key, bucket := range rl.clients {
			if time.Since(bucket.lastSeen) > 3*rl.interval {
				delete(rl.clients, key)
			}
		}
		rl.mu.Unlock()
	}
}

func RateLimitMiddleware(limiter *RateLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			clientID := r.Header.Get("X-Client-ID")
			if clientID == "" {
				clientID = r.RemoteAddr
			}

			if !limiter.Allow(clientID) {
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}

type CircuitBreaker struct {
	mu           sync.Mutex
	failureCount int
	threshold    int
	timeout      time.Duration
	state        string
	lastFailure  time.Time
}

func NewCircuitBreaker(threshold int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		threshold: threshold,
		timeout:   timeout,
		state:     "closed",
	}
}

func (cb *CircuitBreaker) Execute(fn func() error) error {
	cb.mu.Lock()
	if cb.state == "open" {
		if time.Since(cb.lastFailure) > cb.timeout {
			cb.state = "half-open"
			cb.mu.Unlock()
		} else {
			cb.mu.Unlock()
			return fmt.Errorf("circuit breaker is open")
		}
	} else {
		cb.mu.Unlock()
	}

	err := fn()
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.failureCount++
		cb.lastFailure = time.Now()
		if cb.failureCount >= cb.threshold {
			cb.state = "open"
		}
		return err
	}

	cb.failureCount = 0
	cb.state = "closed"
	return nil
}

Complete Service Startup:

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/99designs/gqlgen/graphql/handler"
	"github.com/99designs/gqlgen/graphql/playground"
	"github.com/go-chi/chi/v5"
	chimw "github.com/go-chi/chi/v5/middleware"

	"myapp/graph"
	"myapp/middleware"
	"myapp/telemetry"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	shutdown, err := telemetry.InitTracer(ctx, os.Getenv("OTEL_ENDPOINT"))
	if err != nil {
		log.Printf("⚠️ Tracer init failed: %v", err)
	} else {
		defer shutdown(ctx)
	}

	userRepo, err := NewPostgresUserRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to user database: %v", err)
	}
	orderRepo, err := NewPostgresOrderRepository(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Failed to connect to order database: %v", err)
	}

	resolver := graph.NewResolver(userRepo, orderRepo)
	loaders := graph.NewLoaders(userRepo, nil, orderRepo)

	srv := handler.NewDefaultServer(graph.NewExecutableSchema(
		graph.Config{Resolvers: resolver},
	))

	limiter := middleware.NewRateLimiter(100, time.Second)
	breaker := middleware.NewCircuitBreaker(5, 30*time.Second)

	router := chi.NewRouter()
	router.Use(chimw.RequestID)
	router.Use(chimw.RealIP)
	router.Use(chimw.Logger)
	router.Use(chimw.Recoverer)
	router.Use(chimw.Timeout(30 * time.Second))
	router.Use(middleware.AuthMiddleware(os.Getenv("JWT_SECRET")))
	router.Use(middleware.RateLimitMiddleware(limiter))

	// Serve on /graphql so the path matches routing_url in supergraph.yaml.
	router.Handle("/", playground.Handler("GraphQL Playground", "/graphql"))
	router.Handle("/graphql", srv)
	router.Get("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	})

	port := os.Getenv("PORT")
	if port == "" {
		port = "4001"
	}

	server := &http.Server{
		Addr:         ":" + port,
		Handler:      router,
		ReadTimeout:  15 * time.Second,
		WriteTimeout: 30 * time.Second,
		IdleTimeout:  60 * time.Second,
	}

	go func() {
		log.Printf("🚀 Subgraph running on :%s", port)
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()

	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit

	log.Println("Shutting down gracefully...")
	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer shutdownCancel()

	if err := server.Shutdown(shutdownCtx); err != nil {
		log.Fatalf("Forced shutdown: %v", err)
	}
	log.Println("Server stopped")
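	// breaker and loaders are built above but not wired into the handler chain in this
	// listing; the blank assignments only silence the "declared and not used" compile error.
	_ = breaker
	_ = loaders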
}

5 Common Pitfalls

Pitfall 1: Forgetting to Implement __resolveReference in Subgraphs

❌ Wrong:

// Only defined @key but didn't implement entity resolution
type Resolver struct{}

// Missing this method — gateway cannot resolve cross-subgraph entities
// func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) { ... }

✅ Correct:

func (r *entityResolver) FindUserByID(ctx context.Context, id string) (*User, error) {
	return r.userRepo.FindByID(ctx, id)
}

func (r *entityResolver) FindUserByEmail(ctx context.Context, email string) (*User, error) {
	return r.userRepo.FindByEmail(ctx, email)
}
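
Why the missing resolver breaks the gateway can be sketched as follows. In Federation, the gateway sends a subgraph a list of representations, each carrying __typename plus the @key fields, and expects one entity back per representation. The dispatcher below is a simplified illustration of that lookup, not gqlgen's generated code; Representation, Finder, and ResolveEntities are hypothetical names.

```go
package main

import "fmt"

// Representation mirrors what the gateway sends: __typename plus the @key fields.
type Representation map[string]any

type User struct{ ID, Name string }

// Finder resolves one representation into an entity.
type Finder func(rep Representation) (any, error)

// ResolveEntities dispatches each representation to the finder registered for its type.
func ResolveEntities(reps []Representation, finders map[string]Finder) ([]any, error) {
	out := make([]any, 0, len(reps))
	for _, rep := range reps {
		typeName, _ := rep["__typename"].(string)
		find, ok := finders[typeName]
		if !ok {
			return nil, fmt.Errorf("no entity resolver registered for type %q", typeName)
		}
		entity, err := find(rep)
		if err != nil {
			return nil, err
		}
		out = append(out, entity)
	}
	return out, nil
}

func main() {
	finders := map[string]Finder{
		"User": func(rep Representation) (any, error) {
			id, _ := rep["id"].(string)
			return &User{ID: id, Name: "user-" + id}, nil
		},
	}

	entities, err := ResolveEntities([]Representation{{"__typename": "User", "id": "1"}}, finders)
	fmt.Println(len(entities), err)

	// A type with no registered finder fails the whole entity fetch.
	_, err = ResolveEntities([]Representation{{"__typename": "Product", "upc": "42"}}, finders)
	fmt.Println(err)
}
```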

Pitfall 2: @shareable Overuse Causing Data Inconsistency

❌ Wrong:

type User @key(fields: "id") @shareable {
  id: ID!
  name: String!
  email: String!
  orderCount: Int!  # Multiple subgraphs provide this field with different logic
}

✅ Correct:

type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

type UserOrderStats @key(fields: "userId") {
  userId: ID!
  orderCount: Int!
  totalSpent: Float!
}

Pitfall 3: N+1 Queries Without DataLoader

❌ Wrong:

func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
    // Each Order triggers a separate HTTP request to user service
    resp, err := http.Get(fmt.Sprintf("http://users-service/users/%s", obj.UserID))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    var user User
    json.NewDecoder(resp.Body).Decode(&user)
    return &user, nil
}

✅ Correct:

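func (r *orderResolver) User(ctx context.Context, obj *Order) (*User, error) {
    // Loaders are expected in the context, placed there per request by middleware.
    // The checked assertion returns an error instead of panicking when they are absent.
    loaders, ok := ctx.Value(loaderKey).(*Loaders)
    if !ok {
        return nil, fmt.Errorf("dataloaders missing from request context")
    }
    return loaders.UserByID.Load(ctx, obj.UserID)
}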
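
What batching buys can be shown without any library. The sketch below is not a full DataLoader (it has no batching window and no per-request cache); it only illustrates the core idea of collecting the distinct keys first and issuing one call for all of them. BatchFetch and UsersForOrders are hypothetical names, and the batch endpoint is assumed to exist on the user service.

```go
package main

import "fmt"

type Order struct{ ID, UserID string }
type User struct{ ID, Name string }

// BatchFetch is a hypothetical batch endpoint: one call returns many users keyed by ID.
type BatchFetch func(ids []string) (map[string]*User, error)

// UsersForOrders resolves the user of every order with a single batch call.
func UsersForOrders(orders []Order, fetch BatchFetch) ([]*User, error) {
	// Deduplicate user IDs so each user is requested once.
	seen := make(map[string]bool)
	var ids []string
	for _, o := range orders {
		if !seen[o.UserID] {
			seen[o.UserID] = true
			ids = append(ids, o.UserID)
		}
	}

	byID, err := fetch(ids)
	if err != nil {
		return nil, err
	}

	// Map results back to the original order positions.
	out := make([]*User, len(orders))
	for i, o := range orders {
		out[i] = byID[o.UserID] // nil when the user service has no such user
	}
	return out, nil
}

func main() {
	calls := 0
	fetch := func(ids []string) (map[string]*User, error) {
		calls++
		m := make(map[string]*User, len(ids))
		for _, id := range ids {
			m[id] = &User{ID: id, Name: "user-" + id}
		}
		return m, nil
	}

	orders := []Order{{"o1", "u1"}, {"o2", "u2"}, {"o3", "u1"}}
	users, _ := UsersForOrders(orders, fetch)
	fmt.Println(len(users), calls) // three users resolved with one downstream call
}
```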

Pitfall 4: Subgraph Schema Changes Without Compatibility Checks

❌ Wrong:

# Publish directly without checking
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls

✅ Correct:

# Check compatibility first
rover subgraph check my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls

# Confirm no breaking changes, then publish
rover subgraph publish my-graph@production \
  --name users \
  --schema ./schemas/users.graphqls \
  --routing-url http://users-service:4001/query

Pitfall 5: Missing Timeout and Rate-Limit Configuration at Gateway

❌ Wrong:

# router.yaml - no timeout or rate-limit configuration
supergraph:
  listen: 0.0.0.0:4000

✅ Correct:

supergraph:
  listen: 0.0.0.0:4000

traffic_shaping:
  all:
    timeout: 30s
    global_rate_limit:
      capacity: 1000
      interval: 1s
  subgraphs:
    users:
      timeout: 5s
    products:
      timeout: 3s
    orders:
      timeout: 10s

Error Troubleshooting Reference

Error Message	Cause	Solution
`ENCORE_UNKNOWN_DIRECTIVE`	Subgraph uses federation directive not imported	Add missing directive import in @link
`KEY_FIELDS_MISSING_ON_BASE`	@key references fields that don't exist on the type	Ensure @key-specified fields are declared in the type definition
`EXTERNAL_TYPE_MISMATCH`	@external declared type doesn't match the owning subgraph	Verify @external field types match the original definition
`SHAREABLE_MISMATCH`	Same type has inconsistent @shareable declarations across subgraphs	All subgraphs sharing a type must mark it @shareable
`RESOLVE_REFERENCE_FAILED`	__resolveReference implementation returns an error	Check entity resolver database queries and error handling
`QUERY_PLAN_TIMEOUT`	Query planning timeout — too many subgraphs or query too deep	Limit query depth, optimize schema structure
`SUBGRAPH_UNREACHABLE`	Subgraph service unreachable	Check subgraph health status and network connectivity
`COMPOSITION_ERROR`	Schema composition failure due to type conflicts	Use rover subgraph check to verify compatibility
`N+1_DETECTED`	Gateway detects N+1 query pattern	Add DataLoader batch loading for entity resolution
`CIRCULAR_DEPENDENCY`	Circular dependencies between subgraphs	Refactor entity boundaries, use @requires instead of direct references

Advanced Optimization

Query Complexity Analysis and Limiting

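Query complexity is an attack surface: every level of nesting over list fields multiplies the result size, so a short but deeply nested query can return far more data than its text suggests. Complexity analysis assigns each query a cost and rejects queries above a limit.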

package middleware

import (
	"fmt"

	"github.com/99designs/gqlgen/graphql"
	"github.com/99designs/gqlgen/graphql/handler/extension"
	"github.com/vektah/gqlparser/v2/ast"
)

// maxQueryComplexity is the ceiling enforced by CalculateQueryComplexity.
const maxQueryComplexity = 500

type ComplexityLimit struct {
	maxComplexity int
}

func NewComplexityLimit(maxComplexity int) *ComplexityLimit {
	return &ComplexityLimit{maxComplexity: maxComplexity}
}

// Extension returns gqlgen's built-in fixed complexity limit; register it with srv.Use.
func (cl *ComplexityLimit) Extension() graphql.HandlerExtension {
	return extension.FixedComplexityLimit(cl.maxComplexity)
}

type fieldComplexity struct {
	complexity int
	details    map[string]int
}

// CalculateQueryComplexity walks a parsed, validated query document and sums a
// depth-weighted cost per field. Inside gqlgen middleware the document is
// available as graphql.GetOperationContext(ctx).Doc.
func CalculateQueryComplexity(doc *ast.QueryDocument) (*fieldComplexity, error) {
	complexity := 0
	details := make(map[string]int)

	for _, op := range doc.Operations {
		for _, sel := range op.SelectionSet {
			calcSelectionComplexity(sel, &complexity, details, 1)
		}
	}

	if complexity > maxQueryComplexity {
		return nil, fmt.Errorf("query complexity %d exceeds limit %d", complexity, maxQueryComplexity)
	}

	return &fieldComplexity{complexity: complexity, details: details}, nil
}

func calcSelectionComplexity(sel ast.Selection, total *int, details map[string]int, depth int) {
	switch s := sel.(type) {
	case *ast.Field:
		// Leaf fields cost 1; fields with a selection set cost their nesting depth.
		fieldCost := 1
		if len(s.SelectionSet) > 0 {
			fieldCost = depth
		}
		*total += fieldCost
		details[s.Name] += fieldCost
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth+1)
		}
	case *ast.InlineFragment:
		// Fragments do not add a nesting level of their own.
		for _, child := range s.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	case *ast.FragmentSpread:
		// Definition is populated during validation; skip unresolved spreads.
		if s.Definition == nil {
			return
		}
		for _, child := range s.Definition.SelectionSet {
			calcSelectionComplexity(child, total, details, depth)
		}
	}
}
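
To make the scoring rule concrete, here is a self-contained worked example of the same depth-weighted arithmetic on a tiny hand-built tree. Field and Cost are hypothetical stand-ins for the parsed AST and the walker above, used only so the numbers can be checked by hand.

```go
package main

import "fmt"

// Field is a simplified stand-in for a parsed GraphQL field (hypothetical type).
type Field struct {
	Name     string
	Children []Field
}

// Cost applies the depth-weighted rule: a leaf costs 1, a field with
// selections costs its depth, and children are scored one level deeper.
func Cost(fields []Field, depth int) int {
	total := 0
	for _, f := range fields {
		if len(f.Children) == 0 {
			total++
			continue
		}
		total += depth + Cost(f.Children, depth+1)
	}
	return total
}

func main() {
	// Equivalent to: { orders { id user { name } } }
	query := []Field{{Name: "orders", Children: []Field{
		{Name: "id"},
		{Name: "user", Children: []Field{{Name: "name"}}},
	}}}

	// orders=1 (depth 1), id=1 (leaf), user=2 (depth 2), name=1 (leaf)
	fmt.Println(Cost(query, 1)) // prints 5
}
```

Note that this rule grows only with nesting depth. It does not account for how many items a list field returns, so a production cost model would likely also weight list fields by their page size arguments.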

Persisted Queries and Query Registration

Production environments should use persisted queries: clients send only a query hash, which avoids transmitting the full query text and, in strict mode, blocks any query that was never registered.

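package persistedquery

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"sync"

	"github.com/99designs/gqlgen/graphql"
	"github.com/vektah/gqlparser/v2/gqlerror"
)

// PersistedQueryManager maps SHA-256 hashes to registered query text.
// Register it on the gqlgen server with srv.Use(manager).
type PersistedQueryManager struct {
	mu      sync.RWMutex
	queries map[string]string
	strict  bool
}

// Compile-time check that the manager satisfies the gqlgen extension interfaces.
var _ interface {
	graphql.HandlerExtension
	graphql.OperationParameterMutator
} = (*PersistedQueryManager)(nil)

func NewPersistedQueryManager(strict bool) *PersistedQueryManager {
	return &PersistedQueryManager{
		queries: make(map[string]string),
		strict:  strict,
	}
}

func (pqm *PersistedQueryManager) Register(hash, query string) {
	pqm.mu.Lock()
	defer pqm.mu.Unlock()
	pqm.queries[hash] = query
}

func (pqm *PersistedQueryManager) ExtensionName() string { return "PersistedQueryManager" }

func (pqm *PersistedQueryManager) Validate(graphql.ExecutableSchema) error { return nil }

// MutateOperationParameters runs before the query is parsed, so swapping the
// hash for the registered text here actually changes what gets executed.
func (pqm *PersistedQueryManager) MutateOperationParameters(ctx context.Context, params *graphql.RawParams) *gqlerror.Error {
	pqm.mu.RLock()
	query, ok := pqm.queries[params.Query]
	pqm.mu.RUnlock()

	switch {
	case ok:
		params.Query = query
	case pqm.strict:
		// Strict mode rejects unknown hashes and raw query text alike,
		// returning a GraphQL error instead of panicking.
		return gqlerror.Errorf("unknown persisted query")
	}
	return nil
}

// HashQuery returns the hex-encoded SHA-256 of the exact query text.
func HashQuery(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}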
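
The registration flow can be sketched end to end with the standard library alone. This is an illustration under assumptions: the registry is a plain map standing in for the manager above, and the query strings are made up. It shows that the hash is 64 hex characters and that it covers the exact bytes of the query, so the build step and the client must hash identical text.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HashQuery returns the hex-encoded SHA-256 of the exact query text.
func HashQuery(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}

func main() {
	// Build time: register every query the client is allowed to send.
	registry := map[string]string{}
	query := `query { me { id } }`
	hash := HashQuery(query)
	registry[hash] = query

	// Request time: the client sends only the hash; the server looks up the text.
	got, ok := registry[hash]
	fmt.Println(len(hash), ok, got == query)

	// A query that was never registered has no entry.
	_, ok = registry[HashQuery(`query { users { id } }`)]
	fmt.Println(ok)

	// Whitespace changes the hash, so both sides must hash the same bytes.
	fmt.Println(HashQuery(`{ me }`) == HashQuery(`{me}`))
}
```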

Subgraph Caching Strategy

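Caching resolved entities at the subgraph level avoids repeated database reads for hot entities. The trade-off is staleness: a cached entity can be up to one TTL out of date unless it is invalidated on write.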

package cache

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type EntityCache struct {
	rdb    *redis.Client
	prefix string
	ttl    time.Duration
}

func NewEntityCache(redisURL, prefix string, ttl time.Duration) (*EntityCache, error) {
	opts, err := redis.ParseURL(redisURL)
	if err != nil {
		return nil, fmt.Errorf("invalid redis URL: %w", err)
	}

	return &EntityCache{
		rdb:    redis.NewClient(opts),
		prefix: prefix,
		ttl:    ttl,
	}, nil
}

func (c *EntityCache) Get(ctx context.Context, entityType, id string, dest interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	val, err := c.rdb.Get(ctx, key).Result()
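	if err == redis.Nil {
		// Wrap redis.Nil so callers can tell a miss from a failure with errors.Is(err, redis.Nil).
		return fmt.Errorf("cache miss for %s:%s: %w", entityType, id, redis.Nil)
	}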
	if err != nil {
		return fmt.Errorf("cache read error: %w", err)
	}
	return json.Unmarshal([]byte(val), dest)
}

func (c *EntityCache) Set(ctx context.Context, entityType, id string, val interface{}) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	data, err := json.Marshal(val)
	if err != nil {
		return fmt.Errorf("cache marshal error: %w", err)
	}
	return c.rdb.Set(ctx, key, data, c.ttl).Err()
}

func (c *EntityCache) Invalidate(ctx context.Context, entityType, id string) error {
	key := fmt.Sprintf("%s:%s:%s", c.prefix, entityType, id)
	return c.rdb.Del(ctx, key).Err()
}

func (c *EntityCache) InvalidatePattern(ctx context.Context, pattern string) error {
	iter := c.rdb.Scan(ctx, 0, fmt.Sprintf("%s:%s:*", c.prefix, pattern), 100).Iterator()
	var keys []string
	for iter.Next(ctx) {
		keys = append(keys, iter.Val())
	}
	if err := iter.Err(); err != nil {
		return fmt.Errorf("cache scan error: %w", err)
	}
	if len(keys) > 0 {
		return c.rdb.Del(ctx, keys...).Err()
	}
	return nil
}

Technology Comparison

Dimension	Apollo Federation	Schema Stitching	REST API	gRPC	tRPC
Learning Curve	Medium, requires federation concepts	High, manual conflict resolution	Low	Medium, requires Proto	Low (TypeScript only)
Schema Management	Auto composition, rover CLI	Manual stitching, custom resolvers	No unified schema	Proto definition, auto-generated	TypeScript type inference
Cross-team Collaboration	Excellent, subgraphs evolve independently	Fair, conflicts need manual resolution	Poor, API docs easily outdated	Good, Proto as contract	TS full-stack only
Performance	Good, query planning + batch resolution	Fair, N+1 needs manual handling	Poor, multiple requests	Excellent, binary + HTTP/2	Good, end-to-end type safety
N+1 Prevention	Gateway batches entity fetches; DataLoader still needed inside subgraphs	Manual implementation required	None	None	None
Ecosystem Maturity	High, Apollo full-stack	Medium, community solutions	High	High	Medium
Language Support	All languages, Go/Java/TS etc.	All languages	All languages	All languages	TypeScript only
Real-time Subscriptions	Supported	Supported	Requires WebSocket	Requires bidirectional stream	Supported
Observability	Apollo Studio integration	Self-built	Self-built	OpenTelemetry	Self-built
Use Case	Large-scale microservice APIs	Custom composition logic	Simple CRUD	Internal high-performance communication	TS full-stack projects

Summary: GraphQL Federation isn't a silver bullet, but it's the most mature solution for microservice GraphQL architecture today. Core principles: split subgraphs by domain boundaries, declare entities with @key, prevent N+1 with DataLoader, handle query planning and rate limiting at the gateway, and always add auth, tracing, and caching in production. Start with 2 subgraphs and incrementally split — don't go all-in at once. Schema checks must be enforced in CI, or a breaking change will eventually take down the entire supergraph.
