Go Distributed Tracing with OpenTelemetry in 2026: Complete Observability for Microservices
If you're still debugging microservice issues by "adding logs → restarting → reading logs," your ops efficiency is stuck in 2018. When a request passes through 5 services, 3 databases, and 2 message queues, without distributed tracing you simply cannot identify the latency bottleneck. Distributed tracing isn't optional—it's one of the three pillars of microservice observability (Metrics, Logs, Traces).
In 2026, OpenTelemetry has become the de facto standard, with Jaeger and Grafana Tempo fully supporting the OTLP protocol. This article starts from OpenTelemetry architecture, provides complete Go implementation code, and covers auto-instrumentation, manual instrumentation, context propagation, and backend integration.
Why Distributed Tracing Is Essential for Microservices
| Observability Pillar | Problem Solved | Typical Tools | Consequence Without It |
|---|---|---|---|
| Metrics | "What's wrong?" | Prometheus | Can't quantify problem scale |
| Logs | "Where's the error?" | Loki/ELK | Can't pinpoint specific errors |
| Traces | "Why is it slow? Where's the bottleneck?" | Jaeger/Tempo | Can't locate latency bottlenecks |
| All combined | "Complete problem picture" | Grafana | Only see fragments of the problem |
Key insight: For a slow request across 5 services, Logs can only tell you "each service is slow," while Traces can tell you "the database query in service 3 accounts for 80% of the time."
1. OpenTelemetry Architecture
OpenTelemetry's core architecture: API → SDK → Exporter → Collector → Backend
[App] → [OTel API] → [OTel SDK] → [OTLP Exporter] → [OTel Collector] → [Jaeger/Tempo]
| Component | Responsibility | Required |
|---|---|---|
| OTel API | Instrumentation interface | Yes |
| OTel SDK | Sampling, batching, export | Yes |
| OTLP Exporter | Send to Collector | Yes |
| OTel Collector | Receive, process, forward | Recommended (production) |
| Backend | Storage, query, display | Yes |
1.1 Initialize OpenTelemetry Provider
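package tracing

import (
	"context"
	"fmt"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

// InitProvider configures the global TracerProvider and propagator.
// collectorURL is a host:port pair such as "otel-collector:4317", without a scheme.
// The returned function flushes buffered spans and should be called on shutdown.
func InitProvider(serviceName, collectorURL string) (func(context.Context) error, error) {
	ctx := context.Background()

	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(collectorURL),
		otlptracegrpc.WithInsecure(), // plaintext; use TLS outside a trusted network
	)
	if err != nil {
		return nil, fmt.Errorf("creating exporter: %w", err)
	}

	res, err := resource.New(ctx,
		resource.WithAttributes(
			semconv.ServiceNameKey.String(serviceName),
			semconv.ServiceVersionKey.String("1.0.0"),
			semconv.DeploymentEnvironmentKey.String("production"),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("creating resource: %w", err)
	}

	provider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter,
			sdktrace.WithBatchTimeout(5*time.Second),
			sdktrace.WithMaxExportBatchSize(512),
		),
		sdktrace.WithResource(res),
		// ParentBased honors the caller's sampling decision; the 10% ratio applies to root spans only.
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
	)
	otel.SetTracerProvider(provider)

	// Without a registered propagator, Inject and Extract are no-ops.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
	return provider.Shutdown, nil
}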
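The sampler configured above keeps roughly 10% of traces. The sketch below illustrates how a ratio sampler can make that decision deterministically from the trace ID, so that every service using the same ratio agrees on whether a given trace is kept. The helper name `shouldSample` is hypothetical, and the code is an illustration of the idea under that assumption, not the SDK's own implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample is a hypothetical helper. It reads the last 8 bytes of a
// 16-byte trace ID as a 63-bit number and keeps the trace when that number
// falls below ratio * 2^63. The same trace ID always yields the same answer.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	threshold := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < threshold
}

func main() {
	low := [16]byte{15: 0x01} // last 8 bytes encode a small number
	high := [16]byte{8: 0xff} // last 8 bytes encode a large number
	fmt.Println(shouldSample(low, 0.1), shouldSample(high, 0.1))
}
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed for root-span sampling.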
2. Auto-instrumentation vs Manual Instrumentation
2.1 HTTP Auto-instrumentation
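package main

import (
	"context"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"

	"example.com/yourapp/tracing" // placeholder import path for the tracing package above
)

func main() {
	shutdown, err := tracing.InitProvider("user-service", "otel-collector:4317")
	if err != nil {
		log.Fatal(err)
	}
	defer shutdown(context.Background()) // flush buffered spans on exit

	mux := http.NewServeMux()
	mux.HandleFunc("/users", handleGetUsers)
	mux.HandleFunc("/orders", handleGetOrders)

	// Wrapping the mux creates one server span per request.
	handler := otelhttp.NewHandler(mux, "user-service",
		otelhttp.WithMessageEvents(otelhttp.ReadEvents, otelhttp.WriteEvents),
	)
	// log.Fatal here would skip the deferred shutdown, so log and return instead.
	if err := http.ListenAndServe(":8080", handler); err != nil {
		log.Printf("http server: %v", err)
	}
}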
2.2 gRPC Auto-instrumentation
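import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
)

func createGRPCServer() *grpc.Server {
	return grpc.NewServer(
		grpc.StatsHandler(otelgrpc.NewServerHandler()),
	)
}

func createGRPCClient(target string) (*grpc.ClientConn, error) {
	// grpc.NewClient replaces the deprecated grpc.Dial.
	return grpc.NewClient(target,
		// Plaintext for in-cluster traffic; swap in TLS credentials where required.
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
	)
}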
2.3 Database Auto-instrumentation
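import (
	"database/sql"
	"log"

	// opentelemetry-go-contrib does not ship a database/sql package;
	// this example uses the community library github.com/XSAM/otelsql.
	"github.com/XSAM/otelsql"
	_ "github.com/lib/pq" // registers the "postgres" driver
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initDB() *sql.DB {
	// otelsql.Open wraps the registered driver so that queries produce spans.
	db, err := otelsql.Open("postgres", "postgres://localhost/mydb",
		otelsql.WithAttributes(semconv.DBSystemPostgreSQL),
	)
	if err != nil {
		log.Fatal(err)
	}
	return db
}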
2.4 Manual Instrumentation
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

func ProcessOrder(ctx context.Context, order *Order) error {
	tracer := otel.Tracer("order-service")
	ctx, span := tracer.Start(ctx, "ProcessOrder",
		trace.WithAttributes(
			attribute.String("order.id", order.ID),
			attribute.Float64("order.amount", order.Amount),
		),
	)
	defer span.End()

	// Discard the child context here: reusing it would make ProcessPayment
	// a child of ValidateOrder instead of a sibling under ProcessOrder.
	_, validateSpan := tracer.Start(ctx, "ValidateOrder")
	if err := validate(order); err != nil {
		validateSpan.RecordError(err)
		validateSpan.SetStatus(codes.Error, err.Error())
		validateSpan.End()
		return err
	}
	validateSpan.End()

	// payCtx carries the ProcessPayment span to any downstream calls.
	payCtx, paySpan := tracer.Start(ctx, "ProcessPayment")
	if err := processPayment(payCtx, order); err != nil {
		paySpan.RecordError(err)
		paySpan.SetStatus(codes.Error, err.Error())
		paySpan.End()
		return err
	}
	paySpan.End()
	return nil
}
Auto vs Manual Comparison:
| Dimension | Auto-instrumentation | Manual Instrumentation |
|---|---|---|
| Invasiveness | Minimal (wrap the handler, client, or driver once) | Requires code changes |
| Granularity | Framework-level (HTTP/gRPC/DB) | Business-level (any function) |
| Attribute Richness | Standard attributes | Custom attributes |
| Performance Overhead | Low (framework-optimized) | Depends on instrumentation count |
| Recommended Strategy | Use auto for framework layer | Use manual for business critical paths |
3. Trace Context Propagation
Cross-service trace context propagation is the core of distributed tracing. OpenTelemetry uses the W3C Trace Context standard.
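Under W3C Trace Context, the trace travels in a `traceparent` header of the form `version-traceid-parentid-flags`. The sketch below formats and parses version `00` of that header using only the standard library, to show what a propagator writes and reads. The type and function names are hypothetical, and a real service would rely on the OpenTelemetry propagator rather than hand-rolled parsing.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a version-00 traceparent header.
type traceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the trace-flags byte
}

// formatTraceParent renders "00-<trace-id>-<parent-id>-<flags>".
func formatTraceParent(tp traceParent) string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + tp.TraceID + "-" + tp.SpanID + "-" + flags
}

// parseTraceParent validates and splits a version-00 header value.
func parseTraceParent(h string) (traceParent, error) {
	if h != strings.ToLower(h) {
		return traceParent{}, errors.New("header must be lowercase")
	}
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return traceParent{}, errors.New("unsupported traceparent format")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return traceParent{}, errors.New("wrong field length")
	}
	for _, p := range parts[1:] {
		if _, err := hex.DecodeString(p); err != nil {
			return traceParent{}, errors.New("non-hex field")
		}
	}
	// All-zero trace or span IDs are invalid.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return traceParent{}, errors.New("all-zero ID")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return traceParent{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&1 == 1,
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tp, err)
	fmt.Println(formatTraceParent(tp))
}
```

When a trace breaks between two services, logging the incoming `traceparent` value on the receiving side is a quick way to see whether the header arrived at all.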
3.1 HTTP Propagation
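import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func callDownstream(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Writes the trace context headers for the span in ctx. This is a no-op
	// unless a propagator has been registered with otel.SetTextMapPropagator.
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
	return http.DefaultClient.Do(req)
}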
3.2 Message Queue Propagation
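import (
	"context"

	"github.com/confluentinc/confluent-kafka-go/v2/kafka"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// producer is assumed to be a package-level *kafka.Producer created at startup.
func publishMessage(ctx context.Context, topic string, msg []byte) error {
	// Serialize the current trace context into a plain map.
	carrier := propagation.MapCarrier{}
	otel.GetTextMapPropagator().Inject(ctx, carrier)

	kafkaMsg := &kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
		Value:          msg,
		Headers:        make([]kafka.Header, 0, len(carrier)),
	}
	// Copy each propagation field (for example traceparent) into a record header.
	for k, v := range carrier {
		kafkaMsg.Headers = append(kafkaMsg.Headers, kafka.Header{
			Key: k, Value: []byte(v),
		})
	}
	return producer.Produce(kafkaMsg, nil)
}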
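The publishing code covers only the Inject half; the consumer has to reverse it before starting its own span. A minimal sketch of that reverse step follows, assuming a stand-in `header` type that mirrors the Key/Value shape of a Kafka record header so the example runs without a Kafka client. The helper name `headersToCarrier` is hypothetical.

```go
package main

import "fmt"

// header is a stand-in for a Kafka record header (a key plus raw bytes).
type header struct {
	Key   string
	Value []byte
}

// headersToCarrier copies record headers into a string map. If a key
// appears more than once, the last value wins.
//
// In a real consumer the result can be converted to propagation.MapCarrier
// and handed to the propagator:
//
//	ctx := otel.GetTextMapPropagator().Extract(context.Background(),
//		propagation.MapCarrier(carrier))
//
// Spans started from ctx then join the producer's trace.
func headersToCarrier(headers []header) map[string]string {
	carrier := make(map[string]string, len(headers))
	for _, h := range headers {
		carrier[h.Key] = string(h.Value)
	}
	return carrier
}

func main() {
	headers := []header{
		{Key: "traceparent", Value: []byte("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")},
		{Key: "content-type", Value: []byte("application/json")},
	}
	fmt.Println(headersToCarrier(headers)["traceparent"])
}
```

Unrelated headers are copied too, which is harmless: the propagator only reads the keys it knows about.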
4. Jaeger and Tempo Integration
4.1 Jaeger All-in-One (Development)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger # must match spec.selector.matchLabels
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.60
          ports:
            - containerPort: 16686
              name: ui
            - containerPort: 4317
              name: otlp-grpc
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
4.2 Grafana Tempo (Production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo # must match spec.selector.matchLabels
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:2.6
          args: ["-config.file=/etc/tempo/tempo.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200
    distributor:
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: minio:9000
          # access credentials and TLS settings are omitted here
4.3 OTel Collector Configuration
# The filter and tail_sampling processors ship in the Collector contrib distribution.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 512
  filter:
    error_mode: ignore
    traces:
      span:
        # Drop health-check spans before they reach the sampler.
        - 'attributes["http.route"] == "/healthz"'
  tail_sampling:
    decision_wait: 10s
    policies:
      # A trace is kept when any one policy matches.
      - name: error-policy
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: baseline-sample
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, tail_sampling, batch]
      exporters: [otlp]
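The tail-sampling section combines three policy types: status_code, latency, and probabilistic. A rough sketch of how such a combined decision behaves is shown below. The `span` type and `keepTrace` function are hypothetical. Approximating trace duration by the longest span and hashing the trace ID with FNV are simplifications chosen for this illustration, not the processor's actual algorithm.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// span is a hypothetical, trimmed-down view of a finished span.
type span struct {
	Duration time.Duration
	IsError  bool
}

// keepTrace keeps a trace when any span errored, when the longest span
// reached slowThreshold, or when the trace ID hashes into the baseline
// percentage (0 to 100).
func keepTrace(traceID string, spans []span, slowThreshold time.Duration, baselinePercent uint32) bool {
	var longest time.Duration
	for _, s := range spans {
		if s.IsError {
			return true // error policy
		}
		if s.Duration > longest {
			longest = s.Duration
		}
	}
	if longest >= slowThreshold {
		return true // latency policy
	}
	// Probabilistic policy: hash the trace ID so the decision is repeatable.
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%100 < baselinePercent
}

func main() {
	failed := []span{{Duration: 20 * time.Millisecond, IsError: true}}
	slow := []span{{Duration: 1500 * time.Millisecond}}
	fast := []span{{Duration: 30 * time.Millisecond}}
	for _, spans := range [][]span{failed, slow, fast} {
		fmt.Println(keepTrace("4bf92f3577b34da6a3ce929d0e0e4736", spans, time.Second, 0))
	}
}
```

The decision needs every span of a trace, which is why the processor buffers spans for `decision_wait` before deciding, and why its memory use grows with that window.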
Backend Comparison:
| Dimension | Jaeger | Grafana Tempo |
|---|---|---|
| Storage | Elasticsearch/Cassandra | Object storage (S3/GCS) |
| Cost | High (ES cluster) | Low (object storage) |
| Query Latency | Low (indexed) | Medium (Trace ID queries very fast) |
| Grafana Integration | Built-in data source | Built-in data source with TraceQL search |
| Use Case | Development/small scale | Production/large scale |
5 Common Pitfalls
| # | Pitfall | Consequence | Solution |
|---|---|---|---|
| 1 | Sampling rate set to 100% | Storage cost explosion, performance degradation | Head-sample 0.1%-10% in production; to keep 100% of error traces, export all spans and let Collector tail sampling decide |
| 2 | Not propagating trace context | Cross-service trace broken | Use TextMapPropagator.Inject/Extract |
| 3 | Forgetting span.End() | Incomplete spans, memory leaks | Use defer span.End() |
| 4 | Creating too many spans on hot paths | Excessive performance overhead | Create spans at operation boundaries, not inside tight loops; reserve manual spans for critical paths and rely on auto-instrumentation for the rest |
| 5 | Single Collector deployment | Collector failure causes data loss | Deploy multiple Collector instances + load balancing |
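To put pitfall 1 in numbers, the sketch below estimates daily trace volume at different sampling ratios. The function name and the workload figures (1,000 requests per second, 10 spans per request, 500 bytes per span) are assumptions chosen for illustration, not measurements.

```go
package main

import "fmt"

// dailyTraceBytes estimates the bytes of span data produced per day.
// All inputs are rough averages supplied by the caller.
func dailyTraceBytes(requestsPerSec, spansPerRequest, bytesPerSpan, sampleRatio float64) float64 {
	const secondsPerDay = 86400
	return requestsPerSec * spansPerRequest * bytesPerSpan * sampleRatio * secondsPerDay
}

func main() {
	// Assumed workload: 1,000 req/s, 10 spans per request, 500 bytes per span.
	for _, ratio := range []float64{1.0, 0.1, 0.01} {
		gb := dailyTraceBytes(1000, 10, 500, ratio) / 1e9
		fmt.Printf("ratio %.2f: about %.1f GB/day\n", ratio, gb)
	}
}
```

Under these assumptions, 100% sampling produces about 432 GB per day and 10% about 43 GB, before any compression or indexing overhead in the backend.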
10 Error Troubleshooting Items
| # | Error Symptom | Possible Cause | Troubleshooting Method |
|---|---|---|---|
| 1 | No traces visible in Jaeger | Exporter not connected to Collector | Check Collector URL and port |
| 2 | Cross-service trace broken | Context not propagated | Check if Propagator.Inject is called |
| 3 | Missing service name or resource attributes | Resource attributes not set | Check resource.New WithAttributes |
| 4 | Critical traces lost after sampling | Sampling rate too low | Use Tail Sampling to prioritize error traces |
| 5 | Collector OOM | Batch queue or tail-sampling buffer too large | Add the memory_limiter processor; reduce batch size and decision_wait |
| 6 | Tempo query timeout | Searches other than Trace ID lookups scan far more data | Query by Trace ID where possible, or narrow the search time range |
| 7 | Incomplete gRPC traces | otelgrpc stats handler not registered | Add StatsHandler to both client and server |
| 8 | Missing DB spans | Connection opened with plain sql.Open | Open the database through the instrumented wrapper's Open function instead of sql.Open |
| 9 | Kafka message trace broken | Trace not injected in message headers | Inject on produce, Extract on consume |
| 10 | Too many spans | Auto + manual instrumentation overlap | Avoid manual spans where auto-instrumentation covers |
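For item 1, a frequent cause is an endpoint string in the wrong shape: `otlptracegrpc.WithEndpoint` expects `host:port`, not a URL. The sketch below is a small pre-flight check that could run at startup. The function name `validateEndpoint` is hypothetical, and the check covers only the string format, not whether the Collector is reachable.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// validateEndpoint reports whether s looks like "host:port" with no URL
// scheme or path, the form expected by otlptracegrpc.WithEndpoint.
func validateEndpoint(s string) error {
	if strings.Contains(s, "://") {
		return errors.New("endpoint must not include a scheme such as http://")
	}
	host, port, err := net.SplitHostPort(s)
	if err != nil {
		return fmt.Errorf("endpoint must be host:port: %w", err)
	}
	if host == "" {
		return errors.New("endpoint is missing a host")
	}
	// A non-numeric port also catches a trailing path such as "4317/v1/traces".
	if _, err := strconv.Atoi(port); err != nil {
		return errors.New("endpoint port must be numeric")
	}
	return nil
}

func main() {
	for _, ep := range []string{
		"otel-collector:4317",
		"http://otel-collector:4317",
		"otel-collector",
	} {
		fmt.Printf("%q: %v\n", ep, validateEndpoint(ep))
	}
}
```

If the format is valid and traces are still missing, the next things to check are network reachability of the Collector port and whether TLS settings match on both sides.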
Tool Recommendations
When implementing distributed tracing, these tools help with data format and encoding tasks:
- JSON Formatter — Format OTel Collector configuration and Span JSON data for debugging
- Base64 Encoder — Encode Trace IDs and Span IDs for cross-system transmission
- Hash Calculator — Generate hashes for sampling decisions, ensuring consistent sampling for the same Trace
Summary: Distributed tracing is the "X-ray machine" of microservice observability: without it, you see symptoms, not causes. OpenTelemetry unifies the API and SDK; auto-instrumentation covers HTTP, gRPC, and database calls; manual instrumentation fills in business-critical paths; tail sampling keeps error traces from being dropped; and the Collector handles batching and forwarding. In 2026, Jaeger for development debugging plus Tempo for production storage is a practical default. Remember: a microservice system without distributed tracing is like an unmonitored black box, and when things break you can only guess.