K8s eBPF Observability: 5 Practical Patterns from Kernel Tracing to Full-Stack Monitoring
When Traditional Monitoring Hits the K8s Kernel Black Hole
Have you ever experienced this—Prometheus metrics look perfectly normal, but service latency inexplicably spikes? Your Sidecar proxy consumes 15% CPU but only tells you "connection timeout"? Logs are full of application-level errors, but you have zero visibility into what's happening at the kernel level?
This is the "triple blind spot" of K8s observability: traditional monitoring sees only user space and is blind to kernel-level events; sidecar injection adds overhead, with Istio's data plane adding roughly 2-5 ms of latency depending on workload and configuration; and sampled distributed tracing can miss the critical 1% of requests.
eBPF changes everything. It lets you capture every detail of system calls directly in kernel space—without modifying the kernel, injecting Sidecars, or changing application code. From TCP retransmissions to process execution, from network packet drops to security events—eBPF gives K8s clusters true "full-stack X-ray vision."
This article walks you through 5 eBPF observability patterns from scratch, covering kernel tracing, network monitoring, security auditing, and performance analysis across the entire stack.
Core Concepts Reference Table
| Concept | Full Name | Description |
|---|---|---|
| eBPF | Extended Berkeley Packet Filter | Sandbox VM in the Linux kernel allowing safe execution of custom programs in kernel space |
| BPF Program | BPF Program | eBPF code written and loaded into the kernel, attached to specific hook points |
| BPF Map | BPF Mapping Table | Data sharing structure between kernel and user space, supporting hash/array/ring types |
| bpftrace | bpftrace | High-level eBPF tracing language with awk-like syntax, ideal for quick prototyping |
| Cilium | Cilium | eBPF-based K8s CNI plugin providing networking, security, and observability |
| Hubble | Hubble | Cilium's observability component providing network traffic visualization and service dependency mapping |
| Kprobe | Kernel Probe | Dynamic kernel probe that can attach to kernel function entry/exit points |
| Tracepoint | Tracepoint | Static kernel tracing points predefined by kernel developers, more stable than kprobes |
| XDP | eXpress Data Path | eBPF hook for processing network packets at the NIC driver level with ultra-low latency |
| BPF Verifier | BPF Verifier | Safety checker in the kernel ensuring eBPF programs cannot crash the kernel |
| BTF | BPF Type Format | eBPF type information format enabling CO-RE (Compile Once, Run Everywhere) |
| Perf Event | Performance Event | Linux performance event subsystem, an important attachment point for eBPF programs |
Five Challenges: Why K8s eBPF Observability Isn't "Just Install a Plugin"
Challenge 1: Kernel Version Compatibility Hell
eBPF features arrive incrementally across kernel releases. The BPF trampoline requires 5.5+, BTF support needs 5.2+, and the BPF ring buffer used in the examples below needs 5.8+, yet many enterprise K8s nodes still run 4.19 or 5.4 kernels. A program that loads on one node may therefore fail to load on another.
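One way to make the version problem concrete is a pre-flight check that a node agent runs before it loads anything. The sketch below is illustrative only: the function names and the feature table are assumptions made for this example, and the minimums are upstream versions (BTF 5.2, BPF trampoline 5.5, ring buffer 5.8). Distribution kernels with backports may behave differently, so probing the feature itself is more reliable than comparing version strings.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// kernelVersion holds the major and minor parts of a kernel release.
type kernelVersion struct{ Major, Minor int }

// atLeast reports whether v is the same as or newer than need.
func (v kernelVersion) atLeast(need kernelVersion) bool {
	if v.Major != need.Major {
		return v.Major > need.Major
	}
	return v.Minor >= need.Minor
}

// leadingInt parses the run of decimal digits at the start of s.
func leadingInt(s string) (int, error) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return strconv.Atoi(s[:end])
}

// parseKernelRelease extracts major.minor from a `uname -r` style string
// such as "5.4.0-150-generic".
func parseKernelRelease(release string) (kernelVersion, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return kernelVersion{}, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := leadingInt(parts[0])
	if err != nil {
		return kernelVersion{}, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	minor, err := leadingInt(parts[1])
	if err != nil {
		return kernelVersion{}, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return kernelVersion{Major: major, Minor: minor}, nil
}

// minKernel is an illustrative table of upstream minimum versions.
var minKernel = map[string]kernelVersion{
	"btf":        {Major: 5, Minor: 2},
	"trampoline": {Major: 5, Minor: 5},
	"ringbuf":    {Major: 5, Minor: 8},
}

// missingFeatures returns, in sorted order, the features from minKernel
// that the given kernel release is too old for.
func missingFeatures(release string) ([]string, error) {
	v, err := parseKernelRelease(release)
	if err != nil {
		return nil, err
	}
	var missing []string
	for name, need := range minKernel {
		if !v.atLeast(need) {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	for _, rel := range []string{"4.19.0-25-amd64", "5.4.0-150-generic", "6.1.0"} {
		missing, err := missingFeatures(rel)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-20s missing: %v\n", rel, missing)
	}
}
```

A scheduler hook or DaemonSet init step could use a check like this to skip nodes that cannot run a given program instead of failing at load time.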
Challenge 2: BPF Verifier's Strict Restrictions
The BPF verifier rejects any program it cannot prove safe. Loops must be bounded, pointer accesses require null checks, and stack space is limited to 512 bytes. A slightly complex tracing logic may require repeated adjustments to pass verification.
Challenge 3: Production Environment Safety Concerns
eBPF programs run in kernel space. While the verifier provides safety guarantees, many security teams remain cautious about "running custom code in the kernel." Especially in finance and healthcare with strict compliance requirements, eBPF adoption requires rigorous security audits.
Challenge 4: Observability Data Explosion
eBPF can capture massive amounts of events from the kernel—every system call, every network packet, every context switch. In large K8s clusters, unfiltered eBPF data can generate millions of events per second, overwhelming storage and analysis systems.
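A common first defence against this volume is sampling: forward one event out of every N per event kind and discard the rest. The sketch below shows only the decision logic in user-space Go, with hypothetical names (`everyNth`, `keep`). In production the same filter usually belongs in the kernel program, so that dropped events never cross into user space at all.

```go
package main

import "fmt"

// everyNth keeps one event out of every n for each event kind.
// It is not safe for concurrent use; guard it with a mutex if shared.
type everyNth struct {
	n    uint64
	seen map[string]uint64
}

// newEveryNth builds a sampler; n == 0 is treated as 1 (keep everything).
func newEveryNth(n uint64) *everyNth {
	if n == 0 {
		n = 1
	}
	return &everyNth{n: n, seen: make(map[string]uint64)}
}

// keep reports whether the current event of the given kind should be
// forwarded. The first event of each kind is always kept.
func (s *everyNth) keep(kind string) bool {
	c := s.seen[kind]
	s.seen[kind] = c + 1
	return c%s.n == 0
}

func main() {
	s := newEveryNth(100)
	kept := 0
	for i := 0; i < 1000; i++ {
		if s.keep("tcp_sendmsg") {
			kept++
		}
	}
	fmt.Println("forwarded", kept, "of 1000 events") // forwarded 10 of 1000 events
}
```

Counting per kind rather than globally keeps rare event types visible even when a noisy type dominates the stream.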
Challenge 5: Multi-Cluster Correlation Tracing
When requests span multiple K8s clusters, kernel events captured by eBPF lack unified correlation identifiers. You can see TCP retransmissions in cluster A and DNS timeouts in cluster B, but correlating them to the same user request chain is extremely difficult.
Five-Step Implementation: From Kernel Tracing to Full-Stack Monitoring
Step 1: eBPF Program Basics—bpftrace One-Liners and C BPF Programs
bpftrace quick tracing:
# Trace all TCP connection establishment events
bpftrace -e 'kprobe:tcp_connect { printf("PID: %d, Comm: %s\n", pid, comm); }'
# Trace TCP retransmissions, count by process
bpftrace -e 'kprobe:tcp_retransmit_skb { @retrans[comm] = count(); }'
# Trace process execution (security audit); filename is a string field, so wrap it in str()
bpftrace -e 'tracepoint:sched:sched_process_exec { printf("%s -> %s\n", comm, str(args->filename)); }'
# Trace VFS read/write latency distribution
bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns = hist(nsecs - @start[tid]); delete(@start[tid]); }'
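# Trace network connection state changes (tcp_set_state only receives the new state, so use the tracepoint that exposes both)
bpftrace -e 'tracepoint:sock:inet_sock_set_state { printf("state: %d -> %d, pid: %d\n", args->oldstate, args->newstate, pid); }'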
C language eBPF program (TCP connection tracing):
// tcp_connect.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
struct tcp_connect_event {
u32 pid;
u32 saddr;
u32 daddr;
u16 dport;
char comm[16];
};
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 256 * 1024);
} tcp_connect_events SEC(".maps");
SEC("kprobe/tcp_connect")
int BPF_KPROBE(trace_tcp_connect, struct sock *sk)
{
struct tcp_connect_event *event;
event = bpf_ringbuf_reserve(&tcp_connect_events, sizeof(*event), 0);
if (!event)
return 0;
event->pid = bpf_get_current_pid_tgid() >> 32;
event->saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
event->daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
event->dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
bpf_get_current_comm(&event->comm, sizeof(event->comm));
bpf_ringbuf_submit(event, 0);
return 0;
}
char LICENSE[] SEC("license") = "GPL";
Step 2: Go-based eBPF Loader (cilium/ebpf library)
// main.go - eBPF TCP Connection Tracer
package main
import (
"bytes"
"encoding/binary"
"errors"
"fmt"
"log"
"net"
"os"
"os/signal"
"syscall"
"github.com/cilium/ebpf/link"
"github.com/cilium/ebpf/ringbuf"
"github.com/cilium/ebpf/rlimit"
)
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -type tcp_connect_event bpf tcp_connect.bpf.c
type tcpConnectEvent struct {
Pid uint32
Saddr uint32
Daddr uint32
Dport uint16
Comm [16]byte
}
func main() {
if err := rlimit.RemoveMemlock(); err != nil {
log.Fatalf("Failed to remove memlock limit: %v", err)
}
objs := bpfObjects{}
if err := loadBpfObjects(&objs, nil); err != nil {
log.Fatalf("Failed to load eBPF objects: %v", err)
}
defer objs.Close()
kp, err := link.Kprobe("tcp_connect", objs.TraceTcpConnect, nil)
if err != nil {
log.Fatalf("Failed to attach kprobe: %v", err)
}
defer kp.Close()
rd, err := ringbuf.NewReader(objs.TcpConnectEvents)
if err != nil {
log.Fatalf("Failed to create ringbuf reader: %v", err)
}
defer rd.Close()
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
fmt.Println("TCP connection tracing started, press Ctrl+C to exit...")
fmt.Println("PID\tComm\t\tSrcAddr\t\tDstAddr")
go func() {
<-sig
fmt.Println("\nStopping tracing...")
rd.Close()
}()
for {
record, err := rd.Read()
if err != nil {
if errors.Is(err, ringbuf.ErrClosed) {
fmt.Println("Ringbuf closed")
return
}
log.Printf("Failed to read ringbuf: %v", err)
continue
}
var event tcpConnectEvent
if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
log.Printf("Failed to parse event: %v", err)
continue
}
srcIP := net.IP(uint32ToBytes(event.Saddr))
dstIP := net.IP(uint32ToBytes(event.Daddr))
// skc_dport is stored in network byte order; swap the two bytes to get the host-order port.
dstPort := event.Dport>>8 | event.Dport<<8
fmt.Printf("%d\t%s\t\t%s\t%s:%d\n",
event.Pid,
string(bytes.TrimRight(event.Comm[:], "\x00")),
srcIP,
dstIP,
dstPort,
)
}
}
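// uint32ToBytes returns the four bytes of a kernel IPv4 address in wire order.
// binary.Read loaded the field as little-endian, so writing it back
// little-endian restores the original byte sequence.
func uint32ToBytes(v uint32) net.IP {
b := make([]byte, 4)
binary.LittleEndian.PutUint32(b, v)
return b
}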
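The byte-order handling above is easy to get wrong, so it can help to decode the raw sample by offset instead of going through integer fields. The following is a minimal sketch under stated assumptions: `connEvent` and `decodeConnEvent` are hypothetical names, the layout is the one declared in `tcp_connect.bpf.c` (pid 4 bytes, saddr 4, daddr 4, dport 2, comm 16), and the host is little-endian (x86-64 or arm64).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// connEvent is a decoded tcp_connect_event (hypothetical helper type).
type connEvent struct {
	PID  uint32
	Src  net.IP
	Dst  net.IP
	Port uint16
	Comm string
}

// decodeConnEvent parses the fixed layout pid(4) saddr(4) daddr(4) dport(2) comm(16).
// Addresses and the port are network byte order, so they are read as raw bytes
// or big-endian; pid is a host-order integer.
func decodeConnEvent(raw []byte) (connEvent, error) {
	const minLen = 30
	if len(raw) < minLen {
		return connEvent{}, fmt.Errorf("short sample: %d bytes, need %d", len(raw), minLen)
	}
	comm := raw[14:30]
	if i := bytes.IndexByte(comm, 0); i >= 0 {
		comm = comm[:i] // cut at the first NUL terminator
	}
	return connEvent{
		PID:  binary.LittleEndian.Uint32(raw[0:4]),
		Src:  net.IPv4(raw[4], raw[5], raw[6], raw[7]),
		Dst:  net.IPv4(raw[8], raw[9], raw[10], raw[11]),
		Port: binary.BigEndian.Uint16(raw[12:14]),
		Comm: string(comm),
	}, nil
}

func main() {
	// Build a synthetic sample: pid 4242, 10.0.0.5 -> 10.96.0.1:443, comm "curl".
	raw := make([]byte, 32)
	binary.LittleEndian.PutUint32(raw[0:4], 4242)
	copy(raw[4:8], []byte{10, 0, 0, 5})
	copy(raw[8:12], []byte{10, 96, 0, 1})
	binary.BigEndian.PutUint16(raw[12:14], 443)
	copy(raw[14:], "curl")

	ev, err := decodeConnEvent(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d %s %s -> %s:%d\n", ev.PID, ev.Comm, ev.Src, ev.Dst, ev.Port)
	// 4242 curl 10.0.0.5 -> 10.96.0.1:443
}
```

Reading network-order fields as bytes avoids the double conversion entirely, at the cost of hard-coding offsets that must be kept in sync with the C struct.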
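Shape of the code that bpf2go generates (illustrative):
// bpf_bpfel.go - Abridged illustration of bpf2go output; the real file is produced by go generate
// Code generated by bpf2go; DO NOT EDIT.
package main

import (
	"errors"

	"github.com/cilium/ebpf"
)

type bpfTcpConnectEvent struct {
	Pid   uint32
	Saddr uint32
	Daddr uint32
	Dport uint16
	Comm  [16]byte
}

type bpfPrograms struct {
	TraceTcpConnect *ebpf.Program `ebpf:"trace_tcp_connect"`
}

type bpfMaps struct {
	TcpConnectEvents *ebpf.Map `ebpf:"tcp_connect_events"`
}

// bpfObjects embeds both structs, which is why main.go can refer to
// objs.TraceTcpConnect and objs.TcpConnectEvents directly.
type bpfObjects struct {
	bpfPrograms
	bpfMaps
}

// Placeholder only: the generated loadBpfObjects loads the embedded ELF
// object, and the generated file also defines the Close method used in main.go.
func loadBpfObjects(obj *bpfObjects, opts *ebpf.CollectionOptions) error {
	return errors.New("placeholder: run go generate to produce the real loader")
}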
Step 3: Cilium Hubble Network Observability Setup
# cilium-values.yaml - Helm values for Cilium + Hubble
kubeProxyReplacement: true
hubble:
enabled: true
listenAddress: ":4244"
relay:
enabled: true
ui:
enabled: true
metrics:
enabled:
- dns
- drop
- tcp
- flow
- icmp
- http
enableOpenMetrics: true
dashboards:
enabled: true
namespace: monitoring
operator:
replicas: 2
prometheus:
enabled: true
hostPort:
enabled: true
ipam:
mode: kubernetes
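# The legacy "tunnel" value was removed in Cilium 1.15; 1.17 expects these two keys
routingMode: tunnel
tunnelProtocol: vxlan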
# Install Cilium with Hubble
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.17.0 \
--namespace kube-system \
-f cilium-values.yaml
# Forward the Hubble Relay port to localhost, then stream recent flows
cilium hubble port-forward &
hubble observe --since 1m --output json
# View DNS flows
hubble observe --protocol dns --since 5m
# View dropped TCP flows
hubble observe --protocol tcp --verdict DROPPED --since 10m
# View traffic to a specific service, given as [namespace/]name
hubble observe --to-service default/my-app --since 5m
# Export flow logs to file
hubble observe --output json --since 1h > hubble-flows.json
Hubble API Client (Go):
// hubble_client.go - Hubble Flow Monitoring Client
package main
import (
"context"
"fmt"
"log"
"os"
"os/signal"
"syscall"
"time"
"github.com/cilium/cilium/api/v1/flow"
"github.com/cilium/cilium/api/v1/observer"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/timestamppb"
)
func main() {
// localhost:4245 is the port opened by `cilium hubble port-forward`.
conn, err := grpc.NewClient("localhost:4245",
grpc.WithTransportCredentials(insecure.NewCredentials()),
)
if err != nil {
log.Fatalf("Failed to connect to Hubble gRPC: %v", err)
}
defer conn.Close()
client := observer.NewObserverClient(conn)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
stream, err := client.GetFlows(ctx, &observer.GetFlowsRequest{
Whitelist: []*flow.FlowFilter{
{Verdict: []flow.Verdict{flow.Verdict_DROPPED}},
},
// Since is a protobuf timestamp, not a formatted string.
Since: timestamppb.New(time.Now().Add(-5 * time.Minute)),
// Follow keeps the stream open for new flows, so no Until bound is set.
Follow: true,
})
if err != nil {
log.Fatalf("Failed to subscribe to Hubble flows: %v", err)
}
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
fmt.Println("Monitoring dropped network traffic...")
fmt.Println("Time\t\tSource Pod\t\tDest Pod\t\tReason")
go func() {
<-sig
cancel()
}()
for {
resp, err := stream.Recv()
if err != nil {
if ctx.Err() != nil {
return // cancelled by the signal handler, not a failure
}
log.Printf("Failed to receive flow data: %v", err)
return
}
if f := resp.GetFlow(); f != nil {
srcPod := f.GetSource().GetPodName()
dstPod := f.GetDestination().GetPodName()
reason := f.GetDropReasonDesc().String()
// Print the time the flow was observed rather than the time it was received.
fmt.Printf("%s\t%s\t%s\t%s\n",
f.GetTime().AsTime().Local().Format("15:04:05"),
srcPod,
dstPod,
reason,
)
}
}
}
Step 4: Security Tracing—Process Execution Monitoring
// exec_monitor.bpf.c - Process Execution Security Monitor
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#define MAX_COMM_LEN 16
#define MAX_ARGS_LEN 128
#define MAX_FILENAME_LEN 128
struct exec_event {
u32 pid;
u32 ppid;
u32 uid;
u32 gid;
char comm[MAX_COMM_LEN];
char filename[MAX_FILENAME_LEN];
char args[MAX_ARGS_LEN];
};
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 256 * 1024);
} exec_events SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1024);
__type(key, u32);
__type(value, struct exec_event);
} pending_execs SEC(".maps");
SEC("tracepoint/sched/sched_process_exec")
int trace_exec(struct trace_event_raw_sched_process_exec *ctx)
{
struct exec_event *event;
event = bpf_ringbuf_reserve(&exec_events, sizeof(*event), 0);
if (!event)
return 0;
event->pid = bpf_get_current_pid_tgid() >> 32;
event->uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
event->gid = bpf_get_current_uid_gid() >> 32;
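bpf_get_current_comm(&event->comm, sizeof(event->comm));
// filename is a __data_loc field: its low 16 bits hold the string's offset from the start of ctx.
unsigned int fname_off = ctx->__data_loc_filename & 0xFFFF;
bpf_probe_read_kernel_str(&event->filename, sizeof(event->filename), (void *)ctx + fname_off);
// args is not collected in this example; terminate it so user space never reads stale ring buffer bytes.
event->args[0] = '\0';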
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
event->ppid = BPF_CORE_READ(task, real_parent, tgid);
bpf_ringbuf_submit(event, 0);
return 0;
}
SEC("tracepoint/sched/sched_process_exit")
int trace_exit(struct trace_event_raw_sched_process_template *ctx)
{
u32 pid = bpf_get_current_pid_tgid() >> 32;
bpf_map_delete_elem(&pending_execs, &pid);
return 0;
}
char LICENSE[] SEC("license") = "GPL";
Security Monitoring Policy Engine (Go):
// security_monitor.go - Process Execution Security Monitor
package main
import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
	"os/signal"
	"strings"
	"syscall"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/ringbuf"
	"github.com/cilium/ebpf/rlimit"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -type exec_event bpf exec_monitor.bpf.c

type execEvent struct {
	Pid      uint32
	Ppid     uint32
	Uid      uint32
	Gid      uint32
	Comm     [16]byte
	Filename [128]byte
	Args     [128]byte
}

type SecurityRule struct {
	Name        string
	Description string
	Check       func(event execEvent) bool
}

// cString converts a NUL-terminated C buffer to a Go string, ignoring any
// bytes after the first NUL (ring buffer memory is not zeroed).
func cString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		return string(b[:i])
	}
	return string(b)
}

// hasBasename reports whether the final path element equals one of names.
// A substring test would misfire: "su" also matches /usr/bin/sum.
func hasBasename(path string, names ...string) bool {
	base := path[strings.LastIndex(path, "/")+1:]
	for _, n := range names {
		if base == n {
			return true
		}
	}
	return false
}

var securityRules = []SecurityRule{
	{
		Name:        "suspicious_shell",
		Description: "Detect suspicious shell execution",
		Check: func(e execEvent) bool {
			comm := cString(e.Comm[:])
			return comm == "bash" || comm == "sh" || comm == "zsh"
		},
	},
	{
		Name:        "privilege_escalation",
		Description: "Detect potential privilege escalation",
		Check: func(e execEvent) bool {
			return hasBasename(cString(e.Filename[:]), "sudo", "su", "pkexec")
		},
	},
	{
		Name:        "container_escape",
		Description: "Detect container escape risk",
		Check: func(e execEvent) bool {
			return hasBasename(cString(e.Filename[:]), "nsenter", "docker", "crictl")
		},
	},
}

func main() {
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatalf("Failed to remove memlock limit: %v", err)
	}
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("Failed to load eBPF objects: %v", err)
	}
	defer objs.Close()

	tpExec, err := link.Tracepoint("sched", "sched_process_exec", objs.TraceExec, nil)
	if err != nil {
		log.Fatalf("Failed to attach exec tracepoint: %v", err)
	}
	defer tpExec.Close()

	tpExit, err := link.Tracepoint("sched", "sched_process_exit", objs.TraceExit, nil)
	if err != nil {
		log.Fatalf("Failed to attach exit tracepoint: %v", err)
	}
	defer tpExit.Close()

	rd, err := ringbuf.NewReader(objs.ExecEvents)
	if err != nil {
		log.Fatalf("Failed to create ringbuf reader: %v", err)
	}
	defer rd.Close()

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	fmt.Println("Security monitoring started...")
	go func() {
		<-sig
		rd.Close() // unblocks rd.Read below
	}()

	for {
		record, err := rd.Read()
		if err != nil {
			// The close error may be wrapped, so compare with errors.Is, not ==.
			if errors.Is(err, ringbuf.ErrClosed) {
				return
			}
			log.Printf("Failed to read event: %v", err)
			continue
		}
		var event execEvent
		if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
			log.Printf("Failed to parse event: %v", err)
			continue
		}
		for _, rule := range securityRules {
			if rule.Check(event) {
				log.Printf("[ALERT] %s: PID=%d PPID=%d UID=%d Comm=%s File=%s",
					rule.Name, event.Pid, event.Ppid, event.Uid,
					cString(event.Comm[:]), cString(event.Filename[:]))
			}
		}
	}
}
Step 5: eBPF Performance Profiling—CPU Flame Graphs
# Generate CPU flame graph data using bpftrace
bpftrace -e 'profile:hz:99 /pid/ { @stacks[ustack, kstack] = count(); }' > profile.out
# Generate a flame graph using BCC tools (-f emits the folded format that flamegraph.pl expects)
profile -F 99 -af -p <pid> 60 > perf.out
flamegraph.pl perf.out > cpu_flame.svg
Go Performance Profiler:
// cpu_profiler.go - eBPF CPU Performance Profiler
package main
import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
	"github.com/cilium/ebpf/rlimit"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -type stack_event bpf cpu_profiler.bpf.c

type stackEvent struct {
	Pid       uint32
	Tid       uint32
	KernelIp  [10]uint64
	UserIp    [10]uint64
	KstackLen uint32
	UstackLen uint32
}

func main() {
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatalf("Failed to remove memlock limit: %v", err)
	}
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("Failed to load eBPF objects: %v", err)
	}
	defer objs.Close()

	// NOTE: this attach call is kept from the original listing; verify it
	// against the cilium/ebpf version you use. A 99 Hz sampling event normally
	// has to be opened per CPU with perf_event_open before a program can be
	// attached to it, and that step is omitted here.
	lk, err := link.AttachPerfEvent(objs.DoProfile, -1, 0, -1)
	if err != nil {
		log.Fatalf("Failed to attach perf event: %v", err)
	}
	defer lk.Close()

	rd, err := perf.NewReader(objs.ProfileEvents, os.Getpagesize()*64)
	if err != nil {
		log.Fatalf("Failed to create perf reader: %v", err)
	}
	defer rd.Close()

	// rd.Read blocks, so read in a separate goroutine; calling it from the
	// select loop would delay both the ticker and signal handling.
	records := make(chan perf.Record)
	go func() {
		defer close(records)
		for {
			record, err := rd.Read()
			if err != nil {
				// The close error may be wrapped, so compare with errors.Is, not ==.
				if errors.Is(err, perf.ErrClosed) {
					return
				}
				log.Printf("Failed to read perf event: %v", err)
				continue
			}
			records <- record
		}
	}()

	stackCounts := make(map[string]int)
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	fmt.Println("CPU profiling started, output every 30 seconds...")

	for {
		select {
		case <-sig:
			return // deferred Close calls release the reader and the link
		case <-ticker.C:
			fmt.Printf("\n=== CPU Profile at %s ===\n", time.Now().Format("15:04:05"))
			for stack, count := range stackCounts {
				if count > 10 {
					fmt.Printf(" %s: %d samples\n", stack, count)
				}
			}
			stackCounts = make(map[string]int)
		case record, ok := <-records:
			if !ok {
				return
			}
			if record.LostSamples != 0 {
				log.Printf("Lost %d samples", record.LostSamples)
				continue
			}
			var event stackEvent
			if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
				log.Printf("Failed to parse sample: %v", err)
				continue
			}
			stackKey := fmt.Sprintf("pid=%d kstack=%d ustack=%d",
				event.Pid, event.KstackLen, event.UstackLen)
			stackCounts[stackKey]++
		}
	}
}
CPU Profiler eBPF C Program:
// cpu_profiler.bpf.c - CPU Performance Sampling
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#define MAX_STACK_DEPTH 10
struct stack_event {
u32 pid;
u32 tid;
u64 kernel_ip[MAX_STACK_DEPTH];
u64 user_ip[MAX_STACK_DEPTH];
u32 kstack_len;
u32 ustack_len;
};
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(u32));
__uint(value_size, sizeof(u32));
} profile_events SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_STACK_TRACE);
__uint(max_entries, 10000);
__uint(key_size, sizeof(u32));
__uint(value_size, MAX_STACK_DEPTH * sizeof(u64));
} stacks SEC(".maps");
SEC("perf_event")
int do_profile(struct bpf_perf_event_data *ctx)
{
// profile_events is a perf event array, so the event is built on the stack
// and sent with bpf_perf_event_output; bpf_ringbuf_reserve only works on
// BPF_MAP_TYPE_RINGBUF maps. The struct is 176 bytes, within the 512-byte stack limit.
struct stack_event event = {};
u64 pid_tgid = bpf_get_current_pid_tgid();
event.pid = pid_tgid >> 32;
event.tid = pid_tgid & 0xFFFFFFFF;
int kstack_id = bpf_get_stackid(ctx, &stacks, 0);
int ustack_id = bpf_get_stackid(ctx, &stacks, BPF_F_USER_STACK);
event.kstack_len = (kstack_id >= 0) ? MAX_STACK_DEPTH : 0;
event.ustack_len = (ustack_id >= 0) ? MAX_STACK_DEPTH : 0;
bpf_perf_event_output(ctx, &profile_events, BPF_F_CURRENT_CPU, &event, sizeof(event));
return 0;
}
char LICENSE[] SEC("license") = "GPL";
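To turn samples into a flame graph, the stacks have to end up in the folded format that flamegraph.pl reads: one line per unique stack, frames joined by semicolons from root to leaf, followed by a sample count. The sketch below assumes the frames have already been symbolized into function names, which the profiler above does not do; `foldStacks` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// foldStacks turns symbolized samples into "frame;frame;frame count" lines.
// Each sample lists frames from root (outermost caller) to leaf. Empty
// samples are skipped, and the output is sorted for stable diffs.
func foldStacks(samples [][]string) []string {
	counts := make(map[string]int)
	for _, frames := range samples {
		if len(frames) == 0 {
			continue
		}
		counts[strings.Join(frames, ";")]++
	}
	lines := make([]string, 0, len(counts))
	for stack, n := range counts {
		lines = append(lines, fmt.Sprintf("%s %d", stack, n))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	samples := [][]string{
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "gc"},
	}
	for _, line := range foldStacks(samples) {
		fmt.Println(line)
	}
	// main;gc 1
	// main;handle;parse 2
}
```

Aggregating identical stacks before writing them out keeps the file small even for long profiling runs.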
Five Pitfall Guide
Pitfall 1: Loading eBPF Programs Without Removing memlock Limits
❌ Wrong approach:
// Load eBPF program directly without adjusting memlock
objs := bpfObjects{}
err := loadBpfObjects(&objs, nil)
// Error: failed to load eBPF objects: map create: operation not permitted
✅ Correct approach:
// Remove memlock limit first, then load eBPF program
if err := rlimit.RemoveMemlock(); err != nil {
log.Fatalf("Failed to remove memlock limit: %v", err)
}
objs := bpfObjects{}
if err := loadBpfObjects(&objs, nil); err != nil {
log.Fatalf("Failed to load eBPF objects: %v", err)
}
Pitfall 2: Using Infinite Loops in eBPF Programs
❌ Wrong approach:
// BPF verifier will reject infinite loops
SEC("kprobe/tcp_connect")
int trace_tcp(struct pt_regs *ctx) {
while (1) {
// Verifier error: back-edge in program
}
return 0;
}
✅ Correct approach:
// Use a bounded loop so the verifier can prove termination. Kernels before 5.3
// reject every loop, which is why #pragma unroll is still added here.
SEC("kprobe/tcp_connect")
int trace_tcp(struct pt_regs *ctx) {
#pragma unroll
for (int i = 0; i < 10; i++) {
// Max 10 iterations, verifier can accept this
}
return 0;
}
Pitfall 3: Ignoring BTF Compatibility Causing CO-RE Failures
❌ Wrong approach:
# Run directly on target kernel without checking BTF support
./ebpf-program
# Error: CO-RE relocation failed: kernel does not support BTF
✅ Correct approach:
# Check kernel BTF support first
bpftool btf list
ls /sys/kernel/btf/vmlinux
# Add a BTF availability check in Go code
// checkBTFSupport verifies that the running kernel exposes its BTF data.
func checkBTFSupport() error {
if _, err := os.Stat("/sys/kernel/btf/vmlinux"); err != nil {
return fmt.Errorf("kernel BTF not found at /sys/kernel/btf/vmlinux; upgrade to a BTF-enabled kernel (5.2+) or supply an external BTF file: %w", err)
}
return nil
}
Pitfall 4: Ring Buffer Not Properly Handled Causing Data Loss
❌ Wrong approach:
// Using an undersized ring buffer, data loss under high load
rd, err := ringbuf.NewReader(objs.Events) // Default size may be insufficient
// LostSamples events not handled
✅ Correct approach:
// Set a sufficiently large ring buffer in eBPF C code
// __uint(max_entries, 256 * 1024); // 256KB
// Handle data loss in Go code
record, err := rd.Read()
if err != nil {
if errors.Is(err, ringbuf.ErrClosed) {
return
}
log.Printf("Read failed: %v", err)
continue
}
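// Note: a full BPF ring buffer does not surface as an error in user space.
// bpf_ringbuf_reserve returns NULL inside the kernel program instead, so count
// those failures there (for example in a per-CPU counter map).
// perf.Reader, by contrast, reports drops through record.LostSamples.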
Pitfall 5: Hubble Not Properly Configured Causing Invisible Traffic
❌ Wrong approach:
# Only enabled Hubble without configuring metrics and relay
hubble:
enabled: true
# Missing relay and metrics configuration
✅ Correct approach:
hubble:
enabled: true
listenAddress: ":4244"
relay:
enabled: true
rollOutPods: true
ui:
enabled: true
metrics:
enabled:
- dns
- drop
- tcp
- flow
- icmp
- http
enableOpenMetrics: true
networkPolicy:
enabled: true
Error Troubleshooting Reference Table
| Error Message | Cause | Solution |
|---|---|---|
| failed to load eBPF objects: map create: operation not permitted | memlock limit not removed | Call rlimit.RemoveMemlock() or set ulimit -l unlimited |
| back-edge in program | eBPF program contains a loop the verifier cannot bound | Use bounded loops, with #pragma unroll on kernels older than 5.3 |
| CO-RE relocation failed: kernel does not support BTF | Kernel version too low or built without BTF | Upgrade to a 5.2+ kernel built with CONFIG_DEBUG_INFO_BTF, or supply an external BTF file |
| map create: read-only | Insufficient eBPF Map permissions | Check CAP_BPF/CAP_SYS_ADMIN capabilities |
| invalid argument: couldn't find kprobe target | Kernel function doesn't exist | Confirm the symbol appears in /proc/kallsyms or in tracefs available_filter_functions (bpftool prog list only shows loaded programs) |
| ringbuf reserve failed | Ring buffer is full | Increase ring buffer size, or reduce event frequency |
| Hubble agent not ready | Hubble not properly started | Check cilium status, confirm hubble-relay Pod is running |
| connection refused:4245 | Hubble gRPC port not exposed | Run cilium hubble port-forward |
| BPF verifier: unreachable instruction | Dead code or branches unverifiable by verifier | Simplify conditional logic, remove unreachable code |
| failed to attach perf event: invalid argument | Perf event parameters incorrect | Check CPU frequency and sampling rate parameters |
Three Advanced Optimization Techniques
Technique 1: eBPF Map Batch Operations to Reduce System Call Overhead
When interacting between user space and kernel space, per-entry Map operations generate many system calls. Using Batch operations processes multiple entries at once:
// Batch update eBPF Map
func batchUpdateMap(m *ebpf.Map, entries map[uint32]uint64) error {
keys := make([]uint32, 0, len(entries))
values := make([]uint64, 0, len(entries))
for k, v := range entries {
keys = append(keys, k)
values = append(values, v)
}
const batchSize = 64
for done := 0; done < len(keys); done += batchSize {
end := done + batchSize
if end > len(keys) {
end = len(keys)
}
// BatchUpdate returns the number of entries written along with any error.
if _, err := m.BatchUpdate(keys[done:end], values[done:end], nil); err != nil {
return fmt.Errorf("batch update failed (offset=%d): %w", done, err)
}
}
return nil
}
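The chunking arithmetic is the part of batching that most often hides off-by-one errors, and it can be tested without a kernel. The sketch below isolates it; `span` and `batchSpans` are hypothetical names introduced for this example only.

```go
package main

import "fmt"

// span is a half-open index range [Start, End).
type span struct{ Start, End int }

// batchSpans splits total items into consecutive ranges of at most size
// items each. It returns nil when total or size is not positive.
func batchSpans(total, size int) []span {
	if total <= 0 || size <= 0 {
		return nil
	}
	spans := make([]span, 0, (total+size-1)/size)
	for start := 0; start < total; start += size {
		end := start + size
		if end > total {
			end = total // final, shorter batch
		}
		spans = append(spans, span{Start: start, End: end})
	}
	return spans
}

func main() {
	fmt.Println(batchSpans(150, 64)) // [{0 64} {64 128} {128 150}]
}
```

Each span can then be applied to the keys and values slices in step, so a failed batch reports an exact offset to resume from.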
Technique 2: Tail Call-Based eBPF Program Chaining
When a single eBPF program's logic is too complex, use Tail Calls to split it into multiple sub-programs, bypassing verifier complexity limits:
// tail_call_chain.bpf.c - Tail Call Chaining
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#define MAX_TAIL_CALLS 4
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, MAX_TAIL_CALLS);
__type(key, __u32);
__type(value, __u32);
} tail_call_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 256 * 1024);
} events SEC(".maps");
struct event_data {
u32 phase;
u32 pid;
char comm[16];
};
SEC("kprobe/tcp_connect")
int phase0(struct pt_regs *ctx)
{
struct event_data *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) return 0;
e->phase = 0;
e->pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(&e->comm, sizeof(e->comm));
bpf_ringbuf_submit(e, 0);
bpf_tail_call(ctx, &tail_call_map, 1);
return 0;
}
SEC("kprobe/tcp_connect")
int phase1(struct pt_regs *ctx)
{
struct event_data *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) return 0;
e->phase = 1;
e->pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(&e->comm, sizeof(e->comm));
bpf_ringbuf_submit(e, 0);
bpf_tail_call(ctx, &tail_call_map, 2);
return 0;
}
SEC("kprobe/tcp_connect")
int phase2(struct pt_regs *ctx)
{
struct event_data *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) return 0;
e->phase = 2;
e->pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(&e->comm, sizeof(e->comm));
bpf_ringbuf_submit(e, 0);
return 0;
}
char LICENSE[] SEC("license") = "GPL";
// Register tail call sub-programs (Go loader side).
// The program array stores file descriptors; ebpf.Program exposes its descriptor through FD().
progArray := objs.TailCallMap
if err := progArray.Update(uint32(1), uint32(objs.Phase1.FD()), ebpf.UpdateAny); err != nil {
log.Fatalf("Failed to register tail call phase1: %v", err)
}
if err := progArray.Update(uint32(2), uint32(objs.Phase2.FD()), ebpf.UpdateAny); err != nil {
log.Fatalf("Failed to register tail call phase2: %v", err)
}
Technique 3: eBPF Event Aggregation and Sampling to Reduce Data Volume
In high-traffic scenarios, kernel-space aggregation and sampling dramatically reduce the number of events user space needs to process:
// aggregate.bpf.c - Kernel-Space Event Aggregation
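#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>   // PT_REGS_PARM1
#include <bpf/bpf_core_read.h> // BPF_CORE_READ
struct flow_key {
u32 saddr;
u32 daddr;
u16 dport;
u8 protocol;
u8 pad; // explicit padding so every byte of the hash key is initialized
};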
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 65536);
__type(key, struct flow_key);
__type(value, u64);
} flow_counter SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 65536);
__type(key, struct flow_key);
__type(value, u64);
} flow_latency SEC(".maps");
SEC("kprobe/tcp_sendmsg")
int count_sendmsg(struct pt_regs *ctx)
{
struct flow_key key = {};
struct sock *sk = (struct sock *)PT_REGS_PARM1(ctx);
key.saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
key.daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
key.dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
key.protocol = IPPROTO_TCP;
u64 *count = bpf_map_lookup_elem(&flow_counter, &key);
if (count) {
__sync_fetch_and_add(count, 1);
} else {
u64 init = 1;
bpf_map_update_elem(&flow_counter, &key, &init, BPF_ANY);
}
return 0;
}
char LICENSE[] SEC("license") = "GPL";
// User-space periodic reading of aggregated data
func pollAggregatedMap(m *ebpf.Map, interval time.Duration) {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for range ticker.C {
var key flowKey
var value uint64
iter := m.Iterate()
fmt.Printf("\n=== Flow Stats at %s ===\n", time.Now().Format("15:04:05"))
for iter.Next(&key, &value) {
if value > 100 {
srcIP := intToIP(key.Saddr)
dstIP := intToIP(key.Daddr)
fmt.Printf(" %s -> %s:%d: %d requests\n",
srcIP, dstIP, key.Dport, value)
}
}
if err := iter.Err(); err != nil {
log.Printf("Map iteration failed: %v", err)
}
}
}
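The polling function relies on a `flowKey` type and an `intToIP` helper that the listing does not define. The sketch below shows one plausible definition, with assumptions stated plainly: the Go struct mirrors `struct flow_key` from `aggregate.bpf.c` (12 bytes once the compiler's padding is counted), the host is little-endian, and `hostPort` and `decodeFlowKey` are hypothetical helpers added for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// flowKey mirrors struct flow_key. The C struct occupies 12 bytes, so the Go
// side carries an explicit padding byte to keep the two layouts identical.
type flowKey struct {
	Saddr    uint32 // IPv4 source, network byte order as stored by the kernel
	Daddr    uint32 // IPv4 destination, network byte order
	Dport    uint16 // destination port, network byte order
	Protocol uint8
	_        uint8 // padding
}

// intToIP converts an address that was loaded as a little-endian integer
// back into its four wire-order bytes.
func intToIP(v uint32) net.IP {
	b := make(net.IP, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// hostPort swaps the two bytes of a network-order port (ntohs on a
// little-endian host).
func hostPort(v uint16) uint16 { return v>>8 | v<<8 }

// decodeFlowKey parses a raw 12-byte map key.
func decodeFlowKey(raw []byte) (flowKey, error) {
	var k flowKey
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &k); err != nil {
		return flowKey{}, fmt.Errorf("decode flow key: %w", err)
	}
	return k, nil
}

func main() {
	// 10.0.0.5 -> 10.96.0.1:443, protocol 6 (TCP), one padding byte.
	raw := []byte{10, 0, 0, 5, 10, 96, 0, 1, 0x01, 0xBB, 6, 0}
	k, err := decodeFlowKey(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s -> %s:%d proto %d (key size %d bytes)\n",
		intToIP(k.Saddr), intToIP(k.Daddr), hostPort(k.Dport), k.Protocol,
		binary.Size(k))
	// 10.0.0.5 -> 10.96.0.1:443 proto 6 (key size 12 bytes)
}
```

Note that the polling loop prints `key.Dport` directly; with this layout the value would need to pass through a conversion like `hostPort` to show the real port number.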
Observability Solution Comparison Analysis
| Dimension | eBPF | Prometheus | OpenTelemetry | Istio | Datadog |
|---|---|---|---|---|---|
| Data Source | Kernel space | App/Exporter | App SDK | Sidecar proxy | Agent+SDK |
| Performance Overhead | Very low (<1%) | Low | Medium (SDK overhead) | Medium-High (Sidecar) | Medium |
| Code Intrusiveness | Zero | Needs Exporter | Needs SDK | Needs Sidecar | Needs Agent |
| Kernel Visibility | Complete | None | None | None | Partial |
| Network Visibility | L3-L7 | L7 metrics | L7 tracing | L4-L7 | L3-L7 |
| Security Auditing | Native support | Needs extra tools | Needs extra tools | Policy logs | Native support |
| Real-time | Microsecond | Second | Millisecond | Millisecond | Second |
| Learning Curve | Steep | Gentle | Medium | Medium | Gentle |
| Multi-Cluster Support | Needs custom build | Federation | Native | Multi-cluster Mesh | Native |
| Cost | Open source free | Open source free | Open source free | Open source free | Commercial paid |
| Use Case | Deep kernel tracing | Metrics monitoring | Distributed tracing | Service mesh | All-in-one monitoring |
Summary
eBPF is not a silver bullet for observability, but it fills the kernel-space monitoring gap that application-level tools leave open. In a K8s observability stack, eBPF should serve as the lowest-level data source, complementing Prometheus metrics and OpenTelemetry traces: eBPF tells you "what happened in the kernel," Prometheus tells you "how the system is performing," and OpenTelemetry tells you "what the request experienced." Combining all three gives true full-stack observability.