Rust Tokio Graceful Shutdown: 5 Signal Handling Patterns and Resource Cleanup Guide
Your Rust Service Violently Shut Down Again
During K8s rolling updates, the Pod receives SIGTERM and exits immediately. 1000 in-flight requests are dropped, database transactions roll back, message queue messages are lost. Users see 502 errors everywhere, alerts explode again. You added tokio::signal listening, only to find that at shutdown time, tokio tasks are still running, connections aren't drained, resources aren't released.
Rust Tokio graceful shutdown isn't just adding ctrlc::set_handler. How do you capture signals? How do you stop the runtime? How do you drain connections? How do you clean up resources? How do you persist state? Without understanding these, every deployment is a "mini incident."
This article starts from 5 signal handling patterns and guides you through the full signal capture → runtime shutdown → connection draining → resource cleanup → state persistence pipeline.
Tokio Graceful Shutdown Core Concepts
| Concept | Description |
|---|---|
| tokio::signal | Tokio async signal listening module, supports SIGINT/SIGTERM/SIGHUP |
| tokio::sync::broadcast | Broadcast channel for propagating shutdown signal to all workers |
| tokio::sync::watch | Single-value observation channel for notifying shutdown state changes |
| CancellationToken | Cooperative cancellation token from tokio_util |
| tokio::task::JoinSet | Task collection, can wait for all tasks to complete before shutdown |
| Graceful Shutdown | Stop accepting new requests first, wait for existing requests to complete, then exit |
| Connection Draining | After closing listener, wait for active connections to finish processing |
Graceful Shutdown Flow
1. Receive SIGTERM signal
2. Stop accepting new connections (close Listener)
3. Notify all workers to begin graceful shutdown
4. Wait for active requests to complete (Connection Draining)
5. Close database connection pool
6. Persist necessary state
7. Exit process (exit 0)
Problem Analysis: 5 Major Challenges in Tokio Graceful Shutdown
- Signal timing: Signals may arrive at any moment, must ensure no interruption during critical operations
- Task waiting and timeout: Waiting for all tasks may cause infinite wait, need grace period
- Connection draining: HTTP/gRPC connections need to stop accepting new requests first, then wait for existing ones
- Resource release order: DB connections, file handles, temp files must be released in correct order
- State persistence: In-memory state must be persisted to disk or database before shutdown
Step-by-Step: Complete Graceful Shutdown Implementation
Step 1: Basic Signal Handling
use tokio::signal;
use tokio::sync::broadcast;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let (shutdown_tx, _) = broadcast::channel::<()>(1);
tokio::spawn(handle_shutdown_signal(shutdown_tx.clone()));
let mut shutdown_rx = shutdown_tx.subscribe();
tokio::select! {
_ = run_server() => {
println!("Server exited normally");
}
_ = shutdown_rx.recv() => {
println!("Received shutdown signal");
}
}
Ok(())
}
async fn handle_shutdown_signal(shutdown_tx: broadcast::Sender<()>) {
let ctrl_c = async {
signal::ctrl_c()
.await
.expect("Failed to install Ctrl+C handler");
};
#[cfg(unix)]
let terminate = async {
signal::unix::signal(signal::unix::SignalKind::terminate())
.expect("Failed to install SIGTERM handler")
.recv()
.await;
};
#[cfg(not(unix))]
let terminate = std::future::pending::<()>();
tokio::select! {
_ = ctrl_c => {
println!("Received Ctrl+C");
}
_ = terminate => {
println!("Received SIGTERM");
}
}
let _ = shutdown_tx.send(());
}
async fn run_server() {
loop {
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
}
}
Step 2: CancellationToken Pattern
use tokio_util::sync::CancellationToken;
use std::time::Duration;
struct AppServer {
cancellation_token: CancellationToken,
workers: Vec<tokio::task::JoinHandle<()>>,
}
impl AppServer {
fn new() -> Self {
Self {
cancellation_token: CancellationToken::new(),
workers: Vec::new(),
}
}
async fn start(&mut self) {
for id in 0..4 {
let token = self.cancellation_token.child_token();
let handle = tokio::spawn(async move {
worker_loop(id, token).await;
});
self.workers.push(handle);
}
}
async fn graceful_shutdown(self, timeout: Duration) {
self.cancellation_token.cancel();
let deadline = tokio::time::Instant::now() + timeout;
for handle in self.workers {
let remaining = deadline.saturating_duration_since(tokio::time::Instant::now());
match tokio::time::timeout(remaining, handle).await {
Ok(Ok(())) => {}
Ok(Err(e)) => {
eprintln!("Worker panicked: {}", e);
}
Err(_) => {
eprintln!("Worker did not shut down within timeout, aborting");
}
}
}
}
}
async fn worker_loop(id: usize, token: CancellationToken) {
loop {
tokio::select! {
_ = token.cancelled() => {
println!("Worker {} shutting down gracefully", id);
break;
}
_ = tokio::time::sleep(Duration::from_secs(1)) => {
println!("Worker {} processing task", id);
}
}
}
println!("Worker {} cleanup complete", id);
}
Step 3: HTTP Server Connection Draining
use axum::{Router, Server};
use axum::routing::get;
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::sync::watch;
struct GracefulServer {
shutdown_tx: watch::Sender<bool>,
shutdown_rx: watch::Receiver<bool>,
}
impl GracefulServer {
fn new() -> Self {
let (shutdown_tx, shutdown_rx) = watch::channel(false);
Self {
shutdown_tx,
shutdown_rx,
}
}
async fn run(self: Arc<Self>, addr: SocketAddr) {
let app = Router::new()
.route("/health", get(|| async { "ok" }))
.route("/api/data", get(handle_data));
let server = Server::bind(&addr)
.serve(app.into_make_service());
let graceful = server.with_graceful_shutdown(async {
let mut rx = self.shutdown_rx.clone();
while !*rx.borrow_and_update() {
if rx.changed().await.is_err() {
break;
}
}
});
if let Err(e) = graceful.await {
eprintln!("Server error: {}", e);
}
}
fn trigger_shutdown(&self) {
let _ = self.shutdown_tx.send(true);
}
}
async fn handle_data() -> &'static str {
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
"data response"
}
Step 4: Resource Cleanup Manager
use std::future::Future;
use std::time::Duration;
use tokio::sync::Mutex;
struct CleanupTask {
name: String,
cleanup_fn: Box<dyn FnOnce() -> Box<dyn Future<Output = Result<(), String>> + Send> + Send>,
}
pub struct ResourceManager {
cleanup_tasks: Mutex<Vec<CleanupTask>>,
shutdown_timeout: Duration,
}
impl ResourceManager {
pub fn new(shutdown_timeout: Duration) -> Self {
Self {
cleanup_tasks: Mutex::new(Vec::new()),
shutdown_timeout,
}
}
pub async fn register<F, Fut>(&self, name: impl Into<String>, cleanup_fn: F)
where
F: FnOnce() -> Fut + Send + 'static,
Fut: Future<Output = Result<(), String>> + Send + 'static,
{
let task = CleanupTask {
name: name.into(),
cleanup_fn: Box::new(move || Box::new(cleanup_fn()) as _),
};
self.cleanup_tasks.lock().await.push(task);
}
pub async fn cleanup_all(&self) {
let mut tasks = self.cleanup_tasks.lock().await;
let tasks = std::mem::take(&mut *tasks);
for task in tasks.into_iter().rev() {
let result = tokio::time::timeout(
self.shutdown_timeout,
(task.cleanup_fn)(),
).await;
match result {
Ok(Ok(())) => {
println!("✓ Cleaned up: {}", task.name);
}
Ok(Err(e)) => {
eprintln!("✗ Cleanup failed for {}: {}", task.name, e);
}
Err(_) => {
eprintln!("✗ Cleanup timed out for {}", task.name);
}
}
}
}
}
Step 5: Complete Graceful Shutdown Orchestration
use tokio::sync::broadcast;
use tokio_util::sync::CancellationToken;
use std::time::Duration;
pub struct GracefulShutdown {
shutdown_signal: broadcast::Sender<()>,
cancellation_token: CancellationToken,
resource_manager: ResourceManager,
grace_period: Duration,
}
impl GracefulShutdown {
pub fn new(grace_period: Duration) -> Self {
let (shutdown_signal, _) = broadcast::channel(1);
Self {
shutdown_signal,
cancellation_token: CancellationToken::new(),
resource_manager: ResourceManager::new(Duration::from_secs(5)),
grace_period,
}
}
pub fn shutdown_signal(&self) -> broadcast::Sender<()> {
self.shutdown_signal.clone()
}
pub fn cancellation_token(&self) -> CancellationToken {
self.cancellation_token.clone()
}
pub fn resource_manager(&self) -> &ResourceManager {
&self.resource_manager
}
pub async fn run(self) {
let mut rx = self.shutdown_signal.subscribe();
rx.recv().await.ok();
println!("🛑 Shutdown signal received, starting graceful shutdown...");
self.cancellation_token.cancel();
println!("⏳ Waiting up to {:?} for tasks to complete...", self.grace_period);
tokio::time::sleep(self.grace_period).await;
println!("🧹 Cleaning up resources...");
self.resource_manager.cleanup_all().await;
println!("✅ Graceful shutdown complete. Goodbye!");
}
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let shutdown = GracefulShutdown::new(Duration::from_secs(30));
let signal_tx = shutdown.shutdown_signal();
tokio::spawn(async move {
tokio::signal::ctrl_c().await.ok();
let _ = signal_tx.send(());
});
let token = shutdown.cancellation_token();
tokio::spawn(async move {
loop {
tokio::select! {
_ = token.cancelled() => {
println!("Background task shutting down");
break;
}
_ = tokio::time::sleep(Duration::from_secs(1)) => {
println!("Background task working...");
}
}
}
});
shutdown.run().await;
Ok(())
}
Pitfall Guide
Pitfall 1: Using std::sync::mpsc Instead of tokio::sync
// ❌ Wrong: using std channel in async context
let (tx, rx) = std::sync::mpsc::channel();
tokio::spawn(async move {
let msg = rx.recv().unwrap(); // Blocks current thread!
});
// ✅ Correct: use tokio async channel
let (tx, mut rx) = tokio::sync::mpsc::channel(100);
tokio::spawn(async move {
while let Some(msg) = rx.recv().await {
println!("Received: {}", msg);
}
});
Pitfall 2: Running Expensive Operations in Signal Handler
// ❌ Wrong: doing cleanup in signal callback
ctrlc::set_handler(|| {
std::thread::sleep(Duration::from_secs(5)); // Can't block in signal handler!
cleanup_database();
})?;
// ✅ Correct: signal only notifies, cleanup in main loop
let (tx, mut rx) = tokio::sync::mpsc::channel::<()>(1);
let tx_clone = tx.clone();
ctrlc::set_handler(move || {
let _ = tx_clone.try_send(());
})?;
rx.recv().await;
cleanup_database().await;
Pitfall 3: Not Waiting on JoinHandle, Tasks Killed
// ❌ Wrong: not waiting after spawn, main exits and tasks are killed
tokio::spawn(async {
long_running_task().await;
});
// main exits, task is killed
// ✅ Correct: collect JoinHandles and wait for completion
let mut tasks = tokio::task::JoinSet::new();
for i in 0..4 {
tasks.spawn(async move { worker(i).await });
}
while let Some(result) = tasks.join_next().await {
result?;
}
Pitfall 4: CancellationToken Not Propagated to Child Tasks
// ❌ Wrong: child task doesn't receive cancellation signal
let token = CancellationToken::new();
tokio::spawn(async {
loop {
do_work().await; // Never stops
}
});
token.cancel();
// ✅ Correct: pass child_token to child task
let token = CancellationToken::new();
let child_token = token.child_token();
tokio::spawn(async move {
loop {
tokio::select! {
_ = child_token.cancelled() => break,
_ = do_work() => {}
}
}
});
token.cancel();
Pitfall 5: Unreasonable Grace Period
// ❌ Wrong: grace period too short, requests killed before completion
let grace_period = Duration::from_millis(100); // Too short!
// ✅ Correct: set reasonable grace period based on P99 latency
let grace_period = Duration::from_secs(30); // Greater than P99 latency
// K8s terminationGracePeriodSeconds should be greater than grace_period
Error Troubleshooting
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | task 42 was cancelled |
Task cancelled during execution | Check CancellationToken or timeout, confirm if expected |
| 2 | channel closed |
Sender dropped, receiver still waiting | Ensure sender notifies receiver before closing |
| 3 | JoinError::Panic(...) |
Task panicked internally | Check task logic, add error handling |
| 4 | deadline has elapsed |
tokio::time::timeout expired | Increase timeout or optimize processing speed |
| 5 | connection reset by peer |
Client reset connection during server shutdown | Implement connection draining, stop new connections first |
| 6 | Address in use |
Port not released after shutdown | Ensure Listener is properly dropped, set SO_REUSEADDR |
| 7 | broken pipe |
Writing to closed connection | Check connection state before writing |
| 8 | database connection closed |
DB connection pool released during shutdown | Adjust resource release order, close DB connections last |
| 9 | runtime dropped |
Tokio runtime dropped before tasks complete | Ensure runtime lifetime covers all tasks |
| 10 | signal handler already registered |
Signal handler registered multiple times | Use once_cell or lazy_static to register once |
Advanced Optimization
1. Health Check Aware Shutdown
use axum::{Json, extract::State};
use serde_json::{json, Value};
use std::sync::Arc;
use tokio::sync::watch;
#[derive(Clone)]
struct AppState {
shutting_down: watch::Receiver<bool>,
}
async fn health_check(State(state): State<Arc<AppState>>) -> Json<Value> {
if *state.shutting_down.borrow() {
Json(json!({
"status": "draining",
"ready": false
}))
} else {
Json(json!({
"status": "healthy",
"ready": true
}))
}
}
async fn readiness_check(State(state): State<Arc<AppState>>) -> &'static str {
if *state.shutting_down.borrow() {
"Service Unavailable"
} else {
"OK"
}
}
2. Connection Count Draining
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use tokio::sync::Notify;
#[derive(Clone)]
pub struct ConnectionTracker {
active_connections: Arc<AtomicU64>,
drain_notify: Arc<Notify>,
}
impl ConnectionTracker {
pub fn new() -> Self {
Self {
active_connections: Arc::new(AtomicU64::new(0)),
drain_notify: Arc::new(Notify::new()),
}
}
pub fn connection_enter(&self) -> ConnectionGuard {
self.active_connections.fetch_add(1, Ordering::Relaxed);
ConnectionGuard {
tracker: self.clone(),
}
}
pub fn connection_exit(&self) {
let prev = self.active_connections.fetch_sub(1, Ordering::Relaxed);
if prev == 1 {
self.drain_notify.notify_waiters();
}
}
pub async fn drain(&self, timeout: std::time::Duration) {
let deadline = tokio::time::Instant::now() + timeout;
while self.active_connections.load(Ordering::Relaxed) > 0 {
let remaining = deadline.saturating_duration_since(tokio::time::Instant::now());
if remaining.is_zero() {
let remaining = self.active_connections.load(Ordering::Relaxed);
eprintln!("Drain timeout, {} connections still active", remaining);
break;
}
tokio::select! {
_ = self.drain_notify.notified() => {}
_ = tokio::time::sleep(remaining) => break,
}
}
}
}
pub struct ConnectionGuard {
tracker: ConnectionTracker,
}
impl Drop for ConnectionGuard {
fn drop(&mut self) {
self.tracker.connection_exit();
}
}
3. State Persistence Hook
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use tokio::sync::Mutex;
type PersistFn = Box<dyn Fn() -> Pin<Box<dyn Future<Output = Result<(), String>> + Send>> + Send + Sync>;
pub struct StatePersistence {
hooks: Arc<Mutex<Vec<(String, PersistFn)>>>,
}
impl StatePersistence {
pub fn new() -> Self {
Self {
hooks: Arc::new(Mutex::new(Vec::new())),
}
}
pub async fn register<F, Fut>(&self, name: impl Into<String>, persist_fn: F)
where
F: Fn() -> Fut + Send + Sync + 'static,
Fut: Future<Output = Result<(), String>> + Send + 'static,
{
let name = name.into();
let hook: PersistFn = Box::new(move || Box::pin(persist_fn()));
self.hooks.lock().await.push((name, hook));
}
pub async fn persist_all(&self, timeout: std::time::Duration) -> Vec<(String, Result<(), String>)> {
let hooks = self.hooks.lock().await;
let mut results = Vec::new();
for (name, hook) in hooks.iter() {
let result = tokio::time::timeout(timeout, hook()).await;
match result {
Ok(Ok(())) => {
println!("✓ State persisted: {}", name);
results.push((name.clone(), Ok(())));
}
Ok(Err(e)) => {
eprintln!("✗ State persist failed for {}: {}", name, e);
results.push((name.clone(), Err(e)));
}
Err(_) => {
let err = format!("State persist timed out for {}", name);
eprintln!("✗ {}", err);
results.push((name.clone(), Err(err)));
}
}
}
results
}
}
Comparison Analysis
| Dimension | broadcast channel | watch channel | CancellationToken | JoinSet | mpsc channel |
|---|---|---|---|---|---|
| Propagation | One-to-many broadcast | One-to-many observe | Tree cascade | Task collection | One-to-one/many |
| Cancel Semantic | Notification | State | Cooperative | Wait | Message |
| Multi-consumer | ✅ All receive | ✅ All observe | ✅ Child token cascade | ❌ Single waiter | ❌ Competing |
| Backpressure | ❌ Slow consumer drops | ✅ Latest value only | N/A | N/A | ✅ Buffered |
| Resource Overhead | Low | Very low | Low | Medium | Medium |
| Use Case | Global shutdown | State change notify | Task cooperative cancel | Wait for completion | Command-style shutdown |
Summary: The core of Rust Tokio graceful shutdown is "cooperative" — unlike C's signal handler that can asynchronously interrupt, Tokio's shutdown requires each task to actively cooperate. Among the 5 patterns, CancellationToken is the most versatile: it supports tree-cascaded cancellation, child_token auto-cancels with parent, and cancellation is idempotent. Production recommendation: CancellationToken (task cancellation) + watch channel (state notification) + JoinSet (wait for completion) + ConnectionTracker (connection draining) + StatePersistence (state persistence). Remember the K8s rule:
terminationGracePeriodSecondsmust be greater than your grace period + resource cleanup time.
Recommended Online Tools
- JSON Formatter: /en/json/format
- Base64 Encode/Decode: /en/encode/base64
- Hash Calculator: /en/encode/hash
- JWT Decode: /en/encode/jwt-decode
Try these browser-local tools — no sign-up required →