PostgreSQL連線池最佳化實戰:從PgBouncer到Pgpool-II的6種生產模式
FATAL: sorry, too many connections——凌晨3點的報警簡訊
凌晨3點,你的手機被報警簡訊轟炸:FATAL: sorry, too many connections already。50個微服務實例,每個實例20個連線,PostgreSQL的max_connections只有200。連線風暴(Connection Storm)讓資料庫直接拒絕新連線,整個系統雪崩。2026年,PostgreSQL連線池最佳化 是高併發系統的生命線——從PgBouncer的三種池化模式到Pgpool-II的負載均衡,從Odyssey的高效能架構到連線數容量規劃,選對方案能讓單機支撐10萬併發。
本文將從PostgreSQL連線模型出發,帶你完成6種生產級連線池模式,從設定到監控,從避坑到進階最佳化,全鏈路實戰。
PostgreSQL連線池核心概念
| 概念 | 說明 |
|---|---|
| 連線池(Connection Pool) | 重用資料庫連線,避免頻繁建立/銷毀連線的開銷 |
| PgBouncer | 輕量級PostgreSQL連線池,支援Session/Transaction/Statement三種模式 |
| 交易級池化(Transaction Pooling) | 交易結束後釋放連線回池,最大重用率 |
| 連線階段池化(Session Pooling) | 客戶端斷開才釋放連線,相容性最好 |
| 陳述式級池化(Statement Pooling) | 每條SQL後釋放連線,重用率最高但相容性最差 |
| Pgpool-II | PostgreSQL中介軟體,支援連線池、負載均衡、讀寫分離 |
| Odyssey | Yandex開發的高效能PostgreSQL連線池,支援多資料庫多使用者路由 |
| max_connections | PostgreSQL最大連線數,每個連線是一個OS處理程序,約10MB記憶體 |
PostgreSQL處理程序模型與連線開銷
PostgreSQL Process-Per-Connection 模型:
┌─────────────┐ ┌──────────────┐
│ 應用程式 │────→│ Postmaster │ (主處理程序,監聽埠)
└─────────────┘ └──────┬───────┘
│ fork()
┌──────▼───────┐
│ Backend │ (每個連線一個處理程序)
│ Process │ ≈ 10MB 記憶體
│ (PG_CONN_1) │ ≈ 1個OS處理程序
└──────────────┘
┌──────────────┐
│ Backend │
│ Process │ 1000連線 ≈ 10GB記憶體
│ (PG_CONN_2) │ 上下文切換開銷巨大
└──────────────┘
...N個
連線池介入後:
┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ App (1000) │────→│ Pooler (20) │────→│ PostgreSQL │
│ 連線 │ │ 重用連線 │ │ 真實連線 │
└─────────────┘ └──────────────┘ └──────────────┘
問題分析:PostgreSQL連線池最佳化的5大挑戰
- 連線風暴:微服務啟動/重啟時大量實例同時建連,瞬間打滿
max_connections,資料庫拒絕所有新連線 - 連線洩漏:應用未正確釋放連線,閒置連線堆積,池耗盡後新請求阻塞
- 模式選擇困難:PgBouncer三種模式各有取捨,Session模式浪費連線,Transaction模式不相容SET/PREPARE,Statement模式相容性最差
- 容量規劃缺失:沒有公式計算連線數上限,憑經驗設定導致資源浪費或連線不足
- 監控盲區:缺少連線池指標監控,連線耗盡、排隊超時等問題發現太晚
分步實操:6種生產級連線池模式
模式1:PgBouncer Session模式設定
Session模式最接近原生PostgreSQL行為,相容性最好,適合需要連線階段狀態(SET變數、PREPARE陳述式、臨時表)的場景。
; /etc/pgbouncer/pgbouncer.ini — Session模式設定
[databases]
; 生產資料庫連線
myapp_db = host=10.0.1.100 port=5432 dbname=myapp_production
; 唯讀副本
myapp_readonly = host=10.0.1.101 port=5432 dbname=myapp_production
[pgbouncer]
; 監聽地址和埠
listen_addr = 0.0.0.0
listen_port = 6432
; Session模式:客戶端斷開才釋放連線
pool_mode = session
; 池大小:每個資料庫/使用者對的最大服務端連線數
; Session模式下,此值必須 >= 客戶端併發數
default_pool_size = 50
; 最小池大小,預熱連線
min_pool_size = 10
; 預留連線數,供管理員緊急連線
reserve_pool_size = 5
reserve_pool_timeout = 3
; 閒置連線逾時(秒),回收長時間閒置的服務端連線
idle_timeout = 600
; 服務端連線最大存活時間(秒),防止長連線問題
server_lifetime = 3600
; 客戶端閒置逾時(秒)
client_idle_timeout = 0
; 服務端登入重試逾時(秒)
server_login_retry = 3
; 連線等待逾時(秒),池滿時客戶端等待時間
query_timeout = 30
query_wait_timeout = 30
; 認證方式
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; 管理介面
admin_users = pgadmin
stats_users = pgmonitor
; 日誌
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
verbose = 0
# 建立認證檔案
echo '"myapp_user" "SCRAM-SHA-256$4096:...hash..."' > /etc/pgbouncer/userlist.txt
# 啟動PgBouncer
pgbouncer -d /etc/pgbouncer/pgbouncer.ini
# 驗證Session模式
psql -h 127.0.0.1 -p 6432 -U myapp_user myapp_db -c "SHOW pool_mode;"
# Python應用連線PgBouncer(Session模式)
import psycopg2
from contextlib import contextmanager
POOL_CONFIG = {
"host": "10.0.1.50",
"port": 6432,
"database": "myapp_db",
"user": "myapp_user",
"password": "secure_password",
"connect_timeout": 5,
"options": "-c search_path=public"
}
@contextmanager
def get_connection():
conn = psycopg2.connect(**POOL_CONFIG)
try:
yield conn
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()
with get_connection() as conn:
cur = conn.cursor()
cur.execute("SET LOCAL work_mem = '256MB'")
cur.execute("SELECT * FROM orders WHERE user_id = %s", (12345,))
print(cur.fetchall())
模式2:PgBouncer Transaction模式——高併發首選
Transaction模式在交易結束後立即釋放連線回池,是高併發場景的最佳選擇。1000個客戶端可能只需要20個真實資料庫連線。
; /etc/pgbouncer/pgbouncer.ini — Transaction模式設定
[databases]
myapp_db = host=10.0.1.100 port=5432 dbname=myapp_production
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; Transaction模式:交易結束即釋放連線
pool_mode = transaction
; 池大小:Transaction模式下可以遠小於客戶端數
; 經驗公式:pool_size = ((核心數 * 2) + 有效磁碟數)
; 16核 + SSD → 約 40-50 連線
default_pool_size = 40
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
; 交易內閒置逾時(秒),防止交易內長時間不操作
idle_transaction_timeout = 60
; 閒置連線逾時
idle_timeout = 600
; 服務端連線最大存活時間
server_lifetime = 1800
; 服務端檢查間隔(秒),檢測服務端連線是否存活
server_check_query = SELECT 1
server_check_delay = 30
; 關鍵:交易模式下的等待逾時
query_timeout = 30
query_wait_timeout = 60
; 客戶端連線上限
max_client_conn = 2000
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
admin_users = pgadmin
stats_users = pgmonitor
# Transaction模式下的最佳實踐:顯式交易控制
import psycopg2
from psycopg2 import pool
threaded_pool = pool.ThreadedConnectionPool(
minconn=5,
maxconn=20,
host="10.0.1.50",
port=6432,
database="myapp_db",
user="myapp_user",
password="secure_password"
)
def transfer_funds(from_id, to_id, amount):
conn = threaded_pool.getconn()
try:
cur = conn.cursor()
cur.execute("BEGIN")
cur.execute(
"UPDATE accounts SET balance = balance - %s WHERE id = %s",
(amount, from_id)
)
cur.execute(
"UPDATE accounts SET balance = balance + %s WHERE id = %s",
(amount, to_id)
)
cur.execute("COMMIT")
except Exception as e:
cur.execute("ROLLBACK")
raise e
finally:
threaded_pool.putconn(conn)
def get_user_orders(user_id):
conn = threaded_pool.getconn()
try:
cur = conn.cursor()
cur.execute(
"SELECT id, status, total FROM orders WHERE user_id = %s",
(user_id,)
)
return cur.fetchall()
finally:
threaded_pool.putconn(conn)
-- Transaction模式下不能使用的功能清單
-- ❌ SET陳述式(交易結束會被重設)
SET work_mem = '256MB'; -- 交易結束遺失
-- ❌ PREPARE陳述式
PREPARE get_user AS SELECT * FROM users WHERE id = $1; -- 交易結束遺失
-- ❌ 臨時表
CREATE TEMP TABLE tmp_cache AS SELECT * FROM products; -- 交易結束遺失
-- ❌ 通知/監聽
LISTEN channel_name; -- 無法跨交易接收通知
-- ✅ 正確做法:在交易內使用SET LOCAL
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT * FROM large_table ORDER BY data;
COMMIT; -- work_mem只在此交易內生效
模式3:Pgpool-II——負載均衡與讀寫分離
Pgpool-II不僅是連線池,更是PostgreSQL中介軟體,支援讀寫分離、負載均衡、自動故障切換。
; /etc/pgpool-II/pgpool.conf — Pgpool-II設定
; 監聽設定
listen_addresses = '*'
port = 9999
unix_socket_directories = '/var/run/pgpool'
; 後端節點設定
backend_hostname0 = '10.0.1.100'
backend_port0 = 5432
backend_weight0 = 1
backend_flag0 = 'ALLOW_TO_FAILBACK'
backend_application_name0 = 'primary'
backend_hostname1 = '10.0.1.101'
backend_port1 = 5432
backend_weight1 = 1
backend_flag1 = 'ALLOW_TO_FAILBACK'
backend_application_name1 = 'replica1'
backend_hostname2 = '10.0.1.102'
backend_port2 = 5432
backend_weight2 = 1
backend_flag2 = 'ALLOW_TO_FAILBACK'
backend_application_name2 = 'replica2'
; 連線池設定
connection_cache = on
num_init_children = 50
max_pool = 4
child_life_time = 300
connection_life_time = 0
child_max_connections = 0
; 負載均衡
load_balance_mode = on
; SELECT查詢自動分發到副本
; 寫操作自動路由到主節點
; 讀寫分離
; 預設:主節點處理寫,副本處理讀
; 可透過hint精確控制
; 健康檢查
health_check_period = 10
health_check_timeout = 5
health_check_user = 'pgpool_health'
health_check_password = 'health_check_pass'
health_check_max_retries = 3
; 故障切換
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %H %P %m %M'
failback_command = '/etc/pgpool-II/failback.sh %d %h %p %D %H %P'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %H %P'
; 自動故障回切
auto_failback = off
failover_on_backend_error = off
; 日誌
log_connections = on
log_disconnections = on
log_statement = off
log_per_node_statement = off
# Pgpool-II讀寫分離驗證
# 寫操作 → 主節點
psql -h 127.0.0.1 -p 9999 -U myapp_user myapp_production -c \
"INSERT INTO orders (user_id, status) VALUES (1, 'pending');"
# 讀操作 → 自動負載均衡到副本
psql -h 127.0.0.1 -p 9999 -U myapp_user myapp_production -c \
"SELECT * FROM orders WHERE user_id = 1;"
# 使用hint強制路由到主節點
psql -h 127.0.0.1 -p 9999 -U myapp_user myapp_production -c \
"/*NO_LOAD_BALANCE*/ SELECT * FROM orders WHERE user_id = 1;"
# 檢視節點狀態
psql -h 127.0.0.1 -p 9999 -U pgpool_admin -c "SHOW pool_nodes;"
# Python應用使用Pgpool-II
import psycopg2
def get_pgpool_connection(read_only=False):
conn = psycopg2.connect(
host="10.0.1.50",
port=9999,
database="myapp_production",
user="myapp_user",
password="secure_password"
)
if read_only:
conn.set_session(readonly=True, autocommit=True)
return conn
# 寫操作
write_conn = get_pgpool_connection(read_only=False)
cur = write_conn.cursor()
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
write_conn.commit()
write_conn.close()
# 讀操作(自動負載均衡)
read_conn = get_pgpool_connection(read_only=True)
cur = read_conn.cursor()
cur.execute("SELECT * FROM accounts WHERE id = 1")
print(cur.fetchall())
read_conn.close()
模式4:Odyssey——高效能連線池
Odyssey由Yandex開發,專為高吞吐場景設計,支援多資料庫多使用者路由,比PgBouncer效能更高。
; /etc/odyssey/odyssey.conf — Odyssey設定
storage myapp_production {
type remote
host 10.0.1.100
port 5432
}
storage myapp_replica {
type remote
host 10.0.1.101
port 5432
}
database default {
storage default
pool default {
pool_mode session
pool_size 0
}
}
database myapp_db {
storage myapp_production
pool myapp_pool {
pool_mode transaction
pool_size 50
pool_timeout 5000
; 閒置逾時(毫秒)
idle_timeout 600000
; 服務端連線生命週期
server_lifetime 1800000
; 自動預備陳述式支援
discard_idle 600000
}
}
database myapp_readonly {
storage myapp_replica
pool readonly_pool {
pool_mode transaction
pool_size 30
pool_timeout 5000
}
}
; 使用者路由規則
user myapp_user {
authentication scramsha256
password "SCRAM-SHA-256$4096:...hash..."
; 預設路由到主庫
database default {
pool myapp_pool
}
; 唯讀路由到副本
database myapp_readonly {
pool readonly_pool
}
}
; 監聽設定
listen {
host 0.0.0.0
port 6432
}
; 日誌
log_format "%p %t %l [%c %s] %m\n"
log_debug 0
log_config 0
; 統計
stats_interval 60
# 啟動Odyssey
odyssey /etc/odyssey/odyssey.conf
# 連線測試
psql -h 127.0.0.1 -p 6432 -U myapp_user myapp_db -c "SELECT version();"
# 唯讀連線
psql -h 127.0.0.1 -p 6432 -U myapp_user myapp_readonly -c "SELECT pg_is_in_recovery();"
# Python應用使用Odyssey
import psycopg2
from psycopg2 import pool
class OdysseyConnectionManager:
def __init__(self):
self.write_pool = pool.ThreadedConnectionPool(
minconn=2, maxconn=10,
host="10.0.1.50", port=6432,
database="myapp_db",
user="myapp_user", password="secure_password"
)
self.read_pool = pool.ThreadedConnectionPool(
minconn=2, maxconn=20,
host="10.0.1.50", port=6432,
database="myapp_readonly",
user="myapp_user", password="secure_password"
)
def get_write_conn(self):
return self.write_pool.getconn()
def get_read_conn(self):
return self.read_pool.getconn()
def put_write_conn(self, conn):
self.write_pool.putconn(conn)
def put_read_conn(self, conn):
self.read_pool.putconn(conn)
manager = OdysseyConnectionManager()
# 寫操作
conn = manager.get_write_conn()
try:
cur = conn.cursor()
cur.execute("INSERT INTO events (type, data) VALUES (%s, %s)", ("click", "{}"))
conn.commit()
finally:
manager.put_write_conn(conn)
# 讀操作
conn = manager.get_read_conn()
try:
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM events")
print(cur.fetchone())
finally:
manager.put_read_conn(conn)
模式5:連線數容量規劃公式
容量規劃是連線池最佳化的基礎,沒有公式就是盲人摸象。
連線數容量規劃公式:
1. PostgreSQL最大連線數規劃:
max_connections = (CPU核心數 × 2) + 有效磁碟數
範例:16核CPU + SSD(等效4個磁碟)
max_connections = (16 × 2) + 4 = 36
生產建議:max_connections = 100(留餘量),透過連線池控制真實連線
2. PgBouncer連線池大小:
pool_size = CPU核心數 × 2 + 有效磁碟數
範例:16核 + SSD
pool_size = 16 × 2 + 4 = 36 → 取40
3. 應用側連線池大小(每個實例):
app_pool_size = pool_size / 實例數
範例:40個總連線 / 10個實例 = 4個/實例
4. 客戶端最大連線數:
max_client_conn = 實例數 × app_pool_size × 1.5
範例:10 × 4 × 1.5 = 60
5. 記憶體估算:
PostgreSQL總記憶體 ≈ max_connections × 10MB + shared_buffers
範例:100 × 10MB + 4GB = 5GB
-- 檢視目前連線數分佈
SELECT
datname AS database,
usename AS username,
state,
COUNT(*) AS connection_count,
AVG(EXTRACT(EPOCH FROM (now() - query_start))) AS avg_query_duration_sec
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname, usename, state
ORDER BY connection_count DESC;
-- 檢視連線數歷史趨勢(需要pg_stat_statements)
SELECT
EXTRACT(HOUR FROM query_start) AS hour,
COUNT(*) AS query_count
FROM pg_stat_activity
WHERE query_start > now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;
-- 檢查連線設定
SHOW max_connections;
SHOW superuser_reserved_connections;
-- 計算活躍連線佔比
SELECT
state,
COUNT(*) AS count,
ROUND(COUNT(*)::numeric / (SELECT COUNT(*) FROM pg_stat_activity WHERE datname IS NOT NULL) * 100, 2) AS percentage
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY state
ORDER BY count DESC;
# 連線數容量規劃工具
def calculate_pool_config(cpu_cores, disk_type, instance_count, workload="mixed"):
effective_spindles = {
"hdd_1": 1, "hdd_4": 4, "hdd_8": 8,
"ssd": 4, "nvme": 8
}.get(disk_type, 4)
pg_max_connections = (cpu_cores * 2) + effective_spindles
pg_max_connections = max(pg_max_connections, 20)
pool_size = (cpu_cores * 2) + effective_spindles
if workload == "read_heavy":
pool_size = int(pool_size * 1.2)
elif workload == "write_heavy":
pool_size = int(pool_size * 0.8)
app_pool_per_instance = max(pool_size // instance_count, 2)
max_client_conn = int(instance_count * app_pool_per_instance * 1.5)
shared_buffers_gb = max(cpu_cores * 0.5, 2)
total_memory_gb = pg_max_connections * 0.01 + shared_buffers_gb
return {
"cpu_cores": cpu_cores,
"disk_type": disk_type,
"instance_count": instance_count,
"workload": workload,
"pg_max_connections": min(pg_max_connections + 20, 200),
"pgbouncer_pool_size": pool_size,
"pgbouncer_min_pool_size": max(pool_size // 4, 5),
"app_pool_per_instance": app_pool_per_instance,
"max_client_conn": max_client_conn,
"shared_buffers_gb": shared_buffers_gb,
"estimated_total_memory_gb": round(total_memory_gb, 1),
"pool_mode": "transaction" if workload in ("read_heavy", "mixed") else "session"
}
config = calculate_pool_config(
cpu_cores=16,
disk_type="ssd",
instance_count=10,
workload="mixed"
)
for key, value in config.items():
print(f"{key}: {value}")
模式6:生產監控與告警設定
沒有監控的連線池就是定時炸彈,完善的監控和告警是生產部署的底線。
# prometheus-pgbouncer-exporter 設定
# /etc/prometheus/pgbouncer_exporter.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'pgbouncer'
static_configs:
- targets: ['10.0.1.50:9127']
metrics_path: /metrics
params:
pgbouncer_addr: ['10.0.1.50:6432']
# Prometheus告警規則
# /etc/prometheus/rules/pgbouncer_alerts.yml
groups:
- name: pgbouncer_alerts
rules:
- alert: PgbouncerPoolExhausted
expr: pgbouncer_pools_server_active >= pgbouncer_pools_server_max
for: 2m
labels:
severity: critical
annotations:
summary: "PgBouncer連線池耗盡"
description: "資料庫 {{ $labels.database }} 的連線池已滿,活躍連線數達到上限"
- alert: PgbouncerClientWaitQueue
expr: pgbouncer_pools_client_active > pgbouncer_pools_server_active * 2
for: 5m
labels:
severity: warning
annotations:
summary: "PgBouncer客戶端等待佇列過長"
description: "資料庫 {{ $labels.database }} 客戶端等待數是服務端的2倍以上"
- alert: PgbouncerHighWaitTime
expr: pgbouncer_pools_client_wait_time_seconds > 5
for: 1m
labels:
severity: warning
annotations:
summary: "PgBouncer等待時間過長"
description: "資料庫 {{ $labels.database }} 等待時間超過5秒"
- alert: PostgreSQLTooManyConnections
expr: pg_stat_activity_count > 80
for: 5m
labels:
severity: warning
annotations:
summary: "PostgreSQL連線數過高"
description: "目前活躍連線數 {{ $value }},接近 max_connections 上限"
-- PgBouncer管理命令
-- 透過psql連線到PgBouncer管理資料庫
-- psql -h 127.0.0.1 -p 6432 -U pgadmin pgbouncer
-- 檢視連線池狀態
SHOW POOLS;
-- 檢視客戶端列表
SHOW CLIENTS;
-- 檢視服務端列表
SHOW SERVERS;
-- 檢視資料庫列表
SHOW DATABASES;
-- 檢視統計資訊
SHOW STATS;
-- 檢視活躍查詢
SHOW ACTIVE_SOCKETS;
-- 強制釋放閒置連線
PAUSE myapp_db;
RESUME myapp_db;
-- 關閉指定資料庫的所有連線
KILL myapp_db;
#!/bin/bash
# /etc/pgbouncer/health_check.sh — PgBouncer健康檢查腳本
PGBOUNCER_HOST="127.0.0.1"
PGBOUNCER_PORT="6432"
PGBOUNCER_ADMIN_USER="pgadmin"
ALERT_WEBHOOK="https://hooks.slack.com/services/xxx"
check_pool_status() {
local result=$(psql -h $PGBOUNCER_HOST -p $PGBOUNCER_PORT \
-U $PGBOUNCER_ADMIN_USER pgbouncer -c "SHOW POOLS;" -t 2>/dev/null)
if [ -z "$result" ]; then
send_alert "PgBouncer無法連線,可能已當機"
return 1
fi
echo "$result" | while read line; do
active=$(echo "$line" | awk '{print $5}')
max_conn=$(echo "$line" | awk '{print $7}')
if [ -n "$active" ] && [ -n "$max_conn" ] && [ "$active" -ge "$max_conn" ]; then
send_alert "PgBouncer連線池耗盡: active=$active, max=$max_conn"
fi
done
}
check_postgresql_connections() {
local conn_count=$(psql -h 10.0.1.100 -p 5432 -U monitor -d myapp_production \
-t -c "SELECT COUNT(*) FROM pg_stat_activity WHERE datname = 'myapp_production';" 2>/dev/null)
conn_count=$(echo $conn_count | tr -d ' ')
if [ "$conn_count" -gt 80 ]; then
send_alert "PostgreSQL連線數過高: $conn_count / 100"
fi
}
send_alert() {
local message="$1"
curl -s -X POST "$ALERT_WEBHOOK" \
-H "Content-Type: application/json" \
-d "{\"text\": \"[ALERT] $message\"}" > /dev/null
echo "$(date): $message" >> /var/log/pgbouncer_health.log
}
check_pool_status
check_postgresql_connections
避坑指南:5個常見陷阱
陷阱1:Transaction模式下使用SET陳述式
-- ❌ 錯誤:SET在交易結束後遺失
SET work_mem = '256MB';
SELECT * FROM large_table ORDER BY data;
-- 下一個查詢 work_mem 恢復預設值
-- ✅ 正確:使用SET LOCAL在交易內生效
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT * FROM large_table ORDER BY data;
COMMIT;
陷阱2:應用側連線池與PgBouncer雙重池化
# ❌ 錯誤:應用側HikariCP大連線池 + PgBouncer
# HikariCP: maximumPoolSize=50
# PgBouncer: default_pool_size=40
# 結果:50個客戶端連線爭搶40個服務端連線,大量等待
# ✅ 正確:應用側小連線池 + PgBouncer
# HikariCP: maximumPoolSize=5(每實例)
# PgBouncer: default_pool_size=40
# 10個實例 × 5連線 = 50客戶端連線,40個服務端連線足夠
# HikariCP設定(配合PgBouncer Transaction模式)
spring.datasource.hikari.maximum-pool-size=5
spring.datasource.hikari.minimum-idle=2
spring.datasource.hikari.connection-timeout=3000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.connection-test-query=SELECT 1
陷阱3:PgBouncer server_lifetime過長
; ❌ 錯誤:server_lifetime過長,服務端連線可能關聯過期連線階段狀態
server_lifetime = 86400
; ✅ 正確:合理設定server_lifetime,定期重新整理連線
server_lifetime = 1800
; 搭配server_check_query定期檢查連線健康
server_check_query = SELECT 1
server_check_delay = 30
陷阱4:忽略superuser_reserved_connections
-- ❌ 錯誤:max_connections=100,沒有預留超級使用者連線
-- 當連線耗盡時,連管理員都無法登入
-- ✅ 正確:預留超級使用者連線
ALTER SYSTEM SET superuser_reserved_connections = 3;
SELECT pg_reload_conf();
-- 緊急情況下使用超級使用者連線
psql -h 10.0.1.100 -U postgres -d myapp_production -c \
"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'idle_app' AND state = 'idle';"
陷阱5:Pgpool-II健康檢查使用者權限不足
-- ❌ 錯誤:使用普通使用者做健康檢查
-- CREATE USER health_check WITH PASSWORD 'pass';
-- ✅ 正確:賦予健康檢查使用者足夠權限
CREATE USER pgpool_health WITH PASSWORD 'health_check_pass' LOGIN;
GRANT pg_monitor TO pgpool_health;
-- 或最小權限
GRANT CONNECT ON DATABASE myapp_production TO pgpool_health;
錯誤排查表
| 錯誤訊息 | 原因 | 解決方案 |
|---|---|---|
FATAL: sorry, too many connections already |
連線數超過max_connections |
增加連線池、降低default_pool_size、檢查連線洩漏 |
FATAL: remaining connection slots are reserved for non-replication superuser connections |
普通使用者連線數達到上限 | 增大max_connections或使用超級使用者清理閒置連線 |
FATAL: no pg_hba.conf entry for host |
客戶端IP不在白名單 | 在pg_hba.conf中新增客戶端IP或連線池伺服器IP |
FATAL: password authentication failed for user |
認證資訊錯誤 | 檢查PgBouncer的userlist.txt或auth_query設定 |
ERROR: connection not open |
PgBouncer無法連線到PostgreSQL | 檢查後端資料庫連線設定、網路、PostgreSQL是否運行 |
ERROR: query_wait_timeout |
客戶端等待連線逾時 | 增大query_wait_timeout或default_pool_size |
ERROR: idle_transaction_timeout |
交易內閒置時間過長 | 檢查應用是否有長交易未提交,增大idle_transaction_timeout |
ERROR: server_login_retry failed |
PgBouncer重連PostgreSQL失敗 | 檢查PostgreSQL是否可達、pg_hba.conf是否允許連線 |
ERROR: prepared statement "s1" does not exist |
Transaction模式下PREPARE陳述式遺失 | 改用Session模式或使用prepared_statement_cache_size=0 |
ERROR: cannot drop temporary table because it is being used |
Transaction模式下臨時表跨交易使用 | 改用Session模式或使用永久表替代臨時表 |
進階最佳化
1. PgBouncer auth_query動態認證
; /etc/pgbouncer/pgbouncer.ini — 動態認證設定
[databases]
myapp_db = host=10.0.1.100 port=5432 dbname=myapp_production auth_query="SELECT usename, passwd FROM pg_shadow WHERE usename=$1"
[pgbouncer]
auth_type = md5
; 不需要維護userlist.txt,直接查詢PostgreSQL取得密碼
; 適合使用者頻繁變更的場景
-- 建立唯讀使用者供auth_query使用
CREATE USER pgbouncer_auth WITH PASSWORD 'auth_query_pass' LOGIN;
GRANT pg_monitor TO pgbouncer_auth;
-- 設定pg_hba.conf允許pgbouncer_auth連線
-- host myapp_production pgbouncer_auth 10.0.1.50/32 md5
2. 多級連線池架構
多級連線池架構設計:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ App實例1 │ │ App實例2 │ │ App實例3 │
│ HikariCP(5) │ │ HikariCP(5) │ │ HikariCP(5) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────┬────────┴────────┬────────┘
│ │
┌────────▼────────┐ ┌─────▼──────────┐
│ PgBouncer │ │ PgBouncer │
│ (寫-Session) │ │ (讀-Transaction)│
│ pool_size=20 │ │ pool_size=40 │
│ :6432 │ │ :6433 │
└────────┬────────┘ └──────┬──────────┘
│ │
┌────────▼────────┐ ┌─────▼──────────┐
│ PostgreSQL │ │ PostgreSQL │
│ Primary │ │ Replica ×2 │
│ max_conn=100 │ │ max_conn=100 │
└─────────────────┘ └─────────────────┘
關鍵設計:
- 寫操作:Session模式(相容PREPARE/SET)
- 讀操作:Transaction模式(最大重用率)
- 應用側小池 + PgBouncer大池
- 讀寫分離在連線池層完成
3. 連線池自動擴縮容
# 基於負載的連線池自動擴縮容
import psycopg2
import time
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pool_autoscaler")
class PoolAutoScaler:
def __init__(self, pgbouncer_host, pgbouncer_port, admin_user, admin_db="pgbouncer"):
self.conn_params = {
"host": pgbouncer_host,
"port": pgbouncer_port,
"user": admin_user,
"database": admin_db
}
self.min_pool = 10
self.max_pool = 100
self.scale_up_threshold = 0.8
self.scale_down_threshold = 0.3
def get_pool_stats(self):
conn = psycopg2.connect(**self.conn_params)
cur = conn.cursor()
cur.execute("SHOW POOLS;")
pools = cur.fetchall()
cur.execute("SHOW DATABASES;")
databases = cur.fetchall()
conn.close()
return pools, databases
def calculate_utilization(self, active, total):
if total == 0:
return 0
return active / total
def scale_pool(self, database, new_size):
new_size = max(self.min_pool, min(self.max_pool, new_size))
conn = psycopg2.connect(**self.conn_params)
cur = conn.cursor()
cur.execute(f"SET default_pool_size = {new_size} FOR {database};")
conn.close()
logger.info(f"Pool scaled: {database} → {new_size}")
def run(self):
pools, databases = self.get_pool_stats()
for pool in pools:
database = pool[0]
active = pool[4] if len(pool) > 4 else 0
total = pool[6] if len(pool) > 6 else 1
utilization = self.calculate_utilization(active, total)
if utilization > self.scale_up_threshold:
new_size = int(total * 1.5)
self.scale_pool(database, new_size)
elif utilization < self.scale_down_threshold:
new_size = int(total * 0.7)
self.scale_pool(database, new_size)
if __name__ == "__main__":
scaler = PoolAutoScaler("10.0.1.50", 6432, "pgadmin")
while True:
try:
scaler.run()
except Exception as e:
logger.error(f"Auto-scaler error: {e}")
time.sleep(60)
連線池方案對比
| 特性 | PgBouncer | Pgpool-II | Odyssey | PostgreSQL內建 | SQLAlchemy Pool |
|---|---|---|---|---|---|
| 池化模式 | Session/Transaction/Statement | Session/Transaction | Session/Transaction | 無 | Session |
| 讀寫分離 | ❌ 需外部設定 | ✅ 內建 | ✅ 設定路由 | ❌ | ❌ |
| 負載均衡 | ❌ | ✅ 內建 | ✅ 設定路由 | ❌ | ❌ |
| 故障切換 | ❌ | ✅ 內建 | ❌ | ❌ | ❌ |
| 效能 | 極高(C語言) | 高(C語言) | 極高(C語言) | 無 | 中(Python層) |
| 多資料庫路由 | ✅ | ✅ | ✅ | ❌ | ❌ |
| 記憶體佔用 | 極低(~2MB) | 中等 | 低 | N/A | 低 |
| 部署複雜度 | 低 | 高 | 中 | 無 | 無 |
| 適用場景 | 大多數場景 | 需要讀寫分離+負載均衡 | 超高併發+多租戶 | 開發/測試 | 單應用Python專案 |
| PREPARE支援 | Session模式 | ✅ | ✅ | ✅ | ✅ |
| 推薦度 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
總結
PostgreSQL連線池最佳化核心要點:1) 理解Process-Per-Connection模型的開銷,每個連線≈10MB記憶體;2) 高併發場景首選PgBouncer Transaction模式,1000客戶端只需20-40個真實連線;3) 需要讀寫分離+負載均衡選Pgpool-II;4) 超高併發+多租戶選Odyssey;5) 連線數公式:
pool_size = CPU核心數 × 2 + 有效磁碟數;6) 應用側小池+PgBouncer大池,避免雙重池化;7) 生產必須有監控告警,連線池耗盡是P0級故障;8) 預留superuser_reserved_connections,防止連線耗盡後無法登入。
推薦工具
本站提供瀏覽器本地工具,免註冊即可試用 →