PostgreSQL Performance Tuning

Query Planning, Indexing, Caching, and Configuration

PostgreSQL is renowned for its robustness and feature set, but achieving optimal performance requires understanding its internals and applying targeted optimization strategies. This comprehensive guide explores the critical aspects of PostgreSQL performance tuning, from query planning and indexing to caching mechanisms and configuration parameters.

1. Understanding Query Planning with EXPLAIN

The query planner is PostgreSQL's brain for determining how to execute queries. Understanding EXPLAIN output is fundamental to performance optimization.

Basic EXPLAIN Usage

-- Basic execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;

-- Include actual execution statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1001;

-- Detailed output with buffers and timing
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT o.order_id, c.customer_name, SUM(oi.quantity * oi.price)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY o.order_id, c.customer_name;

Reading EXPLAIN Output

Key metrics to monitor include:

- cost=X..Y: the planner's estimated startup and total cost, in arbitrary cost units
- rows: the planner's estimated row count (with ANALYZE, the actual row count appears alongside it)
- actual time: measured startup and total time per node, in milliseconds (ANALYZE only)
- loops: how many times the node was executed
- Buffers: shared blocks hit in cache versus read from disk (with the BUFFERS option)

Pro Tip: Compare estimated rows with actual rows. Large discrepancies indicate outdated statistics. Run ANALYZE table_name; to update them.
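
For example, a plan fragment like the following (illustrative numbers, not from a real system) shows the pattern to watch for: the planner estimated 12 rows but the scan actually returned over 48,000, a strong sign of stale statistics.

Seq Scan on orders  (cost=0.00..1480.00 rows=12 width=64) (actual time=0.04..35.2 rows=48210 loops=1)

-- Refresh the statistics, then re-run EXPLAIN ANALYZE
ANALYZE orders;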

Common Query Plan Nodes

-- Sequential Scan (reads the entire table)
Seq Scan on orders  (cost=0.00..1234.56 rows=50000)

-- Index Scan (uses an index to find rows)
Index Scan using idx_customer on orders  (cost=0.42..8.44 rows=1)

-- Bitmap Index Scan (for multiple matching rows)
Bitmap Heap Scan on orders  (cost=12.50..567.89 rows=500)
  ->  Bitmap Index Scan on idx_order_date

-- Hash Join (for joining tables)
Hash Join  (cost=100.00..1500.00 rows=1000)
  Hash Cond: (o.customer_id = c.customer_id)

2. Indexing Strategies

Proper indexing can transform slow queries into lightning-fast operations. However, over-indexing can harm write performance and consume storage.

B-tree Indexes (Default)

B-tree indexes are ideal for equality and range queries, and they're the default index type in PostgreSQL.

-- Basic B-tree index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- This query uses the index efficiently:
SELECT * FROM orders
WHERE customer_id = 1001 AND order_date >= '2024-01-01';

-- This query can still use the index via its leading column:
SELECT * FROM orders WHERE customer_id = 1001;

-- This query CANNOT use the index efficiently
-- (order_date is not the leading column):
SELECT * FROM orders WHERE order_date >= '2024-01-01';

Partial Indexes

Partial indexes cover only a subset of rows, reducing index size and improving performance for specific queries.

-- Index only active orders
CREATE INDEX idx_active_orders ON orders(order_date)
    WHERE status = 'active';

-- Index only recent large orders
CREATE INDEX idx_recent_large_orders ON orders(customer_id, order_date)
    WHERE order_date >= '2024-01-01' AND total_amount > 1000;

-- Using the partial index
SELECT * FROM orders
WHERE status = 'active' AND order_date >= '2024-06-01';

Expression Indexes

-- Index on computed values
CREATE INDEX idx_lower_email ON customers(LOWER(email));

-- Now this query can use the index:
SELECT * FROM customers WHERE LOWER(email) = '[email protected]';

-- Index on an extracted date part
CREATE INDEX idx_order_month ON orders(EXTRACT(MONTH FROM order_date));
SELECT * FROM orders WHERE EXTRACT(MONTH FROM order_date) = 12;

GIN Indexes (for Full-Text Search and Arrays)

-- Full-text search index
CREATE INDEX idx_product_search ON products
    USING GIN(to_tsvector('english', name || ' ' || description));

-- Search using the index
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
      @@ to_tsquery('english', 'laptop & wireless');

-- Array index
CREATE INDEX idx_tags ON articles USING GIN(tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];

BRIN Indexes (for Very Large Tables)

-- BRIN index for time-series data
CREATE INDEX idx_logs_timestamp ON system_logs
    USING BRIN(timestamp) WITH (pages_per_range = 128);

-- Much smaller than a B-tree but still effective for range queries
SELECT * FROM system_logs
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01';
Warning: Every index slows down INSERT, UPDATE, and DELETE operations. Only create indexes that demonstrably improve query performance. Use pg_stat_user_indexes to monitor index usage.
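
As a sketch of that monitoring, the following query lists indexes by how rarely they are scanned, a reasonable first pass at spotting removal candidates (the thresholds are a judgment call, and idx_scan resets whenever the statistics counters do):

-- Rarely-used indexes, largest first
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
LIMIT 20;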

3. Caching and Memory Management

Shared Buffers

PostgreSQL's shared buffers cache frequently accessed data pages in memory, reducing disk I/O.

-- In postgresql.conf
# Rule of thumb: 25% of system RAM (up to 8-16GB)
shared_buffers = 4GB

# For systems with 32GB RAM
shared_buffers = 8GB

-- Check current buffer usage (requires: CREATE EXTENSION pg_buffercache)
SELECT count(*) AS total_buffers,
       count(*) FILTER (WHERE isdirty) AS dirty_buffers,
       pg_size_pretty(count(*) * 8192) AS total_size
FROM pg_buffercache;

Effective Cache Size

-- In postgresql.conf
# Tell the planner how much memory is available for caching
# (set to ~50-75% of total RAM)
effective_cache_size = 16GB

# This doesn't allocate memory; it just helps the planner
# make better decisions about index usage

Work Memory

-- In postgresql.conf
# Memory for each sort and hash operation
# (a single query can use several times this amount)
# Start conservative: 4MB
work_mem = 4MB

-- For specific queries, increase temporarily
SET work_mem = '256MB';
SELECT * FROM large_table ORDER BY some_column;
RESET work_mem;

-- Monitor temp-file usage from sorts that spilled to disk
SELECT datname, temp_files, temp_bytes,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database;

Maintenance Work Memory

-- In postgresql.conf
# Memory for VACUUM, CREATE INDEX, and similar maintenance tasks.
# Can be set well above work_mem since few of these run concurrently.
maintenance_work_mem = 1GB

-- For building very large indexes
SET maintenance_work_mem = '4GB';
CREATE INDEX idx_large_table ON large_table(column);
RESET maintenance_work_mem;

Query Result Caching

-- Application-level caching with prepared statements
-- (these cache the query plan per session, not the results)
PREPARE get_customer_orders (int) AS
    SELECT * FROM orders WHERE customer_id = $1;
EXECUTE get_customer_orders(1001);

-- Materialized views store the results of expensive queries
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT DATE_TRUNC('month', order_date) AS month,
       customer_id,
       COUNT(*) AS order_count,
       SUM(total_amount) AS total_sales
FROM orders
GROUP BY 1, 2;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX idx_monthly_sales
    ON monthly_sales_summary(month, customer_id);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;

4. Configuration Tuning

Checkpoint Configuration

-- In postgresql.conf
# Maximum time between checkpoints
checkpoint_timeout = 15min

# Maximum size of WAL before a checkpoint is forced
max_wal_size = 4GB
min_wal_size = 1GB

# Spread checkpoint I/O over this fraction of the checkpoint interval
checkpoint_completion_target = 0.9

-- Monitor checkpoint activity
SELECT * FROM pg_stat_bgwriter;

Connection and Resource Limits

-- In postgresql.conf
# Maximum concurrent connections
max_connections = 100
# Use connection pooling (PgBouncer) for web apps instead of raising this

# Parallel query workers
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 8

-- Lower the planner's parallelism costs to favor parallel plans
SET parallel_setup_cost = 100;
SET parallel_tuple_cost = 0.01;
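
To confirm a query actually runs in parallel, inspect its plan; a quick check against the orders table from earlier examples (whether the planner chooses a parallel plan depends on table size and the settings above):

EXPLAIN (ANALYZE)
SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
-- Parallel plans contain a "Gather" (or "Gather Merge") node
-- and report "Workers Launched: N" in the ANALYZE output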

Autovacuum Tuning

-- In postgresql.conf
# Autovacuum is critical for performance
autovacuum = on

# More aggressive autovacuum for high-write workloads
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05

# Maximum number of autovacuum workers
autovacuum_max_workers = 3

-- Per-table autovacuum settings
ALTER TABLE high_traffic_orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- Monitor dead-tuple bloat
SELECT schemaname, relname,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size,
       n_dead_tup, n_live_tup,
       round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

Logging Configuration

-- In postgresql.conf
# Log queries running longer than 1 second
log_min_duration_statement = 1000

# Log autovacuum actions
log_autovacuum_min_duration = 0

# Log checkpoints
log_checkpoints = on

# Detailed query statistics (requires a server restart)
shared_preload_libraries = 'pg_stat_statements'

-- After the restart, create the extension
CREATE EXTENSION pg_stat_statements;

-- Find the slowest queries
SELECT query, calls, total_exec_time, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

5. Additional Optimization Techniques

Partitioning Large Tables

-- Range partitioning by date
CREATE TABLE orders (
    order_id     SERIAL,
    order_date   DATE NOT NULL,
    customer_id  INT,
    total_amount DECIMAL
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Queries automatically use partition pruning
EXPLAIN SELECT * FROM orders
WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01';
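
A practical payoff of this layout, sketched with the partition names from the example above: data retention becomes a fast metadata operation instead of a slow, bloat-generating bulk DELETE.

-- Drop a whole quarter of old data
ALTER TABLE orders DETACH PARTITION orders_2024_q1;
DROP TABLE orders_2024_q1;

-- Create the next quarter ahead of time
CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');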

Query Optimization Tips

-- Use EXISTS instead of IN for large subqueries
-- Slower on large subquery results:
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Often faster:
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

-- Use LIMIT for pagination
SELECT * FROM orders ORDER BY order_date DESC LIMIT 20 OFFSET 100;

-- Avoid SELECT * - specify only the columns you need
SELECT order_id, customer_id, total_amount
FROM orders WHERE status = 'pending';

-- Use covering indexes (PostgreSQL 11+) to enable index-only scans
CREATE INDEX idx_orders_covering ON orders(customer_id)
    INCLUDE (order_date, total_amount);

Statistics and Monitoring

-- Update table statistics
ANALYZE orders;

-- For specific columns
ANALYZE orders (customer_id, order_date);

-- Increase statistics target for high-cardinality columns
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Monitor cache hit ratio (should be > 90%)
SELECT sum(heap_blks_read) AS heap_read,
       sum(heap_blks_hit) AS heap_hit,
       sum(heap_blks_hit)::float /
           NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;

-- Find tables that may be missing indexes
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / seq_scan AS avg_seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
Performance Monitoring Checklist: Regularly monitor: cache hit ratios, slow query logs, index usage statistics, table bloat, checkpoint frequency, and connection pool utilization. Set up automated alerts for degradation in these metrics.
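
As a starting point for that checklist, a couple of the metrics can be spot-checked with one-off queries (pg_stat_bgwriter holds the checkpoint counters through PostgreSQL 16; newer releases move them to pg_stat_checkpointer):

-- Connections in use vs. the configured limit
SELECT count(*) AS connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;

-- Checkpoints forced by WAL volume (req) vs. scheduled by timeout (timed)
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;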

6. Recommended Configuration Template

Here's a starting point for a production PostgreSQL server with 32GB RAM and SSD storage:

Parameter                      Value   Description
shared_buffers                 8GB     25% of RAM for caching
effective_cache_size           24GB    75% of RAM for query planning
work_mem                       32MB    Per-operation memory
maintenance_work_mem           2GB     For maintenance operations
max_connections                100     Use connection pooling
max_wal_size                   4GB     Checkpoint threshold
checkpoint_completion_target   0.9     Spread checkpoint I/O
random_page_cost               1.1     Lower for SSD (default 4.0)
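
The same values as a postgresql.conf fragment, ready to adapt (treat these as a starting point to benchmark against your workload, not final values):

# 32GB RAM, SSD storage
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 2GB
max_connections = 100
max_wal_size = 4GB
checkpoint_completion_target = 0.9
random_page_cost = 1.1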