MySQL Schema Design & Optimization

Production-Ready Database Architecture Patterns

TData Types Best Practices

Choose Smallest Sufficient Type

  • TINYINT for small integers (0-255 or -128-127)
  • SMALLINT for medium integers (0-65,535)
  • INT for standard integers (most common)
  • BIGINT only when absolutely necessary

String Types

-- Use VARCHAR for variable-length strings VARCHAR(255) -- Most common VARCHAR(50) -- Better for usernames, emails -- Use CHAR for fixed-length data CHAR(2) -- Country codes (US, UK) CHAR(32) -- MD5 hashes -- TEXT types for large content TEXT -- Up to 65KB MEDIUMTEXT -- Up to 16MB

Dates & Times

  • TIMESTAMP for automatic tracking (created_at, updated_at)
  • DATETIME for specific dates/times
  • DATE for date-only values (birthdays)
Performance Tip Smaller data types = less disk I/O, better caching, faster queries. A table with 10 million rows saves gigabytes by using TINYINT instead of INT where appropriate.

IIndexing Strategies

Primary Keys

-- Auto-increment (recommended for most cases) CREATE TABLE users ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, email VARCHAR(255) NOT NULL ); -- UUID for distributed systems id BINARY(16) PRIMARY KEY

Single-Column Indexes

-- For frequently queried columns CREATE INDEX idx_email ON users(email); CREATE INDEX idx_status ON orders(status);

Composite Indexes

-- Column order matters! Most selective first CREATE INDEX idx_user_status ON orders(user_id, status, created_at); -- This index supports queries like: WHERE user_id = ? WHERE user_id = ? AND status = ? WHERE user_id = ? AND status = ? AND created_at > ?

Unique Indexes

CREATE UNIQUE INDEX idx_email ON users(email); ALTER TABLE users ADD UNIQUE(username);
Index Pitfalls Too many indexes slow down INSERT/UPDATE operations. Index only columns used in WHERE, JOIN, and ORDER BY clauses. Avoid indexing low-cardinality columns (gender, boolean flags).

NNormalization & Design Patterns

First Normal Form (1NF)

  • Atomic values in each column
  • No repeating groups
  • Each row must be unique
-- BAD: Multiple values in one column tags VARCHAR(255) -- "mysql,database,sql" -- GOOD: Separate junction table CREATE TABLE post_tags ( post_id INT, tag_id INT, PRIMARY KEY(post_id, tag_id) );

Second Normal Form (2NF)

  • Meet 1NF requirements
  • No partial dependencies
  • All non-key attributes depend on entire primary key

Third Normal Form (3NF)

  • Meet 2NF requirements
  • No transitive dependencies
  • Non-key attributes depend only on primary key
Denormalization When Needed For read-heavy workloads, strategic denormalization can improve performance. Consider caching computed values or duplicating frequently joined data.

RRelationships & Foreign Keys

One-to-Many

CREATE TABLE users ( id INT PRIMARY KEY AUTO_INCREMENT, username VARCHAR(50) ); CREATE TABLE posts ( id INT PRIMARY KEY AUTO_INCREMENT, user_id INT NOT NULL, title VARCHAR(255), FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE );

Many-to-Many

CREATE TABLE students ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100) ); CREATE TABLE courses ( id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100) ); CREATE TABLE enrollments ( student_id INT, course_id INT, enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY(student_id, course_id), FOREIGN KEY(student_id) REFERENCES students(id), FOREIGN KEY(course_id) REFERENCES courses(id) );

Referential Actions

Action Behavior
CASCADE Delete/update child rows automatically
SET NULL Set foreign key to NULL
RESTRICT Prevent deletion if children exist
NO ACTION Similar to RESTRICT

Performance Optimization

Query Optimization

-- Use EXPLAIN to analyze queries EXPLAIN SELECT * FROM orders WHERE user_id = 123 AND status = 'pending'; -- Avoid SELECT * SELECT id, title, created_at FROM posts; -- Use LIMIT for pagination SELECT * FROM users LIMIT 20 OFFSET 0;

Covering Indexes

-- Include all queried columns in index CREATE INDEX idx_covering ON orders(user_id, status, created_at, total); -- This query uses index-only scan SELECT created_at, total FROM orders WHERE user_id = 123 AND status = 'shipped';

Partitioning

-- Range partitioning by date CREATE TABLE logs ( id INT, created_at DATE, message TEXT ) PARTITION BY RANGE (YEAR(created_at)) ( PARTITION p2023 VALUES LESS THAN (2024), PARTITION p2024 VALUES LESS THAN (2025), PARTITION p2025 VALUES LESS THAN (2026) );
Caching Strategies Use application-level caching (Redis, Memcached) for frequently accessed data. Consider MySQL query cache for repeated identical queries.

PCommon Schema Patterns

Soft Deletes

ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL; -- Query active records SELECT * FROM users WHERE deleted_at IS NULL; -- Soft delete UPDATE users SET deleted_at = NOW() WHERE id = 123;

Timestamps

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

Versioning

CREATE TABLE document_versions ( id INT PRIMARY KEY AUTO_INCREMENT, document_id INT NOT NULL, version INT NOT NULL, content TEXT, created_at TIMESTAMP, UNIQUE KEY(document_id, version) );

Polymorphic Relationships

CREATE TABLE comments ( id INT PRIMARY KEY AUTO_INCREMENT, commentable_id INT NOT NULL, commentable_type VARCHAR(50) NOT NULL, content TEXT, INDEX(commentable_id, commentable_type) ); -- commentable_type: 'Post', 'Video', 'Article'

JSON Columns (MySQL 5.7+)

CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(255), attributes JSON ); INSERT INTO products (name, attributes) VALUES ( 'Laptop', '{"cpu": "i7", "ram": "16GB", "ssd": "512GB"}' ); SELECT * FROM products WHERE attributes->'$.ram' = '16GB';

EStorage Engines

InnoDB (Default & Recommended)

  • ACID-compliant transactions
  • Row-level locking
  • Foreign key support
  • Crash recovery
  • Better for write-heavy workloads
CREATE TABLE transactions ( id INT PRIMARY KEY, amount DECIMAL(10,2) ) ENGINE=InnoDB;

MyISAM (Legacy)

  • Table-level locking
  • No transaction support
  • Faster for read-only workloads
  • Full-text search (before MySQL 5.6)
Avoid MyISAM InnoDB now supports everything MyISAM does and more. Use InnoDB unless you have a specific legacy requirement.

Memory

  • Stores data in RAM
  • Extremely fast
  • Data lost on restart
  • Good for temporary tables

🔒Security Best Practices

Password Storage

-- NEVER store plain text passwords -- Use bcrypt, Argon2, or similar password_hash VARCHAR(255) NOT NULL -- Application layer hashing (pseudo-code) hash = bcrypt(password, cost=12)

SQL Injection Prevention

  • Always use prepared statements
  • Never concatenate user input into queries
  • Validate and sanitize all inputs
  • Use ORM frameworks properly
-- VULNERABLE query = "SELECT * FROM users WHERE id = " + userId -- SAFE (Prepared Statement) PREPARE stmt FROM 'SELECT * FROM users WHERE id = ?'; EXECUTE stmt USING @userId;

Principle of Least Privilege

-- Create limited user accounts CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'strong_password'; GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'app_user'@'localhost'; -- Revoke unnecessary privileges REVOKE DROP ON mydb.* FROM 'app_user'@'localhost';

Encryption

  • Enable SSL/TLS for connections
  • Encrypt sensitive columns (SSN, credit cards)
  • Use encrypted backups
  • Consider transparent data encryption (TDE)

CConstraints & Validation

NOT NULL

email VARCHAR(255) NOT NULL, username VARCHAR(50) NOT NULL

UNIQUE

email VARCHAR(255) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE

CHECK Constraints (MySQL 8.0.16+)

CREATE TABLE products ( id INT PRIMARY KEY, price DECIMAL(10,2), stock INT, CHECK (price >= 0), CHECK (stock >= 0) );

DEFAULT Values

status VARCHAR(20) DEFAULT 'active', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, is_verified BOOLEAN DEFAULT FALSE

ENUM for Fixed Sets

status ENUM('pending', 'processing', 'completed', 'failed'), priority ENUM('low', 'medium', 'high', 'urgent')
ENUM Limitations Changing ENUM values requires ALTER TABLE. For frequently changing value sets, use a lookup table instead.

MMaintenance & Monitoring

Analyzing Tables

-- Update table statistics ANALYZE TABLE users; -- Check table health CHECK TABLE users; -- Repair corrupted tables REPAIR TABLE users;

Optimizing Tables

-- Reclaim space and defragment OPTIMIZE TABLE users; -- For InnoDB, this rebuilds the table

Monitoring Queries

-- Show running queries SHOW PROCESSLIST; -- Enable slow query log SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 2; -- Show table sizes SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb FROM information_schema.tables WHERE table_schema = 'your_database' ORDER BY size_mb DESC;

Index Usage Analysis

-- Find unused indexes SELECT * FROM sys.schema_unused_indexes; -- Show index statistics SHOW INDEX FROM users;
Regular Maintenance Schedule Run ANALYZE TABLE weekly on large tables. Monitor slow query log daily. Review and remove unused indexes monthly. Optimize tables quarterly.

🔄Schema Migration Strategies

Online Schema Changes

-- Use ALGORITHM=INPLACE for minimal locking ALTER TABLE users ADD COLUMN phone VARCHAR(20), ALGORITHM=INPLACE, LOCK=NONE; -- Add index without blocking CREATE INDEX idx_email ON users(email) ALGORITHM=INPLACE, LOCK=NONE;

Safe Column Additions

-- Add nullable column (fast) ALTER TABLE users ADD COLUMN bio TEXT NULL; -- Add with default (may lock table) ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active' NOT NULL;

Renaming Columns

-- MySQL 8.0+ ALTER TABLE users RENAME COLUMN old_name TO new_name; -- Older versions ALTER TABLE users CHANGE old_name new_name VARCHAR(255);

Zero-Downtime Migrations

  • Use pt-online-schema-change (Percona Toolkit)
  • Implement dual-writing during transitions
  • Test migrations on production copies
  • Have rollback plans ready
Migration Risks Always backup before schema changes. Test migrations on staging first. Avoid altering tables during peak hours. Monitor replication lag.

AAdvanced Design Patterns

Hierarchical Data (Nested Sets)

CREATE TABLE categories ( id INT PRIMARY KEY, name VARCHAR(100), lft INT NOT NULL, rgt INT NOT NULL, INDEX(lft, rgt) ); -- Query all descendants efficiently SELECT child.* FROM categories AS parent, categories AS child WHERE child.lft BETWEEN parent.lft AND parent.rgt AND parent.id = ?;

Event Sourcing

CREATE TABLE events ( id BIGINT PRIMARY KEY AUTO_INCREMENT, aggregate_id INT NOT NULL, event_type VARCHAR(50) NOT NULL, event_data JSON NOT NULL, version INT NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, UNIQUE KEY(aggregate_id, version), INDEX(aggregate_id, created_at) );

Multi-Tenant Schemas

-- Shared schema with tenant_id CREATE TABLE posts ( id BIGINT PRIMARY KEY, tenant_id INT NOT NULL, title VARCHAR(255), INDEX(tenant_id, created_at) ); -- Row-level security SELECT * FROM posts WHERE tenant_id = ?;

Time-Series Data

CREATE TABLE metrics ( id BIGINT PRIMARY KEY AUTO_INCREMENT, metric_name VARCHAR(100), value DECIMAL(10,2), recorded_at TIMESTAMP NOT NULL, INDEX(metric_name, recorded_at) ) PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) ( -- Partitions by month );
When to Use Advanced Patterns Only implement complex patterns when simpler solutions won't work. Most applications don't need event sourcing or nested sets. Start simple, refactor when needed.