DatabaseIntermediate
17 min readNov 24, 2025

Database Normalization vs Denormalization: When to Use Each

Master database design tradeoffs between normalized and denormalized schemas with real-world examples.

R

Rithy Tep

Author

Database Normalization vs Denormalization: When to Use Each

Normalization: Eliminating Redundancy

First Normal Form (1NF)

Rule: No repeating groups, atomic values only.

SET NOCOUNT ON; -- BAD: Not in 1NF (repeating groups) CREATE TABLE [dbo].[Orders] ( [Id] INT PRIMARY KEY, [CustomerName] NVARCHAR(200), [Product1] NVARCHAR(200), [Product2] NVARCHAR(200), [Product3] NVARCHAR(200) ); -- GOOD: 1NF (atomic values) CREATE TABLE [dbo].[Orders] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [CustomerName] NVARCHAR(200) ); CREATE TABLE [dbo].[OrderItems] ( [OrderId] INT NOT NULL, [ProductName] NVARCHAR(200) NOT NULL, CONSTRAINT [FK_OrderItems_Orders] FOREIGN KEY ([OrderId]) REFERENCES [dbo].[Orders]([Id]) );

Second Normal Form (2NF)

Rule: 1NF + No partial dependencies (all non-key columns depend on entire primary key).

SET NOCOUNT ON; -- BAD: Not in 2NF (partial dependency) CREATE TABLE [dbo].[OrderItems] ( [OrderId] INT NOT NULL, [ProductId] INT NOT NULL, [ProductName] NVARCHAR(200), -- Depends only on ProductId [ProductPrice] DECIMAL(10,2), -- Depends only on ProductId [Quantity] INT NOT NULL, CONSTRAINT [PK_OrderItems] PRIMARY KEY ([OrderId], [ProductId]) ); -- GOOD: 2NF CREATE TABLE [dbo].[Products] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Name] NVARCHAR(200) NOT NULL, [Price] DECIMAL(10,2) NOT NULL ); CREATE TABLE [dbo].[OrderItems] ( [OrderId] INT NOT NULL, [ProductId] INT NOT NULL, [Quantity] INT NOT NULL, CONSTRAINT [PK_OrderItems] PRIMARY KEY ([OrderId], [ProductId]), CONSTRAINT [FK_OrderItems_Products] FOREIGN KEY ([ProductId]) REFERENCES [dbo].[Products]([Id]) );

Third Normal Form (3NF)

Rule: 2NF + No transitive dependencies (non-key columns don't depend on other non-key columns).

SET NOCOUNT ON; -- BAD: Not in 3NF (transitive dependency) CREATE TABLE [dbo].[Employees] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Name] NVARCHAR(200) NOT NULL, [DepartmentId] INT NOT NULL, [DepartmentName] NVARCHAR(200), -- Depends on DepartmentId [DepartmentLocation] NVARCHAR(200) -- Depends on DepartmentId ); -- GOOD: 3NF CREATE TABLE [dbo].[Departments] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Name] NVARCHAR(200) NOT NULL, [Location] NVARCHAR(200) NOT NULL ); CREATE TABLE [dbo].[Employees] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Name] NVARCHAR(200) NOT NULL, [DepartmentId] INT NOT NULL, CONSTRAINT [FK_Employees_Departments] FOREIGN KEY ([DepartmentId]) REFERENCES [dbo].[Departments]([Id]) );

Benefits of Normalization

No data redundancy - Store each fact once ✅ Data integrity - Updates happen in one place ✅ Smaller storage - No duplicate data ✅ Easier updates - Change once, affects all references

Denormalization: Optimizing for Reads

When to Denormalize

  • Read-heavy workloads
  • Complex joins hurt performance
  • Pre-calculated aggregates needed
  • Real-time dashboards

Example 1: E-commerce Orders

Normalized (3NF):

SET NOCOUNT ON; -- 4 tables, 3 JOINs needed SELECT o.[Id], o.[OrderDate], c.[Name] AS [CustomerName], c.[Email] AS [CustomerEmail], p.[Name] AS [ProductName], oi.[Quantity], p.[Price] FROM [dbo].[Orders] o WITH(NOLOCK) INNER JOIN [dbo].[Customers] c WITH(NOLOCK) ON o.[CustomerId] = c.[Id] INNER JOIN [dbo].[OrderItems] oi WITH(NOLOCK) ON o.[Id] = oi.[OrderId] INNER JOIN [dbo].[Products] p WITH(NOLOCK) ON oi.[ProductId] = p.[Id] OPTION (RECOMPILE); -- Slow with millions of orders!

Denormalized:

SET NOCOUNT ON; -- T-SQL: 1 table, no JOINs CREATE TABLE [dbo].[OrdersDenormalized] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [OrderDate] DATETIME2 NOT NULL DEFAULT GETDATE(), [CustomerId] INT NOT NULL, [CustomerName] NVARCHAR(200) NOT NULL, -- Duplicated for performance [CustomerEmail] NVARCHAR(200) NOT NULL, -- Duplicated for performance [ProductId] INT NOT NULL, [ProductName] NVARCHAR(200) NOT NULL, -- Duplicated for performance [ProductPrice] DECIMAL(10,2) NOT NULL, -- Duplicated for performance [Quantity] INT NOT NULL, [TotalAmount] AS ([ProductPrice] * [Quantity]) PERSISTED -- Computed column ); -- Fast query - no JOINs needed! SELECT [Id], [OrderDate], [CustomerName], [CustomerEmail], [ProductName], [Quantity], [TotalAmount] FROM [dbo].[OrdersDenormalized] WITH(NOLOCK) WHERE [Id] = 123 OPTION (RECOMPILE);

Example 2: Blog Posts with Comments

Normalized:

SET NOCOUNT ON; CREATE TABLE [dbo].[Posts] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Title] NVARCHAR(255) NOT NULL, [Content] NVARCHAR(MAX) NOT NULL, [CreatedAt] DATETIME2 NOT NULL DEFAULT GETDATE() ); CREATE TABLE [dbo].[Comments] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [PostId] INT NOT NULL, [Content] NVARCHAR(MAX) NOT NULL, [CreatedAt] DATETIME2 NOT NULL DEFAULT GETDATE(), CONSTRAINT [FK_Comments_Posts] FOREIGN KEY ([PostId]) REFERENCES [dbo].[Posts]([Id]) ); -- Query requires JOIN + COUNT (slower) SELECT p.[Id], p.[Title], p.[Content], p.[CreatedAt], COUNT(c.[Id]) AS [CommentCount] FROM [dbo].[Posts] p WITH(NOLOCK) LEFT JOIN [dbo].[Comments] c WITH(NOLOCK) ON p.[Id] = c.[PostId] GROUP BY p.[Id], p.[Title], p.[Content], p.[CreatedAt] OPTION (RECOMPILE);

Denormalized:

SET NOCOUNT ON; -- T-SQL with denormalized counter CREATE TABLE [dbo].[Posts] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Title] NVARCHAR(255) NOT NULL, [Content] NVARCHAR(MAX) NOT NULL, [CreatedAt] DATETIME2 NOT NULL DEFAULT GETDATE(), [CommentCount] INT NOT NULL DEFAULT 0 -- Denormalized counter ); GO -- Update with trigger CREATE TRIGGER [trg_UpdateCommentCount] ON [dbo].[Comments] AFTER INSERT AS BEGIN SET NOCOUNT ON; UPDATE p SET p.[CommentCount] = p.[CommentCount] + 1 FROM [dbo].[Posts] p WITH(ROWLOCK) INNER JOIN inserted i ON p.[Id] = i.[PostId]; END; GO -- Fast query, no JOIN needed SELECT [Id], [Title], [Content], [CreatedAt], [CommentCount] FROM [dbo].[Posts] WITH(NOLOCK) WHERE [CreatedAt] >= DATEADD(DAY, -30, GETDATE()) OPTION (RECOMPILE);

Example 3: User Analytics Dashboard

Normalized:

SET NOCOUNT ON; -- T-SQL: Calculate on every page load (SLOW!) SELECT u.[Id], u.[Name], u.[Email], COUNT(DISTINCT o.[Id]) AS [TotalOrders], ISNULL(SUM(o.[TotalAmount]), 0) AS [TotalSpent], MAX(o.[CreatedAt]) AS [LastOrderDate] FROM [dbo].[Users] u WITH(NOLOCK) LEFT JOIN [dbo].[Orders] o WITH(NOLOCK) ON u.[Id] = o.[UserId] GROUP BY u.[Id], u.[Name], u.[Email] OPTION (RECOMPILE);

Denormalized:

SET NOCOUNT ON; -- T-SQL: Pre-calculated aggregates CREATE TABLE [dbo].[UserStats] ( [UserId] INT PRIMARY KEY, [TotalOrders] INT NOT NULL DEFAULT 0, [TotalSpent] DECIMAL(10,2) NOT NULL DEFAULT 0, [LastOrderDate] DATETIME2, [UpdatedAt] DATETIME2 NOT NULL DEFAULT GETDATE(), CONSTRAINT [FK_UserStats_Users] FOREIGN KEY ([UserId]) REFERENCES [dbo].[Users]([Id]) ); GO -- T-SQL: Update with trigger CREATE TRIGGER [trg_UpdateUserStats] ON [dbo].[Orders] AFTER INSERT AS BEGIN SET NOCOUNT ON; MERGE INTO [dbo].[UserStats] AS target USING ( SELECT i.[UserId], COUNT(*) AS [OrderCount], SUM(i.[TotalAmount]) AS [OrderTotal], MAX(i.[CreatedAt]) AS [LastOrder] FROM inserted i GROUP BY i.[UserId] ) AS source ON target.[UserId] = source.[UserId] WHEN MATCHED THEN UPDATE SET [TotalOrders] = target.[TotalOrders] + source.[OrderCount], [TotalSpent] = target.[TotalSpent] + source.[OrderTotal], [LastOrderDate] = source.[LastOrder], [UpdatedAt] = GETDATE() WHEN NOT MATCHED THEN INSERT ([UserId], [TotalOrders], [TotalSpent], [LastOrderDate]) VALUES (source.[UserId], source.[OrderCount], source.[OrderTotal], source.[LastOrder]); END; GO -- Fast query - no aggregation needed! SELECT [UserId], [TotalOrders], [TotalSpent], [LastOrderDate], [UpdatedAt] FROM [dbo].[UserStats] WITH(NOLOCK) WHERE [UserId] = 123 OPTION (RECOMPILE);

Hybrid Approach: Materialized Views

Best of both worlds - keep normalized tables, denormalize for reads.

SQL Server (Indexed Views)

SET NOCOUNT ON; -- Keep normalized tables CREATE TABLE [dbo].[Orders] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [CustomerId] INT NOT NULL, [OrderDate] DATETIME2 NOT NULL DEFAULT GETDATE(), [TotalAmount] DECIMAL(10,2) NOT NULL ); GO -- Create indexed view (materialized) CREATE VIEW [dbo].[vw_OrderSummary] WITH SCHEMABINDING AS SELECT CAST([OrderDate] AS DATE) AS [OrderDate], COUNT_BIG(*) AS [OrderCount], SUM([TotalAmount]) AS [Revenue] FROM [dbo].[Orders] GROUP BY CAST([OrderDate] AS DATE); GO -- Create clustered index to materialize the view CREATE UNIQUE CLUSTERED INDEX [IX_vw_OrderSummary_OrderDate] ON [dbo].[vw_OrderSummary]([OrderDate]); GO -- Fast queries on indexed view SELECT [OrderDate], [OrderCount], [Revenue] FROM [dbo].[vw_OrderSummary] WITH(NOLOCK) WHERE [OrderDate] = N'2024-01-15' OPTION (RECOMPILE); -- Note: Indexed views auto-update when base table changes -- No manual refresh needed in SQL Server

MySQL (with events)

-- Summary table CREATE TABLE daily_sales ( date DATE PRIMARY KEY, order_count INT, revenue DECIMAL(10,2) ); -- Update every hour CREATE EVENT update_daily_sales ON SCHEDULE EVERY 1 HOUR DO INSERT INTO daily_sales (date, order_count, revenue) SELECT DATE(order_date), COUNT(*), SUM(total_amount) FROM orders WHERE order_date >= CURDATE() - INTERVAL 7 DAY GROUP BY DATE(order_date) ON DUPLICATE KEY UPDATE order_count = VALUES(order_count), revenue = VALUES(revenue);

Decision Matrix

| Scenario | Use Normalization | Use Denormalization | |----------|------------------|---------------------| | Write-heavy workload | ✅ Yes | ❌ No (slower writes) | | Read-heavy workload | ⚠️ Maybe | ✅ Yes | | Complex reports | ❌ Slow JOINs | ✅ Pre-calculated | | Data integrity critical | ✅ Yes | ⚠️ Manage carefully | | Storage limited | ✅ Less storage | ❌ More storage | | Real-time dashboards | ❌ Too slow | ✅ Fast reads | | OLTP (transactions) | ✅ Normalized | ❌ No | | OLAP (analytics) | ❌ Too slow | ✅ Denormalized |

Best Practices

1. Start Normalized, Denormalize Strategically

SET NOCOUNT ON; -- Keep source of truth normalized CREATE TABLE [dbo].[Products] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [Name] NVARCHAR(255) NOT NULL, [Price] DECIMAL(10,2) NOT NULL ); -- Denormalize for performance CREATE TABLE [dbo].[OrderItems] ( [OrderId] INT NOT NULL, [ProductId] INT NOT NULL, [ProductName] NVARCHAR(255) NOT NULL, -- Denormalized for faster reads [PriceAtPurchase] DECIMAL(10,2) NOT NULL, -- Historical price [Quantity] INT NOT NULL );

2. Use Database Features

  • Triggers: Auto-update denormalized data
  • Materialized Views: Cached query results
  • Generated Columns: Computed values
-- PostgreSQL generated column CREATE TABLE orders ( id INT PRIMARY KEY, subtotal DECIMAL(10,2), tax_rate DECIMAL(4,2), total DECIMAL(10,2) GENERATED ALWAYS AS (subtotal * (1 + tax_rate)) STORED );

3. Document Denormalization

SET NOCOUNT ON; -- Add comments explaining denormalization CREATE TABLE [dbo].[Orders] ( [Id] INT PRIMARY KEY IDENTITY(1,1), [CustomerName] NVARCHAR(200) NOT NULL, -- Denormalized from Customers table for performance -- Updated via trigger when Customers.Name changes [CustomerId] INT NOT NULL );

4. Monitor and Adjust

-- Find slow queries SELECT query, calls, total_time, mean_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10; -- Consider denormalizing if: -- 1. Query is called frequently -- 2. Query has expensive JOINs -- 3. Data changes infrequently

Conclusion

Normalize by default for:

  • Data integrity
  • Storage efficiency
  • Transactional systems (OLTP)

Denormalize selectively for:

  • Read performance
  • Complex aggregations
  • Analytics/reporting (OLAP)

Best approach: Hybrid - normalized source, denormalized views/caches.

#Database Design#Normalization#Denormalization#Schema Design#Performance