Waqas Ahmad — Software Architect & Technical Consultant - Available USA, Europe, Global

Waqas Ahmad — Software Architect & Technical Consultant

Specializing in

Distributed Systems

.NET ArchitectureCloud-Native ArchitectureAzure Cloud EngineeringAPI ArchitectureMicroservices ArchitectureEvent-Driven ArchitectureDatabase Design & Optimization

👋 Hi, I'm Waqas — a Software Architect and Technical Consultant specializing in .NET, Azure, microservices, and API-first system design..
I help companies build reliable, maintainable, and high-performance backend platforms that scale.

Experienced across engineering ecosystems shaped by Microsoft, the Cloud Native Computing Foundation, and the Apache Software Foundation.

Available for remote consulting (USA, Europe, Global) — flexible across EST, PST, GMT & CET.

services
Article

Database Indexing Strategies: When and What to Index

Clustered and nonclustered indexes, covering indexes, and alignment with query patterns.

services
Read the article

Introduction

This guidance is relevant when the topic of this article applies to your system or design choices; it breaks down when constraints or context differ. I’ve applied it in real projects and refined the takeaways over time (as of 2026).

Slow queries and degrading reports or APIs at scale usually mean the database is doing more work than it needs—often due to missing or misaligned indexes. This article covers when and what to index, clustered vs nonclustered indexes, composite column order, covering indexes and INCLUDE, and how indexes affect SELECT, JOIN, ORDER BY, and WHERE, with SQL examples and query plans. For architects and tech leads, aligning indexes with real query patterns and maintaining fragmentation and statistics keeps read performance predictable and avoids costly rewrites later.

For a deeper overview of this topic, explore the full Database Design & Optimization guide.

Decision Context

  • System scale: Tables from thousands to millions (or more) of rows; read-heavy or mixed workloads. Applies when you’re tuning queries or designing schema and need to decide what to index.
  • Team size: Developers and DBAs; someone must own index strategy, maintenance, and measurement. Works when the team can read execution plans and align indexes with query patterns.
  • Time / budget pressure: Fits when you have slow queries to fix or new features that need predictable read performance; breaks down when there’s no time to measure (adding indexes blindly can hurt writes).
  • Technical constraints: SQL Server or Azure SQL (and similar for other engines); clustered and nonclustered indexes; query plans and Query Store. Assumes you can run maintenance (rebuild/reorg, statistics).
  • Non-goals: This article does not optimise for schema design in the abstract or for NoSQL; it focuses on relational indexing (SQL Server/Azure SQL) and when/what to index.

What is an index?

An index in a database is like the index at the back of a book. The book has all the content in order (by page); the index gives you a separate, ordered list of “topic → page number.” When you look up a topic, you use the index to jump straight to the right page instead of reading every page. A database does the same thing:

  • The table holds the rows (in physical order determined by the clustered index, or in no particular order if it’s a heap).
  • An index holds copies of selected column values, sorted, and pointers back to the rows (row IDs or clustered key values).
  • When you filter (WHERE), sort (ORDER BY), or join on those columns, the engine can use the index to find the right rows without scanning the whole table.

Indexes are separate structures (except the clustered index, which is the table order). They consume storage and must be updated on every INSERT, UPDATE, and DELETE. So we add indexes for read performance and accept a write cost; we don’t index every column, we index for the queries that matter.

Index Seek vs Table Scan: efficiency difference

The core efficiency difference is seek vs scan.

Operation What the engine does When it’s used Typical cost (large table)
Index Seek Uses the index to jump to the right rows (B-tree lookup). Reads only index pages + the rows that match. WHERE/JOIN on indexed columns with selective predicate. Low: O(log n) index traversal + matching rows.
Table Scan Reads every row in the table (or every row in an index). No useful index, or predicate not on index key, or very low selectivity. High: O(n) — full table.
Index Scan Reads every row in the index (all index pages). When the whole index is needed (e.g. covering index for a query that returns many rows). Medium: full index, but index is smaller than table.

Example: Table Orders with 10 million rows. Query: SELECT * FROM Orders WHERE CustomerId = 42.

  • Without an index on CustomerId: Engine does a Table Scan — reads all 10 million rows, checks each CustomerId. Cost: millions of logical reads.
  • With a nonclustered index on CustomerId: Engine does an Index Seek — looks up CustomerId = 42 in the index (few index pages), gets a short list of row pointers, then fetches only those rows. Cost: dozens or hundreds of logical reads.

The efficiency gain is often 100x or more for selective queries. That’s why we index columns that appear in WHERE, JOIN, and ORDER BY.

Clustered index: one per table

Every table in SQL Server has at most one clustered index. It doesn’t just point to the data; it defines the physical order in which the rows are stored. The table is the clustered index — the leaf level of the clustered index contains the full row data. That’s why we say “the” clustered index: there’s only one, and choosing it is a big deal. Inserts and updates have to keep rows in that order, so the key you pick affects write performance and fragmentation.

SQL: create a table with clustered primary key (default in SQL Server)

CREATE TABLE dbo.Orders (
    Id          BIGINT IDENTITY(1,1) NOT NULL,
    CustomerId  INT NOT NULL,
    Status      VARCHAR(20) NOT NULL,
    CreatedAt   DATETIME2(0) NOT NULL,
    Total       DECIMAL(18,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (Id)  -- clustered on Id
);

Good clustered key: narrow (few bytes), stable (doesn’t change), ever-increasing (new rows go at the end). A surrogate Id (e.g. bigint identity) or CreatedAt fits. Bad: wide or random key (e.g. GUID, or many columns) — inserts cause page splits (rows inserted in the middle), which fragments the table and hurts performance.

SQL: create clustered index on existing table (heap)

CREATE CLUSTERED INDEX IX_Orders_Clustered ON dbo.Orders (CreatedAt);

The primary key does not have to be the clustered key. You can have a nonclustered primary key and a different clustered key, but in practice a narrow Id as both primary and clustered key is a safe default.

Nonclustered indexes: for filters, joins, and sorts

Nonclustered indexes are separate structures. They store a copy of the index key columns (and optionally more columns in INCLUDE) and pointers to the row (the clustered key, or a row ID if the table is a heap). You can have many nonclustered indexes per table. They’re for the columns that show up in WHERE, JOIN, and ORDER BY.

SQL: create a nonclustered index

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);

Now the query SELECT * FROM Orders WHERE CustomerId = 42 can use Index Seek on IX_Orders_CustomerId instead of Table Scan. The engine finds the matching index entries, then does a key lookup to the clustered index to get the rest of the row (Status, CreatedAt, Total). If the query only needed CustomerId and Id, the index could cover the query (see Covering indexes).

Efficiency: Each nonclustered index costs writes (every INSERT/UPDATE/DELETE must update the index) and space. Index only for queries that are proven slow or very frequent. Use execution plans and Query Store to see which queries need help.

Composite indexes: column order matters

A composite index is an index on more than one column. The order of columns is critical: the index is sorted by the first column, then by the second within the first, and so on. So the engine can seek only on a leading prefix of the key.

Rule of thumb: Put equality columns first (Status = 'Pending'), then range or sort columns (CreatedAt > @date, ORDER BY CreatedAt). That way the engine can seek to the right “slice” and then scan or seek within it.

SQL: composite index — good column order

-- Good for: WHERE CustomerId = @id AND Status = 'Pending' ORDER BY CreatedAt DESC
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Status_CreatedAt
ON dbo.Orders (CustomerId, Status, CreatedAt DESC);
  • Query A: WHERE CustomerId = 42 — can use the index (CustomerId is leading). Seek on CustomerId.
  • Query B: WHERE CustomerId = 42 AND Status = 'Pending'Seek on CustomerId + Status.
  • Query C: WHERE CustomerId = 42 AND Status = 'Pending' ORDER BY CreatedAt DESCSeek + order already in index. Very efficient.
  • Query D: WHERE Status = 'Pending'cannot seek on Status (Status is not leading). Engine may scan the whole index or table. So for “all pending orders” you’d want a separate index with Status first, e.g. (Status, CreatedAt).

SQL: wrong column order — index not used for Status-only query

-- This index helps ORDER BY CreatedAt, but NOT WHERE Status = 'Pending'
CREATE NONCLUSTERED INDEX IX_Orders_CreatedAt_Status ON dbo.Orders (CreatedAt, Status);

Table: composite index column order

Index key order Good for query Bad for query
(CustomerId, Status, CreatedAt) WHERE CustomerId = ? AND Status = ? ORDER BY CreatedAt WHERE Status = ? only
(Status, CreatedAt) WHERE Status = ? ORDER BY CreatedAt WHERE CreatedAt > ? only
(CreatedAt, Status) WHERE CreatedAt > ? AND Status = ? WHERE Status = ? only

Covering indexes and INCLUDE

Sometimes the engine finds the right rows via the index but then has to go back to the table to get other columns — a key lookup. That extra hop is done per row and can be expensive if many rows match. A covering index contains all the columns the query needs (in the key or in INCLUDE), so the engine never has to touch the table — the entire result is satisfied from the index.

SQL: covering index with INCLUDE

-- Query: SELECT Id, CustomerId, Status, CreatedAt, Total FROM Orders WHERE CustomerId = @id AND Status = 'Pending'
-- Key: columns we filter/sort on. INCLUDE: columns we only need in the output.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Status_INCLUDE_Total_CreatedAt
ON dbo.Orders (CustomerId, Status) INCLUDE (Total, CreatedAt);

Here Id is the clustered key so it’s in every nonclustered index leaf anyway. We add Total and CreatedAt in INCLUDE so the query doesn’t need a key lookup. Result: Index Seek + no lookup — index-only read. Don’t overdo INCLUDE: every extra column bloats the index and slows writes. Add only what the query actually selects.

How indexes affect JOIN efficiency

JOINs match rows from two tables. Without indexes on the join columns, the engine often does nested loops over full table scans or hash joins that spill to tempdb. With indexes, the engine can seek into the inner table for each row from the outer table (or use merge join if both sides are sorted).

Example: Orders JOIN Customers

SELECT o.Id, o.Total, c.CustomerName
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON c.Id = o.CustomerId
WHERE o.Status = 'Pending';
  • Without index on Orders.CustomerId: For each row in the join, the engine might scan Customers or do a hash join. If Orders has an index on (Status, CustomerId), the engine can seek on Status = 'Pending', then for each CustomerId seek into Customers if Customers.Id is the primary (clustered) key.
  • Without index on Customers.Id: Customers should have a clustered primary key on Id, so lookups by Id are already fast. If Customers is a heap or the key is different, an index on Id (or the join column) is essential.

SQL: indexes that make the JOIN efficient

-- Orders: seek by Status, then join to Customers by CustomerId
CREATE NONCLUSTERED INDEX IX_Orders_Status_CustomerId ON dbo.Orders (Status, CustomerId);

-- Customers: clustered PK on Id is enough for the join lookup
-- (already have it if Id is primary key clustered)

Efficiency summary for JOINs

Table Index on join column Effect
Inner table (e.g. Customers) Clustered or nonclustered on Id Lookup by o.CustomerId is a seek instead of scan.
Outer table (e.g. Orders) Index on Status (and CustomerId if needed) Filter (WHERE) is a seek; fewer rows passed to the join.
Both Composite index (Status, CustomerId) on Orders Single seek on Orders, then seek on Customers per row — nested loops with two seeks.

How indexes affect WHERE and ORDER BY

  • WHERE: Index on the filtered column(s) allows Index Seek instead of Table Scan. Composite index: put equality columns first so the engine can seek on them.
  • ORDER BY: If the index key matches the ORDER BY order (and direction), the engine can read rows already sorted — no separate sort operator. Example: index on (CustomerId, CreatedAt DESC) for ORDER BY CreatedAt DESC when filtered by CustomerId.
  • WHERE + ORDER BY: Composite index on (equality columns, sort column) gives both seek and pre-sorted read. Example: (Status, CreatedAt) for WHERE Status = 'Pending' ORDER BY CreatedAt DESC.

SQL: index for WHERE + ORDER BY

-- Query: SELECT * FROM Orders WHERE Status = 'Pending' ORDER BY CreatedAt DESC
CREATE NONCLUSTERED INDEX IX_Orders_Status_CreatedAt ON dbo.Orders (Status, CreatedAt DESC);

Index maintenance: fragmentation and statistics

Fragmentation: Over time, inserts and deletes leave gaps and out-of-order pages. That’s fragmentation. It can hurt read performance (more I/O). Check with sys.dm_db_index_physical_stats. Rebuild or reorganise when fragmentation is high (e.g. > 30%).

SQL: check fragmentation

SELECT OBJECT_NAME(ips.object_id) AS TableName, i.name AS IndexName,
       ips.avg_fragmentation_in_percent, ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC;

SQL: rebuild vs reorganise

-- Rebuild: new copy of the index. Use for high fragmentation. Can be online.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD WITH (ONLINE = ON);

-- Reorganise: defragment in place. Lighter, for moderate fragmentation.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REORGANIZE;

Statistics: The engine uses statistics to choose plans. Stale or missing statistics → wrong plans (e.g. scan instead of seek). Update after creating or changing indexes.

SQL: update statistics

UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

Query plans: what to look for

  • Index Seek — good: engine uses index to jump to rows.
  • Table Scan / Index Scan — on a large table, often bad: full read.
  • Key Lookup — extra hop from nonclustered index to table; consider covering index with INCLUDE.
  • Estimated vs actual rows — big gap can mean statistics are off; update statistics.
  • Missing index suggestion — plan may suggest an index; verify and add if it matches real workload.

SQL: get execution plan (text)

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM dbo.Orders WHERE CustomerId = 42;
-- Check "logical reads" and execution plan (SSMS: Include Actual Execution Plan).

Worked examples with full SQL

Setup: Orders table

CREATE TABLE dbo.Orders (
    Id          BIGINT IDENTITY(1,1) NOT NULL,
    CustomerId  INT NOT NULL,
    Status      VARCHAR(20) NOT NULL,
    CreatedAt   DATETIME2(0) NOT NULL,
    Total       DECIMAL(18,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (Id)
);

Query 1: Pending orders for customer, newest first, with Total.

SELECT Id, CustomerId, Status, CreatedAt, Total
FROM dbo.Orders
WHERE CustomerId = @id AND Status = 'Pending'
ORDER BY CreatedAt DESC;

Index: Composite + covering.

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Status_CreatedAt
ON dbo.Orders (CustomerId, Status, CreatedAt DESC) INCLUDE (Total);

Query 2: All pending orders (report).

SELECT Id, CustomerId, Status, CreatedAt, Total
FROM dbo.Orders
WHERE Status = 'Pending'
ORDER BY CreatedAt DESC;

Index: Status leading, then CreatedAt; INCLUDE Total.

CREATE NONCLUSTERED INDEX IX_Orders_Status_CreatedAt
ON dbo.Orders (Status, CreatedAt DESC) INCLUDE (Total);

When not to index

  • Very small tables — full scan is cheap; index overhead not worth it.
  • Columns with very low selectivity (e.g. boolean) — index may not be used; measure.
  • Heavy write, rare read — each index slows inserts/updates; balance with read gain.
  • Append-only log tables — few indexes; often clustered on identity/date only.

Common mistakes

Mistake Fix
Over-indexing Index only for proven slow or frequent queries; drop unused indexes.
Wrong column order in composite Equality first, then range/sort; match index to query.
Missing or stale statistics Auto-update on; maintenance job for large/volatile tables.
Clustered on GUID Prefer narrow, ever-increasing key (Id, CreatedAt).
Ignoring key lookups Add INCLUDE or key columns so index covers the query.
No measurement Use execution plans and Query Store; measure before and after.

You can also explore more patterns in the Database Design & Optimization resource page.

Summary

Indexes speed up reads (WHERE, JOIN, ORDER BY) by allowing Index Seek instead of Table Scan; use clustered (one per table) and nonclustered (many per table) with composite column order (equality then range/sort) and covering (INCLUDE) where key lookups hurt. Wrong or missing indexes lead to table scans and slow queries under load; measuring with execution plans and Query Store and maintaining fragmentation and statistics keeps performance predictable. Next, identify your slow or frequent queries, then add or adjust indexes to match and verify with plans.

Position & Rationale

I index for proven slow or frequent queries first—not speculatively. I use execution plans and Query Store to see seeks vs scans and key lookups; I add or adjust indexes based on that. I prefer one clustered index per table on a narrow, ever-increasing key (e.g. Id or CreatedAt); I avoid GUIDs or wide keys for the clustered index because they fragment. For composite indexes I put equality columns first, then range or ORDER BY columns, so the index matches the query. I use INCLUDE to cover the query and avoid key lookups when the extra columns are read-only in the plan. I avoid over-indexing: every index costs writes and space; I drop unused indexes. I don’t add indexes without measuring before and after.

Trade-Offs & Failure Modes

Indexes speed up reads but slow writes and consume space; too many indexes hurt INSERT/UPDATE/DELETE. Clustered choice affects fragmentation and all nonclustered indexes (they reference the clustered key). Covering (INCLUDE) avoids lookups but makes the index larger. Failure modes: indexing every column “just in case”; wrong column order in composite indexes so the engine can’t seek; missing or stale statistics so the optimizer picks bad plans; ignoring key lookups when a covering index would help; clustered on GUID causing heavy fragmentation.

What Most Guides Miss

Most guides explain seek vs scan but don’t stress column order in composite indexes enough—the engine can only use the index for a leading prefix, so (Status, CreatedAt) helps WHERE Status = @s ORDER BY CreatedAt but not WHERE CreatedAt > @d alone. Another gap: key lookups—when the plan shows a lookup, consider INCLUDE or adding key columns so the index covers the query. Statistics and fragmentation are often mentioned but not tied to “measure, then add; measure, then drop”; without measurement you’re guessing.

Decision Framework

  • If a query is slow or frequent → Capture the plan; look for scans and key lookups; add or adjust index to get seeks and cover the query.
  • For composite index → Put equality columns first, then range/ORDER BY; match the predicate order where it helps.
  • For key lookup → Add INCLUDE columns or extend the key so the index covers the query.
  • For clustered → One per table; narrow, stable, ever-increasing key (Id, CreatedAt); avoid GUID.
  • Maintenance → Rebuild or reorganise when fragmented; keep statistics up to date; drop unused indexes.

Key Takeaways

  • Index for proven slow or frequent queries; measure with execution plans and Query Store.
  • Clustered = one per table, narrow and ever-increasing; nonclustered = many per table, match query patterns.
  • Composite column order: equality first, then range/sort; leading prefix only.
  • Covering (INCLUDE) removes key lookups; use when the plan shows a lookup on a hot path.
  • Indexes cost writes and space; maintain and drop unused indexes.

For enterprise applications, I offer consulting on database architecture, performance tuning, and EF Core optimization.

When I Would Use This Again — and When I Wouldn’t

I’d use this indexing approach again for any SQL Server or Azure SQL workload where reads dominate or specific queries are slow—align indexes with query patterns and measure. I wouldn’t add indexes without a plan and measurement; speculative indexing leads to write cost and maintenance without proof of benefit. I also wouldn’t cluster on GUID or a volatile key; the fragmentation cost is real. For small tables (e.g. < 10k rows) I often leave the default or add one targeted index; over-indexing small tables is unnecessary.

services
Frequently Asked Questions

Frequently Asked Questions

What is a clustered index?

A clustered index defines the physical order of the rows in the table. There is only one per table. The table is stored in that order. Choose a narrow, stable, ever-increasing key (e.g. Id or CreatedAt) to avoid page splits and fragmentation.

What is a nonclustered index?

A nonclustered index is a separate structure that stores key columns and pointers to the rows. The engine uses it to find rows by key without scanning the table. You can have many nonclustered indexes per table; each one speeds up certain queries but adds cost to writes.

What is a covering index?

A covering index contains all the columns a query needs (in the key or INCLUDE), so the engine can satisfy the query from the index alone and doesn’t have to do a key lookup to the table. That reduces I/O and often improves read performance a lot.

Why does column order matter in a composite index?

The index is sorted by the first column, then the second, and so on. The engine can only seek on a leading prefix. So (Status, CreatedAt) helps WHERE Status = 'X' ORDER BY CreatedAt but not WHERE CreatedAt > @date alone. Put equality columns first, then range or sort columns.

When should I use INCLUDE?

Use INCLUDE when you want a covering index but don’t want the extra columns in the key. Key columns determine order and seekability; INCLUDE columns are stored in the leaf for lookups. Add only the columns the query needs so the index doesn’t get too wide.

What is index fragmentation?

Fragmentation is when the physical pages of an index are out of order or have gaps from inserts and deletes. It can hurt read performance. Rebuild or reorganise the index when fragmentation is high (e.g. above 30%); rebuild is more thorough, reorganise is lighter.

What are statistics and why do they matter?

Statistics describe the distribution of values in columns (e.g. row counts, histograms). The engine uses them to choose execution plans. Stale or missing statistics lead to bad plans (e.g. scan instead of seek). Keep auto-update on and consider maintenance jobs for large or volatile tables.

How do I find unused indexes?

Use sys.dm_db_index_usage_stats to see how often each index was used for seeks and scans. If an index has zero or very low use over a representative period, consider dropping it to reduce write cost and storage.

Should the primary key be the clustered key?

Not necessarily, but often yes. The primary key must be unique; the clustered key defines physical order. A narrow, increasing primary key (e.g. Id) is a good choice for both. Avoid clustering on a GUID or other wide, random key if you care about insert performance.

How do indexes affect JOINs?

Indexes on join columns (e.g. CustomerId on Orders, Id on Customers) let the engine seek into the inner table instead of scanning. That turns a hash or nested-loop scan into index seeks and can dramatically reduce I/O and runtime.

What is a key lookup?

A key lookup is when the engine uses a nonclustered index to find rows but then has to fetch other columns from the table (clustered index). It’s an extra hop per row. A covering index (with INCLUDE) avoids key lookups by storing those columns in the index leaf.

When should I rebuild vs reorganise?

Rebuild creates a new copy of the index; use when fragmentation is high or you want a clean slate. Reorganise defragments in place and is lighter. Use reorganise for moderate fragmentation; use rebuild in maintenance windows when you need maximum effect.

How do I measure the impact of an index?

Run the query before and after adding the index. Compare execution plan (seek vs scan), estimated and actual rows, and runtime. Use Query Store or similar to track over time. Remove the index if it’s not used or if write cost isn’t worth the read gain.

What is a heap?

A heap is a table with no clustered index. Rows are stored without a defined order. Heaps are rare; most tables have a clustered index. Adding a clustered index to a heap can improve range scans and lookups.

How do I choose the clustered key for an existing table?

Prefer a narrow, stable, ever-increasing column (Id, CreatedAt). If the current key is wide or random (e.g. GUID), consider adding a new column (e.g. bigint identity) and clustering on that, then making the old key nonclustered if it’s still needed for uniqueness.

services
Related Guides & Resources

services
Related services