Indexes
Database indexes are fundamental to database performance optimization. They allow for quick data retrieval by minimizing the amount of data the database system has to scan. Without indexes, databases would need to perform a full table scan for every query, which is inefficient for large datasets. Indexes are widely used in relational databases such as PostgreSQL, MySQL, and Oracle to enhance the performance of queries involving search, filtering, sorting, and joins.
What Is a Database Index?
A database index is a separate data structure that stores a mapping of column values to the physical location of rows in the table. It acts like a "lookup table" for the database, enabling it to quickly locate rows matching specific criteria.
Indexes are typically implemented using B-Trees, hash tables, or other optimized data structures. When a query references a column that has an index, the database uses the index to quickly identify the rows of interest instead of scanning the entire table.
How Indexes Work (Brief Overview)
- When an index is created on a column, the database builds a data structure (like a B-Tree or a hash table) that maps column values to the physical locations of rows in the table.
- The index stores sorted values of the indexed column(s) along with pointers to the rows where the values are found.
- For queries involving indexed columns, the database uses the index to efficiently locate rows, rather than scanning the entire table.
- For multi-column indexes (composite indexes), the order of columns in the index affects performance based on the query patterns.
B-Tree vs Hash Table
Indexes are crucial for optimizing database queries, and the two widely used index structures are B-Tree and Hash Table. Here's a concise comparison:
B-Tree Index
- Structure: Balanced tree with sorted data.
- Best For: Range queries (
BETWEEN,<,>, etc.) and ordered results. - Features:
- Supports both equality (
=) and range queries. - Allows ordered traversal and efficient sorting.
- Logarithmic (
O(log n)) performance for lookups.
- Supports both equality (
- Example:
CREATE INDEX idx_btree_salary ON employees(salary);
SELECT * FROM employees WHERE salary BETWEEN 50000 AND 100000;
Hash Table Index
- Structure: Hash function maps keys to slots.
- Best For: Exact-match lookups (
=). - Features:
- Optimized for equality queries.
- Faster constant-time (
O(1)) lookups but no range support. - Cannot handle sorting or ordering.
- Example:
CREATE INDEX idx_hash_username ON employees USING hash(username);
SELECT * FROM employees WHERE username = 'abhishek123';
Comparison Table
| Aspect | B-Tree Index | Hash Table Index |
|---|---|---|
| Query Support | Equality + Range (=, <, >, BETWEEN) | Equality only (=) |
| Sorting | Supports ordered traversal | No sorting |
| Performance | O(log n) for lookups | O(1) for exact matches |
| Use Case | General-purpose indexing | Optimized for exact-match lookups |
| Range Queries | Supported | Not supported |
| Space Efficiency | Higher due to tree structure | Compact |
| Collisions | Not applicable | Possible hash collisions |
Index Types
Indexes in databases are specialized data structures that optimize query performance by enabling fast data retrieval. They can be broadly categorized based on their structure, functionality, and use cases. Below are the most common types of database indexes:
Single-Column Index
- Description: An index created on a single column of a table.
- Use Case: Speeds up queries that filter or sort by a single column.
- Example (PostgreSQL):
CREATE INDEX idx_employee_id ON employees(employee_id);
Composite Index
- Description: An index created on multiple columns of a table.
- Use Case: Improves performance for queries that involve filtering or sorting on multiple columns.
- Example:
CREATE INDEX idx_name_dept ON employees(last_name, department_id); - Note: The order of columns in a composite index matters. For example, the above index is useful for queries that filter by
last_nameor by bothlast_nameanddepartment_id, but not for filtering solely bydepartment_id.
Unique Index
- Description: Ensures that all values in the indexed column(s) are unique.
- Use Case: Used to enforce uniqueness constraints (e.g., on primary keys or email addresses).
- Example:
CREATE UNIQUE INDEX idx_unique_email ON employees(email);
Primary Key Index
- Description: Automatically created when a
PRIMARY KEYconstraint is defined. It is a unique index. - Use Case: Enforces the uniqueness and non-nullability of the primary key column(s).
- Example:
ALTER TABLE employees ADD PRIMARY KEY (employee_id);
Clustered Index
- Description: Determines the physical order of data in the table. A table can have only one clustered index.
- Use Case: Optimizes range queries and queries requiring ordered results.
- Example (PostgreSQL uses
PRIMARY KEYorUNIQUEconstraints to create clustered indexes):CREATE INDEX idx_ordered_employees ON employees(employee_id);
Non-Clustered Index
- Description: Does not affect the physical order of data in the table. Stores pointers to the rows in a separate data structure.
- Use Case: Useful for columns frequently queried but not requiring data to be stored in order.
- Example:
CREATE INDEX idx_lastname ON employees(last_name);
Full-Text Index
- Description: Optimized for searching large text fields.
- Use Case: Used for full-text search in applications like search engines.
- Example:
CREATE INDEX idx_fulltext_description ON products USING gin(to_tsvector('english', description));
Partial Index
- Description: Indexes only a subset of rows based on a condition.
- Use Case: Reduces index size and improves performance by indexing only frequently queried rows.
- Example:
CREATE INDEX idx_active_employees ON employees(employee_id) WHERE status = 'active';