Additional data structures that are used by databases to improve the performance of reads on a specific key value at the expense of slowing down writes.
types of index data structures
trade-off analysis
| Type of index | Read Performance | Write Performance | Range Query? | Storage |
|---|---|---|---|---|
| Hash index | in-memory | |||
| B-tree | , where is the size of the tree and is the size of the dataset returned | disk | ||
| LSM tree + SSTables | , where is the size of the tree and is the total size of all of the SSTables. | in-memory + disk |
recovering from faults
All the indexes listed above can make use of write-ahead logs (WALs) to ensure that faults don’t cause dropped writes.
A write-ahead log is used as so:
- A write needs to be made into the index.
- Before writing anything to the index, a record is appended to the WAL.
- The write is made to the index.
- If the write is successful, we “cross off” the record in the WAL.
- If the system crashes before the write completes, the system will first complete any incomplete operations in the WAL before moving on to new write requests.
co-locating data
Indexes contain key-value pairs; the values in these pairs can either be the actual data, or just a pointer to the data.
The three methods here include:
- Clustered index: The index stores the actual data.
- Non-clustered index: The index stores pointers to the data.
- Covered index: The index stores mostly pointers, but keeps data for some commonly accessed keys.
In general, a clustered index sacrifices storage efficiency in favor of faster reads.
Non-clustered indexes are more common in real-life use cases because we often have multiple indexes. If we used clustered indexes, we’d have to duplicate all of the data.
multi-dimensional indexes
Indexes can be created based on multiple columns.
For example, we could index on both name and then date. Everything stays the same, but we’d just sort with name as a priority and then tiebreak with date.