Hash table
Definition
A usually unordered data structure that maps keys to values.
A hash (preferably unique) is computed for a given key and determines at which
bin/index the data will be stored.
Collision can happen when multiple keys are mapped to the same hash –> methods
to solve collision problem.
Hashing function needs to be:
- deterministic: always same value for a given input.
- fast & easy to compute.
- uniform: ensures uniform distribution of values.
Example of hashing functions:
- Extraction: extract bits from the value to form the key.
- Compression: bits are divided into blocks of equal lengths and combined logically (rotation to prevent anagram having same hash).
- Division: .
- Multiplication: weird multiplication using modulo.
Collision resolution methods
Indirect
Element with same hash are chained:
- in a linked list = separate chaining
- in an overflow area in the table = coalesced hashing if dynamic allocation
is not possible (this is rarely seen).
Colliding values are put in the end of table. For each value, the index of the next colliding element is stored.
Direct
Computes a hash until empty slot is found with varying functions = open-adressing.
There are different types of functions:
- Linear probing: looks at next slot.
- Quadratic probing: looks at the i²th slot at the ith iteration.
- Double hashing (prevents clustering but costly): combine 2 hash functions.
Separate vs Open-adressing
Separate | Open-adressing |
---|---|
Wastage of space (some slots unused) | Every slots can be used even when h(x) does not map to it |
Extra space for links | No links |
Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. | Open addressing is used when the frequency and number of keys is known. |
Pros and Cons
Pros:
- Fast search/insertion/deletion on average: .
Cons:
- No order.
- Large constant time for lookup.
- Data distributed randomly in memory.
- Becomes quite inefficient when there are many collisions.
Complexity
Operation | Complexity |
---|---|
Search | O(n) but θ(1) |
Insert | O(n) but θ(1) |
Delete | O(n) but θ(1) |
The performance of a hash table depends efficiency of hash, best case is the
perfect hashing function, worst case is when all elements have same hash.
For every operation:
- best case: direct access through hash.
- worst case: goes through every colliding element.