Hash table

Definition

A usually unordered data structure that maps keys to values.
A hash (preferably unique) is computed for a given key and determines at which bin/index the data will be stored.
Collision can happen when multiple keys are mapped to the same hash –> methods to solve collision problem.

Hashing function needs to be:

  • deterministic: always same value for a given input.
  • fast & easy to compute.
  • uniform: ensures uniform distribution of values.

Example of hashing functions:

  • Extraction: extract bits from the value to form the key.
  • Compression: bits are divided into blocks of equal lengths and combined logically (rotation to prevent anagram having same hash).
  • Division: .
  • Multiplication: weird multiplication using modulo.

Collision resolution methods

Indirect

Element with same hash are chained:

  • in a linked list = separate chaining
  • in an overflow area in the table = coalesced hashing if dynamic allocation is not possible (this is rarely seen).
    Colliding values are put in the end of table. For each value, the index of the next colliding element is stored.

Direct

Computes a hash until empty slot is found with varying functions = open-adressing.

There are different types of functions:

  • Linear probing: looks at next slot.
  • Quadratic probing: looks at the i²th slot at the ith iteration.
  • Double hashing (prevents clustering but costly): combine 2 hash functions.

Separate vs Open-adressing

Separate Open-adressing
Wastage of space (some slots unused) Every slots can be used even when h(x) does not map to it
Extra space for links No links
Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. Open addressing is used when the frequency and number of keys is known.

Pros and Cons

Pros:

  • Fast search/insertion/deletion on average: .

Cons:

  • No order.
  • Large constant time for lookup.
  • Data distributed randomly in memory.
  • Becomes quite inefficient when there are many collisions.

Complexity

Operation Complexity
Search O(n) but θ(1)
Insert O(n) but θ(1)
Delete O(n) but θ(1)

The performance of a hash table depends efficiency of hash, best case is the perfect hashing function, worst case is when all elements have same hash.
For every operation:

  • best case: direct access through hash.
  • worst case: goes through every colliding element.