Hash Tables

Notes

This data structure is used frequently in day to day programming ie very common

A Hash table takes a key-value pair and stores it, and when given a key, it quickly returns the pair.

Key Functions

Now, given a name, return a number quickly.

Normally, if we would use an array, this would be a slow operation

The lookup time for a hash table look up function is very close to constant time

This can be mapped to an entire object, not just a single value

To achieve this constant time look up works we need to dive a little deep.

Open Hashing - Separate Chaining

Hashing - have an array with keys. Create a function which maps keys to array elements. Hash

A hash function will take a key and return an index to the location in memory where the key is stored.

For a good hash function, we want items well distributed in the hash table

a simple hash function would be say, using the modulo operator by 10...

so hash(32) would return an index of 2. Like wise, hash(97) would give us an index 7...but so would hash(177) , and hash(37) so we can chan this to the 7

Pasted image 20220726182750.png

In reality though, our hash table should be much much larger(generally twice the size of what we think we'd need)

Array size should also be a prime number so data is more evenly spread (since you always mod by the size), this way, handling collisions don't lead to problems.

So now, to look up the key for 37. We look up the hash value for 37, then do a linear search for the key we're looking for.

Hash collisions cause the lookup to not be in constant time

Closed Hashing / Open addressing

Linear Probing

Pasted image 20220726184005.png
When we insert 32, we do a mod by 10 and place 32 in index 2

With linear probing, if we have a collision, we look for the next empty slot and place it there.

This leads to primary clustering when there is a collision the situation gets worse.

Quadratic Probing

Pasted image 20220726184229.png
With quadratic probing, we initially try the index, if that is taken we do the index plus i^2, if that is taken, we increment i where i is the i'th collision. This spreads out the clustering.

Double Hashing

Pasted image 20220726184408.png

In double hashing we use two hash functions
example
h1(x) = x % 10
h2(x) = 7 - (x % 7)

h1(89) = 9
h1(18) = 8
h1(49) = 9, h2(49) = 7- (49 % 7) = 7
-> try h1(49) + 1 * h2(49) = 9 +1 * 7 = 16 (% 10 ) then we run through h1 again h1(16) = 6

Load Factor

The load of your current hash table ...ie what percentage of your allocated memory is occupied...so if you have a hash map of 10, and you have 6 items stored, then your load factor is 6/10 or α=nc

In closed hashing, your load factor will always be 1
However, in open hashing, there is no limit for how many items you can change, so your size may be far greater than one.

Our goal is to always have a small load factor as it reduces the likelihood of collisions

Bloom Filter

A bloom filter is a bit vector that is instantiated to all bits set to 0. When we add something to a bloom filter, we apply it through multiple hash functions. Each hash function will produce a different result, then we will sit the index = to 1.

Once the bit is set, it will always remain set, it can never reset back to zero.
Pasted image 20220726200834.png

Bloom filters are used for testing whether a key has been previously inserted into a table or not.

It can tell very quickly if an item does not exist , or if it may exist

As long as one of the bits is not set to one, you can be certain that a key does not exist
Pasted image 20220726200940.png

Questions

How do we decide on the hash function

It depends on the set of data, there is no universally fit hash function

Snippets

Terms

Resources

Bloom Filter