Hash Tables

Hash Tables: Introduction

In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored (Source: wikipedia). A simple real-world example of a hash table would be a dictionary. (See reference).

Hash functions are used to convert a string into a number. The number is used to index into a hash table. Hash functions are

one way: we can convert a string into a number, but we can't convert a number back into a string.

deterministic: the same string will always have the same number.

Dictionary is Python's built-in hash table data structure.

HT: Collisions

Collisions are the result of a hash function producing a number that is already in use. In a hash table, collisions are resolved by using a separate chaining technique. Another way of dealing with collisions is to move to the next bucket until an empty slot is found - this is called linear probing. Linear probing is one of the open addressing methods.

HT: Constructor

class HashTable:
    def __init__(self, size = 7):
        self.data_map = [None] * size
      
    def __hash(self, key):
        my_hash = 0
        for letter in key:
            my_hash = (my_hash + ord(letter) * 23) % len(self.data_map)
        return my_hash  

    def print_table(self):
        for i, val in enumerate(self.data_map): 
            print(i, ": ", val)

my_hash_table = HashTable()

my_hash_table.print_table()

HT: Set

    def set_item(self, key, value):
        index = self.__hash(key)
        if self.data_map[index] == None:
            self.data_map[index] = []
        self.data_map[index].append([key, value])
        
my_hash_table = HashTable()
my_hash_table.set_item('bolts', 1400)
my_hash_table.set_item('washers', 50)
my_hash_table.set_item('lumber', 70)
my_hash_table.print_table()

HT: Keys

    def keys(self):
        all_keys = []
        for i in range(len(self.data_map)):
            if self.data_map[i] is not None:
                for j in range(len(self.data_map[i])):
                    all_keys.append(self.data_map[i][j][0])
        return all_keys

my_hash_table = HashTable()
my_hash_table.set_item('bolts', 1400)
my_hash_table.set_item('washers', 50)
my_hash_table.set_item('lumber', 70)
print(my_hash_table.keys())

HT: Big O

Hash method: O(1)

Get item: O(1). The distribution of the data in the hash table is assumed to be uniform.

HT: Interview Questions

We have two list [1, 3, 5] and [2, 4, 5]. We want to find if the two lists have any common elements. If yes return true, otherwise return false.

Naive approach: create a nested for loop and check if the current element in the first list is in the second list. The time complexity is O(n^2).

Naive approach code:

def item_in_common(list1, list2):
    for i in list1:
        for j in list2:
            if i == j:
                return True
    return False

list1 = [1,3,5]
list2 = [2,4,5]
print(item_in_common(list1, list2))

More efficient approach: create a hash table and store the elements in the first list in the hash table. Then check if the current element in the second list is in the hash table. The time complexity is O(n).

def item_in_common(list1, list2):
    my_dict = {}
    for i in list1:
        my_dict[i] = True
    for j in list2:
        if j in my_dict:
            return True
    return False

list1 = [1,3,5]
list2 = [2,4,6]
print(item_in_common(list1, list2))

We only need to loop one time for each list thus it is more efficient.

Quiz

  • Both Insert and Lookup by key in a Hash Table is O(1): True

  • Since a Hash Table is O(1) for Insert and Lookup it is always better than a Binary Search Tree: False (Binary Search Trees are sorted which makes them better at searching for all values that fall within a range.)

  • Looking up a value in a Hash Table is O(1): False (Key lookup is O(1) but not value.)

Last updated