Hash Tables in Python – DSA

    Table of Contents

    1. Introduction
    2. Hash Table Fundamentals
    3. Python Dictionary (Built-in Hash Table)
    4. Hash Functions
    5. Collision Resolution
    6. Implementing Hash Table from Scratch
    7. Common Operations
    8. Time and Space Complexity
    9. Hash Table Patterns
    10. Advanced Concepts
    11. Practice Problems
    12. Tips and Best Practices

    Introduction

    A Hash Table (or Hash Map) is a data structure that implements an associative array abstract data type, mapping keys to values. It uses a hash function to compute an index (hash code) into an array of buckets or slots, from which the desired value can be found.

    Key Characteristics:

    • Fast Access: O(1) average case for search, insert, and delete
    • Key-Value Pairs: Stores data as key-value associations
    • Hash Function: Converts keys into array indices
    • Collision Handling: Manages cases where multiple keys hash to same index

    Real-World Applications:

    • Database indexing
    • Caching systems (LRU Cache, Redis)
    • Symbol tables in compilers
    • Password verification
    • Blockchain (cryptographic hashing)
    • Spell checkers
    • Router tables
    • Implementing sets and maps

    Hash Table Fundamentals

    How Hash Tables Work

    Key → Hash Function → Index → Value
    
    Example:
    "apple"hash("apple")421.50
    "banana"hash("banana")170.75
    Python

    Core Components:

    1. Hash Function: Converts key to array index
    2. Bucket Array: Stores the key-value pairs
    3. Collision Resolution: Handles hash collisions
    4. Load Factor: Ratio of elements to bucket size

    Hash Table Structure

    class HashTable:
        def __init__(self, size=10):
            self.size = size
            self.table = [[] for _ in range(size)]  # Using chaining
    
        def _hash(self, key):
            """Hash function to compute index"""
            return hash(key) % self.size
    Python

    Python Dictionary (Built-in Hash Table)

    Python’s dict is a highly optimized hash table implementation.

    Creating Dictionaries

    # Empty dictionary
    empty_dict = {}
    empty_dict = dict()
    
    # Dictionary with initial values
    person = {
        'name': 'Alice',
        'age': 30,
        'city': 'New York'
    }
    
    # Using dict() constructor
    person = dict(name='Alice', age=30, city='New York')
    
    # From list of tuples
    items = [('a', 1), ('b', 2), ('c', 3)]
    d = dict(items)
    
    # Dictionary comprehension
    squares = {x: x**2 for x in range(10)}
    
    # From two lists
    keys = ['a', 'b', 'c']
    values = [1, 2, 3]
    d = dict(zip(keys, values))
    Python

    Basic Operations

    # Access elements
    person = {'name': 'Alice', 'age': 30}
    print(person['name'])  # 'Alice'
    print(person.get('age'))  # 30
    print(person.get('salary', 0))  # 0 (default value)
    
    # Add/Update elements
    person['city'] = 'Boston'  # Add new key
    person['age'] = 31  # Update existing key
    person.update({'job': 'Engineer', 'age': 32})
    
    # Delete elements
    del person['city']
    age = person.pop('age')  # Remove and return value
    person.popitem()  # Remove and return last (key, value) pair
    person.clear()  # Remove all items
    
    # Check key existence
    if 'name' in person:
        print("Name exists")
    
    # Get all keys, values, items
    keys = person.keys()
    values = person.values()
    items = person.items()
    
    # Length
    length = len(person)
    Python

    Advanced Dictionary Methods

    # setdefault - get value or set default if key doesn't exist
    d = {'a': 1}
    value = d.setdefault('b', 0)  # Returns 0 and sets d['b'] = 0
    
    # fromkeys - create dict from sequence of keys
    keys = ['a', 'b', 'c']
    d = dict.fromkeys(keys, 0)  # {'a': 0, 'b': 0, 'c': 0}
    
    # copy - shallow copy
    d2 = d.copy()
    
    # Merge dictionaries (Python 3.9+)
    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 3, 'c': 4}
    merged = d1 | d2  # {'a': 1, 'b': 3, 'c': 4}
    
    # Update in place
    d1 |= d2
    
    # Older merge methods
    merged = {**d1, **d2}
    merged = dict(d1, **d2)
    Python

    Iteration

    person = {'name': 'Alice', 'age': 30, 'city': 'Boston'}
    
    # Iterate over keys
    for key in person:
        print(key)
    
    for key in person.keys():
        print(key)
    
    # Iterate over values
    for value in person.values():
        print(value)
    
    # Iterate over key-value pairs
    for key, value in person.items():
        print(f"{key}: {value}")
    
    # Enumerate with index
    for i, (key, value) in enumerate(person.items()):
        print(f"{i}: {key} = {value}")
    Python

    Hash Functions

    A good hash function should:

    1. Be deterministic (same input → same output)
    2. Uniformly distribute keys
    3. Be fast to compute
    4. Minimize collisions

    Built-in hash() Function

    # Works with immutable types
    print(hash(42))           # Integer
    print(hash("hello"))      # String
    print(hash((1, 2, 3)))    # Tuple
    
    # Not hashable (mutable types)
    # hash([1, 2, 3])         # TypeError: unhashable type: 'list'
    # hash({'a': 1})          # TypeError: unhashable type: 'dict'
    Python

    Custom Hash Functions

    # Simple hash function for strings
    def simple_hash(key, size):
        """Sum of ASCII values mod size"""
        return sum(ord(c) for c in str(key)) % size
    
    # Polynomial rolling hash
    def polynomial_hash(key, size, prime=31):
        """Better distribution using polynomial"""
        hash_value = 0
        for char in str(key):
            hash_value = (hash_value * prime + ord(char)) % size
        return hash_value
    
    # DJB2 hash (popular string hash)
    def djb2_hash(key, size):
        hash_value = 5381
        for char in str(key):
            hash_value = ((hash_value << 5) + hash_value) + ord(char)
        return hash_value % size
    
    # Using Python's built-in hash
    def builtin_hash(key, size):
        return hash(key) % size
    Python

    Making Custom Objects Hashable

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age
    
        def __hash__(self):
            """Make object hashable"""
            return hash((self.name, self.age))
    
        def __eq__(self, other):
            """Required for hash to work properly"""
            if not isinstance(other, Person):
                return False
            return self.name == other.name and self.age == other.age
    
        def __repr__(self):
            return f"Person('{self.name}', {self.age})"
    
    # Now can use as dictionary key
    p1 = Person("Alice", 30)
    p2 = Person("Alice", 30)
    d = {p1: "Engineer"}
    print(d[p2])  # Works! Returns "Engineer"
    Python

    Collision Resolution

    When two keys hash to the same index, we have a collision.

    1. Chaining (Separate Chaining)

    Store multiple elements at the same index using a list/linked list.

    class HashTableChaining:
        def __init__(self, size=10):
            self.size = size
            self.table = [[] for _ in range(size)]
    
        def _hash(self, key):
            return hash(key) % self.size
    
        def insert(self, key, value):
            """Insert or update key-value pair"""
            hash_index = self._hash(key)
            bucket = self.table[hash_index]
    
            # Update if key exists
            for i, (k, v) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
    
            # Add new key-value pair
            bucket.append((key, value))
    
        def get(self, key):
            """Retrieve value by key"""
            hash_index = self._hash(key)
            bucket = self.table[hash_index]
    
            for k, v in bucket:
                if k == key:
                    return v
    
            raise KeyError(f"Key '{key}' not found")
    
        def delete(self, key):
            """Remove key-value pair"""
            hash_index = self._hash(key)
            bucket = self.table[hash_index]
    
            for i, (k, v) in enumerate(bucket):
                if k == key:
                    del bucket[i]
                    return v
    
            raise KeyError(f"Key '{key}' not found")
    
        def __contains__(self, key):
            """Check if key exists"""
            try:
                self.get(key)
                return True
            except KeyError:
                return False
    
        def __str__(self):
            items = []
            for bucket in self.table:
                for key, value in bucket:
                    items.append(f"{key}: {value}")
            return "{" + ", ".join(items) + "}"
    Python

    2. Open Addressing

    Store all elements in the hash table itself (no chaining).

    a) Linear Probing

    class HashTableLinearProbing:
        def __init__(self, size=10):
            self.size = size
            self.keys = [None] * size
            self.values = [None] * size
            self.count = 0
    
        def _hash(self, key):
            return hash(key) % self.size
    
        def _probe(self, index):
            """Linear probing: (index + 1) % size"""
            return (index + 1) % self.size
    
        def _resize(self):
            """Resize when load factor > 0.7"""
            old_keys = self.keys
            old_values = self.values
    
            self.size *= 2
            self.keys = [None] * self.size
            self.values = [None] * self.size
            self.count = 0
    
            for i in range(len(old_keys)):
                if old_keys[i] is not None:
                    self.insert(old_keys[i], old_values[i])
    
        def insert(self, key, value):
            """Insert or update key-value pair"""
            if self.count / self.size > 0.7:
                self._resize()
    
            index = self._hash(key)
    
            while self.keys[index] is not None:
                if self.keys[index] == key:
                    # Update existing key
                    self.values[index] = value
                    return
                index = self._probe(index)
    
            # Insert new key
            self.keys[index] = key
            self.values[index] = value
            self.count += 1
    
        def get(self, key):
            """Retrieve value by key"""
            index = self._hash(key)
    
            while self.keys[index] is not None:
                if self.keys[index] == key:
                    return self.values[index]
                index = self._probe(index)
    
            raise KeyError(f"Key '{key}' not found")
    
        def delete(self, key):
            """Delete key-value pair"""
            index = self._hash(key)
    
            while self.keys[index] is not None:
                if self.keys[index] == key:
                    self.keys[index] = None
                    self.values[index] = None
                    self.count -= 1
    
                    # Rehash subsequent entries
                    self._rehash_cluster(index)
                    return
                index = self._probe(index)
    
            raise KeyError(f"Key '{key}' not found")
    
        def _rehash_cluster(self, start):
            """Rehash entries after deletion"""
            index = self._probe(start)
    
            while self.keys[index] is not None:
                key = self.keys[index]
                value = self.values[index]
    
                self.keys[index] = None
                self.values[index] = None
                self.count -= 1
    
                self.insert(key, value)
                index = self._probe(index)
    Python

    b) Quadratic Probing

    class HashTableQuadraticProbing:
        def __init__(self, size=10):
            self.size = size
            self.keys = [None] * size
            self.values = [None] * size
    
        def _hash(self, key):
            return hash(key) % self.size
    
        def _probe(self, index, i):
            """Quadratic probing: (index + i²) % size"""
            return (index + i * i) % self.size
    
        def insert(self, key, value):
            index = self._hash(key)
            i = 0
    
            while self.keys[index] is not None and self.keys[index] != key:
                i += 1
                index = self._probe(self._hash(key), i)
    
            self.keys[index] = key
            self.values[index] = value
    Python

    c) Double Hashing

    class HashTableDoubleHashing:
        def __init__(self, size=10):
            self.size = size
            self.keys = [None] * size
            self.values = [None] * size
    
        def _hash1(self, key):
            return hash(key) % self.size
    
        def _hash2(self, key):
            """Second hash function"""
            return 7 - (hash(key) % 7)  # Prime number less than size
    
        def _probe(self, key, i):
            """Double hashing: (h1(key) + i * h2(key)) % size"""
            return (self._hash1(key) + i * self._hash2(key)) % self.size
    
        def insert(self, key, value):
            i = 0
            index = self._probe(key, i)
    
            while self.keys[index] is not None and self.keys[index] != key:
                i += 1
                index = self._probe(key, i)
    
            self.keys[index] = key
            self.values[index] = value
    Python

    Implementing Hash Table from Scratch

    Complete implementation with all features:

    class HashTable:
        def __init__(self, capacity=10):
            self.capacity = capacity
            self.size = 0
            self.table = [[] for _ in range(capacity)]
    
        def _hash(self, key):
            """Hash function using Python's built-in hash"""
            return hash(key) % self.capacity
    
        def _load_factor(self):
            """Calculate current load factor"""
            return self.size / self.capacity
    
        def _resize(self):
            """Resize table when load factor exceeds 0.7"""
            old_table = self.table
            self.capacity *= 2
            self.table = [[] for _ in range(self.capacity)]
            self.size = 0
    
            for bucket in old_table:
                for key, value in bucket:
                    self.insert(key, value)
    
        def insert(self, key, value):
            """Insert or update key-value pair - O(1) average"""
            # Resize if needed
            if self._load_factor() > 0.7:
                self._resize()
    
            index = self._hash(key)
            bucket = self.table[index]
    
            # Update if key exists
            for i, (k, v) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
    
            # Add new key-value pair
            bucket.append((key, value))
            self.size += 1
    
        def get(self, key):
            """Get value by key - O(1) average"""
            index = self._hash(key)
            bucket = self.table[index]
    
            for k, v in bucket:
                if k == key:
                    return v
    
            raise KeyError(f"Key '{key}' not found")
    
        def delete(self, key):
            """Delete key-value pair - O(1) average"""
            index = self._hash(key)
            bucket = self.table[index]
    
            for i, (k, v) in enumerate(bucket):
                if k == key:
                    del bucket[i]
                    self.size -= 1
                    return v
    
            raise KeyError(f"Key '{key}' not found")
    
        def contains(self, key):
            """Check if key exists - O(1) average"""
            try:
                self.get(key)
                return True
            except KeyError:
                return False
    
        def keys(self):
            """Get all keys"""
            all_keys = []
            for bucket in self.table:
                for key, _ in bucket:
                    all_keys.append(key)
            return all_keys
    
        def values(self):
            """Get all values"""
            all_values = []
            for bucket in self.table:
                for _, value in bucket:
                    all_values.append(value)
            return all_values
    
        def items(self):
            """Get all key-value pairs"""
            all_items = []
            for bucket in self.table:
                for key, value in bucket:
                    all_items.append((key, value))
            return all_items
    
        def clear(self):
            """Remove all items"""
            self.table = [[] for _ in range(self.capacity)]
            self.size = 0
    
        def __len__(self):
            """Return number of items"""
            return self.size
    
        def __contains__(self, key):
            """Support 'in' operator"""
            return self.contains(key)
    
        def __getitem__(self, key):
            """Support dict[key] syntax"""
            return self.get(key)
    
        def __setitem__(self, key, value):
            """Support dict[key] = value syntax"""
            self.insert(key, value)
    
        def __delitem__(self, key):
            """Support del dict[key] syntax"""
            self.delete(key)
    
        def __str__(self):
            """String representation"""
            items = [f"{k!r}: {v!r}" for k, v in self.items()]
            return "{" + ", ".join(items) + "}"
    
        def __repr__(self):
            return f"HashTable({self})"
    
    # Usage
    ht = HashTable()
    ht['name'] = 'Alice'
    ht['age'] = 30
    print(ht['name'])  # Alice
    print('age' in ht)  # True
    del ht['age']
    print(len(ht))  # 1
    Python

    Common Operations

    1. Frequency Counter

    def count_frequency(arr):
        """Count frequency of elements - O(n)"""
        freq = {}
        for item in arr:
            freq[item] = freq.get(item, 0) + 1
        return freq
    
    # Using defaultdict
    from collections import defaultdict
    
    def count_frequency_defaultdict(arr):
        freq = defaultdict(int)
        for item in arr:
            freq[item] += 1
        return dict(freq)
    
    # Using Counter
    from collections import Counter
    
    def count_frequency_counter(arr):
        return Counter(arr)
    
    # Example
    arr = [1, 2, 2, 3, 3, 3, 4]
    print(count_frequency(arr))  # {1: 1, 2: 2, 3: 3, 4: 1}
    Python

    2. Group Anagrams

    def group_anagrams(words):
        """
        Group words that are anagrams - O(n * k log k)
        where n is number of words, k is max word length
        """
        groups = {}
    
        for word in words:
            # Sort characters as key
            key = ''.join(sorted(word))
            if key not in groups:
                groups[key] = []
            groups[key].append(word)
    
        return list(groups.values())
    
    # Using defaultdict
    from collections import defaultdict
    
    def group_anagrams_v2(words):
        groups = defaultdict(list)
        for word in words:
            key = ''.join(sorted(word))
            groups[key].append(word)
        return list(groups.values())
    
    # Example
    words = ["eat", "tea", "tan", "ate", "nat", "bat"]
    print(group_anagrams(words))
    # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
    Python

    3. Two Sum Problem

    def two_sum(nums, target):
        """
        Find two numbers that sum to target - O(n)
        """
        seen = {}
    
        for i, num in enumerate(nums):
            complement = target - num
            if complement in seen:
                return [seen[complement], i]
            seen[num] = i
    
        return []
    
    # Example
    nums = [2, 7, 11, 15]
    target = 9
    print(two_sum(nums, target))  # [0, 1]
    Python

    4. First Unique Character

    def first_unique_char(s):
        """
        Find first non-repeating character - O(n)
        """
        freq = {}
    
        # Count frequencies
        for char in s:
            freq[char] = freq.get(char, 0) + 1
    
        # Find first with frequency 1
        for i, char in enumerate(s):
            if freq[char] == 1:
                return i
    
        return -1
    
    # Example
    s = "leetcode"
    print(first_unique_char(s))  # 0 ('l')
    Python

    5. Subarray Sum Equals K

    def subarray_sum_k(nums, k):
        """
        Count subarrays with sum equal to k - O(n)
        """
        count = 0
        prefix_sum = 0
        sum_count = {0: 1}  # prefix_sum: frequency
    
        for num in nums:
            prefix_sum += num
    
            # Check if (prefix_sum - k) exists
            if (prefix_sum - k) in sum_count:
                count += sum_count[prefix_sum - k]
    
            # Add current prefix_sum
            sum_count[prefix_sum] = sum_count.get(prefix_sum, 0) + 1
    
        return count
    
    # Example
    nums = [1, 1, 1]
    k = 2
    print(subarray_sum_k(nums, k))  # 2
    Python

    Time and Space Complexity

    Average Case Complexity

    OperationTimeSpace
    InsertO(1)O(1)
    DeleteO(1)O(1)
    SearchO(1)O(1)
    Access by keyO(1)O(1)

    Worst Case Complexity

    OperationTimeSpace
    InsertO(n)O(1)
    DeleteO(n)O(1)
    SearchO(n)O(1)

    Note: Worst case O(n) occurs when all keys hash to same index (all collisions).

    Load Factor Impact

    Load Factor = Number of Elements / Table Size
    
    Optimal Load Factor: 0.7 - 0.75
    
    - Too low: Wastes memory
    - Too high: More collisions, slower operations
    Python

    Hash Table Patterns

    1. Hash Set Pattern

    # Check for duplicates - O(n)
    def contains_duplicate(nums):
        seen = set()
        for num in nums:
            if num in seen:
                return True
            seen.add(num)
        return False
    
    # Alternative
    def contains_duplicate_v2(nums):
        return len(nums) != len(set(nums))
    Python

    2. Hash Map for Counting

    # Most frequent element
    def most_frequent(nums):
        """O(n) time, O(n) space"""
        freq = {}
        max_count = 0
        result = None
    
        for num in nums:
            freq[num] = freq.get(num, 0) + 1
            if freq[num] > max_count:
                max_count = freq[num]
                result = num
    
        return result
    
    # Using Counter
    from collections import Counter
    
    def most_frequent_v2(nums):
        return Counter(nums).most_common(1)[0][0]
    Python

    3. Hash Map for Grouping

    # Group elements by property
    def group_by_length(words):
        """Group words by length"""
        groups = {}
        for word in words:
            length = len(word)
            if length not in groups:
                groups[length] = []
            groups[length].append(word)
        return groups
    
    # Using defaultdict
    from collections import defaultdict
    
    def group_by_length_v2(words):
        groups = defaultdict(list)
        for word in words:
            groups[len(word)].append(word)
        return dict(groups)
    Python

    4. Hash Map for Tracking Indices

    # Find all indices of target sum
    def two_sum_all(nums, target):
        """Find all pairs that sum to target"""
        seen = {}
        pairs = []
    
        for i, num in enumerate(nums):
            complement = target - num
            if complement in seen:
                for j in seen[complement]:
                    pairs.append([j, i])
    
            if num not in seen:
                seen[num] = []
            seen[num].append(i)
    
        return pairs
    Python

    5. Sliding Window with Hash Map

    # Longest substring without repeating characters
    def longest_unique_substring(s):
        """O(n) time, O(min(m, n)) space"""
        char_index = {}
        max_length = 0
        start = 0
    
        for end, char in enumerate(s):
            if char in char_index and char_index[char] >= start:
                start = char_index[char] + 1
    
            char_index[char] = end
            max_length = max(max_length, end - start + 1)
    
        return max_length
    
    # Example
    s = "abcabcbb"
    print(longest_unique_substring(s))  # 3 ("abc")
    Python

    6. Hash Map for Prefix/Suffix

    # Longest common prefix using hash map
    def longest_common_prefix(strs):
        if not strs:
            return ""
    
        prefix_map = {}
    
        for i, char in enumerate(strs[0]):
            prefix = strs[0][:i+1]
            if all(s.startswith(prefix) for s in strs):
                prefix_map[len(prefix)] = prefix
            else:
                break
    
        if not prefix_map:
            return ""
    
        max_len = max(prefix_map.keys())
        return prefix_map[max_len]
    Python

    Advanced Concepts

    1. LRU Cache Implementation

    from collections import OrderedDict
    
    class LRUCache:
        """Least Recently Used Cache using OrderedDict"""
    
        def __init__(self, capacity):
            self.cache = OrderedDict()
            self.capacity = capacity
    
        def get(self, key):
            """Get value and mark as recently used"""
            if key not in self.cache:
                return -1
    
            # Move to end (most recent)
            self.cache.move_to_end(key)
            return self.cache[key]
    
        def put(self, key, value):
            """Put key-value and mark as recently used"""
            if key in self.cache:
                # Update and move to end
                self.cache.move_to_end(key)
    
            self.cache[key] = value
    
            # Remove least recently used if over capacity
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
    
    # Manual implementation with doubly linked list
    class Node:
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.prev = None
            self.next = None
    
    class LRUCacheManual:
        def __init__(self, capacity):
            self.capacity = capacity
            self.cache = {}  # key -> Node
            self.head = Node(0, 0)  # Dummy head
            self.tail = Node(0, 0)  # Dummy tail
            self.head.next = self.tail
            self.tail.prev = self.head
    
        def _add_node(self, node):
            """Add node right after head"""
            node.prev = self.head
            node.next = self.head.next
            self.head.next.prev = node
            self.head.next = node
    
        def _remove_node(self, node):
            """Remove node from list"""
            prev = node.prev
            next = node.next
            prev.next = next
            next.prev = prev
    
        def _move_to_head(self, node):
            """Move node to head (most recent)"""
            self._remove_node(node)
            self._add_node(node)
    
        def _pop_tail(self):
            """Remove least recently used node"""
            node = self.tail.prev
            self._remove_node(node)
            return node
    
        def get(self, key):
            if key not in self.cache:
                return -1
    
            node = self.cache[key]
            self._move_to_head(node)
            return node.value
    
        def put(self, key, value):
            if key in self.cache:
                node = self.cache[key]
                node.value = value
                self._move_to_head(node)
            else:
                node = Node(key, value)
                self.cache[key] = node
                self._add_node(node)
    
                if len(self.cache) > self.capacity:
                    tail = self._pop_tail()
                    del self.cache[tail.key]
    Python

    2. Consistent Hashing

    import bisect
    import hashlib
    
    class ConsistentHashing:
        """Used in distributed systems for load balancing"""
    
        def __init__(self, nodes=None, replicas=3):
            self.replicas = replicas
            self.ring = {}  # hash -> node
            self.sorted_keys = []
    
            if nodes:
                for node in nodes:
                    self.add_node(node)
    
        def _hash(self, key):
            """Hash function using MD5"""
            return int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    
        def add_node(self, node):
            """Add node to the ring"""
            for i in range(self.replicas):
                virtual_key = f"{node}:{i}"
                hash_key = self._hash(virtual_key)
                self.ring[hash_key] = node
                bisect.insort(self.sorted_keys, hash_key)
    
        def remove_node(self, node):
            """Remove node from the ring"""
            for i in range(self.replicas):
                virtual_key = f"{node}:{i}"
                hash_key = self._hash(virtual_key)
                del self.ring[hash_key]
                self.sorted_keys.remove(hash_key)
    
        def get_node(self, key):
            """Get node responsible for key"""
            if not self.ring:
                return None
    
            hash_key = self._hash(key)
            idx = bisect.bisect(self.sorted_keys, hash_key)
    
            # Wrap around if necessary
            if idx == len(self.sorted_keys):
                idx = 0
    
            return self.ring[self.sorted_keys[idx]]
    Python

    3. Bloom Filter

    import hashlib
    
    class BloomFilter:
        """Probabilistic data structure for membership testing"""
    
        def __init__(self, size=1000, hash_count=3):
            self.size = size
            self.hash_count = hash_count
            self.bit_array = [False] * size
    
        def _hashes(self, item):
            """Generate multiple hash values"""
            hashes = []
            for i in range(self.hash_count):
                hash_input = f"{item}{i}".encode()
                hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
                hashes.append(hash_value % self.size)
            return hashes
    
        def add(self, item):
            """Add item to filter"""
            for hash_value in self._hashes(item):
                self.bit_array[hash_value] = True
    
        def contains(self, item):
            """Check if item might be in filter"""
            return all(self.bit_array[h] for h in self._hashes(item))
    
        # Note: False positives possible, false negatives impossible
    Python

    4. Count-Min Sketch

    import hashlib
    
    class CountMinSketch:
        """Probabilistic frequency counter for streams"""
    
        def __init__(self, width=1000, depth=5):
            self.width = width
            self.depth = depth
            self.table = [[0] * width for _ in range(depth)]
    
        def _hashes(self, item):
            """Generate hash values for each row"""
            hashes = []
            for i in range(self.depth):
                hash_input = f"{item}{i}".encode()
                hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
                hashes.append(hash_value % self.width)
            return hashes
    
        def add(self, item, count=1):
            """Add item to sketch"""
            for i, hash_value in enumerate(self._hashes(item)):
                self.table[i][hash_value] += count
    
        def estimate(self, item):
            """Estimate frequency of item"""
            return min(
                self.table[i][h]
                for i, h in enumerate(self._hashes(item))
            )
    Python

    5. HashMap with Expiry (Cache)

    import time
    
    class ExpiringHashMap:
        """HashMap with time-based expiration"""
    
        def __init__(self):
            self.store = {}  # key -> (value, expiry_time)
    
        def put(self, key, value, ttl=None):
            """
            Put key-value with optional TTL (time to live) in seconds
            """
            expiry = None
            if ttl is not None:
                expiry = time.time() + ttl
            self.store[key] = (value, expiry)
    
        def get(self, key):
            """Get value if not expired"""
            if key not in self.store:
                return None
    
            value, expiry = self.store[key]
    
            # Check if expired
            if expiry is not None and time.time() > expiry:
                del self.store[key]
                return None
    
            return value
    
        def cleanup(self):
            """Remove all expired entries"""
            current_time = time.time()
            expired_keys = [
                key for key, (_, expiry) in self.store.items()
                if expiry is not None and current_time > expiry
            ]
            for key in expired_keys:
                del self.store[key]
    Python

    Practice Problems

    Easy Level

    1. Valid Anagram

    def is_anagram(s, t):
        """
        Check if two strings are anagrams
        Time: O(n), Space: O(1) - limited to 26 letters
        """
        if len(s) != len(t):
            return False
    
        char_count = {}
    
        for char in s:
            char_count[char] = char_count.get(char, 0) + 1
    
        for char in t:
            if char not in char_count:
                return False
            char_count[char] -= 1
            if char_count[char] < 0:
                return False
    
        return all(count == 0 for count in char_count.values())
    
    # Alternative using Counter
    from collections import Counter
    
    def is_anagram_v2(s, t):
        return Counter(s) == Counter(t)
    
    # Alternative using sorting
    def is_anagram_v3(s, t):
        return sorted(s) == sorted(t)
    Python

    2. Contains Duplicate

    def contains_duplicate(nums):
        """
        Check if array contains any duplicates
        Time: O(n), Space: O(n)
        """
        seen = set()
        for num in nums:
            if num in seen:
                return True
            seen.add(num)
        return False
    
    # One-liner
    def contains_duplicate_v2(nums):
        return len(nums) != len(set(nums))
    Python

    3. Single Number

    def single_number(nums):
        """
        Find number that appears once (others appear twice)
        Time: O(n), Space: O(n)
        """
        count = {}
        for num in nums:
            count[num] = count.get(num, 0) + 1
    
        for num, freq in count.items():
            if freq == 1:
                return num
    
    # Optimized using XOR - O(1) space
    def single_number_xor(nums):
        result = 0
        for num in nums:
            result ^= num
        return result
    Python

    4. Intersection of Two Arrays

    def intersection(nums1, nums2):
        """
        Find intersection of two arrays
        Time: O(n + m), Space: O(min(n, m))
        """
        return list(set(nums1) & set(nums2))
    
    # Alternative
    def intersection_v2(nums1, nums2):
        set1 = set(nums1)
        result = set()
        for num in nums2:
            if num in set1:
                result.add(num)
        return list(result)
    Python

    5. Happy Number

    def is_happy(n):
        """
        Determine if number is happy number
        Time: O(log n), Space: O(log n)
        """
        seen = set()
    
        while n != 1 and n not in seen:
            seen.add(n)
            n = sum(int(digit) ** 2 for digit in str(n))
    
        return n == 1
    Python

    Medium Level

    1. Group Anagrams (Revisited)

    def group_anagrams(strs):
        """
        Time: O(n * k log k), Space: O(n * k)
        """
        from collections import defaultdict
    
        groups = defaultdict(list)
    
        for s in strs:
            # Use sorted tuple as key
            key = tuple(sorted(s))
            groups[key].append(s)
    
        return list(groups.values())
    
    # Optimized using character count
    def group_anagrams_v2(strs):
        from collections import defaultdict
    
        groups = defaultdict(list)
    
        for s in strs:
            # Use character count as key
            count = [0] * 26
            for char in s:
                count[ord(char) - ord('a')] += 1
            groups[tuple(count)].append(s)
    
        return list(groups.values())
    Python

    2. Top K Frequent Elements

    def top_k_frequent(nums, k):
        """
        Time: O(n), Space: O(n)
        """
        from collections import Counter
    
        count = Counter(nums)
        return [num for num, _ in count.most_common(k)]
    
    # Using bucket sort
    def top_k_frequent_v2(nums, k):
        count = {}
        for num in nums:
            count[num] = count.get(num, 0) + 1
    
        # Bucket sort by frequency
        buckets = [[] for _ in range(len(nums) + 1)]
        for num, freq in count.items():
            buckets[freq].append(num)
    
        # Collect top k
        result = []
        for i in range(len(buckets) - 1, -1, -1):
            result.extend(buckets[i])
            if len(result) >= k:
                return result[:k]
    
        return result
    Python

    3. Longest Consecutive Sequence

    def longest_consecutive(nums):
        """
        Time: O(n), Space: O(n)
        """
        if not nums:
            return 0
    
        num_set = set(nums)
        longest = 0
    
        for num in num_set:
            # Only count from sequence start
            if num - 1 not in num_set:
                current_num = num
                current_length = 1
    
                while current_num + 1 in num_set:
                    current_num += 1
                    current_length += 1
    
                longest = max(longest, current_length)
    
        return longest
    Python

    4. 4Sum II

    def four_sum_count(nums1, nums2, nums3, nums4):
        """
        Count tuples (i, j, k, l) where nums1[i] + nums2[j] + nums3[k] + nums4[l] == 0
        Time: O(n²), Space: O(n²)
        """
        # Store sums of nums1 and nums2
        sum_count = {}
        for a in nums1:
            for b in nums2:
                sum_ab = a + b
                sum_count[sum_ab] = sum_count.get(sum_ab, 0) + 1
    
        # Find complement in nums3 and nums4
        count = 0
        for c in nums3:
            for d in nums4:
                target = -(c + d)
                count += sum_count.get(target, 0)
    
        return count
    Python

    5. Find Duplicate Subtrees

    class TreeNode:
        def __init__(self, val=0, left=None, right=None):
            self.val = val
            self.left = left
            self.right = right
    
    def find_duplicate_subtrees(root):
        """
        Find all duplicate subtrees in binary tree
        Time: O(n), Space: O(n)
        """
        from collections import defaultdict
    
        subtrees = defaultdict(int)
        duplicates = []
    
        def serialize(node):
            if not node:
                return "#"
    
            # Serialize subtree
            s = f"{node.val},{serialize(node.left)},{serialize(node.right)}"
    
            subtrees[s] += 1
            if subtrees[s] == 2:
                duplicates.append(node)
    
            return s
    
        serialize(root)
        return duplicates
    Python

    Hard Level

    1. Longest Substring with At Most K Distinct Characters

    def length_of_longest_substring_k_distinct(s, k):
        """
        Time: O(n), Space: O(k)
        """
        if k == 0:
            return 0
    
        char_count = {}
        max_length = 0
        left = 0
    
        for right, char in enumerate(s):
            char_count[char] = char_count.get(char, 0) + 1
    
            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1
                if char_count[left_char] == 0:
                    del char_count[left_char]
                left += 1
    
            max_length = max(max_length, right - left + 1)
    
        return max_length
    Python

    2. Minimum Window Substring

    def min_window(s, t):
        """
        Find minimum window in s that contains all characters of t
        Time: O(n + m), Space: O(m)
        """
        if not s or not t:
            return ""
    
        from collections import Counter
    
        target_count = Counter(t)
        required = len(target_count)
    
        window_count = {}
        formed = 0
    
        left = 0
        min_len = float('inf')
        min_window = (0, 0)
    
        for right, char in enumerate(s):
            window_count[char] = window_count.get(char, 0) + 1
    
            if char in target_count and window_count[char] == target_count[char]:
                formed += 1
    
            while left <= right and formed == required:
                # Update minimum window
                if right - left + 1 < min_len:
                    min_len = right - left + 1
                    min_window = (left, right)
    
                # Shrink window
                left_char = s[left]
                window_count[left_char] -= 1
                if left_char in target_count and window_count[left_char] < target_count[left_char]:
                    formed -= 1
                left += 1
    
        left, right = min_window
        return s[left:right + 1] if min_len != float('inf') else ""
    Python

    3. Substring with Concatenation of All Words

    def find_substring(s, words):
        """
        Find all starting indices of concatenated substrings
        Time: O(n * m * k), Space: O(m)
        """
        if not s or not words:
            return []
    
        from collections import Counter
    
        word_len = len(words[0])
        word_count = len(words)
        total_len = word_len * word_count
    
        word_freq = Counter(words)
        result = []
    
        for i in range(len(s) - total_len + 1):
            seen = {}
            for j in range(word_count):
                start = i + j * word_len
                word = s[start:start + word_len]
    
                if word not in word_freq:
                    break
    
                seen[word] = seen.get(word, 0) + 1
                if seen[word] > word_freq[word]:
                    break
            else:
                result.append(i)
    
        return result
    Python

    Tips and Best Practices

    1. Choosing Hash Table Type

    # Use dict for general key-value storage
    cache = {'user1': 'data1', 'user2': 'data2'}
    
    # Use set for membership testing only
    unique_items = {1, 2, 3, 4, 5}
    
    # Use defaultdict to avoid key checks
    from collections import defaultdict
    word_freq = defaultdict(int)
    for word in words:
        word_freq[word] += 1  # No need to check if key exists
    
    # Use Counter for frequency counting
    from collections import Counter
    freq = Counter([1, 2, 2, 3, 3, 3])
    
    # Use OrderedDict to maintain insertion order (Python 3.7+ dict maintains order)
    from collections import OrderedDict
    od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    Python

    2. Performance Tips

    # Bad: Repeated key existence checks
    if key in my_dict:
        value = my_dict[key]
    else:
        value = default
    
    # Good: Use get() with default
    value = my_dict.get(key, default)
    
    # Bad: Building dict incrementally
    d = {}
    for key, value in items:
        d[key] = value
    
    # Good: Use dict comprehension
    d = {key: value for key, value in items}
    
    # Bad: Multiple lookups
    if key in d:
        d[key] += 1
    else:
        d[key] = 1
    
    # Good: Use setdefault or defaultdict
    d[key] = d.get(key, 0) + 1
    Python

    3. Memory Optimization

    # Use __slots__ for objects used as keys
    class Point:
        __slots__ = ['x', 'y']
    
        def __init__(self, x, y):
            self.x = x
            self.y = y
    
        def __hash__(self):
            return hash((self.x, self.y))
    
        def __eq__(self, other):
            return self.x == other.x and self.y == other.y
    
    # Use tuples instead of lists for keys (immutable and hashable)
    # Good
    cache = {(1, 2): 'value'}
    
    # Bad
    # cache = {[1, 2]: 'value'}  # TypeError: unhashable type
    Python

    4. Common Pitfalls

    # Pitfall 1: Modifying dict while iterating
    d = {'a': 1, 'b': 2, 'c': 3}
    # Bad
    for key in d:
        if d[key] == 2:
            del d[key]  # RuntimeError!
    
    # Good
    keys_to_delete = [key for key, value in d.items() if value == 2]
    for key in keys_to_delete:
        del d[key]
    
    # Pitfall 2: Using mutable objects as keys
    # Bad
    my_list = [1, 2, 3]
    # d = {my_list: 'value'}  # TypeError: unhashable type
    
    # Good: Convert to tuple
    d = {tuple(my_list): 'value'}
    
    # Pitfall 3: Not checking for KeyError
    d = {'a': 1}
    # value = d['b']  # KeyError!
    
    # Good options:
    value = d.get('b')  # Returns None
    value = d.get('b', 0)  # Returns default
    if 'b' in d:
        value = d['b']
    Python

    5. Testing Strategies

    def test_hash_table():
        ht = HashTable()
    
        # Test basic operations
        ht['key1'] = 'value1'
        assert ht['key1'] == 'value1'
    
        # Test update
        ht['key1'] = 'new_value'
        assert ht['key1'] == 'new_value'
    
        # Test deletion
        del ht['key1']
        assert 'key1' not in ht
    
        # Test multiple entries
        for i in range(100):
            ht[f'key{i}'] = f'value{i}'
        assert len(ht) == 100
    
        # Test collision handling (use same hash)
        # Depends on implementation
    
        # Test edge cases
        ht[''] = 'empty_key'
        ht[None] = 'none_key'
        ht[0] = 'zero_key'
    Python

    6. Interview Tips

    1. Clarify requirements:
      • Can keys be any type or specific types?
      • Are there memory constraints?
      • Expected number of operations?
      • Need to handle concurrent access?
    2. Discuss trade-offs:
      • Hash function quality vs. computation time
      • Memory usage vs. collision rate
      • Chaining vs. open addressing
    3. Optimize progressively:
      • Start with simple solution (brute force)
      • Identify where hash table helps
      • Optimize space/time as needed
    4. Consider alternatives:
      • When hash table might not be best
      • Array/list for small datasets
      • Trees for ordered operations
      • Tries for prefix searches

    Summary

    Hash tables are one of the most important data structures in computer science:

    O(1) average time for insert, delete, and search
    Versatile – used in countless applications
    Python dicts are highly optimized hash tables
    Master patterns – counting, grouping, caching, deduplication
    Understand collisions – chaining vs. open addressing
    Practice extensively – hash tables appear in most coding interviews

    Key Takeaways:

    1. Use dict for general key-value storage
    2. Use set for membership testing
    3. Use Counter for frequency counting
    4. Use defaultdict to avoid key checks
    5. Understand time/space complexity trade-offs
    6. Practice common patterns and problems

    Happy coding! 🚀


    Discover more from Altgr Blog

    Subscribe to get the latest posts sent to your email.

    Leave a Reply

    Your email address will not be published. Required fields are marked *