Table of Contents
- Introduction
- Hash Table Fundamentals
- Python Dictionary (Built-in Hash Table)
- Hash Functions
- Collision Resolution
- Implementing Hash Table from Scratch
- Common Operations
- Time and Space Complexity
- Hash Table Patterns
- Advanced Concepts
- Practice Problems
- Tips and Best Practices
Introduction
A Hash Table (or Hash Map) is a data structure that implements the associative array abstract data type, mapping keys to values. It uses a hash function to compute a hash code for each key, which is then mapped to an index into an array of buckets or slots, from which the desired value can be found.
Key Characteristics:
- Fast Access: O(1) average case for search, insert, and delete
- Key-Value Pairs: Stores data as key-value associations
- Hash Function: Converts keys into array indices
- Collision Handling: Manages cases where multiple keys hash to the same index
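These characteristics map directly onto Python's built-in dict (a minimal sketch):

```python
# Key-value store with O(1) average insert, lookup, and delete
prices = {}
prices["apple"] = 1.50      # insert
prices["apple"] = 1.25      # update via the same key
prices["banana"] = 0.75
print(prices["apple"])       # constant-time lookup -> 1.25
print("banana" in prices)    # membership test -> True
del prices["banana"]         # constant-time delete
print(len(prices))           # 1
```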
Real-World Applications:
- Database indexing
- Caching systems (LRU Cache, Redis)
- Symbol tables in compilers
- Password verification
- Blockchain (cryptographic hashing)
- Spell checkers
- Router tables
- Implementing sets and maps
Hash Table Fundamentals
How Hash Tables Work
Key → Hash Function → Index → Value
Example:
"apple" → hash("apple") → 42 → 1.50
"banana" → hash("banana") → 17 → 0.75
Core Components:
- Hash Function: Converts key to array index
- Bucket Array: Stores the key-value pairs
- Collision Resolution: Handles hash collisions
- Load Factor: Ratio of elements to bucket size
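As a quick numeric sketch of the load factor (the 0.7 threshold used here matches the resize policy of the implementations later in this guide):

```python
def load_factor(num_elements, num_buckets):
    """Load factor = stored elements / bucket count."""
    return num_elements / num_buckets

# 7 items in 10 buckets sits right at the common 0.7 threshold;
# one more insert would typically trigger a resize (doubling)
print(load_factor(7, 10))        # 0.7
print(load_factor(8, 10) > 0.7)  # True -> time to resize
```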
Hash Table Structure
class HashTable:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)] # Using chaining
def _hash(self, key):
"""Hash function to compute index"""
return hash(key) % self.size
Python Dictionary (Built-in Hash Table)
Python’s dict is a highly optimized hash table implementation.
Creating Dictionaries
# Empty dictionary
empty_dict = {}
empty_dict = dict()
# Dictionary with initial values
person = {
'name': 'Alice',
'age': 30,
'city': 'New York'
}
# Using dict() constructor
person = dict(name='Alice', age=30, city='New York')
# From list of tuples
items = [('a', 1), ('b', 2), ('c', 3)]
d = dict(items)
# Dictionary comprehension
squares = {x: x**2 for x in range(10)}
# From two lists
keys = ['a', 'b', 'c']
values = [1, 2, 3]
d = dict(zip(keys, values))
Basic Operations
# Access elements
person = {'name': 'Alice', 'age': 30}
print(person['name']) # 'Alice'
print(person.get('age')) # 30
print(person.get('salary', 0)) # 0 (default value)
# Add/Update elements
person['city'] = 'Boston' # Add new key
person['age'] = 31 # Update existing key
person.update({'job': 'Engineer', 'age': 32})
# Delete elements
del person['city']
age = person.pop('age') # Remove and return value
person.popitem() # Remove and return last (key, value) pair
person.clear() # Remove all items
# Check key existence
if 'name' in person:
print("Name exists")
# Get all keys, values, items
keys = person.keys()
values = person.values()
items = person.items()
# Length
length = len(person)
Advanced Dictionary Methods
# setdefault - get value or set default if key doesn't exist
d = {'a': 1}
value = d.setdefault('b', 0) # Returns 0 and sets d['b'] = 0
# fromkeys - create dict from sequence of keys
keys = ['a', 'b', 'c']
d = dict.fromkeys(keys, 0) # {'a': 0, 'b': 0, 'c': 0}
# copy - shallow copy
d2 = d.copy()
# Merge dictionaries (Python 3.9+)
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
merged = d1 | d2 # {'a': 1, 'b': 3, 'c': 4}
# Update in place
d1 |= d2
# Older merge methods
merged = {**d1, **d2}
merged = dict(d1, **d2) # works only when all keys in d2 are strings
Iteration
person = {'name': 'Alice', 'age': 30, 'city': 'Boston'}
# Iterate over keys
for key in person:
print(key)
for key in person.keys():
print(key)
# Iterate over values
for value in person.values():
print(value)
# Iterate over key-value pairs
for key, value in person.items():
print(f"{key}: {value}")
# Enumerate with index
for i, (key, value) in enumerate(person.items()):
print(f"{i}: {key} = {value}")
Hash Functions
A good hash function should:
- Be deterministic (same input → same output)
- Uniformly distribute keys
- Be fast to compute
- Minimize collisions
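These properties can be spot-checked with Python's built-in hash (a rough sketch; string hashing is randomized per interpreter process, so only within-process determinism is demonstrated, and the bucket counts will vary run to run):

```python
# Deterministic: the same key always hashes the same way
# within a single interpreter process
assert hash("hello") == hash("hello")

# Uniform distribution: mapping 1000 keys into 8 buckets via
# hash(key) % size should spread them out, not cluster them
size = 8
buckets = [0] * size
for i in range(1000):
    buckets[hash(f"key{i}") % size] += 1
print(buckets)  # roughly 125 per bucket
```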
Built-in hash() Function
# Works with immutable types
print(hash(42)) # Integer
print(hash("hello")) # String
print(hash((1, 2, 3))) # Tuple
# Not hashable (mutable types)
# hash([1, 2, 3]) # TypeError: unhashable type: 'list'
# hash({'a': 1}) # TypeError: unhashable type: 'dict'
Custom Hash Functions
# Simple hash function for strings
def simple_hash(key, size):
"""Sum of ASCII values mod size"""
return sum(ord(c) for c in str(key)) % size
# Polynomial rolling hash
def polynomial_hash(key, size, prime=31):
"""Better distribution using polynomial"""
hash_value = 0
for char in str(key):
hash_value = (hash_value * prime + ord(char)) % size
return hash_value
# DJB2 hash (popular string hash)
def djb2_hash(key, size):
hash_value = 5381
for char in str(key):
hash_value = ((hash_value << 5) + hash_value) + ord(char)
return hash_value % size
# Using Python's built-in hash
def builtin_hash(key, size):
return hash(key) % size
Making Custom Objects Hashable
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __hash__(self):
"""Make object hashable"""
return hash((self.name, self.age))
def __eq__(self, other):
"""Required for hash to work properly"""
if not isinstance(other, Person):
return False
return self.name == other.name and self.age == other.age
def __repr__(self):
return f"Person('{self.name}', {self.age})"
# Now can use as dictionary key
p1 = Person("Alice", 30)
p2 = Person("Alice", 30)
d = {p1: "Engineer"}
print(d[p2]) # Works! Returns "Engineer"
Collision Resolution
When two keys hash to the same index, we have a collision.
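As a concrete sketch, a naive sum-of-character-codes hash (like the simple_hash shown earlier) collides for any two anagrams, since reordering characters never changes the sum:

```python
def simple_hash(key, size):
    # Naive: sum of character codes modulo the table size
    return sum(ord(c) for c in key) % size

table_size = 10
# "listen" and "silent" are anagrams: identical character sums,
# identical bucket index -> a collision that must be resolved
print(simple_hash("listen", table_size))
print(simple_hash("listen", table_size) == simple_hash("silent", table_size))  # True
```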
1. Chaining (Separate Chaining)
Store multiple elements at the same index using a list/linked list.
class HashTableChaining:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)]
def _hash(self, key):
return hash(key) % self.size
def insert(self, key, value):
"""Insert or update key-value pair"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
# Update if key exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
# Add new key-value pair
bucket.append((key, value))
def get(self, key):
"""Retrieve value by key"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
for k, v in bucket:
if k == key:
return v
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Remove key-value pair"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
return v
raise KeyError(f"Key '{key}' not found")
def __contains__(self, key):
"""Check if key exists"""
try:
self.get(key)
return True
except KeyError:
return False
def __str__(self):
items = []
for bucket in self.table:
for key, value in bucket:
items.append(f"{key}: {value}")
return "{" + ", ".join(items) + "}"
2. Open Addressing
Store all elements in the hash table itself (no chaining).
a) Linear Probing
class HashTableLinearProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
self.count = 0
def _hash(self, key):
return hash(key) % self.size
def _probe(self, index):
"""Linear probing: (index + 1) % size"""
return (index + 1) % self.size
def _resize(self):
"""Resize when load factor > 0.7"""
old_keys = self.keys
old_values = self.values
self.size *= 2
self.keys = [None] * self.size
self.values = [None] * self.size
self.count = 0
for i in range(len(old_keys)):
if old_keys[i] is not None:
self.insert(old_keys[i], old_values[i])
def insert(self, key, value):
"""Insert or update key-value pair"""
if self.count / self.size > 0.7:
self._resize()
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
# Update existing key
self.values[index] = value
return
index = self._probe(index)
# Insert new key
self.keys[index] = key
self.values[index] = value
self.count += 1
def get(self, key):
"""Retrieve value by key"""
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
return self.values[index]
index = self._probe(index)
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Delete key-value pair"""
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
self.keys[index] = None
self.values[index] = None
self.count -= 1
# Rehash subsequent entries
self._rehash_cluster(index)
return
index = self._probe(index)
raise KeyError(f"Key '{key}' not found")
def _rehash_cluster(self, start):
"""Rehash entries after deletion"""
index = self._probe(start)
while self.keys[index] is not None:
key = self.keys[index]
value = self.values[index]
self.keys[index] = None
self.values[index] = None
self.count -= 1
self.insert(key, value)
index = self._probe(index)
b) Quadratic Probing
class HashTableQuadraticProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash(self, key):
return hash(key) % self.size
def _probe(self, index, i):
"""Quadratic probing: (index + i²) % size"""
return (index + i * i) % self.size
def insert(self, key, value):
index = self._hash(key)
i = 0
while self.keys[index] is not None and self.keys[index] != key:
i += 1
index = self._probe(self._hash(key), i)
self.keys[index] = key
self.values[index] = value
c) Double Hashing
class HashTableDoubleHashing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash1(self, key):
return hash(key) % self.size
def _hash2(self, key):
"""Second hash function"""
return 7 - (hash(key) % 7) # 7 is a prime < size; step is always in 1..7, never 0
def _probe(self, key, i):
"""Double hashing: (h1(key) + i * h2(key)) % size"""
return (self._hash1(key) + i * self._hash2(key)) % self.size
def insert(self, key, value):
i = 0
index = self._probe(key, i)
while self.keys[index] is not None and self.keys[index] != key:
i += 1
index = self._probe(key, i)
self.keys[index] = key
self.values[index] = value
Implementing Hash Table from Scratch
Complete implementation with all features:
class HashTable:
def __init__(self, capacity=10):
self.capacity = capacity
self.size = 0
self.table = [[] for _ in range(capacity)]
def _hash(self, key):
"""Hash function using Python's built-in hash"""
return hash(key) % self.capacity
def _load_factor(self):
"""Calculate current load factor"""
return self.size / self.capacity
def _resize(self):
"""Resize table when load factor exceeds 0.7"""
old_table = self.table
self.capacity *= 2
self.table = [[] for _ in range(self.capacity)]
self.size = 0
for bucket in old_table:
for key, value in bucket:
self.insert(key, value)
def insert(self, key, value):
"""Insert or update key-value pair - O(1) average"""
# Resize if needed
if self._load_factor() > 0.7:
self._resize()
index = self._hash(key)
bucket = self.table[index]
# Update if key exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
# Add new key-value pair
bucket.append((key, value))
self.size += 1
def get(self, key):
"""Get value by key - O(1) average"""
index = self._hash(key)
bucket = self.table[index]
for k, v in bucket:
if k == key:
return v
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Delete key-value pair - O(1) average"""
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
self.size -= 1
return v
raise KeyError(f"Key '{key}' not found")
def contains(self, key):
"""Check if key exists - O(1) average"""
try:
self.get(key)
return True
except KeyError:
return False
def keys(self):
"""Get all keys"""
all_keys = []
for bucket in self.table:
for key, _ in bucket:
all_keys.append(key)
return all_keys
def values(self):
"""Get all values"""
all_values = []
for bucket in self.table:
for _, value in bucket:
all_values.append(value)
return all_values
def items(self):
"""Get all key-value pairs"""
all_items = []
for bucket in self.table:
for key, value in bucket:
all_items.append((key, value))
return all_items
def clear(self):
"""Remove all items"""
self.table = [[] for _ in range(self.capacity)]
self.size = 0
def __len__(self):
"""Return number of items"""
return self.size
def __contains__(self, key):
"""Support 'in' operator"""
return self.contains(key)
def __getitem__(self, key):
"""Support dict[key] syntax"""
return self.get(key)
def __setitem__(self, key, value):
"""Support dict[key] = value syntax"""
self.insert(key, value)
def __delitem__(self, key):
"""Support del dict[key] syntax"""
self.delete(key)
def __str__(self):
"""String representation"""
items = [f"{k!r}: {v!r}" for k, v in self.items()]
return "{" + ", ".join(items) + "}"
def __repr__(self):
return f"HashTable({self})"
# Usage
ht = HashTable()
ht['name'] = 'Alice'
ht['age'] = 30
print(ht['name']) # Alice
print('age' in ht) # True
del ht['age']
print(len(ht)) # 1
Common Operations
1. Frequency Counter
def count_frequency(arr):
"""Count frequency of elements - O(n)"""
freq = {}
for item in arr:
freq[item] = freq.get(item, 0) + 1
return freq
# Using defaultdict
from collections import defaultdict
def count_frequency_defaultdict(arr):
freq = defaultdict(int)
for item in arr:
freq[item] += 1
return dict(freq)
# Using Counter
from collections import Counter
def count_frequency_counter(arr):
return Counter(arr)
# Example
arr = [1, 2, 2, 3, 3, 3, 4]
print(count_frequency(arr)) # {1: 1, 2: 2, 3: 3, 4: 1}
2. Group Anagrams
def group_anagrams(words):
"""
Group words that are anagrams - O(n * k log k)
where n is number of words, k is max word length
"""
groups = {}
for word in words:
# Sort characters as key
key = ''.join(sorted(word))
if key not in groups:
groups[key] = []
groups[key].append(word)
return list(groups.values())
# Using defaultdict
from collections import defaultdict
def group_anagrams_v2(words):
groups = defaultdict(list)
for word in words:
key = ''.join(sorted(word))
groups[key].append(word)
return list(groups.values())
# Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
3. Two Sum Problem
def two_sum(nums, target):
"""
Find two numbers that sum to target - O(n)
"""
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
# Example
nums = [2, 7, 11, 15]
target = 9
print(two_sum(nums, target)) # [0, 1]
4. First Unique Character
def first_unique_char(s):
"""
Find first non-repeating character - O(n)
"""
freq = {}
# Count frequencies
for char in s:
freq[char] = freq.get(char, 0) + 1
# Find first with frequency 1
for i, char in enumerate(s):
if freq[char] == 1:
return i
return -1
# Example
s = "leetcode"
print(first_unique_char(s)) # 0 ('l')
5. Subarray Sum Equals K
def subarray_sum_k(nums, k):
"""
Count subarrays with sum equal to k - O(n)
"""
count = 0
prefix_sum = 0
sum_count = {0: 1} # prefix_sum: frequency
for num in nums:
prefix_sum += num
# Check if (prefix_sum - k) exists
if (prefix_sum - k) in sum_count:
count += sum_count[prefix_sum - k]
# Add current prefix_sum
sum_count[prefix_sum] = sum_count.get(prefix_sum, 0) + 1
return count
# Example
nums = [1, 1, 1]
k = 2
print(subarray_sum_k(nums, k)) # 2
Time and Space Complexity
Average Case Complexity
| Operation | Time | Space |
|---|---|---|
| Insert | O(1) | O(1) |
| Delete | O(1) | O(1) |
| Search | O(1) | O(1) |
| Access by key | O(1) | O(1) |
Worst Case Complexity
| Operation | Time | Space |
|---|---|---|
| Insert | O(n) | O(1) |
| Delete | O(n) | O(1) |
| Search | O(n) | O(1) |
Note: The worst case O(n) occurs when all keys hash to the same index, so every operation degenerates into a scan of one bucket.
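This worst case can be forced deliberately with a key type whose __hash__ is constant: every key lands in the same bucket and lookups degrade to linear equality scans, though results stay correct (a sketch; BadKey is a hypothetical name):

```python
class BadKey:
    """Pathological key: every instance hashes to the same bucket."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 0  # constant hash -> all keys collide
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

d = {BadKey(i): i for i in range(100)}
# Correctness is preserved; each lookup now scans one big bucket
print(d[BadKey(42)])  # 42
print(len(d))         # 100
```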
Load Factor Impact
Load Factor = Number of Elements / Table Size
Optimal Load Factor: 0.7 - 0.75
- Too low: Wastes memory
- Too high: More collisions, slower operations
Hash Table Patterns
1. Hash Set Pattern
# Check for duplicates - O(n)
def contains_duplicate(nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
# Alternative
def contains_duplicate_v2(nums):
return len(nums) != len(set(nums))
2. Hash Map for Counting
# Most frequent element
def most_frequent(nums):
"""O(n) time, O(n) space"""
freq = {}
max_count = 0
result = None
for num in nums:
freq[num] = freq.get(num, 0) + 1
if freq[num] > max_count:
max_count = freq[num]
result = num
return result
# Using Counter
from collections import Counter
def most_frequent_v2(nums):
return Counter(nums).most_common(1)[0][0]
3. Hash Map for Grouping
# Group elements by property
def group_by_length(words):
"""Group words by length"""
groups = {}
for word in words:
length = len(word)
if length not in groups:
groups[length] = []
groups[length].append(word)
return groups
# Using defaultdict
from collections import defaultdict
def group_by_length_v2(words):
groups = defaultdict(list)
for word in words:
groups[len(word)].append(word)
return dict(groups)
4. Hash Map for Tracking Indices
# Find all indices of target sum
def two_sum_all(nums, target):
"""Find all pairs that sum to target"""
seen = {}
pairs = []
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
for j in seen[complement]:
pairs.append([j, i])
if num not in seen:
seen[num] = []
seen[num].append(i)
return pairs
5. Sliding Window with Hash Map
# Longest substring without repeating characters
def longest_unique_substring(s):
"""O(n) time, O(min(m, n)) space"""
char_index = {}
max_length = 0
start = 0
for end, char in enumerate(s):
if char in char_index and char_index[char] >= start:
start = char_index[char] + 1
char_index[char] = end
max_length = max(max_length, end - start + 1)
return max_length
# Example
s = "abcabcbb"
print(longest_unique_substring(s)) # 3 ("abc")
6. Hash Map for Prefix/Suffix
# Longest common prefix using hash map
def longest_common_prefix(strs):
if not strs:
return ""
prefix_map = {}
for i, char in enumerate(strs[0]):
prefix = strs[0][:i+1]
if all(s.startswith(prefix) for s in strs):
prefix_map[len(prefix)] = prefix
else:
break
if not prefix_map:
return ""
max_len = max(prefix_map.keys())
return prefix_map[max_len]
Advanced Concepts
1. LRU Cache Implementation
from collections import OrderedDict
class LRUCache:
"""Least Recently Used Cache using OrderedDict"""
def __init__(self, capacity):
self.cache = OrderedDict()
self.capacity = capacity
def get(self, key):
"""Get value and mark as recently used"""
if key not in self.cache:
return -1
# Move to end (most recent)
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key, value):
"""Put key-value and mark as recently used"""
if key in self.cache:
# Update and move to end
self.cache.move_to_end(key)
self.cache[key] = value
# Remove least recently used if over capacity
if len(self.cache) > self.capacity:
self.cache.popitem(last=False)
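The recency mechanics used by the cache above can be sketched with a bare OrderedDict (a capacity of 2 is chosen arbitrarily):

```python
from collections import OrderedDict

cache = OrderedDict()
capacity = 2
cache[1] = 'a'
cache[2] = 'b'
cache.move_to_end(1)           # key 1 is now most recently used
cache[3] = 'c'                 # over capacity
if len(cache) > capacity:
    cache.popitem(last=False)  # evicts key 2, the least recent
print(list(cache))             # [1, 3]
```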
# Manual implementation with doubly linked list
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.prev = None
self.next = None
class LRUCacheManual:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {} # key -> Node
self.head = Node(0, 0) # Dummy head
self.tail = Node(0, 0) # Dummy tail
self.head.next = self.tail
self.tail.prev = self.head
def _add_node(self, node):
"""Add node right after head"""
node.prev = self.head
node.next = self.head.next
self.head.next.prev = node
self.head.next = node
def _remove_node(self, node):
"""Remove node from list"""
prev = node.prev
next = node.next
prev.next = next
next.prev = prev
def _move_to_head(self, node):
"""Move node to head (most recent)"""
self._remove_node(node)
self._add_node(node)
def _pop_tail(self):
"""Remove least recently used node"""
node = self.tail.prev
self._remove_node(node)
return node
def get(self, key):
if key not in self.cache:
return -1
node = self.cache[key]
self._move_to_head(node)
return node.value
def put(self, key, value):
if key in self.cache:
node = self.cache[key]
node.value = value
self._move_to_head(node)
else:
node = Node(key, value)
self.cache[key] = node
self._add_node(node)
if len(self.cache) > self.capacity:
tail = self._pop_tail()
del self.cache[tail.key]
2. Consistent Hashing
import bisect
import hashlib
class ConsistentHashing:
"""Used in distributed systems for load balancing"""
def __init__(self, nodes=None, replicas=3):
self.replicas = replicas
self.ring = {} # hash -> node
self.sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def _hash(self, key):
"""Hash function using MD5"""
return int(hashlib.md5(str(key).encode()).hexdigest(), 16)
def add_node(self, node):
"""Add node to the ring"""
for i in range(self.replicas):
virtual_key = f"{node}:{i}"
hash_key = self._hash(virtual_key)
self.ring[hash_key] = node
bisect.insort(self.sorted_keys, hash_key)
def remove_node(self, node):
"""Remove node from the ring"""
for i in range(self.replicas):
virtual_key = f"{node}:{i}"
hash_key = self._hash(virtual_key)
del self.ring[hash_key]
self.sorted_keys.remove(hash_key)
def get_node(self, key):
"""Get node responsible for key"""
if not self.ring:
return None
hash_key = self._hash(key)
idx = bisect.bisect(self.sorted_keys, hash_key)
# Wrap around if necessary
if idx == len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
3. Bloom Filter
import hashlib
class BloomFilter:
"""Probabilistic data structure for membership testing"""
def __init__(self, size=1000, hash_count=3):
self.size = size
self.hash_count = hash_count
self.bit_array = [False] * size
def _hashes(self, item):
"""Generate multiple hash values"""
hashes = []
for i in range(self.hash_count):
hash_input = f"{item}{i}".encode()
hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
hashes.append(hash_value % self.size)
return hashes
def add(self, item):
"""Add item to filter"""
for hash_value in self._hashes(item):
self.bit_array[hash_value] = True
def contains(self, item):
"""Check if item might be in filter"""
return all(self.bit_array[h] for h in self._hashes(item))
# Note: False positives possible, false negatives impossible
4. Count-Min Sketch
import hashlib
class CountMinSketch:
"""Probabilistic frequency counter for streams"""
def __init__(self, width=1000, depth=5):
self.width = width
self.depth = depth
self.table = [[0] * width for _ in range(depth)]
def _hashes(self, item):
"""Generate hash values for each row"""
hashes = []
for i in range(self.depth):
hash_input = f"{item}{i}".encode()
hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
hashes.append(hash_value % self.width)
return hashes
def add(self, item, count=1):
"""Add item to sketch"""
for i, hash_value in enumerate(self._hashes(item)):
self.table[i][hash_value] += count
def estimate(self, item):
"""Estimate frequency of item"""
return min(
self.table[i][h]
for i, h in enumerate(self._hashes(item))
)
5. HashMap with Expiry (Cache)
import time
class ExpiringHashMap:
"""HashMap with time-based expiration"""
def __init__(self):
self.store = {} # key -> (value, expiry_time)
def put(self, key, value, ttl=None):
"""
Put key-value with optional TTL (time to live) in seconds
"""
expiry = None
if ttl is not None:
expiry = time.time() + ttl
self.store[key] = (value, expiry)
def get(self, key):
"""Get value if not expired"""
if key not in self.store:
return None
value, expiry = self.store[key]
# Check if expired
if expiry is not None and time.time() > expiry:
del self.store[key]
return None
return value
def cleanup(self):
"""Remove all expired entries"""
current_time = time.time()
expired_keys = [
key for key, (_, expiry) in self.store.items()
if expiry is not None and current_time > expiry
]
for key in expired_keys:
del self.store[key]
Practice Problems
Easy Level
1. Valid Anagram
def is_anagram(s, t):
"""
Check if two strings are anagrams
Time: O(n), Space: O(1) - limited to 26 letters
"""
if len(s) != len(t):
return False
char_count = {}
for char in s:
char_count[char] = char_count.get(char, 0) + 1
for char in t:
if char not in char_count:
return False
char_count[char] -= 1
if char_count[char] < 0:
return False
return all(count == 0 for count in char_count.values())
# Alternative using Counter
from collections import Counter
def is_anagram_v2(s, t):
return Counter(s) == Counter(t)
# Alternative using sorting
def is_anagram_v3(s, t):
return sorted(s) == sorted(t)
2. Contains Duplicate
def contains_duplicate(nums):
"""
Check if array contains any duplicates
Time: O(n), Space: O(n)
"""
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
# One-liner
def contains_duplicate_v2(nums):
return len(nums) != len(set(nums))
3. Single Number
def single_number(nums):
"""
Find number that appears once (others appear twice)
Time: O(n), Space: O(n)
"""
count = {}
for num in nums:
count[num] = count.get(num, 0) + 1
for num, freq in count.items():
if freq == 1:
return num
# Optimized using XOR - O(1) space
def single_number_xor(nums):
result = 0
for num in nums:
result ^= num
return result
4. Intersection of Two Arrays
def intersection(nums1, nums2):
"""
Find intersection of two arrays
Time: O(n + m), Space: O(min(n, m))
"""
return list(set(nums1) & set(nums2))
# Alternative
def intersection_v2(nums1, nums2):
set1 = set(nums1)
result = set()
for num in nums2:
if num in set1:
result.add(num)
return list(result)
5. Happy Number
def is_happy(n):
"""
Determine if number is happy number
Time: O(log n), Space: O(log n)
"""
seen = set()
while n != 1 and n not in seen:
seen.add(n)
n = sum(int(digit) ** 2 for digit in str(n))
return n == 1
Medium Level
1. Group Anagrams (Revisited)
def group_anagrams(strs):
"""
Time: O(n * k log k), Space: O(n * k)
"""
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
# Use sorted tuple as key
key = tuple(sorted(s))
groups[key].append(s)
return list(groups.values())
# Optimized using character count
def group_anagrams_v2(strs):
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
# Use character count as key
count = [0] * 26
for char in s:
count[ord(char) - ord('a')] += 1
groups[tuple(count)].append(s)
return list(groups.values())
2. Top K Frequent Elements
def top_k_frequent(nums, k):
"""
Time: O(n), Space: O(n)
"""
from collections import Counter
count = Counter(nums)
return [num for num, _ in count.most_common(k)]
# Using bucket sort
def top_k_frequent_v2(nums, k):
count = {}
for num in nums:
count[num] = count.get(num, 0) + 1
# Bucket sort by frequency
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in count.items():
buckets[freq].append(num)
# Collect top k
result = []
for i in range(len(buckets) - 1, -1, -1):
result.extend(buckets[i])
if len(result) >= k:
return result[:k]
return result
3. Longest Consecutive Sequence
def longest_consecutive(nums):
"""
Time: O(n), Space: O(n)
"""
if not nums:
return 0
num_set = set(nums)
longest = 0
for num in num_set:
# Only count from sequence start
if num - 1 not in num_set:
current_num = num
current_length = 1
while current_num + 1 in num_set:
current_num += 1
current_length += 1
longest = max(longest, current_length)
return longest
4. 4Sum II
def four_sum_count(nums1, nums2, nums3, nums4):
"""
Count tuples (i, j, k, l) where nums1[i] + nums2[j] + nums3[k] + nums4[l] == 0
Time: O(n²), Space: O(n²)
"""
# Store sums of nums1 and nums2
sum_count = {}
for a in nums1:
for b in nums2:
sum_ab = a + b
sum_count[sum_ab] = sum_count.get(sum_ab, 0) + 1
# Find complement in nums3 and nums4
count = 0
for c in nums3:
for d in nums4:
target = -(c + d)
count += sum_count.get(target, 0)
return count
5. Find Duplicate Subtrees
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
def find_duplicate_subtrees(root):
"""
Find all duplicate subtrees in binary tree
Time: O(n), Space: O(n)
"""
from collections import defaultdict
subtrees = defaultdict(int)
duplicates = []
def serialize(node):
if not node:
return "#"
# Serialize subtree
s = f"{node.val},{serialize(node.left)},{serialize(node.right)}"
subtrees[s] += 1
if subtrees[s] == 2:
duplicates.append(node)
return s
serialize(root)
return duplicates
Hard Level
1. Longest Substring with At Most K Distinct Characters
def length_of_longest_substring_k_distinct(s, k):
"""
Time: O(n), Space: O(k)
"""
if k == 0:
return 0
char_count = {}
max_length = 0
left = 0
for right, char in enumerate(s):
char_count[char] = char_count.get(char, 0) + 1
while len(char_count) > k:
left_char = s[left]
char_count[left_char] -= 1
if char_count[left_char] == 0:
del char_count[left_char]
left += 1
max_length = max(max_length, right - left + 1)
return max_length
2. Minimum Window Substring
def min_window(s, t):
"""
Find minimum window in s that contains all characters of t
Time: O(n + m), Space: O(m)
"""
if not s or not t:
return ""
from collections import Counter
target_count = Counter(t)
required = len(target_count)
window_count = {}
formed = 0
left = 0
min_len = float('inf')
min_window = (0, 0)
for right, char in enumerate(s):
window_count[char] = window_count.get(char, 0) + 1
if char in target_count and window_count[char] == target_count[char]:
formed += 1
while left <= right and formed == required:
# Update minimum window
if right - left + 1 < min_len:
min_len = right - left + 1
min_window = (left, right)
# Shrink window
left_char = s[left]
window_count[left_char] -= 1
if left_char in target_count and window_count[left_char] < target_count[left_char]:
formed -= 1
left += 1
left, right = min_window
return s[left:right + 1] if min_len != float('inf') else ""
3. Substring with Concatenation of All Words
def find_substring(s, words):
"""
Find all starting indices of concatenated substrings
Time: O(n * m * k), Space: O(m)
"""
if not s or not words:
return []
from collections import Counter
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
word_freq = Counter(words)
result = []
for i in range(len(s) - total_len + 1):
seen = {}
for j in range(word_count):
start = i + j * word_len
word = s[start:start + word_len]
if word not in word_freq:
break
seen[word] = seen.get(word, 0) + 1
if seen[word] > word_freq[word]:
break
else:
result.append(i)
return result
Tips and Best Practices
1. Choosing Hash Table Type
# Use dict for general key-value storage
cache = {'user1': 'data1', 'user2': 'data2'}
# Use set for membership testing only
unique_items = {1, 2, 3, 4, 5}
# Use defaultdict to avoid key checks
from collections import defaultdict
word_freq = defaultdict(int)
for word in words:
word_freq[word] += 1 # No need to check if key exists
# Use Counter for frequency counting
from collections import Counter
freq = Counter([1, 2, 2, 3, 3, 3])
# Use OrderedDict when you need reordering methods like move_to_end
# (plain dicts already preserve insertion order since Python 3.7)
from collections import OrderedDict
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
2. Performance Tips
# Bad: Repeated key existence checks
if key in my_dict:
value = my_dict[key]
else:
value = default
# Good: Use get() with default
value = my_dict.get(key, default)
# Bad: Building dict incrementally
d = {}
for key, value in items:
d[key] = value
# Good: Use dict comprehension
d = {key: value for key, value in items}
# Bad: Multiple lookups
if key in d:
d[key] += 1
else:
d[key] = 1
# Good: Use setdefault or defaultdict
d[key] = d.get(key, 0) + 1
3. Memory Optimization
# Use __slots__ for objects used as keys
class Point:
__slots__ = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
return hash((self.x, self.y))
def __eq__(self, other):
return self.x == other.x and self.y == other.y
# Use tuples instead of lists for keys (immutable and hashable)
# Good
cache = {(1, 2): 'value'}
# Bad
# cache = {[1, 2]: 'value'} # TypeError: unhashable type
4. Common Pitfalls
# Pitfall 1: Modifying dict while iterating
d = {'a': 1, 'b': 2, 'c': 3}
# Bad
for key in d:
if d[key] == 2:
del d[key] # RuntimeError!
# Good
keys_to_delete = [key for key, value in d.items() if value == 2]
for key in keys_to_delete:
del d[key]
# Pitfall 2: Using mutable objects as keys
# Bad
my_list = [1, 2, 3]
# d = {my_list: 'value'} # TypeError: unhashable type
# Good: Convert to tuple
d = {tuple(my_list): 'value'}
# Pitfall 3: Not checking for KeyError
d = {'a': 1}
# value = d['b'] # KeyError!
# Good options:
value = d.get('b') # Returns None
value = d.get('b', 0) # Returns default
if 'b' in d:
value = d['b']
5. Testing Strategies
def test_hash_table():
ht = HashTable()
# Test basic operations
ht['key1'] = 'value1'
assert ht['key1'] == 'value1'
# Test update
ht['key1'] = 'new_value'
assert ht['key1'] == 'new_value'
# Test deletion
del ht['key1']
assert 'key1' not in ht
# Test multiple entries
for i in range(100):
ht[f'key{i}'] = f'value{i}'
assert len(ht) == 100
# Test collision handling (use same hash)
# Depends on implementation
# Test edge cases
ht[''] = 'empty_key'
ht[None] = 'none_key'
ht[0] = 'zero_key'
6. Interview Tips
- Clarify requirements:
- Can keys be any type or specific types?
- Are there memory constraints?
- Expected number of operations?
- Need to handle concurrent access?
- Discuss trade-offs:
- Hash function quality vs. computation time
- Memory usage vs. collision rate
- Chaining vs. open addressing
- Optimize progressively:
- Start with simple solution (brute force)
- Identify where hash table helps
- Optimize space/time as needed
- Consider alternatives:
- When hash table might not be best
- Array/list for small datasets
- Trees for ordered operations
- Tries for prefix searches
Summary
Hash tables are one of the most important data structures in computer science:
✅ O(1) average time for insert, delete, and search
✅ Versatile – used in countless applications
✅ Python dicts are highly optimized hash tables
✅ Master patterns – counting, grouping, caching, deduplication
✅ Understand collisions – chaining vs. open addressing
✅ Practice extensively – hash tables appear in most coding interviews
Key Takeaways:
- Use dict for general key-value storage
- Use set for membership testing
- Use Counter for frequency counting
- Use defaultdict to avoid key checks
- Understand time/space complexity trade-offs
- Practice common patterns and problems
Happy coding! 🚀