Table of Contents
- Introduction
- Hash Table Fundamentals
- Python Dictionary (Built-in Hash Table)
- Hash Functions
- Collision Resolution
- Implementing Hash Table from Scratch
- Common Operations
- Time and Space Complexity
- Hash Table Patterns
- Advanced Concepts
- Practice Problems
- Tips and Best Practices
Introduction
A Hash Table (or Hash Map) is a data structure that implements the associative array abstract data type, mapping keys to values. It uses a hash function to compute a hash code for each key, which is then mapped to an index into an array of buckets or slots, from which the desired value can be found.
Key Characteristics:
- Fast Access: O(1) average case for search, insert, and delete
- Key-Value Pairs: Stores data as key-value associations
- Hash Function: Converts keys into array indices
- Collision Handling: Manages cases where multiple keys hash to the same index
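These characteristics map directly onto Python's built-in dict (a minimal sketch):

```python
# Key-value store with O(1) average insert, lookup, and delete
prices = {}
prices["apple"] = 1.50      # insert
prices["apple"] = 1.25      # update via the same key
prices["banana"] = 0.75
print(prices["apple"])       # constant-time lookup -> 1.25
print("banana" in prices)    # membership test -> True
del prices["banana"]         # constant-time delete
print(len(prices))           # 1
```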
Real-World Applications:
- Database indexing
- Caching systems (LRU Cache, Redis)
- Symbol tables in compilers
- Password verification
- Blockchain (cryptographic hashing)
- Spell checkers
- Router tables
- Implementing sets and maps
Hash Table Fundamentals
How Hash Tables Work
Key → Hash Function → Index → Value
Example:
"apple" → hash("apple") → 42 → 1.50
"banana" → hash("banana") → 17 → 0.75
Core Components:
- Hash Function: Converts key to array index
- Bucket Array: Stores the key-value pairs
- Collision Resolution: Handles hash collisions
- Load Factor: Ratio of elements to bucket size
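As a quick numeric sketch of the load factor (the 0.7 threshold used here matches the resize policy of the implementations later in this guide):

```python
def load_factor(num_elements, num_buckets):
    """Load factor = stored elements / bucket count."""
    return num_elements / num_buckets

# 7 items in 10 buckets sits right at the common 0.7 threshold;
# one more insert would typically trigger a resize (doubling)
print(load_factor(7, 10))        # 0.7
print(load_factor(8, 10) > 0.7)  # True -> time to resize
```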
Hash Table Structure
class HashTable:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)] # Using chaining
def _hash(self, key):
"""Hash function to compute index"""
return hash(key) % self.size
Python Dictionary (Built-in Hash Table)
Python’s dict is a highly optimized hash table implementation.
Creating Dictionaries
# Empty dictionary
empty_dict = {}
empty_dict = dict()
# Dictionary with initial values
person = {
'name': 'Alice',
'age': 30,
'city': 'New York'
}
# Using dict() constructor
person = dict(name='Alice', age=30, city='New York')
# From list of tuples
items = [('a', 1), ('b', 2), ('c', 3)]
d = dict(items)
# Dictionary comprehension
squares = {x: x**2 for x in range(10)}
# From two lists
keys = ['a', 'b', 'c']
values = [1, 2, 3]
d = dict(zip(keys, values))
Basic Operations
# Access elements
person = {'name': 'Alice', 'age': 30}
print(person['name']) # 'Alice'
print(person.get('age')) # 30
print(person.get('salary', 0)) # 0 (default value)
# Add/Update elements
person['city'] = 'Boston' # Add new key
person['age'] = 31 # Update existing key
person.update({'job': 'Engineer', 'age': 32})
# Delete elements
del person['city']
age = person.pop('age') # Remove and return value
person.popitem() # Remove and return last (key, value) pair
person.clear() # Remove all items
# Check key existence
if 'name' in person:
print("Name exists")
# Get all keys, values, items
keys = person.keys()
values = person.values()
items = person.items()
# Length
length = len(person)
Advanced Dictionary Methods
# setdefault - get value or set default if key doesn't exist
d = {'a': 1}
value = d.setdefault('b', 0) # Returns 0 and sets d['b'] = 0
# fromkeys - create dict from sequence of keys
keys = ['a', 'b', 'c']
d = dict.fromkeys(keys, 0) # {'a': 0, 'b': 0, 'c': 0}
# copy - shallow copy
d2 = d.copy()
# Merge dictionaries (Python 3.9+)
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
merged = d1 | d2 # {'a': 1, 'b': 3, 'c': 4}
# Update in place
d1 |= d2
# Older merge methods
merged = {**d1, **d2}
merged = dict(d1, **d2) # works only when all keys in d2 are strings
Iteration
person = {'name': 'Alice', 'age': 30, 'city': 'Boston'}
# Iterate over keys
for key in person:
print(key)
for key in person.keys():
print(key)
# Iterate over values
for value in person.values():
print(value)
# Iterate over key-value pairs
for key, value in person.items():
print(f"{key}: {value}")
# Enumerate with index
for i, (key, value) in enumerate(person.items()):
print(f"{i}: {key} = {value}")
Hash Functions
A good hash function should:
- Be deterministic (same input → same output)
- Uniformly distribute keys
- Be fast to compute
- Minimize collisions
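These properties can be spot-checked with Python's built-in hash (a rough sketch; string hashing is randomized per interpreter process, so only within-process determinism is demonstrated, and the bucket counts will vary run to run):

```python
# Deterministic: the same key always hashes the same way
# within a single interpreter process
assert hash("hello") == hash("hello")

# Uniform distribution: mapping 1000 keys into 8 buckets via
# hash(key) % size should spread them out, not cluster them
size = 8
buckets = [0] * size
for i in range(1000):
    buckets[hash(f"key{i}") % size] += 1
print(buckets)  # roughly 125 per bucket
```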
Built-in hash() Function
# Works with immutable types
print(hash(42)) # Integer
print(hash("hello")) # String
print(hash((1, 2, 3))) # Tuple
# Not hashable (mutable types)
# hash([1, 2, 3]) # TypeError: unhashable type: 'list'
# hash({'a': 1}) # TypeError: unhashable type: 'dict'
Custom Hash Functions
# Simple hash function for strings
def simple_hash(key, size):
"""Sum of ASCII values mod size"""
return sum(ord(c) for c in str(key)) % size
# Polynomial rolling hash
def polynomial_hash(key, size, prime=31):
"""Better distribution using polynomial"""
hash_value = 0
for char in str(key):
hash_value = (hash_value * prime + ord(char)) % size
return hash_value
# DJB2 hash (popular string hash)
def djb2_hash(key, size):
hash_value = 5381
for char in str(key):
hash_value = ((hash_value << 5) + hash_value) + ord(char)
return hash_value % size
# Using Python's built-in hash
def builtin_hash(key, size):
return hash(key) % size
Making Custom Objects Hashable
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __hash__(self):
"""Make object hashable"""
return hash((self.name, self.age))
def __eq__(self, other):
"""Required for hash to work properly"""
if not isinstance(other, Person):
return False
return self.name == other.name and self.age == other.age
def __repr__(self):
return f"Person('{self.name}', {self.age})"
# Now can use as dictionary key
p1 = Person("Alice", 30)
p2 = Person("Alice", 30)
d = {p1: "Engineer"}
print(d[p2]) # Works! Returns "Engineer"
Collision Resolution
When two keys hash to the same index, we have a collision.
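As a concrete sketch, a naive sum-of-character-codes hash (like the simple_hash shown earlier) collides for any two anagrams, since reordering characters never changes the sum:

```python
def simple_hash(key, size):
    # Naive: sum of character codes modulo the table size
    return sum(ord(c) for c in key) % size

table_size = 10
# "listen" and "silent" are anagrams: identical character sums,
# identical bucket index -> a collision that must be resolved
print(simple_hash("listen", table_size))
print(simple_hash("listen", table_size) == simple_hash("silent", table_size))  # True
```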
1. Chaining (Separate Chaining)
Store multiple elements at the same index using a list/linked list.
class HashTableChaining:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)]
def _hash(self, key):
return hash(key) % self.size
def insert(self, key, value):
"""Insert or update key-value pair"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
# Update if key exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
# Add new key-value pair
bucket.append((key, value))
def get(self, key):
"""Retrieve value by key"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
for k, v in bucket:
if k == key:
return v
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Remove key-value pair"""
hash_index = self._hash(key)
bucket = self.table[hash_index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
return v
raise KeyError(f"Key '{key}' not found")
def __contains__(self, key):
"""Check if key exists"""
try:
self.get(key)
return True
except KeyError:
return False
def __str__(self):
items = []
for bucket in self.table:
for key, value in bucket:
items.append(f"{key}: {value}")
return "{" + ", ".join(items) + "}"
2. Open Addressing
Store all elements in the hash table itself (no chaining).
a) Linear Probing
class HashTableLinearProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
self.count = 0
def _hash(self, key):
return hash(key) % self.size
def _probe(self, index):
"""Linear probing: (index + 1) % size"""
return (index + 1) % self.size
def _resize(self):
"""Resize when load factor > 0.7"""
old_keys = self.keys
old_values = self.values
self.size *= 2
self.keys = [None] * self.size
self.values = [None] * self.size
self.count = 0
for i in range(len(old_keys)):
if old_keys[i] is not None:
self.insert(old_keys[i], old_values[i])
def insert(self, key, value):
"""Insert or update key-value pair"""
if self.count / self.size > 0.7:
self._resize()
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
# Update existing key
self.values[index] = value
return
index = self._probe(index)
# Insert new key
self.keys[index] = key
self.values[index] = value
self.count += 1
def get(self, key):
"""Retrieve value by key"""
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
return self.values[index]
index = self._probe(index)
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Delete key-value pair"""
index = self._hash(key)
while self.keys[index] is not None:
if self.keys[index] == key:
self.keys[index] = None
self.values[index] = None
self.count -= 1
# Rehash subsequent entries
self._rehash_cluster(index)
return
index = self._probe(index)
raise KeyError(f"Key '{key}' not found")
def _rehash_cluster(self, start):
"""Rehash entries after deletion"""
index = self._probe(start)
while self.keys[index] is not None:
key = self.keys[index]
value = self.values[index]
self.keys[index] = None
self.values[index] = None
self.count -= 1
self.insert(key, value)
index = self._probe(index)
b) Quadratic Probing
class HashTableQuadraticProbing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash(self, key):
return hash(key) % self.size
def _probe(self, index, i):
"""Quadratic probing: (index + i²) % size"""
return (index + i * i) % self.size
def insert(self, key, value):
index = self._hash(key)
i = 0
while self.keys[index] is not None and self.keys[index] != key:
i += 1
index = self._probe(self._hash(key), i)
self.keys[index] = key
self.values[index] = value
c) Double Hashing
class HashTableDoubleHashing:
def __init__(self, size=10):
self.size = size
self.keys = [None] * size
self.values = [None] * size
def _hash1(self, key):
return hash(key) % self.size
def _hash2(self, key):
"""Second hash function"""
return 7 - (hash(key) % 7) # 7 is a prime < size; step is always in 1..7, never 0
def _probe(self, key, i):
"""Double hashing: (h1(key) + i * h2(key)) % size"""
return (self._hash1(key) + i * self._hash2(key)) % self.size
def insert(self, key, value):
i = 0
index = self._probe(key, i)
while self.keys[index] is not None and self.keys[index] != key:
i += 1
index = self._probe(key, i)
self.keys[index] = key
self.values[index] = value
Implementing Hash Table from Scratch
Complete implementation with all features:
class HashTable:
def __init__(self, capacity=10):
self.capacity = capacity
self.size = 0
self.table = [[] for _ in range(capacity)]
def _hash(self, key):
"""Hash function using Python's built-in hash"""
return hash(key) % self.capacity
def _load_factor(self):
"""Calculate current load factor"""
return self.size / self.capacity
def _resize(self):
"""Resize table when load factor exceeds 0.7"""
old_table = self.table
self.capacity *= 2
self.table = [[] for _ in range(self.capacity)]
self.size = 0
for bucket in old_table:
for key, value in bucket:
self.insert(key, value)
def insert(self, key, value):
"""Insert or update key-value pair - O(1) average"""
# Resize if needed
if self._load_factor() > 0.7:
self._resize()
index = self._hash(key)
bucket = self.table[index]
# Update if key exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
# Add new key-value pair
bucket.append((key, value))
self.size += 1
def get(self, key):
"""Get value by key - O(1) average"""
index = self._hash(key)
bucket = self.table[index]
for k, v in bucket:
if k == key:
return v
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Delete key-value pair - O(1) average"""
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
self.size -= 1
return v
raise KeyError(f"Key '{key}' not found")
def contains(self, key):
"""Check if key exists - O(1) average"""
try:
self.get(key)
return True
except KeyError:
return False
def keys(self):
"""Get all keys"""
all_keys = []
for bucket in self.table:
for key, _ in bucket:
all_keys.append(key)
return all_keys
def values(self):
"""Get all values"""
all_values = []
for bucket in self.table:
for _, value in bucket:
all_values.append(value)
return all_values
def items(self):
"""Get all key-value pairs"""
all_items = []
for bucket in self.table:
for key, value in bucket:
all_items.append((key, value))
return all_items
def clear(self):
"""Remove all items"""
self.table = [[] for _ in range(self.capacity)]
self.size = 0
def __len__(self):
"""Return number of items"""
return self.size
def __contains__(self, key):
"""Support 'in' operator"""
return self.contains(key)
def __getitem__(self, key):
"""Support dict[key] syntax"""
return self.get(key)
def __setitem__(self, key, value):
"""Support dict[key] = value syntax"""
self.insert(key, value)
def __delitem__(self, key):
"""Support del dict[key] syntax"""
self.delete(key)
def __str__(self):
"""String representation"""
items = [f"{k!r}: {v!r}" for k, v in self.items()]
return "{" + ", ".join(items) + "}"
def __repr__(self):
return f"HashTable({self})"
# Usage
ht = HashTable()
ht['name'] = 'Alice'
ht['age'] = 30
print(ht['name']) # Alice
print('age' in ht) # True
del ht['age']
print(len(ht)) # 1
Common Operations
1. Frequency Counter
def count_frequency(arr):
"""Count frequency of elements - O(n)"""
freq = {}
for item in arr:
freq[item] = freq.get(item, 0) + 1
return freq
# Using defaultdict
from collections import defaultdict
def count_frequency_defaultdict(arr):
freq = defaultdict(int)
for item in arr:
freq[item] += 1
return dict(freq)
# Using Counter
from collections import Counter
def count_frequency_counter(arr):
return Counter(arr)
# Example
arr = [1, 2, 2, 3, 3, 3, 4]
print(count_frequency(arr)) # {1: 1, 2: 2, 3: 3, 4: 1}
2. Group Anagrams
def group_anagrams(words):
"""
Group words that are anagrams - O(n * k log k)
where n is number of words, k is max word length
"""
groups = {}
for word in words:
# Sort characters as key
key = ''.join(sorted(word))
if key not in groups:
groups[key] = []
groups[key].append(word)
return list(groups.values())
# Using defaultdict
from collections import defaultdict
def group_anagrams_v2(words):
groups = defaultdict(list)
for word in words:
key = ''.join(sorted(word))
groups[key].append(word)
return list(groups.values())
# Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
3. Two Sum Problem
def two_sum(nums, target):
"""
Find two numbers that sum to target - O(n)
"""
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
# Example
nums = [2, 7, 11, 15]
target = 9
print(two_sum(nums, target)) # [0, 1]
4. First Unique Character
def first_unique_char(s):
"""
Find first non-repeating character - O(n)
"""
freq = {}
# Count frequencies
for char in s:
freq[char] = freq.get(char, 0) + 1
# Find first with frequency 1
for i, char in enumerate(s):
if freq[char] == 1:
return i
return -1
# Example
s = "leetcode"
print(first_unique_char(s)) # 0 ('l')
5. Subarray Sum Equals K
def subarray_sum_k(nums, k):
"""
Count subarrays with sum equal to k - O(n)
"""
count = 0
prefix_sum = 0
sum_count = {0: 1} # prefix_sum: frequency
for num in nums:
prefix_sum += num
# Check if (prefix_sum - k) exists
if (prefix_sum - k) in sum_count:
count += sum_count[prefix_sum - k]
# Add current prefix_sum
sum_count[prefix_sum] = sum_count.get(prefix_sum, 0) + 1
return count
# Example
nums = [1, 1, 1]
k = 2
print(subarray_sum_k(nums, k)) # 2
Time and Space Complexity
Average Case Complexity
| Operation | Time | Space |
|---|---|---|
| Insert | O(1) | O(1) |
| Delete | O(1) | O(1) |
| Search | O(1) | O(1) |
| Access by key | O(1) | O(1) |
Worst Case Complexity
| Operation | Time | Space |
|---|---|---|
| Insert | O(n) | O(1) |
| Delete | O(n) | O(1) |
| Search | O(n) | O(1) |
Note: The worst case O(n) occurs when all keys hash to the same index, so every operation degenerates into a scan of one bucket.
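This worst case can be forced deliberately with a key type whose __hash__ is constant: every key lands in the same bucket and lookups degrade to linear equality scans, though results stay correct (a sketch; BadKey is a hypothetical name):

```python
class BadKey:
    """Pathological key: every instance hashes to the same bucket."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 0  # constant hash -> all keys collide
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

d = {BadKey(i): i for i in range(100)}
# Correctness is preserved; each lookup now scans one big bucket
print(d[BadKey(42)])  # 42
print(len(d))         # 100
```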
Load Factor Impact
Load Factor = Number of Elements / Table Size
Optimal Load Factor: 0.7 - 0.75
- Too low: Wastes memory
- Too high: More collisions, slower operations
Hash Table Patterns
1. Hash Set Pattern
# Check for duplicates - O(n)
def contains_duplicate(nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
# Alternative
def contains_duplicate_v2(nums):
return len(nums) != len(set(nums))
2. Hash Map for Counting
# Most frequent element
def most_frequent(nums):
"""O(n) time, O(n) space"""
freq = {}
max_count = 0
result = None
for num in nums:
freq[num] = freq.get(num, 0) + 1
if freq[num] > max_count:
max_count = freq[num]
result = num
return result
# Using Counter
from collections import Counter
def most_frequent_v2(nums):
return Counter(nums).most_common(1)[0][0]
3. Hash Map for Grouping
# Group elements by property
def group_by_length(words):
"""Group words by length"""
groups = {}
for word in words:
length = len(word)
if length not in groups:
groups[length] = []
groups[length].append(word)
return groups
# Using defaultdict
from collections import defaultdict
def group_by_length_v2(words):
groups = defaultdict(list)
for word in words:
groups[len(word)].append(word)
return dict(groups)
4. Hash Map for Tracking Indices
# Find all indices of target sum
def two_sum_all(nums, target):
"""Find all pairs that sum to target"""
seen = {}
pairs = []
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
for j in seen[complement]:
pairs.append([j, i])
if num not in seen:
seen[num] = []
seen[num].append(i)
return pairs
5. Sliding Window with Hash Map
# Longest substring without repeating characters
def longest_unique_substring(s):
"""O(n) time, O(min(m, n)) space"""
char_index = {}
max_length = 0
start = 0
for end, char in enumerate(s):
if char in char_index and char_index[char] >= start:
start = char_index[char] + 1
char_index[char] = end
max_length = max(max_length, end - start + 1)
return max_length
# Example
s = "abcabcbb"
print(longest_unique_substring(s)) # 3 ("abc")
6. Hash Map for Prefix/Suffix
# Longest common prefix using hash map
def longest_common_prefix(strs):
if not strs:
return ""
prefix_map = {}
for i, char in enumerate(strs[0]):
prefix = strs[0][:i+1]
if all(s.startswith(prefix) for s in strs):
prefix_map[len(prefix)] = prefix
else:
break
if not prefix_map:
return ""
max_len = max(prefix_map.keys())
return prefix_map[max_len]
Advanced Concepts
1. LRU Cache Implementation
from collections import OrderedDict
class LRUCache:
"""Least Recently Used Cache using OrderedDict"""
def __init__(self, capacity):
self.cache = OrderedDict()
self.capacity = capacity
def get(self, key):
"""Get value and mark as recently used"""
if key not in self.cache:
return -1
# Move to end (most recent)
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key, value):
"""Put key-value and mark as recently used"""
if key in self.cache:
# Update and move to end
self.cache.move_to_end(key)
self.cache[key] = value
# Remove least recently used if over capacity
if len(self.cache) > self.capacity:
self.cache.popitem(last=False)
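The recency mechanics used by the cache above can be sketched with a bare OrderedDict (a capacity of 2 is chosen arbitrarily):

```python
from collections import OrderedDict

cache = OrderedDict()
capacity = 2
cache[1] = 'a'
cache[2] = 'b'
cache.move_to_end(1)           # key 1 is now most recently used
cache[3] = 'c'                 # over capacity
if len(cache) > capacity:
    cache.popitem(last=False)  # evicts key 2, the least recent
print(list(cache))             # [1, 3]
```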
# Manual implementation with doubly linked list
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.prev = None
self.next = None
class LRUCacheManual:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {} # key -> Node
self.head = Node(0, 0) # Dummy head
self.tail = Node(0, 0) # Dummy tail
self.head.next = self.tail
self.tail.prev = self.head
def _add_node(self, node):
"""Add node right after head"""
node.prev = self.head
node.next = self.head.next
self.head.next.prev = node
self.head.next = node
def _remove_node(self, node):
"""Remove node from list"""
prev = node.prev
next = node.next
prev.next = next
next.prev = prev
def _move_to_head(self, node):
"""Move node to head (most recent)"""
self._remove_node(node)
self._add_node(node)
def _pop_tail(self):
"""Remove least recently used node"""
node = self.tail.prev
self._remove_node(node)
return node
def get(self, key):
if key not in self.cache:
return -1
node = self.cache[key]
self._move_to_head(node)
return node.value
def put(self, key, value):
if key in self.cache:
node = self.cache[key]
node.value = value
self._move_to_head(node)
else:
node = Node(key, value)
self.cache[key] = node
self._add_node(node)
if len(self.cache) > self.capacity:
tail = self._pop_tail()
del self.cache[tail.key]
2. Consistent Hashing
import bisect
import hashlib
class ConsistentHashing:
"""Used in distributed systems for load balancing"""
def __init__(self, nodes=None, replicas=3):
self.replicas = replicas
self.ring = {} # hash -> node
self.sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def _hash(self, key):
"""Hash function using MD5"""
return int(hashlib.md5(str(key).encode()).hexdigest(), 16)
def add_node(self, node):
"""Add node to the ring"""
for i in range(self.replicas):
virtual_key = f"{node}:{i}"
hash_key = self._hash(virtual_key)
self.ring[hash_key] = node
bisect.insort(self.sorted_keys, hash_key)
def remove_node(self, node):
"""Remove node from the ring"""
for i in range(self.replicas):
virtual_key = f"{node}:{i}"
hash_key = self._hash(virtual_key)
del self.ring[hash_key]
self.sorted_keys.remove(hash_key)
def get_node(self, key):
"""Get node responsible for key"""
if not self.ring:
return None
hash_key = self._hash(key)
idx = bisect.bisect(self.sorted_keys, hash_key)
# Wrap around if necessary
if idx == len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
3. Bloom Filter
import hashlib
class BloomFilter:
"""Probabilistic data structure for membership testing"""
def __init__(self, size=1000, hash_count=3):
self.size = size
self.hash_count = hash_count
self.bit_array = [False] * size
def _hashes(self, item):
"""Generate multiple hash values"""
hashes = []
for i in range(self.hash_count):
hash_input = f"{item}{i}".encode()
hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
hashes.append(hash_value % self.size)
return hashes
def add(self, item):
"""Add item to filter"""
for hash_value in self._hashes(item):
self.bit_array[hash_value] = True
def contains(self, item):
"""Check if item might be in filter"""
return all(self.bit_array[h] for h in self._hashes(item))
# Note: False positives possible, false negatives impossible
4. Count-Min Sketch
import hashlib
class CountMinSketch:
"""Probabilistic frequency counter for streams"""
def __init__(self, width=1000, depth=5):
self.width = width
self.depth = depth
self.table = [[0] * width for _ in range(depth)]
def _hashes(self, item):
"""Generate hash values for each row"""
hashes = []
for i in range(self.depth):
hash_input = f"{item}{i}".encode()
hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
hashes.append(hash_value % self.width)
return hashes
def add(self, item, count=1):
"""Add item to sketch"""
for i, hash_value in enumerate(self._hashes(item)):
self.table[i][hash_value] += count
def estimate(self, item):
"""Estimate frequency of item"""
return min(
self.table[i][h]
for i, h in enumerate(self._hashes(item))
)
5. HashMap with Expiry (Cache)
import time
class ExpiringHashMap:
"""HashMap with time-based expiration"""
def __init__(self):
self.store = {} # key -> (value, expiry_time)
def put(self, key, value, ttl=None):
"""
Put key-value with optional TTL (time to live) in seconds
"""
expiry = None
if ttl is not None:
expiry = time.time() + ttl
self.store[key] = (value, expiry)
def get(self, key):
"""Get value if not expired"""
if key not in self.store:
return None
value, expiry = self.store[key]
# Check if expired
if expiry is not None and time.time() > expiry:
del self.store[key]
return None
return value
def cleanup(self):
"""Remove all expired entries"""
current_time = time.time()
expired_keys = [
key for key, (_, expiry) in self.store.items()
if expiry is not None and current_time > expiry
]
for key in expired_keys:
del self.store[key]
Practice Problems
Easy Level
1. Valid Anagram
def is_anagram(s, t):
"""
Check if two strings are anagrams
Time: O(n), Space: O(1) - limited to 26 letters
"""
if len(s) != len(t):
return False
char_count = {}
for char in s:
char_count[char] = char_count.get(char, 0) + 1
for char in t:
if char not in char_count:
return False
char_count[char] -= 1
if char_count[char] < 0:
return False
return all(count == 0 for count in char_count.values())
# Alternative using Counter
from collections import Counter
def is_anagram_v2(s, t):
return Counter(s) == Counter(t)
# Alternative using sorting
def is_anagram_v3(s, t):
return sorted(s) == sorted(t)
2. Contains Duplicate
def contains_duplicate(nums):
"""
Check if array contains any duplicates
Time: O(n), Space: O(n)
"""
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
# One-liner
def contains_duplicate_v2(nums):
return len(nums) != len(set(nums))
3. Single Number
def single_number(nums):
"""
Find number that appears once (others appear twice)
Time: O(n), Space: O(n)
"""
count = {}
for num in nums:
count[num] = count.get(num, 0) + 1
for num, freq in count.items():
if freq == 1:
return num
# Optimized using XOR - O(1) space
def single_number_xor(nums):
result = 0
for num in nums:
result ^= num
return result
4. Intersection of Two Arrays
def intersection(nums1, nums2):
"""
Find intersection of two arrays
Time: O(n + m), Space: O(min(n, m))
"""
return list(set(nums1) & set(nums2))
# Alternative
def intersection_v2(nums1, nums2):
set1 = set(nums1)
result = set()
for num in nums2:
if num in set1:
result.add(num)
return list(result)
5. Happy Number
def is_happy(n):
"""
Determine if number is happy number
Time: O(log n), Space: O(log n)
"""
seen = set()
while n != 1 and n not in seen:
seen.add(n)
n = sum(int(digit) ** 2 for digit in str(n))
return n == 1
Medium Level
1. Group Anagrams (Revisited)
def group_anagrams(strs):
"""
Time: O(n * k log k), Space: O(n * k)
"""
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
# Use sorted tuple as key
key = tuple(sorted(s))
groups[key].append(s)
return list(groups.values())
# Optimized using character count
def group_anagrams_v2(strs):
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
# Use character count as key
count = [0] * 26
for char in s:
count[ord(char) - ord('a')] += 1
groups[tuple(count)].append(s)
return list(groups.values())
2. Top K Frequent Elements
def top_k_frequent(nums, k):
"""
Time: O(n), Space: O(n)
"""
from collections import Counter
count = Counter(nums)
return [num for num, _ in count.most_common(k)]
# Using bucket sort
def top_k_frequent_v2(nums, k):
count = {}
for num in nums:
count[num] = count.get(num, 0) + 1
# Bucket sort by frequency
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in count.items():
buckets[freq].append(num)
# Collect top k
result = []
for i in range(len(buckets) - 1, -1, -1):
result.extend(buckets[i])
if len(result) >= k:
return result[:k]
return result
3. Longest Consecutive Sequence
def longest_consecutive(nums):
"""
Time: O(n), Space: O(n)
"""
if not nums:
return 0
num_set = set(nums)
longest = 0
for num in num_set:
# Only count from sequence start
if num - 1 not in num_set:
current_num = num
current_length = 1
while current_num + 1 in num_set:
current_num += 1
current_length += 1
longest = max(longest, current_length)
return longest
4. 4Sum II
def four_sum_count(nums1, nums2, nums3, nums4):
"""
Count tuples (i, j, k, l) where nums1[i] + nums2[j] + nums3[k] + nums4[l] == 0
Time: O(n²), Space: O(n²)
"""
# Store sums of nums1 and nums2
sum_count = {}
for a in nums1:
for b in nums2:
sum_ab = a + b
sum_count[sum_ab] = sum_count.get(sum_ab, 0) + 1
# Find complement in nums3 and nums4
count = 0
for c in nums3:
for d in nums4:
target = -(c + d)
count += sum_count.get(target, 0)
return count
5. Find Duplicate Subtrees
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
def find_duplicate_subtrees(root):
"""
Find all duplicate subtrees in binary tree
Time: O(n), Space: O(n)
"""
from collections import defaultdict
subtrees = defaultdict(int)
duplicates = []
def serialize(node):
if not node:
return "#"
# Serialize subtree
s = f"{node.val},{serialize(node.left)},{serialize(node.right)}"
subtrees[s] += 1
if subtrees[s] == 2:
duplicates.append(node)
return s
serialize(root)
return duplicates
Hard Level
1. Longest Substring with At Most K Distinct Characters
def length_of_longest_substring_k_distinct(s, k):
"""
Time: O(n), Space: O(k)
"""
if k == 0:
return 0
char_count = {}
max_length = 0
left = 0
for right, char in enumerate(s):
char_count[char] = char_count.get(char, 0) + 1
while len(char_count) > k:
left_char = s[left]
char_count[left_char] -= 1
if char_count[left_char] == 0:
del char_count[left_char]
left += 1
max_length = max(max_length, right - left + 1)
return max_length
2. Minimum Window Substring
def min_window(s, t):
"""
Find minimum window in s that contains all characters of t
Time: O(n + m), Space: O(m)
"""
if not s or not t:
return ""
from collections import Counter
target_count = Counter(t)
required = len(target_count)
window_count = {}
formed = 0
left = 0
min_len = float('inf')
min_window = (0, 0)
for right, char in enumerate(s):
window_count[char] = window_count.get(char, 0) + 1
if char in target_count and window_count[char] == target_count[char]:
formed += 1
while left <= right and formed == required:
# Update minimum window
if right - left + 1 < min_len:
min_len = right - left + 1
min_window = (left, right)
# Shrink window
left_char = s[left]
window_count[left_char] -= 1
if left_char in target_count and window_count[left_char] < target_count[left_char]:
formed -= 1
left += 1
left, right = min_window
return s[left:right + 1] if min_len != float('inf') else ""
3. Substring with Concatenation of All Words
def find_substring(s, words):
"""
Find all starting indices of concatenated substrings
Time: O(n * m * k), Space: O(m)
"""
if not s or not words:
return []
from collections import Counter
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
word_freq = Counter(words)
result = []
for i in range(len(s) - total_len + 1):
seen = {}
for j in range(word_count):
start = i + j * word_len
word = s[start:start + word_len]
if word not in word_freq:
break
seen[word] = seen.get(word, 0) + 1
if seen[word] > word_freq[word]:
break
else:
result.append(i)
return result
Tips and Best Practices
1. Choosing Hash Table Type
# Use dict for general key-value storage
cache = {'user1': 'data1', 'user2': 'data2'}
# Use set for membership testing only
unique_items = {1, 2, 3, 4, 5}
# Use defaultdict to avoid key checks
from collections import defaultdict
word_freq = defaultdict(int)
for word in words:
word_freq[word] += 1 # No need to check if key exists
# Use Counter for frequency counting
from collections import Counter
freq = Counter([1, 2, 2, 3, 3, 3])
# Use OrderedDict when you need reordering methods like move_to_end
# (plain dicts already preserve insertion order since Python 3.7)
from collections import OrderedDict
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
2. Performance Tips
# Bad: Repeated key existence checks
if key in my_dict:
value = my_dict[key]
else:
value = default
# Good: Use get() with default
value = my_dict.get(key, default)
# Bad: Building dict incrementally
d = {}
for key, value in items:
d[key] = value
# Good: Use dict comprehension
d = {key: value for key, value in items}
# Bad: Multiple lookups
if key in d:
d[key] += 1
else:
d[key] = 1
# Good: Use setdefault or defaultdict
d[key] = d.get(key, 0) + 1
3. Memory Optimization
# Use __slots__ for objects used as keys
class Point:
__slots__ = ['x', 'y']
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
return hash((self.x, self.y))
def __eq__(self, other):
return self.x == other.x and self.y == other.y
# Use tuples instead of lists for keys (immutable and hashable)
# Good
cache = {(1, 2): 'value'}
# Bad
# cache = {[1, 2]: 'value'} # TypeError: unhashable type
4. Common Pitfalls
# Pitfall 1: Modifying dict while iterating
d = {'a': 1, 'b': 2, 'c': 3}
# Bad
for key in d:
if d[key] == 2:
del d[key] # RuntimeError!
# Good
keys_to_delete = [key for key, value in d.items() if value == 2]
for key in keys_to_delete:
del d[key]
# Pitfall 2: Using mutable objects as keys
# Bad
my_list = [1, 2, 3]
# d = {my_list: 'value'} # TypeError: unhashable type
# Good: Convert to tuple
d = {tuple(my_list): 'value'}
# Pitfall 3: Not checking for KeyError
d = {'a': 1}
# value = d['b'] # KeyError!
# Good options:
value = d.get('b') # Returns None
value = d.get('b', 0) # Returns default
if 'b' in d:
value = d['b']
5. Testing Strategies
def test_hash_table():
ht = HashTable()
# Test basic operations
ht['key1'] = 'value1'
assert ht['key1'] == 'value1'
# Test update
ht['key1'] = 'new_value'
assert ht['key1'] == 'new_value'
# Test deletion
del ht['key1']
assert 'key1' not in ht
# Test multiple entries
for i in range(100):
ht[f'key{i}'] = f'value{i}'
assert len(ht) == 100
# Test collision handling (use same hash)
# Depends on implementation
# Test edge cases
ht[''] = 'empty_key'
ht[None] = 'none_key'
ht[0] = 'zero_key'
6. Interview Tips
- Clarify requirements:
- Can keys be any type or specific types?
- Are there memory constraints?
- Expected number of operations?
- Need to handle concurrent access?
- Discuss trade-offs:
- Hash function quality vs. computation time
- Memory usage vs. collision rate
- Chaining vs. open addressing
- Optimize progressively:
- Start with simple solution (brute force)
- Identify where hash table helps
- Optimize space/time as needed
- Consider alternatives:
- When hash table might not be best
- Array/list for small datasets
- Trees for ordered operations
- Tries for prefix searches
Summary
Hash tables are one of the most important data structures in computer science:
✅ O(1) average time for insert, delete, and search
✅ Versatile – used in countless applications
✅ Python dicts are highly optimized hash tables
✅ Master patterns – counting, grouping, caching, deduplication
✅ Understand collisions – chaining vs. open addressing
✅ Practice extensively – hash tables appear in most coding interviews
Key Takeaways:
- Use dict for general key-value storage
- Use set for membership testing
- Use Counter for frequency counting
- Use defaultdict to avoid key checks
- Understand time/space complexity trade-offs
- Practice common patterns and problems
Happy coding! 🚀