Table of Contents
- Introduction
- String Basics in Python
- String Manipulation Techniques
- String Searching Algorithms
- Palindrome Problems
- Anagram Problems
- String Compression
- Trie (Prefix Tree)
- Suffix Arrays and Trees
- Regular Expressions
- Common String Interview Problems
- Advanced String Algorithms
- Time and Space Complexity
- Best Practices and Tips
Introduction
String algorithms are fundamental to computer science and form the backbone of many applications including text processing, search engines, bioinformatics, and data compression. In Python, strings are immutable sequences of Unicode characters, making them efficient for read operations but requiring careful handling for modifications.
Key Characteristics of Strings in Python
- Immutable: Cannot be modified in-place
- Sequence Type: Supports indexing, slicing, and iteration
- Unicode Support: Handles international characters
- Rich Built-in Methods: Extensive string manipulation capabilities
Why String Algorithms Matter
- Text Processing: Search, replace, parse text data
- Pattern Matching: Find patterns in large texts
- Data Validation: Check formats, validate inputs
- Compression: Reduce storage space
- Bioinformatics: DNA sequence analysis
- Search Engines: Indexing and querying
String Basics in Python
Creating Strings
# Different ways to create strings
single_quotes = 'Hello World'
double_quotes = "Hello World"
triple_quotes = """Multi-line
string"""
# String with special characters
escaped = "He said, \"Hello!\""
raw_string = r"C:\Users\Path" # Raw string
# Unicode strings
unicode_str = "Hello 世界"PythonString Properties
s = "Python"
print(len(s)) # 6 - length
print(s[0]) # 'P' - indexing
print(s[-1]) # 'n' - negative indexing
print(s[1:4]) # 'yth' - slicing
print('P' in s) # True - membership
print(s * 2) # 'PythonPython' - repetitionPythonString Immutability
s = "Hello"
# s[0] = 'h' # TypeError: 'str' object does not support item assignment
# Instead, create new string
s = 'h' + s[1:] # 'hello'PythonString Manipulation Techniques
Basic Operations
# Concatenation
s1 = "Hello"
s2 = "World"
result = s1 + " " + s2 # "Hello World"
# Joining strings
words = ["Hello", "World", "Python"]
sentence = " ".join(words) # "Hello World Python"
# Splitting
text = "Hello World Python"
words = text.split() # ['Hello', 'World', 'Python']
parts = text.split(" ", 1) # ['Hello', 'World Python']PythonCase Conversion
s = "Hello World"
print(s.upper()) # "HELLO WORLD"
print(s.lower()) # "hello world"
print(s.title()) # "Hello World"
print(s.capitalize()) # "Hello world"
print(s.swapcase()) # "hELLO wORLD"PythonString Formatting
# Old style formatting
name = "Alice"
age = 30
print("My name is %s and I am %d years old" % (name, age))
# New style formatting
print("My name is {} and I am {} years old".format(name, age))
print("My name is {0} and I am {1} years old".format(name, age))
# f-strings (Python 3.6+)
print(f"My name is {name} and I am {age} years old")
# Template strings
from string import Template
template = Template("My name is $name and I am $age years old")
print(template.substitute(name=name, age=age))PythonString Stripping and Alignment
s = " Hello World "
print(s.strip()) # "Hello World"
print(s.lstrip()) # "Hello World "
print(s.rstrip()) # " Hello World"
# Alignment
print("Hello".ljust(10, '-')) # "Hello-----"
print("Hello".rjust(10, '-')) # "-----Hello"
print("Hello".center(10, '-')) # "--Hello---"PythonString Searching Algorithms
Naive String Search
def naive_search(text, pattern):
"""
Naive string search algorithm
Time: O((n-m+1)*m) where n=len(text), m=len(pattern)
"""
n, m = len(text), len(pattern)
results = []
for i in range(n - m + 1):
if text[i:i+m] == pattern:
results.append(i)
return results
# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(naive_search(text, pattern)) # [10]PythonKnuth-Morris-Pratt (KMP) Algorithm
def compute_lps(pattern):
"""Compute Longest Prefix Suffix array"""
m = len(pattern)
lps = [0] * m
length = 0
i = 1
while i < m:
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
"""
KMP string search algorithm
Time: O(n + m) where n=len(text), m=len(pattern)
"""
n, m = len(text), len(pattern)
lps = compute_lps(pattern)
results = []
i = j = 0
while i < n:
if pattern[j] == text[i]:
i += 1
j += 1
if j == m:
results.append(i - j)
j = lps[j - 1]
elif i < n and pattern[j] != text[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return results
# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(kmp_search(text, pattern)) # [10]PythonRabin-Karp Algorithm
def rabin_karp(text, pattern, prime=101):
"""
Rabin-Karp string search algorithm
Time: O(n + m) average case, O((n-m+1)*m) worst case
"""
n, m = len(text), len(pattern)
results = []
# Calculate hash of pattern
pattern_hash = 0
text_hash = 0
h = 1
# h = d^(m-1) % prime
for i in range(m-1):
h = (h * 256) % prime
# Calculate initial hashes
for i in range(m):
pattern_hash = (256 * pattern_hash + ord(pattern[i])) % prime
text_hash = (256 * text_hash + ord(text[i])) % prime
# Slide the pattern over text
for i in range(n - m + 1):
if pattern_hash == text_hash:
# Check for spurious hits
if text[i:i+m] == pattern:
results.append(i)
# Calculate hash for next window
if i < n - m:
text_hash = (256 * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
if text_hash < 0:
text_hash += prime
return results
# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(rabin_karp(text, pattern)) # [10]PythonBoyer-Moore Algorithm
def bad_character_table(pattern):
"""Create bad character table for Boyer-Moore"""
table = {}
m = len(pattern)
for i in range(m - 1):
table[pattern[i]] = m - 1 - i
return table
def boyer_moore_search(text, pattern):
"""
Boyer-Moore string search algorithm
Time: O(n/m) best case, O(n*m) worst case
"""
n, m = len(text), len(pattern)
if m == 0:
return []
results = []
bad_char = bad_character_table(pattern)
i = 0
while i <= n - m:
j = m - 1
while j >= 0 and pattern[j] == text[i + j]:
j -= 1
if j < 0:
results.append(i)
i += 1
else:
shift = bad_char.get(text[i + j], m)
i += shift
return results
# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(boyer_moore_search(text, pattern)) # [10]PythonPalindrome Problems
Check if String is Palindrome
def is_palindrome(s):
"""
Check if string is palindrome
Time: O(n), Space: O(1)
"""
left, right = 0, len(s) - 1
while left < right:
if s[left] != s[right]:
return False
left += 1
right -= 1
return True
# Test cases
print(is_palindrome("racecar")) # True
print(is_palindrome("hello")) # False
print(is_palindrome("A man a plan a canal Panama".lower().replace(" ", ""))) # TruePythonLongest Palindromic Substring
def longest_palindromic_substring(s):
"""
Find longest palindromic substring using expand around center
Time: O(n²), Space: O(1)
"""
if not s:
return ""
n = len(s)
start = max_length = 0
def expand_around_center(left, right):
while left >= 0 and right < n and s[left] == s[right]:
left -= 1
right += 1
return right - left - 1
for i in range(n):
# Odd length palindromes
len1 = expand_around_center(i, i)
# Even length palindromes
len2 = expand_around_center(i, i + 1)
current_length = max(len1, len2)
if current_length > max_length:
max_length = current_length
start = i - (current_length - 1) // 2
return s[start:start + max_length]
# Test cases
print(longest_palindromic_substring("babad")) # "bab" or "aba"
print(longest_palindromic_substring("cbbd")) # "bb"PythonPalindrome Partitioning
def palindrome_partitioning(s):
"""
Partition string into minimum number of palindromic substrings
Time: O(n²), Space: O(n²)
"""
n = len(s)
# dp[i] = minimum cuts needed for s[0..i-1]
dp = [float('inf')] * (n + 1)
dp[0] = 0
# is_pal[i][j] = True if s[i..j] is palindrome
is_pal = [[False] * n for _ in range(n)]
for i in range(n):
for j in range(i + 1):
if s[i] == s[j] and (i - j <= 2 or is_pal[j + 1][i - 1]):
is_pal[j][i] = True
if j == 0:
dp[i + 1] = 0
else:
dp[i + 1] = min(dp[i + 1], dp[j] + 1)
return dp[n]
# Test
print(palindrome_partitioning("aab")) # 1 ("aa", "b")PythonAnagram Problems
Check if Two Strings are Anagrams
def are_anagrams(s1, s2):
"""
Check if two strings are anagrams
Time: O(n log n), Space: O(1) for fixed alphabet
"""
return sorted(s1) == sorted(s2)
def are_anagrams_optimized(s1, s2):
"""
Check if two strings are anagrams using frequency count
Time: O(n), Space: O(1) for ASCII
"""
if len(s1) != len(s2):
return False
count = [0] * 256 # ASCII characters
for char in s1:
count[ord(char)] += 1
for char in s2:
count[ord(char)] -= 1
if count[ord(char)] < 0:
return False
return True
# Test cases
print(are_anagrams("listen", "silent")) # True
print(are_anagrams("hello", "world")) # FalsePythonGroup Anagrams
def group_anagrams(strs):
"""
Group strings that are anagrams
Time: O(n * k log k) where n=len(strs), k=max string length
Space: O(n * k)
"""
anagram_groups = {}
for s in strs:
# Sort string as key
key = ''.join(sorted(s))
if key not in anagram_groups:
anagram_groups[key] = []
anagram_groups[key].append(s)
return list(anagram_groups.values())
# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]PythonFind All Anagrams in String
def find_anagrams(s, p):
"""
Find all starting indices of p's anagrams in s
Time: O(n), Space: O(1) for fixed alphabet
"""
if len(p) > len(s):
return []
p_count = [0] * 256
s_count = [0] * 256
result = []
# Count frequency of p
for char in p:
p_count[ord(char)] += 1
# Sliding window
for i in range(len(s)):
s_count[ord(s[i])] += 1
# Remove character that's out of window
if i >= len(p):
s_count[ord(s[i - len(p)])] -= 1
# Check if current window matches p
if i >= len(p) - 1 and p_count == s_count:
result.append(i - len(p) + 1)
return result
# Test
s = "cbaebabacd"
p = "abc"
print(find_anagrams(s, p)) # [0, 6]PythonString Compression
Run Length Encoding
def compress_string(s):
"""
Compress string using run-length encoding
Time: O(n), Space: O(n)
"""
if not s:
return ""
compressed = []
count = 1
for i in range(1, len(s)):
if s[i] == s[i-1]:
count += 1
else:
compressed.append(s[i-1] + str(count))
count = 1
# Add last group
compressed.append(s[-1] + str(count))
result = ''.join(compressed)
return result if len(result) < len(s) else s
# Test
print(compress_string("aabcccccaaa")) # "a2b1c5a3"
print(compress_string("abc")) # "abc" (no compression)PythonString Compression II (LeetCode 1531)
def get_length_of_optimal_compression(s, k):
"""
Find minimum length after removing k characters and compressing
Time: O(n^2 * k), Space: O(n^2)
"""
n = len(s)
# dp[i][j] = min length for first i chars, removing j chars
dp = [[float('inf')] * (k + 1) for _ in range(n + 1)]
dp[0][0] = 0
for i in range(1, n + 1):
for j in range(min(i, k) + 1):
# Case 1: Remove current character
if j > 0:
dp[i][j] = dp[i-1][j-1]
# Case 2: Keep current character
count = 0
for prev in range(i):
if s[prev] == s[i-1]:
count += 1
else:
count = 1
# Length needed for this run
run_length = 1
if count >= 10:
run_length = 3 # e.g., "a10"
elif count >= 2:
run_length = 2 # e.g., "a2"
# else run_length = 1 (just the character)
if prev + 1 >= run_length:
dp[i][j] = min(dp[i][j], dp[prev][j] + run_length)
return dp[n][k]PythonTrie (Prefix Tree)
Trie Implementation
class TrieNode:
def __init__(self):
self.children = {}
self.is_end_of_word = False
self.count = 0 # For frequency counting
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
"""Insert word into trie"""
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.count += 1
node.is_end_of_word = True
def search(self, word):
"""Check if word exists in trie"""
node = self.root
for char in word:
if char not in node.children:
return False
node = node.children[char]
return node.is_end_of_word
def starts_with(self, prefix):
"""Check if any word starts with prefix"""
node = self.root
for char in prefix:
if char not in node.children:
return False
node = node.children[char]
return True
def get_words_with_prefix(self, prefix):
"""Get all words starting with prefix"""
def dfs(node, current_word, result):
if node.is_end_of_word:
result.append(current_word)
for char, child in node.children.items():
dfs(child, current_word + char, result)
node = self.root
for char in prefix:
if char not in node.children:
return []
node = node.children[char]
result = []
dfs(node, prefix, result)
return result
def delete(self, word):
"""Delete word from trie"""
def _delete(node, word, index):
if index == len(word):
if not node.is_end_of_word:
return False
node.is_end_of_word = False
return len(node.children) == 0
char = word[index]
if char not in node.children:
return False
should_delete = _delete(node.children[char], word, index + 1)
if should_delete:
del node.children[char]
return len(node.children) == 0
else:
return False
return _delete(self.root, word, 0)
# Example usage
trie = Trie()
words = ["apple", "app", "application", "bat", "ball"]
for word in words:
trie.insert(word)
print(trie.search("app")) # True
print(trie.search("appl")) # False
print(trie.starts_with("app")) # True
print(trie.get_words_with_prefix("app")) # ["apple", "app", "application"]PythonApplications of Trie
# Auto-complete
def autocomplete(trie, prefix):
return trie.get_words_with_prefix(prefix)
# Word frequency
def word_frequency(trie, word):
node = trie.root
for char in word:
if char not in node.children:
return 0
node = node.children[char]
return node.count if node.is_end_of_word else 0
# Longest common prefix
def longest_common_prefix(words):
if not words:
return ""
trie = Trie()
for word in words:
trie.insert(word)
prefix = ""
node = trie.root
while len(node.children) == 1 and not node.is_end_of_word:
char = next(iter(node.children.keys()))
prefix += char
node = node.children[char]
return prefix
# Test
words = ["flower", "flow", "flight"]
print(longest_common_prefix(words)) # "fl"PythonSuffix Arrays and Trees
Suffix Array Construction
def build_suffix_array(s):
"""
Build suffix array for string s
Time: O(n log n), Space: O(n)
"""
n = len(s)
suffixes = [(s[i:], i) for i in range(n)]
suffixes.sort()
return [suffix[1] for suffix in suffixes]
def build_suffix_array_efficient(s):
"""
Efficient suffix array construction using radix sort
Time: O(n log n), Space: O(n)
"""
n = len(s)
sa = list(range(n))
rank = [ord(c) for c in s]
tmp = [0] * n
k = 1
while k < n:
# Sort by rank[i] and rank[i+k]
sa.sort(key=lambda x: (rank[x], rank[x + k] if x + k < n else -1))
# Update ranks
tmp[sa[0]] = 0
for i in range(1, n):
prev = sa[i-1]
curr = sa[i]
if (rank[prev], rank[prev + k] if prev + k < n else -1) < \
(rank[curr], rank[curr + k] if curr + k < n else -1):
tmp[curr] = tmp[prev] + 1
else:
tmp[curr] = tmp[prev]
rank, tmp = tmp, rank
k *= 2
return sa
# Example
s = "banana"
sa = build_suffix_array(s)
print("Suffix Array:", sa)
print("Sorted suffixes:")
for i in sa:
print(f"{i}: {s[i:]}")PythonPattern Matching with Suffix Array
def search_pattern_suffix_array(s, pattern, sa):
"""
Search pattern using suffix array
Time: O(m log n) where m=len(pattern), n=len(s)
"""
n, m = len(s), len(pattern)
left, right = 0, n - 1
result = []
while left <= right:
mid = (left + right) // 2
suffix = s[sa[mid]:]
if suffix.startswith(pattern):
# Find all occurrences
i = mid
while i >= 0 and s[sa[i]:].startswith(pattern):
result.append(sa[i])
i -= 1
i = mid + 1
while i < n and s[sa[i]:].startswith(pattern):
result.append(sa[i])
i += 1
return sorted(result)
elif pattern < suffix:
right = mid - 1
else:
left = mid + 1
return []
# Example
s = "banana"
pattern = "ana"
sa = build_suffix_array(s)
print(search_pattern_suffix_array(s, pattern, sa)) # [1, 3]PythonRegular Expressions
Basic Regex Patterns
import re
# Common patterns
patterns = {
'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
'phone': r'^\+?1?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$',
'url': r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))*)?',
'date': r'^\d{4}-\d{2}-\d{2}$',
'ip_address': r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
}
def validate_pattern(text, pattern):
return bool(re.match(pattern, text))
# Test
print(validate_pattern("user@example.com", patterns['email'])) # True
print(validate_pattern("192.168.1.1", patterns['ip_address'])) # TruePythonString Matching with Regex
def find_all_matches(text, pattern):
"""Find all matches of pattern in text"""
return re.findall(pattern, text)
def replace_pattern(text, pattern, replacement):
"""Replace all occurrences of pattern"""
return re.sub(pattern, replacement, text)
def split_by_pattern(text, pattern):
"""Split text by pattern"""
return re.split(pattern, text)
# Examples
text = "The price is $100.50 and the discount is 20%."
# Find all numbers
numbers = find_all_matches(text, r'\d+\.?\d*')
print("Numbers:", numbers) # ['100.50', '20']
# Replace currency symbols
cleaned = replace_pattern(text, r'\$', 'USD ')
print("Cleaned:", cleaned)
# Extract words
words = find_all_matches(text, r'\b\w+\b')
print("Words:", words)PythonAdvanced Regex Techniques
def extract_emails(text):
"""Extract all email addresses from text"""
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
return re.findall(email_pattern, text)
def validate_password(password):
"""
Validate password strength
Requirements: 8+ chars, uppercase, lowercase, digit, special char
"""
if len(password) < 8:
return False, "Password must be at least 8 characters"
patterns = [
(r'[A-Z]', "uppercase letter"),
(r'[a-z]', "lowercase letter"),
(r'\d', "digit"),
(r'[!@#$%^&*(),.?":{}|<>]', "special character")
]
for pattern, requirement in patterns:
if not re.search(pattern, password):
return False, f"Password must contain at least one {requirement}"
return True, "Password is valid"
# Test
text = "Contact us at support@example.com or admin@company.org"
emails = extract_emails(text)
print("Emails:", emails)
passwords = ["weak", "Strong123!", "weakpassword"]
for pwd in passwords:
valid, msg = validate_password(pwd)
print(f"{pwd}: {msg}")PythonCommon String Interview Problems
1. Valid Parentheses
def is_valid_parentheses(s):
"""
Check if parentheses are valid
Time: O(n), Space: O(n)
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top = stack.pop() if stack else '#'
if mapping[char] != top:
return False
else:
stack.append(char)
return not stack
# Test
print(is_valid_parentheses("()[]{}")) # True
print(is_valid_parentheses("(]")) # FalsePython2. Longest Substring Without Repeating Characters
def longest_substring_without_repeating(s):
"""
Find length of longest substring without repeating characters
Time: O(n), Space: O(min(n, 128))
"""
n = len(s)
max_length = 0
char_index = {}
left = 0
for right in range(n):
if s[right] in char_index:
left = max(left, char_index[s[right]] + 1)
char_index[s[right]] = right
max_length = max(max_length, right - left + 1)
return max_length
# Test
print(longest_substring_without_repeating("abcabcbb")) # 3 ("abc")
print(longest_substring_without_repeating("bbbbb")) # 1 ("b")Python3. String to Integer (atoi)
def my_atoi(s):
"""
Convert string to integer
Time: O(n), Space: O(1)
"""
s = s.strip()
if not s:
return 0
sign = 1
if s[0] in ['-', '+']:
sign = -1 if s[0] == '-' else 1
s = s[1:]
result = 0
for char in s:
if not char.isdigit():
break
result = result * 10 + int(char)
# Handle overflow
if sign * result > 2**31 - 1:
return 2**31 - 1
if sign * result < -2**31:
return -2**31
return sign * result
# Test
print(my_atoi("42")) # 42
print(my_atoi(" -42")) # -42
print(my_atoi("4193 with words")) # 4193Python4. Group Shifted Strings
def group_shifted_strings(strings):
"""
Group strings that are shifted versions of each other
Time: O(n*k) where n=len(strings), k=max string length
"""
def get_shift_key(s):
if not s:
return ""
# Use first character as reference
shift = ord(s[0])
key = []
for char in s:
# Normalize shift relative to first character
diff = (ord(char) - shift) % 26
key.append(str(diff))
return ','.join(key)
groups = {}
for s in strings:
key = get_shift_key(s)
if key not in groups:
groups[key] = []
groups[key].append(s)
return list(groups.values())
# Test
strings = ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"]
print(group_shifted_strings(strings))
# Output: [['abc', 'bcd', 'xyz'], ['acef'], ['az', 'ba'], ['a', 'z']]Python5. Minimum Window Substring
def min_window(s, t):
"""
Find minimum window in s that contains all characters in t
Time: O(n), Space: O(k) where k is size of character set
"""
if not t or not s:
return ""
# Count characters in t
target_count = {}
for char in t:
target_count[char] = target_count.get(char, 0) + 1
required = len(target_count)
formed = 0
# Window counts
window_count = {}
left = 0
min_length = float('inf')
min_window_start = 0
for right in range(len(s)):
char = s[right]
window_count[char] = window_count.get(char, 0) + 1
if char in target_count and window_count[char] == target_count[char]:
formed += 1
# Shrink window from left
while left <= right and formed == required:
char = s[left]
# Update minimum window
if right - left + 1 < min_length:
min_length = right - left + 1
min_window_start = left
window_count[char] -= 1
if char in target_count and window_count[char] < target_count[char]:
formed -= 1
left += 1
return "" if min_length == float('inf') else s[min_window_start:min_window_start + min_length]
# Test
s = "ADOBECODEBANC"
t = "ABC"
print(min_window(s, t)) # "BANC"PythonAdvanced String Algorithms
Manacher’s Algorithm for Palindromes
def manacher(s):
"""
Find longest palindromic substring using Manacher's algorithm
Time: O(n), Space: O(n)
"""
# Transform string to handle even-length palindromes
t = '^#' + '#'.join(s) + '#$'
n = len(t)
p = [0] * n # p[i] = length of palindrome centered at i
c = r = 0 # center and right boundary
for i in range(1, n-1):
i_mirror = 2*c - i # mirror of i around c
if r > i:
p[i] = min(r - i, p[i_mirror])
# Expand around center i
while t[i + p[i] + 1] == t[i - p[i] - 1]:
p[i] += 1
# Update center and right boundary
if i + p[i] > r:
c = i
r = i + p[i]
# Find maximum palindrome
max_len = max(p)
center_index = p.index(max_len)
start = (center_index - max_len) // 2
return s[start:start + max_len]
# Test
print(manacher("babad")) # "bab" or "aba"
print(manacher("cbbd")) # "bb"PythonZ-Algorithm for Pattern Matching
def z_algorithm(s):
"""
Compute Z-array for string s
Z[i] = length of longest substring starting at i that matches prefix
Time: O(n), Space: O(n)
"""
n = len(s)
z = [0] * n
l = r = 0 # [l, r] is the rightmost segment match
for i in range(1, n):
if i < r:
# i is within current segment
k = i - l
if z[k] < r - i + 1:
z[i] = z[k]
else:
z[i] = r - i + 1
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
else:
# i is outside current segment
z[i] = 0
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
# Update segment
if i + z[i] - 1 > r:
l = i
r = i + z[i] - 1
return z
def search_pattern_z(s, pattern):
"""
Search pattern in s using Z-algorithm
"""
concat = pattern + '$' + s
z = z_algorithm(concat)
result = []
for i in range(len(pattern) + 1, len(concat)):
if z[i] == len(pattern):
result.append(i - len(pattern) - 1)
return result
# Test
s = "AABAACAADAABAABA"
pattern = "AABA"
print(search_pattern_z(s, pattern)) # [0, 9, 12]PythonAho-Corasick Algorithm
class AhoCorasickNode:
def __init__(self):
self.children = {}
self.fail = None
self.output = []
class AhoCorasick:
def __init__(self, patterns):
self.root = AhoCorasickNode()
self.build_trie(patterns)
self.build_failure_links()
def build_trie(self, patterns):
for pattern in patterns:
node = self.root
for char in pattern:
if char not in node.children:
node.children[char] = AhoCorasickNode()
node = node.children[char]
node.output.append(pattern)
def build_failure_links(self):
from collections import deque
queue = deque()
for node in self.root.children.values():
node.fail = self.root
queue.append(node)
while queue:
current = queue.popleft()
for char, child in current.children.items():
queue.append(child)
# Find failure link
fail = current.fail
while fail != self.root and char not in fail.children:
fail = fail.fail
if char in fail.children:
fail = fail.children[char]
else:
fail = self.root
child.fail = fail
child.output.extend(fail.output)
def search(self, text):
"""Find all pattern occurrences in text"""
node = self.root
results = []
for i, char in enumerate(text):
while node != self.root and char not in node.children:
node = node.fail
if char in node.children:
node = node.children[char]
else:
node = self.root
if node.output:
for pattern in node.output:
results.append((i - len(pattern) + 1, pattern))
return results
# Test
patterns = ["he", "she", "his", "hers"]
ac = AhoCorasick(patterns)
text = "ushers"
print(ac.search(text)) # [(1, 'he'), (1, 'she'), (4, 'he')]PythonTime and Space Complexity
String Algorithm Complexities
| Algorithm | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Naive Search | O((n-m+1)×m) | O(1) | Simple, works for small inputs |
| KMP | O(n+m) | O(m) | Best for multiple searches |
| Rabin-Karp | O(n+m) avg, O((n-m+1)×m) worst | O(1) | Good for multiple patterns |
| Boyer-Moore | O(n/m) best, O(n×m) worst | O(σ) | Fast in practice |
| Z-Algorithm | O(n) | O(n) | Linear time pattern matching |
| Aho-Corasick | O(n+m) | O(m×σ) | Multiple pattern search |
| Trie Operations | O(m) | O(m×σ) | m = key length, σ = alphabet size |
| Suffix Array | O(n log n) | O(n) | For advanced string problems |
| Manacher | O(n) | O(n) | Fast palindrome finding |
Python String Operation Complexities
| Operation | Time Complexity | Notes |
|---|---|---|
| len(s) | O(1) | Pre-computed |
| s[i] | O(1) | Direct access |
| s[i:j] | O(j-i) | Creates new string |
| s + t | O(n+m) | Creates new string |
| s.find(t) | O(n×m) | Naive search |
| s.replace(a,b) | O(n) | Creates new string |
| s.split() | O(n) | Creates list |
| ”.join(list) | O(n) | Efficient concatenation |
Best Practices and Tips
1. String Immutability Awareness
# Bad: Repeated concatenation in loop
result = ""
for i in range(1000):
result += str(i) # Creates new string each time
# Good: Use list and join
parts = []
for i in range(1000):
parts.append(str(i))
result = ''.join(parts) # Single string creationPython2. Efficient String Building
# Use io.StringIO for large string building
import io
output = io.StringIO()
for line in large_data:
output.write(line)
output.write('\n')
result = output.getvalue()Python3. Character Set Considerations
# For ASCII-only strings
char_count = [0] * 128
# For Unicode strings
from collections import Counter
char_count = Counter(s)Python4. Memory-Efficient String Processing
# Process large files line by line
def process_large_file(filename):
with open(filename, 'r') as f:
for line in f:
# Process one line at a time
process_line(line.strip())Python5. Regular Expression Optimization
import re
# Compile regex for reuse
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
def is_valid_email(email):
return bool(email_pattern.match(email))Python6. String Comparison Best Practices
# Use == for content comparison
s1 = "hello"
s2 = "hello"
print(s1 == s2) # True
# Use 'is' for identity comparison (rarely needed for strings)
print(s1 is s2) # May be True due to interning
# For case-insensitive comparison
print(s1.lower() == s2.lower())
# For lexicographical comparison
print(s1 < s2) # Alphabetical orderPython7. Encoding and Unicode Handling
# Handle different encodings
text = "Hello 世界"
utf8_bytes = text.encode('utf-8')
decoded = utf8_bytes.decode('utf-8')
# Normalize Unicode
import unicodedata
normalized = unicodedata.normalize('NFC', text)PythonInterview Tips
- Clarify Constraints: Ask about string length, character set, case sensitivity
- Edge Cases: Empty strings, single characters, all same characters
- Time vs Space: Discuss trade-offs between different approaches
- Built-in Functions: Know when to use Python’s string methods vs custom implementation
- Pattern Recognition: Identify common patterns (palindromes, anagrams, substrings)
- Optimization: Start with brute force, then optimize
Common Pitfalls
- Index Errors: Always check bounds before accessing s[i]
- Off-by-One Errors: Careful with loop bounds in string algorithms
- Character Encoding: Handle Unicode properly
- Memory Usage: Be aware of string immutability costs
- Case Sensitivity: Consider case-insensitive requirements
Discover more from Altgr Blog
Subscribe to get the latest posts sent to your email.
