String Algorithms in Python – DSA

Introduction
String Basics in Python
String Manipulation Techniques
String Searching Algorithms
Palindrome Problems
Anagram Problems
String Compression
Trie (Prefix Tree)
Suffix Arrays and Trees
Regular Expressions
Common String Interview Problems
Advanced String Algorithms
Time and Space Complexity
Best Practices and Tips

Introduction

String algorithms are fundamental to computer science and form the backbone of many applications including text processing, search engines, bioinformatics, and data compression. In Python, strings are immutable sequences of Unicode characters, making them efficient for read operations but requiring careful handling for modifications.

Key Characteristics of Strings in Python

Immutable: Cannot be modified in-place
Sequence Type: Supports indexing, slicing, and iteration
Unicode Support: Handles international characters
Rich Built-in Methods: Extensive string manipulation capabilities

Why String Algorithms Matter

Text Processing: Search, replace, parse text data
Pattern Matching: Find patterns in large texts
Data Validation: Check formats, validate inputs
Compression: Reduce storage space
Bioinformatics: DNA sequence analysis
Search Engines: Indexing and querying

String Basics in Python

Creating Strings

# Different ways to create strings
single_quotes = 'Hello World'
double_quotes = "Hello World"
triple_quotes = """Multi-line
string"""

# String with special characters
escaped = "He said, \"Hello!\""
raw_string = r"C:\Users\Path"  # Raw string

# Unicode strings
unicode_str = "Hello 世界"

# Different ways to create strings
single_quotes = 'Hello World'
double_quotes = "Hello World"
triple_quotes = """Multi-line
string"""

# String with special characters
escaped = "He said, \"Hello!\""
raw_string = r"C:\Users\Path"  # Raw string

# Unicode strings
unicode_str = "Hello 世界"

Python

String Properties

s = "Python"

print(len(s))        # 6 - length
print(s[0])          # 'P' - indexing
print(s[-1])         # 'n' - negative indexing
print(s[1:4])        # 'yth' - slicing
print('P' in s)      # True - membership
print(s * 2)         # 'PythonPython' - repetition

s = "Python"

print(len(s))        # 6 - length
print(s[0])          # 'P' - indexing
print(s[-1])         # 'n' - negative indexing
print(s[1:4])        # 'yth' - slicing
print('P' in s)      # True - membership
print(s * 2)         # 'PythonPython' - repetition

Python

String Immutability

s = "Hello"
# s[0] = 'h'  # TypeError: 'str' object does not support item assignment

# Instead, create new string
s = 'h' + s[1:]  # 'hello'

s = "Hello"
# s[0] = 'h'  # TypeError: 'str' object does not support item assignment

# Instead, create new string
s = 'h' + s[1:]  # 'hello'

Python

String Manipulation Techniques

Basic Operations

# Concatenation
s1 = "Hello"
s2 = "World"
result = s1 + " " + s2  # "Hello World"

# Joining strings
words = ["Hello", "World", "Python"]
sentence = " ".join(words)  # "Hello World Python"

# Splitting
text = "Hello World Python"
words = text.split()  # ['Hello', 'World', 'Python']
parts = text.split(" ", 1)  # ['Hello', 'World Python']

# Concatenation
s1 = "Hello"
s2 = "World"
result = s1 + " " + s2  # "Hello World"

# Joining strings
words = ["Hello", "World", "Python"]
sentence = " ".join(words)  # "Hello World Python"

# Splitting
text = "Hello World Python"
words = text.split()  # ['Hello', 'World', 'Python']
parts = text.split(" ", 1)  # ['Hello', 'World Python']

Python

Case Conversion

s = "Hello World"

print(s.upper())      # "HELLO WORLD"
print(s.lower())      # "hello world"
print(s.title())      # "Hello World"
print(s.capitalize()) # "Hello world"
print(s.swapcase())   # "hELLO wORLD"

s = "Hello World"

print(s.upper())      # "HELLO WORLD"
print(s.lower())      # "hello world"
print(s.title())      # "Hello World"
print(s.capitalize()) # "Hello world"
print(s.swapcase())   # "hELLO wORLD"

Python

String Formatting

# Old style formatting
name = "Alice"
age = 30
print("My name is %s and I am %d years old" % (name, age))

# New style formatting
print("My name is {} and I am {} years old".format(name, age))
print("My name is {0} and I am {1} years old".format(name, age))

# f-strings (Python 3.6+)
print(f"My name is {name} and I am {age} years old")

# Template strings
from string import Template
template = Template("My name is $name and I am $age years old")
print(template.substitute(name=name, age=age))

# Old style formatting
name = "Alice"
age = 30
print("My name is %s and I am %d years old" % (name, age))

# New style formatting
print("My name is {} and I am {} years old".format(name, age))
print("My name is {0} and I am {1} years old".format(name, age))

# f-strings (Python 3.6+)
print(f"My name is {name} and I am {age} years old")

# Template strings
from string import Template
template = Template("My name is $name and I am $age years old")
print(template.substitute(name=name, age=age))

Python

String Stripping and Alignment

s = "  Hello World  "

print(s.strip())    # "Hello World"
print(s.lstrip())   # "Hello World  "
print(s.rstrip())   # "  Hello World"

# Alignment
print("Hello".ljust(10, '-'))   # "Hello-----"
print("Hello".rjust(10, '-'))   # "-----Hello"
print("Hello".center(10, '-'))  # "--Hello---"

s = "  Hello World  "

print(s.strip())    # "Hello World"
print(s.lstrip())   # "Hello World  "
print(s.rstrip())   # "  Hello World"

# Alignment
print("Hello".ljust(10, '-'))   # "Hello-----"
print("Hello".rjust(10, '-'))   # "-----Hello"
print("Hello".center(10, '-'))  # "--Hello---"

Python

String Searching Algorithms

Naive String Search

def naive_search(text, pattern):
    """
    Naive string search algorithm
    Time: O((n-m+1)*m) where n=len(text), m=len(pattern)
    """
    n, m = len(text), len(pattern)
    results = []

    for i in range(n - m + 1):
        if text[i:i+m] == pattern:
            results.append(i)

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(naive_search(text, pattern))  # [10]

def naive_search(text, pattern):
    """
    Naive string search algorithm
    Time: O((n-m+1)*m) where n=len(text), m=len(pattern)
    """
    n, m = len(text), len(pattern)
    results = []

    for i in range(n - m + 1):
        if text[i:i+m] == pattern:
            results.append(i)

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(naive_search(text, pattern))  # [10]

Python

Knuth-Morris-Pratt (KMP) Algorithm

def compute_lps(pattern):
    """Compute Longest Prefix Suffix array"""
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

def kmp_search(text, pattern):
    """
    KMP string search algorithm
    Time: O(n + m) where n=len(text), m=len(pattern)
    """
    n, m = len(text), len(pattern)
    lps = compute_lps(pattern)
    results = []

    i = j = 0
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1

        if j == m:
            results.append(i - j)
            j = lps[j - 1]
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(kmp_search(text, pattern))  # [10]

def compute_lps(pattern):
    """Compute Longest Prefix Suffix array"""
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

def kmp_search(text, pattern):
    """
    KMP string search algorithm
    Time: O(n + m) where n=len(text), m=len(pattern)
    """
    n, m = len(text), len(pattern)
    lps = compute_lps(pattern)
    results = []

    i = j = 0
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1

        if j == m:
            results.append(i - j)
            j = lps[j - 1]
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(kmp_search(text, pattern))  # [10]

Python

Rabin-Karp Algorithm

def rabin_karp(text, pattern, prime=101):
    """
    Rabin-Karp string search algorithm
    Time: O(n + m) average case, O((n-m+1)*m) worst case
    """
    n, m = len(text), len(pattern)
    results = []

    # Calculate hash of pattern
    pattern_hash = 0
    text_hash = 0
    h = 1

    # h = d^(m-1) % prime
    for i in range(m-1):
        h = (h * 256) % prime

    # Calculate initial hashes
    for i in range(m):
        pattern_hash = (256 * pattern_hash + ord(pattern[i])) % prime
        text_hash = (256 * text_hash + ord(text[i])) % prime

    # Slide the pattern over text
    for i in range(n - m + 1):
        if pattern_hash == text_hash:
            # Check for spurious hits
            if text[i:i+m] == pattern:
                results.append(i)

        # Calculate hash for next window
        if i < n - m:
            text_hash = (256 * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
            if text_hash < 0:
                text_hash += prime

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(rabin_karp(text, pattern))  # [10]

def rabin_karp(text, pattern, prime=101):
    """
    Rabin-Karp string search algorithm
    Time: O(n + m) average case, O((n-m+1)*m) worst case
    """
    n, m = len(text), len(pattern)
    results = []

    # Calculate hash of pattern
    pattern_hash = 0
    text_hash = 0
    h = 1

    # h = d^(m-1) % prime
    for i in range(m-1):
        h = (h * 256) % prime

    # Calculate initial hashes
    for i in range(m):
        pattern_hash = (256 * pattern_hash + ord(pattern[i])) % prime
        text_hash = (256 * text_hash + ord(text[i])) % prime

    # Slide the pattern over text
    for i in range(n - m + 1):
        if pattern_hash == text_hash:
            # Check for spurious hits
            if text[i:i+m] == pattern:
                results.append(i)

        # Calculate hash for next window
        if i < n - m:
            text_hash = (256 * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
            if text_hash < 0:
                text_hash += prime

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(rabin_karp(text, pattern))  # [10]

Python

Boyer-Moore Algorithm

def bad_character_table(pattern):
    """Create bad character table for Boyer-Moore"""
    table = {}
    m = len(pattern)

    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i

    return table

def boyer_moore_search(text, pattern):
    """
    Boyer-Moore string search algorithm
    Time: O(n/m) best case, O(n*m) worst case
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return []

    results = []
    bad_char = bad_character_table(pattern)

    i = 0
    while i <= n - m:
        j = m - 1

        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1

        if j < 0:
            results.append(i)
            i += 1
        else:
            shift = bad_char.get(text[i + j], m)
            i += shift

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(boyer_moore_search(text, pattern))  # [10]

def bad_character_table(pattern):
    """Create bad character table for Boyer-Moore"""
    table = {}
    m = len(pattern)

    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i

    return table

def boyer_moore_search(text, pattern):
    """
    Boyer-Moore string search algorithm
    Time: O(n/m) best case, O(n*m) worst case
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return []

    results = []
    bad_char = bad_character_table(pattern)

    i = 0
    while i <= n - m:
        j = m - 1

        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1

        if j < 0:
            results.append(i)
            i += 1
        else:
            shift = bad_char.get(text[i + j], m)
            i += shift

    return results

# Example
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
print(boyer_moore_search(text, pattern))  # [10]

Python

Palindrome Problems

Check if String is Palindrome

def is_palindrome(s):
    """
    Check if string is palindrome
    Time: O(n), Space: O(1)
    """
    left, right = 0, len(s) - 1

    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1

    return True

# Test cases
print(is_palindrome("racecar"))  # True
print(is_palindrome("hello"))    # False
print(is_palindrome("A man a plan a canal Panama".lower().replace(" ", "")))  # True

def is_palindrome(s):
    """
    Check if string is palindrome
    Time: O(n), Space: O(1)
    """
    left, right = 0, len(s) - 1

    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1

    return True

# Test cases
print(is_palindrome("racecar"))  # True
print(is_palindrome("hello"))    # False
print(is_palindrome("A man a plan a canal Panama".lower().replace(" ", "")))  # True

Python

Longest Palindromic Substring

def longest_palindromic_substring(s):
    """
    Find longest palindromic substring using expand around center
    Time: O(n²), Space: O(1)
    """
    if not s:
        return ""

    n = len(s)
    start = max_length = 0

    def expand_around_center(left, right):
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        return right - left - 1

    for i in range(n):
        # Odd length palindromes
        len1 = expand_around_center(i, i)
        # Even length palindromes
        len2 = expand_around_center(i, i + 1)

        current_length = max(len1, len2)
        if current_length > max_length:
            max_length = current_length
            start = i - (current_length - 1) // 2

    return s[start:start + max_length]

# Test cases
print(longest_palindromic_substring("babad"))  # "bab" or "aba"
print(longest_palindromic_substring("cbbd"))   # "bb"

def longest_palindromic_substring(s):
    """
    Find longest palindromic substring using expand around center
    Time: O(n²), Space: O(1)
    """
    if not s:
        return ""

    n = len(s)
    start = max_length = 0

    def expand_around_center(left, right):
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        return right - left - 1

    for i in range(n):
        # Odd length palindromes
        len1 = expand_around_center(i, i)
        # Even length palindromes
        len2 = expand_around_center(i, i + 1)

        current_length = max(len1, len2)
        if current_length > max_length:
            max_length = current_length
            start = i - (current_length - 1) // 2

    return s[start:start + max_length]

# Test cases
print(longest_palindromic_substring("babad"))  # "bab" or "aba"
print(longest_palindromic_substring("cbbd"))   # "bb"

Python

Palindrome Partitioning

def palindrome_partitioning(s):
    """
    Partition string into minimum number of palindromic substrings
    Time: O(n²), Space: O(n²)
    """
    n = len(s)
    # dp[i] = minimum cuts needed for s[0..i-1]
    dp = [float('inf')] * (n + 1)
    dp[0] = 0

    # is_pal[i][j] = True if s[i..j] is palindrome
    is_pal = [[False] * n for _ in range(n)]

    for i in range(n):
        for j in range(i + 1):
            if s[i] == s[j] and (i - j <= 2 or is_pal[j + 1][i - 1]):
                is_pal[j][i] = True
                if j == 0:
                    dp[i + 1] = 0
                else:
                    dp[i + 1] = min(dp[i + 1], dp[j] + 1)

    return dp[n]

# Test
print(palindrome_partitioning("aab"))  # 1 ("aa", "b")

def palindrome_partitioning(s):
    """
    Partition string into minimum number of palindromic substrings
    Time: O(n²), Space: O(n²)
    """
    n = len(s)
    # dp[i] = minimum cuts needed for s[0..i-1]
    dp = [float('inf')] * (n + 1)
    dp[0] = 0

    # is_pal[i][j] = True if s[i..j] is palindrome
    is_pal = [[False] * n for _ in range(n)]

    for i in range(n):
        for j in range(i + 1):
            if s[i] == s[j] and (i - j <= 2 or is_pal[j + 1][i - 1]):
                is_pal[j][i] = True
                if j == 0:
                    dp[i + 1] = 0
                else:
                    dp[i + 1] = min(dp[i + 1], dp[j] + 1)

    return dp[n]

# Test
print(palindrome_partitioning("aab"))  # 1 ("aa", "b")

Python

Anagram Problems

Check if Two Strings are Anagrams

def are_anagrams(s1, s2):
    """
    Check if two strings are anagrams
    Time: O(n log n), Space: O(1) for fixed alphabet
    """
    return sorted(s1) == sorted(s2)

def are_anagrams_optimized(s1, s2):
    """
    Check if two strings are anagrams using frequency count
    Time: O(n), Space: O(1) for ASCII
    """
    if len(s1) != len(s2):
        return False

    count = [0] * 256  # ASCII characters

    for char in s1:
        count[ord(char)] += 1

    for char in s2:
        count[ord(char)] -= 1
        if count[ord(char)] < 0:
            return False

    return True

# Test cases
print(are_anagrams("listen", "silent"))  # True
print(are_anagrams("hello", "world"))    # False

def are_anagrams(s1, s2):
    """
    Check if two strings are anagrams
    Time: O(n log n), Space: O(1) for fixed alphabet
    """
    return sorted(s1) == sorted(s2)

def are_anagrams_optimized(s1, s2):
    """
    Check if two strings are anagrams using frequency count
    Time: O(n), Space: O(1) for ASCII
    """
    if len(s1) != len(s2):
        return False

    count = [0] * 256  # ASCII characters

    for char in s1:
        count[ord(char)] += 1

    for char in s2:
        count[ord(char)] -= 1
        if count[ord(char)] < 0:
            return False

    return True

# Test cases
print(are_anagrams("listen", "silent"))  # True
print(are_anagrams("hello", "world"))    # False

Python

Group Anagrams

def group_anagrams(strs):
    """
    Group strings that are anagrams
    Time: O(n * k log k) where n=len(strs), k=max string length
    Space: O(n * k)
    """
    anagram_groups = {}

    for s in strs:
        # Sort string as key
        key = ''.join(sorted(s))
        if key not in anagram_groups:
            anagram_groups[key] = []
        anagram_groups[key].append(s)

    return list(anagram_groups.values())

# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

def group_anagrams(strs):
    """
    Group strings that are anagrams
    Time: O(n * k log k) where n=len(strs), k=max string length
    Space: O(n * k)
    """
    anagram_groups = {}

    for s in strs:
        # Sort string as key
        key = ''.join(sorted(s))
        if key not in anagram_groups:
            anagram_groups[key] = []
        anagram_groups[key].append(s)

    return list(anagram_groups.values())

# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Python

Find All Anagrams in String

def find_anagrams(s, p):
    """
    Find all starting indices of p's anagrams in s
    Time: O(n), Space: O(1) for fixed alphabet
    """
    if len(p) > len(s):
        return []

    p_count = [0] * 256
    s_count = [0] * 256
    result = []

    # Count frequency of p
    for char in p:
        p_count[ord(char)] += 1

    # Sliding window
    for i in range(len(s)):
        s_count[ord(s[i])] += 1

        # Remove character that's out of window
        if i >= len(p):
            s_count[ord(s[i - len(p)])] -= 1

        # Check if current window matches p
        if i >= len(p) - 1 and p_count == s_count:
            result.append(i - len(p) + 1)

    return result

# Test
s = "cbaebabacd"
p = "abc"
print(find_anagrams(s, p))  # [0, 6]

def find_anagrams(s, p):
    """
    Find all starting indices of p's anagrams in s
    Time: O(n), Space: O(1) for fixed alphabet
    """
    if len(p) > len(s):
        return []

    p_count = [0] * 256
    s_count = [0] * 256
    result = []

    # Count frequency of p
    for char in p:
        p_count[ord(char)] += 1

    # Sliding window
    for i in range(len(s)):
        s_count[ord(s[i])] += 1

        # Remove character that's out of window
        if i >= len(p):
            s_count[ord(s[i - len(p)])] -= 1

        # Check if current window matches p
        if i >= len(p) - 1 and p_count == s_count:
            result.append(i - len(p) + 1)

    return result

# Test
s = "cbaebabacd"
p = "abc"
print(find_anagrams(s, p))  # [0, 6]

Python

String Compression

Run Length Encoding

def compress_string(s):
    """
    Compress string using run-length encoding
    Time: O(n), Space: O(n)
    """
    if not s:
        return ""

    compressed = []
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i-1]:
            count += 1
        else:
            compressed.append(s[i-1] + str(count))
            count = 1

    # Add last group
    compressed.append(s[-1] + str(count))

    result = ''.join(compressed)
    return result if len(result) < len(s) else s

# Test
print(compress_string("aabcccccaaa"))  # "a2b1c5a3"
print(compress_string("abc"))          # "abc" (no compression)

def compress_string(s):
    """
    Compress string using run-length encoding
    Time: O(n), Space: O(n)
    """
    if not s:
        return ""

    compressed = []
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i-1]:
            count += 1
        else:
            compressed.append(s[i-1] + str(count))
            count = 1

    # Add last group
    compressed.append(s[-1] + str(count))

    result = ''.join(compressed)
    return result if len(result) < len(s) else s

# Test
print(compress_string("aabcccccaaa"))  # "a2b1c5a3"
print(compress_string("abc"))          # "abc" (no compression)

Python

String Compression II (LeetCode 1531)

def get_length_of_optimal_compression(s, k):
    """
    Find minimum length after removing k characters and compressing
    Time: O(n^2 * k), Space: O(n^2)
    """
    n = len(s)

    # dp[i][j] = min length for first i chars, removing j chars
    dp = [[float('inf')] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0

    for i in range(1, n + 1):
        for j in range(min(i, k) + 1):
            # Case 1: Remove current character
            if j > 0:
                dp[i][j] = dp[i-1][j-1]

            # Case 2: Keep current character
            count = 0
            for prev in range(i):
                if s[prev] == s[i-1]:
                    count += 1
                else:
                    count = 1

                # Length needed for this run
                run_length = 1
                if count >= 10:
                    run_length = 3  # e.g., "a10"
                elif count >= 2:
                    run_length = 2  # e.g., "a2"
                # else run_length = 1 (just the character)

                if prev + 1 >= run_length:
                    dp[i][j] = min(dp[i][j], dp[prev][j] + run_length)

    return dp[n][k]

def get_length_of_optimal_compression(s, k):
    """
    Find minimum length after removing k characters and compressing
    Time: O(n^2 * k), Space: O(n^2)
    """
    n = len(s)

    # dp[i][j] = min length for first i chars, removing j chars
    dp = [[float('inf')] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0

    for i in range(1, n + 1):
        for j in range(min(i, k) + 1):
            # Case 1: Remove current character
            if j > 0:
                dp[i][j] = dp[i-1][j-1]

            # Case 2: Keep current character
            count = 0
            for prev in range(i):
                if s[prev] == s[i-1]:
                    count += 1
                else:
                    count = 1

                # Length needed for this run
                run_length = 1
                if count >= 10:
                    run_length = 3  # e.g., "a10"
                elif count >= 2:
                    run_length = 2  # e.g., "a2"
                # else run_length = 1 (just the character)

                if prev + 1 >= run_length:
                    dp[i][j] = min(dp[i][j], dp[prev][j] + run_length)

    return dp[n][k]

Python

Trie (Prefix Tree)

Trie Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False
        self.count = 0  # For frequency counting

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert word into trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1
        node.is_end_of_word = True

    def search(self, word):
        """Check if word exists in trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def starts_with(self, prefix):
        """Check if any word starts with prefix"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

    def get_words_with_prefix(self, prefix):
        """Get all words starting with prefix"""
        def dfs(node, current_word, result):
            if node.is_end_of_word:
                result.append(current_word)

            for char, child in node.children.items():
                dfs(child, current_word + char, result)

        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]

        result = []
        dfs(node, prefix, result)
        return result

    def delete(self, word):
        """Delete word from trie"""
        def _delete(node, word, index):
            if index == len(word):
                if not node.is_end_of_word:
                    return False
                node.is_end_of_word = False
                return len(node.children) == 0

            char = word[index]
            if char not in node.children:
                return False

            should_delete = _delete(node.children[char], word, index + 1)

            if should_delete:
                del node.children[char]
                return len(node.children) == 0
            else:
                return False

        return _delete(self.root, word, 0)

# Example usage
trie = Trie()
words = ["apple", "app", "application", "bat", "ball"]

for word in words:
    trie.insert(word)

print(trie.search("app"))        # True
print(trie.search("appl"))       # False
print(trie.starts_with("app"))   # True
print(trie.get_words_with_prefix("app"))  # ["apple", "app", "application"]

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False
        self.count = 0  # For frequency counting

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert word into trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1
        node.is_end_of_word = True

    def search(self, word):
        """Check if word exists in trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def starts_with(self, prefix):
        """Check if any word starts with prefix"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

    def get_words_with_prefix(self, prefix):
        """Get all words starting with prefix"""
        def dfs(node, current_word, result):
            if node.is_end_of_word:
                result.append(current_word)

            for char, child in node.children.items():
                dfs(child, current_word + char, result)

        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]

        result = []
        dfs(node, prefix, result)
        return result

    def delete(self, word):
        """Delete word from trie"""
        def _delete(node, word, index):
            if index == len(word):
                if not node.is_end_of_word:
                    return False
                node.is_end_of_word = False
                return len(node.children) == 0

            char = word[index]
            if char not in node.children:
                return False

            should_delete = _delete(node.children[char], word, index + 1)

            if should_delete:
                del node.children[char]
                return len(node.children) == 0
            else:
                return False

        return _delete(self.root, word, 0)

# Example usage
trie = Trie()
words = ["apple", "app", "application", "bat", "ball"]

for word in words:
    trie.insert(word)

print(trie.search("app"))        # True
print(trie.search("appl"))       # False
print(trie.starts_with("app"))   # True
print(trie.get_words_with_prefix("app"))  # ["apple", "app", "application"]

Python

Applications of Trie

# Auto-complete
def autocomplete(trie, prefix):
    return trie.get_words_with_prefix(prefix)

# Word frequency
def word_frequency(trie, word):
    node = trie.root
    for char in word:
        if char not in node.children:
            return 0
        node = node.children[char]
    return node.count if node.is_end_of_word else 0

# Longest common prefix
def longest_common_prefix(words):
    if not words:
        return ""

    trie = Trie()
    for word in words:
        trie.insert(word)

    prefix = ""
    node = trie.root

    while len(node.children) == 1 and not node.is_end_of_word:
        char = next(iter(node.children.keys()))
        prefix += char
        node = node.children[char]

    return prefix

# Test
words = ["flower", "flow", "flight"]
print(longest_common_prefix(words))  # "fl"

# Auto-complete
def autocomplete(trie, prefix):
    return trie.get_words_with_prefix(prefix)

# Word frequency
def word_frequency(trie, word):
    node = trie.root
    for char in word:
        if char not in node.children:
            return 0
        node = node.children[char]
    return node.count if node.is_end_of_word else 0

# Longest common prefix
def longest_common_prefix(words):
    if not words:
        return ""

    trie = Trie()
    for word in words:
        trie.insert(word)

    prefix = ""
    node = trie.root

    while len(node.children) == 1 and not node.is_end_of_word:
        char = next(iter(node.children.keys()))
        prefix += char
        node = node.children[char]

    return prefix

# Test
words = ["flower", "flow", "flight"]
print(longest_common_prefix(words))  # "fl"

Python

Suffix Arrays and Trees

Suffix Array Construction

def build_suffix_array(s):
    """
    Build suffix array for string s
    Time: O(n log n), Space: O(n)
    """
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort()
    return [suffix[1] for suffix in suffixes]

def build_suffix_array_efficient(s):
    """
    Efficient suffix array construction using radix sort
    Time: O(n log n), Space: O(n)
    """
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n

    k = 1
    while k < n:
        # Sort by rank[i] and rank[i+k]
        sa.sort(key=lambda x: (rank[x], rank[x + k] if x + k < n else -1))

        # Update ranks
        tmp[sa[0]] = 0
        for i in range(1, n):
            prev = sa[i-1]
            curr = sa[i]
            if (rank[prev], rank[prev + k] if prev + k < n else -1) < \
               (rank[curr], rank[curr + k] if curr + k < n else -1):
                tmp[curr] = tmp[prev] + 1
            else:
                tmp[curr] = tmp[prev]

        rank, tmp = tmp, rank
        k *= 2

    return sa

# Example
s = "banana"
sa = build_suffix_array(s)
print("Suffix Array:", sa)
print("Sorted suffixes:")
for i in sa:
    print(f"{i}: {s[i:]}")

def build_suffix_array(s):
    """
    Build suffix array for string s
    Time: O(n log n), Space: O(n)
    """
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort()
    return [suffix[1] for suffix in suffixes]

def build_suffix_array_efficient(s):
    """
    Efficient suffix array construction using radix sort
    Time: O(n log n), Space: O(n)
    """
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n

    k = 1
    while k < n:
        # Sort by rank[i] and rank[i+k]
        sa.sort(key=lambda x: (rank[x], rank[x + k] if x + k < n else -1))

        # Update ranks
        tmp[sa[0]] = 0
        for i in range(1, n):
            prev = sa[i-1]
            curr = sa[i]
            if (rank[prev], rank[prev + k] if prev + k < n else -1) < \
               (rank[curr], rank[curr + k] if curr + k < n else -1):
                tmp[curr] = tmp[prev] + 1
            else:
                tmp[curr] = tmp[prev]

        rank, tmp = tmp, rank
        k *= 2

    return sa

# Example
s = "banana"
sa = build_suffix_array(s)
print("Suffix Array:", sa)
print("Sorted suffixes:")
for i in sa:
    print(f"{i}: {s[i:]}")

Python

Pattern Matching with Suffix Array

def search_pattern_suffix_array(s, pattern, sa):
    """
    Search pattern using suffix array
    Time: O(m log n) where m=len(pattern), n=len(s)
    """
    n, m = len(s), len(pattern)
    left, right = 0, n - 1
    result = []

    while left <= right:
        mid = (left + right) // 2
        suffix = s[sa[mid]:]

        if suffix.startswith(pattern):
            # Find all occurrences
            i = mid
            while i >= 0 and s[sa[i]:].startswith(pattern):
                result.append(sa[i])
                i -= 1
            i = mid + 1
            while i < n and s[sa[i]:].startswith(pattern):
                result.append(sa[i])
                i += 1
            return sorted(result)
        elif pattern < suffix:
            right = mid - 1
        else:
            left = mid + 1

    return []

# Example
s = "banana"
pattern = "ana"
sa = build_suffix_array(s)
print(search_pattern_suffix_array(s, pattern, sa))  # [1, 3]

def search_pattern_suffix_array(s, pattern, sa):
    """
    Search pattern using suffix array
    Time: O(m log n) where m=len(pattern), n=len(s)
    """
    n, m = len(s), len(pattern)
    left, right = 0, n - 1
    result = []

    while left <= right:
        mid = (left + right) // 2
        suffix = s[sa[mid]:]

        if suffix.startswith(pattern):
            # Find all occurrences
            i = mid
            while i >= 0 and s[sa[i]:].startswith(pattern):
                result.append(sa[i])
                i -= 1
            i = mid + 1
            while i < n and s[sa[i]:].startswith(pattern):
                result.append(sa[i])
                i += 1
            return sorted(result)
        elif pattern < suffix:
            right = mid - 1
        else:
            left = mid + 1

    return []

# Example
s = "banana"
pattern = "ana"
sa = build_suffix_array(s)
print(search_pattern_suffix_array(s, pattern, sa))  # [1, 3]

Python

Regular Expressions

Basic Regex Patterns

import re

# Common patterns
patterns = {
    'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
    'phone': r'^\+?1?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$',
    'url': r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))*)?',
    'date': r'^\d{4}-\d{2}-\d{2}$',
    'ip_address': r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
}

def validate_pattern(text, pattern):
    return bool(re.match(pattern, text))

# Test
print(validate_pattern("user@example.com", patterns['email']))  # True
print(validate_pattern("192.168.1.1", patterns['ip_address']))   # True

import re

# Common patterns
patterns = {
    'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
    'phone': r'^\+?1?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$',
    'url': r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))*)?',
    'date': r'^\d{4}-\d{2}-\d{2}$',
    'ip_address': r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
}

def validate_pattern(text, pattern):
    return bool(re.match(pattern, text))

# Test
print(validate_pattern("user@example.com", patterns['email']))  # True
print(validate_pattern("192.168.1.1", patterns['ip_address']))   # True

Python

String Matching with Regex

def find_all_matches(text, pattern):
    """Find all matches of pattern in text"""
    return re.findall(pattern, text)

def replace_pattern(text, pattern, replacement):
    """Replace all occurrences of pattern"""
    return re.sub(pattern, replacement, text)

def split_by_pattern(text, pattern):
    """Split text by pattern"""
    return re.split(pattern, text)

# Examples
text = "The price is $100.50 and the discount is 20%."

# Find all numbers
numbers = find_all_matches(text, r'\d+\.?\d*')
print("Numbers:", numbers)  # ['100.50', '20']

# Replace currency symbols
cleaned = replace_pattern(text, r'\$', 'USD ')
print("Cleaned:", cleaned)

# Extract words
words = find_all_matches(text, r'\b\w+\b')
print("Words:", words)

def find_all_matches(text, pattern):
    """Find all matches of pattern in text"""
    return re.findall(pattern, text)

def replace_pattern(text, pattern, replacement):
    """Replace all occurrences of pattern"""
    return re.sub(pattern, replacement, text)

def split_by_pattern(text, pattern):
    """Split text by pattern"""
    return re.split(pattern, text)

# Examples
text = "The price is $100.50 and the discount is 20%."

# Find all numbers
numbers = find_all_matches(text, r'\d+\.?\d*')
print("Numbers:", numbers)  # ['100.50', '20']

# Replace currency symbols
cleaned = replace_pattern(text, r'\$', 'USD ')
print("Cleaned:", cleaned)

# Extract words
words = find_all_matches(text, r'\b\w+\b')
print("Words:", words)

Python

Advanced Regex Techniques

def extract_emails(text):
    """Extract all email addresses from text"""
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
    return re.findall(email_pattern, text)

def validate_password(password):
    """
    Validate password strength
    Requirements: 8+ chars, uppercase, lowercase, digit, special char
    """
    if len(password) < 8:
        return False, "Password must be at least 8 characters"

    patterns = [
        (r'[A-Z]', "uppercase letter"),
        (r'[a-z]', "lowercase letter"),
        (r'\d', "digit"),
        (r'[!@#$%^&*(),.?":{}|<>]', "special character")
    ]

    for pattern, requirement in patterns:
        if not re.search(pattern, password):
            return False, f"Password must contain at least one {requirement}"

    return True, "Password is valid"

# Test
text = "Contact us at support@example.com or admin@company.org"
emails = extract_emails(text)
print("Emails:", emails)

passwords = ["weak", "Strong123!", "weakpassword"]
for pwd in passwords:
    valid, msg = validate_password(pwd)
    print(f"{pwd}: {msg}")

def extract_emails(text):
    """Extract all email addresses from text"""
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
    return re.findall(email_pattern, text)

def validate_password(password):
    """
    Validate password strength
    Requirements: 8+ chars, uppercase, lowercase, digit, special char
    """
    if len(password) < 8:
        return False, "Password must be at least 8 characters"

    patterns = [
        (r'[A-Z]', "uppercase letter"),
        (r'[a-z]', "lowercase letter"),
        (r'\d', "digit"),
        (r'[!@#$%^&*(),.?":{}|<>]', "special character")
    ]

    for pattern, requirement in patterns:
        if not re.search(pattern, password):
            return False, f"Password must contain at least one {requirement}"

    return True, "Password is valid"

# Test
text = "Contact us at support@example.com or admin@company.org"
emails = extract_emails(text)
print("Emails:", emails)

passwords = ["weak", "Strong123!", "weakpassword"]
for pwd in passwords:
    valid, msg = validate_password(pwd)
    print(f"{pwd}: {msg}")

Python

Common String Interview Problems

1. Valid Parentheses

def is_valid_parentheses(s):
    """
    Check if parentheses are valid
    Time: O(n), Space: O(n)
    """
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}

    for char in s:
        if char in mapping:
            top = stack.pop() if stack else '#'
            if mapping[char] != top:
                return False
        else:
            stack.append(char)

    return not stack

# Test
print(is_valid_parentheses("()[]{}"))  # True
print(is_valid_parentheses("(]"))      # False

def is_valid_parentheses(s):
    """
    Check if parentheses are valid
    Time: O(n), Space: O(n)
    """
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}

    for char in s:
        if char in mapping:
            top = stack.pop() if stack else '#'
            if mapping[char] != top:
                return False
        else:
            stack.append(char)

    return not stack

# Test
print(is_valid_parentheses("()[]{}"))  # True
print(is_valid_parentheses("(]"))      # False

Python

2. Longest Substring Without Repeating Characters

def longest_substring_without_repeating(s):
    """
    Find length of longest substring without repeating characters
    Time: O(n), Space: O(min(n, 128))
    """
    n = len(s)
    max_length = 0
    char_index = {}
    left = 0

    for right in range(n):
        if s[right] in char_index:
            left = max(left, char_index[s[right]] + 1)

        char_index[s[right]] = right
        max_length = max(max_length, right - left + 1)

    return max_length

# Test
print(longest_substring_without_repeating("abcabcbb"))  # 3 ("abc")
print(longest_substring_without_repeating("bbbbb"))     # 1 ("b")

def longest_substring_without_repeating(s):
    """
    Find length of longest substring without repeating characters
    Time: O(n), Space: O(min(n, 128))
    """
    n = len(s)
    max_length = 0
    char_index = {}
    left = 0

    for right in range(n):
        if s[right] in char_index:
            left = max(left, char_index[s[right]] + 1)

        char_index[s[right]] = right
        max_length = max(max_length, right - left + 1)

    return max_length

# Test
print(longest_substring_without_repeating("abcabcbb"))  # 3 ("abc")
print(longest_substring_without_repeating("bbbbb"))     # 1 ("b")

Python

3. String to Integer (atoi)

def my_atoi(s):
    """
    Convert string to integer
    Time: O(n), Space: O(1)
    """
    s = s.strip()
    if not s:
        return 0

    sign = 1
    if s[0] in ['-', '+']:
        sign = -1 if s[0] == '-' else 1
        s = s[1:]

    result = 0
    for char in s:
        if not char.isdigit():
            break
        result = result * 10 + int(char)

        # Handle overflow
        if sign * result > 2**31 - 1:
            return 2**31 - 1
        if sign * result < -2**31:
            return -2**31

    return sign * result

# Test
print(my_atoi("42"))        # 42
print(my_atoi("   -42"))    # -42
print(my_atoi("4193 with words"))  # 4193

def my_atoi(s):
    """
    Convert string to integer
    Time: O(n), Space: O(1)
    """
    s = s.strip()
    if not s:
        return 0

    sign = 1
    if s[0] in ['-', '+']:
        sign = -1 if s[0] == '-' else 1
        s = s[1:]

    result = 0
    for char in s:
        if not char.isdigit():
            break
        result = result * 10 + int(char)

        # Handle overflow
        if sign * result > 2**31 - 1:
            return 2**31 - 1
        if sign * result < -2**31:
            return -2**31

    return sign * result

# Test
print(my_atoi("42"))        # 42
print(my_atoi("   -42"))    # -42
print(my_atoi("4193 with words"))  # 4193

Python

4. Group Shifted Strings

def group_shifted_strings(strings):
    """
    Group strings that are shifted versions of each other
    Time: O(n*k) where n=len(strings), k=max string length
    """
    def get_shift_key(s):
        if not s:
            return ""
        # Use first character as reference
        shift = ord(s[0])
        key = []
        for char in s:
            # Normalize shift relative to first character
            diff = (ord(char) - shift) % 26
            key.append(str(diff))
        return ','.join(key)

    groups = {}
    for s in strings:
        key = get_shift_key(s)
        if key not in groups:
            groups[key] = []
        groups[key].append(s)

    return list(groups.values())

# Test
strings = ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"]
print(group_shifted_strings(strings))
# Output: [['abc', 'bcd', 'xyz'], ['acef'], ['az', 'ba'], ['a', 'z']]

def group_shifted_strings(strings):
    """
    Group strings that are shifted versions of each other
    Time: O(n*k) where n=len(strings), k=max string length
    """
    def get_shift_key(s):
        if not s:
            return ""
        # Use first character as reference
        shift = ord(s[0])
        key = []
        for char in s:
            # Normalize shift relative to first character
            diff = (ord(char) - shift) % 26
            key.append(str(diff))
        return ','.join(key)

    groups = {}
    for s in strings:
        key = get_shift_key(s)
        if key not in groups:
            groups[key] = []
        groups[key].append(s)

    return list(groups.values())

# Test
strings = ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"]
print(group_shifted_strings(strings))
# Output: [['abc', 'bcd', 'xyz'], ['acef'], ['az', 'ba'], ['a', 'z']]

Python

5. Minimum Window Substring

def min_window(s, t):
    """
    Find minimum window in s that contains all characters in t
    Time: O(n), Space: O(k) where k is size of character set
    """
    if not t or not s:
        return ""

    # Count characters in t
    target_count = {}
    for char in t:
        target_count[char] = target_count.get(char, 0) + 1

    required = len(target_count)
    formed = 0

    # Window counts
    window_count = {}
    left = 0
    min_length = float('inf')
    min_window_start = 0

    for right in range(len(s)):
        char = s[right]
        window_count[char] = window_count.get(char, 0) + 1

        if char in target_count and window_count[char] == target_count[char]:
            formed += 1

        # Shrink window from left
        while left <= right and formed == required:
            char = s[left]

            # Update minimum window
            if right - left + 1 < min_length:
                min_length = right - left + 1
                min_window_start = left

            window_count[char] -= 1
            if char in target_count and window_count[char] < target_count[char]:
                formed -= 1

            left += 1

    return "" if min_length == float('inf') else s[min_window_start:min_window_start + min_length]

# Test
s = "ADOBECODEBANC"
t = "ABC"
print(min_window(s, t))  # "BANC"

def min_window(s, t):
    """
    Find minimum window in s that contains all characters in t
    Time: O(n), Space: O(k) where k is size of character set
    """
    if not t or not s:
        return ""

    # Count characters in t
    target_count = {}
    for char in t:
        target_count[char] = target_count.get(char, 0) + 1

    required = len(target_count)
    formed = 0

    # Window counts
    window_count = {}
    left = 0
    min_length = float('inf')
    min_window_start = 0

    for right in range(len(s)):
        char = s[right]
        window_count[char] = window_count.get(char, 0) + 1

        if char in target_count and window_count[char] == target_count[char]:
            formed += 1

        # Shrink window from left
        while left <= right and formed == required:
            char = s[left]

            # Update minimum window
            if right - left + 1 < min_length:
                min_length = right - left + 1
                min_window_start = left

            window_count[char] -= 1
            if char in target_count and window_count[char] < target_count[char]:
                formed -= 1

            left += 1

    return "" if min_length == float('inf') else s[min_window_start:min_window_start + min_length]

# Test
s = "ADOBECODEBANC"
t = "ABC"
print(min_window(s, t))  # "BANC"

Python

Advanced String Algorithms

Manacher’s Algorithm for Palindromes

def manacher(s):
    """
    Find longest palindromic substring using Manacher's algorithm
    Time: O(n), Space: O(n)
    """
    # Transform string to handle even-length palindromes
    t = '^#' + '#'.join(s) + '#$'
    n = len(t)
    p = [0] * n  # p[i] = length of palindrome centered at i
    c = r = 0    # center and right boundary

    for i in range(1, n-1):
        i_mirror = 2*c - i  # mirror of i around c

        if r > i:
            p[i] = min(r - i, p[i_mirror])

        # Expand around center i
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1

        # Update center and right boundary
        if i + p[i] > r:
            c = i
            r = i + p[i]

    # Find maximum palindrome
    max_len = max(p)
    center_index = p.index(max_len)
    start = (center_index - max_len) // 2

    return s[start:start + max_len]

# Test
print(manacher("babad"))  # "bab" or "aba"
print(manacher("cbbd"))   # "bb"

def manacher(s):
    """
    Find longest palindromic substring using Manacher's algorithm
    Time: O(n), Space: O(n)
    """
    # Transform string to handle even-length palindromes
    t = '^#' + '#'.join(s) + '#$'
    n = len(t)
    p = [0] * n  # p[i] = length of palindrome centered at i
    c = r = 0    # center and right boundary

    for i in range(1, n-1):
        i_mirror = 2*c - i  # mirror of i around c

        if r > i:
            p[i] = min(r - i, p[i_mirror])

        # Expand around center i
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1

        # Update center and right boundary
        if i + p[i] > r:
            c = i
            r = i + p[i]

    # Find maximum palindrome
    max_len = max(p)
    center_index = p.index(max_len)
    start = (center_index - max_len) // 2

    return s[start:start + max_len]

# Test
print(manacher("babad"))  # "bab" or "aba"
print(manacher("cbbd"))   # "bb"

Python

Z-Algorithm for Pattern Matching

def z_algorithm(s):
    """
    Compute Z-array for string s
    Z[i] = length of longest substring starting at i that matches prefix
    Time: O(n), Space: O(n)
    """
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r] is the rightmost segment match

    for i in range(1, n):
        if i < r:
            # i is within current segment
            k = i - l
            if z[k] < r - i + 1:
                z[i] = z[k]
            else:
                z[i] = r - i + 1
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
        else:
            # i is outside current segment
            z[i] = 0
            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1

        # Update segment
        if i + z[i] - 1 > r:
            l = i
            r = i + z[i] - 1

    return z

def search_pattern_z(s, pattern):
    """
    Search pattern in s using Z-algorithm
    """
    concat = pattern + '$' + s
    z = z_algorithm(concat)
    result = []

    for i in range(len(pattern) + 1, len(concat)):
        if z[i] == len(pattern):
            result.append(i - len(pattern) - 1)

    return result

# Test
s = "AABAACAADAABAABA"
pattern = "AABA"
print(search_pattern_z(s, pattern))  # [0, 9, 12]

def z_algorithm(s):
    """
    Compute Z-array for string s
    Z[i] = length of longest substring starting at i that matches prefix
    Time: O(n), Space: O(n)
    """
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r] is the rightmost segment match

    for i in range(1, n):
        if i < r:
            # i is within current segment
            k = i - l
            if z[k] < r - i + 1:
                z[i] = z[k]
            else:
                z[i] = r - i + 1
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
        else:
            # i is outside current segment
            z[i] = 0
            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1

        # Update segment
        if i + z[i] - 1 > r:
            l = i
            r = i + z[i] - 1

    return z

def search_pattern_z(s, pattern):
    """
    Search pattern in s using Z-algorithm
    """
    concat = pattern + '$' + s
    z = z_algorithm(concat)
    result = []

    for i in range(len(pattern) + 1, len(concat)):
        if z[i] == len(pattern):
            result.append(i - len(pattern) - 1)

    return result

# Test
s = "AABAACAADAABAABA"
pattern = "AABA"
print(search_pattern_z(s, pattern))  # [0, 9, 12]

Python

Aho-Corasick Algorithm

class AhoCorasickNode:
    def __init__(self):
        self.children = {}
        self.fail = None
        self.output = []

class AhoCorasick:
    def __init__(self, patterns):
        self.root = AhoCorasickNode()
        self.build_trie(patterns)
        self.build_failure_links()

    def build_trie(self, patterns):
        for pattern in patterns:
            node = self.root
            for char in pattern:
                if char not in node.children:
                    node.children[char] = AhoCorasickNode()
                node = node.children[char]
            node.output.append(pattern)

    def build_failure_links(self):
        from collections import deque

        queue = deque()
        for node in self.root.children.values():
            node.fail = self.root
            queue.append(node)

        while queue:
            current = queue.popleft()

            for char, child in current.children.items():
                queue.append(child)

                # Find failure link
                fail = current.fail
                while fail != self.root and char not in fail.children:
                    fail = fail.fail
                if char in fail.children:
                    fail = fail.children[char]
                else:
                    fail = self.root

                child.fail = fail
                child.output.extend(fail.output)

    def search(self, text):
        """Find all pattern occurrences in text"""
        node = self.root
        results = []

        for i, char in enumerate(text):
            while node != self.root and char not in node.children:
                node = node.fail
            if char in node.children:
                node = node.children[char]
            else:
                node = self.root

            if node.output:
                for pattern in node.output:
                    results.append((i - len(pattern) + 1, pattern))

        return results

# Test
patterns = ["he", "she", "his", "hers"]
ac = AhoCorasick(patterns)
text = "ushers"
print(ac.search(text))  # [(1, 'he'), (1, 'she'), (4, 'he')]

class AhoCorasickNode:
    def __init__(self):
        self.children = {}
        self.fail = None
        self.output = []

class AhoCorasick:
    def __init__(self, patterns):
        self.root = AhoCorasickNode()
        self.build_trie(patterns)
        self.build_failure_links()

    def build_trie(self, patterns):
        for pattern in patterns:
            node = self.root
            for char in pattern:
                if char not in node.children:
                    node.children[char] = AhoCorasickNode()
                node = node.children[char]
            node.output.append(pattern)

    def build_failure_links(self):
        from collections import deque

        queue = deque()
        for node in self.root.children.values():
            node.fail = self.root
            queue.append(node)

        while queue:
            current = queue.popleft()

            for char, child in current.children.items():
                queue.append(child)

                # Find failure link
                fail = current.fail
                while fail != self.root and char not in fail.children:
                    fail = fail.fail
                if char in fail.children:
                    fail = fail.children[char]
                else:
                    fail = self.root

                child.fail = fail
                child.output.extend(fail.output)

    def search(self, text):
        """Find all pattern occurrences in text"""
        node = self.root
        results = []

        for i, char in enumerate(text):
            while node != self.root and char not in node.children:
                node = node.fail
            if char in node.children:
                node = node.children[char]
            else:
                node = self.root

            if node.output:
                for pattern in node.output:
                    results.append((i - len(pattern) + 1, pattern))

        return results

# Test
patterns = ["he", "she", "his", "hers"]
ac = AhoCorasick(patterns)
text = "ushers"
print(ac.search(text))  # [(1, 'he'), (1, 'she'), (4, 'he')]

Python

Time and Space Complexity

String Algorithm Complexities

Algorithm	Time Complexity	Space Complexity	Notes
Naive Search	O((n-m+1)×m)	O(1)	Simple, works for small inputs
KMP	O(n+m)	O(m)	Best for multiple searches
Rabin-Karp	O(n+m) avg, O((n-m+1)×m) worst	O(1)	Good for multiple patterns
Boyer-Moore	O(n/m) best, O(n×m) worst	O(σ)	Fast in practice
Z-Algorithm	O(n)	O(n)	Linear time pattern matching
Aho-Corasick	O(n+m)	O(m×σ)	Multiple pattern search
Trie Operations	O(m)	O(m×σ)	m = key length, σ = alphabet size
Suffix Array	O(n log n)	O(n)	For advanced string problems
Manacher	O(n)	O(n)	Fast palindrome finding

Python String Operation Complexities

Operation	Time Complexity	Notes
len(s)	O(1)	Pre-computed
s[i]	O(1)	Direct access
s[i:j]	O(j-i)	Creates new string
s + t	O(n+m)	Creates new string
s.find(t)	O(n×m)	Naive search
s.replace(a,b)	O(n)	Creates new string
s.split()	O(n)	Creates list
”.join(list)	O(n)	Efficient concatenation

Best Practices and Tips

1. String Immutability Awareness

# Bad: Repeated concatenation in loop
result = ""
for i in range(1000):
    result += str(i)  # Creates new string each time

# Good: Use list and join
parts = []
for i in range(1000):
    parts.append(str(i))
result = ''.join(parts)  # Single string creation

# Bad: Repeated concatenation in loop
result = ""
for i in range(1000):
    result += str(i)  # Creates new string each time

# Good: Use list and join
parts = []
for i in range(1000):
    parts.append(str(i))
result = ''.join(parts)  # Single string creation

Python

2. Efficient String Building

# Use io.StringIO for large string building
import io

output = io.StringIO()
for line in large_data:
    output.write(line)
    output.write('\n')
result = output.getvalue()

# Use io.StringIO for large string building
import io

output = io.StringIO()
for line in large_data:
    output.write(line)
    output.write('\n')
result = output.getvalue()

Python

3. Character Set Considerations

# For ASCII-only strings
char_count = [0] * 128

# For Unicode strings
from collections import Counter
char_count = Counter(s)

# For ASCII-only strings
char_count = [0] * 128

# For Unicode strings
from collections import Counter
char_count = Counter(s)

Python

4. Memory-Efficient String Processing

# Process large files line by line
def process_large_file(filename):
    with open(filename, 'r') as f:
        for line in f:
            # Process one line at a time
            process_line(line.strip())

# Process large files line by line
def process_large_file(filename):
    with open(filename, 'r') as f:
        for line in f:
            # Process one line at a time
            process_line(line.strip())

Python

5. Regular Expression Optimization

import re

# Compile regex for reuse
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_valid_email(email):
    return bool(email_pattern.match(email))

import re

# Compile regex for reuse
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_valid_email(email):
    return bool(email_pattern.match(email))

Python

6. String Comparison Best Practices

# Use == for content comparison
s1 = "hello"
s2 = "hello"
print(s1 == s2)  # True

# Use 'is' for identity comparison (rarely needed for strings)
print(s1 is s2)  # May be True due to interning

# For case-insensitive comparison
print(s1.lower() == s2.lower())

# For lexicographical comparison
print(s1 < s2)  # Alphabetical order

# Use == for content comparison
s1 = "hello"
s2 = "hello"
print(s1 == s2)  # True

# Use 'is' for identity comparison (rarely needed for strings)
print(s1 is s2)  # May be True due to interning

# For case-insensitive comparison
print(s1.lower() == s2.lower())

# For lexicographical comparison
print(s1 < s2)  # Alphabetical order

Python

7. Encoding and Unicode Handling

# Handle different encodings
text = "Hello 世界"
utf8_bytes = text.encode('utf-8')
decoded = utf8_bytes.decode('utf-8')

# Normalize Unicode
import unicodedata
normalized = unicodedata.normalize('NFC', text)

# Handle different encodings
text = "Hello 世界"
utf8_bytes = text.encode('utf-8')
decoded = utf8_bytes.decode('utf-8')

# Normalize Unicode
import unicodedata
normalized = unicodedata.normalize('NFC', text)

Python

Interview Tips

Clarify Constraints: Ask about string length, character set, case sensitivity
Edge Cases: Empty strings, single characters, all same characters
Time vs Space: Discuss trade-offs between different approaches
Built-in Functions: Know when to use Python’s string methods vs custom implementation
Pattern Recognition: Identify common patterns (palindromes, anagrams, substrings)
Optimization: Start with brute force, then optimize

Common Pitfalls

Index Errors: Always check bounds before accessing s[i]
Off-by-One Errors: Careful with loop bounds in string algorithms
Character Encoding: Handle Unicode properly
Memory Usage: Be aware of string immutability costs
Case Sensitivity: Consider case-insensitive requirements

Discover more from Altgr Blog

Subscribe to get the latest posts sent to your email.

Table of Contents

Introduction

Key Characteristics of Strings in Python

Why String Algorithms Matter

String Basics in Python

Creating Strings

String Properties

String Immutability

String Manipulation Techniques

Basic Operations

Case Conversion

String Formatting

String Stripping and Alignment

String Searching Algorithms

Naive String Search

Knuth-Morris-Pratt (KMP) Algorithm

Rabin-Karp Algorithm

Boyer-Moore Algorithm

Palindrome Problems

Check if String is Palindrome

Longest Palindromic Substring

Palindrome Partitioning

Anagram Problems

Check if Two Strings are Anagrams

Group Anagrams

Find All Anagrams in String

String Compression

Run Length Encoding

String Compression II (LeetCode 1531)

Trie (Prefix Tree)

Trie Implementation

Applications of Trie

Suffix Arrays and Trees

Suffix Array Construction

Pattern Matching with Suffix Array

Regular Expressions

Basic Regex Patterns

String Matching with Regex

Advanced Regex Techniques

Common String Interview Problems

1. Valid Parentheses

2. Longest Substring Without Repeating Characters

3. String to Integer (atoi)

4. Group Shifted Strings

5. Minimum Window Substring

Advanced String Algorithms

Manacher’s Algorithm for Palindromes

Z-Algorithm for Pattern Matching

Aho-Corasick Algorithm

Time and Space Complexity

String Algorithm Complexities

Python String Operation Complexities

Best Practices and Tips

1. String Immutability Awareness

2. Efficient String Building

3. Character Set Considerations

4. Memory-Efficient String Processing

5. Regular Expression Optimization

6. String Comparison Best Practices

7. Encoding and Unicode Handling

Interview Tips

Common Pitfalls

Related

Discover more from Altgr Blog

Related Posts

Tree Data Structures in Python – DSA

Python DSA (Q101-Q199)

Python Fundamentals (Q1-Q100)

Leave a Reply Cancel reply