String Algorithms in Python – DSA

    Table of Contents

    1. Introduction
    2. String Basics in Python
    3. String Manipulation Techniques
    4. String Searching Algorithms
    5. Palindrome Problems
    6. Anagram Problems
    7. String Compression
    8. Trie (Prefix Tree)
    9. Suffix Arrays and Trees
    10. Regular Expressions
    11. Common String Interview Problems
    12. Advanced String Algorithms
    13. Time and Space Complexity
    14. Best Practices and Tips

    Introduction

    String algorithms are fundamental to computer science and form the backbone of many applications including text processing, search engines, bioinformatics, and data compression. In Python, strings are immutable sequences of Unicode characters, making them efficient for read operations but requiring careful handling for modifications.

    Key Characteristics of Strings in Python

    • Immutable: Cannot be modified in-place
    • Sequence Type: Supports indexing, slicing, and iteration
    • Unicode Support: Handles international characters
    • Rich Built-in Methods: Extensive string manipulation capabilities

    Why String Algorithms Matter

    • Text Processing: Search, replace, parse text data
    • Pattern Matching: Find patterns in large texts
    • Data Validation: Check formats, validate inputs
    • Compression: Reduce storage space
    • Bioinformatics: DNA sequence analysis
    • Search Engines: Indexing and querying

    String Basics in Python

    Creating Strings

    # Different ways to create strings
    single_quotes = 'Hello World'
    double_quotes = "Hello World"
    triple_quotes = """Multi-line
    string"""
    
    # String with special characters
    escaped = "He said, \"Hello!\""
    raw_string = r"C:\Users\Path"  # Raw string
    
    # Unicode strings
    unicode_str = "Hello 世界"
    Python

    String Properties

    s = "Python"
    
    print(len(s))        # 6 - length
    print(s[0])          # 'P' - indexing
    print(s[-1])         # 'n' - negative indexing
    print(s[1:4])        # 'yth' - slicing
    print('P' in s)      # True - membership
    print(s * 2)         # 'PythonPython' - repetition
    Python

    String Immutability

    s = "Hello"
    # s[0] = 'h'  # TypeError: 'str' object does not support item assignment
    
    # Instead, create new string
    s = 'h' + s[1:]  # 'hello'
    Python

    String Manipulation Techniques

    Basic Operations

    # Concatenation
    s1 = "Hello"
    s2 = "World"
    result = s1 + " " + s2  # "Hello World"
    
    # Joining strings
    words = ["Hello", "World", "Python"]
    sentence = " ".join(words)  # "Hello World Python"
    
    # Splitting
    text = "Hello World Python"
    words = text.split()  # ['Hello', 'World', 'Python']
    parts = text.split(" ", 1)  # ['Hello', 'World Python']
    Python

    Case Conversion

    s = "Hello World"
    
    print(s.upper())      # "HELLO WORLD"
    print(s.lower())      # "hello world"
    print(s.title())      # "Hello World"
    print(s.capitalize()) # "Hello world"
    print(s.swapcase())   # "hELLO wORLD"
    Python

    String Formatting

    # Old style formatting
    name = "Alice"
    age = 30
    print("My name is %s and I am %d years old" % (name, age))
    
    # New style formatting
    print("My name is {} and I am {} years old".format(name, age))
    print("My name is {0} and I am {1} years old".format(name, age))
    
    # f-strings (Python 3.6+)
    print(f"My name is {name} and I am {age} years old")
    
    # Template strings
    from string import Template
    template = Template("My name is $name and I am $age years old")
    print(template.substitute(name=name, age=age))
    Python

    String Stripping and Alignment

    s = "  Hello World  "
    
    print(s.strip())    # "Hello World"
    print(s.lstrip())   # "Hello World  "
    print(s.rstrip())   # "  Hello World"
    
    # Alignment
    print("Hello".ljust(10, '-'))   # "Hello-----"
    print("Hello".rjust(10, '-'))   # "-----Hello"
    print("Hello".center(10, '-'))  # "--Hello---"
    Python

    String Searching Algorithms

    def naive_search(text, pattern):
        """
        Naive string search algorithm
        Time: O((n-m+1)*m) where n=len(text), m=len(pattern)
        """
        n, m = len(text), len(pattern)
        results = []
    
        for i in range(n - m + 1):
            if text[i:i+m] == pattern:
                results.append(i)
    
        return results
    
    # Example
    text = "ABABDABACDABABCABAB"
    pattern = "ABABCABAB"
    print(naive_search(text, pattern))  # [10]
    Python

    Knuth-Morris-Pratt (KMP) Algorithm

    def compute_lps(pattern):
        """Compute Longest Prefix Suffix array"""
        m = len(pattern)
        lps = [0] * m
        length = 0
        i = 1
    
        while i < m:
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            else:
                if length != 0:
                    length = lps[length - 1]
                else:
                    lps[i] = 0
                    i += 1
    
        return lps
    
    def kmp_search(text, pattern):
        """
        KMP string search algorithm
        Time: O(n + m) where n=len(text), m=len(pattern)
        """
        n, m = len(text), len(pattern)
        lps = compute_lps(pattern)
        results = []
    
        i = j = 0
        while i < n:
            if pattern[j] == text[i]:
                i += 1
                j += 1
    
            if j == m:
                results.append(i - j)
                j = lps[j - 1]
            elif i < n and pattern[j] != text[i]:
                if j != 0:
                    j = lps[j - 1]
                else:
                    i += 1
    
        return results
    
    # Example
    text = "ABABDABACDABABCABAB"
    pattern = "ABABCABAB"
    print(kmp_search(text, pattern))  # [10]
    Python

    Rabin-Karp Algorithm

    def rabin_karp(text, pattern, prime=101):
        """
        Rabin-Karp string search algorithm
        Time: O(n + m) average case, O((n-m+1)*m) worst case
        """
        n, m = len(text), len(pattern)
        results = []
    
        # Calculate hash of pattern
        pattern_hash = 0
        text_hash = 0
        h = 1
    
        # h = d^(m-1) % prime
        for i in range(m-1):
            h = (h * 256) % prime
    
        # Calculate initial hashes
        for i in range(m):
            pattern_hash = (256 * pattern_hash + ord(pattern[i])) % prime
            text_hash = (256 * text_hash + ord(text[i])) % prime
    
        # Slide the pattern over text
        for i in range(n - m + 1):
            if pattern_hash == text_hash:
                # Check for spurious hits
                if text[i:i+m] == pattern:
                    results.append(i)
    
            # Calculate hash for next window
            if i < n - m:
                text_hash = (256 * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
                if text_hash < 0:
                    text_hash += prime
    
        return results
    
    # Example
    text = "ABABDABACDABABCABAB"
    pattern = "ABABCABAB"
    print(rabin_karp(text, pattern))  # [10]
    Python

    Boyer-Moore Algorithm

    def bad_character_table(pattern):
        """Create bad character table for Boyer-Moore"""
        table = {}
        m = len(pattern)
    
        for i in range(m - 1):
            table[pattern[i]] = m - 1 - i
    
        return table
    
    def boyer_moore_search(text, pattern):
        """
        Boyer-Moore string search algorithm
        Time: O(n/m) best case, O(n*m) worst case
        """
        n, m = len(text), len(pattern)
        if m == 0:
            return []
    
        results = []
        bad_char = bad_character_table(pattern)
    
        i = 0
        while i <= n - m:
            j = m - 1
    
            while j >= 0 and pattern[j] == text[i + j]:
                j -= 1
    
            if j < 0:
                results.append(i)
                i += 1
            else:
                shift = bad_char.get(text[i + j], m)
                i += shift
    
        return results
    
    # Example
    text = "ABABDABACDABABCABAB"
    pattern = "ABABCABAB"
    print(boyer_moore_search(text, pattern))  # [10]
    Python

    Palindrome Problems

    Check if String is Palindrome

    def is_palindrome(s):
        """
        Check if string is palindrome
        Time: O(n), Space: O(1)
        """
        left, right = 0, len(s) - 1
    
        while left < right:
            if s[left] != s[right]:
                return False
            left += 1
            right -= 1
    
        return True
    
    # Test cases
    print(is_palindrome("racecar"))  # True
    print(is_palindrome("hello"))    # False
    print(is_palindrome("A man a plan a canal Panama".lower().replace(" ", "")))  # True
    Python

    Longest Palindromic Substring

    def longest_palindromic_substring(s):
        """
        Find longest palindromic substring using expand around center
        Time: O(n²), Space: O(1)
        """
        if not s:
            return ""
    
        n = len(s)
        start = max_length = 0
    
        def expand_around_center(left, right):
            while left >= 0 and right < n and s[left] == s[right]:
                left -= 1
                right += 1
            return right - left - 1
    
        for i in range(n):
            # Odd length palindromes
            len1 = expand_around_center(i, i)
            # Even length palindromes
            len2 = expand_around_center(i, i + 1)
    
            current_length = max(len1, len2)
            if current_length > max_length:
                max_length = current_length
                start = i - (current_length - 1) // 2
    
        return s[start:start + max_length]
    
    # Test cases
    print(longest_palindromic_substring("babad"))  # "bab" or "aba"
    print(longest_palindromic_substring("cbbd"))   # "bb"
    Python

    Palindrome Partitioning

    def palindrome_partitioning(s):
        """
        Partition string into minimum number of palindromic substrings
        Time: O(n²), Space: O(n²)
        """
        n = len(s)
        # dp[i] = minimum cuts needed for s[0..i-1]
        dp = [float('inf')] * (n + 1)
        dp[0] = 0
    
        # is_pal[i][j] = True if s[i..j] is palindrome
        is_pal = [[False] * n for _ in range(n)]
    
        for i in range(n):
            for j in range(i + 1):
                if s[i] == s[j] and (i - j <= 2 or is_pal[j + 1][i - 1]):
                    is_pal[j][i] = True
                    if j == 0:
                        dp[i + 1] = 0
                    else:
                        dp[i + 1] = min(dp[i + 1], dp[j] + 1)
    
        return dp[n]
    
    # Test
    print(palindrome_partitioning("aab"))  # 1 ("aa", "b")
    Python

    Anagram Problems

    Check if Two Strings are Anagrams

    def are_anagrams(s1, s2):
        """
        Check if two strings are anagrams
        Time: O(n log n), Space: O(1) for fixed alphabet
        """
        return sorted(s1) == sorted(s2)
    
    def are_anagrams_optimized(s1, s2):
        """
        Check if two strings are anagrams using frequency count
        Time: O(n), Space: O(1) for ASCII
        """
        if len(s1) != len(s2):
            return False
    
        count = [0] * 256  # ASCII characters
    
        for char in s1:
            count[ord(char)] += 1
    
        for char in s2:
            count[ord(char)] -= 1
            if count[ord(char)] < 0:
                return False
    
        return True
    
    # Test cases
    print(are_anagrams("listen", "silent"))  # True
    print(are_anagrams("hello", "world"))    # False
    Python

    Group Anagrams

    def group_anagrams(strs):
        """
        Group strings that are anagrams
        Time: O(n * k log k) where n=len(strs), k=max string length
        Space: O(n * k)
        """
        anagram_groups = {}
    
        for s in strs:
            # Sort string as key
            key = ''.join(sorted(s))
            if key not in anagram_groups:
                anagram_groups[key] = []
            anagram_groups[key].append(s)
    
        return list(anagram_groups.values())
    
    # Test
    strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
    print(group_anagrams(strs))
    # Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
    Python

    Find All Anagrams in String

    def find_anagrams(s, p):
        """
        Find all starting indices of p's anagrams in s
        Time: O(n), Space: O(1) for fixed alphabet
        """
        if len(p) > len(s):
            return []
    
        p_count = [0] * 256
        s_count = [0] * 256
        result = []
    
        # Count frequency of p
        for char in p:
            p_count[ord(char)] += 1
    
        # Sliding window
        for i in range(len(s)):
            s_count[ord(s[i])] += 1
    
            # Remove character that's out of window
            if i >= len(p):
                s_count[ord(s[i - len(p)])] -= 1
    
            # Check if current window matches p
            if i >= len(p) - 1 and p_count == s_count:
                result.append(i - len(p) + 1)
    
        return result
    
    # Test
    s = "cbaebabacd"
    p = "abc"
    print(find_anagrams(s, p))  # [0, 6]
    Python

    String Compression

    Run Length Encoding

    def compress_string(s):
        """
        Compress string using run-length encoding
        Time: O(n), Space: O(n)
        """
        if not s:
            return ""
    
        compressed = []
        count = 1
    
        for i in range(1, len(s)):
            if s[i] == s[i-1]:
                count += 1
            else:
                compressed.append(s[i-1] + str(count))
                count = 1
    
        # Add last group
        compressed.append(s[-1] + str(count))
    
        result = ''.join(compressed)
        return result if len(result) < len(s) else s
    
    # Test
    print(compress_string("aabcccccaaa"))  # "a2b1c5a3"
    print(compress_string("abc"))          # "abc" (no compression)
    Python

    String Compression II (LeetCode 1531)

    def get_length_of_optimal_compression(s, k):
        """
        Find minimum length after removing k characters and compressing
        Time: O(n^2 * k), Space: O(n^2)
        """
        n = len(s)
    
        # dp[i][j] = min length for first i chars, removing j chars
        dp = [[float('inf')] * (k + 1) for _ in range(n + 1)]
        dp[0][0] = 0
    
        for i in range(1, n + 1):
            for j in range(min(i, k) + 1):
                # Case 1: Remove current character
                if j > 0:
                    dp[i][j] = dp[i-1][j-1]
    
                # Case 2: Keep current character
                count = 0
                for prev in range(i):
                    if s[prev] == s[i-1]:
                        count += 1
                    else:
                        count = 1
    
                    # Length needed for this run
                    run_length = 1
                    if count >= 10:
                        run_length = 3  # e.g., "a10"
                    elif count >= 2:
                        run_length = 2  # e.g., "a2"
                    # else run_length = 1 (just the character)
    
                    if prev + 1 >= run_length:
                        dp[i][j] = min(dp[i][j], dp[prev][j] + run_length)
    
        return dp[n][k]
    Python

    Trie (Prefix Tree)

    Trie Implementation

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.is_end_of_word = False
            self.count = 0  # For frequency counting
    
    class Trie:
        def __init__(self):
            self.root = TrieNode()
    
        def insert(self, word):
            """Insert word into trie"""
            node = self.root
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
                node.count += 1
            node.is_end_of_word = True
    
        def search(self, word):
            """Check if word exists in trie"""
            node = self.root
            for char in word:
                if char not in node.children:
                    return False
                node = node.children[char]
            return node.is_end_of_word
    
        def starts_with(self, prefix):
            """Check if any word starts with prefix"""
            node = self.root
            for char in prefix:
                if char not in node.children:
                    return False
                node = node.children[char]
            return True
    
        def get_words_with_prefix(self, prefix):
            """Get all words starting with prefix"""
            def dfs(node, current_word, result):
                if node.is_end_of_word:
                    result.append(current_word)
    
                for char, child in node.children.items():
                    dfs(child, current_word + char, result)
    
            node = self.root
            for char in prefix:
                if char not in node.children:
                    return []
                node = node.children[char]
    
            result = []
            dfs(node, prefix, result)
            return result
    
        def delete(self, word):
            """Delete word from trie"""
            def _delete(node, word, index):
                if index == len(word):
                    if not node.is_end_of_word:
                        return False
                    node.is_end_of_word = False
                    return len(node.children) == 0
    
                char = word[index]
                if char not in node.children:
                    return False
    
                should_delete = _delete(node.children[char], word, index + 1)
    
                if should_delete:
                    del node.children[char]
                    return len(node.children) == 0
                else:
                    return False
    
            return _delete(self.root, word, 0)
    
    # Example usage
    trie = Trie()
    words = ["apple", "app", "application", "bat", "ball"]
    
    for word in words:
        trie.insert(word)
    
    print(trie.search("app"))        # True
    print(trie.search("appl"))       # False
    print(trie.starts_with("app"))   # True
    print(trie.get_words_with_prefix("app"))  # ["apple", "app", "application"]
    Python

    Applications of Trie

    # Auto-complete
    def autocomplete(trie, prefix):
        return trie.get_words_with_prefix(prefix)
    
    # Word frequency
    def word_frequency(trie, word):
        node = trie.root
        for char in word:
            if char not in node.children:
                return 0
            node = node.children[char]
        return node.count if node.is_end_of_word else 0
    
    # Longest common prefix
    def longest_common_prefix(words):
        if not words:
            return ""
    
        trie = Trie()
        for word in words:
            trie.insert(word)
    
        prefix = ""
        node = trie.root
    
        while len(node.children) == 1 and not node.is_end_of_word:
            char = next(iter(node.children.keys()))
            prefix += char
            node = node.children[char]
    
        return prefix
    
    # Test
    words = ["flower", "flow", "flight"]
    print(longest_common_prefix(words))  # "fl"
    Python

    Suffix Arrays and Trees

    Suffix Array Construction

    def build_suffix_array(s):
        """
        Build suffix array for string s
        Time: O(n log n), Space: O(n)
        """
        n = len(s)
        suffixes = [(s[i:], i) for i in range(n)]
        suffixes.sort()
        return [suffix[1] for suffix in suffixes]
    
    def build_suffix_array_efficient(s):
        """
        Efficient suffix array construction using radix sort
        Time: O(n log n), Space: O(n)
        """
        n = len(s)
        sa = list(range(n))
        rank = [ord(c) for c in s]
        tmp = [0] * n
    
        k = 1
        while k < n:
            # Sort by rank[i] and rank[i+k]
            sa.sort(key=lambda x: (rank[x], rank[x + k] if x + k < n else -1))
    
            # Update ranks
            tmp[sa[0]] = 0
            for i in range(1, n):
                prev = sa[i-1]
                curr = sa[i]
                if (rank[prev], rank[prev + k] if prev + k < n else -1) < \
                   (rank[curr], rank[curr + k] if curr + k < n else -1):
                    tmp[curr] = tmp[prev] + 1
                else:
                    tmp[curr] = tmp[prev]
    
            rank, tmp = tmp, rank
            k *= 2
    
        return sa
    
    # Example
    s = "banana"
    sa = build_suffix_array(s)
    print("Suffix Array:", sa)
    print("Sorted suffixes:")
    for i in sa:
        print(f"{i}: {s[i:]}")
    Python

    Pattern Matching with Suffix Array

    def search_pattern_suffix_array(s, pattern, sa):
        """
        Search pattern using suffix array
        Time: O(m log n) where m=len(pattern), n=len(s)
        """
        n, m = len(s), len(pattern)
        left, right = 0, n - 1
        result = []
    
        while left <= right:
            mid = (left + right) // 2
            suffix = s[sa[mid]:]
    
            if suffix.startswith(pattern):
                # Find all occurrences
                i = mid
                while i >= 0 and s[sa[i]:].startswith(pattern):
                    result.append(sa[i])
                    i -= 1
                i = mid + 1
                while i < n and s[sa[i]:].startswith(pattern):
                    result.append(sa[i])
                    i += 1
                return sorted(result)
            elif pattern < suffix:
                right = mid - 1
            else:
                left = mid + 1
    
        return []
    
    # Example
    s = "banana"
    pattern = "ana"
    sa = build_suffix_array(s)
    print(search_pattern_suffix_array(s, pattern, sa))  # [1, 3]
    Python

    Regular Expressions

    Basic Regex Patterns

    import re
    
    # Common patterns
    patterns = {
        'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
        'phone': r'^\+?1?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$',
        'url': r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))*)?',
        'date': r'^\d{4}-\d{2}-\d{2}$',
        'ip_address': r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
    }
    
    def validate_pattern(text, pattern):
        return bool(re.match(pattern, text))
    
    # Test
    print(validate_pattern("user@example.com", patterns['email']))  # True
    print(validate_pattern("192.168.1.1", patterns['ip_address']))   # True
    Python

    String Matching with Regex

    def find_all_matches(text, pattern):
        """Find all matches of pattern in text"""
        return re.findall(pattern, text)
    
    def replace_pattern(text, pattern, replacement):
        """Replace all occurrences of pattern"""
        return re.sub(pattern, replacement, text)
    
    def split_by_pattern(text, pattern):
        """Split text by pattern"""
        return re.split(pattern, text)
    
    # Examples
    text = "The price is $100.50 and the discount is 20%."
    
    # Find all numbers
    numbers = find_all_matches(text, r'\d+\.?\d*')
    print("Numbers:", numbers)  # ['100.50', '20']
    
    # Replace currency symbols
    cleaned = replace_pattern(text, r'\$', 'USD ')
    print("Cleaned:", cleaned)
    
    # Extract words
    words = find_all_matches(text, r'\b\w+\b')
    print("Words:", words)
    Python

    Advanced Regex Techniques

    def extract_emails(text):
        """Extract all email addresses from text"""
        email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
        return re.findall(email_pattern, text)
    
    def validate_password(password):
        """
        Validate password strength
        Requirements: 8+ chars, uppercase, lowercase, digit, special char
        """
        if len(password) < 8:
            return False, "Password must be at least 8 characters"
    
        patterns = [
            (r'[A-Z]', "uppercase letter"),
            (r'[a-z]', "lowercase letter"),
            (r'\d', "digit"),
            (r'[!@#$%^&*(),.?":{}|<>]', "special character")
        ]
    
        for pattern, requirement in patterns:
            if not re.search(pattern, password):
                return False, f"Password must contain at least one {requirement}"
    
        return True, "Password is valid"
    
    # Test
    text = "Contact us at support@example.com or admin@company.org"
    emails = extract_emails(text)
    print("Emails:", emails)
    
    passwords = ["weak", "Strong123!", "weakpassword"]
    for pwd in passwords:
        valid, msg = validate_password(pwd)
        print(f"{pwd}: {msg}")
    Python

    Common String Interview Problems

    1. Valid Parentheses

    def is_valid_parentheses(s):
        """
        Check if parentheses are valid
        Time: O(n), Space: O(n)
        """
        stack = []
        mapping = {')': '(', '}': '{', ']': '['}
    
        for char in s:
            if char in mapping:
                top = stack.pop() if stack else '#'
                if mapping[char] != top:
                    return False
            else:
                stack.append(char)
    
        return not stack
    
    # Test
    print(is_valid_parentheses("()[]{}"))  # True
    print(is_valid_parentheses("(]"))      # False
    Python

    2. Longest Substring Without Repeating Characters

    def longest_substring_without_repeating(s):
        """
        Find length of longest substring without repeating characters
        Time: O(n), Space: O(min(n, 128))
        """
        n = len(s)
        max_length = 0
        char_index = {}
        left = 0
    
        for right in range(n):
            if s[right] in char_index:
                left = max(left, char_index[s[right]] + 1)
    
            char_index[s[right]] = right
            max_length = max(max_length, right - left + 1)
    
        return max_length
    
    # Test
    print(longest_substring_without_repeating("abcabcbb"))  # 3 ("abc")
    print(longest_substring_without_repeating("bbbbb"))     # 1 ("b")
    Python

    3. String to Integer (atoi)

    def my_atoi(s):
        """
        Convert string to integer
        Time: O(n), Space: O(1)
        """
        s = s.strip()
        if not s:
            return 0
    
        sign = 1
        if s[0] in ['-', '+']:
            sign = -1 if s[0] == '-' else 1
            s = s[1:]
    
        result = 0
        for char in s:
            if not char.isdigit():
                break
            result = result * 10 + int(char)
    
            # Handle overflow
            if sign * result > 2**31 - 1:
                return 2**31 - 1
            if sign * result < -2**31:
                return -2**31
    
        return sign * result
    
    # Test
    print(my_atoi("42"))        # 42
    print(my_atoi("   -42"))    # -42
    print(my_atoi("4193 with words"))  # 4193
    Python

    4. Group Shifted Strings

    def group_shifted_strings(strings):
        """
        Group strings that are shifted versions of each other
        Time: O(n*k) where n=len(strings), k=max string length
        """
        def get_shift_key(s):
            if not s:
                return ""
            # Use first character as reference
            shift = ord(s[0])
            key = []
            for char in s:
                # Normalize shift relative to first character
                diff = (ord(char) - shift) % 26
                key.append(str(diff))
            return ','.join(key)
    
        groups = {}
        for s in strings:
            key = get_shift_key(s)
            if key not in groups:
                groups[key] = []
            groups[key].append(s)
    
        return list(groups.values())
    
    # Test
    strings = ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"]
    print(group_shifted_strings(strings))
    # Output: [['abc', 'bcd', 'xyz'], ['acef'], ['az', 'ba'], ['a', 'z']]
    Python

    5. Minimum Window Substring

    def min_window(s, t):
        """
        Find minimum window in s that contains all characters in t
        Time: O(n), Space: O(k) where k is size of character set
        """
        if not t or not s:
            return ""
    
        # Count characters in t
        target_count = {}
        for char in t:
            target_count[char] = target_count.get(char, 0) + 1
    
        required = len(target_count)
        formed = 0
    
        # Window counts
        window_count = {}
        left = 0
        min_length = float('inf')
        min_window_start = 0
    
        for right in range(len(s)):
            char = s[right]
            window_count[char] = window_count.get(char, 0) + 1
    
            if char in target_count and window_count[char] == target_count[char]:
                formed += 1
    
            # Shrink window from left
            while left <= right and formed == required:
                char = s[left]
    
                # Update minimum window
                if right - left + 1 < min_length:
                    min_length = right - left + 1
                    min_window_start = left
    
                window_count[char] -= 1
                if char in target_count and window_count[char] < target_count[char]:
                    formed -= 1
    
                left += 1
    
        return "" if min_length == float('inf') else s[min_window_start:min_window_start + min_length]
    
    # Test
    s = "ADOBECODEBANC"
    t = "ABC"
    print(min_window(s, t))  # "BANC"
    Python

    Advanced String Algorithms

    Manacher’s Algorithm for Palindromes

    def manacher(s):
        """
        Find longest palindromic substring using Manacher's algorithm
        Time: O(n), Space: O(n)
        """
        # Transform string to handle even-length palindromes
        t = '^#' + '#'.join(s) + '#$'
        n = len(t)
        p = [0] * n  # p[i] = length of palindrome centered at i
        c = r = 0    # center and right boundary
    
        for i in range(1, n-1):
            i_mirror = 2*c - i  # mirror of i around c
    
            if r > i:
                p[i] = min(r - i, p[i_mirror])
    
            # Expand around center i
            while t[i + p[i] + 1] == t[i - p[i] - 1]:
                p[i] += 1
    
            # Update center and right boundary
            if i + p[i] > r:
                c = i
                r = i + p[i]
    
        # Find maximum palindrome
        max_len = max(p)
        center_index = p.index(max_len)
        start = (center_index - max_len) // 2
    
        return s[start:start + max_len]
    
    # Test
    print(manacher("babad"))  # "bab" or "aba"
    print(manacher("cbbd"))   # "bb"
    Python

    Z-Algorithm for Pattern Matching

    def z_algorithm(s):
        """
        Compute Z-array for string s
        Z[i] = length of longest substring starting at i that matches prefix
        Time: O(n), Space: O(n)
        """
        n = len(s)
        z = [0] * n
        l = r = 0  # [l, r] is the rightmost segment match
    
        for i in range(1, n):
            if i < r:
                # i is within current segment
                k = i - l
                if z[k] < r - i + 1:
                    z[i] = z[k]
                else:
                    z[i] = r - i + 1
                    while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                        z[i] += 1
            else:
                # i is outside current segment
                z[i] = 0
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
    
            # Update segment
            if i + z[i] - 1 > r:
                l = i
                r = i + z[i] - 1
    
        return z
    
    def search_pattern_z(s, pattern):
        """
        Search pattern in s using Z-algorithm
        """
        concat = pattern + '$' + s
        z = z_algorithm(concat)
        result = []
    
        for i in range(len(pattern) + 1, len(concat)):
            if z[i] == len(pattern):
                result.append(i - len(pattern) - 1)
    
        return result
    
    # Test
    s = "AABAACAADAABAABA"
    pattern = "AABA"
    print(search_pattern_z(s, pattern))  # [0, 9, 12]
    Python

    Aho-Corasick Algorithm

    class AhoCorasickNode:
        def __init__(self):
            self.children = {}
            self.fail = None
            self.output = []
    
    class AhoCorasick:
        def __init__(self, patterns):
            self.root = AhoCorasickNode()
            self.build_trie(patterns)
            self.build_failure_links()
    
        def build_trie(self, patterns):
            for pattern in patterns:
                node = self.root
                for char in pattern:
                    if char not in node.children:
                        node.children[char] = AhoCorasickNode()
                    node = node.children[char]
                node.output.append(pattern)
    
        def build_failure_links(self):
            from collections import deque
    
            queue = deque()
            for node in self.root.children.values():
                node.fail = self.root
                queue.append(node)
    
            while queue:
                current = queue.popleft()
    
                for char, child in current.children.items():
                    queue.append(child)
    
                    # Find failure link
                    fail = current.fail
                    while fail != self.root and char not in fail.children:
                        fail = fail.fail
                    if char in fail.children:
                        fail = fail.children[char]
                    else:
                        fail = self.root
    
                    child.fail = fail
                    child.output.extend(fail.output)
    
        def search(self, text):
            """Find all pattern occurrences in text"""
            node = self.root
            results = []
    
            for i, char in enumerate(text):
                while node != self.root and char not in node.children:
                    node = node.fail
                if char in node.children:
                    node = node.children[char]
                else:
                    node = self.root
    
                if node.output:
                    for pattern in node.output:
                        results.append((i - len(pattern) + 1, pattern))
    
            return results
    
    # Test
    patterns = ["he", "she", "his", "hers"]
    ac = AhoCorasick(patterns)
    text = "ushers"
    print(ac.search(text))  # [(1, 'he'), (1, 'she'), (4, 'he')]
    Python

    Time and Space Complexity

    String Algorithm Complexities

    AlgorithmTime ComplexitySpace ComplexityNotes
    Naive SearchO((n-m+1)×m)O(1)Simple, works for small inputs
    KMPO(n+m)O(m)Best for multiple searches
    Rabin-KarpO(n+m) avg, O((n-m+1)×m) worstO(1)Good for multiple patterns
    Boyer-MooreO(n/m) best, O(n×m) worstO(σ)Fast in practice
    Z-AlgorithmO(n)O(n)Linear time pattern matching
    Aho-CorasickO(n+m)O(m×σ)Multiple pattern search
    Trie OperationsO(m)O(m×σ)m = key length, σ = alphabet size
    Suffix ArrayO(n log n)O(n)For advanced string problems
    ManacherO(n)O(n)Fast palindrome finding

    Python String Operation Complexities

    OperationTime ComplexityNotes
    len(s)O(1)Pre-computed
    s[i]O(1)Direct access
    s[i:j]O(j-i)Creates new string
    s + tO(n+m)Creates new string
    s.find(t)O(n×m)Naive search
    s.replace(a,b)O(n)Creates new string
    s.split()O(n)Creates list
    ”.join(list)O(n)Efficient concatenation

    Best Practices and Tips

    1. String Immutability Awareness

    # Bad: Repeated concatenation in loop
    result = ""
    for i in range(1000):
        result += str(i)  # Creates new string each time
    
    # Good: Use list and join
    parts = []
    for i in range(1000):
        parts.append(str(i))
    result = ''.join(parts)  # Single string creation
    Python

    2. Efficient String Building

    # Use io.StringIO for large string building
    import io
    
    output = io.StringIO()
    for line in large_data:
        output.write(line)
        output.write('\n')
    result = output.getvalue()
    Python

    3. Character Set Considerations

    # For ASCII-only strings
    char_count = [0] * 128
    
    # For Unicode strings
    from collections import Counter
    char_count = Counter(s)
    Python

    4. Memory-Efficient String Processing

    # Process large files line by line
    def process_large_file(filename):
        with open(filename, 'r') as f:
            for line in f:
                # Process one line at a time
                process_line(line.strip())
    Python

    5. Regular Expression Optimization

    import re
    
    # Compile regex for reuse
    email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
    
    def is_valid_email(email):
        return bool(email_pattern.match(email))
    Python

    6. String Comparison Best Practices

    # Use == for content comparison
    s1 = "hello"
    s2 = "hello"
    print(s1 == s2)  # True
    
    # Use 'is' for identity comparison (rarely needed for strings)
    print(s1 is s2)  # May be True due to interning
    
    # For case-insensitive comparison
    print(s1.lower() == s2.lower())
    
    # For lexicographical comparison
    print(s1 < s2)  # Alphabetical order
    Python

    7. Encoding and Unicode Handling

    # Handle different encodings
    text = "Hello 世界"
    utf8_bytes = text.encode('utf-8')
    decoded = utf8_bytes.decode('utf-8')
    
    # Normalize Unicode
    import unicodedata
    normalized = unicodedata.normalize('NFC', text)
    Python

    Interview Tips

    1. Clarify Constraints: Ask about string length, character set, case sensitivity
    2. Edge Cases: Empty strings, single characters, all same characters
    3. Time vs Space: Discuss trade-offs between different approaches
    4. Built-in Functions: Know when to use Python’s string methods vs custom implementation
    5. Pattern Recognition: Identify common patterns (palindromes, anagrams, substrings)
    6. Optimization: Start with brute force, then optimize

    Common Pitfalls

    1. Index Errors: Always check bounds before accessing s[i]
    2. Off-by-One Errors: Careful with loop bounds in string algorithms
    3. Character Encoding: Handle Unicode properly
    4. Memory Usage: Be aware of string immutability costs
    5. Case Sensitivity: Consider case-insensitive requirements


    Discover more from Altgr Blog

    Subscribe to get the latest posts sent to your email.

    Leave a Reply

    Your email address will not be published. Required fields are marked *