Table of Contents
- Introduction to Regular Expressions
- Why Learn Regex?
- Common Use Cases
- Regex Learning Roadmap
- Getting Started with Python’s re Module
- Setting Up the Environment
- First Regex Example
- Raw Strings and Escape Characters
- Basic Regex Patterns
- Literal Character Matching
- Case Sensitivity Options
- Basic Pattern Exercises
- Character Classes and Special Characters
- Built-in Character Classes
- Custom Character Classes
- Character Class Combinations
- Quantifiers
- Basic Quantifiers
- Greedy vs Non-Greedy Matching
- Quantifier Practice Examples
- Groups and Capturing
- Basic Grouping
- Named Groups
- Non-Capturing Groups
- Backreferences
- Anchors and Boundaries
- Start and End Anchors
- Word Boundaries
- Multiline Mode
- Advanced Patterns
- Lookahead and Lookbehind Assertions
- Alternation and Conditional Matching
- Recursive Patterns
- Regex Methods in Python
- Core Methods Comparison
- Compiled Patterns
- Working with Match Objects
- Interactive Examples and Debugging
- Step-by-Step Pattern Building
- Debugging Tools and Techniques
- Common Error Messages
- Real-World Applications
- Data Validation Patterns
- Text Processing and Extraction
- Log Analysis and Parsing
- Web Scraping Applications
- Performance and Optimization
- Pattern Compilation Best Practices
- Avoiding Catastrophic Backtracking
- Memory and Speed Optimization
- Common Pitfalls and Best Practices
- Frequent Mistakes
- Testing Strategies
- Code Organization
- Comprehensive Regex Cheat Sheet
- Quick Reference Patterns
- Method Comparison Table
- Flag Options
- Practice Exercises
- Beginner Exercises
- Intermediate Challenges
- Advanced Problems
1. Introduction to Regular Expressions
Regular expressions (regex) are powerful pattern-matching tools used to find, match, and manipulate text. They provide a concise way to describe complex string patterns.
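As a first taste, here is a minimal sketch using only the standard library (the date pattern is illustrative, chosen to show how one short pattern replaces a hand-written scanning loop):

```python
import re

# A pattern describing ISO-style dates such as 2024-01-15
pattern = r"\d{4}-\d{2}-\d{2}"
text = "The release shipped on 2024-01-15 and was patched on 2024-02-03."

print(re.findall(pattern, text))  # ['2024-01-15', '2024-02-03']
```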
graph LR
A[Text Input] --> B[Regex Pattern]
B --> C{Match?}
C -->|Yes| D[Extract/Replace/Validate]
C -->|No| E[No Action]
Why Learn Regex?
- Text Processing: Extract specific information from large text files
- Data Validation: Validate email addresses, phone numbers, etc.
- Data Cleaning: Remove unwanted characters or format data
- Log Analysis: Parse and analyze log files
- Web Scraping: Extract specific data from HTML
graph TB
A[Regular Expressions] --> B[Pattern Matching]
A --> C[Text Manipulation]
A --> D[Data Validation]
B --> B1[Find specific text]
B --> B2[Extract information]
B --> B3[Search & Replace]
C --> C1[Clean data]
C --> C2[Transform format]
C --> C3[Split strings]
D --> D1[Email validation]
D --> D2[Phone numbers]
D --> D3[Input sanitization]
Common Use Cases
# Example scenarios where regex excels
examples = {
"Email extraction": "Extract all emails from a text file",
"Phone formatting": "Convert (555) 123-4567 to 555-123-4567",
"Data cleaning": "Remove extra whitespace and special characters",
"Log parsing": "Extract timestamps and error codes from logs",
"URL validation": "Check if a string is a valid web address"
}
Regex Learning Roadmap
graph TD
A[Start Here] --> B[Basic Patterns]
B --> C[Character Classes]
C --> D[Quantifiers]
D --> E[Groups & Capturing]
E --> F[Anchors & Boundaries]
F --> G[Advanced Features]
G --> H[Real-world Applications]
H --> I[Performance & Optimization]
style A fill:#e1f5fe
style I fill:#c8e6c9
2. Getting Started with Python’s re Module
Python’s built-in re module provides regex functionality.
import re
# Basic pattern matching
pattern = r"hello"
text = "hello world"
match = re.search(pattern, text)
print(match.group() if match else "No match") # Output: hello
Raw Strings
Always use raw strings (r"") for regex patterns to avoid escaping issues:
# Good practice
pattern = r"\d+\.\d+" # Matches decimal numbers
# Avoid this
pattern = "\\d+\\.\\d+" # Same pattern but harder to read
flowchart TD
A[Regex Pattern] --> B{Raw String?}
B -->|Yes r""| C[Clean, Readable Pattern]
B -->|No ""| D[Escaped Characters \\\\]
C --> E[Easy to Debug]
D --> F[Hard to Read/Maintain]
3. Basic Regex Patterns
Literal Characters
Match exact characters:
import re
pattern = r"cat"
text = "The cat sat on the mat"
matches = re.findall(pattern, text)
print(matches) # Output: ['cat']
Case Sensitivity
# Case sensitive (default)
pattern = r"Cat"
text = "cat and Cat"
matches = re.findall(pattern, text)
print(matches) # Output: ['Cat']
# Case insensitive
pattern = r"Cat"
text = "cat and Cat"
matches = re.findall(pattern, text, re.IGNORECASE)
print(matches) # Output: ['cat', 'Cat']
graph TB
A["Input Text: 'cat and Cat'"] --> B["Pattern: 'Cat'"]
B --> C{Case Sensitive?}
C -->|Yes| D["Matches: ['Cat']"]
C -->|No re.IGNORECASE| E["Matches: ['cat', 'Cat']"]
4. Character Classes and Special Characters
Basic Character Classes
import re
# \d - digits (0-9)
pattern = r"\d+"
text = "I have 25 apples and 10 oranges"
matches = re.findall(pattern, text)
print(matches) # Output: ['25', '10']
# \w - word characters (letters, digits, underscore)
pattern = r"\w+"
text = "hello_world 123!"
matches = re.findall(pattern, text)
print(matches) # Output: ['hello_world', '123']
# \s - whitespace characters
pattern = r"\s+"
text = "hello world\t\n"
matches = re.findall(pattern, text)
print(matches) # Output: [' ', '\t\n']
Character Class Summary
| Pattern | Description | Example Match |
|---|---|---|
| \d | Digit (0-9) | 5, 42 |
| \D | Non-digit | a, @ |
| \w | Word character | a, Z, _, 5 |
| \W | Non-word character | @, !, (space) |
| \s | Whitespace | (space), \t, \n |
| \S | Non-whitespace | a, 1, @ |
| . | Any character (except newline) | a, 1, @ |
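Custom classes can combine ranges, literals, and repetition in one bracket expression. A small sketch (the hex-color and identifier patterns below are illustrative, not taken from the text above):

```python
import re

# Ranges and literals combined: hex color codes like #ff6600
hex_pattern = r"#[0-9a-fA-F]{6}\b"
text = "Palette: #ff6600, #00AAff and the invalid #12345z"
print(re.findall(hex_pattern, text))  # ['#ff6600', '#00AAff']

# Mixing classes: identifiers start with a letter, then any word characters
ident_pattern = r"\b[a-zA-Z]\w*\b"
print(re.findall(ident_pattern, "x1 = foo_bar + 42"))  # ['x1', 'foo_bar']
```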
graph TD
A[Character Classes] --> B["\d Digits"]
A --> C["\w Word Chars"]
A --> D["\s Whitespace"]
A --> E[". Any Char"]
A --> F[Custom Classes]
B --> B1["0-9 (Numbers)"]
C --> C1["a-z, A-Z, 0-9, _ (Alphanumeric + underscore)"]
D --> D1["Space, Tab, Newline"]
E --> E1["Everything except newline"]
F --> F1["[abc] - specific chars"]
F --> F2["[a-z] - ranges"]
F --> F3["[^abc] - negation"]
style A fill:#e3f2fd
style B fill:#fff3e0
style C fill:#f3e5f5
style D fill:#e8f5e8
style E fill:#ffebee
style F fill:#fce4ec
Custom Character Classes
# [abc] - matches a, b, or c
pattern = r"[aeiou]"
text = "hello world"
matches = re.findall(pattern, text)
print(matches) # Output: ['e', 'o', 'o']
# [a-z] - matches lowercase letters
pattern = r"[a-z]+"
text = "Hello World 123"
matches = re.findall(pattern, text)
print(matches) # Output: ['ello', 'orld']
# [^abc] - matches anything except a, b, or c
pattern = r"[^aeiou]+"
text = "hello world"
matches = re.findall(pattern, text)
print(matches) # Output: ['h', 'll', ' w', 'rld']
5. Quantifiers
Quantifiers specify how many times a pattern should match.
import re
# * - zero or more
pattern = r"ab*c"
texts = ["ac", "abc", "abbc", "abbbc"]
for text in texts:
match = re.search(pattern, text)
print(f"{text}: {bool(match)}")
# Output: ac: True, abc: True, abbc: True, abbbc: True
# + - one or more
pattern = r"ab+c"
texts = ["ac", "abc", "abbc"]
for text in texts:
match = re.search(pattern, text)
print(f"{text}: {bool(match)}")
# Output: ac: False, abc: True, abbc: True
# ? - zero or one
pattern = r"colou?r"
texts = ["color", "colour"]
for text in texts:
match = re.search(pattern, text)
print(f"{text}: {bool(match)}")
# Output: color: True, colour: True
# {n} - exactly n times
pattern = r"\d{3}"
text = "Call 123-456-7890"
matches = re.findall(pattern, text)
print(matches) # Output: ['123', '456', '789']
# {n,m} - between n and m times
pattern = r"\d{2,4}"
text = "1 22 333 4444 55555"
matches = re.findall(pattern, text)
print(matches) # Output: ['22', '333', '4444', '5555']
Quantifier Summary
| Quantifier | Description | Example |
|---|---|---|
| * | 0 or more | ab* matches a, ab, abb |
| + | 1 or more | ab+ matches ab, abb (not a) |
| ? | 0 or 1 | ab? matches a, ab |
| {n} | Exactly n | a{3} matches aaa |
| {n,} | n or more | a{2,} matches aa, aaa, aaaa |
| {n,m} | Between n and m | a{2,4} matches aa, aaa, aaaa |
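One way to check the rows above is re.fullmatch, which succeeds only when the entire string fits the pattern:

```python
import re

# re.fullmatch returns a match object only if the whole string matches
print(bool(re.fullmatch(r"ab*", "a")))        # True  (* allows zero b's)
print(bool(re.fullmatch(r"ab+", "a")))        # False (+ needs at least one)
print(bool(re.fullmatch(r"a{2,}", "aaaa")))   # True  (two or more)
print(bool(re.fullmatch(r"a{2,4}", "aaaaa"))) # False (five is out of range)
```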
graph LR
A[Quantifiers] --> B["Greedy: *, +, ?"]
A --> C["Exact: {n}"]
A --> D["Range: {n,m}"]
B --> B1[Match as much as possible]
C --> C1[Match exact count]
D --> D1[Match within range]
Greedy vs Non-Greedy
import re
text = "<tag>content</tag>"
# Greedy matching (default)
pattern = r"<.*>"
match = re.search(pattern, text)
print(match.group()) # Output: <tag>content</tag>
# Non-greedy matching
pattern = r"<.*?>"
matches = re.findall(pattern, text)
print(matches) # Output: ['<tag>', '</tag>']
6. Groups and Capturing
Groups allow you to capture parts of a match and apply quantifiers to multiple characters.
import re
# Basic grouping
pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "Call me at 123-456-7890"
match = re.search(pattern, text)
if match:
print(f"Full match: {match.group(0)}") # 123-456-7890
print(f"Area code: {match.group(1)}") # 123
print(f"Exchange: {match.group(2)}") # 456
print(f"Number: {match.group(3)}") # 7890
print(f"All groups: {match.groups()}") # ('123', '456', '7890')
Named Groups
pattern = r"(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<number>\d{4})"
text = "Call me at 123-456-7890"
match = re.search(pattern, text)
if match:
print(f"Area: {match.group('area')}") # 123
print(f"Exchange: {match.group('exchange')}")# 456
print(f"Number: {match.group('number')}") # 7890
print(f"Dict: {match.groupdict()}") # {'area': '123', 'exchange': '456', 'number': '7890'}
Non-Capturing Groups
# (?:...) - non-capturing group
pattern = r"(?:Mr|Mrs|Ms)\. (\w+)"
text = "Hello Mr. Smith and Mrs. Johnson"
matches = re.findall(pattern, text)
print(matches) # Output: ['Smith', 'Johnson']
graph TD
A[Groups in Regex] --> B["Capturing Groups <br> (...)"]
A --> C["Named Groups <br> (?P<name>...)"]
A --> D["Non-Capturing Groups <br> (?:...)"]
B --> B1[Accessible by number]
C --> C1[Accessible by name]
D --> D1[Not captured, just grouped]
7. Anchors and Boundaries
Anchors specify where in the string a match should occur.
import re
text = "The cat in the hat"
# ^ - start of string
pattern = r"^The"
match = re.search(pattern, text)
print(bool(match)) # True
pattern = r"^cat"
match = re.search(pattern, text)
print(bool(match)) # False
# $ - end of string
pattern = r"hat$"
match = re.search(pattern, text)
print(bool(match)) # True
# \b - word boundary
pattern = r"\bcat\b"
texts = ["cat", "catch", "scattered", "the cat runs"]
for t in texts:
match = re.search(pattern, t)
print(f"'{t}': {bool(match)}")
# Output: 'cat': True, 'catch': False, 'scattered': False, 'the cat runs': True
Anchor Summary
| Anchor | Description | Example |
|---|---|---|
| ^ | Start of string | ^Hello matches “Hello world” |
| $ | End of string | world$ matches “Hello world” |
| \b | Word boundary | \bcat\b matches “cat” but not “catch” |
| \B | Non-word boundary | \Bcat\B matches the “cat” inside “scattered” |
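The Multiline Mode listed in the table of contents changes what ^ and $ anchor to. A minimal sketch:

```python
import re

text = "cat\ndog\ncat"

# Without re.MULTILINE, ^ and $ anchor to the whole string
print(re.findall(r"^cat$", text))                # []
# With re.MULTILINE, they anchor to the start and end of each line
print(re.findall(r"^cat$", text, re.MULTILINE))  # ['cat', 'cat']
```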
graph LR
A["Text: 'The cat in the hat'"] --> B[Anchors]
B --> C["^ Start"]
B --> D["$ End"]
B --> E["\b Word Boundary"]
C --> C1["^The ✓"]
D --> D1["hat$ ✓"]
E --> E1["\bcat\b ✓"]
8. Advanced Patterns
Lookahead and Lookbehind
import re
# Positive lookahead (?=...)
pattern = r"\d+(?= dollars)"
text = "I have 50 dollars and 25 cents"
matches = re.findall(pattern, text)
print(matches) # Output: ['50']
# Negative lookahead (?!...)
pattern = r"\d+(?! dollars)"
text = "I have 50 dollars and 25 cents"
matches = re.findall(pattern, text)
print(matches) # Output: ['5', '25'] (backtracking lets '5' match even though '50' as a whole is rejected)
# Positive lookbehind (?<=...)
pattern = r"(?<=\$)\d+"
text = "Price: $25.99"
matches = re.findall(pattern, text)
print(matches) # Output: ['25']
# Negative lookbehind (?<!...)
pattern = r"(?<!\$)\d+"
text = "Price: $25.99 and 10 items"
matches = re.findall(pattern, text)
print(matches) # Output: ['5', '99', '10']
Alternation
# | - OR operator
pattern = r"cat|dog|bird"
text = "I have a cat and a dog"
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog']
# Grouped alternation
pattern = r"(Mr|Mrs|Ms)\. (\w+)"
text = "Mr. Smith and Ms. Johnson"
matches = re.findall(pattern, text)
print(matches) # Output: [('Mr', 'Smith'), ('Ms', 'Johnson')]
graph TD
A[Advanced Patterns] --> B["Lookahead (?=...)"]
A --> C["Lookbehind (?<=...)"]
A --> D["Alternation |"]
B --> B1["Match if followed by..."]
C --> C1["Match if preceded by..."]
D --> D1[Match this OR that]
9. Regex Methods in Python
Essential re Module Functions
import re
text = "The price is $25.99 and $15.50"
pattern = r"\$(\d+\.\d+)"
# re.search() - finds first match
match = re.search(pattern, text)
if match:
print(f"First price: {match.group(1)}") # 25.99
# re.findall() - finds all matches
matches = re.findall(pattern, text)
print(f"All prices: {matches}") # ['25.99', '15.50']
# re.finditer() - returns match objects
for match in re.finditer(pattern, text):
print(f"Price: ${match.group(1)} at position {match.start()}-{match.end()}")
# re.sub() - substitute matches
new_text = re.sub(pattern, r"$XXX.XX", text)
print(new_text) # The price is $XXX.XX and $XXX.XX
# re.split() - split by pattern
text = "apple,banana;orange:grape"
fruits = re.split(r"[,;:]", text)
print(fruits) # ['apple', 'banana', 'orange', 'grape']
Compiled Patterns
# Compile pattern for reuse (more efficient)
pattern = re.compile(r"\$(\d+\.\d+)")
text = "Price: $25.99"
match = pattern.search(text)
matches = pattern.findall(text)
new_text = pattern.sub("$XX.XX", text)
flowchart TD
A[re Module Methods] --> B[re.search]
A --> C[re.findall]
A --> D[re.finditer]
A --> E[re.sub]
A --> F[re.split]
A --> G[re.compile]
B --> B1[First match object]
C --> C1[List of all matches]
D --> D1[Iterator of match objects]
E --> E1[Replace matches]
F --> F1[Split by pattern]
G --> G1[Compiled pattern object]
10. Interactive Examples and Debugging
Step-by-Step Pattern Building
Let’s build a complex email validation pattern step by step:
import re
# Step 1: Start simple - match any characters before @
pattern1 = r".+@"
test_email = "user@example.com"
print(f"Step 1: {bool(re.search(pattern1, test_email))}") # True
# Step 2: Be more specific about allowed characters before @
pattern2 = r"[a-zA-Z0-9._%+-]+@"
print(f"Step 2: {bool(re.search(pattern2, test_email))}") # True
# Step 3: Add domain part
pattern3 = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+"
print(f"Step 3: {bool(re.search(pattern3, test_email))}") # True
# Step 4: Ensure domain has extension
pattern4 = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+"
print(f"Step 4: {bool(re.search(pattern4, test_email))}") # True
# Step 5: Add anchors for exact matching
pattern5 = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
print(f"Step 5 (final): {bool(re.match(pattern5, test_email))}") # True
graph TD
A["Start: .+@"] --> B["Refine: [a-zA-Z0-9._%+-]+@"]
B --> C["Add domain: +@[a-zA-Z0-9.-]+"]
C --> D["Add extension: +\.[a-zA-Z]+"]
D --> E["Add anchors: ^...$"]
E --> F["Final Pattern"]
style A fill:#ffebee
style F fill:#e8f5e8
Interactive Pattern Tester
def test_pattern_interactively():
"""Interactive pattern testing function"""
def test_regex(pattern, test_strings, description=""):
"""Test a regex pattern against multiple strings"""
print(f"\n{'='*50}")
print(f"Testing: {description}")
print(f"Pattern: {pattern}")
print(f"{'='*50}")
compiled_pattern = re.compile(pattern)
for test_string in test_strings:
match = compiled_pattern.search(test_string)
if match:
print(f"✓ '{test_string}' -> Match: '{match.group()}'")
if match.groups():
print(f" Groups: {match.groups()}")
else:
print(f"✗ '{test_string}' -> No match")
# Example usage
phone_pattern = r"(\(?\d{3}\)?[-.\s]?)(\d{3}[-.\s]?\d{4})"
phone_tests = [
"123-456-7890",
"(555) 123-4567",
"555.123.4567",
"1234567890",
"invalid-phone"
]
test_regex(phone_pattern, phone_tests, "Phone Number Validation")
# Run the interactive tester
test_pattern_interactively()
Debugging Tools and Techniques
def debug_regex_step_by_step(pattern, text):
"""Debug a regex pattern by showing each step"""
print(f"Debugging pattern: {pattern}")
print(f"Against text: '{text}'")
print("-" * 50)
try:
compiled_pattern = re.compile(pattern)
match = compiled_pattern.search(text)
if match:
print(f"✓ Match found!")
print(f" Full match: '{match.group()}'")
print(f" Start position: {match.start()}")
print(f" End position: {match.end()}")
print(f" Span: {match.span()}")
if match.groups():
print(" Captured groups:")
for i, group in enumerate(match.groups(), 1):
print(f" Group {i}: '{group}'")
if hasattr(match, 'groupdict') and match.groupdict():
print(" Named groups:")
for name, value in match.groupdict().items():
print(f" {name}: '{value}'")
else:
print("✗ No match found")
# Show all matches if there are multiple
all_matches = compiled_pattern.findall(text)
if len(all_matches) > 1:
print(f"\nAll matches found: {all_matches}")
except re.error as e:
print(f"❌ Regex error: {e}")
return False
return True
# Example debugging session
debug_regex_step_by_step(
r"(\w+)@(\w+)\.(\w+)",
"Contact us at john@example.com or mary@test.org"
)
Common Error Messages and Solutions
def demonstrate_common_errors():
"""Show common regex errors and their solutions"""
common_errors = [
{
"error": "Unbalanced parenthesis",
"bad_pattern": r"(\d+",
"good_pattern": r"(\d+)",
"description": "Always close parentheses"
},
{
"error": "Invalid escape sequence",
"bad_pattern": "\\d+\\s+\\w+", # Python string
"good_pattern": r"\d+\s+\w+", # Raw string
"description": "Use raw strings to avoid double escaping"
},
{
"error": "Nothing to repeat",
"bad_pattern": r"+\d+",
"good_pattern": r"\d+",
"description": "Quantifiers need something to quantify"
}
]
for error_info in common_errors:
print(f"\n{error_info['error']}:")
print(f"❌ Bad: {error_info['bad_pattern']}")
print(f"✅ Good: {error_info['good_pattern']}")
print(f"💡 Tip: {error_info['description']}")
demonstrate_common_errors()
graph TD
A[Regex Debugging] --> B[Step-by-step Testing]
A --> C[Interactive Pattern Building]
A --> D[Error Analysis]
B --> B1[Print intermediate results]
B --> B2[Test with multiple inputs]
B --> B3[Visualize matches]
C --> C1[Start simple]
C --> C2[Add complexity gradually]
C --> C3[Test each iteration]
D --> D1[Common syntax errors]
D --> D2[Logic errors]
D --> D3[Performance issues]
11. Real-World Applications
Email Validation
import re
from typing import Any, Dict
def validate_email(email: str) -> Dict[str, Any]:
"""
Comprehensive email validation with detailed feedback
Args:
email: Email address to validate
Returns:
Dictionary with validation result and details
"""
if not isinstance(email, str):
return {"valid": False, "error": "Input must be a string"}
if not email:
return {"valid": False, "error": "Email cannot be empty"}
# Basic pattern for email validation
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
try:
is_valid = bool(re.match(pattern, email))
if is_valid:
# Extract parts for additional validation
local_part, domain = email.split('@')
domain_parts = domain.split('.')
return {
"valid": True,
"email": email,
"local_part": local_part,
"domain": domain,
"tld": domain_parts[-1],
"length": len(email)
}
else:
# Provide specific error feedback
errors = []
if '@' not in email:
errors.append("Missing @ symbol")
elif email.count('@') > 1:
errors.append("Multiple @ symbols")
elif email.startswith('@'):
errors.append("Cannot start with @")
elif email.endswith('@'):
errors.append("Cannot end with @")
elif '.' not in email.split('@')[-1]:
errors.append("Domain must contain a dot")
else:
errors.append("Invalid format")
return {"valid": False, "error": "; ".join(errors)}
except Exception as e:
return {"valid": False, "error": f"Validation error: {str(e)}"}
# Test cases with comprehensive feedback
test_emails = [
"user@example.com", # Valid
"test.email+tag@domain.org", # Valid with plus
"user.name@sub.domain.com", # Valid with subdomain
"invalid.email", # Invalid - no @
"@domain.com", # Invalid - starts with @
"user@", # Invalid - ends with @
"user@@domain.com", # Invalid - double @
"user@domain", # Invalid - no TLD
"", # Invalid - empty
123, # Invalid - not string
]
print("Email Validation Results:")
print("=" * 50)
for email in test_emails:
result = validate_email(email)
status = "✓" if result["valid"] else "✗"
if result["valid"]:
print(f"{status} {email}")
print(f" Local: {result['local_part']}, Domain: {result['domain']}")
else:
print(f"{status} {email} - {result['error']}")
Phone Number Extraction
def extract_phone_numbers(text):
# Matches various phone number formats
pattern = r"(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})"
matches = re.finditer(pattern, text)
phones = []
for match in matches:
phone = f"({match.group(2)}) {match.group(3)}-{match.group(4)}"
phones.append(phone)
return phones
text = """
Contact us at 123-456-7890 or (555) 123-4567.
You can also reach us at +1 (800) 555-0123.
"""
phones = extract_phone_numbers(text)
for phone in phones:
print(phone)
Log File Analysis
def parse_log_entry(log_line):
# Common log format: IP - - [timestamp] "method url protocol" status size
pattern = r'(\d+\.\d+\.\d+\.\d+).*?\[(.*?)\]\s+"(\w+)\s+(.*?)\s+.*?"\s+(\d+)\s+(\d+)'
match = re.search(pattern, log_line)
if match:
return {
'ip': match.group(1),
'timestamp': match.group(2),
'method': match.group(3),
'url': match.group(4),
'status': int(match.group(5)),
'size': int(match.group(6))
}
return None
log = '192.168.1.1 - - [01/Jan/2024:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234'
parsed = parse_log_entry(log)
print(parsed)
URL Extraction
def extract_urls(text):
pattern = r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w)*)?)?'
return re.findall(pattern, text)
text = """
Visit our website at https://example.com or check out
the documentation at https://docs.example.com/guide?lang=en#overview
"""
urls = extract_urls(text)
for url in urls:
print(url)
Web Scraping Applications
import re
def scrape_product_info(html_content):
"""Extract product information from e-commerce HTML"""
patterns = {
'title': r'<h1[^>]*class="[^"]*product-title[^"]*"[^>]*>(.*?)</h1>',
'price': r'<span[^>]*class="[^"]*price[^"]*"[^>]*>\$?([\d,]+\.?\d*)</span>',
'rating': r'<div[^>]*class="[^"]*rating[^"]*"[^>]*>.*?(\d+\.?\d*)\s*out of',
'availability': r'<span[^>]*class="[^"]*stock[^"]*"[^>]*>(In Stock|Out of Stock)</span>'
}
product_info = {}
for key, pattern in patterns.items():
match = re.search(pattern, html_content, re.IGNORECASE | re.DOTALL)
if match:
product_info[key] = match.group(1).strip()
else:
product_info[key] = "Not found"
return product_info
# Example usage
sample_html = """
<div class="product-container">
<h1 class="product-title">Wireless Bluetooth Headphones</h1>
<span class="price-current">$79.99</span>
<div class="rating-section">4.5 out of 5 stars</div>
<span class="stock-status">In Stock</span>
</div>
"""
product = scrape_product_info(sample_html)
print(product)
Data Analysis and Processing
import re
from collections import defaultdict
def analyze_text_data(text_data):
"""Comprehensive text analysis using regex"""
analysis = {
'word_count': len(re.findall(r'\b\w+\b', text_data)),
'sentence_count': len(re.findall(r'[.!?]+', text_data)),
'paragraph_count': len(re.findall(r'\n\s*\n', text_data)) + 1,
'emails': re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text_data),
'phone_numbers': re.findall(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}', text_data),
'urls': re.findall(r'https?://[^\s<>"\']+', text_data),
'mentions': re.findall(r'@\w+', text_data),
'hashtags': re.findall(r'#\w+', text_data),
'numbers': re.findall(r'\b\d+\.?\d*\b', text_data),
'dates': re.findall(r'\b\d{1,2}/\d{1,2}/\d{4}\b', text_data)
}
# Word frequency analysis
words = re.findall(r'\b\w+\b', text_data.lower())
word_freq = defaultdict(int)
for word in words:
if len(word) > 3: # Only count words longer than 3 characters
word_freq[word] += 1
analysis['top_words'] = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)[:10]
return analysis
# Example usage
sample_text = """
Contact our team at support@example.com or call us at (555) 123-4567.
Visit our website at https://example.com for more information.
Follow us @example_company and use #ExampleProduct in your posts.
Our sales increased by 25.5% on 12/15/2023. We now have 1000+ customers!
"""
analysis_result = analyze_text_data(sample_text)
for key, value in analysis_result.items():
print(f"{key}: {value}")
File Processing Applications
import re
from collections import defaultdict
class LogAnalyzer:
"""Comprehensive log file analyzer using regex patterns"""
def __init__(self):
self.patterns = {
'apache_access': r'(\d+\.\d+\.\d+\.\d+).*?\[(.*?)\].*?"(\w+)\s+([^"]+).*?"\s+(\d+)\s+(\d+)',
'error_log': r'\[(.*?)\]\s*\[(\w+)\]\s*(.*)',
'application_log': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(.*)',
'nginx_access': r'(\d+\.\d+\.\d+\.\d+).*?\[(.*?)\].*?"([^"]+)"\s+(\d+)\s+(\d+)',
}
def parse_apache_access_log(self, log_line):
"""Parse Apache access log format"""
match = re.search(self.patterns['apache_access'], log_line)
if match:
return {
'ip': match.group(1),
'timestamp': match.group(2),
'method': match.group(3),
'url': match.group(4),
'status': int(match.group(5)),
'size': int(match.group(6))
}
return None
def analyze_log_file(self, file_path, log_type='apache_access'):
"""Analyze entire log file and generate statistics"""
stats = {
'total_requests': 0,
'status_codes': defaultdict(int),
'ips': defaultdict(int),
'methods': defaultdict(int),
'urls': defaultdict(int),
'errors': []
}
try:
with open(file_path, 'r') as f:
for line_num, line in enumerate(f, 1):
parsed = self.parse_apache_access_log(line.strip())
if parsed:
stats['total_requests'] += 1
stats['status_codes'][parsed['status']] += 1
stats['ips'][parsed['ip']] += 1
stats['methods'][parsed['method']] += 1
stats['urls'][parsed['url']] += 1
if parsed['status'] >= 400:
stats['errors'].append({
'line': line_num,
'status': parsed['status'],
'url': parsed['url'],
'ip': parsed['ip']
})
except FileNotFoundError:
print(f"File {file_path} not found")
return None
# Convert to regular dicts and get top entries
for key in ['status_codes', 'ips', 'methods', 'urls']:
stats[key] = dict(sorted(stats[key].items(), key=lambda x: x[1], reverse=True)[:10])
return stats
# Text cleaning and normalization
def clean_and_normalize_text(text):
"""Clean and normalize text data using regex"""
# Remove HTML tags
text = re.sub(r'<[^>]+>', '', text)
# Remove extra whitespace
text = re.sub(r'\s+', ' ', text)
# Remove special characters but keep basic punctuation
text = re.sub(r'[^\w\s\.\,\!\?\-]', '', text)
# Normalize multiple punctuation marks
text = re.sub(r'[\.]{2,}', '...', text)
text = re.sub(r'[!]{2,}', '!', text)
text = re.sub(r'[?]{2,}', '?', text)
# Fix spacing around punctuation
text = re.sub(r'\s*([,.!?])\s*', r'\1 ', text)
# Remove leading/trailing whitespace
text = text.strip()
return text
# CSV data extraction from text
def extract_csv_like_data(text):
"""Extract structured data that looks like CSV from text"""
# Pattern for CSV-like data (comma-separated values)
csv_pattern = r'^[^,\n]+(?:,[^,\n]+)+$'
lines = text.split('\n')
csv_lines = []
for line in lines:
if re.match(csv_pattern, line.strip()):
csv_lines.append(line.strip())
return csv_lines
# Example usage
sample_dirty_text = """
<html><body>This is some messy text!!!
It has <b>HTML tags</b> and irregular spacing...
Contact: user@example.com, phone: 555-123-4567.
Data: John,25,Engineer
Jane,30,Designer
Bob,28,Developer
</body></html>
"""
cleaned_text = clean_and_normalize_text(sample_dirty_text)
print("Cleaned text:", cleaned_text)
csv_data = extract_csv_like_data(sample_dirty_text)
print("Extracted CSV data:", csv_data)
graph TD
A[Real-World Applications] --> B[Email Validation]
A --> C[Phone Numbers]
A --> D[Log Parsing]
A --> E[URL Extraction]
A --> F[Data Cleaning]
B --> B1[Format Verification]
C --> C1[Various Formats]
D --> D1[Structure Extraction]
E --> E1[Link Discovery]
F --> F1[Text Normalization]
12. Performance and Optimization
Compiling Patterns
import re
import time
# Inefficient - recompiles pattern each time
def slow_search(texts):
pattern = r"\b\w+@\w+\.\w+\b"
results = []
for text in texts:
matches = re.findall(pattern, text)
results.extend(matches)
return results
# Efficient - compile once, use many times
def fast_search(texts):
pattern = re.compile(r"\b\w+@\w+\.\w+\b")
results = []
for text in texts:
matches = pattern.findall(text)
results.extend(matches)
return results
# Test with large dataset
texts = ["Contact user@example.com for info"] * 10000
# Time the approaches
start = time.time()
slow_results = slow_search(texts)
slow_time = time.time() - start
start = time.time()
fast_results = fast_search(texts)
fast_time = time.time() - start
print(f"Slow approach: {slow_time:.4f} seconds")
print(f"Fast approach: {fast_time:.4f} seconds")
print(f"Speed improvement: {slow_time / fast_time:.2f}x")
Optimizing Patterns
# Inefficient - nested quantifiers cause catastrophic backtracking
bad_pattern = r"(a+)+b"
# Efficient - the simplified pattern matches the same strings without backtracking
# (Python 3.11+ also supports atomic groups such as (?>a+)b)
good_pattern = r"a+b"
# Use specific character classes instead of .
# Bad
pattern = r".*@.*\..*"
# Good
pattern = r"[^@]+@[^.]+\.[a-zA-Z]+"
# Anchor patterns when possible
# Unanchored (slower)
pattern = r"\d{3}-\d{3}-\d{4}"
# Anchored (faster)
pattern = r"^\d{3}-\d{3}-\d{4}$"
Performance Tips
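The cost of a nested quantifier can be measured directly. A rough, machine-dependent sketch (the 18-character input is deliberately short, since every extra character roughly doubles the backtracking work):

```python
import re
import time

# A near-miss input: ends in 'c', so "(a+)+b" can never match,
# forcing the engine to try every way of splitting the run of a's
subject = "a" * 18 + "c"

start = time.perf_counter()
re.search(r"(a+)+b", subject)   # backtracks through ~2^18 splits
slow = time.perf_counter() - start

start = time.perf_counter()
re.search(r"a+b", subject)      # fails after a single linear scan
fast = time.perf_counter() - start

print(f"nested: {slow:.4f}s, flat: {fast:.6f}s")
```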
graph TB
A["Regex Performance"] --> B["Compile Patterns"]
A --> C[Avoid Backtracking]
A --> D[Use Specific Classes]
A --> E[Anchor Patterns]
B --> B1[re.compile for reuse]
C --> C1[Avoid nested quantifiers]
D --> D1["[^@]+ instead of .*"]
E --> E1[^ and $ when appropriate]
13. Common Pitfalls and Best Practices
Common Mistakes
1. Forgetting Raw Strings
# Wrong - need to escape backslashes
pattern = "\\d+\\.\\d+"
# Right - use raw strings
pattern = r"\d+\.\d+"
2. Greedy vs Non-Greedy
html = "<div>Hello</div><div>World</div>"
# Wrong - matches entire string
pattern = r"<div>.*</div>"
match = re.search(pattern, html)
print(match.group()) # <div>Hello</div><div>World</div>
# Right - non-greedy matching
pattern = r"<div>.*?</div>"
matches = re.findall(pattern, html)
print(matches) # ['<div>Hello</div>', '<div>World</div>']
3. Not Escaping Special Characters
# Wrong - . matches any character
pattern = r"3.14"
text = "3X14"
print(bool(re.search(pattern, text))) # True (unexpected)
# Right - escape the literal dot
pattern = r"3\.14"
text = "3X14"
print(bool(re.search(pattern, text))) # False
Best Practices
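A related safeguard for the escaping pitfall above: when the literal text comes from a variable or user input, escaping by hand is error-prone, and re.escape quotes every metacharacter for you.

```python
import re

user_input = "3.14"              # contains a regex metacharacter
pattern = re.escape(user_input)  # '3\\.14' - the dot is now literal

print(bool(re.search(pattern, "3.14")))  # True
print(bool(re.search(pattern, "3X14")))  # False
```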
1. Use Verbose Mode for Complex Patterns
# Complex pattern - hard to read
pattern = r"^(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<number>\d{4})$"
# Same pattern with verbose mode - much clearer
pattern = re.compile(r"""
^ # Start of string
(?P<area>\d{3}) # Area code (3 digits)
- # Literal hyphen
(?P<exchange>\d{3}) # Exchange (3 digits)
- # Literal hyphen
(?P<number>\d{4}) # Number (4 digits)
$ # End of string
""", re.VERBOSE)
2. Validate Input Before Processing
def safe_regex_search(pattern, text):
if not isinstance(text, str):
return None
try:
compiled_pattern = re.compile(pattern)
return compiled_pattern.search(text)
except re.error as e:
print(f"Invalid regex pattern: {e}")
return None
3. Use Appropriate Methods
# For validation - use match()
def is_valid_email(email):
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
return bool(re.match(pattern, email))
# For finding - use search() or findall()
def extract_emails(text):
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
return re.findall(pattern, text)
# For replacement - use sub()
def mask_emails(text):
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
return re.sub(pattern, "***@***.***", text)
Testing Regex Patterns
import re
import unittest

class TestEmailRegex(unittest.TestCase):
    def setUp(self):
        self.email_pattern = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

    def test_valid_emails(self):
        valid_emails = [
            "user@example.com",
            "test.email@domain.org",
            "user+tag@example.co.uk"
        ]
        for email in valid_emails:
            with self.subTest(email=email):
                self.assertTrue(self.email_pattern.match(email))

    def test_invalid_emails(self):
        invalid_emails = [
            "invalid.email",
            "@domain.com",
            "user@",
            "user space@domain.com"
        ]
        for email in invalid_emails:
            with self.subTest(email=email):
                self.assertFalse(self.email_pattern.match(email))

# Run tests
if __name__ == "__main__":
    unittest.main()

graph TD
A[Best Practices] --> B[Use Raw Strings]
A --> C[Test Thoroughly]
A --> D[Handle Errors]
A --> E[Document Complex Patterns]
A --> F[Choose Right Method]
B --> B1[Avoid escape issues]
C --> C1[Unit test patterns]
D --> D1[Catch re.error]
E --> E1[Use re.VERBOSE]
F --> F1[match vs search vs findall]

14. Comprehensive Regex Cheat Sheet
Quick Reference Patterns
# BASIC PATTERNS
basic_patterns = {
'literal': r'hello', # Matches "hello" exactly
'case_insensitive': r'(?i)hello', # Matches "Hello", "HELLO", etc.
'any_char': r'h.llo', # Matches "hello", "hallo", "h3llo"
'optional': r'colou?r', # Matches "color" or "colour"
}
# CHARACTER CLASSES
character_classes = {
'digit': r'\d', # [0-9]
'non_digit': r'\D', # [^0-9]
'word': r'\w', # [a-zA-Z0-9_]
'non_word': r'\W', # [^a-zA-Z0-9_]
'whitespace': r'\s', # [ \t\n\r\f\v]
'non_whitespace': r'\S', # [^ \t\n\r\f\v]
'custom_class': r'[aeiou]', # Vowels only
'range': r'[a-z]', # Lowercase letters
'negated': r'[^0-9]', # Not digits
}
# QUANTIFIERS
quantifiers = {
'zero_or_more': r'a*', # "", "a", "aa", "aaa"
'one_or_more': r'a+', # "a", "aa", "aaa"
'zero_or_one': r'a?', # "", "a"
'exactly_n': r'a{3}', # "aaa"
'n_or_more': r'a{3,}', # "aaa", "aaaa", "aaaaa"
'between_n_m': r'a{2,4}', # "aa", "aaa", "aaaa"
'non_greedy': r'a+?', # Non-greedy one or more
}
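```python
# Quick illustration of greedy vs non-greedy: '<.+>' swallows everything
# up to the last '>', while '<.+?>' stops at the first one.
import re
html = "<b>bold</b>"
print(re.findall(r"<.+>", html))   # ['<b>bold</b>']
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>']
```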
# ANCHORS AND BOUNDARIES
anchors = {
'start_of_string': r'^hello', # Must start with "hello"
'end_of_string': r'world$', # Must end with "world"
'word_boundary': r'\bhello\b', # "hello" as whole word
'non_word_boundary': r'\Bhello\B', # "hello" inside word
'start_of_line': r'(?m)^hello', # Start of any line
'end_of_line': r'(?m)world$', # End of any line
}
# GROUPS AND CAPTURING
groups = {
'capturing': r'(hello)', # Captures "hello"
'non_capturing': r'(?:hello)', # Groups but doesn't capture
'named_group': r'(?P<greeting>hello)', # Named capture
'backreference': r'(\w+) \1', # Matches repeated words
'conditional': r'(a)?(?(1)b|c)', # Matches "ab" or "c" (b if group 1 matched, else c)
}
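```python
# Backreference demo: (\w+) \1 finds an immediately repeated word;
# findall returns the captured group.
import re
print(re.findall(r"\b(\w+) \1\b", "it was was a nice day"))  # ['was']
```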
# LOOKAROUNDS
lookarounds = {
'positive_lookahead': r'hello(?= world)', # "hello" followed by " world"
'negative_lookahead': r'hello(?! world)', # "hello" NOT followed by " world"
'positive_lookbehind': r'(?<=say )hello', # "hello" preceded by "say "
'negative_lookbehind': r'(?<!say )hello', # "hello" NOT preceded by "say "
}

Common Use Case Patterns
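Each group of patterns below is ready to drop into re.search or re.findall. As a quick, illustrative run using the basic email pattern from this list (with the letter class written as [A-Za-z]):

```python
import re

# Basic email pattern (same idea as the 'basic' entry below)
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
text = "Reach us at support@example.com or sales@example.org."
print(re.findall(pattern, text))  # ['support@example.com', 'sales@example.org']
```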
# EMAIL VALIDATION
email_patterns = {
'basic': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
'strict': r'^[a-zA-Z0-9.!#$%&\'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$'
}
# PHONE NUMBERS
phone_patterns = {
'us_simple': r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
'us_with_country': r'(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
'international': r'^\+?[1-9]\d{1,14}$'
}
# URLS
url_patterns = {
'http_https': r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w)*)?)?',
'with_subdomains': r'https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&=]*)'
}
# DATES
date_patterns = {
'mm_dd_yyyy': r'\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/\d{4}\b',
'yyyy_mm_dd': r'\b\d{4}-(0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])\b',
'dd_mm_yyyy': r'\b(0?[1-9]|[12]\d|3[01])/(0?[1-9]|1[0-2])/\d{4}\b'
}
# IP ADDRESSES
ip_patterns = {
'ipv4': r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b',
'ipv4_strict': r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
}
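```python
# The simple 'ipv4' pattern accepts out-of-range octets; the strict
# version limits each octet to 0-255.
import re
loose = r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'
strict = r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
print(bool(re.search(loose, "999.999.999.999")))   # True
print(bool(re.search(strict, "999.999.999.999")))  # False
```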
# CREDIT CARDS
credit_card_patterns = {
'visa': r'4[0-9]{12}(?:[0-9]{3})?',
'mastercard': r'5[1-5][0-9]{14}',
'amex': r'3[47][0-9]{13}',
'any': r'(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3[0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})'
}

Method Comparison Table
| Method | Purpose | Returns | Use When |
|---|---|---|---|
| re.match() | Match from start | Match object or None | Validating entire string |
| re.search() | Find first match | Match object or None | Finding one occurrence |
| re.findall() | Find all matches | List of strings | Getting all matches as list |
| re.finditer() | Find all matches | Iterator of Match objects | Need match details for all |
| re.sub() | Replace matches | Modified string | Text replacement |
| re.subn() | Replace matches | (string, count) tuple | Replacement + count needed |
| re.split() | Split by pattern | List of strings | Splitting text |
| re.compile() | Compile pattern | Pattern object | Reusing same pattern |
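A quick run of the main methods on a single string makes the differences in the table concrete:

```python
import re

text = "one cat, two cats"
print(re.match(r"cat", text))           # None - match() only looks at the start
print(re.search(r"cat", text).start())  # 4 - search() scans the whole string
print(re.findall(r"cats?", text))       # ['cat', 'cats']
print(re.sub(r"cats?", "dog", text))    # 'one dog, two dog'
```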
Flag Options
import re
# COMMON FLAGS
flags_demo = {
'IGNORECASE': re.IGNORECASE, # Case-insensitive matching
'MULTILINE': re.MULTILINE, # ^ and $ match line boundaries
'DOTALL': re.DOTALL, # . matches newlines too
'VERBOSE': re.VERBOSE, # Allow comments and whitespace
'ASCII': re.ASCII, # Make \w, \W, \b, \B ASCII-only
'LOCALE': re.LOCALE, # Locale-dependent matching (bytes patterns only in Python 3)
'UNICODE': re.UNICODE, # Unicode matching (already the default for str patterns)
}
# COMBINING FLAGS
combined_flags = re.IGNORECASE | re.MULTILINE | re.DOTALL
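```python
# Combined flags in action: DOTALL lets '.' cross the newline and
# IGNORECASE matches 'First' against 'first'.
import re
text = "First line\nsecond line"
print(bool(re.search(r"first.*second", text, re.IGNORECASE | re.DOTALL)))  # True
print(bool(re.search(r"first.*second", text, re.IGNORECASE)))              # False
```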
# INLINE FLAGS
inline_flags = {
'case_insensitive': r'(?i)pattern',
'multiline': r'(?m)pattern',
'dotall': r'(?s)pattern',
'verbose': r'(?x)pattern',
'multiple': r'(?ims)pattern', # Combined flags
}

graph TB
A[Regex Cheat Sheet] --> B[Basic Patterns]
A --> C[Character Classes]
A --> D[Quantifiers]
A --> E[Common Use Cases]
B --> B1[Literal matching]
B --> B2[Case options]
B --> B3[Any character]
C --> C1[Built-in classes]
C --> C2[Custom classes]
C --> C3[Negated classes]
D --> D1[Basic quantifiers]
D --> D2[Exact counts]
D --> D3[Greedy vs lazy]
E --> E1[Email validation]
E --> E2[Phone numbers]
E --> E3[URLs & IPs]
style A fill:#e1f5fe
style E fill:#c8e6c9

15. Practice Exercises
Beginner Exercises
# Exercise 1: Basic Pattern Matching
def exercise_1():
    """Find all words that start with 'th' (case insensitive)"""
    text = "The quick brown fox thinks that this is the best thing."
    # Your pattern here:
    pattern = r"\bth\w*"
    # Solution: re.findall(pattern, text, re.IGNORECASE)

# Exercise 2: Number Extraction
def exercise_2():
    """Extract all numbers (integers and decimals) from text"""
    text = "The price is $29.99 and we have 15 items in stock. Call 555-1234."
    # Your pattern here:
    pattern = r"\d+\.?\d*"
    # Test your solution

# Exercise 3: Email Finding
def exercise_3():
    """Find all email addresses in the text"""
    text = "Contact john@example.com or mary.smith@company.org for more info."
    # Your pattern here:
    pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    # Test your solution

Intermediate Challenges
# Challenge 1: Phone Number Standardization
def challenge_1():
    """Convert various phone formats to (XXX) XXX-XXXX"""
    phones = [
        "123-456-7890",
        "(555) 123-4567",
        "555.123.4567",
        "5551234567"
    ]
    # Create a function to standardize all formats
    def standardize_phone(phone):
        # Capture only the digit groups; keep the separators outside the parentheses
        pattern = r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})"
        match = re.search(pattern, phone)
        if match:
            return f"({match.group(1)}) {match.group(2)}-{match.group(3)}"
        return None

# Challenge 2: Log Parser
def challenge_2():
    """Parse web server logs to extract useful information"""
    log_entry = '192.168.1.100 - - [10/Oct/2023:13:55:36 +0000] "GET /api/users HTTP/1.1" 200 2326'
    # Extract: IP, timestamp, method, endpoint, status code, response size
    pattern = r'(\d+\.\d+\.\d+\.\d+).*?\[(.*?)\].*?"(\w+)\s+(\S+)[^"]*"\s+(\d+)\s+(\d+)'
    # Complete the parser function

# Challenge 3: HTML Tag Extractor
def challenge_3():
    """Extract all HTML tags and their attributes"""
    html = '<div class="container" id="main"><p>Hello</p><a href="http://example.com">Link</a></div>'
    # Extract tag name and attributes separately
    tag_pattern = r'<(\w+)([^>]*)>'
    # Complete the extraction logic

Advanced Problems
# Advanced 1: Nested Structure Parser
def advanced_1():
    """Parse nested parentheses and extract content at each level"""
    text = "((hello (world) test) (foo bar))"
    # This requires recursive or advanced techniques
    # Hint: You might need to use a stack-based approach or recursive regex

# Advanced 2: Template Variable Extractor
def advanced_2():
    """Extract template variables like {{variable}} and {{variable|filter}}"""
    template = "Hello {{name}}, your balance is {{account.balance|currency}}."
    # Extract variable names and filters
    pattern = r'\{\{\s*([^|\}]+)(?:\|([^}]+))?\s*\}\}'
    # Complete the extraction and parsing

# Advanced 3: SQL Query Parser
def advanced_3():
    """Parse SQL SELECT statements to extract tables and columns"""
    sql = """
    SELECT users.name, orders.total, products.title
    FROM users
    JOIN orders ON users.id = orders.user_id
    JOIN products ON orders.product_id = products.id
    WHERE users.active = 1
    """
    # Extract table names, column names, and JOIN conditions
    # This is a complex parsing challenge

Exercise Solutions and Explanations
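For Advanced 1, note that a single Python regex cannot count arbitrary nesting, which is why the exercise hints at a stack-based approach. One possible sketch (the function name and return shape are illustrative, not part of the original exercise):

```python
def extract_levels(text):
    """Collect the text inside each parenthesized group, keyed by nesting depth."""
    stack = []   # positions of unmatched '('
    levels = {}  # depth -> list of captured contents
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            depth = len(stack) + 1  # depth of the group just closed
            levels.setdefault(depth, []).append(text[start + 1:i])
    return levels

print(extract_levels("((hello (world) test) (foo bar))"))
# {3: ['world'], 2: ['hello (world) test', 'foo bar'], 1: ['(hello (world) test) (foo bar)']}
```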
def show_solutions():
    """Detailed solutions with explanations"""
    solutions = {
        "Beginner 1": {
            "pattern": r"\bth\w*",
            "flags": "re.IGNORECASE",
            "explanation": "\\b anchors to a word boundary, th matches literal 'th', \\w* matches the rest of the word"
        },
        "Beginner 2": {
            "pattern": r"\d+\.?\d*",
            "explanation": "\\d+ matches digits, \\.? optional decimal point, \\d* optional decimal digits"
        },
        "Intermediate 1": {
            "approach": "Capture only the digit groups; keep the separators outside the parentheses",
            "pattern": r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})",
            "replacement": r"(\1) \2-\3"
        }
    }
    for exercise, solution in solutions.items():
        print(f"\n{exercise}:")
        for key, value in solution.items():
            print(f"  {key}: {value}")

# Interactive practice function
def practice_regex():
    """Interactive regex practice session"""
    exercises = [
        {
            "description": "Find all words ending in 'ing'",
            "text": "Running and jumping are fun activities.",
            "expected": ["Running", "jumping"]
        },
        {
            "description": "Extract all hashtags from social media text",
            "text": "Love this weather! #sunny #beautiful #weekend",
            "expected": ["#sunny", "#beautiful", "#weekend"]
        }
    ]
    for i, exercise in enumerate(exercises, 1):
        print(f"\nExercise {i}: {exercise['description']}")
        print(f"Text: {exercise['text']}")
        print(f"Expected: {exercise['expected']}")
        # Student would input their pattern here
        # pattern = input("Enter your regex pattern: ")
        # result = re.findall(pattern, exercise['text'])
        # print(f"Your result: {result}")

graph TB
A[Practice Exercises] --> B[Beginner Level]
A --> C[Intermediate Level]
A --> D[Advanced Level]
B --> B1[Basic matching]
B --> B2[Character classes]
B --> B3[Simple quantifiers]
C --> C1[Complex patterns]
C --> C2[Data parsing]
C --> C3[Text processing]
D --> D1[Nested structures]
D --> D2[Template parsing]
D --> D3[Language parsing]
style A fill:#e1f5fe
style B fill:#e8f5e8
style C fill:#fff3e0
style D fill:#ffebee

Conclusion
Regular expressions are a powerful tool for text processing in Python. This comprehensive guide has taken you from beginner concepts to expert-level techniques.
Key Learning Outcomes
By completing this guide, you should be able to:
- Understand Regex Fundamentals: Basic patterns, character classes, and quantifiers
- Apply Advanced Techniques: Lookarounds, groups, and complex pattern matching
- Debug Regex Patterns: Use systematic approaches to troubleshoot issues
- Optimize Performance: Write efficient patterns and avoid common pitfalls
- Solve Real-World Problems: Apply regex to practical text processing tasks
Best Practices Summary
graph TB
A[Regex Best Practices] --> B[Development]
A --> C[Testing]
A --> D[Performance]
A --> E[Maintenance]
B --> B1[Start simple, add complexity]
B --> B2[Use raw strings r'']
B --> B3[Comment complex patterns]
C --> C1[Test edge cases]
C --> C2[Validate with real data]
C --> C3[Use unit tests]
D --> D1[Compile patterns for reuse]
D --> D2[Avoid catastrophic backtracking]
D --> D3[Use specific character classes]
E --> E1[Document pattern purpose]
E --> E2[Use meaningful variable names]
E --> E3[Keep patterns readable]

Essential Quick Reference
import re
# Most commonly used patterns
essential_patterns = {
'email': r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
'phone_us': r"\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})",
'url': r"https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w)*)?)?",
'ipv4': r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",
'date_mdy': r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/\d{4}\b",
'number': r"-?\d+\.?\d*",
'word': r"\b\w+\b"
}
# Essential methods
re.search(pattern, text) # Find first match
re.findall(pattern, text) # Find all matches
re.finditer(pattern, text) # Iterator of match objects
re.sub(pattern, repl, text) # Replace matches
re.split(pattern, text) # Split by pattern
re.compile(pattern)          # Compile for reuse

Your Next Steps
- Practice Daily: Work with regex patterns regularly
- Build a Pattern Library: Save useful patterns you create
- Join Communities: Engage with other regex learners
- Contribute: Help others and share your knowledge
- Stay Updated: Follow regex developments
When to Use Regex (and When Not To)
✅ Good for:
- Pattern matching and validation
- Text extraction and parsing
- Find and replace operations
- Data cleaning and preprocessing
❌ Consider alternatives for:
- Complex parsing (use dedicated parsers)
- Simple string operations (use str methods)
- Structured data (JSON, XML, CSV libraries)
- Performance-critical code without optimization
Happy regex coding! 🚀
*”Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” – Jamie Zawinski*
While humorous, this quote reminds us to use regex thoughtfully. This guide teaches you to be one of the developers who wields regex effectively and responsibly.