Data Extraction

Extracting structured data from unstructured text is a core regex use case. The patterns below cover phone numbers, dates, prices, CSV fields, JSON key-value pairs, addresses, and ID-style numbers such as SSNs and credit cards.

Phone Numbers

US Phone Numbers (Flexible)

(?:\+?1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})

Matches:

  • (555) 123-4567
  • 555-123-4567
  • 555.123.4567
  • 5551234567
  • +1 555 123 4567

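As a minimal sketch of how the flexible pattern might be used in JavaScript, the snippet below scans a string and normalizes each hit to ten digits. The helper name `extractUsPhones` and the sample text are illustrative assumptions, not part of the original pattern set.

```javascript
// Flexible US phone pattern from above; the g flag lets matchAll scan the whole text.
const US_PHONE = /(?:\+?1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/g;

// Hypothetical helper: return every phone number found, normalized to 10 digits.
function extractUsPhones(text) {
  // Groups 1-3 hold area code, exchange, and subscriber number.
  return Array.from(text.matchAll(US_PHONE), (m) => m[1] + m[2] + m[3]);
}

const phoneSample = "Call (555) 123-4567 or +1 555 987 6543 after 5pm.";
console.log(extractUsPhones(phoneSample)); // ["5551234567", "5559876543"]
```

Dropping the country prefix and separators this way makes it easier to compare or deduplicate numbers written in different styles.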

International Phone Numbers

\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}

With Named Groups

\(?(?<area>\d{3})\)?[-.\s]?(?<exchange>\d{3})[-.\s]?(?<subscriber>\d{4})

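Named groups are read from the `groups` property of a match in JavaScript. The following is a small sketch; `parsePhone` is a hypothetical helper name chosen for illustration.

```javascript
const PHONE_PARTS =
  /\(?(?<area>\d{3})\)?[-.\s]?(?<exchange>\d{3})[-.\s]?(?<subscriber>\d{4})/;

// Hypothetical helper: return the named parts of the first phone number, or null.
function parsePhone(text) {
  const m = PHONE_PARTS.exec(text);
  if (m === null) return null;
  const { area, exchange, subscriber } = m.groups;
  return { area, exchange, subscriber };
}

console.log(parsePhone("Office: 555.123.4567"));
// { area: "555", exchange: "123", subscriber: "4567" }
```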

Dates

ISO 8601 Date

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

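The ISO pattern checks only the shape of a date, so a string such as 2024-13-45 still matches. One possible way to reject impossible dates, sketched here with a hypothetical `parseIsoDate` helper, is to round-trip the captured values through `Date.UTC`.

```javascript
const ISO_DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Hypothetical helper: return { year, month, day } for a real calendar date, else null.
function parseIsoDate(text) {
  const m = ISO_DATE.exec(text);
  if (m === null) return null;
  const year = Number(m.groups.year);
  const month = Number(m.groups.month);
  const day = Number(m.groups.day);
  // Date.UTC rolls out-of-range values over (month 13 becomes January of the
  // next year), so a mismatch after the round trip means the date was invalid.
  const d = new Date(Date.UTC(year, month - 1, day));
  const valid =
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
  return valid ? { year, month, day } : null;
}

console.log(parseIsoDate("Released 2024-02-29")); // { year: 2024, month: 2, day: 29 }
console.log(parseIsoDate("Released 2023-02-29")); // null
```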

US Date Format (MM/DD/YYYY)

(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})

European Date Format (DD/MM/YYYY)

(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})
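
The US and European patterns accept exactly the same strings; only the group names differ, so the caller has to know which convention the source uses. The sketch below illustrates this with a hypothetical `readDate` helper. In a JavaScript regex literal the `/` separators are escaped as `\/`.

```javascript
const US_DATE = /(?<month>\d{1,2})\/(?<day>\d{1,2})\/(?<year>\d{4})/;
const EU_DATE = /(?<day>\d{1,2})\/(?<month>\d{1,2})\/(?<year>\d{4})/;

// Hypothetical helper: read a date using whichever convention the caller picks.
function readDate(text, pattern) {
  const m = pattern.exec(text);
  if (m === null) return null;
  return {
    year: Number(m.groups.year),
    month: Number(m.groups.month),
    day: Number(m.groups.day),
  };
}

const dueText = "Due 03/04/2024";
console.log(readDate(dueText, US_DATE)); // { year: 2024, month: 3, day: 4 }
console.log(readDate(dueText, EU_DATE)); // { year: 2024, month: 4, day: 3 }
```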

Multiple Formats

\d{1,2}[-/]\d{1,2}[-/]\d{2,4}|\d{4}[-/]\d{2}[-/]\d{2}

Currency and Prices

US Dollar Amounts

\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?

Matches:

  • $5.99
  • $1,234.56
  • $1,000,000.00

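Once amounts are matched they usually need converting to numbers. The sketch below, using a hypothetical `extractUsdCents` helper, returns integer cents to avoid floating-point rounding.

```javascript
const USD = /\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?/g;

// Hypothetical helper: return every dollar amount in the text as integer cents.
function extractUsdCents(text) {
  return Array.from(text.matchAll(USD), (m) => {
    // Drop the leading "$" and the thousands separators, then split off the cents.
    const [dollars, cents = "00"] = m[0].slice(1).replace(/,/g, "").split(".");
    return Number(dollars) * 100 + Number(cents);
  });
}

console.log(
  extractUsdCents("Lunch was $5.99, rent is $1,234.56, and the prize is $1,000,000.00.")
); // [599, 123456, 100000000]
```

Because the pattern starts with `\d{1,3}`, an amount written without thousands separators, such as $1234.56, is only matched up to `$123`.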

Multiple Currencies

(?<currency>[€£$¥])(?<amount>\d{1,3}(?:[,.\s]\d{3})*(?:[.,]\d{2})?)

Extract Price from Text

(?:price|cost|total):\s*\$?(?<amount>[\d,.]+)

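A sketch of using the labeled-price pattern follows. Two assumptions are added here that the original pattern does not state: the `i` flag, so that "Total:" and "total:" both match, and a cleanup step, because `[\d,.]+` can also swallow a sentence-ending period. `extractPrice` is a hypothetical helper name.

```javascript
const PRICE = /(?:price|cost|total):\s*\$?(?<amount>[\d,.]+)/i;

// Hypothetical helper: return the first labeled price as a number, or null.
function extractPrice(text) {
  const m = PRICE.exec(text);
  if (m === null) return null;
  // Strip trailing punctuation captured by [\d,.]+, then the thousands separators.
  const cleaned = m.groups.amount.replace(/[.,]+$/, "").replace(/,/g, "");
  return Number(cleaned);
}

console.log(extractPrice("Order total: $1,299.00.")); // 1299
console.log(extractPrice("Cost: 45"));                // 45
```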

CSV Parsing

Simple CSV Fields

(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^,]*))

This handles:

  • Unquoted fields
  • Quoted fields
  • Escaped quotes ("" inside quoted fields)
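
The field pattern above can be applied to a whole line with `matchAll`. This is a minimal sketch with a hypothetical `splitCsvLine` helper; it reads group 1 for quoted fields and group 2 for unquoted ones, and unescapes doubled quotes.

```javascript
const CSV_FIELD = /(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^,]*))/g;

// Hypothetical helper: split one CSV line into its field values.
function splitCsvLine(line) {
  return Array.from(line.matchAll(CSV_FIELD), (m) =>
    // Group 1 is set for quoted fields; "" inside them stands for a literal quote.
    m[1] !== undefined ? m[1].replace(/""/g, '"') : m[2]
  );
}

console.log(splitCsvLine('id,"Smith, Jane","said ""hi""",,42'));
// ["id", "Smith, Jane", 'said "hi"', "", "42"]
```

One edge case to watch with this approach: when a line starts with an empty field (for example `,a`), the first match is empty, the scan then advances past the following comma, and the next field is skipped. Lines with embedded newlines inside quoted fields are also out of scope for a line-based sketch like this.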

Extract Specific Column

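For the 3rd column (index 2 when counting from 0), skip the first two comma-terminated fields and capture the next one. This assumes no field contains a quoted comma: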

^(?:[^,]*,){2}([^,]*)

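The repeat count in the column pattern can be generalized to any column. The sketch below builds the pattern for a zero-based index; `extractColumn` is a hypothetical helper, and it assumes simple CSV with no quoted commas.

```javascript
// Hypothetical helper: pull one column (zero-based index) out of simple CSV text.
function extractColumn(csvText, index) {
  // Same shape as the pattern above, with the repeat count filled in from index.
  const pattern = new RegExp(`^(?:[^,]*,){${index}}([^,]*)`);
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      const m = pattern.exec(line);
      // Rows with too few columns produce null rather than a wrong value.
      return m === null ? null : m[1];
    });
}

const csvSample = "name,age,city\nAda,36,London\nLinus,28,Helsinki";
console.log(extractColumn(csvSample, 2)); // ["city", "London", "Helsinki"]
```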

JSON Key-Value Extraction

Extract Specific Key

"username":\s*"([^"]+)"

Extract All String Values

"(\w+)":\s*"([^"]+)"

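The key-value patterns in this section (string values above, numeric values in the next subsection) can be combined to pull flat fields out of text that merely contains JSON, such as a log line. This is a sketch with a hypothetical `extractPairs` helper; it does not handle escaped quotes inside values, negative numbers, or nesting. When the entire input is valid JSON, `JSON.parse` is the more robust choice.

```javascript
const STRING_PAIR = /"(\w+)":\s*"([^"]+)"/g;
const NUMBER_PAIR = /"(\w+)":\s*(\d+(?:\.\d+)?)/g;

// Hypothetical helper: collect flat string and numeric key-value pairs from text.
function extractPairs(text) {
  const pairs = {};
  for (const [, key, value] of text.matchAll(STRING_PAIR)) pairs[key] = value;
  for (const [, key, value] of text.matchAll(NUMBER_PAIR)) pairs[key] = Number(value);
  return pairs;
}

const logLine = 'INFO request done {"username": "ada", "status": 200, "elapsed": 12.5}';
console.log(extractPairs(logLine));
// { username: "ada", status: 200, elapsed: 12.5 }
```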

Extract Numeric Values

"(\w+)":\s*(\d+(?:\.\d+)?)

Addresses

US ZIP Codes

\b\d{5}(?:-\d{4})?\b

Matches:

  • 12345
  • 12345-6789

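The word boundaries in the ZIP pattern keep it from matching inside longer digit runs. A short sketch, with a hypothetical `extractZips` helper and made-up sample text:

```javascript
const ZIP = /\b\d{5}(?:-\d{4})?\b/g;

// Hypothetical helper: return every ZIP or ZIP+4 code found in the text.
function extractZips(text) {
  // match() returns null when nothing is found, so fall back to an empty array.
  return text.match(ZIP) ?? [];
}

console.log(
  extractZips("Ship to Springfield, IL 62704-1234 or Austin, TX 73301. Order #1234567.")
); // ["62704-1234", "73301"]
```

The seven-digit order number is skipped because no word boundary falls after its fifth digit.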

US State Abbreviations

\b[A-Z]{2}\b

This matches any two uppercase letters, so check each match against a list of real state codes.

Street Addresses

\d+\s+[\w\s]+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Way|Court|Ct)\.?

Social Security Numbers

Security Warning: Be extremely careful when handling SSNs. Never log them, and always mask them in displays.

\b\d{3}-\d{2}-\d{4}\b

Masked SSN (for display)

Find and replace to mask:

Find: \b(\d{3})-(\d{2})-(\d{4})\b
Replace: XXX-XX-$3
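
In JavaScript the same find-and-replace can be sketched with `String.prototype.replace`, where `$3` in the replacement string refers to the third capture group. `maskSsns` is a hypothetical helper name.

```javascript
const SSN = /\b(\d{3})-(\d{2})-(\d{4})\b/g;

// Hypothetical helper: mask every SSN in the text, keeping only the last four digits.
function maskSsns(text) {
  return text.replace(SSN, "XXX-XX-$3");
}

console.log(maskSsns("SSN on file: 123-45-6789")); // "SSN on file: XXX-XX-6789"
```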

Credit Card Numbers

PCI Compliance: Never store or log full credit card numbers. Use this only for format validation before sending to a payment processor.

Basic Format (with spaces or dashes)

\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b

By Card Type

Visa:

\b4\d{3}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b

Mastercard:

\b5[1-5]\d{2}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b

American Express:

\b3[47]\d{2}[-\s]?\d{6}[-\s]?\d{5}\b
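
For format validation of a single input field, the card-type patterns can be tried in turn. The sketch below makes one change that should be read as an assumption: it anchors each pattern with `^` and `$` instead of `\b`, since the whole field is expected to be the card number. `detectCardType` is a hypothetical helper, and the sample values are publicly documented test numbers, not real cards.

```javascript
// Card-type patterns from above, anchored to the whole input (an assumption of this sketch).
const CARD_PATTERNS = {
  visa: /^4\d{3}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$/,
  mastercard: /^5[1-5]\d{2}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$/,
  amex: /^3[47]\d{2}[-\s]?\d{6}[-\s]?\d{5}$/,
};

// Hypothetical helper: return "visa", "mastercard", "amex", or null.
function detectCardType(input) {
  const value = input.trim();
  for (const [type, pattern] of Object.entries(CARD_PATTERNS)) {
    if (pattern.test(value)) return type;
  }
  return null;
}

console.log(detectCardType("4111 1111 1111 1111")); // "visa"
console.log(detectCardType("3782 822463 10005"));   // "amex"
```

This checks shape only. Note also that Mastercard issues numbers in the 2221-2720 range, which the `5[1-5]` pattern does not cover.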

Practical Tips

Handling Optional Parts

Use ? for optional elements:

\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

Non-Capturing Groups for Structure

Use (?:...) when you need grouping but don't need to capture:

(?:\d{1,3}\.){3}\d{1,3}

This matches dotted-quad IP addresses without adding a capture group for the repeated octet. It does not check that each octet falls in the 0-255 range.
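
The effect of `(?:...)` shows up in the match array: a capturing version adds one group that holds only the last repetition. A small sketch, with `groupCount` as a hypothetical helper:

```javascript
const NON_CAPTURING = /(?:\d{1,3}\.){3}\d{1,3}/;
const CAPTURING = /(\d{1,3}\.){3}\d{1,3}/;

// Hypothetical helper: number of capture groups reported for the first match.
function groupCount(pattern, text) {
  return pattern.exec(text).length - 1;
}

const serverText = "Server at 192.168.1.10 responded";
console.log(groupCount(NON_CAPTURING, serverText)); // 0
console.log(groupCount(CAPTURING, serverText));     // 1
console.log(CAPTURING.exec(serverText)[1]);         // "1." (last repetition only)
```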

Greedy vs Lazy

For extracting quoted strings, use a negated character class or a lazy quantifier so the match stops at the first closing quote:

"[^"]*"     // Greedy but limited by [^"]
".*?"       // Lazy - stops at first quote

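To see the difference in practice, the sketch below compares an unrestricted greedy `.*` with the two patterns above. `findQuoted` is a hypothetical helper. The last two calls show one practical distinction: `[^"]` matches newlines, while `.` does not unless the `s` flag is set.

```javascript
const GREEDY_DOT = /".*"/g;       // runs to the LAST quote on the line
const NEGATED_CLASS = /"[^"]*"/g; // stops at the next quote
const LAZY_DOT = /".*?"/g;        // also stops at the next quote

// Hypothetical helper: return all matches of a global pattern, or an empty array.
function findQuoted(text, pattern) {
  return text.match(pattern) ?? [];
}

const quotedText = 'say "one" and "two"';
console.log(findQuoted(quotedText, GREEDY_DOT));    // ['"one" and "two"']
console.log(findQuoted(quotedText, NEGATED_CLASS)); // ['"one"', '"two"']
console.log(findQuoted(quotedText, LAZY_DOT));      // ['"one"', '"two"']

const multiLine = 'x "a\nb" y';
console.log(findQuoted(multiLine, NEGATED_CLASS).length); // 1
console.log(findQuoted(multiLine, LAZY_DOT).length);      // 0
```
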
Common Extraction Patterns

Data Type     Pattern
Email         [\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}
Phone (US)    \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
Date (ISO)    \d{4}-\d{2}-\d{2}
Time (24h)    \d{2}:\d{2}(?::\d{2})?
IP Address    \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
ZIP Code      \d{5}(?:-\d{4})?
UUID          [a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}
Hex Color     #[a-fA-F0-9]{6}|#[a-fA-F0-9]{3}