Data Extraction
Extracting structured data from unstructured text is a core regex use case. The patterns below cover common extraction tasks, from phone numbers and dates to CSV fields and JSON values.
Phone Numbers
US Phone Numbers (Flexible)
(?:\+?1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})
Matches:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- 5551234567
- +1 555 123 4567
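As a rough illustration of how the three capture groups can be used, the sketch below runs the flexible US pattern with Python's `re` module and rewrites every hit in one canonical form. The function name `normalize_us_phones` and the sample sentence are made up for this example.

```python
import re

# Same pattern as above: optional +1 prefix, then area code, exchange, subscriber.
US_PHONE = re.compile(
    r"(?:\+?1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})"
)

def normalize_us_phones(text: str) -> list[str]:
    """Return every US-style number found in text as AAA-EEE-SSSS."""
    # m.groups() holds the three captured digit runs for each match.
    return ["-".join(m.groups()) for m in US_PHONE.finditer(text)]

sample = "Call (555) 123-4567 or +1 555 987 6543 before noon."
print(normalize_us_phones(sample))  # ['555-123-4567', '555-987-6543']
```

Because the separators are all optional, the pattern can also fire inside longer digit runs, so it is better suited to extraction from free text than to strict validation.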
International Phone Numbers
\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}
With Named Groups
\(?(?<area>\d{3})\)?[-.\s]?(?<exchange>\d{3})[-.\s]?(?<subscriber>\d{4})
Dates
ISO 8601 Date
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
US Date Format (MM/DD/YYYY)
(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})
European Date Format (DD/MM/YYYY)
(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})
Multiple Formats
\d{1,2}[-/]\d{1,2}[-/]\d{2,4}|\d{4}[-/]\d{2}[-/]\d{2}
Currency and Prices
US Dollar Amounts
\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?
Matches:
- $5.99
- $1,234.56
- $1,000,000.00
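Once an amount is matched, it usually needs to become a number. The sketch below is one possible way to do that in Python, stripping the `$` and the thousands separators and converting to `Decimal`. The helper name `extract_dollar_amounts` and the sample sentence are assumptions for illustration.

```python
import re
from decimal import Decimal

US_DOLLARS = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def extract_dollar_amounts(text: str) -> list[Decimal]:
    """Find dollar amounts and convert them to Decimal ($ and commas removed)."""
    return [
        Decimal(m.group().replace("$", "").replace(",", ""))
        for m in US_DOLLARS.finditer(text)
    ]

sample = "Lunch was $5.99, the laptop $1,234.56, and the house $1,000,000.00."
print(extract_dollar_amounts(sample))
# [Decimal('5.99'), Decimal('1234.56'), Decimal('1000000.00')]
```

Note that the pattern expects comma-grouped thousands: an ungrouped amount such as $1234.56 matches only as $123, so input that omits separators needs a looser pattern.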
Multiple Currencies
(?<currency>[€£$¥])(?<amount>\d{1,3}(?:[,.\s]\d{3})*(?:[.,]\d{2})?)
Extract Price from Text
(?:price|cost|total):\s*\$?(?<amount>[\d,.]+)
CSV Parsing
Simple CSV Fields
(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^,]*))
This handles:
- Unquoted fields
- Quoted fields
- Escaped quotes ("" inside quoted fields)
Extract Specific Column
For the 3rd column (0-indexed as column 2):
^(?:[^,]*,){2}([^,]*)
JSON Key-Value Extraction
Extract Specific Key
"username":\s*"([^"]+)"
Extract All String Values
"(\w+)":\s*"([^"]+)"
Extract Numeric Values
"(\w+)":\s*(\d+(?:\.\d+)?)
Addresses
US ZIP Codes
\b\d{5}(?:-\d{4})?\b
Matches:
- 12345
- 12345-6789
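A short sketch of the ZIP pattern in use, assuming Python's `re` module; the helper name `find_zip_codes` and the sample address text are hypothetical. It also shows what the `\b` word boundaries buy you: longer digit runs are not clipped down to five digits.

```python
import re

ZIP_CODE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

def find_zip_codes(text: str) -> list[str]:
    """Return all 5-digit and ZIP+4 codes found in text."""
    # The only group is non-capturing, so findall returns whole matches.
    return ZIP_CODE.findall(text)

text = "Ship to Springfield, IL 62704 or Beverly Hills, CA 90210-1234."
print(find_zip_codes(text))          # ['62704', '90210-1234']

# The word boundaries reject digits embedded in a longer number.
print(find_zip_codes("id 1234567"))  # []
```

The pattern checks shape only, so any standalone five-digit number (an order ID, for instance) will also match.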
US State Abbreviations
\b[A-Z]{2}\b
Street Addresses
\d+\s+[\w\s]+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Way|Court|Ct)\.?
Social Security Numbers
Security Warning: Be extremely careful when handling SSNs. Never log them, and always mask them in displays.
\b\d{3}-\d{2}-\d{4}\b
Masked SSN (for display)
Find and replace to mask:
Find: \b(\d{3})-(\d{2})-(\d{4})\b
Replace: XXX-XX-$3
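The same find-and-replace can be sketched in Python with `re.sub`. One difference to be aware of: Python writes the backreference as `\3` (or `\g<3>`) rather than `$3`. The function name `mask_ssns` and the sample value are illustrative, not real data.

```python
import re

SSN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def mask_ssns(text: str) -> str:
    """Mask every SSN-shaped match, keeping only the last four digits."""
    # \3 refers to the third capture group (the final four digits).
    return SSN.sub(r"XXX-XX-\3", text)

print(mask_ssns("Applicant SSN: 123-45-6789"))
# Applicant SSN: XXX-XX-6789
```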
Credit Card Numbers
PCI Compliance: Never store or log full credit card numbers. Use this only for format validation before sending to a payment processor.
Basic Format (with spaces or dashes)
\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b
By Card Type
Visa:
\b4\d{3}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b
Mastercard:
\b5[1-5]\d{2}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b
American Express:
\b3[47]\d{2}[-\s]?\d{6}[-\s]?\d{5}\b
Practical Tips
Handling Optional Parts
Use ? for optional elements:
\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
Non-Capturing Groups for Structure
Use (?:...) when you need grouping but don't need to capture:
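(?:\d{1,3}\.){3}\d{1,3}
This matches IPv4-style addresses without creating a capture group for the repeated part. It checks shape only and does not verify that each octet is at most 255.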
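The practical effect is easy to see in Python, where `re.findall` returns the captured text instead of the whole match as soon as a pattern contains a capture group. The sketch below compares the two spellings on a made-up log line; the variable names are illustrative.

```python
import re

NON_CAPTURING = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
CAPTURING = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

log_line = "Accepted connection from 192.168.1.10 via 10.0.0.1"

# No capture groups: findall returns the full matches.
print(NON_CAPTURING.findall(log_line))  # ['192.168.1.10', '10.0.0.1']

# One capture group: findall returns only the group's last repetition.
print(CAPTURING.findall(log_line))      # ['1.', '0.']
```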
Greedy vs Lazy
For extracting quoted strings, use a negated character class or a lazy quantifier:
"[^"]*" // Greedy, but bounded by [^"]
".*?" // Lazy: stops at the first closing quote
Common Extraction Patterns
| Data Type | Pattern |
|---|---|
| Email | [\w.+-]+@[\w.-]+\.[a-zA-Z]{2,} |
| Phone (US) | \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} |
| Date (ISO) | \d{4}-\d{2}-\d{2} |
| Time (24h) | \d{2}:\d{2}(?::\d{2})? |
| IP Address | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} |
| ZIP Code | \d{5}(?:-\d{4})? |
| UUID | [a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12} |
| Hex Color | #[a-fA-F0-9]{3}(?:[a-fA-F0-9]{3})? |
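As a closing sketch, several of the table's patterns can be collected in a dictionary and run over the same text in one pass. The dictionary keys, the function name `extract_all`, and the sample log line are assumptions made for this example; only a subset of the table is included.

```python
import re

# A few patterns copied from the table above, keyed by hypothetical labels.
PATTERNS = {
    "date_iso": r"\d{4}-\d{2}-\d{2}",
    "time_24h": r"\d{2}:\d{2}(?::\d{2})?",
    "ip_address": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "uuid": r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}",
}

def extract_all(text: str) -> dict[str, list[str]]:
    """Run every pattern over text and collect the matches per label."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}

line = (
    "2024-03-15 14:30:05 "
    "request=123e4567-e89b-12d3-a456-426614174000 ip=203.0.113.7"
)
for name, matches in extract_all(line).items():
    print(name, matches)
# date_iso ['2024-03-15']
# time_24h ['14:30:05']
# ip_address ['203.0.113.7']
# uuid ['123e4567-e89b-12d3-a456-426614174000']
```

The table patterns carry no `\b` anchors, so on messier input they can match inside longer tokens. Wrapping them in word boundaries, as the ZIP and SSN patterns earlier in this section do, is one way to tighten them.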