Log Parsing
Parsing log files is a daily task for many developers. Whether you're debugging production issues or building monitoring tools, regular expressions are a quick way to extract structured data from log lines.
Apache/Nginx Access Logs
Combined Log Format
A typical Apache/Nginx access log line:
192.168.1.1 - john [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326 "https://example.com" "Mozilla/5.0"
Pattern to parse it, up to and including the response size (the trailing referrer and user agent are not captured):
^(?<ip>[\d.]+)\s+-\s+(?<user>\S+)\s+\[(?<timestamp>[^\]]+)\]\s+"(?<method>\w+)\s+(?<path>\S+)\s+(?<protocol>[^"]+)"\s+(?<status>\d+)\s+(?<bytes>\d+|-)
Captured groups:
- ip: Client IP address
- user: Authenticated user (or -)
- timestamp: Request timestamp
- method: HTTP method (GET, POST, etc.)
- path: Requested URL path
- protocol: HTTP protocol version
- status: Response status code
- bytes: Response size
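As a minimal sketch of how these groups might be used in code, the function below applies the pattern to one line and converts the numeric fields. The name `parseAccessLine` is hypothetical, and mapping `-` to `null` or `0` is an assumption about what the caller wants. The `bytes` group in this sketch accepts `-` as well as digits, since Apache writes `-` when no response body is sent.

```javascript
// Access log pattern from above, compiled once at module scope.
const ACCESS_LOG_PATTERN =
  /^(?<ip>[\d.]+)\s+-\s+(?<user>\S+)\s+\[(?<timestamp>[^\]]+)\]\s+"(?<method>\w+)\s+(?<path>\S+)\s+(?<protocol>[^"]+)"\s+(?<status>\d+)\s+(?<bytes>\d+|-)/;

// Hypothetical helper: returns a plain object, or null if the line does not match.
function parseAccessLine(line) {
  const match = ACCESS_LOG_PATTERN.exec(line);
  if (!match) return null;
  const { ip, user, timestamp, method, path, protocol, status, bytes } = match.groups;
  return {
    ip,
    user: user === '-' ? null : user, // "-" means no authenticated user
    timestamp,
    method,
    path,
    protocol,
    status: Number(status),
    bytes: bytes === '-' ? 0 : Number(bytes), // assumption: treat "-" as zero bytes
  };
}

const exampleLine =
  '192.168.1.1 - john [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326 "https://example.com" "Mozilla/5.0"';
console.log(parseAccessLine(exampleLine));
```

Returning `null` for non-matching lines lets the caller count or log malformed entries instead of silently dropping them.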
Extract Just Status Codes
Quick pattern to find all HTTP status codes:
"\s(\d{3})\s
Find Error Responses (4xx and 5xx)
"\s([45]\d{2})\s
Application Logs
Timestamp Extraction
Common timestamp formats:
ISO 8601:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?
Common log format:
\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4}
Log Levels
Extract log level from structured logs:
\b(DEBUG|INFO|WARN|ERROR|FATAL)\b
JSON Log Parsing
For JSON-structured logs, extract specific fields:
Extract error messages:
"error":\s*"([^"]+)"
Extract request IDs:
"request_id":\s*"([a-f0-9-]+)"
Stack Traces
Java Stack Trace
Extract the class, method, file, and line number from each frame of a Java stack trace:
at\s+(?<class>[\w.$]+)\.(?<method>\w+)\((?<file>\w+\.java):(?<line>\d+)\)
Python Traceback
File "(?<file>[^"]+)", line (?<line>\d+), in (?<function>\S+)
JavaScript Stack Trace
at\s+(?:(?<function>[\w.$<>]+)\s+\()?(?<file>[^:]+):(?<line>\d+):(?<column>\d+)\)?
Practical Tips
Performance Warning: When parsing large log files, compile your regex once and reuse it. Creating a new RegExp for each line is slow.
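A minimal sketch of the compile-once advice, using the Java stack frame pattern from earlier as the example. The pattern is defined once at module scope and the same object is reused for every line; `parseJavaFrames` and the sample trace are hypothetical.

```javascript
// Compiled once when the module loads, then reused for every line.
const JAVA_FRAME =
  /at\s+(?<class>[\w.$]+)\.(?<method>\w+)\((?<file>\w+\.java):(?<line>\d+)\)/;

// Hypothetical helper: collect the frames found in an array of log lines.
function parseJavaFrames(lines) {
  const frames = [];
  for (const text of lines) {
    const match = JAVA_FRAME.exec(text); // no RegExp is constructed inside the loop
    if (match) {
      const { class: className, method, file, line } = match.groups;
      frames.push({ className, method, file, line: Number(line) });
    }
  }
  return frames;
}

const trace = [
  'java.lang.IllegalStateException: user not found',
  '\tat com.example.UserService.findUser(UserService.java:42)',
  '\tat com.example.Main.main(Main.java:10)',
];
console.log(parseJavaFrames(trace));
```

The pattern deliberately has no `g` flag: a global regex keeps `lastIndex` between `exec` or `test` calls, which can cause skipped matches when the same object is reused across lines.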
Use Named Groups
Named groups make your code self-documenting:
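// Sample access log line (referrer and user agent omitted)
const line = '192.168.1.1 - john [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326';
const pattern = /^(?<ip>[\d.]+)\s.*?"(?<method>\w+)\s(?<path>\S+)/;
const match = line.match(pattern);
if (match) {
  console.log(match.groups.ip); // "192.168.1.1"
  console.log(match.groups.method); // "GET"
  console.log(match.groups.path); // "/api/users"
}
Handle Variations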
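Real logs have inconsistencies, such as the same keyword written in different cases. Use alternation (or a case-insensitive flag) to accept the variants:
(?:ERROR|Error|error):\s*(.+)
Test with Real Data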
Always test your patterns with actual log samples, including edge cases:
- Empty fields
- Special characters in paths
- Multiline log entries
- Malformed entries
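One way to make this testing repeatable is a small table of sample lines with the capture each one should produce. The sketch below checks the error pattern from the previous tip; the sample lines and the `runSamples` helper are hypothetical, and real samples should come from your own logs.

```javascript
const ERROR_PATTERN = /(?:ERROR|Error|error):\s*(.+)/;

// Hypothetical samples: [input line, expected capture, or null for "no match"].
const samples = [
  ['ERROR: disk full', 'disk full'],
  ['error:timeout', 'timeout'],
  ['Error: bad path /a b/c?x=1&y="z"', 'bad path /a b/c?x=1&y="z"'], // special characters
  ['ERROR: boom\n    at main (app.js:1:1)', 'boom'], // multiline: "." stops at the newline
  ['ERROR:', null], // empty field: (.+) needs at least one character
  ['INFO: all good', null],
  ['', null], // malformed or empty line
];

// Returns the cases whose actual capture differs from the expected one.
function runSamples(pattern, cases) {
  const failures = [];
  for (const [line, expected] of cases) {
    const match = pattern.exec(line);
    const actual = match ? match[1] : null;
    if (actual !== expected) failures.push({ line, expected, actual });
  }
  return failures;
}

console.log(runSamples(ERROR_PATTERN, samples)); // an empty array means every sample passed
```

Keeping the expected value next to each sample makes it obvious which edge case regressed when a pattern is later changed.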
Common Log Patterns Cheatsheet
| Log Type | Pattern |
|---|---|
| IP Address | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} |
| ISO Timestamp | \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} |
| UUID | [a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12} |
| HTTP Status | [1-5]\d{2} |
| Log Level | (DEBUG\|INFO\|WARN\|ERROR\|FATAL) |
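As a rough sketch of how the cheatsheet might be used together, the patterns can be kept in one lookup object and applied to each line. The names `PATTERNS` and `scanLine` and the sample line are hypothetical. The HTTP status pattern is left out on purpose: unanchored, `[1-5]\d{2}` matches any three-digit run, such as part of a year, so it needs surrounding context like the quote and spaces used in the access log section.

```javascript
// Cheatsheet patterns, compiled once. The log level uses the \b-delimited
// form from the Log Levels section to avoid matching inside longer words.
const PATTERNS = {
  ip: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/,
  isoTimestamp: /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/,
  uuid: /[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}/,
  logLevel: /\b(DEBUG|INFO|WARN|ERROR|FATAL)\b/,
};

// Hypothetical helper: returns the first match of each pattern found in the line.
function scanLine(line) {
  const found = {};
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    const match = pattern.exec(line);
    if (match) found[name] = match[0];
  }
  return found;
}

console.log(
  scanLine('2023-10-10T13:55:36Z ERROR 10.0.0.7 request 3f2a9c1e-1b2c-4d3e-8f90-abcdef123456 failed')
);
```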