$ logforge parse nginx

Parse Nginx access logs → regex, Grok, Wazuh & rsyslog

Nginx writes one request per line to its access log, on most packaged installs at /var/log/nginx/access.log. The access_log directive (valid in http, server, and location blocks) chooses the destination, and the log_format directive (http block only) defines the layout. The predefined format is named combined and has the same layout Apache calls the Combined Log Format, so patterns written for one usually parse the other. A line is a fixed sequence of space-separated fields: the client address ($remote_addr); a literal dash in the slot where Apache logs the RFC 1413 identity; the authenticated user ($remote_user) or a dash; the local time inside square brackets ($time_local); the full request line in double quotes ($request, which is method + path + protocol as one blob); the numeric status code ($status); the response body size in bytes ($body_bytes_sent); and, because this is the combined variant, the Referer and User-Agent, each in double quotes.

The parsing traps are all about the quoting and the placeholder values. The request line and both header values are wrapped in double quotes precisely because they can contain spaces; a naive whitespace split will shred a User-Agent like "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) …" into several useless tokens. The bracketed timestamp uses the strftime-style %d/%b/%Y:%H:%M:%S %z form (03/Jul/2026:14:22:15 +0300), which is not ISO 8601 and needs its own strptime string. A missing Referer or user is rendered as a bare dash, not an empty string, so a field that is 'usually a URL' can legitimately be '-'. Bodies of zero bytes (health-check 200s that return nothing) show up as a literal 0 in nginx, because $body_bytes_sent is always numeric; Apache's %b writes a dash in that case, so a pattern meant to cover both servers should accept either.
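
To make the quoting trap concrete, here is a minimal Python sketch that contrasts a naive whitespace split with a quote-aware pattern and maps bare dashes to None. The regex, the group names, and the helper parse_combined are illustrative assumptions written for the stock combined layout; they are not LogForge output.

```python
import re

# Illustrative pattern for the stock combined layout (an assumption, not LogForge output).
COMBINED = re.compile(
    r'^(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_combined(line):
    """Return a dict of fields with bare dashes mapped to None, or None if no match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return {k: (None if v == "-" else v) for k, v in m.groupdict().items()}

line = ('198.51.100.23 - - [03/Jul/2026:14:22:19 +0300] "POST /login HTTP/1.1" '
        '401 231 "https://example.com/" '
        '"Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"')

naive = line.split()           # whitespace split ignores the quoting
fields = parse_combined(line)  # quote-aware parse keeps each field whole

print(len(naive))              # many more tokens than the nine logical fields
print(fields["user_agent"])    # the User-Agent survives as a single value
print(fields["remote_user"])   # None, because the log shows a bare dash
```

The size group accepts either digits or a dash so the same sketch tolerates Apache-style lines.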

For detection and SIEM work the fields that carry signal are the client IP (rate-limiting, geolocation, blocklists), the request path and method (scanning for /.env, /wp-admin, admin panels), and the status code (bursts of 401/403/404 from one source, or a 200 on a path that should never return 200). The User-Agent is weak evidence on its own but flags the obvious automated clients — curl, python-requests, kube-probe, and the long tail of vulnerability scanners that never bother to fake a browser string. Because the format is so stable and so widely deployed, a correct regex or Grok pattern for nginx combined is one of the highest-leverage parsers you can keep on hand.
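
As a rough illustration of the status-code signal, the sketch below counts 401/403/404 responses per client IP and flags sources that reach a cut-off. The record list, the threshold of 3, and the function name noisy_sources are hypothetical; a real rule would also bucket by time window, which this sketch omits.

```python
from collections import Counter

# Hypothetical pre-parsed records: (client_ip, status) pairs from an access log.
records = [
    ("198.51.100.23", 401),
    ("198.51.100.23", 401),
    ("198.51.100.23", 404),
    ("203.0.113.45", 200),
    ("203.0.113.45", 404),
]

SUSPICIOUS = {401, 403, 404}
THRESHOLD = 3  # assumed cut-off; tune for your own traffic

def noisy_sources(records, threshold=THRESHOLD):
    """Return the set of IPs with at least `threshold` suspicious statuses."""
    hits = Counter(ip for ip, status in records if status in SUSPICIOUS)
    return {ip for ip, count in hits.items() if count >= threshold}

flagged = noisy_sources(records)
print(flagged)
```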

What an Nginx access line looks like

The Combined sample below is fed verbatim into the engine to produce every parser on this page.

203.0.113.45 - - [03/Jul/2026:14:22:15 +0300] "GET /api/health HTTP/1.1" 200 2 "-" "kube-probe/1.29"
198.51.100.23 - - [03/Jul/2026:14:22:19 +0300] "POST /login HTTP/1.1" 401 231 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"

Detected fields

The engine classified this sample as freeform and consolidated 11 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • ip1 : ipv4
  • _lit1 : literal · literal
  • _lit2 : literal · literal
  • timestamp : timestamp
  • method : http_method
  • path : path
  • quoted_string : quoted_string · literal
  • status : http_status
  • number : number
  • url : url
  • user_agent : user_agent

Regex (named capture groups)

# sample: 203.0.113.45 - - [03/Jul/2026:14:22:15 +0300] "GET /api/health HTTP/1.1" 200 2 "-" "kube-probe/1.29"
# groups: ip1=203.0.113.45, timestamp=03/Jul/2026:14:22:15 +0300, method=GET, path=/api/health, status=200, number=2, url=-, user_agent=kube-probe/1.29
^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] "(?<method>[^"]*) (?<path>(?:/[^\s"']*|[A-Za-z]:[^\s"']*)) HTTP/1\.1" (?<status>\d{3}) (?<number>-?\d+(?:\.\d+)?) "(?<url>[^"]*)" "(?<user_agent>[^"]*)"$
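
If you want to try the generated pattern from Python, note that the re module spells named groups (?P<name>...) rather than (?<name>...). The sketch below rewrites the group syntax with a plain string replace (safe here only because the pattern contains no lookbehinds) and runs it against the first sample line. The helper name parse and the HTTP/2.0 variant of the line are assumptions added for illustration; the variant shows that the literal HTTP/1.1 anchor rejects other protocol versions.

```python
import re

# The generated pattern, verbatim. Python's re needs (?P<name>...) for named
# groups, so the (?<name>...) form is rewritten before compiling.
GENERATED = r'''^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] "(?<method>[^"]*) (?<path>(?:/[^\s"']*|[A-Za-z]:[^\s"']*)) HTTP/1\.1" (?<status>\d{3}) (?<number>-?\d+(?:\.\d+)?) "(?<url>[^"]*)" "(?<user_agent>[^"]*)"$'''
PATTERN = re.compile(GENERATED.replace("(?<", "(?P<"))

def parse(line):
    """Return the named groups as a dict, or None when the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

sample = ('203.0.113.45 - - [03/Jul/2026:14:22:15 +0300] '
          '"GET /api/health HTTP/1.1" 200 2 "-" "kube-probe/1.29"')
fields = parse(sample)
print(fields["method"], fields["path"], fields["status"])  # GET /api/health 200

# Hypothetical variant: the same request logged over HTTP/2.
h2 = sample.replace("HTTP/1.1", "HTTP/2.0")
print(parse(h2))  # None, because HTTP/1.1 is baked in as a literal anchor
```

Because the anchors come from a two-line sample, it is worth loosening the protocol literal and the \+ in the timezone offset before pointing the pattern at production traffic.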

Grok pattern (Logstash / Elastic)

# custom patterns
NGINX_NOTDQUOTE [^"]*

%{IPV4:ip1} - - \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{UNIXPATH:path} HTTP/1\.1" %{INT:status} %{NUMBER:number} "%{NGINX_NOTDQUOTE:url}" "%{NGINX_NOTDQUOTE:user_agent}"
  • note constant field "quoted_string" embedded as literal anchor "HTTP/1.1" (varying=false)
  • note field "url" (url): samples do not all match %{URI}; using %{NGINX_NOTDQUOTE} instead
  • note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: 203.0.113.45 - - [03/Jul/2026:14:22:15 +0300] "GET /api/health HTTP/1.1" 200 2 "-" "kube-probe/1.29"
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="nginx-freeform">
  <prematch>^\d+.\d+.\d+.\d+ </prematch>
</decoder>

<decoder name="nginx-freeform">
  <parent>nginx-freeform</parent>
  <regex>^(\d+.\d+.\d+.\d+) - - [(\d+/\w+/\d+:\d+:\d+:\d+ \p\d+)] "(\w+) (\S+) HTTP/1.1" (\d+) (\d+) "(\.+)" "(\.+)"</regex>
  <order>srcip, timestamp, method, path, status, number, url, user_agent</order>
</decoder>
  • note no stable literal prefix found — <prematch> anchors on the leading field pattern; tighten it for your environment
  • note field "ip1" mapped to Wazuh conventional field "srcip"
  • note field "url": free-text capture (\.+) bounded by a quote anchor — OS_Regex greediness may over-consume if the anchor repeats
  • note field "user_agent": free-text capture (\.+) bounded by end of line — OS_Regex greediness may over-consume if the anchor repeats
  • note constant field "quoted_string" embedded as literal anchor "HTTP/1.1"
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# nginx — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/nginx.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=nginx:%ip1:ipv4% - - [%timestamp:char-to{"extradata":"]"}%] "%method:word% %path:word% HTTP/1.1" %status:number% %number:number% "%url:char-to{"extradata":"\""}%" "%user_agent:char-to{"extradata":"\""}%"
  • note trailing literal "\"" reconstructed from line 1
  • note field "timestamp": samples do not uniformly match engine type "timestamp"; using a generic parser
  • note chosen parser types: ip1=ipv4, timestamp=char-to(]), method=word, path=word, status=number, number=number, url=char-to("), user_agent=char-to(")

FAQ

What is the difference between the nginx main, combined, and common log formats?
The 'common' format stops after the response size. combined is the Common Log Format plus two trailing quoted fields, the Referer and User-Agent; it is the one format nginx predefines and the one access_log uses when no format is named. 'main' is not built in: it is a conventional name people give their own custom log_format (many packaged nginx.conf files define one), and it carries whatever fields they defined, so always check the log_format directive rather than assuming.
Why does my nginx timestamp fail to parse as ISO 8601?
Because it is not ISO 8601. nginx writes the local time as 03/Jul/2026:14:22:15 +0300: day/abbreviated-month/year, a colon before the time, and a numeric UTC offset. Parse it with a %d/%b/%Y:%H:%M:%S %z strptime pattern (Grok's HTTPDATE handles it), and remember that nginx always writes the English month abbreviation, while strptime's %b is locale-sensitive and can fail to match it when the parser runs under a non-English locale.
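
A minimal Python sketch of that conversion, using the timestamp from the sample and assuming the process runs under the default C/English locale so that %b matches "Jul":

```python
from datetime import datetime, timezone

raw = "03/Jul/2026:14:22:15 +0300"

# %z consumes the numeric offset, so the result is timezone-aware.
ts = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

print(ts.isoformat())                           # 2026-07-03T14:22:15+03:00
print(ts.astimezone(timezone.utc).isoformat())  # 2026-07-03T11:22:15+00:00
```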
How do I split the nginx request field into method, path, and protocol?
The $request field is logged as a single quoted string like "GET /api/health HTTP/1.1". Capture the whole quoted value first, then split it on spaces into method, request URI, and HTTP version. Capturing it whole first avoids miscounting fields when the request line is malformed, empty, or contains unexpected raw spaces (encoded spaces such as %20 are harmless).
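
One way to do the second step is sketched below. The function name split_request and its policy of returning None for anything that does not end in an HTTP/ version token are assumptions for illustration; it takes the first token as the method and the last as the protocol, so raw spaces inside the URI stay in the URI.

```python
def split_request(request):
    """Split a captured $request value into (method, uri, protocol).

    Returns None when the value is not of the form 'METHOD URI HTTP/x'.
    """
    method, sep, rest = request.partition(" ")
    if not sep:
        return None  # no space at all: '-', empty, or binary junk
    uri, sep, protocol = rest.rpartition(" ")
    if not sep or not protocol.startswith("HTTP/"):
        return None  # missing protocol token
    return method, uri, protocol

print(split_request("GET /api/health HTTP/1.1"))  # ('GET', '/api/health', 'HTTP/1.1')
print(split_request("GET /a b HTTP/1.1"))         # ('GET', '/a b', 'HTTP/1.1')
print(split_request("-"))                         # None
```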
Can the same regex parse both nginx and Apache access logs?
For the combined format, usually yes — nginx combined and Apache's combined layout are identical field-for-field. The differences that bite are custom log_format lines, Apache's optional %v (vhost) or response-time fields, and whichever module added extra columns. Always validate the pattern against a real sample from each server rather than trusting that 'combined means combined'.

Try it on your own Nginx access lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
