What is the difference between Apache combined and common log format?

The 'common' format (CLF) ends after the response size: host, ident, user, time, request, status, bytes. 'combined' appends two quoted header fields, Referer and User-Agent. Both are just nicknames defined by LogFormat directives, so a given server logs whatever its active LogFormat says — inspect the config, not the file name.
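
As a rough illustration of that difference, the sketch below labels a line as common or combined by checking whether the two trailing quoted fields are present. The names `classify`, `CLF` and `COMBINED` are made up for this example, and the patterns assume a stock LogFormat with no escaped quotes inside the quoted fields.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
CLF = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)
# Combined = CLF followed by the quoted Referer and User-Agent headers.
COMBINED = CLF + r' "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'

COMMON_RE = re.compile("^" + CLF + "$")
COMBINED_RE = re.compile("^" + COMBINED + "$")


def classify(line: str) -> str:
    """Return 'combined', 'common', or 'unknown' for one access-log line."""
    line = line.rstrip("\n")
    if COMBINED_RE.match(line):
        return "combined"
    if COMMON_RE.match(line):
        return "common"
    return "unknown"
```

A line that comes back as "unknown" is a hint that the server uses a customized LogFormat rather than either stock nickname.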

Why does my Apache line have an extra field my parser does not expect?

Almost certainly a customized LogFormat. Common additions are a leading %v (virtual host), a %D/%T response-time column, %p (port), or %{X-Forwarded-For}i to capture the real client behind a proxy. Grab a representative sample from the actual server and regenerate the parser against it rather than assuming stock combined.
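
To make the link between the LogFormat string and the parser concrete, here is a minimal sketch that turns a small subset of LogFormat directives into a Python regex. The function name `logformat_to_regex`, the group names, and the choice of supported directives are assumptions for illustration; it also assumes request-header fields are wrapped in quotes in the format, and it does not cover the full directive set Apache supports.

```python
import re

# Regex fragments for a handful of LogFormat directives (illustrative subset).
DIRECTIVES = {
    "%h": r'(?P<host>\S+)',
    "%l": r'(?P<ident>\S+)',
    "%u": r'(?P<user>\S+)',
    "%t": r'\[(?P<time>[^\]]+)\]',
    "%r": r'(?P<request>[^"]*)',
    "%>s": r'(?P<status>\d{3})',
    "%b": r'(?P<bytes>\d+|-)',
    "%v": r'(?P<vhost>\S+)',
    "%p": r'(?P<port>\d+)',
    "%D": r'(?P<duration_us>\d+)',  # time taken, microseconds
    "%T": r'(?P<duration_s>\d+)',   # time taken, seconds
}
HEADER = re.compile(r'%\{([^}]+)\}i')
TOKEN = re.compile(r'%\{[^}]+\}i|%>s|%[a-zA-Z]')


def logformat_to_regex(fmt: str) -> re.Pattern:
    """Build an anchored regex from a LogFormat string (subset only)."""
    fmt = fmt.replace('\\"', '"')  # undo the config-file quote escaping
    parts, pos = [], 0
    for m in TOKEN.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))  # literal text between directives
        token = m.group(0)
        header = HEADER.fullmatch(token)
        if header:
            # %{X-Forwarded-For}i becomes a group named x_forwarded_for
            name = re.sub(r'\W', '_', header.group(1)).lower()
            parts.append(f'(?P<{name}>[^"]*)')
        elif token in DIRECTIVES:
            parts.append(DIRECTIVES[token])
        else:
            raise ValueError(f"unsupported directive: {token}")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

Passing the format copied from the server config, for example one with a leading %v and a trailing %D, yields a pattern with `vhost` and `duration_us` groups alongside the usual combined fields.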

How do I recover the real client IP when Apache is behind a load balancer?

The %h field will be the proxy or load-balancer address. To get the originating client you need the X-Forwarded-For header, which is only present if the LogFormat includes %{X-Forwarded-For}i. If it does, capture that field and take the left-most address in the comma-separated list as the client (trusting it only as far as you trust your proxy chain).
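
The left-most-address rule can be sketched as follows, assuming the X-Forwarded-For value has already been captured as a string. `client_from_xff` is a hypothetical helper name; it returns None when the header was logged as "-" or when the first entry does not parse as an IP address.

```python
import ipaddress
from typing import Optional


def client_from_xff(xff: str) -> Optional[str]:
    """Return the left-most X-Forwarded-For entry if it is an IP, else None."""
    if not xff or xff.strip() == "-":
        return None  # header absent: Apache logs "-"
    first = xff.split(",")[0].strip()
    try:
        # ip_address accepts both IPv4 and IPv6 literals
        return str(ipaddress.ip_address(first))
    except ValueError:
        return None  # e.g. "unknown" or other junk supplied upstream
```

Because clients can send their own X-Forwarded-For header, the value is only as reliable as the proxies that append to it.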

Parse Apache access (combined) logs → regex, Grok, Wazuh & rsyslog

What an Apache access (combined) line looks like

The combined-format sample below is fed verbatim into the engine to produce every parser on this page.

192.0.2.10 - jdoe [03/Jul/2026:14:22:15 +0300] "GET /wp-admin/ HTTP/1.1" 302 512 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/126.0"
203.0.113.99 - - [03/Jul/2026:14:22:40 +0300] "GET /.env HTTP/1.1" 404 153 "-" "curl/8.6.0"

Detected fields

The engine classified this sample as freeform and consolidated 11 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

ip1 : ipv4
_lit1 : literal · literal
literal : literal
timestamp : timestamp
method : http_method · literal
quoted_string : quoted_string
quoted_string2 : quoted_string · literal
status : http_status
number : number
url : url
user_agent : user_agent

Regex (named capture groups)

# sample: 192.0.2.10 - jdoe [03/Jul/2026:14:22:15 +0300] "GET /wp-admin/ HTTP/1.1" 302 512 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/126.0"
# groups: ip1=192.0.2.10, literal=jdoe, timestamp=03/Jul/2026:14:22:15 +0300, quoted_string=/wp-admin/, status=302, number=512, url=https://example.com/, user_agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/126.0
^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - (?<literal>(?:[A-Za-z]+|-)) \[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] "GET (?<quoted_string>(?:/[A-Za-z]+-[A-Za-z]+/|/\.[A-Za-z]+)) HTTP/1\.1" (?<status>\d{3}) (?<number>-?\d+(?:\.\d+)?) "(?<url>[^"]*)" "(?<user_agent>[^"]*)"$
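
Python's re module spells named groups as (?P<name>...) and rejects the (?<name>...) form used above, so the pattern needs a small rewrite before it compiles there. The sketch below does that conversion and runs the result against the two sample lines; the variable names are illustrative. It also shows that the pattern is fitted closely to the sample: the method, protocol version and path shapes are anchored, so other requests will not match until those parts are loosened.

```python
import re

# The generated pattern, copied verbatim from above.
GENERATED = (
    r'^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - (?<literal>(?:[A-Za-z]+|-)) '
    r'\[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] '
    r'"GET (?<quoted_string>(?:/[A-Za-z]+-[A-Za-z]+/|/\.[A-Za-z]+)) HTTP/1\.1" '
    r'(?<status>\d{3}) (?<number>-?\d+(?:\.\d+)?) '
    r'"(?<url>[^"]*)" "(?<user_agent>[^"]*)"$'
)

# Rewrite "(?<name>" to "(?P<name>"; the lookahead leaves (?<= and (?<! alone.
PATTERN = re.compile(re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", GENERATED))

SAMPLES = [
    '192.0.2.10 - jdoe [03/Jul/2026:14:22:15 +0300] "GET /wp-admin/ HTTP/1.1" 302 512 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/126.0"',
    '203.0.113.99 - - [03/Jul/2026:14:22:40 +0300] "GET /.env HTTP/1.1" 404 153 "-" "curl/8.6.0"',
]

# Collect the named groups for each sample line (None if a line fails to match).
parsed = [m.groupdict() if (m := PATTERN.match(line)) else None for line in SAMPLES]

# A request outside the sample's shape: same format, different path.
other = '192.0.2.10 - - [03/Jul/2026:14:23:00 +0300] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.6.0"'
other_match = PATTERN.match(other)  # None, because the path alternatives are sample-specific
```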

Grok pattern (Logstash / Elastic)

# custom patterns
APACHE_NOTDQUOTE [^"]*

%{IPV4:ip1} - %{NOTSPACE:literal} \[%{HTTPDATE:timestamp}\] "GET %{NOTSPACE:quoted_string} HTTP/1\.1" %{INT:status} %{NUMBER:number} "%{APACHE_NOTDQUOTE:url}" "%{APACHE_NOTDQUOTE:user_agent}"

note constant field "method" embedded as literal anchor "GET" (varying=false)
note constant field "quoted_string2" embedded as literal anchor "HTTP/1.1" (varying=false)
note field "url" (url): samples do not all match %{URI}; using %{APACHE_NOTDQUOTE} instead
note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: 192.0.2.10 - jdoe [03/Jul/2026:14:22:15 +0300] "GET /wp-admin/ HTTP/1.1" 302 512 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/5
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="apache-freeform">
  <prematch>^\d+.\d+.\d+.\d+ </prematch>
</decoder>

<decoder name="apache-freeform">
  <parent>apache-freeform</parent>
  <regex>^(\d+.\d+.\d+.\d+) - (\w+) [(\d+/\w+/\d+:\d+:\d+:\d+ \p\d+)] "GET (\S+) HTTP/1.1" (\d+) (\d+) "(\.+)" "(\.+)"</regex>
  <order>srcip, literal, timestamp, quoted_string, status, number, url, user_agent</order>
</decoder>

note no stable literal prefix found — <prematch> anchors on the leading field pattern; tighten it for your environment
note field "ip1" mapped to Wazuh conventional field "srcip"
note field "url": free-text capture (\.+) bounded by a quote anchor — OS_Regex greediness may over-consume if the anchor repeats
note field "user_agent": free-text capture (\.+) bounded by end of line — OS_Regex greediness may over-consume if the anchor repeats
note constant field "method" embedded as literal anchor "GET"
note constant field "quoted_string2" embedded as literal anchor "HTTP/1.1"
note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# apache — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/apache.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=apache:%ip1:ipv4% - %literal:word% [%timestamp:char-to{"extradata":"]"}%] "GET %quoted_string:word% HTTP/1.1" %status:number% %number:number% "%url:char-to{"extradata":"\""}%" "%user_agent:char-to{"extradata":"\""}%"

note trailing literal "\"" reconstructed from line 1
note field "timestamp": samples do not uniformly match engine type "timestamp"; using a generic parser
note chosen parser types: ip1=ipv4, literal=word, timestamp=char-to(]), quoted_string=word, status=number, number=number, url=char-to("), user_agent=char-to(")

Try it on your own Apache access (combined) lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
