
$ logforge parse wazuh

Parse Wazuh alert logs → regex, Grok, Wazuh & rsyslog

Parsing Wazuh output is one step removed from the other sources on this list, and it helps to say so up front: Wazuh has ALREADY decoded the raw log. Its analysis engine ran the original event (an sshd line, a Windows event, a firewall log) through its own decoders and rules. What you are now parsing is Wazuh's ALERT output, the alerts.json file (typically /var/ossec/logs/alerts/alerts.json), which holds one alert per line, each a single JSON object. So you are not extracting fields from a raw log; you are consuming the structured, already-enriched verdict Wazuh produced about that log.
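Because the file is one JSON object per line, a plain JSON decoder applied line by line is all the parsing it needs. Below is a minimal Python sketch that assumes the file has been completely written, not tailed while Wazuh is still appending to it. iter_alerts is a hypothetical helper name:

```python
import json
from typing import Iterable, Iterator


def iter_alerts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded alert per non-blank line of alerts.json content.

    iter_alerts is a hypothetical helper, not part of Wazuh or LogForge.
    Any iterable of strings works, including an open text file.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # A line cut off mid-write (file still being appended) ends up here.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
```

To read the real file, pass an open handle, for example open("/var/ossec/logs/alerts/alerts.json", encoding="utf-8"), and adjust the path for your install.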

Each alert JSON carries a consistent envelope. rule is the object that fired:
- rule.level is the severity on Wazuh's 0-15 scale. Levels 12 and above are generally treated as critical, and the level is your primary triage axis.
- rule.id is the rule identifier, serialized as a numeric string such as "5712".
- rule.description is the human label ('sshd: authentication failure').
- rule.groups categorizes the alert (authentication_failed, syscheck, etc.).

agent identifies the endpoint through agent.id and agent.name (and agent.ip). decoder.name tells you which Wazuh decoder parsed the original line. Then comes the enriched payload. data.* holds the fields Wazuh's decoder EXTRACTED from the raw log, such as data.srcip, data.srcuser, data.dstuser and data.srcport. These are exactly the structured fields you would otherwise have had to parse yourself. full_log preserves the original raw line for reference, and location records where it came from (the file path or the Windows channel). timestamp is ISO 8601 with a millisecond fraction and a numeric offset written without a colon (2026-07-03T14:22:15.123+0000).
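Selecting envelope fields amounts to dotted-path lookup into the decoded object. Below is a minimal Python sketch. get_path and parse_timestamp are hypothetical helper names, and the timestamp format string assumes every timestamp carries fractional seconds, as in the sample on this page:

```python
from datetime import datetime
from typing import Any


def get_path(alert: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'rule.level' or 'data.srcip' into an alert.

    Returns default when any step is missing, which is the normal case for
    decoder-dependent keys under data.*.
    """
    node: Any = alert
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def parse_timestamp(value: str) -> datetime:
    """Parse a Wazuh timestamp such as '2026-07-03T14:22:15.123+0000'.

    %z accepts the colon-less +0000 offset; %f accepts the 3-digit fraction.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
```

The default argument makes a missing data.srcport come back as None, so a missing field does not raise a KeyError.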

In practice, parsing Wazuh alerts means selecting JSON paths from a mostly stable schema. The exception is data.*, which varies by which decoder fired. An sshd alert populates data.srcip and data.srcuser, while a web alert populates data.url and data.id. Treat data.* as an open, event-type-dependent sub-object and expect any given key to be absent.

For detection and correlation, the fields that matter are:
- rule.level and rule.id (severity and exact rule)
- rule.groups (class of event)
- agent.name (which host)
- data.srcip and data.srcuser, as extracted by the decoder (attacker and account)
- full_log, when you need to see the untouched original

Because Wazuh has normalized across source types, a single pipeline keyed on rule.level, rule.groups, and data.srcip covers every log type Wazuh ingests. This holds as long as the decoder for that type populates data.srcip.
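As an illustration of such a pipeline, the Python sketch below counts high-severity alerts per attacker IP and host. The function name srcip_hits and the default threshold of 10 are hypothetical choices, not Wazuh defaults. The sketch assumes rule.level is a JSON integer, as it is in the sample:

```python
from collections import Counter
from typing import Iterable, Optional


def srcip_hits(
    alerts: Iterable[dict],
    min_level: int = 10,
    group: Optional[str] = None,
) -> Counter:
    """Count alerts per (data.srcip, agent.name) at or above min_level.

    min_level=10 is an arbitrary example threshold. If group is given, only
    alerts whose rule.groups list contains it are counted. Alerts without
    data.srcip are skipped: data.* depends on the decoder, so absence is normal.
    """
    counts: Counter = Counter()
    for alert in alerts:
        rule = alert.get("rule", {})
        srcip = alert.get("data", {}).get("srcip")
        if srcip is None or rule.get("level", 0) < min_level:
            continue
        if group is not None and group not in rule.get("groups", []):
            continue
        counts[(srcip, alert.get("agent", {}).get("name", "unknown"))] += 1
    return counts
```

Every lookup uses .get with a fallback, so an alert from a decoder that emits no data.srcip is skipped rather than crashing the loop.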


What a Wazuh alert line looks like

The JSON sample below is fed verbatim into the engine to produce every parser on this page.

{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}
{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}

Detected fields

The engine classified this sample as json and consolidated 9 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • timestamp : timestamp
  • rule_level : number
  • rule_description : quoted_string
  • rule_id : number
  • agent_id : number · literal
  • agent_name : quoted_string · literal
  • data_srcip : ipv4
  • data_srcuser : username
  • location : path · literal
  • note gap before column "data_srcuser" differs across lines; using line 1
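The detected field names look like the nested JSON keys joined with underscores, so rule.level appears as rule_level. The Python sketch below shows that flattening. It assumes this is how the names are formed, and flatten is a hypothetical helper. Passing sep="." instead gives the dotted form (data.srcip) used elsewhere on this page:

```python
def flatten(obj: dict, sep: str = "_", prefix: str = "") -> dict:
    """Flatten nested JSON objects into one level, joining keys with sep.

    Lists (for example rule.groups) are kept as values, not expanded.
    """
    flat: dict = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat
```

Applied to the first sample line, this yields the nine keys listed above. The second line yields eight because it has no data.srcip.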

Regex (named capture groups)

# sample: {"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}
# groups: timestamp=2026-07-03T14:22:15.123+0000, rule_level=10, rule_description=sshd: brute force trying to get access to the system, rule_id=5712, data_srcip=203.0.113.45, data_srcuser=admin
^(?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"/var/log/auth\.log").*$
  • note input is JSON — use a JSON parser (jq, Logstash json filter, …) instead of a regex where possible
  • note a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
  • note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pattern against more sample lines
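Python's re module spells named groups (?P<name>...). The sketch below ports the pattern with only that change. It compares the regex with json.loads on a hypothetical line whose description contains JSON-escaped quotes, which is the kind of input the JSON-parser note warns about:

```python
import json
import re

# Same lookahead pattern as above; only (?<name>...) -> (?P<name>...) changed.
WAZUH_RE = re.compile(
    r'^(?=.*?"timestamp":"(?P<timestamp>[^"]*)")'
    r'(?=.*?"level":(?P<rule_level>-?\d+(?:\.\d+)?))'
    r'(?=.*?"description":"(?P<rule_description>[^"]*)")'
    r'(?=.*?"id":"(?P<rule_id>[^"]*)")'
    r'(?=.*?"id":"003")'
    r'(?=.*?"name":"web01")'
    r'(?=.*?"srcip":"(?P<data_srcip>[^"]*)"|)'
    r'(?=.*?"srcuser":"(?P<data_srcuser>[^"]*)")'
    r'(?=.*?"location":"/var/log/auth\.log").*$'
)

LINE2 = (
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,'
    '"description":"Login session opened","id":"5501"},'
    '"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},'
    '"location":"/var/log/auth.log"}'
)
# Hypothetical variant: a description containing JSON-escaped quotes.
TRICKY = LINE2.replace("Login session opened", r'User \"root\" logged in')


def via_regex(line: str):
    """Return the named groups, or None if the pattern does not match."""
    m = WAZUH_RE.match(line)
    return m.groupdict() if m else None


def via_json(line: str) -> str:
    """Return rule.description as decoded by a real JSON parser."""
    return json.loads(line)["rule"]["description"]


print(via_regex(LINE2)["data_srcip"])         # None: the empty branch matched
print(via_regex(TRICKY)["rule_description"])  # cut off at the escaped quote
print(via_json(TRICKY))                       # User "root" logged in
```

The regex stops at the first quote character, including an escaped one, while json.loads decodes the escape.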

Grok pattern (Logstash / Elastic)

# custom patterns
WAZUH_NOTDQUOTE [^"]*

\{"timestamp":"%{TIMESTAMP_ISO8601:timestamp}","rule":\{"level":%{NUMBER:rule_level},"description":"%{WAZUH_NOTDQUOTE:rule_description}","id":"%{NUMBER:rule_id}"\},"agent":\{"id":"003","name":"web01"\},"data":\{(?:"srcip":"%{IPV4:data_srcip}",)?"srcuser":"%{USERNAME:data_srcuser}"\},"location":"/var/log/auth\.log"\}
  • note json input — consider the Logstash json codec/filter instead of grok
  • note constant field "agent_id" embedded as literal anchor "003" (varying=false)
  • note constant field "agent_name" embedded as literal anchor "web01" (varying=false)
  • note constant field "location" embedded as literal anchor "/var/log/auth.log" (varying=false)
  • note 1 optional field(s) wrapped in (?:…)? inline regex — grok has no native optional syntax
  • note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — gap literals were taken from line 1 and may not represent every line
  • note the optional group covers only the "srcip" key inside "data", so the pattern matches both sample lines. Real alerts that carry extra keys (rule.groups, decoder, full_log) or use a different key order will still _grokparsefailure, which is another reason to prefer the json filter
  • note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: {"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="wazuh-json">
  <prematch>^{</prematch>
  <plugin_decoder>JSON_Decoder</plugin_decoder>
</decoder>
  • note JSON input: emitted a JSON_Decoder plugin decoder — Wazuh extracts every key automatically as dynamic fields (nested keys become dotted names)
  • note field names above are what the other LogForge generators use; JSON_Decoder will use the raw JSON keys instead
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# wazuh — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/wazuh.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=wazuh:{"timestamp":"%timestamp:date-rfc5424%","rule":{"level":%rule_level:number%,"description":"%rule_description:char-to{"extradata":"\""}%","id":"%rule_id:number%"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"%data_srcip:ipv4%","srcuser":"%data_srcuser:char-to{"extradata":"\""}%"},"location":"/var/log/auth.log"}
rule=wazuh:{"timestamp":"%timestamp:date-rfc5424%","rule":{"level":%rule_level:number%,"description":"%rule_description:char-to{"extradata":"\""}%","id":"%rule_id:number%"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"%data_srcuser:char-to{"extradata":"\""}%"},"location":"/var/log/auth.log"}
  • note json structure: rsyslog mmjsonparse handles CEE/JSON natively — consider action(type="mmjsonparse") instead of this rulebase
  • note trailing literal "\"}" reconstructed from line 1
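  • note chosen parser types: timestamp=date-rfc5424, rule_level=number, rule_description=char-to("), rule_id=number, data_srcip=ipv4, data_srcuser=char-to("). RFC 5424 writes the offset as +00:00 while Wazuh writes +0000, so confirm with the lognormalizer tool that date-rfc5424 accepts these timestamps, and fall back to char-to(") for timestamp if it does not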
  • note optional columns (data_srcip): liblognorm has no optional parts within a single rule — emitted a second rule variant with only the always-present columns (max 2 variants; lines with other column combinations will not match and need extra rule= lines)

FAQ

What am I actually parsing when I parse Wazuh logs?
Wazuh's alert output, not a raw log. Wazuh already ran the original event through its own decoders and rules; alerts.json contains one JSON alert per line describing the verdict. So you consume Wazuh's enriched, structured output — the raw-log parsing has already been done for you and preserved in fields like data.* and full_log.
What does rule.level mean in a Wazuh alert?
It is the alert severity on Wazuh's 0-15 scale, where higher is more serious (12 and above is typically critical). It is the primary triage axis: filter or page on rule.level thresholds, and use rule.id for the exact rule and rule.groups for the category (authentication_failed, syscheck, etc.).
What is in the data.* object of a Wazuh alert?
The fields Wazuh's decoder extracted from the original raw log — for an sshd alert that is data.srcip, data.srcuser, data.srcport; for a web alert it might be data.url and data.id. It is event-type-dependent, so treat data.* as an open sub-object whose keys depend on which decoder fired.
Where do I find the original raw log inside a Wazuh alert?
In the full_log field, which preserves the untouched original line, and location, which records where it came from (a file path or Windows channel). Use full_log when you need to see exactly what Wazuh decoded, and data.* when you want the already-extracted structured fields.

Try it on your own Wazuh alert lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
