What am I actually parsing when I parse Wazuh logs?

Wazuh's alert output, not a raw log. Wazuh already ran the original event through its own decoders and rules; alerts.json contains one JSON alert per line describing the verdict. So you consume Wazuh's enriched, structured output — the raw-log parsing has already been done for you and preserved in fields like data.* and full_log.
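Because alerts.json is JSON Lines (one complete alert object per line), reading it needs only a JSON decoder, not a regex. Below is a minimal Python sketch. The default path /var/ossec/logs/alerts/alerts.json is an assumption based on a standard manager install. The inline records are trimmed-down illustrations, not complete alerts.

```python
import json
from typing import Iterable, Iterator

# Assumed default location on a Wazuh manager; adjust for your install.
ALERTS_PATH = "/var/ossec/logs/alerts/alerts.json"


def iter_alerts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded alert per non-blank JSON Lines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        yield json.loads(line)


def read_alerts(path: str = ALERTS_PATH) -> list[dict]:
    """Read every alert from a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        return list(iter_alerts(fh))


if __name__ == "__main__":
    # Trimmed illustration records, not full Wazuh alerts.
    sample = [
        '{"rule":{"level":10,"id":"5712"},"data":{"srcip":"203.0.113.45"}}',
        "",
        '{"rule":{"level":5,"id":"5501"},"data":{"srcuser":"jdoe"}}',
    ]
    for alert in iter_alerts(sample):
        print(alert["rule"]["id"], alert["rule"]["level"])
```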

What does rule.level mean in a Wazuh alert?

It is the alert severity on Wazuh's 0-15 scale, where higher is more serious (12 and above is typically critical). It is the primary triage axis: filter or page on rule.level thresholds, and use rule.id for the exact rule and rule.groups for the category (authentication_failed, syscheck, etc.).
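Here is a minimal Python sketch of triage on rule.level and rule.groups. The thresholds are illustrative assumptions, not Wazuh defaults: page at 12 and above, review at 7 and above. The groups in the example alert are also made up for illustration. Tune all of these to your own alerting policy.

```python
# Illustrative thresholds (assumptions, not Wazuh defaults).
PAGE_LEVEL = 12
REVIEW_LEVEL = 7


def triage(alert: dict) -> str:
    """Bucket an alert by rule.level; a missing level is treated as 0."""
    level = int(alert.get("rule", {}).get("level", 0))
    if level >= PAGE_LEVEL:
        return "page"
    if level >= REVIEW_LEVEL:
        return "review"
    return "ignore"


def in_group(alert: dict, group: str) -> bool:
    """True if rule.groups (a list of category names) contains the group."""
    return group in alert.get("rule", {}).get("groups", [])


if __name__ == "__main__":
    # Hypothetical alert; the groups list is invented for illustration.
    alert = {"rule": {"level": 10, "id": "5712",
                      "groups": ["sshd", "authentication_failed"]}}
    print(triage(alert), in_group(alert, "authentication_failed"))
```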

What is in the data.* object of a Wazuh alert?

The fields Wazuh's decoder extracted from the original raw log — for an sshd alert that is data.srcip, data.srcuser, data.srcport; for a web alert it might be data.url and data.id. It is event-type-dependent, so treat data.* as an open sub-object whose keys depend on which decoder fired.
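The keys under data depend on which decoder fired, so code that reads data.* should treat every key as optional. A small Python sketch follows. The helper name data_fields is hypothetical.

```python
def data_fields(alert: dict, *keys: str) -> dict:
    """Return only those requested data.* keys that this alert carries.

    The key set depends on the decoder that fired, so absent keys are
    omitted instead of raising KeyError.
    """
    data = alert.get("data") or {}
    return {key: data[key] for key in keys if key in data}


if __name__ == "__main__":
    sshd_alert = {"data": {"srcip": "203.0.113.45", "srcuser": "admin"}}
    session_alert = {"data": {"srcuser": "jdoe"}}
    for alert in (sshd_alert, session_alert):
        print(data_fields(alert, "srcip", "srcuser", "srcport"))
```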

Where do I find the original raw log inside a Wazuh alert?

In the full_log field, which preserves the untouched original line, and location, which records where it came from (a file path or Windows channel). Use full_log when you need to see exactly what Wazuh decoded, and data.* when you want the already-extracted structured fields.
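One way to combine the two is to use a decoder-extracted field when it exists and fall back to full_log otherwise. Below is a Python sketch. The function name and the choice of srcip as the preferred field are illustrative assumptions. The full_log value in the example is a placeholder, not a real log line.

```python
def describe(alert: dict) -> str:
    """Prefer a structured data.* value; fall back to the raw full_log line."""
    data = alert.get("data") or {}
    source = alert.get("location", "?")
    if "srcip" in data:
        return f"{source}: srcip={data['srcip']}"
    # No decoder-extracted IP: show the untouched original line instead.
    return f"{source}: {alert.get('full_log', '')}"


if __name__ == "__main__":
    print(describe({"location": "/var/log/auth.log",
                    "data": {"srcip": "203.0.113.45"}}))
    print(describe({"location": "/var/log/auth.log",
                    "full_log": "placeholder raw line"}))
```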

Parse Wazuh alert logs → regex, Grok, Wazuh & rsyslog

What a Wazuh alert line looks like

The JSON sample below is fed verbatim into the engine to produce every parser on this page.

{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}
{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}

Detected fields

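The engine classified this sample as json and consolidated 9 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured. Both sample lines come from one agent (003, web01) and one log file. As a result, the regex-based, Grok and rulebase patterns below will reject alerts from any other agent or location; the JSON-parser outputs do not have this limitation. Add more varied sample lines before relying on those patterns.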

timestamp : timestamp
rule_level : number
rule_description : quoted_string
rule_id : number
agent_id : number · literal
agent_name : quoted_string · literal
data_srcip : ipv4
data_srcuser : username
location : path · literal

note gap before column "data_srcuser" differs across lines; using line 1
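The field names above are the nested JSON keys joined with underscores. The "literal" marker means a field was present with the same value on every line. Below is a Python sketch of both ideas, run on the two sample lines. It illustrates the concept only and is not LogForge's actual engine.

```python
import json

SAMPLE = [
    '{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}',
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}',
]


def flatten(obj: dict, prefix: str = "", sep: str = "_") -> dict:
    """Flatten nested objects: {"rule": {"level": 10}} -> {"rule_level": 10}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def constant_fields(lines: list[str]) -> set[str]:
    """Fields present on every line with an identical value."""
    rows = [flatten(json.loads(line)) for line in lines]
    common = set(rows[0])
    for row in rows[1:]:
        common &= set(row)  # drop fields missing on any line (e.g. data_srcip)
    return {k for k in common if all(row[k] == rows[0][k] for row in rows)}


if __name__ == "__main__":
    print(sorted(flatten(json.loads(SAMPLE[0]))))
    print(sorted(constant_fields(SAMPLE)))
```

Passing sep="." produces dotted names such as rule.level instead. That matches how the Wazuh JSON_Decoder names nested keys.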

Regex (named capture groups)

# sample: {"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}
# groups: timestamp=2026-07-03T14:22:15.123+0000, rule_level=10, rule_description=sshd: brute force trying to get access to the system, rule_id=5712, data_srcip=203.0.113.45, data_srcuser=admin
^(?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"/var/log/auth\.log").*$

note input is JSON — use a JSON parser (jq, Logstash json filter, …) instead of a regex where possible
note a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pattern against more sample lines
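Each field lives in its own lookahead, and the empty alternative in `(?=…"srcip"…|)` makes srcip optional. The Python sketch below checks both behaviours against the two sample lines. Python's re module spells named groups `(?P<name>…)`, so the sketch rewrites only that syntax. It leaves any `(?<=` or `(?<!` lookbehind untouched.

```python
import re

# The page's PCRE-style pattern, copied verbatim.
PCRE_PATTERN = r'^(?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"/var/log/auth\.log").*$'

# Rewrite "(?<name" to "(?P<name"; "(?<=" and "(?<!" are not followed by a
# letter or underscore, so lookbehinds are left alone.
PY_PATTERN = re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", PCRE_PATTERN)
ALERT_RE = re.compile(PY_PATTERN)

SAMPLE = [
    '{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}',
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}',
]

if __name__ == "__main__":
    for line in SAMPLE:
        match = ALERT_RE.match(line)
        # data_srcip is None on line 2: the empty alternative matched.
        print(match.groupdict() if match else None)
```

Even when this works, json.loads on the same lines yields every field without any pattern at all.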

Grok pattern (Logstash / Elastic)

# custom patterns
WAZUH_NOTDQUOTE [^"]*

\{"timestamp":"%{TIMESTAMP_ISO8601:timestamp}","rule":\{"level":%{NUMBER:rule_level},"description":"%{WAZUH_NOTDQUOTE:rule_description}","id":"%{NUMBER:rule_id}"\},"agent":\{"id":"003","name":"web01"\},"data":\{(?:"srcip":"%{IPV4:data_srcip}",)?"srcuser":"%{USERNAME:data_srcuser}"\},"location":"/var/log/auth\.log

note json input — consider the Logstash json codec/filter instead of grok
note constant field "agent_id" embedded as literal anchor "003" (varying=false)
note constant field "agent_name" embedded as literal anchor "web01" (varying=false)
note constant field "location" embedded as literal anchor "/var/log/auth.log" (varying=false)
note 1 optional field ("srcip":"…" plus its trailing comma) wrapped in (?:…)? inline regex; grok has no native optional syntax
note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — gap literals were taken from line 1 and may not represent every line
note with the optional group around the whole "srcip" member, the pattern matches both sample lines; alerts whose data.* carries other keys (e.g. data.srcport) would still _grokparsefailure and need per-shape patterns or additional match entries
note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir
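The placement of the optional group decides whether line 2, which has no data.srcip, can match a linear template. The Python sketch below compares two placements. In the first, the group opens right after `web01`, so its leading quote closes that value. In the second, the group wraps the complete `"srcip":"…",` member, comma included. The sub-patterns are simplified stand-ins for the grok definitions (assumptions), not the real TIMESTAMP_ISO8601, IPV4 or USERNAME.

```python
import re

# Simplified stand-ins for grok building blocks (assumptions).
TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})"
NUM = r"\d+"
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
USER = r"[a-zA-Z0-9._-]+"

HEAD = (r'\{"timestamp":"(?P<timestamp>' + TS + r')","rule":\{"level":(?P<rule_level>'
        + NUM + r'),"description":"(?P<rule_description>[^"]*)","id":"(?P<rule_id>'
        + NUM + r')"\},"agent":\{"id":"003","name":"web01')
TAIL = r'"srcuser":"(?P<data_srcuser>' + USER + r')"\},"location":"/var/log/auth\.log'

# Group starts with the quote that closes "web01"; when it is skipped, that
# quote and the "},"data":{ prefix are never matched.
MISPLACED = re.compile(
    HEAD + r'(?:"\},"data":\{"srcip":"(?P<data_srcip>' + IPV4 + r'))?",' + TAIL)
# Group wraps the whole "srcip":"...", member, trailing comma included.
WRAPPED = re.compile(
    HEAD + r'"\},"data":\{(?:"srcip":"(?P<data_srcip>' + IPV4 + r')",)?' + TAIL)

SAMPLE = [
    '{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}',
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}',
]

if __name__ == "__main__":
    for name, regex in (("misplaced", MISPLACED), ("wrapped", WRAPPED)):
        print(name, [bool(regex.search(line)) for line in SAMPLE])
```

A rule of thumb that follows is to make the optional group start and end on JSON member boundaries, so skipping it leaves the surrounding punctuation intact.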

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: {"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="wazuh-json">
  <prematch>^{</prematch>
  <plugin_decoder>JSON_Decoder</plugin_decoder>
</decoder>

<!-- ============================================================
     ALERT RULE (starter) — put this in a RULES file, e.g.
     /var/ossec/etc/rules/local_rules.xml. Decoders and rules live
     in SEPARATE files. The rule matches the decoder above through
     <decoded_as>; set <level> and add <field>/<match> conditions so
     it alerts only on the events you care about. Rule ids 100000+
     are the user range — change them if they collide with yours.
     ============================================================ -->
<group name="wazuh,">
  <rule id="100000" level="3">
    <decoded_as>wazuh-json</decoded_as>
    <description>wazuh: decoded event</description>
  </rule>
</group>

note JSON input: emitted a JSON_Decoder plugin decoder — Wazuh extracts every key automatically as dynamic fields (nested keys become dotted names)
note field names above are what the other LogForge generators use; JSON_Decoder will use the raw JSON keys instead
note added a starter alert <rule> (level 3, matched to the decoder via <decoded_as>); put it in a RULES file (not the decoders file), set the level, and add <field>/<match> conditions so it fires only on the events you care about
note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

Wazuh's OS_Regex is not PCRE: a bare . is a literal dot and \. matches any character.

rsyslog template / liblognorm rulebase

version=2
# wazuh — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/wazuh.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=wazuh:{"timestamp":"%timestamp:char-to{"extradata":"\""}%","rule":{"level":%rule_level:number%,"description":"%rule_description:char-to{"extradata":"\""}%","id":"%rule_id:number%"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"%data_srcip:ipv4%","srcuser":"%data_srcuser:char-to{"extradata":"\""}%"},"location":"/var/log/auth.log"}
rule=wazuh:{"timestamp":"%timestamp:char-to{"extradata":"\""}%","rule":{"level":%rule_level:number%,"description":"%rule_description:char-to{"extradata":"\""}%","id":"%rule_id:number%"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"%data_srcuser:char-to{"extradata":"\""}%"},"location":"/var/log/auth.log"}

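note json structure: rsyslog mmjsonparse handles JSON natively; consider action(type="mmjsonparse" cookie="") instead of this rulebase (by default mmjsonparse only parses messages that begin with the @cee: cookie, which Wazuh alert lines do not carry)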
note trailing literal "\"}" reconstructed from line 1
note field "timestamp": timestamp uses an ISO-8601-basic offset (e.g. "+0000", no colon) that liblognorm's date-rfc5424 rejects; captured as a raw token instead of a typed date
note chosen parser types: timestamp=char-to("), rule_level=number, rule_description=char-to("), rule_id=number, data_srcip=ipv4, data_srcuser=char-to(")
note optional columns (data_srcip): liblognorm has no optional parts within a single rule — emitted a second rule variant with only the always-present columns (max 2 variants; lines with other column combinations will not match and need extra rule= lines)

Splunk

# props.conf  (search-time extraction)
[<REPLACE_WITH_SOURCETYPE>]
EXTRACT-logforge = (?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"/var/log/auth\.log").*

# Quick search-time test in SPL:
# | rex field=_raw "(?=.*?\"timestamp\":\"(?<timestamp>[^\"]*)\")(?=.*?\"level\":(?<rule_level>-?\\d+(?:\\.\\d+)?))(?=.*?\"description\":\"(?<rule_description>[^\"]*)\")(?=.*?\"id\":\"(?<rule_id>[^\"]*)\")(?=.*?\"id\":\"003\")(?=.*?\"name\":\"web01\")(?=.*?\"srcip\":\"(?<data_srcip>[^\"]*)\"|)(?=.*?\"srcuser\":\"(?<data_srcuser>[^\"]*)\")(?=.*?\"location\":\"/var/log/auth\\.log\").*"

note regex: input is JSON; in Splunk, KV_MODE = json (search time) or INDEXED_EXTRACTIONS = json (index time) in this props.conf stanza extracts every key without a regex
note regex: a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
note regex: caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pattern against more sample lines
note EXTRACT-<class> names must be unique within a sourcetype stanza — rename EXTRACT-logforge if you already use that class for this sourcetype
note a timestamp field was detected: this EXTRACT only makes it a searchable field. To set the event _time at index time, add TIME_PREFIX and TIME_FORMAT to this props.conf stanza (TIME_FORMAT uses Splunk strptime). This generator does not guess the format; for the sample shape, TIME_PREFIX = "timestamp":" with TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z is a reasonable starting point to verify against your data.
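Before you commit a TIME_FORMAT, it helps to confirm that the sample shape parses at all: three fractional digits and a colon-less +0000 offset. Below is a Python sketch. Python's strptime directives differ from Splunk's: Python uses %f for fractional seconds, where Splunk uses %N or %3N. The sketch therefore checks the shape of the timestamp, not the Splunk stanza itself.

```python
from datetime import datetime, timezone

# Python spelling of the sample shape: %f takes the 3-digit fraction and
# %z takes the colon-less "+0000" offset.
WAZUH_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_wazuh_ts(value: str) -> datetime:
    """Parse e.g. 2026-07-03T14:22:15.123+0000 into an aware UTC datetime."""
    return datetime.strptime(value, WAZUH_TS_FORMAT).astimezone(timezone.utc)


if __name__ == "__main__":
    for raw in ("2026-07-03T14:22:15.123+0000", "2026-07-03T14:22:48.900+0000"):
        print(parse_wazuh_ts(raw).isoformat())
```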

ES ingest

PUT _ingest/pipeline/wazuh
{
  "description": "LogForge-generated ingest pipeline for wazuh",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    }
  ]
}

note json structure: emitted a { json: { field: "message", add_to_root: true } } processor — it parses the JSON line and lifts every field to the top level of the record, so a grok pattern is unnecessary
note test in Kibana Dev Tools with: POST _ingest/pipeline/wazuh/_simulate (supply a docs[] array whose _source.message holds a sample line)
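Below is a Python sketch that builds the `_simulate` request body from the sample lines, ready to paste into Dev Tools. It also includes a rough local approximation of the json processor with add_to_root. The approximation assumes that the parsed top-level keys are merged into the document root and that the original message field is kept. Treat that as an assumption to confirm with `_simulate`, not as the processor's documented behaviour.

```python
import json

SAMPLE = [
    '{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}',
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}',
]


def simulate_body(lines: list[str]) -> dict:
    """Request body for POST _ingest/pipeline/wazuh/_simulate."""
    return {"docs": [{"_source": {"message": line}} for line in lines]}


def local_json_add_to_root(source: dict) -> dict:
    """Rough local stand-in for the json processor with add_to_root: true.

    Assumption: parsed keys are merged into the root and message is kept.
    """
    merged = dict(source)
    merged.update(json.loads(source["message"]))
    return merged


if __name__ == "__main__":
    body = simulate_body(SAMPLE)
    print(json.dumps(body, indent=2))  # paste after the POST line in Dev Tools
    first = local_json_add_to_root(body["docs"][0]["_source"])
    print(first["rule"]["level"], first["data"]["srcip"])
```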

Graylog

# Grok patterns to add under System > Grok Patterns:
# (Graylog needs these custom patterns installed globally BEFORE the rule/extractor below will work.)
#   WAZUH_NOTDQUOTE [^"]*

# --- Graylog processing pipeline rule (primary) ---
# Paste under System > Pipelines > Manage rules, then attach the rule to a pipeline stage.
rule "wazuh-parse"
when
  has_field("message")
then
  let gp = grok(pattern: "\\{\"timestamp\":\"%{TIMESTAMP_ISO8601:timestamp}\",\"rule\":\\{\"level\":%{NUMBER:rule_level},\"description\":\"%{WAZUH_NOTDQUOTE:rule_description}\",\"id\":\"%{NUMBER:rule_id}\"\\},\"agent\":\\{\"id\":\"003\",\"name\":\"web01\"\\},\"data\":\\{(?:\"srcip\":\"%{IPV4:data_srcip}\",)?\"srcuser\":\"%{USERNAME:data_srcuser}\"\\},\"location\":\"/var/log/auth\\.log", value: to_string($message.message), only_named_captures: true);
  set_fields(gp);
end

# --- Graylog import-ready extractor JSON (secondary) ---
# Save as a .json file and import under System > Inputs > (input) > Manage extractors > Actions > Import extractors.
{
  "extractors": [
    {
      "title": "wazuh",
      "extractor_type": "grok",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "\\{\"timestamp\":\"%{TIMESTAMP_ISO8601:timestamp}\",\"rule\":\\{\"level\":%{NUMBER:rule_level},\"description\":\"%{WAZUH_NOTDQUOTE:rule_description}\",\"id\":\"%{NUMBER:rule_id}\"\\},\"agent\":\\{\"id\":\"003\",\"name\":\"web01\"\\},\"data\":\\{(?:\"srcip\":\"%{IPV4:data_srcip}\",)?\"srcuser\":\"%{USERNAME:data_srcuser}\"\\},\"location\":\"/var/log/auth\\.log",
        "named_captures_only": true
      },
      "condition_type": "none",
      "condition_value": ""
    }
  ],
  "version": "5.0.0"
}

note grok: json input; in Graylog, the parse_json() and to_map() pipeline functions (or a JSON extractor) avoid grok entirely
note grok: constant field "agent_id" embedded as literal anchor "003" (varying=false)
note grok: constant field "agent_name" embedded as literal anchor "web01" (varying=false)
note grok: constant field "location" embedded as literal anchor "/var/log/auth.log" (varying=false)
note grok: 1 optional field ("srcip":"…" plus its trailing comma) wrapped in (?:…)? inline regex; grok has no native optional syntax
note grok: caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — gap literals were taken from line 1 and may not represent every line
note grok: with the optional group around the whole "srcip" member, the pattern matches both sample lines; alerts whose data.* carries other keys (e.g. data.srcport) would still fail to match and need additional patterns
note 1 custom grok pattern(s) (WAZUH_NOTDQUOTE) must be installed globally first under System > Grok Patterns — see the block at the top of the output
note primary artifact is the processing-pipeline rule; the extractor JSON is an equivalent import-ready alternative for the classic extractor UI

Datadog

logforge_rule \{"timestamp":"%{date("yyyy-MM-dd'T'HH:mm:ss.SSSZ"):timestamp}","rule":\{"level":%{number:rule_level},"description":"%{data:rule_description}","id":"%{number:rule_id}"\},"agent":\{"id":"003","name":"web01"\},"data":\{(?:"srcip":"%{ipv4:data_srcip}",)?"srcuser":"%{notSpace:data_srcuser}"\},"location":"/var/log/auth\.log

note emitted rule name is "logforge_rule"; rename it to match your "wazuh" convention if desired
note json input — Datadog can parse JSON logs automatically; a Grok Parser is only needed for non-JSON message bodies
note constant field "agent_id" embedded as literal anchor "003" (varying=false)
note constant field "agent_name" embedded as literal anchor "web01" (varying=false)
note constant field "location" embedded as literal anchor "/var/log/auth.log" (varying=false)
note 1 optional field ("srcip":"…" plus its trailing comma) wrapped in (?:…)?; Datadog Grok has no native optional matcher, so a chain of optional columns may need a Helper Rule per shape
note rule_description uses %{data:…} rather than %{quotedString:…}: quotedString matches a string together with its enclosing quotes, but this rule already consumes those quotes as literals
note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — gap literals were taken from line 1 and may not represent every line
note paste this line into a Grok Parser processor in a Datadog Log Pipeline; matchers are anchored left-to-right and rule whitespace matches log whitespace. Complex or multi-shape logs may need Helper Rules.

Fluent Bit

[PARSER]
    Name        wazuh
    Format      regex
    Regex       ^(?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"/var/log/auth\.log").*$
    Time_Key    timestamp
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
# Fluentd <parse> block:
#   <parse>
#     @type regexp
#     expression /^(?=.*?"timestamp":"(?<timestamp>[^"]*)")(?=.*?"level":(?<rule_level>-?\d+(?:\.\d+)?))(?=.*?"description":"(?<rule_description>[^"]*)")(?=.*?"id":"(?<rule_id>[^"]*)")(?=.*?"id":"003")(?=.*?"name":"web01")(?=.*?"srcip":"(?<data_srcip>[^"]*)"|)(?=.*?"srcuser":"(?<data_srcuser>[^"]*)")(?=.*?"location":"\/var\/log\/auth\.log").*$/
#     time_key timestamp
#     time_format %Y-%m-%dT%H:%M:%S.%L%z
#   </parse>

note regex: input is JSON; a Fluent Bit parser with Format json (plus the same Time_Key and Time_Format) reads these lines without a regex
note regex: a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
note regex: caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pattern against more sample lines
note Time_Key set to "timestamp"; Time_Format "%Y-%m-%dT%H:%M:%S.%L%z" is a best-effort strptime derived from the sample shape — verify it against your data (Fluent Bit uses %L for fractional seconds and %z for numeric offsets)

Vector

[transforms.wazuh_parse]
type = "remap"
inputs = ["REPLACE_WITH_SOURCE"]
source = '''
. = parse_json!(.message)
'''

note the reused regex needs lookahead, which the Rust regex crate behind parse_regex rejects; used parse_json! on .message (also the idiomatic Vector parser for JSON logs)

Loki

# promtail pipeline for "wazuh" (generated by LogForge)
# Add these stages under a scrape_config in your promtail config:
#   scrape_configs:
#     - job_name: wazuh
#       pipeline_stages:
# (the stages below are indented to sit under pipeline_stages)
pipeline_stages:
  - json:
      expressions:
        timestamp: 'timestamp'
        rule_level: 'rule.level'
        rule_description: 'rule.description'
        rule_id: 'rule.id'
        data_srcip: 'data.srcip'
        data_srcuser: 'data.srcuser'

note JSON input: the reused regex uses lookaheads (RE2 cannot compile them), so a native `- json:` stage is emitted instead — it extracts each field by key without a regex
note caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pipeline against more sample lines
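The json stage's expressions are JMESPath, and the ones in this pipeline are plain dotted paths. Below is a Python sketch that resolves the same paths locally, which helps you check which fields a given alert shape yields. It implements only dotted key lookup, which is enough for these expressions but is not full JMESPath. Returning None for a missing path is this sketch's own convention, not documented promtail behaviour.

```python
import json

# Same name -> path map as the promtail json stage expressions.
EXPRESSIONS = {
    "timestamp": "timestamp",
    "rule_level": "rule.level",
    "rule_description": "rule.description",
    "rule_id": "rule.id",
    "data_srcip": "data.srcip",
    "data_srcuser": "data.srcuser",
}

SAMPLE = [
    '{"timestamp":"2026-07-03T14:22:15.123+0000","rule":{"level":10,"description":"sshd: brute force trying to get access to the system","id":"5712"},"agent":{"id":"003","name":"web01"},"data":{"srcip":"203.0.113.45","srcuser":"admin"},"location":"/var/log/auth.log"}',
    '{"timestamp":"2026-07-03T14:22:48.900+0000","rule":{"level":5,"description":"Login session opened","id":"5501"},"agent":{"id":"003","name":"web01"},"data":{"srcuser":"jdoe"},"location":"/var/log/auth.log"}',
]


def get_path(obj, path: str):
    """Resolve a plain dotted path; a tiny subset of JMESPath."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def extract(line: str) -> dict:
    """Apply every expression to one alert line."""
    alert = json.loads(line)
    return {name: get_path(alert, path) for name, path in EXPRESSIONS.items()}


if __name__ == "__main__":
    for line in SAMPLE:
        print(extract(line))
```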

syslog-ng

parser p_wazuh {
    regexp-parser(
        prefix(".wazuh.")
        patterns("(?=.*?\"timestamp\":\"(?<timestamp>[^\"]*)\")(?=.*?\"level\":(?<rule_level>-?\\d+(?:\\.\\d+)?))(?=.*?\"description\":\"(?<rule_description>[^\"]*)\")(?=.*?\"id\":\"(?<rule_id>[^\"]*)\")(?=.*?\"id\":\"003\")(?=.*?\"name\":\"web01\")(?=.*?\"srcip\":\"(?<data_srcip>[^\"]*)\"|)(?=.*?\"srcuser\":\"(?<data_srcuser>[^\"]*)\")(?=.*?\"location\":\"/var/log/auth\\.log\").*")
    );
};

note regex: input is JSON — use a JSON parser (jq, Logstash json filter, …) instead of a regex where possible
note regex: a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
note regex: caution: the engine reported 1 consolidation warning(s) (alignment fallbacks) — review the generated pattern against more sample lines
note captured fields are stored as name-value pairs under the prefix ".wazuh." (e.g. the group (?<data_srcip>…) becomes ".wazuh.data_srcip")
note json structure: syslog-ng has a dedicated json-parser() that handles nested objects natively; consider json-parser(prefix(".wazuh.")) instead of the emitted regexp-parser

Try it on your own Wazuh alert lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
