
$ logforge parse cef

Parse CEF / Trellix / ArcSight (SIEM) logs → regex, Grok, Wazuh & rsyslog

CEF, the Common Event Format, is a vendor-neutral log standard originally defined by ArcSight and now emitted by a long list of security products: Trellix (formerly McAfee Enterprise) endpoint tools, Imperva, many WAFs, and any appliance that wants its events to land cleanly in a SIEM without a custom parser. Because it is a shared envelope rather than one product's format, learning to parse CEF once pays off across your whole security stack.

Every CEF line has the same two-part shape. First comes a header: the prefix 'CEF:' with a format version (CEF:0 in the samples here), then exactly six pipe-delimited fields, each terminated by a pipe: CEF:0|Vendor|Product|Version|Signature ID|Name|Severity|. After the header comes a key=value extension carrying the event's actual data. The Trellix header CEF:0|Trellix|Endpoint Security|10.7|1092|Threat detected and blocked|8 tells you the vendor and product, the numeric signature that fired, a human-readable name, and a severity of 8 on the 0–10 scale, all before the first extension key.

The extension is where CEF's flexibility and its parsing hazards both live. It uses ArcSight's dictionary of standard short keys, such as src, dst, spt, dpt, suser, act, fname (file name), and fileHash, which vendors may supplement with their own custom keys. Three rules make or break a CEF parser:

  • Pipes inside header fields are escaped as \|, so the header must be split on unescaped pipes only.
  • In the extension, a value may contain spaces, and a pair ends where the next 'key=' begins, so you split on the key boundary rather than on whitespace.
  • Equals signs and backslashes that appear literally inside an extension value are backslash-escaped (\= and \\) and must be unescaped after extraction. Pipes inside the extension need no escaping.

A parser that ignores any of these will silently mangle events that contain filenames with spaces, URLs with query strings, or free-text messages.
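
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the LogForge engine: the function name parse_cef, the output field names, and the "Acme" sample line are all made up for the example, and one edge case (a header field that ends in an escaped backslash) is deliberately left unhandled.

```python
import re

# Header delimiter: a pipe NOT preceded by a backslash. Simplification: a
# field that ends in an escaped backslash ("\\|") is not handled here.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")
# Extension key boundary: start of text or whitespace, a key, then "=".
# An escaped "\=" inside a value never matches, because "\" is not a key char.
_KEY_BOUNDARY = re.compile(r"(?:^|\s)([A-Za-z0-9_]+)=")

# Hypothetical output names for the six header fields.
HEADER_NAMES = ("vendor", "product", "device_version",
                "signature_id", "name", "severity")


def _unescape_header(text):
    # In the header only \| and \\ are escape sequences.
    return re.sub(r"\\([|\\])", r"\1", text)


def _unescape_value(text):
    # In the extension \= and \\ are escapes; \n and \r encode line breaks.
    specials = {"n": "\n", "r": "\r"}
    return re.sub(r"\\([=\\nr])",
                  lambda m: specials.get(m.group(1), m.group(1)), text)


def parse_cef(line):
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("no CEF prefix found")
    # maxsplit=7 leaves any pipes in the extension untouched.
    parts = _UNESCAPED_PIPE.split(line[start:].rstrip("\r\n"), maxsplit=7)
    if len(parts) < 7:
        raise ValueError("CEF header needs a prefix plus six fields")
    event = {"cef_version": parts[0][len("CEF:"):]}
    for name, raw in zip(HEADER_NAMES, parts[1:7]):
        event[name] = _unescape_header(raw)

    ext = parts[7] if len(parts) > 7 else ""
    keys = list(_KEY_BOUNDARY.finditer(ext))
    extension = {}
    for i, match in enumerate(keys):
        # A value runs from just after "key=" to the start of the next key.
        end = keys[i + 1].start() if i + 1 < len(keys) else len(ext)
        extension[match.group(1)] = _unescape_value(ext[match.end():end])
    event["extension"] = extension
    return event


sample = (r"CEF:0|Acme|Web\|Proxy|1.0|100|Blocked request|5|"
          r"src=192.0.2.10 msg=user said a\=b fname=my report.pdf")
event = parse_cef(sample)
print(event["product"])    # Web|Proxy
print(event["extension"])  # {'src': '192.0.2.10', 'msg': 'user said a=b', 'fname': 'my report.pdf'}
```

Splitting on the key boundary is what keeps "my report.pdf" in one piece; a whitespace split would have cut it after "my".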

For detection the header gives you fast triage — vendor/product for routing, Signature ID for grouping like events, and severity for prioritization — while the extension supplies the who/what/where. In the endpoint example the meaningful fields are act=blocked (the verdict), fname=invoice_scan.exe (the object), and fileHash=44d88612fea8a8f36de82e1278abb02f, an MD5 you can pivot on against threat-intelligence feeds. The src/dst/spt/dpt tuple and suser give you host and user context for correlation. Because CEF normalizes field names across vendors, a single set of detections keyed on act, fileHash, and severity can span every CEF-emitting product you run.
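
As a rough illustration of that triage flow, the sketch below labels already-parsed events by severity and by a file-hash lookup. Everything here is an assumption made for the example: the KNOWN_BAD_HASHES set stands in for a real threat-intelligence feed, the severity threshold of 7 is arbitrary, and the events are the two sample lines from this page written out as flat dicts with numeric severities.

```python
from collections import Counter

# Hypothetical indicator set; a real deployment would load this from a feed.
KNOWN_BAD_HASHES = {"44d88612fea8a8f36de82e1278abb02f"}

# The two sample lines from this page, already parsed into flat dicts.
EVENTS = [
    {"cef_signature_id": "1092", "cef_severity": "8", "act": "blocked",
     "suser": "jdoe", "fname": "invoice_scan.exe",
     "filehash": "44d88612fea8a8f36de82e1278abb02f"},
    {"cef_signature_id": "1041", "cef_severity": "6", "act": "quarantined",
     "suser": "asmith", "fname": "update_patch.dll",
     "filehash": "e99a18c428cb38d5f260853678922e03"},
]


def triage(event, known_bad=KNOWN_BAD_HASHES, high_severity=7):
    """Return a coarse label: 'intel-hit', 'high', or 'routine'."""
    # A hash match outranks severity, since it is a concrete indicator.
    if event.get("filehash", "").lower() in known_bad:
        return "intel-hit"
    # Assumes a numeric severity; the threshold is an arbitrary example value.
    if int(event.get("cef_severity", 0)) >= high_severity:
        return "high"
    return "routine"


labels = [triage(e) for e in EVENTS]
# Signature ID groups like events, e.g. for counting how often each fires.
by_signature = Counter(e["cef_signature_id"] for e in EVENTS)
print(labels)
print(by_signature)
```

Because the logic keys only on normalized field names, the same function could in principle be pointed at events from another CEF-emitting product, provided that product populates the same keys.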


What a CEF / Trellix / ArcSight (SIEM) line looks like

The CEF sample below is fed verbatim into the engine to produce every parser on this page.

CEF:0|Trellix|Endpoint Security|10.7|1092|Threat detected and blocked|8|src=192.0.2.10 dst=198.51.100.20 spt=51234 dpt=445 suser=jdoe act=blocked fname=invoice_scan.exe fileHash=44d88612fea8a8f36de82e1278abb02f
CEF:0|Trellix|Endpoint Security|10.7|1041|Malware quarantined|6|src=192.0.2.44 dst=198.51.100.77 spt=49888 dpt=443 suser=asmith act=quarantined fname=update_patch.dll fileHash=e99a18c428cb38d5f260853678922e03

Detected fields

The engine classified this sample as kv and consolidated 15 fields across 2 lines. Each entry below shows the field name followed by its inferred type. A trailing "· literal" marks a field whose value was identical on every sample line, so it is baked into the pattern as an anchor rather than captured.

  • cef_version : number · literal
  • cef_vendor : literal · literal
  • cef_product : literal · literal
  • cef_device_version : number · literal
  • cef_signature_id : number
  • cef_name : literal
  • cef_severity : number
  • src : ipv4
  • dst : ipv4
  • spt : port
  • dpt : port
  • suser : username
  • act : literal
  • fname : literal
  • filehash : literal

Regex (named capture groups)

# sample: CEF:0|Trellix|Endpoint Security|10.7|1092|Threat detected and blocked|8|src=192.0.2.10 dst=198.51.100.20 spt=51234 dpt=445 suser=jdoe act=blocked fname=invoice_scan.exe fileHash=44d88612fea8a8f36de82e1278abb02f
# groups: cef_signature_id=1092, cef_name=Threat detected and blocked, cef_severity=8, src=192.0.2.10, dst=198.51.100.20, spt=51234, dpt=445, suser=jdoe, act=blocked, fname=invoice_scan.exe, filehash=44d88612fea8a8f36de82e1278abb02f
^CEF:0\|Trellix\|Endpoint Security\|10\.7\|(?<cef_signature_id>-?\d+(?:\.\d+)?)\|(?<cef_name>(?:[A-Za-z]+ [A-Za-z]+ [A-Za-z]+ [A-Za-z]+|[A-Za-z]+ [A-Za-z]+))\|(?<cef_severity>-?\d+(?:\.\d+)?)\|src=(?<src>\d{1,3}(?:\.\d{1,3}){3}) dst=(?<dst>\d{1,3}(?:\.\d{1,3}){3}) spt=(?<spt>\d{1,5}) dpt=(?<dpt>\d{1,5}) suser=(?<suser>[A-Za-z0-9._@-]+) act=(?<act>[A-Za-z]+) fname=(?<fname>[A-Za-z]+_[A-Za-z]+\.[A-Za-z]+) fileHash=(?<filehash>(?:[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+|\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+))$
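
One way to try the pattern outside LogForge is a short Python check. Python's re module expects named groups to be written (?P<name>...), so the sketch below rewrites the (?<name>...) prefix before compiling; that blanket replace is only safe here because the pattern contains no lookbehind assertions. The variable names are illustrative, and the two fileHash alternatives are built with string repetition purely to keep the listing short (seven letter-digit pairs, or eight digit-letter pairs, as in the generated pattern).

```python
import re

# The generated pattern, split across lines for readability.
GENERATED = (
    r"^CEF:0\|Trellix\|Endpoint Security\|10\.7\|"
    r"(?<cef_signature_id>-?\d+(?:\.\d+)?)\|"
    r"(?<cef_name>(?:[A-Za-z]+ [A-Za-z]+ [A-Za-z]+ [A-Za-z]+|[A-Za-z]+ [A-Za-z]+))\|"
    r"(?<cef_severity>-?\d+(?:\.\d+)?)\|"
    r"src=(?<src>\d{1,3}(?:\.\d{1,3}){3}) dst=(?<dst>\d{1,3}(?:\.\d{1,3}){3}) "
    r"spt=(?<spt>\d{1,5}) dpt=(?<dpt>\d{1,5}) "
    r"suser=(?<suser>[A-Za-z0-9._@-]+) act=(?<act>[A-Za-z]+) "
    r"fname=(?<fname>[A-Za-z]+_[A-Za-z]+\.[A-Za-z]+) "
    r"fileHash=(?<filehash>(?:"
    + r"[A-Za-z]+\d+" * 7 + "|" + r"\d+[A-Za-z]+" * 8
    + r"))$"
)

# Convert named-group syntax to the form Python's re module accepts.
PATTERN = re.compile(GENERATED.replace("(?<", "(?P<"))

SAMPLE = (
    "CEF:0|Trellix|Endpoint Security|10.7|1092|Threat detected and blocked|8|"
    "src=192.0.2.10 dst=198.51.100.20 spt=51234 dpt=445 suser=jdoe "
    "act=blocked fname=invoice_scan.exe "
    "fileHash=44d88612fea8a8f36de82e1278abb02f"
)

match = PATTERN.match(SAMPLE)
fields = match.groupdict() if match else {}
print(fields.get("cef_signature_id"), fields.get("act"), fields.get("fname"))
```

The pattern is fitted tightly to the sample: the vendor, product, and version are literal anchors, and cef_name, fname, and filehash only accept the shapes seen in the two lines. A line from another product, or with a differently shaped name or hash, is expected not to match.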

Grok pattern (Logstash / Elastic)

CEF:0\|Trellix\|Endpoint Security\|10\.7\|%{NUMBER:cef_signature_id}\|%{DATA:cef_name}\|%{NUMBER:cef_severity}\|src=%{IPV4:src} dst=%{IPV4:dst} spt=%{INT:spt} dpt=%{INT:dpt} suser=%{USERNAME:suser} act=%{NOTSPACE:act} fname=%{NOTSPACE:fname} fileHash=%{GREEDYDATA:filehash}
  • note kv-structured input — consider the Logstash kv filter instead of (or after) grok
  • note constant field "cef_version" embedded as literal anchor "0" (varying=false)
  • note constant field "cef_device_version" embedded as literal anchor "10.7" (varying=false)

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: CEF:0|Trellix|Endpoint Security|10.7|1092|Threat detected and blocked|8|src=192.0.2.10 dst=198.51.100.20 spt=51234 dpt=445 suser=jdoe act=blocked fname=invoice_
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="cef-kv">
  <prematch>^CEF:\d+\|\w+\|</prematch>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent">\|src=(\d+.\d+.\d+.\d+)</regex>
  <order>srcip</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> dst=(\d+.\d+.\d+.\d+)</regex>
  <order>dstip</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> spt=(\d+)</regex>
  <order>srcport</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> dpt=(\d+)</regex>
  <order>dstport</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> suser=(\w+)</regex>
  <order>srcuser</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> act=(\w+)</regex>
  <order>action</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> fname=(\w+.\w+)</regex>
  <order>fname</order>
</decoder>

<decoder name="cef-kv">
  <parent>cef-kv</parent>
  <regex offset="after_parent"> fileHash=(\w+)</regex>
  <order>filehash</order>
</decoder>
  • note constant field "cef_version" skipped (identical in every line)
  • note constant field "cef_vendor" skipped (identical in every line)
  • note constant field "cef_product" skipped (identical in every line)
  • note constant field "cef_device_version" skipped (identical in every line)
  • note field "cef_signature_id" skipped: positional header field without a key literal cannot be safely captured in OS_Regex (values may contain spaces; \.+ is unsafe mid-pattern)
  • note field "cef_name" skipped: positional header field without a key literal cannot be safely captured in OS_Regex (values may contain spaces; \.+ is unsafe mid-pattern)
  • note field "cef_severity" skipped: positional header field without a key literal cannot be safely captured in OS_Regex (values may contain spaces; \.+ is unsafe mid-pattern)
  • note field "src" mapped to Wazuh conventional field "srcip"
  • note field "dst" mapped to Wazuh conventional field "dstip"
  • note field "spt" mapped to Wazuh conventional field "srcport"
  • note field "dpt" mapped to Wazuh conventional field "dstport"
  • note field "suser" mapped to Wazuh conventional field "srcuser"
  • note field "act" mapped to Wazuh conventional field "action"
  • note kv fields are extracted by same-named sibling decoders (offset="after_parent"), so per-line field order/absence is tolerated — the shared name is what makes Wazuh evaluate every sibling
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# cef — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/cef.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=cef:CEF:0|Trellix|Endpoint Security|10.7|%cef_signature_id:number%|%cef_name:char-to{"extradata":"|"}%|%cef_severity:number%|src=%src:ipv4% dst=%dst:ipv4% spt=%spt:number% dpt=%dpt:number% suser=%suser:word% act=%act:word% fname=%fname:word% fileHash=%filehash:word%
  • note kv structure: rsyslog offers mmfields (fast, fixed single-char separator, untyped) and mmnormalize (this rulebase, typed fields + literal anchors); mmnormalize was chosen for typed extraction
  • note chosen parser types: cef_signature_id=number, cef_name=char-to(|), cef_severity=number, src=ipv4, dst=ipv4, spt=number, dpt=number, suser=word, act=word, fname=word, filehash=word

FAQ

Is CEF the same as LEEF?
No, though they solve the same problem and look superficially alike. CEF (ArcSight) uses a six-field pipe header followed by a space-separated key=value extension with dictionary keys like src and dst. LEEF (IBM QRadar) uses a different pipe header, separates attributes with a tab by default (LEEF 2.0 lets the header declare another delimiter, such as a caret), and has its own key names. They are not interchangeable: a CEF parser will not read LEEF and vice versa.
What do the six CEF header fields mean?
After the CEF:0 prefix, in order: Device Vendor, Device Product, Device Version, Signature ID (called Device Event Class ID in the specification), Name, and Severity (0–10). Vendor, Product, and Version identify the source, Signature ID groups like events, Name is a human-readable label, and Severity drives triage. The extension after the seventh pipe holds the event-specific key=value data.
How do I extract a file hash or filename from a CEF event?
They live in the extension under ArcSight dictionary keys: fname for the file name and fileHash for the digest (some products also send related keys such as oldFileHash). Extract them by splitting the extension on key boundaries and reading the values. A fileHash gives you an indicator you can enrich against threat-intel feeds; watch for spaces in fname, which require boundary-aware splitting.
Why do custom CEF extension keys break a strict parser?
Because the extension is open-ended: beyond ArcSight's standard dictionary, any vendor can add its own keys, and different signatures emit different key sets. A parser that expects a fixed list of keys in a fixed order will fail. Match key=value pairs wherever they appear (order-independent), and treat unknown keys as capturable rather than fatal.
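
A tolerant approach along those lines might look like the following sketch. It is an illustration under assumptions: KNOWN_KEYS is a small hand-picked subset rather than the full ArcSight dictionary, acmeRiskScore is an invented custom key, and escape handling is omitted to keep the focus on order independence.

```python
import re

# Hypothetical subset of dictionary keys this sketch treats as "known".
KNOWN_KEYS = {"src", "dst", "spt", "dpt", "suser", "act", "fname", "fileHash"}
# A key starts at the beginning of the text or after whitespace.
_PAIR_KEY = re.compile(r"(?:^|\s)(\w+)=")


def split_known_unknown(extension):
    """Return (known, extra) dicts, whatever order the keys arrive in."""
    matches = list(_PAIR_KEY.finditer(extension))
    known, extra = {}, {}
    for i, match in enumerate(matches):
        # Each value ends where the next key begins, so spaces survive.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(extension)
        key, value = match.group(1), extension[match.end():end]
        # Unknown keys are kept rather than treated as a parse failure.
        (known if key in KNOWN_KEYS else extra)[key] = value
    return known, extra


a = split_known_unknown("src=192.0.2.10 act=blocked fname=invoice scan.exe")
b = split_known_unknown("act=blocked acmeRiskScore=91 src=192.0.2.10")
print(a)
print(b)
```

Both calls succeed even though the second line reorders the keys, drops fname, and adds a key the parser has never seen; the unfamiliar key simply lands in the second dict.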

Try it on your own CEF / Trellix / ArcSight (SIEM) lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
