$ man logforge

LogForge output formats — regex, Grok, Wazuh decoders, rsyslog

Paste a log line into LogForge and it hands back four working parser configs. This page explains what each output is, shows real generated examples, and covers the gotchas of every format. Everything below was generated at build time by the same engine that runs in your browser.

How LogForge detects fields

Before anything is generated, each pasted line is classified into a structure, tried in a fixed order: json (one object per line), key=value (FortiGate style; CEF and LEEF are key=value behind a pipe header), syslog (RFC 3164 and RFC 5424 headers), delimited (CSV/TSV), and finally freeform tokenization as the fallback.
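The fixed-order classification can be pictured with a minimal Python sketch. The name classify_line and every heuristic inside it are illustrative assumptions, not LogForge's actual engine (which runs as JavaScript in the browser). Only the order of the checks, and the three-pair threshold mentioned in the FAQ, come from this page.

```python
import json
import re

# Hypothetical heuristics; only the ORDER of the checks mirrors the text above.
KV_PAIR = re.compile(r'\b[\w.-]+=(?:"[^"]*"|\S+)')
SYSLOG_3164 = re.compile(r"^(?:<\d{1,3}>)?[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
SYSLOG_5424 = re.compile(r"^<\d{1,3}>\d \S+ ")


def classify_line(line: str) -> str:
    """Return the first structure whose check passes, tried in a fixed order."""
    stripped = line.strip()
    # 1. json: the whole line parses as a single object
    if stripped.startswith("{"):
        try:
            if isinstance(json.loads(stripped), dict):
                return "json"
        except ValueError:
            pass
    # 2. key=value: assumed threshold of at least three pairs
    if len(KV_PAIR.findall(stripped)) >= 3:
        return "key=value"
    # 3. syslog: an RFC 3164 or RFC 5424 style header
    if SYSLOG_3164.match(stripped) or SYSLOG_5424.match(stripped):
        return "syslog"
    # 4. delimited: crude CSV/TSV guess
    if stripped.count("\t") >= 2 or stripped.count(",") >= 2:
        return "delimited"
    # 5. fallback
    return "freeform"
```

Because the first passing check wins, the order matters: a JSON line that happens to contain key=value text inside a string is still classified as json.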

Every token then runs through 19 field detectors — timestamps, IPv4/IPv6 addresses, ip:port pairs, emails, URLs, MAC addresses, UUIDs, severity levels, HTTP methods and status codes, user agents, ports, usernames, hostnames, numbers, quoted strings, filesystem paths, and literals. Each detector returns a confidence score, and results are consolidated across all pasted lines: a column whose value changes line-to-line becomes a varying captured field, while a column that never changes becomes a literal anchor baked into the pattern.

Detection is heuristic, and that is exactly why the review chips exist: when the engine guesses a type wrong or picks a bland name, you rename or retype the field in one click and all four outputs regenerate instantly. The more varied the lines you paste, the better the consolidation — one line tells the engine almost nothing about what varies.
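The literal-versus-varying consolidation can be sketched in a few lines of Python. This is a simplified illustration under the assumption that lines are already split into aligned columns; the function name consolidate is hypothetical and the real engine also weighs detector confidence scores.

```python
def consolidate(rows: list[list[str]]) -> list[tuple[str, str]]:
    """Label each column 'literal' or 'varying' across all tokenized lines."""
    result = []
    for column in zip(*rows):
        distinct = set(column)
        if len(distinct) == 1:
            # Same value on every line: bake it into the pattern as an anchor.
            result.append(("literal", column[0]))
        else:
            # Value changes line to line: this becomes a captured field.
            result.append(("varying", "|".join(sorted(distinct))))
    return result


lines = [
    "203.0.113.45 GET /index.html 200",
    "198.51.100.77 GET /assets/app.js 200",
]
columns = consolidate([line.split() for line in lines])
single = consolidate([lines[0].split()])
```

With both lines, the IP and path columns come out as varying while GET and 200 are literal. With only the first line, every column is literal, which is why a single pasted line tells the engine almost nothing.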

Log to regex generator: named capture groups from your sample

The regex output is a single anchored pattern with a named capture group (?<name>…) per varying field. It is emitted in the common subset of PCRE and JavaScript RegExp, so the same string works unchanged in grep -P, PCRE2, Ruby, and JavaScript. Three engines need care: Python's re spells named groups (?P<name>…) and rejects the (?<name>…) form, Java accepts the syntax but allows only letters and digits in group names (so user_agent has to be renamed, for example to userAgent), and Go's RE2-based regexp rejects lookarounds, which rules out the fallback described next. Before showing you anything, the generator verifies its own pattern against every line you pasted. If a straight left-to-right template can't reproduce all lines (key=value logs love reordering fields), it degrades to order-independent lookahead captures and says so in the notes.
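To make the order-independent fallback concrete, here is a small Python sketch. The pattern and the two log lines are made up for illustration and are not actual LogForge output; they only show how one lookahead per field lets captures succeed regardless of field order.

```python
import re

# Hypothetical pattern in the spirit of the fallback: one lookahead per field,
# each scanning the whole line from the start, so field order does not matter.
ORDER_FREE = re.compile(
    r"^(?=.*\bsrcip=(?P<srcip>\S+))"
    r"(?=.*\bdstip=(?P<dstip>\S+))"
    r"(?=.*\baction=(?P<action>\S+))"
)

# Made-up key=value lines with the same fields in a different order.
lines = [
    "srcip=10.0.0.5 dstip=203.0.113.9 action=accept",
    "action=deny dstip=198.51.100.2 srcip=10.0.0.8",
]
captured = [ORDER_FREE.match(line).groupdict() for line in lines]
```

The trade-off is that a lookahead pattern no longer pins down the overall line layout, and engines without lookaround support, such as Go's regexp, cannot run it at all.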

Given these two nginx access log lines:

203.0.113.45 - - [30/Jun/2026:22:14:15 +0300] "GET /index.html HTTP/1.1" 200 5324 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
198.51.100.77 - - [30/Jun/2026:22:14:19 +0300] "GET /assets/app.9f3c2d1a.js HTTP/2.0" 200 88213 "https://bokamba.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15"

LogForge generates:

# sample: 203.0.113.45 - - [30/Jun/2026:22:14:15 +0300] "GET /index.html HTTP/1.1" 200 5324 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
# groups: ip1=203.0.113.45, timestamp=30/Jun/2026:22:14:15 +0300, path=/index.html, quoted_string=HTTP/1.1, number=5324, url=https://www.google.com/, user_agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36
^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] "GET (?<path>(?:/[^\s"']*|[A-Za-z]:[^\s"']*)) (?<quoted_string>[A-Za-z]+/\d+\.\d+)" 200 (?<number>-?\d+(?:\.\d+)?) "(?<url>[^"]*)" "(?<user_agent>[^"]*)"$

The # sample and # groups comment lines show the first input line and what each group captured from it, so you can eyeball correctness before deploying. Gotcha to remember: anchored patterns (^…$) are strict by design — if your production lines carry trailing whitespace or extra columns the sample didn't, re-paste with more representative lines rather than hand-loosening the pattern.
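If you want to repeat that eyeball check in Python, note that Python's re module spells named groups (?P<name>…), so the group syntax has to be rewritten first. The sketch below does that with a naive substitution and then matches the first sample line. The helper name to_python_groups is hypothetical, and the substitution assumes the pattern contains no escaped literal "(?<" sequence.

```python
import re

# The generated pattern from above, verbatim.
GENERATED = r'''^(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?<timestamp>\d+/[A-Za-z]+/\d+:\d+:\d+:\d+ \+\d+)\] "GET (?<path>(?:/[^\s"']*|[A-Za-z]:[^\s"']*)) (?<quoted_string>[A-Za-z]+/\d+\.\d+)" 200 (?<number>-?\d+(?:\.\d+)?) "(?<url>[^"]*)" "(?<user_agent>[^"]*)"$'''

# The first nginx sample line from above.
SAMPLE = '203.0.113.45 - - [30/Jun/2026:22:14:15 +0300] "GET /index.html HTTP/1.1" 200 5324 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"'


def to_python_groups(pattern: str) -> str:
    """Rewrite (?<name> as (?P<name>, leaving (?<= and (?<! lookbehinds alone)."""
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


match = re.match(to_python_groups(GENERATED), SAMPLE)
groups = match.groupdict() if match else {}
for name, value in groups.items():
    print(f"{name} = {value}")
```

The printed values should line up with the # groups comment in the generated output.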

Grok pattern generator for Logstash and Elastic

The Grok output maps each detected field onto the standard pattern library that ships with Logstash — %{IPV4}, %{HTTPDATE}, %{NUMBER}, %{LOGLEVEL} and friends — validated against your real sample values, not just the declared type. When no standard pattern fits a value's shape, LogForge emits a namespaced custom pattern (e.g. LOGFORGE_EPOCH) in a # custom patterns block at the top of the output.

The same nginx sample produces:

# custom patterns
LOGFORGE_NOTDQUOTE [^"]*

%{IPV4:ip1} - - \[%{HTTPDATE:timestamp}\] "GET %{UNIXPATH:path} %{DATA:quoted_string}" 200 %{NUMBER:number} "%{URI:url}" "%{LOGFORGE_NOTDQUOTE:user_agent}"

To install a custom-pattern block, save it to a file inside the directory your grok filter's patterns_dir points at, then reference the generated match line in your filter. Fields that could not be matched by a named pattern degrade to %{DATA} or %{GREEDYDATA} with a note explaining why — watch the notes list under the output tab, because a grok line full of GREEDYDATA is a sign you should paste more sample lines or fix a field type in the review strip.
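How a custom pattern sits next to the standard ones is easier to see once a grok expression is expanded into a plain regex. The Python sketch below does that for a tiny made-up expression. The definitions in PATTERNS are simplified stand-ins chosen for illustration, not the real Logstash pattern library, and grok_to_regex is a hypothetical helper rather than anything Logstash or LogForge ships.

```python
import re

# Simplified stand-ins, NOT the real Logstash pattern library definitions.
PATTERNS = {
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NUMBER": r"-?\d+(?:\.\d+)?",
    "LOGFORGE_NOTDQUOTE": r'[^"]*',  # the custom pattern from the block above
}

# Matches %{NAME} and %{NAME:field} references.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expr: str) -> str:
    """Expand each %{NAME:field} reference into a named capture group."""
    def expand(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, expr)


# Made-up expression and log line, for illustration only.
expr = '%{IPV4:client} "%{LOGFORGE_NOTDQUOTE:user_agent}" %{NUMBER:bytes}'
fields = re.fullmatch(grok_to_regex(expr), '203.0.113.45 "curl/8.5.0" 5324').groupdict()
```

A custom pattern is resolved by name exactly like a built-in one, which is why the generated # custom patterns block has to be loadable from patterns_dir before the match line will compile.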

Wazuh decoder generator (OS_Regex dialect)

Wazuh decoders are the hardest of the four formats to find good tooling for, because they are written in OS_Regex, Wazuh's own dialect that looks like regex but behaves very differently. LogForge emits a complete drop-in snippet for /var/ossec/etc/decoders/local_decoder.xml: a parent decoder whose <prematch> gates the log line, plus child decoders whose <regex> captures map positionally onto the comma-separated field names in <order>. This is the same parent/child pattern the official FortiGate decoders use.

From two FortiGate key=value lines:

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: date=2026-06-30 time=22:14:15 devname="FGT60F-EDGE" devid="FGT60FTK20012345" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventt
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="logforge-kv">
  <prematch>^date=\d+-\d+-\d+ time=</prematch>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent">^(\d+:\d+:\d+) devname="\w+" devid="\w+" logid="\d+" type="\w+" subtype="\w+" level="\w+" vd="\w+" eventtime=(\d+) srcip=(\d+.\d+.\d+.\d+) srcport=(\d+) srcintf="\w+</regex>
  <order>time, eventtime, srcip, srcport</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> srcintfrole="(\w+)"</regex>
  <order>srcintfrole</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> dstip=(\d+.\d+.\d+.\d+)</regex>
  <order>dstip</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> dstport=(\d+)</regex>
  <order>dstport</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> dstintfrole="(\w+)"</regex>
  <order>dstintfrole</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> policyid=(\d+)</regex>
  <order>policyid</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> sessionid=(\d+)</regex>
  <order>sessionid</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> action="(\w+)"</regex>
  <order>action</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> service="(\w+)"</regex>
  <order>service</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> dstcountry="(\w+)"</regex>
  <order>dstcountry</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> srccountry="(\w+)"</regex>
  <order>srccountry</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> trandisp="(\w+)"</regex>
  <order>trandisp</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> transip=(\d+.\d+.\d+.\d+)</regex>
  <order>transip</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> transport=(\d+)</regex>
  <order>transport</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> duration=(\d+)</regex>
  <order>duration</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> sentbyte=(\d+)</regex>
  <order>sentbyte</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> rcvdbyte=(\d+)</regex>
  <order>rcvdbyte</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> sentpkt=(\d+)</regex>
  <order>sentpkt</order>
</decoder>

<decoder name="logforge-kv">
  <parent>logforge-kv</parent>
  <regex offset="after_parent"> rcvdpkt=(\d+)</regex>
  <order>rcvdpkt</order>
</decoder>

If you write or edit these by hand, keep the OS_Regex differences in mind — they are the number-one source of silently-broken decoders:

construct          | PCRE                 | OS_Regex (Wazuh)
any character      | .                    | \.  (a bare . is a literal dot)
\w                 | A-Z a-z 0-9 _        | A-Z a-z 0-9 - @ _
\s                 | space, tab, newline… | spaces only (\t is separate)
repetition {n,m}   | yes                  | no (only + and * on backslash classes)
char classes [...] | yes                  | no (\p covers punctuation)
named groups       | (?<name>…)           | no (plain () mapped by <order>)

To test before deploying: copy the snippet into /var/ossec/etc/decoders/local_decoder.xml, run /var/ossec/bin/wazuh-logtest, paste one of your raw lines, and check that the "phase 2" output shows your decoder name and every field you expect. Restart the manager (systemctl restart wazuh-manager) once the decoder looks right.
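For a rough offline sanity check of the dialect differences listed above, the following Python sketch translates a small subset of OS_Regex into Python's re syntax. It is an approximation under stated assumptions: only the classes named in the table plus \d, \S and \t are covered, the function name os_regex_to_python is hypothetical, and the log fragments are made up. wazuh-logtest remains the authority on what a decoder really matches.

```python
import re

# Subset of OS_Regex backslash classes; anything else raises in this sketch.
OS_CLASSES = {
    "w": "[A-Za-z0-9@_-]",  # note the extra - and @ compared with PCRE \w
    "d": "[0-9]",
    "s": " ",               # spaces only
    "t": "\\t",
    "S": "[^ ]",
    ".": ".",               # \. means "any character" in OS_Regex
}


def os_regex_to_python(expr: str) -> str:
    """Translate a small OS_Regex subset into an equivalent Python pattern."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            nxt = expr[i + 1]
            if nxt in OS_CLASSES:
                out.append(OS_CLASSES[nxt])
            elif nxt.isalnum():
                raise ValueError(f"class \\{nxt} is not covered by this sketch")
            else:
                out.append(re.escape(nxt))  # escaped literal such as \( or \$
            i += 2
        elif ch in "()+*^$|":
            out.append(ch)                  # structural characters pass through
            i += 1
        else:
            out.append(re.escape(ch))       # everything else, bare dot included, is literal
            i += 1
    return "".join(out)


# Made-up log fragments for illustration.
dstip = re.search(
    os_regex_to_python(r" dstip=(\d+.\d+.\d+.\d+)"),
    'srcintf="internal" dstip=198.51.100.7 dstport=443',
)
devname = re.search(os_regex_to_python(r'devname="(\w+)"'), 'devname="FGT60F-EDGE"')
```

The second search succeeds only because OS_Regex \w includes the hyphen; the same expression read as PCRE would stop at FGT60F and fail on the closing quote.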

rsyslog template generator: RainerScript and liblognorm

The rsyslog output adapts to your log's structure. For lines with a real syslog header (RFC 3164 or 5424), you get a RainerScript template(…) built from standard rsyslog properties, plus an mmnormalize rulebase for the message body — because rsyslog properties alone cannot pull fields out of the free-form msg part. For everything else (key=value, JSON, CEF, delimited, freeform) you get a liblognorm v2 rulebase: a version=2 header followed by rule= lines with typed %field:parser% slots.

The FortiGate sample generates this rulebase:

version=2
# logforge — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/logforge.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=logforge:date=2026-06-30 time=%time:time-24hr% devname="FGT60F-EDGE" devid="FGT60FTK20012345" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=%eventtime:number% srcip=%srcip:ipv4% srcport=%srcport:number% srcintf="internal" srcintfrole="%srcintfrole:char-to{"extradata":"\""}%" dstip=%dstip:ipv4% dstport=%dstport:number% dstintf="wan1" dstintfrole="%dstintfrole:char-to{"extradata":"\""}%" policyid=%policyid:number% sessionid=%sessionid:number% proto=6 action="%action:char-to{"extradata":"\""}%" service="%service:char-to{"extradata":"\""}%" dstcountry="%dstcountry:char-to{"extradata":"\""}%" srccountry="%srccountry:char-to{"extradata":"\""}%" trandisp="%trandisp:char-to{"extradata":"\""}%" transip=%transip:ipv4% transport=%transport:number% duration=%duration:number% sentbyte=%sentbyte:number% rcvdbyte=%rcvdbyte:number% sentpkt=%sentpkt:number% rcvdpkt=%rcvdpkt:number%
rule=logforge:date=2026-06-30 time=%time:time-24hr% devname="FGT60F-EDGE" devid="FGT60FTK20012345" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=%eventtime:number% srcip=%srcip:ipv4% srcport=%srcport:number% srcintf="internal" dstip=%dstip:ipv4% dstport=%dstport:number% dstintf="wan1" policyid=%policyid:number% proto=6 action="%action:char-to{"extradata":"\""}%" service="%service:char-to{"extradata":"\""}%" duration=%duration:number% sentbyte=%sentbyte:number% rcvdbyte=%rcvdbyte:number%
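Rule lines this long are hard to audit by eye. A small Python helper can list the typed slots a rule will emit so you can compare them against the field names you expect. This is a sketch: rule_fields is a hypothetical name, the slot regex only covers the %field:parser% and %field:parser{...}% shapes seen in the rulebase above, and the shortened rule in the example is made up from pieces of it.

```python
import re

# %field:parser% with an optional {...} parameter block, as in the rules above.
SLOT = re.compile(r"%([A-Za-z0-9_.-]+):([a-z0-9-]+)(\{[^}]*\})?%")


def rule_fields(rule_line: str) -> list[tuple[str, str]]:
    """Return (field, parser) pairs for every typed slot in a rule= line."""
    _, _, body = rule_line.partition(":")  # drop the "rule=tag:" prefix
    body = body.replace("%%", "")          # "%%" is an escaped literal percent sign
    return [(m.group(1), m.group(2)) for m in SLOT.finditer(body)]


# Shortened, made-up rule assembled from fragments of the generated rulebase.
rule = r'''rule=logforge:time=%time:time-24hr% srcip=%srcip:ipv4% action="%action:char-to{"extradata":"\""}%" load=50%%'''
slots = rule_fields(rule)
```

Here slots lists time, srcip and action with their parsers, and the escaped %% at the end is ignored rather than mistaken for a slot boundary.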

To load a rulebase: save it as e.g. /etc/rsyslog.d/logforge.rb, then in your rsyslog config load the module with module(load="mmnormalize") and attach action(type="mmnormalize" rulebase="/etc/rsyslog.d/logforge.rb" useRawMsg="on") to the ruleset that receives these messages. The useRawMsg="on" setting, shown in the generated header comment, makes the rulebase match the whole raw line instead of only the msg part, which a rule written against the full line (like the FortiGate one above) needs. Parsed fields land in the $! JSON property tree. A RainerScript template goes straight into any .conf under /etc/rsyslog.d/ and is referenced by name from an action's template= parameter. Remember that version=2 must stay the very first line of the rulebase file.

Pro: Ask AI to map fields

Detection is heuristic, so sometimes it names a field blandly or picks the wrong type. The optional Pro tier adds an "Ask AI to map fields" button: press it and a model proposes better field names and types for the lines you submitted, which you can accept or ignore in the review strip. It is the only feature that sends log content off your machine, and only when you explicitly invoke it — everything covered above stays 100% client-side. See the pricing page for details and how the license key works.

FAQ

Does my log data get uploaded?
No. Detection and generation run entirely as JavaScript in your browser — there is no parsing server, and no request carrying your log content is ever sent. See the privacy page for how to verify this yourself with the Network tab.
How many lines can I paste?
Up to 500 lines are analyzed. You can paste more, but only the first 500 are used for detection — which is plenty to establish which fields vary and which are constant.
Why is a field detected as literal?
A column that has the exact same value on every line you pasted is treated as a literal anchor rather than a captured field. Paste more varied lines (different IPs, different users, different status codes) and it will flip to a varying field automatically.
Can I rename fields?
Yes. Every detected field appears as a chip in the FIELD REVIEW strip; click its name to rename it, change its type from the dropdown, or toggle whether it is captured at all. The outputs regenerate instantly with your names.
Why did the output get much looser after I toggled capture off?
Turning capture off tells the generators to treat that field as a constant literal anchor. If its value actually differs between your pasted lines, no literal can match every line, so the generators fall back to a catch-all and say so in the notes. Toggle capture back on (or paste lines where the value really is constant) to get the precise pattern back. Also note that very small key=value samples — lines with fewer than three key=value pairs — are treated as freeform text rather than key=value structure.

More on data handling on the privacy page.

Try it on your own logs

Paste a few lines, review the detected fields, copy whichever format your stack needs. Free, no account, nothing uploaded.

open LogForge →