
$ logforge parse json

Parse JSON application logs → regex, Grok, Wazuh & rsyslog

Structured JSON logging is the default for modern application stacks — Node, Go, Java with Logback/Log4j2's JSON layout, Python's structlog, and virtually every twelve-factor service writing to stdout for a container runtime to collect. The convention is one JSON object per line (NDJSON / JSON Lines), so each line is independently parseable and a broken line never poisons the ones around it. Unlike every text format on this list, the field boundaries are unambiguous: keys are named, types are explicit (strings quoted, numbers and booleans bare), and there is no positional guessing. A line like {"ts":"2026-07-03T14:22:15.003Z","level":"error","service":"checkout","msg":"payment failed","order_id":"ord_9f3c","user":"jdoe","ip":"203.0.113.45","gateway":{"name":"stripe","code":"card_declined"}} carries its own schema.

The interesting problems with JSON logs are not tokenization but shape. Keys are not standardized across libraries: the timestamp might be ts, time, @timestamp, or t; the message is msg or message; severity is level, lvl, or severity. So 'parse the timestamp' still means knowing which key your framework uses. Objects nest, and most downstream systems want a flat field set, so nested structures get flattened with a delimiter: gateway.name and gateway.code, or gateway_name and gateway_code, depending on convention. The schema is also heterogeneous between event types on the same stream: the error line here has order_id, user, and a gateway object, while the very next line ({"ts":"…","level":"info","service":"auth","msg":"login ok","user":"berkay","ip":"192.0.2.10","mfa":true}) has mfa:true and no order_id or gateway at all. Types matter too: mfa is a real boolean, not the string "true", and a parser or schema that coerces it to a string will break boolean filters. Deeply nested payloads, arrays, and occasional embedded newlines inside a string value are the edge cases that separate a robust JSON-log pipeline from a fragile one.
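As a rough illustration of the key-naming problem, the sketch below maps library-specific keys onto one canonical name per concept. The alias table, the function name normalize_keys, and the input line (a renamed variant of the sample's info event) are all assumptions made for this example; they are not part of LogForge or of any logging library.

```python
import json

# Hypothetical alias table: library-specific key -> canonical key.
KEY_ALIASES = {
    "ts": "ts", "time": "ts", "@timestamp": "ts", "t": "ts",
    "msg": "msg", "message": "msg",
    "level": "level", "lvl": "level", "severity": "level",
}

def normalize_keys(event: dict) -> dict:
    """Rename known alias keys to canonical names; leave other keys untouched."""
    out = {}
    for key, value in event.items():
        canonical = KEY_ALIASES.get(key, key)
        # If two aliases collide on one event, keep the first value seen.
        out.setdefault(canonical, value)
    return out

# Illustrative line using different key names than the sample on this page.
line = '{"time":"2026-07-03T14:22:18.220Z","lvl":"info","message":"login ok","mfa":true}'
event = normalize_keys(json.loads(line))
print(event)
```

Because json.loads maps the JSON literal true to Python's True, the mfa value stays a real boolean through normalization rather than becoming the string "true".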

For detection and analytics, JSON logs are the easy case once flattened: you filter on level to isolate errors, group by service to localize a fault, and pivot on the semantic fields the developers chose to emit. That means user and ip for attribution, order_id or request_id for tracing a transaction across services, and nested detail like gateway.code=card_declined for root cause. The trade-off is that the fields are only as good as the instrumentation: JSON gives you clean structure but no guarantee that the right fields (an IP, a user, a correlation id) were logged in the first place.
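The filter, group, and pivot steps can be sketched in a few lines of Python using only the standard library. This is a minimal sketch over the two sample lines from this page; the variable names are illustrative, and a real pipeline would run the same logic inside its SIEM or query engine.

```python
import json
from collections import Counter

# The two sample lines from this page, one JSON object per line.
SAMPLE_LINES = [
    '{"ts":"2026-07-03T14:22:15.003Z","level":"error","service":"checkout",'
    '"msg":"payment failed","order_id":"ord_9f3c","user":"jdoe",'
    '"ip":"203.0.113.45","gateway":{"name":"stripe","code":"card_declined"}}',
    '{"ts":"2026-07-03T14:22:18.220Z","level":"info","service":"auth",'
    '"msg":"login ok","user":"berkay","ip":"192.0.2.10","mfa":true}',
]

events = [json.loads(line) for line in SAMPLE_LINES]

# Filter on level to isolate errors.
errors = [e for e in events if e.get("level") == "error"]

# Group by service to localize the fault.
errors_by_service = Counter(e.get("service", "unknown") for e in errors)

# Pivot on nested detail for root cause; .get() tolerates events without a gateway.
gateway_codes = [e.get("gateway", {}).get("code") for e in errors]

print(errors_by_service)
print(gateway_codes)
```

Using .get() with defaults throughout reflects the point about instrumentation: any given field may simply be absent from an event.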


What a JSON application line looks like

The JSON sample below is fed verbatim into the engine to produce every parser on this page.

{"ts":"2026-07-03T14:22:15.003Z","level":"error","service":"checkout","msg":"payment failed","order_id":"ord_9f3c","user":"jdoe","ip":"203.0.113.45","gateway":{"name":"stripe","code":"card_declined"}}
{"ts":"2026-07-03T14:22:18.220Z","level":"info","service":"auth","msg":"login ok","user":"berkay","ip":"192.0.2.10","mfa":true}

Detected fields

The engine classified this sample as json and consolidated 10 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • ts : timestamp
  • level : severity
  • service : quoted_string
  • msg : quoted_string
  • order_id : quoted_string
  • user : username
  • ip : ipv4
  • gateway_name : quoted_string
  • gateway_code : quoted_string
  • mfa : literal

Regex (named capture groups)

# sample: {"ts":"2026-07-03T14:22:15.003Z","level":"error","service":"checkout","msg":"payment failed","order_id":"ord_9f3c","user":"jdoe","ip":"203.0.113.45","gateway":{"name":"stripe","code":"card_declined"}}
# groups: ts=2026-07-03T14:22:15.003Z, level=error, service=checkout, msg=payment failed, order_id=ord_9f3c, user=jdoe, ip=203.0.113.45, gateway_name=stripe, gateway_code=card_declined
^(?=.*?"ts":"(?<ts>[^"]*)")(?=.*?"level":"(?<level>[^"]*)")(?=.*?"service":"(?<service>[^"]*)")(?=.*?"msg":"(?<msg>[^"]*)")(?=.*?"order_id":"(?<order_id>[^"]*)"|)(?=.*?"user":"(?<user>[^"]*)")(?=.*?"ip":"(?<ip>[^"]*)")(?=.*?"name":"(?<gateway_name>[^"]*)"|)(?=.*?"code":"(?<gateway_code>[^"]*)"|)(?=.*?"mfa":(?<mfa>[A-Za-z]+)|).*$
  • note input is JSON — use a JSON parser (jq, Logstash json filter, …) instead of a regex where possible
  • note a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead

Grok pattern (Logstash / Elastic)

# custom patterns
JSON_NOTDQUOTE [^"]*

\{"ts":"%{TIMESTAMP_ISO8601:ts}","level":"%{LOGLEVEL:level}","service":"%{JSON_NOTDQUOTE:service}","msg":"%{JSON_NOTDQUOTE:msg}(?:","order_id":"%{JSON_NOTDQUOTE:order_id})?","user":"%{USERNAME:user}","ip":"%{IPV4:ip}(?:","gateway":\{"name":"%{JSON_NOTDQUOTE:gateway_name})?(?:","code":"%{JSON_NOTDQUOTE:gateway_code})?(?:","mfa":%{GREEDYDATA:mfa})?
  • note json input — consider the Logstash json codec/filter instead of grok
  • note 4 optional field(s) wrapped in (?:…)? inline regex — grok has no native optional syntax
  • note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: {"ts":"2026-07-03T14:22:15.003Z","level":"error","service":"checkout","msg":"payment failed","order_id":"ord_9f3c","user":"jdoe","ip":"203.0.113.45","gateway":{
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="json-json">
  <prematch>^{</prematch>
  <plugin_decoder>JSON_Decoder</plugin_decoder>
</decoder>
  • note JSON input: emitted a JSON_Decoder plugin decoder — Wazuh extracts every key automatically as dynamic fields (nested keys become dotted names)
  • note field "ip" mapped to Wazuh conventional field "srcip"
  • note field names above are what the other LogForge generators use; JSON_Decoder will use the raw JSON keys instead
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# json — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/json.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=json:{"ts":"%ts:date-rfc5424%","level":"%level:char-to{"extradata":"\""}%","service":"%service:char-to{"extradata":"\""}%","msg":"%msg:char-to{"extradata":"\""}%","order_id":"%order_id:char-to{"extradata":"\""}%","user":"%user:char-to{"extradata":"\""}%","ip":"%ip:ipv4%","gateway":{"name":"%gateway_name:char-to{"extradata":"\""}%","code":"%gateway_code:char-to{"extradata":"\""}%","mfa":%mfa:char-to{"extradata":"\""}%"}}
rule=json:{"ts":"%ts:date-rfc5424%","level":"%level:char-to{"extradata":"\""}%","service":"%service:char-to{"extradata":"\""}%","msg":"%msg:char-to{"extradata":"\""}%","user":"%user:char-to{"extradata":"\""}%","ip":"%ip:ipv4%"
  • note json structure: rsyslog mmjsonparse handles CEE/JSON natively — consider action(type="mmjsonparse") instead of this rulebase
  • note trailing literal "\"}}" reconstructed from line 1
  • note chosen parser types: ts=date-rfc5424, level=char-to("), service=char-to("), msg=char-to("), order_id=char-to("), user=char-to("), ip=ipv4, gateway_name=char-to("), gateway_code=char-to("), mfa=char-to(")
  • note optional columns (order_id, gateway_name, gateway_code, mfa): liblognorm has no optional parts within a single rule — emitted a second rule variant with only the always-present columns (max 2 variants; lines with other column combinations will not match and need extra rule= lines)

FAQ

What is NDJSON / JSON Lines and why do logs use it?
NDJSON (newline-delimited JSON), also called JSON Lines, is one complete JSON object per line with no enclosing array. Logs use it because each line is independently valid and parseable — a tail can stream them, a single corrupt line does not break the rest, and appending is trivial. A whole-file JSON array would require rewriting the closing bracket on every append.
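A minimal sketch of that line-by-line tolerance, assuming Python and the standard json module. The function name read_ndjson and the three input lines are made up for this example; the second line is deliberately truncated to stand in for a corrupt record.

```python
import io
import json
from typing import Iterable, Iterator, Tuple

def read_ndjson(lines: Iterable[str]) -> Iterator[Tuple[int, dict]]:
    """Yield (line_number, object) for each line that parses as a JSON object."""
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a corrupt line is skipped; later lines are unaffected
        if isinstance(obj, dict):
            yield lineno, obj

# Illustrative stream: line 2 is cut off mid-string.
stream = io.StringIO(
    '{"level":"info","msg":"first"}\n'
    '{"level":"error","msg":"trunc\n'
    '{"level":"info","msg":"third"}\n'
)
parsed = list(read_ndjson(stream))
print(parsed)
```

A production reader would probably count or forward the skipped lines instead of dropping them silently; that choice is left out here to keep the sketch short.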
How are nested JSON log fields flattened for a SIEM?
Nested objects are collapsed into dotted or underscored keys: the name key inside the gateway object becomes gateway.name or gateway_name, and code becomes gateway.code or gateway_code. The exact delimiter depends on the shipper (Filebeat, Fluent Bit, Vector) and its config. Flattening lets flat-schema stores index and filter on what were nested paths.
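The flattening itself is a short recursion. The sketch below is one possible implementation in Python, not the logic of any particular shipper; the function name flatten and its sep parameter are assumptions for this example, and arrays are left as-is rather than expanded.

```python
def flatten(obj: dict, sep: str = ".", prefix: str = "") -> dict:
    """Collapse nested dicts into a single level, joining key paths with sep."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, sep, path))
        else:
            # Scalars (and lists) are stored unchanged, so types are preserved.
            flat[path] = value
    return flat

event = {
    "level": "error",
    "service": "checkout",
    "gateway": {"name": "stripe", "code": "card_declined"},
}
dotted = flatten(event)
underscored = flatten(event, sep="_")
print(dotted)
print(underscored)
```

One caveat with the underscore convention: a top-level key that is already named gateway_name would collide with the flattened path, which is one reason some pipelines prefer the dotted form.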
Why do my JSON log lines have different fields from each other?
Because the schema is per-event: an error event may carry order_id and a gateway object while an info event carries neither and adds its own keys, such as mfa. This is normal for structured logging. Downstream systems should treat the union of possible keys as optional rather than require a fixed set; the presence or absence of a key is itself a signal.
Which key holds the timestamp in a JSON log?
There is no universal key — it depends on the logging library. Common ones are ts, time, @timestamp (the Elastic convention), and t. Check what your framework emits; the value is usually ISO 8601 with milliseconds (2026-07-03T14:22:15.003Z), but some libraries write epoch seconds or milliseconds instead.
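One way to cope with both the varying key and the varying encoding is to probe the common keys in order and branch on the value's type. The following is a sketch under stated assumptions: the key order, the function name extract_timestamp, and especially the threshold used to tell epoch seconds from epoch milliseconds are choices made for this example, not a standard.

```python
from datetime import datetime, timezone
from typing import Optional

# Candidate keys, probed in this order (an assumption; adjust to your framework).
TIMESTAMP_KEYS = ("ts", "time", "@timestamp", "t")

def extract_timestamp(event: dict) -> Optional[datetime]:
    """Return the event time as an aware UTC datetime, or None if not found."""
    for key in TIMESTAMP_KEYS:
        value = event.get(key)
        if isinstance(value, bool):
            continue  # bool is a subclass of int in Python; never a timestamp
        if isinstance(value, (int, float)):
            # Heuristic (assumption): values above 1e11 are epoch milliseconds.
            seconds = value / 1000 if value > 1e11 else value
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
        if isinstance(value, str):
            try:
                # Older Python versions reject a trailing "Z", so rewrite it.
                return datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                continue  # not ISO 8601; try the next candidate key
    return None

iso = extract_timestamp({"ts": "2026-07-03T14:22:15.003Z"})
secs = extract_timestamp({"time": 1_000_000_000})
millis = extract_timestamp({"@timestamp": 1_000_000_000_000})
print(iso, secs, millis)
```

Normalizing everything to timezone-aware UTC at ingestion avoids comparing naive and aware datetimes later in the pipeline.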

Try it on your own JSON application lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.

Open this sample in LogForge →