$ logforge parse aws-cloudtrail

Parse AWS CloudTrail logs → regex, Grok, Wazuh & rsyslog

AWS CloudTrail records the control-plane and (optionally) data-plane API activity of an AWS account, and it is pure JSON — but with a delivery quirk that shapes how you parse it. CloudTrail writes gzipped files to S3, and inside each file the events are wrapped in a single {"Records":[ … ]} array, one JSON object per API call. So the practical unit you parse is one Record object: you gunzip the S3 object, iterate the Records array, and hand each element to your parser. A Record is a nested document, not a flat line, which is the defining difference from every text format on this list — the interesting fields live several levels deep.
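The unwrap step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the S3 object body has already been downloaded into memory as bytes; the helper name iter_records and the two-event sample are made up for the example.

```python
import gzip
import json
from typing import Iterator


def iter_records(gz_bytes: bytes) -> Iterator[dict]:
    """Yield each event from the body of one gzipped CloudTrail file."""
    document = json.loads(gzip.decompress(gz_bytes))
    # A delivered file is a single object: {"Records": [ {...}, {...} ]}
    yield from document.get("Records", [])


# In-memory stand-in for a downloaded S3 object (illustrative data only).
sample = {"Records": [
    {"eventName": "GetObject", "eventSource": "s3.amazonaws.com"},
    {"eventName": "ConsoleLogin", "eventSource": "signin.amazonaws.com"},
]}
gz_bytes = gzip.compress(json.dumps(sample).encode("utf-8"))

event_names = [record["eventName"] for record in iter_records(gz_bytes)]
print(event_names)
```

Each yielded dict is one Record, which is the unit the rest of this page works on.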

The top-level fields you always want are eventName (the API action, e.g. ConsoleLogin, RunInstances, PutObject), eventSource (the service endpoint, e.g. ec2.amazonaws.com, s3.amazonaws.com), eventTime (ISO 8601 UTC), awsRegion, and sourceIPAddress. For calls made by an AWS service or through the console on your behalf, sourceIPAddress can hold a service DNS name such as cloudtrail.amazonaws.com rather than an IP, so do not assume it is always an address. The caller is described by the nested userIdentity object: userIdentity.type (IAMUser, AssumedRole, Root, AWSService), userIdentity.userName, userIdentity.arn, and userIdentity.accountId. The request details are in requestParameters and the outcome in responseElements, both service-specific nested objects whose shape depends entirely on the eventName: a RunInstances requestParameters looks nothing like a PutBucketPolicy one. errorCode and errorMessage appear only when the call failed (for example AccessDenied, or Client.UnauthorizedOperation from EC2), and their presence is itself a strong detection signal.
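A hedged sketch of pulling "who did what from where" out of one Record, tolerating a non-IP sourceIPAddress. The function names are invented for this example. The fallback to userIdentity.sessionContext.sessionIssuer.userName for assumed roles follows CloudTrail's documented userIdentity layout, and the role name and account ID in the second record are made-up illustration values.

```python
import ipaddress


def principal_name(record: dict) -> str:
    """Best-effort caller name: userName, else the role issuer, else the ARN."""
    identity = record.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return (identity.get("userName") or issuer.get("userName")
            or identity.get("arn") or identity.get("type", "unknown"))


def source_kind(record: dict) -> str:
    """Classify sourceIPAddress as 'ip', 'non_ip' (e.g. a DNS name) or 'missing'."""
    value = record.get("sourceIPAddress")
    if not value:
        return "missing"
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return "non_ip"
    return "ip"


def summarize(record: dict) -> dict:
    return {
        "who": principal_name(record),
        "what": f'{record.get("eventSource")}:{record.get("eventName")}',
        "where": record.get("sourceIPAddress"),
        "where_kind": source_kind(record),
        "failed": "errorCode" in record,  # errorCode is only present on failure
    }


user_call = {
    "eventName": "GetObject", "eventSource": "s3.amazonaws.com",
    "sourceIPAddress": "203.0.113.45",
    "userIdentity": {"type": "IAMUser", "userName": "jdoe"},
}
# Illustrative record, not real log data.
role_call = {
    "eventName": "PutObject", "eventSource": "s3.amazonaws.com",
    "sourceIPAddress": "cloudtrail.amazonaws.com",
    "errorCode": "AccessDenied",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::111122223333:assumed-role/ExampleRole/session",
        "sessionContext": {"sessionIssuer": {"type": "Role", "userName": "ExampleRole"}},
    },
}

for record in (user_call, role_call):
    print(summarize(record))
```

Using .get() with defaults throughout keeps the sketch from raising on Records that lack an optional object, which is the common case across event types.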

Because the payload is nested JSON, 'parsing' is really field selection by path: userIdentity.userName, requestParameters.instanceType, responseElements.instancesSet.items[0].instanceId, and so on, with array indexing where AWS returns lists. For detection the high-value pivots are eventName + userIdentity + sourceIPAddress (who did what from where), errorCode (a spray of AccessDenied from one principal is reconnaissance or a compromised key probing permissions), and specific sensitive actions — ConsoleLogin with additionalEventData.MFAUsed=No, CreateAccessKey, PutUserPolicy, or DeleteTrail (an attacker disabling logging). Because eventName drives the shape of requestParameters/responseElements, a robust pipeline flattens the nested paths per event type rather than expecting a fixed schema.
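One way to flatten nested paths is a small recursive walk. The sketch below is an assumption about how such a step might look, not LogForge's implementation. The lowercase, underscore-joined naming is chosen only to resemble field names like useridentity_username used on this page, and the instance ID is an illustrative value.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {path: scalar}, joining path parts with '_'."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}_{key}" if prefix else key))
    elif isinstance(value, list):
        # Lists keep their position in the path: items[0] becomes items_0
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}_{index}"))
    else:
        flat[prefix.lower()] = value
    return flat


# Illustrative RunInstances-style record (values are made up).
record = {
    "eventName": "RunInstances",
    "userIdentity": {"type": "IAMUser", "userName": "jdoe"},
    "requestParameters": {"instanceType": "t3.micro"},
    "responseElements": {
        "instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}
    },
}

flat = flatten(record)
for path, value in flat.items():
    print(path, "=", value)
```

Because the set of keys produced depends on eventName, downstream code should treat every flattened path other than the common top-level fields as optional.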

What an AWS CloudTrail line looks like

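The JSON sample below is fed verbatim into the engine to produce every parser on this page. Each line is one Record, already unwrapped from the Records array and trimmed to a handful of fields.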

{"eventVersion":"1.09","eventTime":"2026-07-03T14:22:15Z","eventSource":"s3.amazonaws.com","eventName":"GetObject","awsRegion":"eu-central-1","sourceIPAddress":"203.0.113.45","userIdentity":{"type":"IAMUser","userName":"jdoe"},"requestParameters":{"bucketName":"onber-logs"}}
{"eventVersion":"1.09","eventTime":"2026-07-03T14:22:31Z","eventSource":"signin.amazonaws.com","eventName":"ConsoleLogin","awsRegion":"us-east-1","sourceIPAddress":"198.51.100.77","userIdentity":{"type":"IAMUser","userName":"admin"},"responseElements":{"ConsoleLogin":"Failure"}}

Detected fields

The engine classified this sample as json and consolidated 10 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • eventversion : number · literal
  • eventtime : timestamp
  • eventsource : hostname
  • eventname : quoted_string
  • awsregion : quoted_string
  • sourceipaddress : ipv4
  • useridentity_type : quoted_string · literal
  • useridentity_username : username
  • requestparameters_bucketname : quoted_string
  • responseelements_consolelogin : quoted_string
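The literal-versus-captured rule described above can be illustrated with a short sketch. This is a guess at the general idea, not LogForge's actual algorithm: a field present on every line with one value is treated as a literal, a field missing from some lines as optional, and everything else as captured. The rows are the two sample lines, flattened by hand.

```python
def classify_fields(rows):
    """rows: one flat dict per sample line. Returns {field: kind}."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)  # keep first-seen order

    kinds = {}
    for name in names:
        values = [row[name] for row in rows if name in row]
        if len(values) < len(rows):
            kinds[name] = "optional"   # absent on at least one line
        elif len(set(values)) == 1:
            kinds[name] = "literal"    # identical on every line
        else:
            kinds[name] = "captured"
    return kinds


common = {"eventversion": "1.09", "useridentity_type": "IAMUser"}
rows = [
    {**common, "eventtime": "2026-07-03T14:22:15Z", "eventsource": "s3.amazonaws.com",
     "eventname": "GetObject", "awsregion": "eu-central-1",
     "sourceipaddress": "203.0.113.45", "useridentity_username": "jdoe",
     "requestparameters_bucketname": "onber-logs"},
    {**common, "eventtime": "2026-07-03T14:22:31Z", "eventsource": "signin.amazonaws.com",
     "eventname": "ConsoleLogin", "awsregion": "us-east-1",
     "sourceipaddress": "198.51.100.77", "useridentity_username": "admin",
     "responseelements_consolelogin": "Failure"},
]

kinds = classify_fields(rows)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

With only two sample lines, "literal" is a weak inference: a field such as useridentity_type will vary in real traffic, so paste a more varied sample before relying on the anchors.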

Regex (named capture groups)

# sample: {"eventVersion":"1.09","eventTime":"2026-07-03T14:22:15Z","eventSource":"s3.amazonaws.com","eventName":"GetObject","awsRegion":"eu-central-1","sourceIPAddress":"203.0.113.45","userIdentity":{"type":"IAMUser","userName":"jdoe"},"requestParameters":{"bucketName":"onber-logs"}}
# groups: eventtime=2026-07-03T14:22:15Z, eventsource=s3.amazonaws.com, eventname=GetObject, awsregion=eu-central-1, sourceipaddress=203.0.113.45, useridentity_username=jdoe, requestparameters_bucketname=onber-logs
^(?=.*?"eventVersion":"1\.09")(?=.*?"eventTime":"(?<eventtime>[^"]*)")(?=.*?"eventSource":"(?<eventsource>[^"]*)")(?=.*?"eventName":"(?<eventname>[^"]*)")(?=.*?"awsRegion":"(?<awsregion>[^"]*)")(?=.*?"sourceIPAddress":"(?<sourceipaddress>[^"]*)")(?=.*?"type":"IAMUser")(?=.*?"userName":"(?<useridentity_username>[^"]*)")(?=.*?"bucketName":"(?<requestparameters_bucketname>[^"]*)"|)(?=.*?"ConsoleLogin":"(?<responseelements_consolelogin>[^"]*)"|).*$
  • note input is JSON — use a JSON parser (jq, Logstash json filter, …) instead of a regex where possible
  • note a single linear template could not reproduce every input line — fields are captured with order-independent lookaheads instead
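If you do need the generated regex in Python, note that Python's re module expects named groups written as (?P<name>...). The sketch below copies the pattern above, rewrites the group syntax, and applies it to the second sample line. The rewrite step is an assumption about your target engine; PCRE, .NET, and JavaScript accept the (?<name>...) form as generated.

```python
import re

# Pattern copied from the generated regex, split across lines for readability.
PATTERN = (
    r'^(?=.*?"eventVersion":"1\.09")'
    r'(?=.*?"eventTime":"(?<eventtime>[^"]*)")'
    r'(?=.*?"eventSource":"(?<eventsource>[^"]*)")'
    r'(?=.*?"eventName":"(?<eventname>[^"]*)")'
    r'(?=.*?"awsRegion":"(?<awsregion>[^"]*)")'
    r'(?=.*?"sourceIPAddress":"(?<sourceipaddress>[^"]*)")'
    r'(?=.*?"type":"IAMUser")'
    r'(?=.*?"userName":"(?<useridentity_username>[^"]*)")'
    r'(?=.*?"bucketName":"(?<requestparameters_bucketname>[^"]*)"|)'
    r'(?=.*?"ConsoleLogin":"(?<responseelements_consolelogin>[^"]*)"|)'
    r'.*$'
)

# Rewrite (?<name> to (?P<name>, leaving lookbehinds such as (?<= or (?<! alone.
PY_PATTERN = re.compile(re.sub(r'\(\?<(?![=!])', '(?P<', PATTERN))

line = (
    '{"eventVersion":"1.09","eventTime":"2026-07-03T14:22:31Z",'
    '"eventSource":"signin.amazonaws.com","eventName":"ConsoleLogin",'
    '"awsRegion":"us-east-1","sourceIPAddress":"198.51.100.77",'
    '"userIdentity":{"type":"IAMUser","userName":"admin"},'
    '"responseElements":{"ConsoleLogin":"Failure"}}'
)

match = PY_PATTERN.match(line)
# Optional fields that did not appear on this line come back as None; drop them.
fields = {k: v for k, v in match.groupdict().items() if v is not None}
print(fields)
```

The pattern only matches lines with eventVersion 1.09 and an IAMUser identity, because those two values were baked in as literal anchors from the sample.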

Grok pattern (Logstash / Elastic)

# custom patterns
AWS_CLOUDTRAIL_NOTDQUOTE [^"]*

\{"eventVersion":"1\.09","eventTime":"%{TIMESTAMP_ISO8601:eventtime}","eventSource":"%{HOSTNAME:eventsource}","eventName":"%{AWS_CLOUDTRAIL_NOTDQUOTE:eventname}","awsRegion":"%{AWS_CLOUDTRAIL_NOTDQUOTE:awsregion}","sourceIPAddress":"%{IPV4:sourceipaddress}","userIdentity":\{"type":"IAMUser","userName":"%{USERNAME:useridentity_username}(?:"\},"requestParameters":\{"bucketName":"%{AWS_CLOUDTRAIL_NOTDQUOTE:requestparameters_bucketname})?(?:"\},"responseElements":\{"ConsoleLogin":"%{AWS_CLOUDTRAIL_NOTDQUOTE:responseelements_consolelogin})?
  • note json input — consider the Logstash json codec/filter instead of grok
  • note constant field "eventversion" embedded as literal anchor "1.09" (varying=false)
  • note constant field "useridentity_type" embedded as literal anchor "IAMUser" (varying=false)
  • note 2 optional field(s) wrapped in (?:…)? inline regex — grok has no native optional syntax
  • note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: {"eventVersion":"1.09","eventTime":"2026-07-03T14:22:15Z","eventSource":"s3.amazonaws.com","eventName":"GetObject","awsRegion":"eu-central-1","sourceIPAddress":
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="aws-cloudtrail-json">
  <prematch>^{</prematch>
  <plugin_decoder>JSON_Decoder</plugin_decoder>
</decoder>
  • note JSON input: emitted a JSON_Decoder plugin decoder — Wazuh extracts every key automatically as dynamic fields (nested keys become dotted names)
  • note field names above are what the other LogForge generators use; JSON_Decoder will use the raw JSON keys instead
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# aws_cloudtrail — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/aws_cloudtrail.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=aws_cloudtrail:{"eventVersion":"1.09","eventTime":"%eventtime:date-rfc5424%","eventSource":"%eventsource:char-to{"extradata":"\""}%","eventName":"%eventname:char-to{"extradata":"\""}%","awsRegion":"%awsregion:char-to{"extradata":"\""}%","sourceIPAddress":"%sourceipaddress:ipv4%","userIdentity":{"type":"IAMUser","userName":"%useridentity_username:char-to{"extradata":"\""}%"},"requestParameters":{"bucketName":"%requestparameters_bucketname:char-to{"extradata":"\""}%"},"responseElements":{"ConsoleLogin":"%responseelements_consolelogin:char-to{"extradata":"\""}%"}}
rule=aws_cloudtrail:{"eventVersion":"1.09","eventTime":"%eventtime:date-rfc5424%","eventSource":"%eventsource:char-to{"extradata":"\""}%","eventName":"%eventname:char-to{"extradata":"\""}%","awsRegion":"%awsregion:char-to{"extradata":"\""}%","sourceIPAddress":"%sourceipaddress:ipv4%","userIdentity":{"type":"IAMUser","userName":"%useridentity_username:char-to{"extradata":"\""}%"
  • note json structure: rsyslog mmjsonparse handles CEE/JSON natively — consider action(type="mmjsonparse") instead of this rulebase
  • note trailing literal "\"}}" reconstructed from line 1
  • note chosen parser types: eventtime=date-rfc5424, eventsource=char-to("), eventname=char-to("), awsregion=char-to("), sourceipaddress=ipv4, useridentity_username=char-to("), requestparameters_bucketname=char-to("), responseelements_consolelogin=char-to(")
  • note optional columns (requestparameters_bucketname, responseelements_consolelogin): liblognorm has no optional parts within a single rule — emitted a second rule variant with only the always-present columns (max 2 variants; lines with other column combinations will not match and need extra rule= lines)

FAQ

How are CloudTrail events delivered and what is the unit to parse?
CloudTrail writes gzipped JSON files to S3, and each file contains a {"Records":[…]} array with one object per API call. To parse, gunzip the object and iterate the Records array — the unit of work is a single Record, a nested JSON document, not a flat log line.
Where is the user identity in a CloudTrail record?
In the nested userIdentity object: userIdentity.type (IAMUser, AssumedRole, Root, AWSService), userIdentity.userName, userIdentity.arn, and userIdentity.accountId. For assumed roles, userIdentity.userName is usually absent and the role name is under userIdentity.sessionContext.sessionIssuer.userName. Extract these by JSON path, since none of them are top-level fields.
Why is sourceIPAddress sometimes not an IP address?
Because for actions initiated by an AWS service or the console on your behalf, CloudTrail records a service DNS name (like cloudtrail.amazonaws.com) in sourceIPAddress instead of a numeric address. A parser and any detection logic must tolerate a hostname there, not assume it is always an IPv4/IPv6 address.
Which CloudTrail fields matter most for threat detection?
eventName and eventSource (the action and service), userIdentity (who), sourceIPAddress (from where), and errorCode/errorMessage (failures). A burst of AccessDenied errorCodes from one principal signals permission probing; sensitive eventNames like CreateAccessKey, PutUserPolicy, DeleteTrail, or ConsoleLogin without MFA are high-priority.
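The detection pivots in the last answer can be sketched as a single pass over parsed Records. This is a toy illustration, not a production rule set: the threshold of 3, the alert labels, and the sample ARNs are all hypothetical, and a real pipeline would count failures within a time window rather than over a whole batch.

```python
from collections import Counter

SENSITIVE_EVENTS = {"CreateAccessKey", "PutUserPolicy", "DeleteTrail"}
ACCESS_DENIED_THRESHOLD = 3  # hypothetical cut-off for a "burst"


def findings(records):
    """Return (alert_label, principal_arn) tuples for a batch of Records."""
    alerts = []
    denied = Counter()
    for record in records:
        who = record.get("userIdentity", {}).get("arn", "unknown")
        name = record.get("eventName")
        if name in SENSITIVE_EVENTS:
            alerts.append((name, who))
        mfa_used = record.get("additionalEventData", {}).get("MFAUsed")
        if name == "ConsoleLogin" and mfa_used == "No":
            alerts.append(("ConsoleLoginWithoutMFA", who))
        if record.get("errorCode") == "AccessDenied":
            denied[who] += 1
    for who, count in denied.items():
        if count >= ACCESS_DENIED_THRESHOLD:
            alerts.append(("AccessDeniedBurst", who))
    return alerts


# Illustrative records only; the account ID and user names are made up.
prober = "arn:aws:iam::111122223333:user/prober"
admin = "arn:aws:iam::111122223333:user/admin"
records = [
    {"eventName": "ListBuckets", "errorCode": "AccessDenied", "userIdentity": {"arn": prober}},
    {"eventName": "DescribeInstances", "errorCode": "AccessDenied", "userIdentity": {"arn": prober}},
    {"eventName": "GetUser", "errorCode": "AccessDenied", "userIdentity": {"arn": prober}},
    {"eventName": "DeleteTrail", "userIdentity": {"arn": admin}},
    {"eventName": "ConsoleLogin", "additionalEventData": {"MFAUsed": "No"},
     "userIdentity": {"arn": admin}},
]

alerts = findings(records)
for alert in alerts:
    print(alert)
```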

Try it on your own AWS CloudTrail lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.

Open this sample in LogForge →