$ logforge parse aws-vpc-flow

Parse AWS VPC Flow logs → regex, Grok, Wazuh & rsyslog

AWS VPC Flow Logs record metadata about the IP traffic going to and from the network interfaces in your VPC. Unlike CloudTrail they are not JSON: in the default format each record is a single line of space-delimited columns. The version-2 default layout is a positional sequence of exactly these fields, in order: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. So a line like '2 123456789012 eni-0abc12 203.0.113.45 10.0.1.20 51234 443 6 12 3520 1783085000 1783085060 ACCEPT OK' is read strictly by position. Everything depends on that: get the column order right and every field falls out; get it wrong and you silently mislabel values, for example reading the protocol column as a port.
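The positional reading described above can be sketched in a few lines of Python. This is a minimal illustration, not LogForge's implementation; the helper name `parse_default_v2` is hypothetical, and it assumes the record uses the version-2 default layout.

```python
# Field names for the version-2 default format, in positional order.
DEFAULT_V2_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]


def parse_default_v2(line: str) -> dict[str, str]:
    """Split one record on whitespace and label each column by position.

    Hypothetical helper: it only applies to the default layout and
    rejects lines whose column count does not match.
    """
    values = line.split()
    if len(values) != len(DEFAULT_V2_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_V2_FIELDS)} columns, got {len(values)}"
        )
    return dict(zip(DEFAULT_V2_FIELDS, values))


record = parse_default_v2(
    "2 123456789012 eni-0abc12 203.0.113.45 10.0.1.20 51234 443 6 12 3520 "
    "1783085000 1783085060 ACCEPT OK"
)
print(record["dstport"], record["protocol"], record["action"])
```

All values stay strings here; converting ports, counters and timestamps to integers is left to the caller so that placeholder values are not rejected at parse time.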

Two fields need decoding. protocol is the IANA protocol number, not a name: 6 is TCP, 17 is UDP, 1 is ICMP. A detection rule that means 'protocol == TCP' therefore has to compare against 6. start and end are Unix epoch seconds bounding the aggregation window for the flow (VPC Flow Logs aggregate traffic over a window rather than logging per packet), so packets and bytes are totals for that window, not for one packet. action is ACCEPT or REJECT, depending on whether the security groups and network ACLs permitted or denied the traffic. log-status is OK, NODATA (no traffic in the window), or SKIPDATA (some records were skipped during the window); in NODATA and SKIPDATA records the traffic columns hold '-' instead of values, so a parser should not assume every column is numeric. srcaddr and dstaddr can be private VPC addresses or public addresses. On the interface's own side of the flow they hold the interface's address (srcaddr for outgoing traffic, dstaddr for incoming), and for IPv4 that is always its private address.
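As a rough sketch of that decoding step, the snippet below maps the protocol number to a name, converts start to a UTC timestamp, and derives an average byte rate for the window. The function name `decode_window` and the output keys are assumptions made for illustration, and the protocol table covers only the three numbers mentioned above.

```python
from datetime import datetime, timezone

# Subset of IANA protocol numbers named in the text; extend as needed.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def decode_window(protocol: str, start: str, end: str, nbytes: str) -> dict:
    """Decode the protocol number and the epoch-second window of one record.

    Hypothetical helper: expects numeric strings, so NODATA/SKIPDATA
    records (which carry '-') must be filtered out before calling it.
    """
    proto_num = int(protocol)
    start_s, end_s = int(start), int(end)
    duration = end_s - start_s
    return {
        "protocol_name": PROTOCOL_NAMES.get(proto_num, f"proto-{proto_num}"),
        "start_utc": datetime.fromtimestamp(start_s, tz=timezone.utc).isoformat(),
        "duration_s": duration,
        # Average over the whole window; None when the window has zero length.
        "bytes_per_s": int(nbytes) / duration if duration > 0 else None,
    }


decoded = decode_window("6", "1783085000", "1783085060", "3520")
print(decoded)
```

Because bytes is a window total, the rate is only an average across the window; it says nothing about bursts inside it.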

The critical parsing caveat is that the format is customizable: you can define your own field list and order (adding vpc-id, subnet-id, tcp-flags, pkt-srcaddr, flow-direction, and many more), and when you do, the columns no longer follow the version-2 default; they appear in whatever order your format string specifies. A parser must therefore be told, or must infer from the format definition, which columns are present and in what order. The format string is authoritative, not a fixed template. For detection, the load-bearing fields are srcaddr/dstaddr and srcport/dstport (the flow), action (a burst of REJECTs from one srcaddr to many dstports suggests a scan; REJECTs to a sensitive dstport suggest probing), protocol, and bytes (a large outbound byte total to an external dstaddr may indicate exfiltration).
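The scan heuristic mentioned above (many rejected destination ports from one source) could be sketched as follows. This is an illustrative simplification under stated assumptions: records are already parsed into dicts keyed by the AWS field names, the function name `rejected_port_fanout` is hypothetical, and the `min_ports` threshold is an arbitrary placeholder that a real rule would tune and combine with a time window.

```python
from collections import defaultdict


def rejected_port_fanout(records, min_ports=3):
    """Return {srcaddr: sorted rejected dstports} for sources that were
    rejected on at least `min_ports` distinct destination ports."""
    ports_by_src = defaultdict(set)
    for rec in records:
        if rec["action"] == "REJECT":
            ports_by_src[rec["srcaddr"]].add(int(rec["dstport"]))
    return {
        src: sorted(ports)
        for src, ports in ports_by_src.items()
        if len(ports) >= min_ports
    }


# Hypothetical parsed records, using documentation address ranges.
records = [
    {"srcaddr": "198.51.100.77", "dstport": "22", "action": "REJECT"},
    {"srcaddr": "198.51.100.77", "dstport": "23", "action": "REJECT"},
    {"srcaddr": "198.51.100.77", "dstport": "3389", "action": "REJECT"},
    {"srcaddr": "203.0.113.45", "dstport": "443", "action": "ACCEPT"},
    {"srcaddr": "203.0.113.45", "dstport": "8443", "action": "REJECT"},
]
suspects = rejected_port_fanout(records)
print(suspects)
```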

What an AWS VPC Flow line looks like

The delimited sample below is fed verbatim into the engine to produce every parser on this page.

2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 40112 53 17 3 240 1783085100 1783085160 REJECT OK

Detected fields

The engine classified this sample as freeform and consolidated 14 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • number : number · literal
  • number2 : number
  • literal : literal
  • ip1 : ipv4
  • ip2 : ipv4
  • number3 : number
  • number4 : number
  • number5 : number
  • number6 : number
  • number7 : number
  • timestamp : timestamp
  • timestamp2 : timestamp
  • literal2 : literal
  • _lit1 : literal · literal
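The detected names are generic, so downstream rules are easier to read if the captures are renamed to the AWS field names. The mapping below is inferred purely by lining the detected fields up against the version-2 default column order; it is an assumption that only holds for the default layout, and `rename_fields` is a hypothetical helper.

```python
# Generic name -> AWS field name, inferred from column position in the
# version-2 default layout. Not valid for custom formats.
GENERIC_TO_AWS = {
    "number": "version",
    "number2": "account-id",
    "literal": "interface-id",
    "ip1": "srcaddr",
    "ip2": "dstaddr",
    "number3": "srcport",
    "number4": "dstport",
    "number5": "protocol",
    "number6": "packets",
    "number7": "bytes",
    "timestamp": "start",
    "timestamp2": "end",
    "literal2": "action",
    "_lit1": "log-status",
}


def rename_fields(captured: dict[str, str]) -> dict[str, str]:
    """Rename generic capture names; unknown keys pass through unchanged."""
    return {GENERIC_TO_AWS.get(name, name): value for name, value in captured.items()}


renamed = rename_fields({"ip1": "203.0.113.45", "number4": "443", "literal2": "ACCEPT"})
print(renamed)
```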

Regex (named capture groups)

# sample: 2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
# groups: number2=123456789012, literal=eni-0abc12de34567890, ip1=203.0.113.45, ip2=192.0.2.10, number3=51234, number4=443, number5=6, number6=12, number7=1520, timestamp=1783085000, timestamp2=1783085060, literal2=ACCEPT
^2 (?<number2>-?\d+(?:\.\d+)?) (?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) (?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) (?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK$
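To try the pattern from Python, note that the standard `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sketch below copies the generated pattern unchanged and rewrites only that prefix before compiling; the constant names are illustrative, and the blanket string replacement is safe here only because the pattern contains no lookbehind assertions.

```python
import re

# The generated pattern from above, split across lines for readability.
GENERATED = (
    r"^2 (?<number2>-?\d+(?:\.\d+)?) "
    r"(?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+"
    r"|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) "
    r"(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) "
    r"(?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) "
    r"(?<number7>-?\d+(?:\.\d+)?) "
    r"(?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK$"
)

# Convert (?<name> to Python's (?P<name> spelling, then compile.
PATTERN = re.compile(GENERATED.replace("(?<", "(?P<"))

SAMPLES = [
    "2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 "
    "51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK",
    "2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 "
    "40112 53 17 3 240 1783085100 1783085160 REJECT OK",
]

for line in SAMPLES:
    match = PATTERN.match(line)
    # groupdict() returns the named captures as a plain dict.
    print(match.groupdict() if match else "no match")
```

The pattern is anchored on the literal "2 " prefix and " OK" suffix, and its interface-id alternation is shaped by the two sample lines, so records with another version, another log-status, or a differently shaped interface ID will not match. Feeding the engine more varied lines is the way to loosen it.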

Grok pattern (Logstash / Elastic)

# custom patterns
AWS_VPC_FLOW_EPOCH \d{10}(?:\d{3})?

2 %{NUMBER:number2} %{NOTSPACE:literal} %{IPV4:ip1} %{IPV4:ip2} %{NUMBER:number3} %{NUMBER:number4} %{NUMBER:number5} %{NUMBER:number6} %{NUMBER:number7} %{AWS_VPC_FLOW_EPOCH:timestamp} %{AWS_VPC_FLOW_EPOCH:timestamp2} %{NOTSPACE:literal2} OK
  • note constant field "number" embedded as literal anchor "2" (varying=false)
  • note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: 2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="aws-vpc-flow-freeform">
  <prematch>^\d+ </prematch>
</decoder>

<decoder name="aws-vpc-flow-freeform">
  <parent>aws-vpc-flow-freeform</parent>
  <regex>^2 (\d+) (\w+) (\d+.\d+.\d+.\d+) (\d+.\d+.\d+.\d+) (\d+) (\d+) (\d+) (\d+)</regex>
  <order>number2, literal, srcip, ip2, number3, number4, number5, number6</order>
</decoder>

<decoder name="aws-vpc-flow-freeform">
  <parent>aws-vpc-flow-freeform</parent>
  <regex offset="after_regex">^ (\d+) (\d+) (\d+) (\w+) OK</regex>
  <order>number7, timestamp, timestamp2, literal2</order>
</decoder>
  • note field "ip1" mapped to Wazuh conventional field "srcip"
  • note constant field "number" embedded as literal anchor "2"
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

version=2
# aws_vpc_flow — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/aws_vpc_flow.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=aws_vpc_flow:2 %number2:number% %literal:word% %ip1:ipv4% %ip2:ipv4% %number3:number% %number4:number% %number5:number% %number6:number% %number7:number% %timestamp:number% %timestamp2:number% %literal2:word% OK
  • note chosen parser types: number2=number, literal=word, ip1=ipv4, ip2=ipv4, number3=number, number4=number, number5=number, number6=number, number7=number, timestamp=number, timestamp2=number, literal2=word

FAQ

What are the columns in a default VPC Flow Log line?
The version-2 default is, in order: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. They are space-delimited and strictly positional, so the parser reads by column position — there are no keys.
Why is the protocol a number like 6 or 17?
It is the IANA protocol number, not a name: 6 is TCP, 17 is UDP, 1 is ICMP. Any rule that thinks in protocol names must map these numbers first. This is a common source of misparsed flow logs when someone expects "TCP" and gets "6".
What do the start and end fields represent?
They are Unix epoch seconds bounding the aggregation window for the flow, because VPC Flow Logs aggregate traffic over a capture window rather than logging each packet. So packets and bytes are totals across that window, not per-packet values — keep that in mind when computing rates.
Does the column order ever change?
Yes. VPC Flow Logs support a custom format where you choose the fields and their order (adding tcp-flags, vpc-id, pkt-srcaddr, flow-direction, and others). When a custom format is in use the default positional layout no longer applies — the format string is authoritative and the parser must follow it, not the version-2 default.
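One way to follow the format string rather than a fixed template is to derive the column names from it at parse time. The sketch below assumes the format string uses the `${field-name}` placeholder syntax separated by spaces; the function names, the example format, and the sample line (including the `vpc-0example` value) are hypothetical.

```python
import re


def fields_from_format(log_format: str) -> list[str]:
    """Extract field names, in order, from a '${name} ${name} ...' format string."""
    return re.findall(r"\$\{([a-z0-9-]+)\}", log_format)


def parse_with_format(line: str, log_format: str) -> dict[str, str]:
    """Label the columns of one record using the order given by the format string."""
    names = fields_from_format(log_format)
    values = line.split()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    return dict(zip(names, values))


# Hypothetical custom format and a matching record.
custom_format = (
    "${vpc-id} ${srcaddr} ${dstaddr} ${dstport} ${protocol} "
    "${action} ${flow-direction}"
)
parsed = parse_with_format(
    "vpc-0example 203.0.113.45 192.0.2.10 443 6 ACCEPT ingress",
    custom_format,
)
print(parsed["dstport"], parsed["flow-direction"])
```

Keeping the format string next to the parser configuration means a change to the flow log definition shows up as a column-count error instead of silently shifted fields.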

Try it on your own AWS VPC Flow lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
