What are the columns in a default VPC Flow Log line?

The version-2 default is, in order: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. They are space-delimited and strictly positional, so the parser reads by column position — there are no keys.
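
As a minimal sketch of what "reading by column position" means, the Python below splits a default-format line on whitespace and pairs each value with the field name at the same index. The helper name `parse_default_line` is hypothetical, and the input is the first sample line used on this page.

```python
# Field names in the version-2 default order, exactly as listed above.
DEFAULT_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]


def parse_default_line(line: str) -> dict:
    """Split one default-format line and pair each value with its column name."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        # A different column count usually means a custom format is in use.
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} columns, got {len(values)}"
        )
    return dict(zip(DEFAULT_FIELDS, values))


record = parse_default_line(
    "2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 "
    "51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK"
)
print(record["srcaddr"], record["dstport"], record["action"])
```

Values are deliberately kept as strings here: an AWS account ID is a 12-digit identifier that can begin with zero, so casting it to a number would lose information.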

Why is the protocol a number like 6 or 17?

It is the IANA protocol number, not a name: 6 is TCP, 17 is UDP, 1 is ICMP. Any rule that thinks in protocol names must map these numbers first. This is a common source of misparsed flow logs when someone expects "TCP" and gets "6".
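
One way to do that mapping, sketched in Python. The table covers only the three protocols named above; the `proto-N` fallback label and the function name `protocol_name` are illustrative assumptions, not part of any AWS or LogForge output.

```python
# IANA protocol numbers mentioned above; extend the table as needed.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def protocol_name(raw: str) -> str:
    """Translate the protocol column into a name, leaving unknown values visible."""
    if not raw.isdigit():
        # Non-numeric placeholders are passed through unchanged.
        return raw
    return PROTOCOL_NAMES.get(int(raw), f"proto-{raw}")


for value in ("6", "17", "1", "47"):
    print(value, "->", protocol_name(value))
```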

What do the start and end fields represent?

They are Unix epoch seconds bounding the aggregation window for the flow, because VPC Flow Logs aggregate traffic over a capture window rather than logging each packet. So packets and bytes are totals across that window, not per-packet values — keep that in mind when computing rates.
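
A small sketch of the rate calculation this implies, using the numbers from the first sample line on this page (12 packets, 1520 bytes, a 60-second window). The function name `average_rates` and the choice to clamp a zero-length window to one second are assumptions made for the example.

```python
def average_rates(packets: int, byte_count: int, start: int, end: int):
    """Return (packets per second, bytes per second) averaged over the window."""
    # start and end can be equal for very short flows; avoid dividing by zero.
    window_seconds = max(end - start, 1)
    return packets / window_seconds, byte_count / window_seconds


pps, bps = average_rates(12, 1520, 1783085000, 1783085060)
print(f"{pps:.2f} packets/s, {bps:.1f} bytes/s")
```

These are averages over the window, so a short burst inside a long window will look much smaller than its true peak rate.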

Does the column order ever change?

Yes. VPC Flow Logs support a custom format where you choose the fields and their order (adding tcp-flags, vpc-id, pkt-srcaddr, flow-direction, and others). When a custom format is in use the default positional layout no longer applies — the format string is authoritative and the parser must follow it, not the version-2 default.
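
A hedged sketch of a parser that follows the format string instead of the default layout. It assumes the format is written in the `${field-name}` style used when a flow log is created; the format string and log line in the example are made up for illustration, and the helper names are hypothetical.

```python
import re


def fields_from_format(log_format: str) -> list:
    """Extract field names, in order, from a '${field} ${field} ...' format string."""
    return re.findall(r"\$\{([a-z0-9-]+)\}", log_format)


def parse_custom_line(line: str, log_format: str) -> dict:
    """Pair each space-delimited value with the field named at the same position."""
    names = fields_from_format(log_format)
    values = line.split()
    if len(names) != len(values):
        raise ValueError(f"format has {len(names)} fields, line has {len(values)}")
    return dict(zip(names, values))


# Hypothetical custom format and matching line, for illustration only.
EXAMPLE_FORMAT = "${version} ${vpc-id} ${srcaddr} ${dstaddr} ${tcp-flags} ${action}"
EXAMPLE_LINE = "3 vpc-0123456789abcdef0 203.0.113.45 192.0.2.10 2 ACCEPT"

parsed = parse_custom_line(EXAMPLE_LINE, EXAMPLE_FORMAT)
print(parsed["vpc-id"], parsed["tcp-flags"], parsed["action"])
```

Keeping the format string next to the parser configuration makes a format change a one-line update rather than a rewrite of positional logic.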

Parse AWS VPC Flow logs → regex, Grok, Wazuh & rsyslog

What an AWS VPC Flow line looks like

The delimited sample below is fed verbatim into the engine to produce every parser on this page.

2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 40112 53 17 3 240 1783085100 1783085160 REJECT OK

Detected fields

The engine classified this sample as freeform and consolidated 14 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

number : number · literal
number2 : number
literal : literal
ip1 : ipv4
ip2 : ipv4
number3 : number
number4 : number
number5 : number
number6 : number
number7 : number
timestamp : timestamp
timestamp2 : timestamp
literal2 : literal
_lit1 : literal · literal
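
To illustrate the "identical on every sample line" rule, here is a rough Python sketch that finds the column positions whose value never changes across the two sample lines. It is not LogForge's actual detection logic, only an approximation of the idea, and `constant_columns` is a hypothetical name.

```python
SAMPLES = [
    "2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 "
    "51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK",
    "2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 "
    "40112 53 17 3 240 1783085100 1783085160 REJECT OK",
]


def constant_columns(lines: list) -> list:
    """Return zero-based column indexes whose value is the same on every line."""
    rows = [line.split() for line in lines]
    # zip(*rows) walks the samples column by column.
    return [
        index
        for index, column in enumerate(zip(*rows))
        if len(set(column)) == 1
    ]


print(constant_columns(SAMPLES))
```

For these samples only the first column ("2") and the last ("OK") are constant, which matches the two fields marked literal in the list above. With more varied samples, such as a NODATA record or a different format version, those anchors would stop being constant.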

Regex (named capture groups)

# sample: 2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
# groups: number2=123456789012, literal=eni-0abc12de34567890, ip1=203.0.113.45, ip2=192.0.2.10, number3=51234, number4=443, number5=6, number6=12, number7=1520, timestamp=1783085000, timestamp2=1783085060, literal2=ACCEPT
^2 (?<number2>-?\d+(?:\.\d+)?) (?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) (?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) (?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK$

note the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields
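
A quick way to sanity-check the pattern outside any log pipeline is to run it against both sample lines in Python. Python's `re` module spells named groups `(?P<name>…)` rather than `(?<name>…)`, so this sketch rewrites the group prefix before compiling; that substitution is safe here only because the pattern contains no lookbehind assertions.

```python
import re

# The generated pattern, split across lines for readability only.
PATTERN = (
    r"^2 (?<number2>-?\d+(?:\.\d+)?) "
    r"(?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+"
    r"|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) "
    r"(?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) "
    r"(?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) "
    r"(?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) "
    r"(?<literal2>[A-Za-z]+) OK$"
)

# Convert (?<name>...) to Python's (?P<name>...) spelling.
FLOW_RE = re.compile(PATTERN.replace("(?<", "(?P<"))

SAMPLE_LINES = [
    "2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 "
    "51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK",
    "2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 "
    "40112 53 17 3 240 1783085100 1783085160 REJECT OK",
]

matches = [FLOW_RE.match(line) for line in SAMPLE_LINES]
for match in matches:
    print(match.group("ip1"), match.group("number4"), match.group("literal2"))
```

Note that the `literal` group is fitted closely to the letter and digit runs of the two sample interface IDs, so it is worth testing against a wider set of real ENI values before relying on it.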

Grok pattern (Logstash / Elastic)

# custom patterns
AWS_VPC_FLOW_EPOCH \d{10}(?:\d{3})?

2 %{NUMBER:number2} %{NOTSPACE:literal} %{IPV4:ip1} %{IPV4:ip2} %{NUMBER:number3} %{NUMBER:number4} %{NUMBER:number5} %{NUMBER:number6} %{NUMBER:number7} %{AWS_VPC_FLOW_EPOCH:timestamp} %{AWS_VPC_FLOW_EPOCH:timestamp2} %{NOTSPACE:literal2} OK

note constant field "number" embedded as literal anchor "2" (varying=false)
note custom patterns emitted — save the '# custom patterns' block to a file in your patterns_dir

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: 2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="aws-vpc-flow-freeform">
  <prematch>^\d+ </prematch>
</decoder>

<decoder name="aws-vpc-flow-freeform">
  <parent>aws-vpc-flow-freeform</parent>
  <regex>^2 (\d+) (\w+) (\d+.\d+.\d+.\d+) (\d+.\d+.\d+.\d+) (\d+) (\d+) (\d+) (\d+)</regex>
  <order>number2, literal, srcip, ip2, number3, number4, number5, number6</order>
</decoder>

<decoder name="aws-vpc-flow-freeform">
  <parent>aws-vpc-flow-freeform</parent>
  <regex offset="after_regex">^ (\d+) (\d+) (\d+) (\w+) OK</regex>
  <order>number7, timestamp, timestamp2, literal2</order>
</decoder>

<!-- ============================================================
     ALERT RULE (starter) — put this in a RULES file, e.g.
     /var/ossec/etc/rules/local_rules.xml. Decoders and rules live
     in SEPARATE files. The rule matches the decoder above through
     <decoded_as>; set <level> and add <field>/<match> conditions so
     it alerts only on the events you care about. Rule ids 100000+
     are the user range — change them if they collide with yours.
     ============================================================ -->
<group name="aws-vpc-flow,">
  <rule id="100000" level="3">
    <decoded_as>aws-vpc-flow-freeform</decoded_as>
    <description>aws-vpc-flow: srcip=$(srcip)</description>
  </rule>

  <!-- Example — a higher-level alert gated on one field (uncomment and edit):
  <rule id="100001" level="10">
    <if_sid>100000</if_sid>
    <field name="srcip">^CHANGE_ME$</field>
    <description>aws-vpc-flow: a value you care about from $(srcip)</description>
  </rule>
  -->
</group>

note field "ip1" mapped to Wazuh conventional field "srcip"
note constant field "number" embedded as literal anchor "2"
note added a starter alert <rule> (level 3, matched to the decoder via <decoded_as>) — put it in a RULES file (not the decoders file), set the level, and add <field>/<match> conditions; the commented example child rule shows the pattern
note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

Wazuh's OS_Regex is not PCRE: a bare . is a literal dot and \. matches any character.

rsyslog template / liblognorm rulebase

version=2
# aws_vpc_flow — liblognorm v2 rulebase (generated by LogForge)
# Usage with rsyslog (mmnormalize runs liblognorm):
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/aws_vpc_flow.rb" useRawMsg="on")
# Literal "%" is escaped as "%%"; raw tabs are written as \x09.
rule=aws_vpc_flow:2 %number2:number% %literal:word% %ip1:ipv4% %ip2:ipv4% %number3:number% %number4:number% %number5:number% %number6:number% %number7:number% %timestamp:number% %timestamp2:number% %literal2:word% OK

note chosen parser types: number2=number, literal=word, ip1=ipv4, ip2=ipv4, number3=number, number4=number, number5=number, number6=number, number7=number, timestamp=number, timestamp2=number, literal2=word

Splunk

# props.conf  (search-time extraction)
[<REPLACE_WITH_SOURCETYPE>]
EXTRACT-logforge = 2 (?<number2>-?\d+(?:\.\d+)?) (?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) (?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) (?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK

# Quick search-time test in SPL:
# | rex field=_raw "2 (?<number2>-?\\d+(?:\\.\\d+)?) (?<literal>(?:[A-Za-z]+-\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+|[A-Za-z]+-\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+)) (?<ip1>\\d{1,3}(?:\\.\\d{1,3}){3}) (?<ip2>\\d{1,3}(?:\\.\\d{1,3}){3}) (?<number3>-?\\d+(?:\\.\\d+)?) (?<number4>-?\\d+(?:\\.\\d+)?) (?<number5>-?\\d+(?:\\.\\d+)?) (?<number6>-?\\d+(?:\\.\\d+)?) (?<number7>-?\\d+(?:\\.\\d+)?) (?<timestamp>\\d+) (?<timestamp2>\\d+) (?<literal2>[A-Za-z]+) OK"

note regex: the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields
note EXTRACT-<class> names must be unique within a sourcetype stanza — rename EXTRACT-logforge if you already use that class for this sourcetype
note a timestamp field was detected: this EXTRACT only makes it a searchable field. To set the event _time at index time, add TIME_PREFIX and TIME_FORMAT to this props.conf stanza (TIME_FORMAT uses Splunk strptime, e.g. %Y-%m-%dT%H:%M:%S) — this generator does not guess the strptime format.

ES ingest

PUT _ingest/pipeline/aws-vpc-flow
{
  "description": "LogForge-generated ingest pipeline for aws-vpc-flow",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "2 %{NUMBER:number2} %{NOTSPACE:literal} %{IPV4:ip1} %{IPV4:ip2} %{NUMBER:number3} %{NUMBER:number4} %{NUMBER:number5} %{NUMBER:number6} %{NUMBER:number7} %{AWS_VPC_FLOW_EPOCH:timestamp} %{AWS_VPC_FLOW_EPOCH:timestamp2} %{NOTSPACE:literal2} OK"
        ],
        "pattern_definitions": {
          "AWS_VPC_FLOW_EPOCH": "\\d{10}(?:\\d{3})?"
        }
      }
    }
  ]
}

note grok: constant field "number" embedded as literal anchor "2" (varying=false)
note grok: the custom pattern AWS_VPC_FLOW_EPOCH is defined inline under "pattern_definitions" in the pipeline above, so no separate patterns_dir file is needed for the ingest pipeline
note test in Kibana Dev Tools with: POST _ingest/pipeline/aws-vpc-flow/_simulate (supply a docs[] array whose _source.message holds a sample line)
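
The request body for that simulate call can be assembled by hand, or generated as in this short Python sketch, which wraps each sample line in the `docs[]._source.message` shape described in the note. The printed JSON is what would be pasted under the `POST _ingest/pipeline/aws-vpc-flow/_simulate` line in Dev Tools.

```python
import json

SAMPLE_MESSAGES = [
    "2 123456789012 eni-0abc12de34567890 203.0.113.45 192.0.2.10 "
    "51234 443 6 12 1520 1783085000 1783085060 ACCEPT OK",
    "2 445566778899 eni-0f9e87dc65432100 198.51.100.77 192.0.2.20 "
    "40112 53 17 3 240 1783085100 1783085160 REJECT OK",
]

# One document per sample line, each carrying the raw line in "message".
simulate_body = {
    "docs": [{"_source": {"message": line}} for line in SAMPLE_MESSAGES]
}

print(json.dumps(simulate_body, indent=2))
```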

Graylog

# Grok patterns to add under System > Grok Patterns:
# (Graylog needs these custom patterns installed globally BEFORE the rule/extractor below will work.)
#   AWS_VPC_FLOW_EPOCH \d{10}(?:\d{3})?

# --- Graylog processing pipeline rule (primary) ---
# Paste under System > Pipelines > Manage rules, then attach the rule to a pipeline stage.
rule "aws-vpc-flow-parse"
when
  has_field("message")
then
  let gp = grok(pattern: "2 %{NUMBER:number2} %{NOTSPACE:literal} %{IPV4:ip1} %{IPV4:ip2} %{NUMBER:number3} %{NUMBER:number4} %{NUMBER:number5} %{NUMBER:number6} %{NUMBER:number7} %{AWS_VPC_FLOW_EPOCH:timestamp} %{AWS_VPC_FLOW_EPOCH:timestamp2} %{NOTSPACE:literal2} OK", value: to_string($message.message), only_named_captures: true);
  set_fields(gp);
end

# --- Graylog import-ready extractor JSON (secondary) ---
# Save as a .json file and import under System > Inputs > (input) > Manage extractors > Actions > Import extractors.
{
  "extractors": [
    {
      "title": "aws-vpc-flow",
      "extractor_type": "grok",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "2 %{NUMBER:number2} %{NOTSPACE:literal} %{IPV4:ip1} %{IPV4:ip2} %{NUMBER:number3} %{NUMBER:number4} %{NUMBER:number5} %{NUMBER:number6} %{NUMBER:number7} %{AWS_VPC_FLOW_EPOCH:timestamp} %{AWS_VPC_FLOW_EPOCH:timestamp2} %{NOTSPACE:literal2} OK",
        "named_captures_only": true
      },
      "condition_type": "none",
      "condition_value": ""
    }
  ],
  "version": "5.0.0"
}

note grok: constant field "number" embedded as literal anchor "2" (varying=false)
note grok: 1 custom grok pattern (AWS_VPC_FLOW_EPOCH) was emitted; Graylog has no patterns_dir, so install it globally first under System > Grok Patterns (see the block at the top of the output)
note primary artifact is the processing-pipeline rule; the extractor JSON is an equivalent import-ready alternative for the classic extractor UI

Datadog

logforge_rule 2 %{number:number2} %{notSpace:literal} %{ipv4:ip1} %{ipv4:ip2} %{number:number3} %{number:number4} %{number:number5} %{number:number6} %{number:number7} %{data:timestamp} %{data:timestamp2} %{notSpace:literal2} OK

note emitted rule name is "logforge_rule"; rename it to match your "aws-vpc-flow" convention if desired
note constant field "number" embedded as literal anchor "2" (varying=false)
note field "timestamp" (timestamp): could not derive a Joda/Java date format from the sample shape; using %{data} — add a date("…") format by hand if you need a parsed timestamp
note field "timestamp2" (timestamp): could not derive a Joda/Java date format from the sample shape; using %{data} — add a date("…") format by hand if you need a parsed timestamp
note paste this line into a Grok Parser processor in a Datadog Log Pipeline; matchers are anchored left-to-right and rule whitespace matches log whitespace. Complex or multi-shape logs may need Helper Rules.

Fluent Bit

[PARSER]
    Name        aws-vpc-flow
    Format      regex
    Regex       ^2 (?<number2>-?\d+(?:\.\d+)?) (?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) (?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) (?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK$
    Time_Key    timestamp
    # Time_Format <unrecognized timestamp shape — set a strptime format, e.g. %Y-%m-%dT%H:%M:%S>
# Fluentd <parse> block:
#   <parse>
#     @type regexp
#     expression /^2 (?<number2>-?\d+(?:\.\d+)?) (?<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?<number3>-?\d+(?:\.\d+)?) (?<number4>-?\d+(?:\.\d+)?) (?<number5>-?\d+(?:\.\d+)?) (?<number6>-?\d+(?:\.\d+)?) (?<number7>-?\d+(?:\.\d+)?) (?<timestamp>\d+) (?<timestamp2>\d+) (?<literal2>[A-Za-z]+) OK$/
#     time_key timestamp
#   </parse>

note regex: the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields
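note Time_Key set to "timestamp", but the generator did not recognize the timestamp shape, so Time_Format is left commented out and must be filled in by hand. The sample values are Unix epoch seconds, for which the strptime directive %s is the usual candidate; verify it against your Fluent Bit version, or convert the value in a Lua filter instead. In the Fluentd block, time_type unixtime serves the same purpose.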

Vector

[transforms.aws_vpc_flow_parse]
type = "remap"
inputs = ["REPLACE_WITH_SOURCE"]
source = '''
. |= parse_regex!(.message, r'2 (?P<number2>-?\d+(?:\.\d+)?) (?P<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?P<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?P<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?P<number3>-?\d+(?:\.\d+)?) (?P<number4>-?\d+(?:\.\d+)?) (?P<number5>-?\d+(?:\.\d+)?) (?P<number6>-?\d+(?:\.\d+)?) (?P<number7>-?\d+(?:\.\d+)?) (?P<timestamp>\d+) (?P<timestamp2>\d+) (?P<literal2>[A-Za-z]+) OK')
'''

note [regex] the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields

Loki

# promtail pipeline for "aws-vpc-flow" (generated by LogForge)
# Add these stages under a scrape_config in your promtail config:
#   scrape_configs:
#     - job_name: aws-vpc-flow
#       pipeline_stages:
# (the stages below are indented to sit under pipeline_stages)
pipeline_stages:
  - regex:
      expression: '^2 (?P<number2>-?\d+(?:\.\d+)?) (?P<literal>(?:[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+|[A-Za-z]+-\d+[A-Za-z]+\d+[A-Za-z]+\d+[A-Za-z]+\d+)) (?P<ip1>\d{1,3}(?:\.\d{1,3}){3}) (?P<ip2>\d{1,3}(?:\.\d{1,3}){3}) (?P<number3>-?\d+(?:\.\d+)?) (?P<number4>-?\d+(?:\.\d+)?) (?P<number5>-?\d+(?:\.\d+)?) (?P<number6>-?\d+(?:\.\d+)?) (?P<number7>-?\d+(?:\.\d+)?) (?P<timestamp>\d+) (?P<timestamp2>\d+) (?P<literal2>[A-Za-z]+) OK$'

note regex: the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields
note no low-cardinality field found to promote to a Loki label — omitted the `- labels:` stage; every captured field stays in the extracted map for later stages
note left in the extracted map (NOT promoted to labels — high cardinality would explode Loki streams): number2, literal, ip1, ip2, number3, number4, number5, number6, number7, timestamp, timestamp2, literal2

syslog-ng

parser p_aws_vpc_flow {
    regexp-parser(
        prefix(".aws_vpc_flow.")
        patterns("2 (?<number2>-?\\d+(?:\\.\\d+)?) (?<literal>(?:[A-Za-z]+-\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+|[A-Za-z]+-\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+[A-Za-z]+\\d+)) (?<ip1>\\d{1,3}(?:\\.\\d{1,3}){3}) (?<ip2>\\d{1,3}(?:\\.\\d{1,3}){3}) (?<number3>-?\\d+(?:\\.\\d+)?) (?<number4>-?\\d+(?:\\.\\d+)?) (?<number5>-?\\d+(?:\\.\\d+)?) (?<number6>-?\\d+(?:\\.\\d+)?) (?<number7>-?\\d+(?:\\.\\d+)?) (?<timestamp>\\d+) (?<timestamp2>\\d+) (?<literal2>[A-Za-z]+) OK")
    );
};

note regex: the message tail diverges across lines and could not be split into stable fields — the trailing per-token literal group(s) are positional word-slices of that unstructured message, not stable fields
note captured fields are stored as name-value pairs under the prefix ".aws_vpc_flow." (e.g. the group (?<ip1>…) becomes ".aws_vpc_flow.ip1")
