
$ logforge parse sshd

Parse OpenSSH / sshd auth logs → regex, Grok, Wazuh & rsyslog

OpenSSH's server daemon, sshd, is the authentication log source most SOCs watch first, because nearly every remote interactive login to a Linux host passes through it. It logs via syslog to the auth or authpriv facility, which lands in /var/log/auth.log on Debian/Ubuntu and /var/log/secure on RHEL/CentOS. Each entry is a standard RFC 3164 syslog line: a bare timestamp with no year (Jul  3 14:22:15, note the two spaces padding a single-digit day), the hostname, then the program tag with its PID in brackets (sshd[4721]:), followed by a free-form English message that sshd builds itself. That message is the hard part: it is not structured, it is a sentence, and its wording changes with the outcome.

The messages that matter for detection are a small, well-known set with fixed shapes you can pin regexes to. 'Failed password for invalid user admin from 203.0.113.45 port 51234 ssh2' means someone tried to authenticate as a username that does not exist on the box — the 'invalid user' phrase is the tell that this is untargeted brute-forcing. 'Failed password for root from …' (without 'invalid user') means the account exists and someone is guessing its password. 'Accepted publickey for berkay from 192.0.2.10 port 50022 ssh2: ED25519 SHA256:kQx9mZ2v' is a successful key-based login, and the trailing key type and fingerprint let you tie the session to a specific key. Other high-signal lines include 'Accepted password' (a successful password login, often worth alerting on for privileged accounts), 'Connection closed by … [preauth]', and 'Disconnected from invalid user'.
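As a rough illustration of pinning a regex to these shapes, the sketch below covers only the 'Failed'/'Accepted' password and publickey lines quoted above. The pattern and the function name parse_sshd_message are hypothetical and are not LogForge output; it assumes the syslog header has already been stripped and that usernames contain no spaces.

```python
import re
from typing import Optional

# Hypothetical pattern, not generated by LogForge. The optional groups mirror
# the optional clauses in the message: "invalid user " and the key details.
SSHD_AUTH = re.compile(
    r"^(?P<outcome>Failed|Accepted) (?P<method>password|publickey) for "
    r"(?P<invalid>invalid user )?(?P<user>\S+) "
    r"from (?P<srcip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<port>\d{1,5}) ssh2"
    r"(?:: (?P<keytype>\S+) (?P<fingerprint>\S+))?$"
)


def parse_sshd_message(msg: str) -> Optional[dict]:
    """Parse the message body, i.e. the text after 'sshd[pid]: '.

    Returns None for lines outside the Failed/Accepted shapes.
    """
    m = SSHD_AUTH.match(msg.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["invalid"] = fields["invalid"] is not None  # normalise to a bool
    fields["port"] = int(fields["port"])
    return fields


if __name__ == "__main__":
    print(parse_sshd_message(
        "Failed password for invalid user admin from 203.0.113.45 port 51234 ssh2"
    ))
```

Lines such as 'Connection closed by … [preauth]' deliberately fall through to None here; they would need their own patterns.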

Parsing sshd is non-trivial precisely because the payload is prose with optional clauses. The phrase 'invalid user' appears only sometimes and shifts the position of the username; the key type and fingerprint that follow 'ssh2:' exist only on publickey acceptances; and the RFC 3164 header carries no year, so anything doing date math has to infer it (and cope with the December-to-January rollover). The fields worth extracting are the outcome (Failed/Accepted), the auth method (password vs publickey), the username, the source IP and port, and, on a publickey success, the key type and fingerprint. Those feed the canonical detections: brute-force by counting failures per source IP, credential-stuffing by counting distinct usernames per source, and compromise hunting by flagging a successful login from an IP that had only produced failures before.
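The three detections named above reduce to simple per-source counters. The following is a minimal sketch, assuming events have already been parsed into dicts with 'outcome', 'user', and 'srcip' keys and arrive in log order. The function name detect and both thresholds are hypothetical, and there is no time window here; a real rule would bucket events by time.

```python
from collections import defaultdict


def detect(events, fail_threshold=10, user_threshold=5):
    """Summarise parsed sshd events per source IP.

    events: iterable of dicts with 'outcome', 'user', 'srcip', in log order.
    Thresholds are illustrative defaults, not recommended values.
    """
    failures = defaultdict(int)   # srcip -> number of failed attempts
    users = defaultdict(set)      # srcip -> distinct usernames tried
    success_after_failures = []   # (srcip, user) accepted after earlier failures

    for e in events:
        ip = e["srcip"]
        if e["outcome"] == "Failed":
            failures[ip] += 1
            users[ip].add(e["user"])
        elif e["outcome"] == "Accepted" and failures.get(ip, 0) > 0:
            success_after_failures.append((ip, e["user"]))

    return {
        "brute_force": sorted(
            ip for ip, n in failures.items() if n >= fail_threshold
        ),
        "credential_stuffing": sorted(
            ip for ip, u in users.items() if len(u) >= user_threshold
        ),
        "success_after_failures": success_after_failures,
    }
```

Keeping the counters separate means one pass over the log yields all three signals, and each threshold can be tuned independently.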


What an OpenSSH / sshd auth line looks like

The syslog sample below is fed verbatim into the engine to produce every parser on this page.

Jul  3 14:22:15 fw01 sshd[4721]: Failed password for invalid user admin from 203.0.113.45 port 51234 ssh2
Jul  3 14:22:22 fw01 sshd[4725]: Failed password for invalid user oracle from 198.51.100.23 port 50022 ssh2

Detected fields

The engine classified this sample as syslog3164 and consolidated 15 fields across 2 lines. Fields marked literal were identical on every sample line, so they are baked into the pattern as anchors rather than captured.

  • timestamp : timestamp
  • hostname : hostname · literal
  • program : literal · literal
  • pid : number
  • _lit1 : literal · literal
  • _lit2 : literal · literal
  • _lit3 : literal · literal
  • _lit4 : literal · literal
  • _lit5 : literal · literal
  • user : username
  • _lit6 : literal · literal
  • srcip : ipv4
  • _lit7 : literal · literal
  • port1 : port
  • _lit8 : literal · literal

Regex (named capture groups)

# sample: Jul  3 14:22:15 fw01 sshd[4721]: Failed password for invalid user admin from 203.0.113.45 port 51234 ssh2
# groups: timestamp=Jul  3 14:22:15, pid=4721, user=admin, srcip=203.0.113.45, port1=51234
# the day is space-padded, so one or two spaces may precede it; a PID is a plain integer
^(?<timestamp>[A-Za-z]{3} +\d{1,2} \d{2}:\d{2}:\d{2}) fw01 sshd\[(?<pid>\d+)\]: Failed password for invalid user (?<user>[A-Za-z0-9._@-]+) from (?<srcip>\d{1,3}(?:\.\d{1,3}){3}) port (?<port1>\d{1,5}) ssh2$

Grok pattern (Logstash / Elastic)

%{SYSLOGTIMESTAMP:timestamp} fw01 sshd\[%{POSINT:pid}\]: Failed password for invalid user %{USERNAME:user} from %{IPV4:srcip} port %{INT:port1} ssh2
  • note constant field "hostname" embedded as literal anchor "fw01" (varying=false)

Wazuh decoder (OS_Regex XML)

<!--
  Generated by LogForge - Wazuh decoder (OS_Regex dialect, not PCRE)
  sample: Jul  3 14:22:15 fw01 sshd[4721]: Failed password for invalid user admin from 203.0.113.45 port 51234 ssh2
  test with: /var/ossec/bin/wazuh-logtest
-->

<decoder name="sshd-syslog3164">
  <program_name>^sshd$</program_name>
</decoder>

<decoder name="sshd-syslog3164-failed-invalid-user">
  <parent>sshd-syslog3164</parent>
  <regex offset="after_parent">^Failed password for invalid user (\S+) from (\d+.\d+.\d+.\d+) port (\d+) ssh2</regex>
  <order>user, srcip, port1</order>
</decoder>
  • note syslog header fields handled by Wazuh pre-decoding (not re-parsed): timestamp, hostname, program, pid
  • note parent matches by <program_name> (1 program(s) seen in the samples) — extend the alternation for other programs
  • note decoder order and prematch specificity may need site-specific tuning (other decoders in your ruleset can shadow these) — validate with /var/ossec/bin/wazuh-logtest

rsyslog template / liblognorm rulebase

# rsyslog template — put in /etc/rsyslog.d/sshd.conf
# Emits the detected RFC 3164 header fields plus the raw
# message body as JSON-ish text using standard rsyslog properties.
# Bind the template to an action, e.g.:
#   action(type="omfile" file="/var/log/sshd.log" template="sshd")
# A literal "%" inside a template string must be escaped as "%%".
template(name="sshd" type="string" string="{\"timestamp\":\"%timereported%\",\"host\":\"%hostname%\",\"program\":\"%programname%\",\"procid\":\"%procid%\",\"msg\":\"%msg:::json%\"}\n")

# --- mmnormalize rulebase for the message body ----------------------
# rsyslog properties cover only the syslog header. To extract the fields
# inside the message body, save the lines below (starting at 'version=2',
# which must be the VERY FIRST line of the file) as /etc/rsyslog.d/sshd.rb
# and load them with:
#   module(load="mmnormalize")
#   action(type="mmnormalize" rulebase="/etc/rsyslog.d/sshd.rb")
version=2
rule=sshd_msg: Failed password for invalid user %user:word% from %srcip:ipv4% port %port1:number% ssh2
  • note header property mapping: timestamp -> %timereported%, hostname -> %hostname%, program -> %programname%, pid -> %procid%
  • note message-body fields cannot be extracted by rsyslog properties alone — an mmnormalize (liblognorm v2) rulebase for the msg part is appended below the template; save it as its own .rb file
  • note dropped header separator "]:" before the first message field (it belongs to the syslog tag, not the msg property)
  • note chosen parser types: user=word, srcip=ipv4, port1=number

FAQ

What does "invalid user" mean in an sshd Failed password line?
It means the username being tried does not exist on the system. sshd inserts the phrase 'invalid user' before the name (Failed password for invalid user admin from …) to distinguish it from a failure against a real account. A flood of invalid-user failures is the classic signature of untargeted SSH brute-forcing, whereas repeated failures for a real account like root suggest a targeted guess.
How do I detect an SSH brute-force attack from these logs?
Count 'Failed password' events grouped by source IP over a short window; a single address producing dozens of failures, especially against invalid users, is brute-forcing. The higher-fidelity signal is a source that shows many failures followed by an 'Accepted password' or 'Accepted publickey': that pattern points to a likely successful break-in and should page, not just alert.
Why is there no year in the sshd log timestamp?
Because it uses the RFC 3164 (BSD syslog) header, which records only month, day, and time, never the year. Parsers must supply the year from context (usually the current year) and take care around the year boundary, where a December log read in January can be misdated. RFC 5424 syslog fixes this with a full RFC 3339 timestamp that includes the year, and journald stores a complete timestamp with every entry.
Which fields should I extract from an sshd auth log for a SIEM?
The outcome (Accepted vs Failed), the authentication method (password vs publickey), the username, the source IP and source port, and on successful publickey logins the key type and fingerprint. Those five core fields, plus the two key fields on a publickey success, cover brute-force detection, credential-stuffing (distinct users per source), and attribution of a session to a specific key.
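
The year inference described in the timestamp answer above can be sketched as follows. This is one possible heuristic, not what LogForge does: try the current year first, and fall back to the previous year when the result would land in the future. The function name infer_year and the one-day clock-skew allowance are assumptions; it uses naive datetimes (no time zone) and relies on an English locale for month names.

```python
from datetime import datetime, timedelta


def infer_year(stamp: str, now: datetime) -> datetime:
    """Attach a year to an RFC 3164 stamp such as 'Jul  3 14:22:15'.

    Heuristic: prefer the current year, but a stamp more than a day ahead
    of `now` is assumed to belong to the previous year (December logs read
    in January).
    """
    month, day, clock = stamp.split()  # split() absorbs the day padding
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            # e.g. 'Feb 29' does not exist in this candidate year
            continue
        if candidate <= now + timedelta(days=1):
            return candidate
    raise ValueError(f"cannot place {stamp!r} relative to {now.isoformat()}")
```

Passing now explicitly, instead of calling datetime.now() inside the function, keeps the rollover case testable and lets a batch job pin the reference time to the file's modification time.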

Try it on your own OpenSSH / sshd auth lines

Paste a few real lines, review the detected fields, and copy whichever format your stack needs. Free, no account, nothing uploaded.
