Logstash grok expressions

Revision as of 16:28, 5 February 2015 by WikiFreak

Grok is a pattern language built on top of regular expressions, and the grok filter is the heart of Logstash log parsing.

With grok, each log event can be analyzed and split into named fields.
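To make the idea concrete, here is a minimal Python sketch of what grok does with an expression: every %{NAME:field} reference is replaced by the regular expression registered under NAME and wrapped in a named capture group, so a successful match yields a dictionary of fields. The helper names (expand, grok) and the two pattern bodies are illustrative assumptions; the stock patterns shipped with Logstash are stricter, and Logstash does not use Python's re module, so corner cases may differ.

```python
import re

# Matches grok references such as %{IP} or %{IP:client}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

# Simplified stand-ins for two stock patterns (assumption: the real
# definitions shipped with Logstash are stricter, e.g. IP also covers IPv6).
LIBRARY = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\b\w+\b",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)  # definitions may nest
        # %{NAME:field} becomes a named group; %{NAME} only matches.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def grok(expression, line, library):
    """Return the extracted fields, or None if the line does not match."""
    m = re.search(expand(expression, library), line)
    return m.groupdict() if m else None


print(grok("%{IP:client} %{WORD:method}", "55.3.244.1 GET /index.html", LIBRARY))
# → {'client': '55.3.244.1', 'method': 'GET'}
```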


Tooling

You can write your own grok patterns and test them with the following online debugger:

http://grokdebug.herokuapp.com/


Grok setup

Grok ships with Logstash, so there is nothing extra to install. :)


Put all your custom pattern files in /etc/logstash/grok/ (one *.grok file per log type) and point the grok filter's patterns_dir option at that directory.
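Each .grok file is plain text with one definition per line: a pattern name, whitespace, then the pattern body; lines starting with # are comments. The sketch below, assuming only that format, shows how such a directory could be loaded into a name-to-pattern dictionary. The function names are hypothetical and this is not Logstash's own loader.

```python
import re
from pathlib import Path


def load_patterns(text):
    """Parse grok pattern-file text into a {name: pattern} dictionary.

    Assumed format: one 'NAME pattern' definition per line; blank lines
    and lines starting with '#' are ignored.
    """
    library = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only: the pattern body
        # itself usually contains spaces.
        parts = re.split(r"\s+", line, maxsplit=1)
        if len(parts) == 2:  # ignore malformed lines without a body
            library[parts[0]] = parts[1]
    return library


def load_pattern_dir(directory):
    """Merge every *.grok file in a directory (later files override earlier ones)."""
    library = {}
    for path in sorted(Path(directory).glob("*.grok")):
        library.update(load_patterns(path.read_text()))
    return library


# Two definitions copied from the Apache error log section of this page.
SAMPLE = r"""
# Apache error log
HTTPERRORDATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
APACHEERRORLOG \[%{HTTPERRORDATE:timestamp}\] \[%{WORD:severity}\] \[client %{IPORHOST:clientip}\] %{GREEDYDATA:message_remainder}
"""

for name, body in load_patterns(SAMPLE).items():
    print(name, "=>", body)
```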


Grok usage

You can use any Grok expression in a Logstash configuration file.

Inside a grok filter, set the match option:

# Match a single expression
match => [ "message", "%{LOG4J}" ]


# Try several patterns in order; the first one that matches wins
match => [ 
   "message", "%{LOG4J_COMMON_PATTERN_V1}", 
   "message", "%{LOG4J_COMMON_PATTERN_V2}", 
   "message", "%{LOG4J_COMMON_PATTERN_V3}", 
   "message", "%{LOG4J_COMMON_PATTERN_V4}", 
   "message", "%{LOG4J_COMMON_PATTERN_V5}", 
   "message", "%{LOG4J}" 
]

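Reference a pattern by name with %{PATTERN_NAME}, or use %{PATTERN_NAME:field_name} to store the matched text in a field called field_name.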
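The grok filter's break_on_match option defaults to true, which is why the order of the match list matters: patterns are tried in turn and the first success ends the search, so specific patterns should come before catch-alls such as %{LOG4J}. A minimal Python sketch of that control flow, using hypothetical, already-expanded regular expressions rather than real grok patterns:

```python
import re


def first_match(patterns, line):
    """Try each regex in order; return (index, fields) of the first match.

    Models a grok match list with break_on_match left at its default (true):
    once one pattern succeeds, the remaining ones are not tried.
    """
    for index, pattern in enumerate(patterns):
        m = re.search(pattern, line)
        if m:
            return index, m.groupdict()
    return None, {}


# Hypothetical, already-expanded patterns: a strict one first, then a catch-all.
PATTERNS = [
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<level>[A-Z]+) - (?P<content>.*)",
    r"^(?P<content>.*)",
]

print(first_match(PATTERNS, "2014-11-21 12:00:47,922 TRACE - hello"))
print(first_match(PATTERNS, "   free-form text"))
```

If the catch-all were listed first, it would match every line and the specific pattern would never run.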


Grok expressions

Here are some grok patterns you can use right away!


Apache2 error log

Create the pattern file:

vim /etc/logstash/grok/apache2ErrorLog.grok


Add the following content:

HTTPERRORDATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
APACHEERRORLOG \[%{HTTPERRORDATE:timestamp}\] \[%{WORD:severity}\] \[client %{IPORHOST:clientip}\] %{GREEDYDATA:message_remainder}
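A quick way to check these two definitions outside Logstash is to expand them by hand. The Python sketch below copies HTTPERRORDATE and APACHEERRORLOG verbatim and supplies simplified stand-ins for the stock patterns they reference (DAY, MONTH, MONTHDAY, TIME, YEAR, WORD, IPORHOST and GREEDYDATA); those stand-ins, the helper names and the sample line, written in the Apache 2.2 error-log format, are assumptions for illustration.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

LIBRARY = {
    # Simplified stand-ins for stock patterns (assumption, not the real ones).
    "DAY": r"[A-Z][a-z]{2}",
    "MONTH": r"[A-Z][a-z]{2}",
    "MONTHDAY": r"\d{1,2}",
    "TIME": r"\d{2}:\d{2}:\d{2}(?:\.\d+)?",
    "YEAR": r"\d{4}",
    "WORD": r"\b\w+\b",
    "IPORHOST": r"[\w.\-]+",
    "GREEDYDATA": r".*",
    # Copied verbatim from apache2ErrorLog.grok.
    "HTTPERRORDATE": r"%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}",
    "APACHEERRORLOG": r"\[%{HTTPERRORDATE:timestamp}\] \[%{WORD:severity}\] "
                      r"\[client %{IPORHOST:clientip}\] %{GREEDYDATA:message_remainder}",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def parse(name, line):
    """Apply the named pattern to one line; return its fields or None."""
    m = re.search(expand(LIBRARY[name], LIBRARY), line)
    return m.groupdict() if m else None


LINE = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "client denied by server configuration: /export/home/live/ap/htdocs/test")
print(parse("APACHEERRORLOG", LINE))
```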


IpTables

Create the pattern file:

vim /etc/logstash/grok/iptables.grok


Add the following content:

NETFILTERMAC %{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}
ETHTYPE (?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))

# Iptables generic values
IPTABLES_MAC_LAYER IN=(%{WORD:in_device})? OUT=(%{WORD:out_device})? *(MAC=(%{NETFILTERMAC})?)?
IPTABLES_SRC_DEST SRC=(%{IP:src_ip})? DST=(%{IP:dst_ip})?
# DF is optional: netfilter only prints it when the Don't Fragment bit is set
IPTABLES_FLAGS LEN=%{INT:pkt_length} *(TOS=%{BASE16NUM:pkt_tos})? *(PREC=%{BASE16NUM:pkt_prec})? *(TTL=%{INT:pkt_ttl})? *(ID=%{INT:pkt_id})? *(?:DF)?
IPTABLES_PROTOCOL PROTO=%{WORD:protocol}
IPTABLES_HEADER %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} %{IPTABLES_FLAGS} %{IPTABLES_PROTOCOL}

# IPv6 + v4
IPTABLES_IP_SUFFIX SPT=%{INT:src_port} DPT=%{INT:dst_port} *(WINDOW=%{INT:pkt_window})? *(RES=%{BASE16NUM:pkt_res})? .* *(URGP=%{INT:pkt_urgp})?
IPTABLES_IP %{IPTABLES_HEADER} %{IPTABLES_IP_SUFFIX}

# ICMP
IPTABLES_ICMP %{IPTABLES_HEADER} *(TYPE=%{INT:icmp_type})? *(CODE=%{BASE16NUM:icmp_code})?

# Generic pattern
IPTABLES_GENERIC %{IPTABLES_HEADER} (?<content>(.|\r|\n)*)

# Error pattern
IPTABLES_ERROR %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME} .* %{IPTABLES_MAC_LAYER} %{IPTABLES_SRC_DEST} (?<content>(.|\r|\n)*)
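Netfilter prints the MAC= value as the destination MAC, the source MAC and the two-byte EtherType joined by colons, which is what NETFILTERMAC splits apart. The Python sketch below checks IPTABLES_MAC_LAYER against a hypothetical kernel log fragment; NETFILTERMAC, ETHTYPE and IPTABLES_MAC_LAYER are copied verbatim, while WORD and COMMONMAC are stand-ins for the stock patterns and the helper names are illustrative.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

LIBRARY = {
    # Stand-ins for stock patterns (assumption).
    "WORD": r"\b\w+\b",
    "COMMONMAC": r"(?:(?:[A-Fa-f0-9]{2}):){5}[A-Fa-f0-9]{2}",
    # Copied verbatim from iptables.grok. Definition order does not matter:
    # references are only resolved when a pattern is expanded.
    "NETFILTERMAC": r"%{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}",
    "ETHTYPE": r"(?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))",
    "IPTABLES_MAC_LAYER": r"IN=(%{WORD:in_device})? OUT=(%{WORD:out_device})? *(MAC=(%{NETFILTERMAC})?)?",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def parse(name, line):
    """Apply the named pattern to one line; return its fields or None."""
    m = re.search(expand(LIBRARY[name], LIBRARY), line)
    return m.groupdict() if m else None


# Hypothetical fragment of an INPUT-chain log line (OUT= is empty).
LINE = "IN=eth0 OUT= MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 SRC=192.0.2.1 DST=192.0.2.2"
print(parse("IPTABLES_MAC_LAYER", LINE))
```

Here the EtherType 08:00 is IPv4; an empty OUT= simply leaves out_device unset.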


Fail2ban

Create the pattern file:

vim /etc/logstash/grok/fail2ban.grok


Add the following content:

FAIL2BAN %{TIMESTAMP_ISO8601:timestamp} %{JAVACLASS:criteria}: %{LOGLEVEL:level} \[%{WORD:service}\] Ban %{IPV4:clientip}
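The FAIL2BAN pattern only captures ban events; an Unban line does not match. The Python sketch below verifies both cases against hypothetical lines in the format this pattern expects, with simplified stand-ins for TIMESTAMP_ISO8601, JAVACLASS, LOGLEVEL, WORD and IPV4 (assumptions, not the stock definitions).

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

LIBRARY = {
    # Simplified stand-ins for stock patterns (assumption).
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:[.,]\d+)?)?",
    "JAVACLASS": r"(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|NOTICE|WARN(?:ING)?|ERROR|CRITICAL|FATAL)",
    "WORD": r"\b\w+\b",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
    # Copied verbatim from fail2ban.grok.
    "FAIL2BAN": r"%{TIMESTAMP_ISO8601:timestamp} %{JAVACLASS:criteria}: %{LOGLEVEL:level} \[%{WORD:service}\] Ban %{IPV4:clientip}",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def parse(name, line):
    """Apply the named pattern to one line; return its fields or None."""
    m = re.search(expand(LIBRARY[name], LIBRARY), line)
    return m.groupdict() if m else None


# Hypothetical sample lines.
BAN = "2015-02-05 16:28:00,123 fail2ban.actions: WARNING [ssh] Ban 192.0.2.10"
UNBAN = "2015-02-05 16:38:00,456 fail2ban.actions: WARNING [ssh] Unban 192.0.2.10"
print(parse("FAIL2BAN", BAN))
print(parse("FAIL2BAN", UNBAN))
```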


Log4j

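We use a few common log4j layouts; each block below is headed by the log4j conversion pattern it targets, and for each of them it is easy to extract the overall log message. Note that %5p pads the level on the left to five characters, so INFO and WARN lines have two spaces between the timestamp and the level; the variants with .* between the timestamp and the level cover that case.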

###### %d %5p %t %c - %m%n 

LOG4J ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - *(%{GREEDYDATA:content})

# Some logs might start with spaces :'S ...
LOG4J_COMMON_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_COMMON_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)

# When the log message is split across several lines right away
LOG4J_COMMON_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)
LOG4J_COMMON_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} (?<content>(.|\r|\n)*)


###### %d %5p %c{1} - %m%n 

# Some logs might start with spaces :'S ...
LOG4J_ALT_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_ALT_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} - (?<content>(.|\r|\n)*)

# When the log message is split across several lines right away
LOG4J_ALT_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{WORD:logger} (?<content>(.|\r|\n)*)


###### %d %5p %t %c{1} - %m%n 

# Some logs might start with spaces :'S ...
LOG4J_ALT_2_PATTERN_V1 .* %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V2 .* %{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)

# Nominal cases
LOG4J_ALT_2_PATTERN_V3 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V4 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} - (?<content>(.|\r|\n)*)

# When the log message is split across several lines right away
LOG4J_ALT_2_PATTERN_V5 ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
LOG4J_ALT_2_PATTERN_V6 ^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{WORD:logger} (?<content>(.|\r|\n)*)
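Because %5p left-pads the level, the two nominal variants complement each other: V3 only matches when a single space separates the timestamp and the level (TRACE, DEBUG, ERROR, FATAL), while V4 needs at least two (padded INFO, WARN). The Python sketch below copies V3 and V4 verbatim, translates the (?<content>...) named group into Python's (?P<content>...) spelling, and uses simplified stand-ins for the stock patterns; the sample lines and the com.example.App class are hypothetical.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

LIBRARY = {
    # Simplified stand-ins for stock patterns (assumption, not the real ones).
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:[.,]\d+)?)?",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)",
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "JAVACLASS": r"(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*",
    # Copied verbatim from the Log4j patterns.
    "LOG4J_COMMON_PATTERN_V3": r"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)",
    "LOG4J_COMMON_PATTERN_V4": r"^%{TIMESTAMP_ISO8601:timestamp} .* %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - (?<content>(.|\r|\n)*)",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    # Grok files write named groups as (?<name>...); Python needs (?P<name>...).
    pattern = re.sub(r"\(\?<(\w+)>", r"(?P<\1>", pattern)

    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return GROK_REF.sub(repl, pattern)


def parse(name, line):
    """Apply the named pattern to one event; return its fields or None."""
    m = re.search(expand(LIBRARY[name], LIBRARY), line)
    return m.groupdict() if m else None


PADDED = "2014-11-21 12:00:47,922  INFO main com.example.App - Started"  # %5p padding
PLAIN = "2014-11-21 12:00:47,922 DEBUG main com.example.App - Started"
MULTI = PADDED + "\n\tat com.example.App.run(App.java:42)"  # two-line event

for name in ("LOG4J_COMMON_PATTERN_V3", "LOG4J_COMMON_PATTERN_V4"):
    print(name, parse(name, PLAIN) is not None, parse(name, PADDED) is not None)
```

The MULTI case shows why the content group is written (.|\r|\n)* rather than .*: a plain . stops at the first newline, whereas the alternation keeps continuation lines such as stack traces in content.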


Super strong expression

To match all of the following log4j layouts with a single expression:

  •  %d %5p %t %c - %m%n
  •  %d %5p %t %c{1} - %m%n
  •  %d %5p %c - %m%n
  •  %d %5p %c{1} - %m%n
^\s*%{TIMESTAMP_ISO8601:timestamp}\s*%{LOGLEVEL:level} (?:(%{USERNAME:thread} %{JAVACLASS:logger}|%{USERNAME:thread} %{WORD:logger}|%{JAVACLASS:logger}|%{WORD:logger})) (?<content>(.|\r|\n)*)


VEHCO-specific patterns

Like most companies, my company, VEHCO, has its own log formats. The following example shows how to write grok patterns for them.


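Logs

Each entry below is a single line in the real log file; the trailing \ only marks where it was wrapped for display.
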
2014-11-21 12:00:47,922 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.business.AuthClient \ 
   - Replying to OBC auth data DONE. Smart-card --> OBC   |   smartcardId 02951DA314000000
2014-11-21 12:38:26,981 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.dao.ampq.JmsTopicListener \
   -  [x] Received message 'startAuthentication' for smart-card: 02667AA314000000, consumer smartcardId: 02667AA314000000
2014-11-21 12:38:27,033 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
   - Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # locked
2014-11-21 12:38:30,920 TRACE rabbitmq-cxn-2-consumer com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker \
   - Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # unlocked


Grok patterns

LOG_SENTENCE (?:[A-Za-z0-9\s\-><\\/.+*\[\]&%'#]+)*
RTD_TERMINAL_SUFFIX Terminal: %{LOG_SENTENCE:rtd_terminal_id} .* *(Smart-card ID: %{WORD:rtd_smartcard_id}) # %{WORD:rtd_terminal_state}
RTD_AUTH_START_SUFFIX %{LOG_SENTENCE:rtd_action}: %{WORD:rtd_smartcard_id}
RTD_AUTH_DONE_SUFFIX %{LOG_SENTENCE:rtd_action}. *(smartcardId %{WORD:rtd_smartcard_id})?


RTD_TERMINAL ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_TERMINAL_SUFFIX}
RTD_AUTH_START ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_START_SUFFIX}
RTD_AUTH_DONE ^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_AUTH_DONE_SUFFIX}


Put all of these patterns in a dedicated file: /etc/logstash/grok/vehco_rtd.grok
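To check the VEHCO patterns before deploying them, the Python sketch below expands RTD_TERMINAL (copied verbatim together with LOG_SENTENCE and RTD_TERMINAL_SUFFIX) and applies it to the "locked" sample line, joined back into a single line. The stock patterns it relies on are simplified stand-ins and the helper names are illustrative. Note that LOG_SENTENCE nests quantifiers ((?:[...]+)*), which can backtrack heavily on lines that do not match.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

LIBRARY = {
    # Simplified stand-ins for stock patterns (assumption).
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:[.,]\d+)?)?",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)",
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "JAVACLASS": r"(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*",
    "WORD": r"\b\w+\b",
    # Copied verbatim from vehco_rtd.grok.
    "LOG_SENTENCE": r"(?:[A-Za-z0-9\s\-><\\/.+*\[\]&%'#]+)*",
    "RTD_TERMINAL_SUFFIX": r"Terminal: %{LOG_SENTENCE:rtd_terminal_id} .* *(Smart-card ID: %{WORD:rtd_smartcard_id}) # %{WORD:rtd_terminal_state}",
    "RTD_TERMINAL": r"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{USERNAME:thread} %{JAVACLASS:logger} - %{RTD_TERMINAL_SUFFIX}",
}


def expand(pattern, library):
    """Rewrite a grok expression as a plain Python regular expression."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def parse(name, line):
    """Apply the named pattern to one line; return its fields or None."""
    m = re.search(expand(LIBRARY[name], LIBRARY), line)
    return m.groupdict() if m else None


LINE = ("2014-11-21 12:38:27,033 TRACE rabbitmq-cxn-2-consumer "
        "com.vehco.rtd.smartcard.service.cardreaderlisthandler.cardreader.ReaderLocker "
        "- Terminal: OMNIKEY AG CardMan 3121 02 00 | Smart-card ID: 02667AA314000000 # locked")

fields = parse("RTD_TERMINAL", LINE)
for key in ("thread", "rtd_terminal_id", "rtd_smartcard_id", "rtd_terminal_state"):
    print(key, "=", fields[key])
```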