Stay alert – taking action upon log file events (part 1)

Almost every application writes log files. Where organizations differ (vastly) is in how and to which extent these logs are used. Aside from the ultimate no-go of not checking them at all, this takes discipline and effort, especially for timely reactions. This post shows how to automate the tedious task of checking log files (especially those from Java backends), how to consolidate them into existing infrastructures (like the Windows Event Log) and how to effortlessly generate alerts for serious incidents.

Ways of doing more with logs

Log files per se are… just files. Unless someone is taking action, they keep their knowledge (good or bad, trivial or serious) to themselves. A first step is achieved by establishing a routine of informally checking logs – but this is neither satisfying nor exhaustive. A lot more is possible – and one can draw up a hierarchy of options:

informal > discrete events > outliers > patterns / learning

This post focuses on handling discrete events, i.e. on means to check each event by itself and trigger some action based on it (or let it go). I’ll use logstash, an Open Source log shipper. It became popular as part of the ELK stack (as in Elasticsearch, Logstash and Kibana) – however, it can perfectly be used on its own. The basic idea is the same as tail -f though a lot more elaborate.

My intention is to show that it pays of getting started with rather simple steps and that automating basic actions is already a valuable step forward in day-to-day operations.

Basic setup

I’ll use a simple web application with the mere intent of generating logs. It runs on the widely used Tomcat webapp server – and produces log files using the java.util.logging package along with the OneLineFormatter from Apache Juli. I write to a deidcated log file and pick that one up again using Logstash – producing either Windows Event Log entries via shell invocation or e-mails (tested by greenmail, can be replaced with any actual e-mail system).

Here’s an overview of this setup:

Tomcat to logfile to logstaash to windows event log and mail server

A typical log line looks like this:

04-Mar-2018 12:14:33.356 SEVERE [http-nio-8080-exec-5]$2.buttonClick Some error 1

Exceptions do happen – here’s how they look like in a log (lines begin with spaces):

04-Mar-2018 12:14:33.884 SEVERE [http-nio-8080-exec-6]$3.buttonClick Some exception
04-Mar-2018 12:14:33.884 SEVERE [http-nio-8080-exec-6]$3.buttonClick Some exception 
  java.lang.RuntimeException: Some error 2 

After working through all the steps below, I’ll be at the point of getting a Windows Event Log entry and/or an e-mail each time there’s an exception in the log – without any manual involvement.

Basic logstash

Logstash is basically a pipeline of inputs, filters and outputs. What makes it so powerful is that they can be combined almost infinitely: any input can go through any filter and end up in any output. The most common output is Elasticsearch – but it doesn’t have to be that way: I exclusively use two other outputs to get the job done. In fact, there’s no need to ever bring elasticsearch in when there’s no use for it. And if the use arises later (when moving beyond discrete events), I could still do it.

Getting the config in place

Logstash is configured using configuration files that use a custom format close to (but not similar to) JSON. Essentially, it builds a tree of hashes, arrays and values. The base structure is this:

# this is a comment
input {
# one or more inputs
filter {
# zero or more filters
output {
# one or more outputs

So, I need to get started with a file:

input {
  file { 
    codec => multiline { 
      pattern => "^\s" 
      what => "previous" 
      auto_flush_interval => 10 
    path => "C:/apache-tomcat-8.0.30/logs/vlog-0.log"

Essentially, I could have stuck to just “path” (and leave out “codec”), but as exceptions are multi-line, I define each line that is indented (starts w/ some form of whitespace) as an exception and as line belonging to the event of the line before. New versions of logstash have another option for handling the last line in the log being an exception: when there’s not another line for the given number of seconds (here: 10), the event is considered complete (otherwise, the last event would be stuck until another line arrives). There’s more info on this in the reference manual.

Besides reading files, there’s a lot more inputs, including e.g. opening an HTTP endpoint listening for messages, listening to redis, etc. (see a full list).

What the above config gives me is one event sent down the pipe each time the log is written to. However, the event is merely a junk of text in the “message” variable of each event. There is little done in terms of analysis to gauge how important it is and what it relates to. So, I’ll go ahead and add a filter to do some parsing:

filter { 
  grok { 
    remove_field => ["MONTHDAY", "MONTH", "YEAR", "TIME"]

That leaves “message” in the event, but adds some more variables with data pulled from “message”: it takes the loglevel, thread name, trigger (i.e. method producing the event) and the message (up to the end of the line) while discarding date info (for now).

The pattern is specific for the application, logging framework used and logging pattern configured. However, customizing is quite straightforward – and there is help: The grok debugger takes an sample line from the log (can be anonymized) plus the “match” expression and prints which variables would be generated from the log. This allows for fine-tuning the expression until it really works.

So, all that comes next in the pipe can rely on an event object with some more – and more meaningful – data.

In fact, I’m good to go now to trigger some actions. Here’s a vast collection of possiblities of what that can be (from sending an e-mail or triggering a shell command to calling an HTTP endpoint or writing to elasticsearch or redis). I choose the two earlier (e-mail / shell) to send out an alert via e-mail and write a new Windows Event Log (and let the Windows infrastructure take it from there – see e.g. this post on reacting to the Event Log). I constrain the actions to only apply to “SEVERE” errors (called “ERROR” in other frameworks) while dropping “WARN”, “INFO” and below.

Here’s how to achieve this:

output {
  if [LOGLEVEL] == "SEVERE" { 
    # use elevated permissions or comment out 
    exec { 
      command => "eventcreate /ID 1 /L APPLICATION /SO Logstash /T ERROR /D \"Logstash: %{MSG}\"" 
    } # start greenmail or comment out 
    email { 
      options => { 
        address => "localhost" 
        port => 3025  
      from => "logstash-mailer@localhost" 
      to => "nightie@localhost" 
      subject => "Logstash Error" 
      body => "%{message}" 

I use eventcreate to generate Windows Event Log entries. And unless I want to configure my own log, I can use elevated permissions and write to the default application log. This one-liner is indeed all it takes.

For e-mails, it’s quite simiar: I have to give server and sender / recipient (and possibly an e-mail account, depending on the environment; there’s a complete reference.

Also, I could opt to use only one of exec or e-mail if I wanted to write only Windows Event Log entries or only send e-mails.

Running it

After saving the config to tomcat2evntAndMail.spec, I can run logstash via

C:\elk\logstash\bin\logstash.bat agent -f C:/elk/tomcat2evntAndMail.spec

This starts logstash (and never returns as long as logstash is running). To start the greenmail for testing, I use the standalone JAR version and call

java -Dgreenmail.setup.test.all -jar greenmail-standalone-1.5.7.jar

In order to retrieve e-mails, I went the easy way, i.e. telnet: this simple sequence is all it takes is

telnet localhost 3110 
USER nightie@localhost 
PASS nightie@localhost 

(where <X> is one of the LIST).

This is what the result looks like:

console with the POP3 commands from above

Summary, Wrap-up and Outlook

I now have a pipeline that proactively sends out alerts each time an error ends up in the log. This frees from checking log files manually – and allows for a more timely response (and for flagging errors like other tasks to follow up). I don’t yet check for systematic issues, i.e. the relation between several errors (or several info-level events) – which will be subject ahead in this series.

I hope you find this useful – and I’m looking forward to your feedback…

Posted in All