Sending logs from multiple instances to Sematext or Elasticsearch, using rsyslog, Logstash and Beanstalk

20 May    aws, linux, logstash, rsyslog

Hello all,

I have been somewhat lazy about writing tech articles recently, mostly because I am busy with some very interesting private projects. Note that if you do not want to trouble yourself and you don't care about implementing your own stack, you can just use CloudWatch. I wanted to write down the cheap alternative I used for logging before, something that can also be adapted to run inside your own infrastructure without any cloud involvement. You could equally use Filebeat and Logstash.

Here is roughly what the final result looks like. A picture is worth a thousand words, as they say, so here is one:

(architecture diagram: clients ship logs over TLS to a central rsyslog server, which forwards JSON-formatted entries to Logstash and on to Sematext or Elasticsearch)

To keep things readable, I will refer to the multiple static instances and dynamic workers collectively as the clients.

 

Since we do not want to do this manually each time, and Beanstalk already provisions our instances, we can take advantage of the .ebextensions mechanism that AWS provides for exactly this kind of scenario. Of course, you could use Ansible for this purpose as well.

You will need to create an S3 bucket, named yours3-conf-bucket here, and give the machines involved read access to it (via their instance profile / IAM role).

Note that I have used one central key pair to encrypt the communication between these machines. You could obviously use a separate key per machine and generate them at run time from the rsyslogconfig.sh configuration script, using AWS environment variables wisely to wire up the multiple keys. For the sake of simplicity, though, let's proceed with a single static key setup. You can find some examples of generating keys here.
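
If you want to generate the CA and the client key pair yourself, the gnutls-utils package we install further down provides certtool. A minimal sketch (interactive prompts omitted; the file names simply match the ones referenced in the ebextensions config below):

# CA key + self-signed CA certificate
certtool --generate-privkey --outfile ca-key.pem
certtool --generate-self-signed --load-privkey ca-key.pem --outfile ca.pem

# client key, certificate request and CA-signed certificate
certtool --generate-privkey --outfile rslclient-key.pem
certtool --generate-request --load-privkey rslclient-key.pem --outfile rslclient-request.pem
certtool --generate-certificate --load-request rslclient-request.pem \
         --load-ca-certificate ca.pem --load-ca-privkey ca-key.pem \
         --outfile rslclient-cert.pem

# upload everything rsyslog needs to the config bucket
aws s3 cp ca.pem s3://yours3-conf-bucket/
aws s3 cp rslclient-key.pem s3://yours3-conf-bucket/
aws s3 cp rslclient-cert.pem s3://yours3-conf-bucket/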

 

You will also need a DNS entry pointing to your log server. Even though all the IP addresses concerned are private IPs inside a VPC, for readability I created a DNS entry for my log server: in this case an entry for rsyslog-internal.gouvernma.mu in Route 53. You can obviously use something else.
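
If you prefer to script the DNS entry rather than click through the console, something along these lines works with the AWS CLI (the hosted zone ID and the private IP below are placeholders):

aws route53 change-resource-record-sets \
  --hosted-zone-id ZXXXXXXXXXXXXX \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "rsyslog-internal.gouvernma.mu",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "10.0.0.50" }]
      }
    }]
  }'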

ebextensions
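
These configuration files live in a folder named .ebextensions at the root of your application source bundle and must carry a .config extension, roughly like this (the file name itself is just an example):

your-app-bundle/
  .ebextensions/
    01-rsyslog-client.config    <- the YAML shown below
  ... the rest of your application ...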

In the packages section, you can see the different packages I need for this kind of setup. This is a YAML file, so be careful with tabs and spaces (YAML indentation must use spaces only).

packages:
  yum:
    rsyslog: []
    gnutls: []
    gnutls-utils.x86_64: []
    cronie: []
    libzip: []
    rsyslog-gnutls: []

commands:
  01keys_directory:
    command: mkdir -p /etc/rsyslog/keys/
  02yum_clean:
    command: yum clean all -y
  03yum_update:
    command: yum update -y

files:
  "/etc/rsyslog.d/client-rsyslog.conf":
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "https://s3.eu-central-2.amazonaws.com/yours3-conf-bucket/client-rsyslog.conf"

  "/etc/rsyslog/keys/ca.pem":
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "https://s3.eu-central-2.amazonaws.com/yours3-conf-bucket/ca.pem"

  "/etc/rsyslog/keys/key.pem":
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "https://s3.eu-central-2.amazonaws.com/yours3-conf-bucket/rslclient-key.pem"

  "/etc/rsyslog/keys/cert.pem":
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "https://s3.eu-central-2.amazonaws.com/yours3-conf-bucket/rslclient-cert.pem"

  "/etc/yum.repos.d/rsyslog.repo":
    mode: "000644"
    owner: root
    group: root
    authentication: "S3Auth"
    source: "https://s3.eu-central-2.amazonaws.com/yours3-conf-bucket/rsyslog.repo"

  "/home/ec2-user/rsyslogconfig.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      if [ "${IS_TEST}" = "true" ]; then
        sed -i 's/apache-access/apache-access-test/g' /etc/rsyslog.d/client-rsyslog.conf
        sed -i 's/apache-error/apache-error-test/g' /etc/rsyslog.d/client-rsyslog.conf
        sed -i 's/sqs-log/sqs-log-test/g' /etc/rsyslog.d/client-rsyslog.conf
      fi
      /sbin/service rsyslog restart

container_commands:
  08applicationlog:
    command: /bin/touch /var/log/gouvernma.log && /bin/chown webapp:webapp /var/log/gouvernma.log
  09RWapplicationlog:
    command: /bin/chmod 700 /var/log/gouvernma.log
  10rsyslog_config:
    command: /home/ec2-user/rsyslogconfig.sh

services:
  sysvinit:
    rsyslog:
      enabled: true
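
After a deployment it is worth checking, on one of the client instances, that the S3-sourced files actually landed and that rsyslog accepted its configuration. A quick sanity check could look like this (standard commands, nothing specific to this setup):

# did the files pulled from S3 arrive?
ls -l /etc/rsyslog.d/client-rsyslog.conf /etc/rsyslog/keys/

# does rsyslog parse its configuration cleanly?
rsyslogd -N1

# is the service running?
service rsyslog status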

 

Repo File

Since Amazon Linux does not ship the latest rsyslog in its own repos, I used the Adiscon repo for CentOS 6, which was the closest match at the time of writing. You can read more about it here.

 
[v8-stable]
name=Adiscon CentOS-6 - local packages for $basearch
baseurl=http://rpms.adiscon.com/v8-stable/epel-6/$basearch
enabled=1
priority=10
gpgcheck=0
gpgkey=http://rpms.adiscon.com/RPM-GPG-KEY-Adiscon
protect=1
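
Once this repo file is in place, you can confirm that yum now resolves rsyslog from the Adiscon repo rather than the stock Amazon Linux one:

yum clean all
yum info rsyslog    # the Version and Repo fields should now point at v8-stable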

 

Rsyslog client-side configuration

As you can see, I am fetching a file called client-rsyslog.conf. This file is the one that tells rsyslog how to send the logging data to the rsyslog server; the Beanstalk configuration above drops it into place. Obviously, you can use Ansible to do this instead.

/etc/rsyslog.d/client-rsyslog.conf
$ModLoad imfile
$InputFilePollInterval 10
$PrivDropToGroup adm
$WorkDirectory /tmp

$DefaultNetstreamDriverCAFile /etc/rsyslog/keys/ca.pem
$DefaultNetstreamDriverCertFile /etc/rsyslog/keys/cert.pem
$DefaultNetstreamDriverKeyFile /etc/rsyslog/keys/key.pem

$DefaultNetstreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode anon

# SQS Log
$InputFileName /var/log/aws-sqsd/default.log
$InputFileTag sqs-log:
$InputFileStateFile stat-sqs-log
$InputFileSeverity error
$InputFilePersistStateInterval 10000
$InputRunFileMonitor

#any custom logs you have
# gouvernma Application Log
$InputFileName /var/log/gouvernma.log
$InputFileTag pc-application-log:
$InputFileStateFile stat-pc-application-log
$InputFileSeverity info
$InputFilePersistStateInterval 10000
$InputRunFileMonitor

# Apache access file:
$InputFileName /var/log/httpd/access_log
$InputFileTag apache-access:
$InputFileStateFile stat-apache-access
$InputFileSeverity info
$InputFilePersistStateInterval 10000
$InputRunFileMonitor

#Apache Error file:
$InputFileName /var/log/httpd/error_log
$InputFileTag apache-error:
$InputFileStateFile stat-apache-error
$InputFileSeverity error
$InputFilePersistStateInterval 10000
$InputRunFileMonitor

if $programname == 'apache-access' then @@rsyslog-internal.gouvernma.mu:514
if $programname == 'apache-error' then @@rsyslog-internal.gouvernma.mu:514
if $programname == 'pc-application-log' then @@rsyslog-internal.gouvernma.mu:514
if $programname == 'sqs-log' then @@rsyslog-internal.gouvernma.mu:514
if $programname == 'mem-cpu-log' then @@rsyslog-internal.gouvernma.mu:514
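
Once both the client and the server (described further down) are configured and rsyslog has been restarted on each side, an easy way to test the whole chain is to append a line to one of the watched files and watch it arrive on the log server. The sample line below is made up; it just has to match whatever format your application writes:

# on a client instance (as root, since the file belongs to webapp):
echo "$(date -u +%Y-%m-%dT%H:%M:%S) INFO (1234): test entry from $(hostname)" >> /var/log/gouvernma.log

# on the log server: with the catch-all file template shown later in 06-output.conf,
# the entry should land in a per-host, per-program file within the next poll interval
tail -f /var/log/<client-hostname>/pc-application-log.log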

 

 

Part of my rsyslog.conf

module(load="imuxsock") # provides support for local system logging (e.g. via logger command)
module(load="imklog") # provides kernel logging support (previously done by rklogd)
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf

 

What about the server configuration?

Now, we need something on the server that not only restricts communication to those machines, but also accepts only encrypted traffic using those keys.

On top of that, we need to tell rsyslog to format our logs as JSON so that we can proceed to the next step with Logstash.

You will notice that the JSON template files use a lot of regular expressions. Don't be afraid, it is pretty fun once you get the hang of it, but remember that rsyslog uses POSIX ERE (and optionally BRE) expressions, so you need to be quite creative when writing them.
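
If you want to sanity-check an ERE and its capture groups before wiring it into a template, you can abuse sed -E on a sample line from the shell. The sample line and expression here are only illustrative, not the exact ones used in the templates below:

echo '10.0.1.23 - - [20/May/2025:10:15:01 +0400] "GET /index.php HTTP/1.1" 200 512' \
  | sed -E 's#.*HTTP/1\.1" ([0-9]+) .*#status=\1#'
# prints: status=200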

So here we go with our server /etc/rsyslog.conf:

 
# TLS / certificate settings
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/rsyslog/keys/ca.pem
$DefaultNetstreamDriverCertFile /etc/rsyslog/keys/cert.pem
$DefaultNetstreamDriverKeyFile /etc/rsyslog/keys/key.pem

$ModLoad imtcp
$InputTCPServerStreamDriverAuthMode x509/name
$InputTCPServerStreamDriverPermittedPeer *.xxxx.gouvernma.mu # hosts you want to allow in
$InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode

# Provides UDP syslog reception (disabled here)
#$ModLoad imudp
#$UDPServerRun 514 # choose your port

# Provides TCP syslog reception
$InputTCPServerRun 514
$IncludeConfig /etc/rsyslog.d/*.conf

 

Now, we also need to parse those log files and convert them to JSON; that's what our config files in /etc/rsyslog.d/ are for 🙂

01-access-template-json.conf

This template will be used to convert our Apache access logs to JSON format:
template(name="access-json-template"
type="list") {
constant(value="{")
constant(value="\"@timestamp\":\"") property(format="json" name="timereported" dateFormat="rfc3339")
constant(value="\",\"message\":\"") property(format="json" name="msg")
constant(value="\",\"sysloghost\":\"") property(format="json" name="hostname")
constant(value="\",\"severity\":\"") property(format="json" name="syslogseverity-text")
constant(value="\",\"facility\":\"") property(format="json"name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(format="json" name="programname")
constant(value="\",\"client_ip\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(\\()([^\\)]+)")
constant(value="\",\"endpoint\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="4" regex.nomatchmode="FIELD" regex.expression="(\\[(.*)\\])[^(GET|PUT|POST|DELETE|PATCH)]+(GET|PUT|POST|DELETE|PATCH)(.*)(HTTP)")
constant(value="\",\"app-date\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(\\[([^ ]+))")
constant(value="\",\"method\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="3" regex.nomatchmode="FIELD" regex.expression="(\\[(.*)\\])[^(GET|PUT|POST|DELETE|PATCH)]+(GET|PUT|POST|DELETE|PATCH)(.*)(HTTP)")
constant(value="\",\"log\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="3" regex.nomatchmode="FIELD" regex.expression="(\\[(.*)\\])(.*)")
constant(value="\",\"response\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(HTTP\\/1\\.1)\" ([0-9]+) (([0-9]+)|(-)) \"")
constant(value="\",\"procid\":\"") property(format="json" name="procid")
constant(value="\"}\n")
}

02-error-json-template.conf

This will convert our Apache error log entries to JSON format:
template(name="error-json-template"
type="list") {
constant(value="{")
constant(value="\"@timestamp\":\"") property(format="json" name="timereported" dateFormat="rfc3339")
constant(value="\",\"@version\":\"1")
constant(value="\",\"message\":\"") property(name="msg" format="json")
constant(value="\",\"sysloghost\":\"") property(format="json" name="hostname")
constant(value="\",\"severity\":\"") property(format="json" name="syslogseverity-text")
constant(value="\",\"facility\":\"") property(format="json" name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(format="json" name="programname")
constant(value="\",\"status\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(\\[\\:)([^ ]+)(\\])")
constant(value="\",\"pid\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(\\[pid) ([^ ]+)(\\])")
constant(value="\",\"client_ip\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="2" regex.nomatchmode="FIELD" regex.expression="(\\[client) ([^ ]+)(\\])")
constant(value="\",\"app-date\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="1" regex.nomatchmode="FIELD" regex.expression="([A-Z]{1}[a-z]{2} [A-Z]{1}[a-z]{2} [0-9]{2} [0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{6} [0-9]{4})")
constant(value="\",\"error_class\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="4" regex.nomatchmode="FIELD" regex.expression="(\\[client) ([^ ]+)(\\]) ([^:]+)")
constant(value="\",\"log\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="4" regex.nomatchmode="FIELD" regex.expression="(\\[client) ([^ ]+)(\\]) (.*)")
constant(value="\",\"procid\":\"") property(format="json" name="procid")
constant(value="\"}\n")
}

03-gouvernma-json-template.conf

This will convert my custom application log format into JSON. Yes, I could have written the application logs as JSON in the first place, but this article is about showing how rsyslog can achieve this:
template(name="gouvernma-json-template"
type="list") {
constant(value="{")
constant(value="\"@timestamp\":\"") property(format="json" name="timereported" dateFormat="rfc3339")
constant(value="\",\"@version\":\"1")
constant(value="\",\"message\":\"") property(name="msg" format="json")
constant(value="\",\"sysloghost\":\"") property(format="json" name="hostname")
constant(value="\",\"facility\":\"") property(format="json" name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(format="json" name="programname")
constant(value="\",\"app-date\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="1" regex.nomatchmode="FIELD" regex.expression="([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2})")
constant(value="\",\"severity\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="1" regex.nomatchmode="FIELD" regex.expression="(INFO|ERR|EMERG|NOTICE|WARN|DEBUG|ERR)")
constant(value="\",\"log\":\"") property(format="json" name="msg" regex.type="ERE" regex.submatch="5" regex.nomatchmode="FIELD" regex.expression="(((INFO|ERR|EMERG|NOTICE|WARN|DEBUG|ERR) \\(([0-9]+)\\):))(.*)")
constant(value="\"}\n")
}

04-sqs-json-template.conf

This one will convert the AWS SQS daemon logs into JSON:
template(name="sqs-json-template"
type="list") {
constant(value="{")
constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"@version\":\"1")
constant(value="\",\"message\":\"") property(name="msg" format="json")
constant(value="\",\"sysloghost\":\"") property(name="hostname")
constant(value="\",\"severity\":\"") property(name="msg" regex.type="ERE" regex.submatch="0" regex.nomatchmode="FIELD" regex.expression="(http-err|message)")
constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(name="programname")
constant(value="\",\"log\":\"") property(name="msg" regex.type="ERE" regex.submatch="3" regex.nomatchmode="FIELD" regex.expression="(http-err|message)(: )(.*)")
constant(value="\"}\n")
}

Finally, the main file, 06-output.conf

This is the file that routes each incoming log entry to its corresponding template and sends the JSON-formatted result on to Logstash:
# These rules forward matching entries to localhost on port 10514,
# each rendered with its corresponding JSON template

if $programname == 'apache-access' then {
    @@localhost:10514;access-json-template
}
else if $programname == 'apache-error' then {
    @@localhost:10514;error-json-template
}
else if $programname == 'pc-application-log' then {
    @@localhost:10514;gouvernma-json-template
}
else if $programname == 'sqs-log' then {
    @@localhost:10514;sqs-json-template
}

$template TmplAuth, "/var/log/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?TmplAuth
#*.info,mail.none,authpriv.none,cron.none ?TmplMsg

 

Notice how each rule references a template by name: depending on which program the entry comes from, it is rendered through the matching JSON template before being forwarded.
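
To give an idea of what comes out the other end, a record produced by the access template has roughly this shape (the field names come straight from the template above; the values here are only illustrative):

{"@timestamp":"2025-05-20T10:15:01+04:00","message":"...","sysloghost":"ip-10-0-1-23","severity":"info","facility":"...","programname":"apache-access","client_ip":"10.0.1.23","endpoint":"/index.php","app-date":"20/May/2025:10:15:01","method":"GET","log":"...","response":"200","procid":"-"}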

Logstash

From here on, you can redirect the logs anywhere you want. I have Logstash running and use it to ship the logs to Sematext or to a plain Elasticsearch cluster. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favourite "stash". You can do a lot of further magic with Logstash, but that is beyond the scope of this article; I invite you to read the Logstash documentation.

Watch this video about Logstash.

 

/etc/logstash/conf.d/filter.conf

 
filter {

  if ([message] =~ /\(Amazon\) \(internal dummy connection\)/) {
    drop {}
  }

  if ([app-date] =~ /[0-9]{2}\/[A-Z]{1}[a-z]{2}\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}/) {
    date {
      match  => ["app-date", "dd/MMM/yyyy:HH:mm:ss"]
      target => "app-date"
    }
  }

  if ([app-date] =~ /([A-Z]{1}[a-z]{2} [A-Z]{1}[a-z]{2} [0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2}\.[0-9]{6} [0-9]{4})/) {
    date {
      match  => ["app-date", "EEE MMM dd HH:mm:ss.SSSSSS yyyy"]
      target => "app-date"
    }
  }
}

 

/etc/logstash/conf.d/input.conf

input {
  tcp {
    host  => "localhost"
    port  => "10514"
    type  => "rsyslog"
    codec => "json"
  }
}

 

/etc/logstash/conf.d/output.conf

This will ship our logs to Sematext; you can point the output at your own Elasticsearch cluster instead if you prefer.

 
output {
  file {
    path => "/var/log/logstash/myjsonoutput.log"
  }

  elasticsearch {
    hosts => ["logsene-receiver.sematext.com:443"]
    ssl => "true"
    # use a valid token here
    index => "xxxxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxxxxx"
    manage_template => "false"
  }
}
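
Before restarting Logstash, it is worth validating the three files. Depending on how Logstash was installed, the binary path and service manager may differ, but something along these lines does the trick:

# check the pipeline configuration for syntax errors, then restart the service
/usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit -f /etc/logstash/conf.d/
service logstash restart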
 

You can now set up logrotate to clean up old logs. I also added a small script, run from cron, that backs up and compresses everything old and uploads it to S3 using the AWS CLI. I won't go into all the details, but moving the files to S3 is easily achieved with the AWS CLI as follows:

 
aws --region eu-central-2 s3 cp ${main_log_dir}.tar.bz2 s3://gouvernma-logs/
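
For completeness, a minimal sketch of what such a nightly cron job could look like; the paths, bucket name and retention policy are all assumptions, so adapt them to your own setup:

#!/bin/bash
# Hypothetical /etc/cron.daily/archive-rsyslog-to-s3: bundle up logs that logrotate
# has already rotated, ship them to S3, then remove the local copies.
set -e

main_log_dir="/var/tmp/rsyslog-archive-$(date -d yesterday +%Y%m%d)"
mkdir -p "${main_log_dir}"

# collect the rotated (already closed) files, leaving the live *.log files alone
find /var/log/*/ -type f ! -name '*.log' -exec mv {} "${main_log_dir}/" \;

tar -cjf "${main_log_dir}.tar.bz2" -C / "${main_log_dir#/}"
aws --region eu-central-2 s3 cp "${main_log_dir}.tar.bz2" s3://gouvernma-logs/
rm -rf "${main_log_dir}" "${main_log_dir}.tar.bz2"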
 

 

I hope you enjoyed this article and that it can be of some help to you. There are several ways of achieving this; rsyslog is a nice project, and I think it has some nice features to offer.

Everything above can also be done with an in-house setup, without any cloud involvement. There is plenty of room to improve on this if you decide to get creative.

 

Till then, happy hacking!

 

 

Pirabarlen/Selven/PC_THE_GREAT

Founder of hackers.mu
