Building an ETL Pipeline with Open Source Tools

What Is ETL

ETL stands for Extract, Transform, Load. Extraction is the process by which data from many sources and formats is collected. The data is then processed to allow for ease of storing and future processing. This can include data cleaning, or format normalization into file structures such as JSON. From here the data can then be persisted for storage and access by interested stakeholders.

Continue reading

Getting started with ELK and JBoss EAP6

In this post we will describe what is needed to get started with managing your EAP 6 logs with ElasticSearch, Logstash and Kibana. There are several reasons why you would want to collect your logging output in a central place.

  • Aggregate (output from multiple applications / hosts)
  • Correlate events in different systems
  • Analyze (more than grep)
  • Backup
  • Integrate into monitoring
  • Gather statistics

A common solution that supports all this use cases is provided by the ELK stack. It consists of ElasticSearch (ES), Logstash and Kibana. ElasticSearch provides persistence and analytics, Logstash provides the pipeline that brings your Logs into ES and Kibana provides a GUI for querying and dashboards.

Continue reading