By Amitai B.
Sep 27, 2016
Monitoring computer systems has always been important. It helps us know the system’s health, identify problems, and even forecast them.
Today, monitoring has become more and more significant for several reasons:
- The cloud (or cheap storage) enables us to monitor and archive almost anything, forever.
- Systems have become much more complicated. They consist of many moving parts such as VMs, Docker containers, databases, services, you name it.
- IoT has come into our lives with a large number of devices that we would like to track.
- Data analysis has evolved, enabling us to do much more than just take care of the system’s stability and health. It can help us understand our customers’ needs, identify trends, classify our customers, and more.
In this blog post, I will compare two modern solutions for monitoring. They differ in approach and implementation, and each has its advantages and disadvantages.
The first is InfluxDB, which is part of the TICK stack. The second is Elasticsearch, which is part of the ELK stack.
What is monitoring?
Monitoring consists of the following activities:
- Collecting data
- Storing data
- Visualizing data
- Raising alerts
TICK stands for:
- Telegraf – collects metrics and data from the system it’s running on or from other services.
- InfluxDB – time series database with high availability and performance.
- Chronograf – web application for data visualization.
- Kapacitor – alerting and data processing engine.
The heart of the stack is InfluxDB, which is a time series database.
A time series database (TSDB) is a software system that is optimized for handling time series data, that is, arrays of numbers indexed by time.
Examples are the temperature of water in a boiler over time, or CPU usage over time. A TSDB is a NoSQL database that supports CRUD operations and queries. The main thing that distinguishes it from other types of databases is its optimization for maintaining time indexes on a very large number of records.
Other leading TSDBs are Graphite, Prometheus, and OpenTSDB. I chose InfluxDB because it is a modern database, written in Go. It is very easy to set up and configure, and gives great performance.
InfluxDB data model
InfluxDB is schemaless. You can add series, measurements and tags at any time. Here is an example of a row:
app_degrees,country=Canada,city=Toronto degree=77.5 1422568543702900257
- Measurement – the name of what is being recorded (app_degrees); it plays a role similar to a table name.
- Tag – key/value pair (country=Canada, city=Toronto) – is optional, used to describe the data point.
- Field – the recorded value, typically numeric (degree=77.5).
- Timestamp – the exact moment of the recording, in nanoseconds since the Unix epoch (1422568543702900257).
- Series – a collection of data points that share a measurement and tag set.
This model enables us to insert measurements more efficiently and conveniently.
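To make the row format concrete, here is a minimal Python sketch that assembles the same point from its parts. The helper name `to_line_protocol` is my own, and the sketch assumes simple values: it does not escape spaces or commas, and it does not quote string field values, both of which a real client library would handle.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value,... field=value,... timestamp"""
    # Tags are optional; when present they follow the measurement, comma-separated.
    tag_part = "".join(f",{key}={value}" for key, value in tags.items())
    # Fields carry the recorded values; at least one is expected.
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


row = to_line_protocol(
    "app_degrees",
    {"country": "Canada", "city": "Toronto"},
    {"degree": 77.5},
    1422568543702900257,  # nanoseconds since the Unix epoch
)
print(row)
```

Note that there are no spaces around the commas: a single space separates the measurement and tags from the fields, and another separates the fields from the timestamp.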
Reading and writing into InfluxDB
InfluxDB has an HTTP API and client libraries in many languages such as Ruby, Go, Java, Node.js and more. InfluxDB enables us to write single records using the API or insert multiple rows from a file.
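As a rough illustration of writing over HTTP, the sketch below builds a request for the `/write` endpoint of a 1.x-style InfluxDB API using only the Python standard library. The host, the default port 8086, the database name `monitoring` and the second data row are assumptions made for the example. The request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_write_request(base_url, database, lines):
    """Build (but do not send) a POST request carrying one row per line."""
    query = urllib.parse.urlencode({"db": database})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=body, method="POST")


request = build_write_request(
    "http://localhost:8086",  # assumed: local instance on the default port
    "monitoring",             # hypothetical database name
    [
        "app_degrees,country=Canada,city=Toronto degree=77.5 1422568543702900257",
        "app_degrees,country=Canada,city=Montreal degree=71.2 1422568543702900257",
    ],
)
# To actually send it to a running server: urllib.request.urlopen(request)
```

Batching several rows into one request body, as done here, is usually preferable to sending one request per record.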
InfluxDB has a query language that is very similar to SQL. Here’s an example of a query:
SELECT degree FROM app_degrees WHERE country = 'Canada'
It supports clauses such as SELECT, WHERE, GROUP BY, ORDER BY, LIMIT and more. A developer with SQL knowledge will find it very convenient to use.
The results are in JSON format.
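The JSON that comes back groups rows by series, with column names listed once and the values as arrays. The following sketch flattens such a response into one dictionary per row. The sample text is hand-written in the shape the 1.x query API returns; it is not captured output, and the helper name `rows_from_response` is my own.

```python
import json


def rows_from_response(response_text):
    """Flatten an InfluxDB query response into a list of dicts, one per row."""
    rows = []
    for result in json.loads(response_text).get("results", []):
        # A result without matching data has no "series" key at all.
        for series in result.get("series", []):
            for values in series["values"]:
                rows.append(dict(zip(series["columns"], values)))
    return rows


# Hand-written sample in the expected response shape; not real server output.
sample = """
{"results": [{"series": [{
    "name": "app_degrees",
    "columns": ["time", "degree"],
    "values": [["2015-01-29T21:55:43.702900257Z", 77.5]]
}]}]}
"""
rows = rows_from_response(sample)
print(rows)
```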
The TICK stack comes with Chronograf, which is a web application for visualizing the series. It allows one to build graphs, tables, dashboards and more. It has a very understandable UX and you can build a dashboard in minutes.
Other features of the stack
- Retention policy – you can configure when, if ever, the data will be deleted.
- Continuous queries, which automatically compute aggregations on the data.
- High availability.
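To give a feel for what the first two features do to the data, here is a small Python sketch of their effect. It is not how InfluxDB implements them; inside the database both are configured declaratively and run on their own. The function names and sample values are invented for the illustration.

```python
from collections import defaultdict
from statistics import mean

HOUR = 3600  # seconds


def apply_retention(points, now, max_age_seconds):
    """Keep only points that are still inside the retention window."""
    return [(ts, value) for ts, value in points if now - ts <= max_age_seconds]


def hourly_means(points):
    """Downsample (timestamp, value) points to one mean value per hour bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % HOUR].append(value)  # key is the start of the hour
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Timestamps in seconds, values in degrees; all made up.
points = [(0, 70.0), (1800, 74.0), (3600, 80.0), (5400, 82.0), (9000, 60.0)]
recent = apply_retention(points, now=9000, max_age_seconds=2 * HOUR)
summary = hourly_means(recent)
print(summary)
```

A common pattern is to combine the two: keep raw data for a short period and keep the aggregated series for much longer.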
ELK stands for:
- Elasticsearch – a search engine based on Lucene.
- Logstash – data collection, enrichment, and transportation pipeline, with connectors to common infrastructure.
- Kibana – data visualization platform.
Elastic, the company behind the stack, provides complementary products that add more capabilities, such as Shield for security, Watcher for alerts and more, but they are neither open source nor free.
Elasticsearch is a search engine that is based on Apache Lucene. Basically it is a NoSQL database that is tuned for full-text search (like Google search, for instance). Elasticsearch is distributed, scalable, and supports high availability. It is very easy to set up and configure. It is very popular and has a very good ecosystem.
Elasticsearch is also schema-less and stores the data in JSON documents (like MongoDB). It indexes all fields and has many capabilities such as performing complicated text queries, as well as highlighting text, suggestions, geolocation, and more.
It has a RESTful API and clients in many languages.
In addition to its full-text search capabilities, Elasticsearch also supports time series data, which makes it a very good candidate for monitoring. Not only can we perform monitoring, we can also perform full-text search on logs. It adds another dimension without the need for another database.
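As a sketch of what this combination looks like in practice, the Python snippet below shows a log event as a JSON document and a search body that mixes a full-text match with a time range. The field names and values are illustrative assumptions; the body would be sent to the `_search` endpoint of whichever index holds the logs.

```python
import json

# A log event stored as a JSON document; field names here are illustrative.
log_document = {
    "@timestamp": "2016-09-27T10:15:00Z",
    "host": "web-01",
    "level": "ERROR",
    "message": "Connection timeout while calling payment service",
    "response_ms": 5021,
}

# Search body: full-text match on the message, restricted to the last hour.
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "timeout"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    }
}

print(json.dumps(search_body, indent=2))
```

The numeric field (`response_ms`) can be charted over time like any metric, while the `message` field stays searchable as text in the same store.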
Logstash is a data collector. Basically, what it does is:
- Take data from a source.
- Filter the data and enrich it.
- Send it to the targets.
All those actions are configured in a file.
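The three stages can be pictured as a small pipeline. The sketch below is plain Python, not Logstash itself (Logstash is configured declaratively rather than programmed this way), and the function names, sample lines and host name are invented for the illustration.

```python
def read_source(lines):
    """Input stage: yield raw events from a source (here, a list of log lines)."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def filter_and_enrich(events, host):
    """Filter stage: drop debug noise, parse the level, and add a host field."""
    for event in events:
        if event["message"].startswith("DEBUG"):
            continue
        level, _, text = event["message"].partition(" ")
        yield {"level": level, "text": text, "host": host}


def send_to_targets(events, targets):
    """Output stage: hand every event to every configured target."""
    for event in events:
        for target in targets:
            target(event)


collected = []  # stands in for a real target such as a database
raw = ["INFO service started\n", "DEBUG cache warm\n", "ERROR disk full\n"]
send_to_targets(filter_and_enrich(read_source(raw), host="web-01"), [collected.append])
print(collected)
```

Because each stage only consumes and produces events, a source, a filter, or a target can be swapped without touching the others.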
Logstash has many input plugins that can get the data from many different sources such as files, HTTP, log4j, and syslog. All you have to do is configure the source, and Logstash will take the data from there.
Logstash also has many plugins for filtering and manipulating the data, such as aggregation, parsing, conversion, and even a plugin that lets you program with Ruby.
The same goes for output. You can transfer the data to many outputs. The primary one is of course Elasticsearch, but you can also send it to a file, Redis, Kafka and even InfluxDB. If something is missing, there is a guide for writing your own plugins.
Kibana is more sophisticated than Chronograf. It has many more capabilities, such as a wider variety of chart types, geolocation on a map, and more.
I would also like to mention Grafana, which is an excellent visualization tool. It can be used with both Elasticsearch and InfluxDB.
Which one is better?
Both solutions are excellent: they are scalable, support high availability, and are easy to set up, configure, and maintain. Both are also open source and free.
In a performance test run by InfluxData (the company that developed InfluxDB), InfluxDB got much better results. But Elasticsearch can handle a huge load, so for a typical system it may well be enough.
Elasticsearch has an advantage over InfluxDB because you can use its full-text search capability. This is very useful if you want to save your logs (messages) and work with them. To get that capability with a time series database like InfluxDB, we would have to add another database to support it. That makes the system much more complicated. We would have to maintain two databases for monitoring, synchronize them in case of failure, and probably add a message queue such as RabbitMQ. All of this comes at a high cost in time and money.
I think that the choice of tool depends on the requirements. If the monitoring system should only track numbers over time, I would pick InfluxDB, because it is better suited to the job. If you also need to save logs or textual data, then pick Elasticsearch to simplify the job.