Using the Elastic Stack to determine Swiss cantons from BBOX coordinates

Introduction

This blog talks about how we built an Elastic Stack log statistics solution for the geodienste.ch service of KKGEO, the conference of cantonal geoinformation offices. The key requirements were:

  • Real-time collection of logs generated by a dynamically scaled web server farm of two to many instances.
  • Reliable reloading of historic data, either for disaster recovery or to adapt to changes in the source data structure caused by application updates.
  • Proper assignment of cantons to a given BBOX map area.
  • A graphical analysis tool to visualize the collected logs.

Ideas

First of all, we collected different approaches for deriving the cantons from BBOXes. This is the result:

| Approach | Pro | Con |
|----------|-----|-----|
| Public REST API search endpoint of map.geo.admin | Easy to implement | Reliability (no SLA); terms of use for machine-to-machine usage not clear; performance (latency) not guaranteed |
| Python geoip library with Swiss cantons GeoJSON db/file | Standalone library available; widely used | Third-party software and GeoJSON maps needed |
| Elastic Maps / geo feature | Built-in solution; easy to implement; performance | Only WGS84 (GPS) coordinates supported |

Based on these results, we decided to implement the project with Elastic Maps.

Architecture

In a few words, the solution is built with the standard components of the Elastic Stack. Live logs are collected with Filebeat on every web server instance and shipped to a central Logstash. Together with archived logs read from a file share, Logstash processes these logs and sends them to Elasticsearch to be indexed. Kibana is used to visualize the logs stored in Elasticsearch, as well as for stack management and monitoring. The diagram below shows a graphical overview of the solution:

The heart of the solution is a Logstash pipeline which does all the work: receiving the logs, parsing them, enriching them with canton information and sending them to Elasticsearch, ready for visualization.

The main components of the pipeline are shown below:
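
Just to give an idea of the structure, a heavily simplified skeleton of such a pipeline could look like this (ports, paths and the grok pattern are illustrative assumptions, not the production configuration):

input {
  beats { port => 5044 }                      # live logs shipped by the Filebeat instances
  file  { path => "/mnt/archive/*.log" }      # reload of archived logs from the file share
}

filter {
  # 1. parse the raw access log line
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }

  # 2. validate the coordinate system and convert the BBOX to WGS84
  ruby { path => "${LS_SETTINGS_DIR}/pipeline/kkgeo/scripts/bbox.rb" }

  # 3. enrich the event with the matching cantons
  #    (see the elasticsearch filter configuration later in this post)
}

output {
  elasticsearch { hosts => ["https://${LS_ES_HOST}:9200"] }   # index the enriched logs
}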

Since this blog is about geo information, we focus on the geo-specific tasks from now on. Common parts like input, parsing and output are not covered further, since they are well known and widely used.

Elastic Maps

maps.elastic.co hosts a huge number of maps built for Elasticsearch/Kibana. For us, the map of interest was the Swiss cantons map. Installing the map is as easy as downloading it from the web, opening the Kibana web app, opening or creating a map, adding a new layer, choosing "Upload GeoJSON" and importing the file.
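
Conceptually, such a map is just a GeoJSON FeatureCollection in which every canton is a feature carrying its ISO 3166-2 code. A heavily simplified, illustrative example (the rectangle is of course not the real cantonal boundary):

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "iso_3166_2": "CH-ZH" },
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[8.36, 47.16], [8.99, 47.16], [8.99, 47.69], [8.36, 47.69], [8.36, 47.16]]]
      }
    }
  ]
}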

Pipeline development

Get coordinate format and boundaries

As described before, the Elasticsearch map features only support WGS84 (GPS) coordinates. The geodienste.ch service, however, accepts BBOXes in several EPSG coordinate systems (Swiss coordinates, pseudo-Mercator and WGS84), specified in a field of the log.

The first geo-processing block of the pipeline is implemented with a Ruby script filter plugin. First of all, logs with invalid coordinate systems are dropped. The BBOX coordinates of valid logs are then checked against the geographic boundaries of Switzerland, and out-of-bounds logs are dropped as well.
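
A minimal sketch of this validation logic, in the form expected by the ruby filter's script file (field names, EPSG codes and the boundary values are illustrative assumptions):

# Accepted coordinate systems: Swiss LV95/LV03, pseudo-Mercator, WGS84 (assumed codes)
SUPPORTED_EPSG = %w[2056 21781 3857 4326]

# Rough bounding box of Switzerland in WGS84 (illustrative values)
CH_BOUNDS = { min_lon: 5.9, max_lon: 10.6, min_lat: 45.7, max_lat: 47.9 }

def filter(event)
  epsg = event.get("[kkgeo][bbox][epsg]").to_s
  return [] unless SUPPORTED_EPSG.include?(epsg)   # drop logs with an invalid coordinate system

  # For brevity this sketch only checks WGS84 BBOXes; the real filter also
  # validates coordinates expressed in the other systems.
  if epsg == "4326"
    min_lon, min_lat, max_lon, max_lat =
      event.get("[kkgeo][bbox][raw]").split(",").map(&:to_f)
    in_ch = min_lon >= CH_BOUNDS[:min_lon] && max_lon <= CH_BOUNDS[:max_lon] &&
            min_lat >= CH_BOUNDS[:min_lat] && max_lat <= CH_BOUNDS[:max_lat]
    return [] unless in_ch                         # drop out-of-bounds logs
  end

  [event]                                          # keep the valid event
end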

Convert coordinates to WGS84 (GPS)

Logs with either Swiss or pseudo-Mercator coordinates are now converted to WGS84.

Swiss coordinates can be converted with the help of a GitHub project containing example implementations in many programming languages. Since Ruby was missing, we ported the code from another language and contributed it to the project.
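
For illustration, a Ruby version of swisstopo's approximate LV03 (CH1903) to WGS84 formulas looks roughly like this (a sketch, not necessarily the exact code contributed to the project):

# Approximate conversion of Swiss LV03 (CH1903) coordinates to WGS84,
# based on swisstopo's published approximation formulas.
# y = easting, x = northing in the Swiss coordinate system.
def ch1903_to_wgs84(y, x)
  y_aux = (y - 600_000.0) / 1_000_000.0
  x_aux = (x - 200_000.0) / 1_000_000.0

  lng = 2.6779094 + 4.728982 * y_aux +
        0.791484 * y_aux * x_aux +
        0.1306 * y_aux * x_aux**2 -
        0.0436 * y_aux**3

  lat = 16.9023892 + 3.238272 * x_aux -
        0.270978 * y_aux**2 -
        0.002528 * x_aux**2 -
        0.0447 * y_aux**2 * x_aux -
        0.0140 * x_aux**3

  # Intermediate results are in units of 10000 seconds of arc; convert to degrees.
  [lng * 100 / 36, lat * 100 / 36]
end

# Example: the LV03 origin near Bern maps to roughly 7.4386 E / 46.9511 N.
# ch1903_to_wgs84(600_000, 200_000)  # => [7.4386..., 46.9510...]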

Pseudo-Mercator coordinates are converted with a simple Ruby function.
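
A sketch of such a function, using the standard spherical (Web) Mercator inverse:

# Convert pseudo-Mercator / Web Mercator (EPSG:3857) coordinates to WGS84.
R = 6_378_137.0  # Earth radius used by the spherical Mercator projection, in meters

def mercator_to_wgs84(x, y)
  lon = (x / R) * 180.0 / Math::PI
  lat = (2.0 * Math.atan(Math.exp(y / R)) - Math::PI / 2.0) * 180.0 / Math::PI
  [lon, lat]
end

# Example: mercator_to_wgs84(828_060, 5_934_100) is roughly [7.44, 46.95] (Bern).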

Query cantons by BBOX coordinates

All logs now have proper WGS84 coordinates and can be used for the BBOX queries.

The queries are done with the built-in Logstash elasticsearch filter plugin. The following code snippets show a configuration example.

elasticsearch filter plugin configuration

elasticsearch {
  hosts => ["https://${LS_ES_HOST}:9200"]
  # index created by the Elastic Maps GeoJSON upload, holding the canton shapes
  index => "switzerland_cantons"
  # file containing the geo_shape query shown below
  query_template => "${LS_SETTINGS_DIR}/pipeline/kkgeo/scripts/cantons_query_template.json"
  fields => {
    # copy the canton codes of the matching documents into the event
    "iso_3166_2" => "[kkgeo.bbox][cantons]"
  }
  user => "logstash_user"
  password => "${ES_LS_USER_PW}"
  ca_file => "${LS_PATH_ES_CACERT}"
}

${LS_SETTINGS_DIR}/pipeline/kkgeo/scripts/cantons_query_template.json

{ 
  "query": { 
    "geo_shape": { 
      "coordinates": { 
        "shape": { 
          "type": "envelope", 
          "coordinates" : %{bbox.elastic} 
        }, 
        "relation": "intersects" 
      } 
    } 
  }, 
  "_source" : "iso_3166_2" 
}

Let’s start with the query template. As you can see, it contains the parameters of the Elasticsearch query itself, and most of them are self-explanatory. We issue a geo_shape query with envelope (BBOX) coordinates. With the relation parameter we tell Elasticsearch to do an intersects check, meaning it should return all documents which overlap with the coordinates of the envelope. Finally, the _source parameter defines which content the returned documents should contain. In our example, we are only interested in the name of the canton, which can be found in the iso_3166_2 field of each document. Long story short: we feed an envelope geo-shape to Elasticsearch and receive a list of documents, each of them holding the name of an overlapping canton.
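
One detail worth knowing: the %{bbox.elastic} placeholder must expand to the corner pair expected by the envelope type, [[min_lon, max_lat], [max_lon, min_lat]], i.e. the upper-left corner followed by the lower-right one. Rendered with the approximate bounding box of all of Switzerland, the line would look like this:

"coordinates" : [ [ 5.96, 47.81 ], [ 10.49, 45.82 ] ]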

Now, let’s have a look at the elasticsearch filter. Again, most of the configuration needs no explanation: index points to the Elastic Maps index holding the canton information, and query_template points to the query_template.json file shown above. With the fields parameter we tell Logstash to take the values of the iso_3166_2 field of the returned documents, convert them to an array and store them in the new field kkgeo.bbox.cantons of the current log. In short words again: we feed in a BBOX and get back cantons. Very nice!
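
On a processed event, the result might look like this (a rubydebug-style view with illustrative canton codes):

"kkgeo.bbox" => {
  "cantons" => ["CH-BE", "CH-FR"]   # all cantons intersecting the BBOX
}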

Fine Tuning

As mentioned above, the map used for the canton queries was the Swiss cantons map provided by Elastic itself. This is a good choice for normal use cases with average precision requirements. During development, however, it turned out that a more precise map was preferred. A quick analysis showed that Elastic maps are standard GeoJSON files, so it was a straightforward process to replace the standard map with one delivered by the customer.

Visualization

Finally, with all logs parsed, enriched and indexed in Elasticsearch, they can be visualized with custom dashboards like the one below.

Summary / Lessons Learned

  • Elastic Maps is a powerful, flexible and performant solution for a wide range of use cases.
  • Custom GeoJSON maps are supported, which can be very helpful when the standard maps are not precise enough.
  • The Elastic Stack provides a very large feature set to solve almost any log analysis problem.
  • All customer requirements could be fulfilled with the solution.
  • Working with logs is always a challenge and the effort should never be underestimated.
  • Apart from Elastic Stack know-how, custom solutions often need additional skills. This project, for example, involved scripting/programming in different languages and required basic knowledge of geo information and coordinate systems.
  • Building this solution was very interesting and a lot of fun.