Enhancing Apiman Elasticsearch metrics to capture arbitrary headers and write to file

Client

British-Omani Cloud & Integration Specialists

Background

BPL client Integrator (pseudonym) is a cloud and integration specialist. Integrator’s customer Telco (pseudonym) has a significant Apiman deployment.

This case study combines two Elasticsearch metrics projects that we undertook for our client at different times.

Integrator has an innovative GitOps approach to deploying Apiman: an all-in-one Docker image is built during their CI/CD pipeline, loading and publishing a pre-set list of APIs to a container-local database. Each deployed container is a clone with everything it needs locally for routing purposes. Whenever a new API is released, the master configuration files are updated in git, and the pipeline is run again.

Allow capture of user-defined headers and query params

Integrator has a range of headers they use for tracing, debugging, and other audit purposes in Telco’s system (on both request and response). Before this enhancement, it was not possible to configure Apiman to capture specific headers and query parameters into Elasticsearch.

Integrator also wanted to remove HAProxy from their deployment, as it was only being used for auxiliary logging purposes.

As a concrete example, a user might want to capture the X-Correlation-Id header in both the request and the response, and the X-App-Id header in the response only.

After some rounds of feedback, this was extended to support regular expressions, so headers can be captured without knowing every name in advance (e.g. service-.*).

Allow Elasticsearch metrics to be written to file/log as JSON

Rather than using typical push-based Elasticsearch metrics, Telco's architects required that metrics be written to disk and scraped using Filebeat or Logstash. These applications are then responsible for sending the data to Elasticsearch. Telco’s rationale for this pull-based approach was to better cope with Elasticsearch cluster downtime and network outages.

With the standard Apiman Elasticsearch metrics implementation, a short Elasticsearch outage is not a problem: Apiman buffers the metrics in memory and retries until the server becomes available again. However, lengthy downtime will likely exceed the buffer's capacity and cause metrics to be discarded.

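The buffering trade-off can be illustrated with a minimal sketch. This is not Apiman's actual implementation: the class name BoundedMetricsBuffer and its methods are hypothetical, and the sketch only assumes a fixed-capacity in-memory queue that discards new entries once it is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: a fixed-capacity buffer between request threads and the sender.
final class BoundedMetricsBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedMetricsBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on the request path: never blocks; returns false if the metric was discarded.
    boolean offer(String metricJson) {
        boolean accepted = queue.offer(metricJson);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called by the sender once the server is reachable: removes up to max buffered metrics.
    List<String> drain(int max) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, max);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

With a capacity of two, a third offer during an outage is rejected and counted as dropped. This is the loss that writing to local disk is intended to avoid.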

Working closely with our client, BPL enhanced Apiman's Elasticsearch metrics implementation so that it can be configured to write metrics asynchronously to a logger/file as JSON. Metrics can also be written to both the server and the logger/file at the same time.

BPL provided the client with example configurations and detailed instructions, and backported the feature to their older version of Apiman.

Project design & implementation

We worked closely with the client to analyse existing datasets and build use cases, with example headers, query parameters, and other samples against which to verify functionality.

The first stage of the project was to extend Apiman’s IMetrics interfaces to allow metrics implementations to (optionally) accept HTTP request and response metadata. This was a fully backwards-compatible change.

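One common way to make this kind of interface extension backwards compatible in Java is a default method. The sketch below shows the idea only. All type and method names (Metric, HttpMetadata, MetricsSink, and the two sinks) are hypothetical stand-ins, not Apiman's real types or signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not Apiman's real types or signatures.
record Metric(String apiId, int responseCode) {}
record HttpMetadata(Map<String, String> requestHeaders, Map<String, String> responseHeaders) {}

interface MetricsSink {
    // The method that existing implementations already provide.
    void record(Metric metric);

    // New optional overload: the default ignores the metadata and delegates,
    // so implementations written before the change compile and behave as before.
    default void record(Metric metric, HttpMetadata metadata) {
        this.record(metric);
    }
}

// A pre-existing implementation: unaware of metadata, needs no changes.
final class LegacySink implements MetricsSink {
    final List<Metric> seen = new ArrayList<>();

    @Override
    public void record(Metric metric) {
        seen.add(metric);
    }
}

// A newer implementation that opts in to the metadata.
final class HeaderAwareSink implements MetricsSink {
    final List<String> correlationIds = new ArrayList<>();

    @Override
    public void record(Metric metric) {
        // No metadata is available on this path, so there is nothing to capture.
    }

    @Override
    public void record(Metric metric, HttpMetadata metadata) {
        correlationIds.add(metadata.requestHeaders().get("X-Correlation-Id"));
    }
}
```

A caller can always invoke the two-argument method: older implementations fall through to their existing behaviour, while newer ones can read the headers.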

Header and query parameter capture

We added a range of new options to apiman.properties for capturing headers and query parameters, accepting lists of regular expressions for both.

For example:

During initialisation of the metrics plugin we add two new tasks:

Apiman builds an Elasticsearch dynamic template, which allows Elasticsearch to apply type mappings via a naming convention (ensuring that Elasticsearch does not guess the wrong types and that the correct indexing settings are applied).
We build and compile optimised regular expressions, allowing rapid identification of which headers to capture.

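The two initialisation tasks can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code that was shipped: the class name, the template name, and the "headers.*" field path are hypothetical, although dynamic_templates, match_mapping_type, and path_match are standard Elasticsearch mapping syntax.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class HeaderCaptureInit {
    // Task 1 (sketch): a dynamic template that maps every string found under
    // the assumed "headers.*" path as a keyword, instead of letting Elasticsearch guess.
    static final String DYNAMIC_TEMPLATE = """
            {
              "dynamic_templates": [
                {
                  "captured_headers_as_keyword": {
                    "match_mapping_type": "string",
                    "path_match": "headers.*",
                    "mapping": { "type": "keyword" }
                  }
                }
              ]
            }
            """;

    // Task 2 (sketch): combine every configured expression into one
    // case-insensitive alternation, compiled once, so that request processing
    // performs a single match per header name.
    static Pattern compile(List<String> expressions) {
        String alternation = expressions.stream()
                .map(expression -> "(?:" + expression + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }
}
```

For instance, compiling the list X-Correlation-Id and service-.* yields one pattern that fully matches "x-correlation-id" and "SERVICE-region" but not "Accept".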

During the processing of metrics (on a separate thread):

The header and query parameter key-value pairs are identified, extracted, and inserted into the JSON payload sent to Elasticsearch.
The structure is carefully matched to the dynamic template mentioned earlier.

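As a rough sketch of this processing step, the code below filters headers and query parameters with precompiled patterns and assembles them into a nested structure ready for JSON serialisation. The class name and the key names ("headers", "request", "response", "queryParams") are hypothetical; the real payload layout is not described in this case study.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class CapturedFields {
    // Keep only the entries whose key fully matches the precompiled pattern.
    static Map<String, String> select(Map<String, String> source, Pattern wanted) {
        Map<String, String> selected = new LinkedHashMap<>();
        source.forEach((name, value) -> {
            if (wanted.matcher(name).matches()) {
                selected.put(name, value);
            }
        });
        return selected;
    }

    // Assemble the extra section of the metrics document (hypothetical key names).
    static Map<String, Object> build(Map<String, String> requestHeaders,
                                     Map<String, String> responseHeaders,
                                     Map<String, String> queryParams,
                                     Pattern requestRule,
                                     Pattern responseRule,
                                     Pattern queryRule) {
        Map<String, Object> headers = new LinkedHashMap<>();
        headers.put("request", select(requestHeaders, requestRule));
        headers.put("response", select(responseHeaders, responseRule));

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("headers", headers);
        document.put("queryParams", select(queryParams, queryRule));
        return document;
    }
}
```

Separate rules for request and response make it possible to capture X-Correlation-Id in both directions while capturing X-App-Id in the response only.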

This is backwards compatible, even with existing indexes (although without optimised types).

Write metrics to file

We added a new write-to option that allows any combination of:

Remote: directly to Elasticsearch, which is the default behaviour.
File: to a local file via the logging framework.

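A minimal sketch of how such a combinable option could be parsed is shown below. The enum, the class name, and the comma-separated value format are assumptions made for illustration; only the "remote" and "file" targets and the remote default come from the description above.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical names: the two targets described above.
enum WriteTarget { REMOTE, FILE }

final class WriteTargets {
    // Parse a value such as "remote", "file" or "remote, file" (assumed format).
    // An absent or blank value falls back to the default, REMOTE.
    // An unknown or empty token makes valueOf throw IllegalArgumentException.
    static Set<WriteTarget> parse(String value) {
        if (value == null || value.isBlank()) {
            return EnumSet.of(WriteTarget.REMOTE);
        }
        EnumSet<WriteTarget> targets = EnumSet.noneOf(WriteTarget.class);
        for (String token : value.split(",")) {
            targets.add(WriteTarget.valueOf(token.trim().toUpperCase(Locale.ROOT)));
        }
        return targets;
    }
}
```

Representing the targets as a set keeps "both at once" a natural case rather than a special mode.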

Rather than implementing our own asynchronous file-writing code with rollover and compression, we wired the file implementation through Apiman’s logging framework. This has the secondary benefit of providing support for all common logger-related functionality such as rollover and compression.

When the metrics implementation is running with the write-to:file option, Apiman writes metric batches to a dedicated metrics logger, serialised as JSON.

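The general technique of routing JSON through a dedicated logger can be sketched with the JDK's java.util.logging, used here purely as a stand-in for Apiman's logging framework. The class names, the logger name "metrics", and the one-document-per-line layout are assumptions, not details taken from the project.

```java
import java.util.List;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Emits the raw message plus a newline, so each log line is exactly one JSON document.
final class JsonLineFormatter extends Formatter {
    @Override
    public String format(LogRecord logRecord) {
        return logRecord.getMessage() + System.lineSeparator();
    }
}

final class MetricsLogWriter {
    private final Logger logger;

    MetricsLogWriter(Handler handler) {
        handler.setFormatter(new JsonLineFormatter());
        logger = Logger.getLogger("metrics"); // hypothetical logger name
        logger.setUseParentHandlers(false);   // keep metrics out of the normal application log
        logger.setLevel(Level.INFO);
        logger.addHandler(handler);
    }

    // Each element is assumed to be an already serialised JSON document.
    void writeBatch(List<String> jsonDocuments) {
        for (String document : jsonDocuments) {
            logger.info(document);
        }
    }
}
```

Passing a java.util.logging.FileHandler configured with a size limit and file count would add rollover without any extra code in the writer, which is the benefit of delegating to a logging framework.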

We then provided the client with an optimised, non-blocking, asynchronous logging configuration. Metrics are written to a separate file from the normal application logs; by default this is metrics.log.

It was critical to implement this in a way that minimises the performance impact on Apiman's key functionality (i.e. avoids blocking the threads that perform proxying).

Outcome

After a few rounds of feedback and tweaks, both features were tested and accepted by Integrator and Telco. They have performed well in production, with no changes noticed in other aspects of system behaviour.

BPL also backported a significant number of additional features and improvements from newer versions of Apiman to the client's older version (which they cannot currently move from).

Throughout the process, BPL had regular dialogue with the client via email and video call to ensure we were meeting their needs. We met with the client's senior engineering, QE, devops, and operations teams to identify the problems, define a solution, and verify that the end product fully covered all the use cases defined.

This was another project that leveraged our deep experience of Apiman's internals, implementing changes with the minimal possible technical and business risk for our client.

These enhancements will be included in Apiman 3.1.0.Final. Metrics logger configurations are provided for both WildFly and Tomcat distributions. You can see the pull request on GitHub.