December 22nd, 2015 by Jonas van Vliet

Microservices have become very popular in recent years. Cross-cutting concerns still apply to microservices, and in this blog post we will briefly look at the security aspect. We will create a very simple proof of concept that secures a webpage and a REST microservice behind it.

But first, let's look at the problem. In a Java monolith running on an application server, REST endpoints/resources are protected by filters that require authentication and authorisation. These filters often use a user database representation that is part of the application server. The application server might perform lookups in AD or in its own user store. Often these filters are bundled with the application server and work for that specific server, but not on application servers made by other vendors.

When splitting a part of a monolith into a microservice, the user database representation has to be split as well. If the application server hosts the users, this requires syncing users/logins between the monolith and the microservice. If the microservice runs on a different kind of server than the monolith, this becomes hard to manage.

When the number of microservices increases, storing/syncing the state of users and logins across multiple servers becomes harder, so a different approach is necessary. This is where Keycloak enters the stage.

Keycloak is an off-the-shelf solution for authentication, authorization and managing users. Keycloak is responsible for storing the users (or integrating with AD) and providing access tokens to clients on behalf of users that have successfully logged in.
A Keycloak token can contain some fields with user data (first name, last name, email etc.) and the roles the user has. When a service gets such a token, it can verify that the token was issued by Keycloak and that it contains a certain role. It can do this without contacting the Keycloak server, since the token is signed and the signature can be checked against the realm's public key.
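Decoded, the payload of such a token is a JSON structure along these lines (the claim values are illustrative, and the exact set of claims depends on the Keycloak configuration):

```json
{
  "exp": 1450800000,
  "iss": "http://localhost:8081/auth/realms/hellorealm",
  "preferred_username": "john",
  "given_name": "John",
  "family_name": "Doe",
  "email": "john@example.com",
  "realm_access": { "roles": ["user"] },
  "resource_access": { "helloclient": { "roles": ["user"] } }
}
```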
Keycloak is bundled with adapters that make it very easy to use with a lot of different application servers, Spring Boot and Node.js.

In this POC we use the following components:

  1. Keycloak server
  2. Dropwizard microservice containing a REST resource
  3. Node.js server that serves static html/javascript and functions as gateway to the REST service

The goal is to make the REST resource inaccessible unless the user has logged in through Keycloak. The flow we would like to have is:

  1. The user goes to the webpage
  2. The user is redirected to the Keycloak server to log in
  3. Keycloak redirects the user back to the node app after logging in, providing the user with a token that contains a Keycloak login
  4. The node server appends the token to the original request to the REST resource and sends it. The REST service checks that the token is valid and grants access to the REST resource.

[Image: keycloak-overview]

To achieve this, we need to configure the components as follows.

Keycloak

From the keycloak documentation:

The core concept in Keycloak is a Realm. A realm secures and manages security metadata for a set of users and registered clients. Users can be created within a specific realm within the Administration console. Roles (permission types) can be defined at the realm level and you can also set up user role mappings to assign these permissions to specific users.
A client is a service that is secured by a realm. You will often use Client for every Application secured by Keycloak. When a user browses an application’s web site, the application can redirect the user agent to the Keycloak Server and request a login. Once a user is logged in, they can visit any other client (application) managed by the realm and not have to re-enter credentials. This also holds true for logging out. Roles can also be defined at the client level and assigned to specific users. Depending on the client type, you may also be able to view and manage user sessions from the administration console.


In this POC we create a realm called “hellorealm” and a client called “helloclient” using the Keycloak admin console. We download the client’s configuration json file (keycloak.json) to use in our node app and dropwizard service. This file contains the realm’s public key (and the client secret), which is used to verify that tokens received by node and Dropwizard were issued by Keycloak.
Keycloak has a lot of configuration options. We can add a user registration flow, set password policies, define the valid redirect URLs that Keycloak will provide access tokens to, etc. For now, we create a user “john” (full name: John Doe) and keep the configuration as is.

Node

The node server uses express to map URLs to static files and resource endpoints. The following snippet shows how to require a Keycloak login and fetch the result from a different host – in this case the dropwizard microservice. The keycloak variable refers to a Node.js adapter provided by Keycloak that requires very little configuration. The node app requires the keycloak.json file.
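The snippet itself is not reproduced here; with the Keycloak Node.js adapter (published as keycloak-connect in current versions) it looks roughly like this. The route paths, session secret and static directory are assumptions:

```javascript
var express = require('express');
var session = require('express-session');
var Keycloak = require('keycloak-connect');

var app = express();
var memoryStore = new session.MemoryStore();

// The adapter reads keycloak.json from the application's working directory
var keycloak = new Keycloak({ store: memoryStore });

app.use(session({
  secret: 'replace-me',          // assumption: any session secret will do
  resave: false,
  saveUninitialized: true,
  store: memoryStore
}));
app.use(keycloak.middleware());
app.use(express.static('public')); // the static html/javascript

// keycloak.protect() redirects to the Keycloak login page when there is
// no active login; the handler body only runs for authenticated users.
app.get('/api/hello', keycloak.protect(), function (req, res) {
  apiProxy.web(req, res); // apiProxy forwards the call to the dropwizard service
});
```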


When a user accesses a protected URL without an active login, the user is redirected to the Keycloak login page. The body of this code is not run unless the user is logged in.
The apiProxy refers to an http-proxy that is used to proxy requests arriving at the node app on towards a microservice. The proxy must be configured to include the token from the browser’s session with node. Otherwise the dropwizard service will conclude that the user is not logged in and deny access.
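A sketch of that proxy configuration with the node-http-proxy package, assuming the dropwizard service listens on localhost:8080:

```javascript
var httpProxy = require('http-proxy');

var apiProxy = httpProxy.createProxyServer({
  target: 'http://localhost:8080' // the dropwizard service -- an assumption
});

// Attach the bearer token from the Keycloak session before the request is
// proxied; without it, the dropwizard service answers 401.
apiProxy.on('proxyReq', function (proxyReq, req) {
  if (req.kauth && req.kauth.grant) {
    proxyReq.setHeader('Authorization',
        'Bearer ' + req.kauth.grant.access_token.token);
  }
});
```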


The URL mapping is fixed to point toward the dropwizard service. This is acceptable in a POC. In a real-life setup with multiple services, this should be replaced by a service discovery/lookup mechanism.

Dropwizard

Unfortunately, Keycloak is not bundled with a dropwizard adapter. The dropwizard service needs to block requests that do not include a bearer token. We could use the following plugin, but to keep things as simple as possible, we implement the bare minimum and only depend on the official Keycloak libraries. The implementation is heavily inspired by this plugin.
To secure our REST endpoint, we add the following Maven dependencies to our pom:
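The dependency list is not shown above; based on the "official Keycloak libraries" mentioned, it would look roughly like this (the version is illustrative and should match your Keycloak server):

```xml
<!-- Versions are illustrative; use the Keycloak version matching your server -->
<dependency>
    <groupId>org.keycloak</groupId>
    <artifactId>keycloak-core</artifactId>
    <version>1.7.0.Final</version>
</dependency>
<dependency>
    <groupId>org.keycloak</groupId>
    <artifactId>keycloak-adapter-core</artifactId>
    <version>1.7.0.Final</version>
</dependency>
```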

To block access to the REST resource we add the following filter to the dropwizard configuration:
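The filter code is not reproduced above. A minimal sketch using a Jersey ContainerRequestFilter and Keycloak's token verifier could look like the following; the class name, the realm URL and the way the public key is obtained are assumptions, and the package locations of the Keycloak classes vary between versions:

```java
import java.io.IOException;
import java.security.PublicKey;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.Response;

import org.keycloak.RSATokenVerifier;
import org.keycloak.common.VerificationException;

public class BearerTokenFilter implements ContainerRequestFilter {

    private final PublicKey realmPublicKey; // "realm-public-key" from keycloak.json
    private final String realmUrl;          // e.g. http://keycloak:8080/auth/realms/hellorealm

    public BearerTokenFilter(PublicKey realmPublicKey, String realmUrl) {
        this.realmPublicKey = realmPublicKey;
        this.realmUrl = realmUrl;
    }

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String header = requestContext.getHeaderString(HttpHeaders.AUTHORIZATION);
        if (header == null || !header.startsWith("Bearer ")) {
            requestContext.abortWith(Response.status(Response.Status.UNAUTHORIZED).build());
            return;
        }
        try {
            // Verifies the token signature against the realm's public key --
            // no round trip to the Keycloak server is required.
            RSATokenVerifier.verifyToken(
                    header.substring("Bearer ".length()), realmPublicKey, realmUrl);
        } catch (VerificationException e) {
            requestContext.abortWith(Response.status(Response.Status.UNAUTHORIZED).build());
        }
    }
}
```

It would be registered with Dropwizard's Jersey environment in the application's run method, e.g. `environment.jersey().register(new BearerTokenFilter(realmPublicKey, realmUrl));`.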

The filter blocks all requests to rest services that do not contain a valid bearer token.

The token contains information about the user and this information should be available to the service in order to save investigations, log who requested a resource etc. To access this, we need to register Keycloak as the security constraint authenticator:
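The registration code is not shown above. With the Keycloak Jetty adapter (Dropwizard runs on embedded Jetty) it would look roughly like the sketch below; the adapter artifact must match the embedded Jetty version, and the exact setup calls are assumptions based on the Jetty adapter's API:

```java
import org.eclipse.jetty.security.ConstraintSecurityHandler;
import org.keycloak.adapters.jetty.KeycloakJettyAuthenticator;

// Inside the Dropwizard Application#run method:
ConstraintSecurityHandler securityHandler = new ConstraintSecurityHandler();
// The Keycloak authenticator validates incoming bearer tokens and exposes
// a KeycloakPrincipal to the JAX-RS resources via the SecurityContext.
securityHandler.setAuthenticator(new KeycloakJettyAuthenticator());
environment.getApplicationContext().setSecurityHandler(securityHandler);
```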

We can then access the data in the token in our REST endpoint:
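A sketch of such an endpoint; the resource path and greeting are illustrative, but the KeycloakPrincipal cast reflects how the adapter exposes the token:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.SecurityContext;

import org.keycloak.KeycloakPrincipal;
import org.keycloak.KeycloakSecurityContext;
import org.keycloak.representations.AccessToken;

@Path("/hello")
public class HelloResource {

    @GET
    public String hello(@Context SecurityContext securityContext) {
        // The principal set by the Keycloak authenticator wraps the parsed token
        @SuppressWarnings("unchecked")
        KeycloakPrincipal<KeycloakSecurityContext> principal =
                (KeycloakPrincipal<KeycloakSecurityContext>) securityContext.getUserPrincipal();
        AccessToken token = principal.getKeycloakSecurityContext().getToken();
        return "Hello " + token.getName(); // full name claim, e.g. "John Doe"
    }
}
```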

As previously mentioned, the bearer token also contains the user’s roles. These roles can be used to only allow or deny access.
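For example, a realm-role check against the token could be sketched as follows ("admin" being just an example role name):

```java
// A helper inside the resource class; throws 403 unless the token carries
// the given realm role.
private void requireRealmRole(AccessToken token, String role) {
    AccessToken.Access realmAccess = token.getRealmAccess();
    if (realmAccess == null || !realmAccess.getRoles().contains(role)) {
        throw new WebApplicationException(Response.Status.FORBIDDEN);
    }
}
```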

Working with roles and token properties like this is not pretty – injecting a SecurityContext and casting objects is a bad smell. The plugin previously mentioned provides a nicer interface for these operations.

Trying it out

When we spin everything up and access the REST service path on the node gateway, we are redirected to Keycloak.

[Image: keycloak-login]

According to the documentation, the login page is configurable. You can style the page and include your company logo/product name.
When logged in, the service responds with:

[Image: service-response]

If we try to access the REST resource without going through the node gateway, we get an HTTP 401. This confirms that our service cannot be accessed without valid user credentials.


Final remarks

Security is hard, so avoid reinventing the wheel. Figuring out how to store user passwords securely in a database takes longer than implementing a security setup using Keycloak.
Keycloak is an active open source project. As long as it is maintained, I feel confident that it provides better security than a homemade implementation. The documentation even contains a section on possible security risks and how to mitigate them.

If you want to secure a single server with Keycloak and an adapter is available, then Keycloak is very easy to set up. If you want to do something a little more complex, for instance sending user credentials from one server to another, you will need a basic understanding of how the tokens issued by Keycloak work. To this end, I highly recommend reading the OAuth 2.0 spec.



September 1st, 2015 by Steen Gundersborg

I’ve been experimenting with how we can beef up our monitoring of everything Java. For this post I will only discuss the collection, storage and visualisation of various metrics. Alerting, trending and whatever else is covered by the monitoring definition I will blatantly ignore.

Setting the stage

I have a list of criteria that is going to drive my selection of components. The more off-the-shelf components I can configure and use, the better. Moreover, since we’re trying out Docker, I’d like to wire everything up using Docker Compose. So if there’s already a Docker image for a specific component, terrific!

More or less everything we develop or use runs Java. We develop a lot of micro-services using DropWizard and use lots of open source components (Solr, HBase, WildFly etc.), so being able to pull metrics from JMX is a must. Support for pulling or accepting pushed OS metrics in some way is also a must. If the components we create ourselves can push metrics directly, it’s an added bonus but otherwise not a dealbreaker, since JMX will be the fallback.

An important criterion for me is that metrics should not need to be predefined before the storage backend will accept and store them. Some of our existing solutions use Zabbix, where this is a requirement. That might be fine for a single, isolated system, but it becomes a nuisance if the monitoring solution is shared across systems. More importantly, adding new metrics should be a breeze. Adding new metrics for all your new service REST endpoints alone is going to be annoying and error-prone if you need to define them up front.

Since we are on the subject of Zabbix, it would be nice if the frontend supports multiple data sources and Zabbix, so we can visualise metrics from existing systems in the same frontend.

Finally, it is imperative that we be able to import and export metric data.

With the above in mind, let’s sum up the criteria:

  1. Off-the-shelf components, where possible
  2. Everything running as Docker containers, orchestrated with Docker Compose
  3. Must support pulling and storing JMX metrics
    1. Supporting metrics pushed from components is added bonus
  4. Predefining metrics must not be a requirement for accepting/storing them
  5. Bonus if the GUI supports multiple datasources, including Zabbix
  6. Support for import/export of data

Selecting the pieces

Conceptually, we need three components:

  1. Timeseries database
  2. JMX metrics collector
  3. Frontend

First of all, we need a timeseries database. Second, we need something, in general, to collect metrics from JMX and push them to the database. Third and finally, we need a frontend, whose only responsibility is visualising the collected data.

Now, let’s have a look at the chosen components.

Frontend: Grafana

Grafana is a beautiful, multi-datasource graphing frontend for visualising data. There is full support for Graphite, OpenTSDB and InfluxDB, as well as initial support for KairosDB as datasources. There’s even a query editor for each data source. With a plugin, Zabbix is also supported. It has rich graphing, templated dashboards and queries as well as annotation support. There’s also a docker image for it.

Timeseries database: KairosDB + Cassandra

I have previously tried InfluxDB in the 0.8.8 version (also using a docker image), but I ended up having some issues with it: in case of an unclean process exit, I couldn’t start it up again unless I wiped the data. The frequency of the unclean exits was a side effect of the image I used and not really InfluxDB per se, but the issue would be the same in a process, OS or host crash. Moreover, as our data volume grew, the database would sometimes stop responding. The only fix seemed to be to restart it (and wipe the data, as per the first issue).

I am aware that InfluxDB is out in a version 0.9.x, which is quite a change from 0.8.x. There is a new storage backend (BoltDB) and support for importing/exporting data. Clustering support has gotten a major rework, but is still in alpha. I don’t think we really need clustering performance wise, but it would be nice for high availability.

On the flip side, there are a lot of breaking changes from version 0.8.x to 0.9.x. So sticking to InfluxDB still means all 0.8.x dashes would need to be re-done. Based on that, I thought I’d look around to see what else is available.

There’s a fairly new kid on the block called KairosDB, which uses Cassandra as the backend storage. Some people are running quite large clusters with terabytes of data, as this example shows, so it’s more than sufficient for our needs. It supports the OpenTSDB telnet format, REST, Graphite and Carbon (using a plugin). So it has support for most of the currently used protocols for sending metrics. That, and it supports import/export.

JMX metrics collector

There is a tool called jmxtrans, which is multithreaded and can poll any number of java processes for JMX data. It has output writers for almost anything (OpenTSDB, Graphite, Ganglia, ElasticSearch, Kafka, RDDTool, StatsD etc.). In short, it’s very flexible and will fit well for this use case.

Gluing it all together

I found most of the above components as docker images on docker hub. Without further ado, let’s have a look at docker-compose.yml:
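The original file is not shown here; a compose v1 sketch matching the setup described below could look like this. Image names, build paths and port mappings are assumptions:

```yaml
cassandra:
  image: spotify/cassandra
  ports:
    - "7199:7199"   # JMX

kairosdb:
  build: ./kairosdb   # built locally, see below
  links:
    - cassandra
  ports:
    - "4242:4242"   # telnet protocol
    - "8080:8080"   # REST / query API

jmxtrans:
  image: jmxtrans/jmxtrans
  volumes:
    - ./jmxtrans-config:/var/lib/jmxtrans   # json query definitions
  links:
    - kairosdb

grafana:
  image: grafana/grafana
  ports:
    - "3000:3000"
  links:
    - kairosdb
```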

Just a comment: Since I fixed the ports, to get going as fast as possible, it’s quite obvious that the scale command won’t work. But for now, it doesn’t need to.

I chose to run a single-node Cassandra, tuned for quick container startup by the guys at Spotify. I mapped the JMX port and exposed the telnet protocol port 4242. That’s the one both Grafana and KairosDB are going to use. By the way, at least on a Mac, using a volume doesn’t seem to work, so I just commented that out for now. In a production setting, that would be a no-go.

KairosDB is not on docker hub, but there are several git repos. I chose the one from Mesosphere and built it locally. I did change the base image to my liking, since I had to build it anyway. I fixed another issue related to using the image with docker compose, but I’ll get to that later.

Jmxtrans is off-the-shelf, and the same goes for Grafana.

Gathering metrics

I created a json file for Jmxtrans, specifying how to pull metrics from ActiveMQ:
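The file itself is not reproduced above; a sketch in jmxtrans's json format, using the OpenTSDBWriter discussed below, might look like this. The host names, the JMX port, the attribute list and the KairosDB target are assumptions:

```json
{
  "servers": [{
    "host": "activemq-host",
    "port": "1099",
    "queries": [{
      "obj": "org.apache.activemq:type=Broker,brokerName=*,destinationType=Queue,destinationName=*",
      "attr": ["QueueSize", "EnqueueCount", "DequeueCount"],
      "outputWriters": [{
        "@class": "com.googlecode.jmxtrans.model.output.OpenTSDBWriter",
        "settings": {
          "host": "kairosdb",
          "port": 4242,
          "typeNames": ["destinationName"]
        }
      }]
    }]
  }]
}
```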

I won’t go into details about the config. You can read the docs about which options are available on the OpenTSDBWriter, which is responsible for writing to KairosDB. How to do queries is documented here. I will say that using wildcards in the object name, combined with the typeNames parameter, is going to reduce the number of queries specified a lot, as well as make them more generic.

One thing I did do when experimenting is hardcode the ActiveMQ host and port. For production, I would need to change the image to run something like envsubst on the configuration files, to get environment variables replaced with whatever is specified in the docker-compose.yml file.

With regards to our DropWizard services, I used the Metrics KairosDB plugin and added this to our DropWizard service startup:
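The startup code is not shown above. The sketch below follows the standard Dropwizard Metrics reporter convention (forRegistry/build/start); the exact class and builder names depend on the plugin version and should be checked against its README, and the host and port are assumptions:

```java
// Names follow the common Metrics reporter pattern -- an assumption here;
// consult the KairosDB reporter plugin's documentation for the exact API.
final KairosDbReporter reporter = KairosDbReporter
        .forRegistry(environment.metrics())
        .prefixedWith("my-service")          // illustrative metric prefix
        .build("kairosdb-host", 4242);       // assumed KairosDB host and port
reporter.start(10, TimeUnit.SECONDS);        // report every 10 seconds
```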

The host and port of the KairosDB server should be taken from configuration, but you get the idea.

Graphing

To be honest, I have been and still am playing around with how best to create graphs. Ideally, I’d like to use the otherwise awesome templating feature with KairosDB, but I can’t quite get it to work like I want it to – probably because support for KairosDB is initial. That, or I’m missing something. Regular graphs do work as intended though. I have been trying to recreate this homegrown HBase dashboard, based on InfluxDB 0.8.8:

[Image: HBase dashboard]

So far, the only thing I am not able to do in the same way is the GC graphs. The InfluxDB dash uses regular expressions to match the GC algorithm part in the name of the metric, which KairosDB doesn’t support. I can hardcode the GC algorithm names, but it would require us to change our dash if we later choose something other than CMS. One possible solution would be to change the metric to exclude the GC algorithm name from the metric name and add it as a tag instead. It’s just somewhat more work, compared to what comes out of the box with DropWizard.
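In KairosDB's telnet protocol, such a restructured metric would be reported along these lines (metric name, timestamp, value and tags are all illustrative):

```
put jvm.gc.collection_time 1441065600000 1234 host=hbase01 gc=ConcurrentMarkSweep
```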

Closing comments

For now, I think I’ve reached a point where creating a complete graphing solution by using existing components and supplying configuration seems doable. However, using and composing the docker images from docker hub and github turned out to be more of a nuisance than expected. So was trying to get templating to work in the new KairosDB query editor in Grafana, while the editor is still kind of beta. When something doesn’t work, it’s not always obvious if it is a bug or a usage error.

I have a few pointers on how to structure metrics, in order for you to get the most out of Grafana and KairosDB, as well as some thoughts on the docker part. The latter is not specific to my use case, so feel free to skip it at your leisure.

Thoughts about how to structure metrics

To be honest, I still haven’t fully decided how I’d like to structure the naming of the metrics we have, in order to support the most likely graphing requirements, both while running multiple instances of the same components as well as different components. Experimenting with the current KairosDB support in Grafana has led me to a single mantra: If you want more than one series per graph, manually enter each one or use tags and group by them.

Templating only works if you select one value per variable. Although this might be good enough for some use cases, it really does cripple the feature, in my opinion. Otherwise, if you just want to add series with fixed metric names, you’re good.

So, basically I’m thinking about adding tags for things like hostname, component type (service, database etc.), component name and whatever else I can think of. There is some talk about KairosDB getting support for selecting multiple series by regular expression, but I’m not sure if and when it’ll be released. If it does, then you could achieve the same by having the tags as part of the metric name.

One thing to consider, if you are pondering to use tags like crazy, is that having a lot of tag-value combinations might hurt performance. Depending on how many metrics you plan on storing, you may want to think about it.

Annoyances with the docker images used

My gripes fall into two categories, which are not really related to docker itself as much as the images shared on docker hub:

  1. The container doesn’t handle dependent resources not being available at startup
  2. The container doesn’t shut down properly

Now, before I discuss my gripes, let me be clear in saying that I am not complaining about people sharing their hard work for free. The issues above might be completely irrelevant to the use cases of the original creators. It’s just to say that before a lot of the images can be used in a production setting, they would need to be modified.

Now, the first issue affected both the Jmxtrans and the KairosDB images I used. If Cassandra wasn’t alive and kicking when KairosDB needed it, KairosDB would fail and exit. Jmxtrans would fail if KairosDB wasn’t available when jmxtrans was starting.

Normally, if containers are spun up one at a time, the problem doesn’t appear. But when using docker compose, all containers get fired up at the same time. Really, this is about the individual containers not being sufficiently robust, since things like network errors should always be anticipated (also on startup!). It’s surprising that this seems to be such a common issue. Most docker developers I know that use docker compose have seen it with one image or another.
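One pragmatic mitigation is to make a container's entrypoint wait for its dependency before starting the main process – sketched here for the KairosDB image, where the host, port and paths are assumptions:

```shell
#!/bin/sh
# Block until Cassandra accepts connections on its Thrift port, then start
# KairosDB. `nc -z` only probes the port without sending data.
until nc -z cassandra 9160; do
  echo "waiting for cassandra..."
  sleep 2
done
exec /opt/kairosdb/bin/kairosdb.sh run
```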

The second issue is the one about stopping containers gracefully. Only PID 1 in the running container receives signals, so you need to make sure that either the main process is running as PID 1 or the actual PID 1 (typically sh or bash) forwards signals to the main process. This is not a new issue, but there are still lots of images on docker hub that don’t allow stopping the container gracefully (thereby killing everything in the container after a timeout).
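The usual fix is to `exec` the main process from the entrypoint script, so it replaces the shell as PID 1 and receives the SIGTERM sent by `docker stop` – sketched here for a java-based image, with the paths being assumptions:

```shell
#!/bin/sh
# Without `exec`, this shell stays PID 1 and swallows signals; with it,
# the java process becomes PID 1 and can shut down gracefully.
exec java -jar /opt/service/service.jar server /opt/service/config.yml
```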

