Observability mapping

Bryan Kroger
Dec 16, 2021

Fun with prometheus, k8s and nginx!

In this article I’ll be using the nginx ingress stuff. However, this also applies to istio VirtualServices ( iVS ).

First, let’s start with our environment setup. “Observability” in this case is going to mean:

  • Prometheus for TSDB
  • AlertManager for ( duh ) alerts from prom
  • Grafana for visuals and graphs

Down the road we might expand this to include other services like Kiali or possibly even an ELK stack.

I have multiple environments set up in a cross-region, cross-cloud, self-service k8s cluster rig. Users can issue merge requests to the infrastructure project defining their environment. When the MR is merged a few things happen:

  • A namespace is created with RBAC tied to the corp auth system, so that kubectl commands authenticate with domain username/password
  • GitHub runners are created with appropriate access to the namespace
  • Observability stack is rolled out
  • Everything is tagged with billing codes for cost analysis later on

The Observability stack creates Prometheus and AlertManager objects.
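The manifests themselves are not reproduced here, so as a rough sketch only: assuming the Prometheus Operator CRDs are installed, the two objects might look something like this. The names, the `team-a` namespace, the service account and the replica counts are all hypothetical placeholders, not the actual rollout.

```yaml
# Sketch only: every name and the namespace below are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prom
  namespace: team-a                  # hypothetical per-environment namespace
spec:
  replicas: 1
  serviceAccountName: prometheus     # assumed to exist with scrape permissions
  serviceMonitorSelector: {}         # pick up every ServiceMonitor in this namespace
  alerting:
    alertmanagers:
      - namespace: team-a
        name: alertmanager-operated  # service the operator creates for Alertmanager
        port: web
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: team-a
spec:
  replicas: 1
```

Note that nothing in this sketch tells either application which URL path it will be served under.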

Now we’ll need the ingress object to tell nginx on our frontend to route to these services.
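The ingress manifest is not reproduced here either. A minimal sketch of the kind of object being described, assuming the ingress-nginx controller and the `networking.k8s.io/v1` API, could look like the following. The service names and ports are assumptions (the `*-operated` services are what the Prometheus Operator typically creates; the Grafana service name and port are made up for illustration).

```yaml
# Sketch only: object name, namespace, service names and ports are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: observability
  namespace: team-a
spec:
  ingressClassName: nginx
  rules:
    - host: ops.fqdn
      http:
        paths:
          - path: /prom              # forwarded upstream unchanged, as /prom/...
            pathType: Prefix
            backend:
              service:
                name: prometheus-operated
                port:
                  number: 9090
          - path: /alerts
            pathType: Prefix
            backend:
              service:
                name: alertmanager-operated
                port:
                  number: 9093
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000       # assumed Grafana service port
```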

This should give us 3 endpoints:

  • ops.fqdn/prom which goes to the prom backend service
  • ops.fqdn/alerts which goes to the alertmanager backend service
  • ops.fqdn/grafana which, obviously, goes to the grafana backend service

Pretty straightforward, right? Most people would expect all of this to work right off the bat.

And here is why this is the most perfect interview question. Based on this setup, none of this actually works. At least not the way I have it set up.

If you curl ops.fqdn/prom you’ll get a 302 back to “ops.fqdn/”. That’s because the prometheus software is redirecting you to what it thinks is the “main” or “root” page of the application, which is “/”.

However, at no point did I define anything to live at “/” in this setup. In fact, nothing lives at “/”. Which is the problem.

Now, most people, most of the time will suggest that the solution here is to use a rewrite rule. Most people, most of the time don’t know what a rewrite rule actually does, but they think they do.

A rewrite rule is effectively doing the exact same thing that prometheus is doing here.

Given the input “/prom” rewrite the URI to “/”. Again, there’s nothing actually living at “/” and we want “/prom” to be where prometheus lives. Most people don’t know why this is a problem because they seem to assume that somehow, magically the reverse proxy will know what the user is trying to do, but how could it possibly know that “/” actually means “/prom”?
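To make that concrete, here is roughly what the usual suggestion looks like with ingress-nginx, using its `rewrite-target` annotation. This is a sketch of the approach being argued against, not something from my setup, and the object name, namespace, service name and port are hypothetical.

```yaml
# Sketch of the rewrite-rule approach; names and ports are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-rewrite
  namespace: team-a
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    # $2 is the second capture group of the path below, so
    # /prom/graph is sent upstream as /graph and /prom as /.
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: ops.fqdn
      http:
        paths:
          - path: /prom(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: prometheus-operated
                port:
                  number: 9090
```

The rewrite only changes the request on the way in. Prometheus still believes it lives at “/”, so any redirect or link it sends back is rooted at “/”, and no path on this ingress matches that.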

The only way to solve this is to teach the backend services to “know” about the reverse proxy environment.

Here’s how we do that; it’s actually super easy. For prometheus we do this:

externalUrl: https://ops.fqdn/prom/

Same thing for alert manager, and for grafana we use this:

grafana.ini:
  server:
    domain: ops.fqdn
    root_url: "%(protocol)s://%(domain)s/grafana"
    serve_from_sub_path: true
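For comparison with the Grafana values, here is a sketch of where the Prometheus and AlertManager setting sits, assuming the Prometheus Operator CRDs and the `ops.fqdn` placeholder. The object names and namespace are hypothetical. `routePrefix` is an assumption beyond the one-line setting shown earlier: it is a real field on both specs, and whether you need it depends on whether your ingress forwards the path unmodified.

```yaml
# Sketch only: names and namespace are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prom
  namespace: team-a
spec:
  # Public URL the app uses when building redirects and links.
  externalUrl: https://ops.fqdn/prom/
  # Assumption, not in the original: register the HTTP handlers under /prom,
  # matching the unmodified path the ingress forwards.
  routePrefix: /prom
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: team-a
spec:
  externalUrl: https://ops.fqdn/alerts/
  routePrefix: /alerts               # same assumption as above
```

With the applications aware of their own prefix, redirects stay under the path the ingress actually routes, and no rewrite rule is needed.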

Each application has its own language for solving this problem because it’s a known pattern that people use all the time.

The perfect interview question

Interview questions suck from both ends of the desk. Most of the time what I’m “looking” for is to gauge how a person can noodle through problems.

Like, if we hire this person and I have to explain how the internet works, okay, fine, that’s part of the job, but it’s more fun when we can just skip all that and go right to building amazing things together.

To me, the perfect interview question is a combination of two critical things:

  1. It’s foundational, like it applies to a wide range of things and it’s along the lines of “how the internet works.” However, it is also applicable to everyday debugging, or life in general in a DevOps/SRE role.
  2. It’s simple, like deceptively simple, but still complicated.

What drives me nuts is when people ask questions ( especially about k8s ) in obscure and sort of obfuscated ways, then act like they caught you in their devious web of trickery all to prove that you’re not worthy of joining their elite ranks of dip shits ( looking at you BeachBody ).

Who wants to work with people like that? I want something that is straightforward, easy enough to understand, but allows us to have a good conversation about something that is complicated.

Also, I use this pattern on 10 namespace environments now and more to come, so I know this works!
