Getting Envoy to pick up rotated certificates
Like many people, I use cert-manager
to automatically renew my website’s TLS certificates with
Let’s Encrypt. Unlike many people, I don’t use an
Ingress controller to get traffic into my cluster; I just have a few instances
of Envoy that terminate TLS and route traffic to the
appropriate backend. Cert-manager handles the mechanics of certificate renewal
very efficiently; it runs a controller loop that checks all my Certificate
objects for expiration, and when a certificate is close to expiring, it goes out
and renews it. It then updates a Kubernetes Secret with the new key material,
and Kubernetes then makes that new data available to pods that have mounted the
Secret as a volume. From there, it’s up to the application to notice that some
symlinks have moved around and reload the certificate. Up until very recently,
Envoy did not bother to check. So at some point, you had to do a rolling restart
of the Envoy deployment to pick up the new certificate. Because there are 30 days
between when cert-manager renews the certificate and when the old certificate
actually expires, this was rarely a problem in practice. If any of your machines
went down, or you edited the config file to add a new route, or you upgraded
Envoy itself, the pod containing Envoy would restart, pick up the new
certificates, and you’d never notice that it wasn’t automatically reloading the
certificate.
As someone who doesn’t like to leave important production operations to chance, though, I knew I needed a better system. Fortunately, Envoy added a way to reload certificates with the 1.14 release. Let’s try that.
The mechanics⌗
Everyone’s Envoy configuration is different, so I’m just going to provide a very
minimal envoy.yaml
that we’ll modify to make certs automatically reload. You
can then apply this to your own configuration. (You can experiment with this on
your workstation by building Envoy, or extracting the binary from a Docker image
with docker cp. That's what I do for all my local Envoy work. Although Envoy
does not distribute binaries, the binary from the Docker image works great on my
Ubuntu 19.10 workstation.)
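For reference, pulling the binary out of the official image looks something like
this; the tag is just an example, and I'm assuming the binary lives at
/usr/local/bin/envoy as it does in the official images:
# Create a container without running it, copy the binary out, then clean up.
docker create --name envoy-extract envoyproxy/envoy:v1.14.1
docker cp envoy-extract:/usr/local/bin/envoy ./envoy
docker rm envoy-extract
./envoy --version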
Here is a basic envoy.yaml
that serves HTTPS on port 10000 with a static
response:
static_resources:
  listeners:
  - name: test
    address:
      socket_address:
        protocol: TCP
        address: 127.0.0.1
        port_value: 10000
    listener_filters:
    - name: "envoy.listener.tls_inspector"
      typed_config: {}
    filter_chains:
    - tls_context:
        common_tls_context:
          alpn_protocols: ["h2", "http/1.1"]
          tls_certificates:
          - certificate_chain:
              filename: "/certs/tls.crt"
            private_key:
              filename: "/certs/tls.key"
      filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: test
          route_config:
            virtual_hosts:
            - name: test
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body:
                    inline_string: "Hello from Envoy"
          http_filters:
          - name: envoy.router
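If you're following along on a workstation and don't have a real certificate
handy, a throwaway self-signed pair is enough for this experiment (the subject
is arbitrary):
# Generate a self-signed key and certificate for local testing only.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=localhost" \
  -keyout tls.key -out tls.crt
(With a self-signed certificate, add -k to the curl command below so it skips
verification.)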
Put your TLS key and cert in /certs, and curl https://localhost:10000/ will
return “Hello from Envoy”. It works. You can do whatever you want to /certs
and Envoy will keep using the TLS configuration that it loaded at startup.
To fix that, we need to make the TLS context for our listener use SDS instead of a static configuration.
The first step is to create another config file that contains the secret
discovery information. I put my main envoy.yaml in a ConfigMap that gets
mounted into /etc/envoy, and just added a sds.yaml to that ConfigMap to store
the SDS configuration. It is simply a plaintext representation of what an SDS
API server would serve to Envoy if it were getting its configuration from an
xDS server instead of the filesystem. It looks like:
resources:
- "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"
  tls_certificate:
    certificate_chain:
      filename: "/certs/tls.crt"
    private_key:
      filename: "/certs/tls.key"
While this looks almost exactly like what we put in the main envoy.yaml before,
this is what triggers the code to start watching the relevant directories for
changes with inotify, leading to the eventual refreshing of your certificate.
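For completeness, here's roughly how those files end up in the pod in a setup
like mine. This is only a sketch; the ConfigMap and Secret names are
placeholders:
# Sketch of the relevant pod spec pieces; names are placeholders.
# Don't mount the Secret with subPath: subPath mounts never receive updates,
# so the symlink swap this whole post relies on would not happen.
volumes:
- name: envoy-config
  configMap:
    name: envoy-config   # contains envoy.yaml and sds.yaml
- name: certs
  secret:
    secretName: example-com-tls   # the Secret that cert-manager keeps renewed
containers:
- name: envoy
  image: envoyproxy/envoy:v1.14.1
  volumeMounts:
  - name: envoy-config
    mountPath: /etc/envoy
  - name: certs
    mountPath: /certs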
We also need to make some changes to envoy.yaml itself. Instead of statically
configuring the listener with a certificate, we need the listener to load the
certificate from SDS. In the listener's filter_chains section, we'll change the
tls_context to a more general transport_socket, and then point it at our
sds.yaml. (It is not necessary to convert tls_context to transport_socket, but
tls_context will be gone by the end of the year, so you might as well change it
now.)
So now instead of:
listeners:
- name: test
  ...
  filter_chains:
  - tls_context: {...}
    filters: [...]
We’ll have:
listeners:
- name: test
  ...
  filter_chains:
  - transport_socket:
      name: "envoy.transport_sockets.tls"
      typed_config:
        "@type": "type.googleapis.com/envoy.api.v2.auth.DownstreamTlsContext"
        common_tls_context:
          alpn_protocols: ["h2", "http/1.1"]
          tls_certificate_sds_secret_configs:
          - sds_config:
              path: /etc/envoy/sds.yaml
    filters: [...]
Using SDS also activates other parts of Envoy's code that want Envoy to have
some identifying information associated with the node. You can supply that on
the command line, or in the bootstrap config with a node configuration at the
top level:
node:
  id: test
  cluster: test
If you omit this, you’ll get an error like:
TlsCertificateSdsApi: node 'id' and 'cluster' are required. Set it either in 'node' config or via --service-node and --service-cluster options.
(In production, I use the pod's hostname, like envoy-b958c94b7-2fbws, for the
ID, and ingress:public:https as the cluster name. That is what my
cluster discovery service calls my cluster. It doesn't matter for this, but it
does matter for other things. You probably already have this set up.)
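If you want the pod's hostname as the ID without templating the bootstrap
config, the downward API can pass it as flags; here's a sketch of the container
spec, where the image tag and cluster name are just examples:
# Pass node identity via flags instead of the bootstrap node block.
containers:
- name: envoy
  image: envoyproxy/envoy:v1.14.1
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  args:
  - --config-path
  - /etc/envoy/envoy.yaml
  - --service-node
  - $(POD_NAME)
  - --service-cluster
  - ingress:public:https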
The result is a final envoy.yaml
that looks like:
node:
id: test
cluster: test
static_resources:
listeners:
- name: test
address:
socket_address:
protocol: TCP
address: 127.0.0.1
port_value: 10000
listener_filters:
- name: envoy.listener.tls_inspector
typed_config: {}
filter_chains:
- transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.api.v2.auth.DownstreamTlsContext
common_tls_context:
alpn_protocols: ["h2", "http/1.1"]
tls_certificate_sds_secret_configs:
sds_config:
path: /etc/envoy/sds.yaml
filters:
- name: envoy.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
stat_prefix: test
route_config:
virtual_hosts:
- name: test
domains: ["*"]
routes:
- match: { prefix: "/" }
direct_response:
status: 200
body:
inline_string: "Hello from Envoy"
http_filters:
- name: envoy.router
With that running, your certificates should be used by Envoy as soon as they are rotated!
There is a delay between a Secret being updated and the volume mount changing; the length is controlled by your cluster administrator (it's a parameter to the kubelet). If you are watching the Kubernetes event log or cert-manager's log, you might not see the new certificate as soon as you think it's ready, but it should be available on the order of 5 minutes later.
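If you want to confirm when the mounted copy actually changed, exec into the pod
and look at where the ..data symlink points (the pod name here is just an
example):
# The timestamped directory that ..data points at is created when the
# kubelet writes the updated Secret into the volume.
kubectl exec envoy-b958c94b7-2fbws -- ls -la /certs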
Envoy also prints some logs at the debug level:
[2020-04-26 18:41:04.243][23137][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:72] notification: fd: 1 mask: 80 file: ..data
[2020-04-26 18:41:04.243][23137][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:88] matched callback: directory: ..data
[2020-04-26 18:41:04.243][23137][debug][config] [source/extensions/transport_sockets/tls/ssl_socket.cc:678] Secret is updated.
[2020-04-26 18:41:04.245][23137][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:88] matched callback: directory: ..data
Be aware that debug logging is not on by default; you'll have to turn it on if
you want to watch this happen the first time. In general, the way that I check
that it worked is by looking at the /certs admin API endpoint, or at the
server.days_until_first_cert_expiring stat (which you should be feeding into
your monitoring).
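Both of those live on Envoy's admin interface, which the minimal config in this
post doesn't enable; assuming you add an admin block listening on
127.0.0.1:9901 (an arbitrary example port), the checks look like:
# Dump the loaded certificates, including their expiration information.
curl -s http://127.0.0.1:9901/certs
# Or watch the aggregate stat that should be going to your monitoring.
curl -s http://127.0.0.1:9901/stats | grep days_until_first_cert_expiring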
The details⌗
When I read the changelog for Envoy 1.14, I knew I wanted to try this feature, but I also assumed that it wouldn't be a simple cut-n-paste job to get it working. In retrospect, I was wrong; it was actually simple to get working, since it was designed to work with exactly the filesystem structure that Kubernetes uses, and I happen to deploy on Kubernetes. I wrote a little test program to try things out on my workstation before blindly forging ahead in production. That program ended up taking quite a bit of time and did not work initially.
The first version of my code assumed that the atomic updating would work like
the rest of Envoy (through its Runtime configuration) – i.e., put your
certificates in some directory, and symlink another directory (call it data) to
that. Your certs are then in /whatever/data/tls.key and /whatever/data/tls.crt,
and /whatever/data is just a symlink to /somewhere/20190401-certs. When you
want the certificates to rotate, you symlink the new directory to .tmp or
something, and then atomically replace data with .tmp: mv -Tf .tmp data.
However, Envoy does not recognize that sequence for certificates. It requires
you to do the exact dance that Kubernetes does, which involves two levels of
symlinks. If you have a volume mount /certs, then the current version of your
Kubernetes Secret is actually stored in /certs/..timestamp (where timestamp is
actually something like 2020_04_09_17_25_30.145602340). So you'll have
/certs/..timestamp/tls.key, etc., as normal files. A symlink called ..data then
points at this current ..timestamp directory. Finally, /certs/tls.key (and
friends) are symlinks to ..data/tls.key. When a data update arrives, the files
are written to a new ..timestamp directory, and the ..data symlink is
atomically replaced. This is close to what I did in the first version of my
program, but not exactly the same. As a result, Envoy did not notice any
changes my program made. I changed my program to do exactly what Kubernetes
does, and then it started working. Now that program exists so you can test
locally without having to understand the details ;)
Here is the ls output of that sort of directory structure:
# ls -laR jrock.us
jrock.us:
total 4
drwxrwxrwt 3 root root 140 Apr 9 17:25 .
drwxr-xr-x 1 root root 4096 Apr 9 17:25 ..
drwxr-xr-x 2 root root 100 Apr 9 17:25 ..2020_04_09_17_25_30.145602340
lrwxrwxrwx 1 root root 31 Apr 9 17:25 ..data -> ..2020_04_09_17_25_30.145602340
lrwxrwxrwx 1 root root 13 Apr 9 17:25 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 14 Apr 9 17:25 tls.crt -> ..data/tls.crt
lrwxrwxrwx 1 root root 14 Apr 9 17:25 tls.key -> ..data/tls.key
jrock.us/..2020_04_09_17_25_30.145602340:
total 8
drwxr-xr-x 2 root root 100 Apr 9 17:25 .
drwxrwxrwt 3 root root 140 Apr 9 17:25 ..
-rw-r--r-- 1 root root 0 Apr 9 17:25 ca.crt
-rw-r--r-- 1 root root 3558 Apr 9 17:25 tls.crt
-rw-r--r-- 1 root root 1679 Apr 9 17:25 tls.key
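My test program just reproduces that swap. If you want to try it by hand,
something like this sketch works; the paths are illustrative, and newcert/ is
wherever the renewed files happen to be:
# Mimic the kubelet: write the new files into a fresh ..<timestamp> directory,
# then atomically repoint the ..data symlink at it via rename().
cd jrock.us
new="..$(date +%Y_%m_%d_%H_%M_%S.%N)"
mkdir "$new"
cp ../newcert/tls.crt ../newcert/tls.key ../newcert/ca.crt "$new"/
ln -s "$new" ..data.tmp
mv -Tf ..data.tmp ..data
# (Optionally remove the previous ..<timestamp> directory afterwards.)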
Looking at the tests in the PR where this feature was introduced was helpful,
as was the related ticket. (It didn't make sense to me until I just
kubectl exec'd into a container and ran ls -laR on a mounted secret, though. If
you are implementing something similar, I recommend doing that. I would also
greatly appreciate a link to the code in Kubernetes that manages this. I spent
about 20 minutes looking, but couldn't find it, which annoys me.)
Conclusion⌗
In the end, this was a very simple change. Here’s all I needed to do for my own personal site: jrock.us#3d986…
Anyway, I hope this is helpful to someone. I am glad I no longer have to care about certificates, and hopefully you don’t have to either!