Look, just so we're clear:
Also, please don't blame my employer for this, and also don't blame the authors of the beautiful TLS library I vandalized, er, repurposed. This is a personal weekend project <looks at clock showing 4AM Monday 😨>.
If you're serving files from an HTTPS server, and you're really successful at doing that, you might at some point run into the issue that your HTTPS server is using too much bandwidth - either because your main server's bandwidth is kinda pricey or because you're just running out of bandwidth on your uplink. Or you might also have the issue that your users are on other continents and shipping stuff over individual TCP connections to each user is kinda inefficient? (I don't actually know how this works in practice, because I've never run a successful website. 😁)
My understanding is that there are some classic solutions to that:
fetch(). It's not a good fit for things like images because browsers want to be able to load them progressively, but the hash can only be checked once the entire resource has been loaded; the MICE proposal is intended to help with that by making it possible to continuously verify the integrity of a resource as it is being received.

The following are not goals I have in this scenario (because if I had those goals, my solution wouldn't work; otherwise I would totally have those goals):
Of course, even just pointing a domain at some CDN already lets the CDN serve whatever content it wants to over HTTPS, even if the CDN is not given a TLS certificate and key by the domain owner - because the CDN could obtain its own TLS certificate from a Certificate Authority (CA). One of the ways to prove ownership of a domain to a CA is to serve content specified by the CA over plain HTTP at a CA-specified path; and if the domain points to a CDN server, the CDN server can just serve whatever it wants via plain HTTP. Oops!
Luckily, this can nowadays be mitigated via Certificate Authority Authorization (CAA) (originally specified in RFC 6844, superseded by RFC 8659). In its minimal form, CAA makes it possible to use DNS records to ensure that only specific CAs can issue certificates for a domain. That alone isn't ideal here. A fancier feature that the Let's Encrypt CA has supported since December 2022 (woohoo! 🥳) is that when you restrict certificate issuance to the Let's Encrypt CA, you can additionally require that ownership of the domain must be proven using a specific method (such as DNS) and/or using a specific account. That's a perfect fit - we can restrict the validation method to DNS and then not worry about this attack anymore.
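As a sketch, such a DNS record might look like this (hypothetical zone snippet using the RFC 8657 parameters; the domain and account URL are made up - you'd use the URL of your own ACME account):

```
; only Let's Encrypt may issue, only via the DNS-01 method,
; and only for the specified ACME account
example.net.  IN  CAA 0 issue "letsencrypt.org; validationmethods=dns-01; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/12345678"
```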
The core idea is: We start by pointing the domain's DNS records to the CDN server, and let the CDN server proxy TLS connections from clients to the origin server. At this point the CDN server is just passing through a TCP connection; it is not doing anything TLS-specific. Nothing useful is happening yet: The origin server is still using as much network bandwidth as before, it's just talking to a different machine now.
But here comes the trick: The origin server can give the CDN server the downlink encryption key, tell the CDN which file the client requested, and omit the corresponding ciphertext. The CDN can then use the downlink encryption key to reinject the static content.
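Schematically, the flow looks something like this (sketch; the file name is made up):

```
client ──TLS records──▶ CDN ──forwarded TCP──▶ origin
                         ▲
                         │ out of band, from the origin:
                         │ "downlink key is K; the client asked for
                         │  /cat.png; I left out that ciphertext,
                         │  please fill it in"
```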
So, is this actually doable? Is it possible in TLS to just give the CDN the downlink encryption key, without letting the CDN modify the encrypted data? Well, sort of. It depends.
In TLS 1.3, five cipher suites are specified: two based on AES-GCM (one of which is the only one implementations are required to support), one based on ChaCha20-Poly1305, and two based on AES-CCM (and it looks like Firefox/Chrome don't support AES-CCM). AES-GCM and ChaCha20-Poly1305 are AEAD constructions that only take a single key that is used for both encryption and authentication. Which means this trick won't work with TLS 1.3 - if the CDN is given the key necessary to encrypt data, it can also modify the data.
😏 This shows how the overzealous simplification in TLS 1.3 is stifling innovation and HEY who is throwing stuff aaAAAH stop throwing things I'm joking I'M JOKING I'M SORRY OKAY I'M SO VERY SORRY I TAKE EVERYTHING BACK
Okay, time to go legacy. Let's look at TLS 1.2, maybe they have better ciphersuites, er, ciphersuites that suit our purposes more.
TLS 1.2 has a lot of ciphersuites with the same kinda design; but the one ciphersuite that is mandatory to implement in TLS 1.2 is TLS_RSA_WITH_AES_128_CBC_SHA; and a variant with the same encryption/authentication but a perfect-forward-secrecy handshake is TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA. Both of these are supported by browsers; and both of these use separate keys for encryption (with AES-128-CBC) and authentication (HMAC-SHA1). 😎 🥳
Okay maybe this is not ideal, CBC is kinda known for being an implementation pitfall piñata, and SHA-1 is kinda, uh, broken, but, uh, that's fine, right? ... right? Come on, surely by now we've figured out how to CBC safely. And the SHA-1 collision isn't that bad here, an attacker would probably have to pull off a second-preimage attack, kinda depending on, uh, how static the CDN content is? And isn't HMAC anyway known for making it way harder to pull off attacks than if you were attacking the underlying hash function directly?
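The key-separation property we're relying on can be sketched in a few lines of Python. This is a toy model, not the real TLS 1.2 record protection: the stand-in "cipher" is a hash-based keystream instead of AES-128-CBC (the stdlib has no AES), and padding, sequence numbers and record headers are omitted; the only point here is that MAC-then-encrypt with separate keys lets the CDN encrypt without being able to forge.

```python
import hashlib
import hmac

def keystream(key: bytes, length: int) -> bytes:
    # Toy stand-in for the cipher layer (NOT AES-128-CBC): SHA-256 in
    # counter mode, used as a stream cipher for illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def origin_authenticate(mac_key: bytes, plaintext: bytes) -> bytes:
    # Only the origin (and the client) hold the MAC key.
    return plaintext + hmac.new(mac_key, plaintext, hashlib.sha1).digest()

def cdn_encrypt(enc_key: bytes, authenticated: bytes) -> bytes:
    # The CDN holds only the encryption key: enough to turn
    # pre-authenticated data into valid ciphertext, not enough to
    # put a valid MAC on data of its own choosing.
    return xor(authenticated, keystream(enc_key, len(authenticated)))

def client_decrypt_and_verify(mac_key: bytes, enc_key: bytes, record: bytes) -> bytes:
    decrypted = xor(record, keystream(enc_key, len(record)))
    plaintext, tag = decrypted[:-20], decrypted[-20:]
    expected = hmac.new(mac_key, plaintext, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC - did the CDN tamper with the data?")
    return plaintext
```

A record the origin authenticated survives the round trip through the CDN, while anything the CDN makes up on its own fails the client's MAC check.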
🧟‍♀️ (cue horror movie scene where the protagonists are suddenly surrounded by a swarm of zombies) 🧟‍♀️
To demonstrate this idea, I hacked up the really nice s2n TLS library from AWS. Allegedly the golang TLS stack is a really nice choice if you want to hack up a TLS stack, but I have approximately zero clue about golang, and I very much am familiar with C, so I decided to go with a C implementation for this project. Also seems like the right choice of language for the general vibe of this project. (That's not meant to be a criticism of the s2n library! From the tiny parts of it I've seen, they're good at using abstractions for all their memory buffers and such to minimize the chances of typical C program bugs. My modifications to the library, on the other hand, uuuh... wheeee, memmove()!)
In fact, the very website you are looking at right now (https://true-keyless.thejh.net/) is the demo; this HTML page, and the image embedded in it, are served from custom CDN software hosted on fly.io's anycast service, and this CDN software forwards the TLS connection to my VPS elsewhere and reinjects the CDN content to reduce outgoing network traffic on my VPS! (Through a WireGuard tunnel, so that the downlink encryption key and file path don't go over the internet in cleartext.)
(If you want to have a laugh, you can check out what Qualys SSL Labs thinks about this website... Grade B! Not bad!)
If you want to try running a "CDN" instance yourself for testing (with the CDN-to-origin connection just going over the public internet), you can grab the CDN code and run a local instance on your machine:
git clone https://git.thejh.net/git/true-keyless.git
cd true-keyless
make integrity-cdn
./integrity-cdn ::ffff:37.221.195.125 4433 4433
And then you should be able to see this page at https://local.true-keyless.thejh.net:4433/, which points to localhost (except that your router/ISP might prevent that from working, and your browser might not let you use that link directly?).
If you want to run your own origin server, you'll also need my hacked up s2n, which you can get with:
git clone https://git.thejh.net/git/s2n-tls.git
cd s2n-tls
git checkout true-keyless
S2N_LIBCRYPTO=openssl-1.1.1 BUILD_S2N=true GCC_VERSION=10 codebuild/bin/s2n_codebuild.sh # or something like this, idk, might depend on your system
Then you can edit the Makefile of the true-keyless project to point to the right path for s2n, and build all of true-keyless with make.
To change which files are served, edit the precooked_responses table in common.h. Note that the listed files must contain complete HTTP/1.0 responses with headers!
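For instance, such a file might look something like this (hypothetical minimal example; real responses would presumably want more headers):

```
HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 6

hello
```

(The trailing newline after "hello" is counted, so the body is 6 bytes.)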
You'll also need your own TLS certificate chain and key.
As described above, we can't use TLS 1.3 because it has no cipher suites with the right properties. But if we had some classic encryption+HMAC cipher suite in TLS 1.3, then we could do the following:
One amazing feature in TLS 1.3 that does not really exist in TLS 1.2 is that it's possible to just rekey in the sending direction without any round trips - you just send a KeyUpdate message and that's it. This means the origin server could sandwich any CDN content between two KeyUpdate messages, and then embed secrets in other parts of the response that the CDN server wouldn't be able to decrypt.
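The downlink record stream would then look roughly like this (a sketch; the key names are made up):

```
records encrypted with key A   (dynamic content - CDN can't read or forge)
KeyUpdate                      (origin switches downlink to key B, shared with CDN)
records encrypted with key B   (static content, reinjected by the CDN)
KeyUpdate                      (origin switches downlink to key C, origin-only again)
records encrypted with key C   (more dynamic content)
```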
So if we had compatible cipher suites in TLS 1.3, you could use a templating language for HTML and let the CDN inject the static parts of the template while letting the origin server insert the dynamic parts such that the CDN can't decrypt them! That'd be neat, huh? The TLS spec folks should totally add an encryption+HMAC cipher suite extension to TLS 1.3. It'd be great. (See disclaimer at the top, not a cryptographer, yadda yadda.)
Of course, the ultimate cipher suite for this would be one that takes two encryption keys in addition to the HMAC key (like CTR+HMAC except you replace the block cipher invocation with two invocations with different keys and XOR their outputs), where one of the keys can be freely set by the sending peer. Then you could have encrypted user content stored in an untrusted CDN, without the CDN being able to read it. The user can authenticate to the origin server using request headers or request parameters as normal (all communication in the uplink direction is completely secret from the CDN), and the server could set the inner encryption key to the key used to encrypt the CDN-cached file and instruct the CDN to inject the pre-encrypted user content while applying the outer layer of encryption.
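A toy version of that two-key construction (hash-based stand-in keystreams instead of the two block cipher invocations; the HMAC layer is omitted; all names and keys are made up for illustration):

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Stand-in PRF keystream (SHA-256 in counter mode); the real thing
    # would be something like two AES-CTR keystreams XORed together.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical keys: the inner key is shared between user and origin;
# the outer key is the per-connection key the origin hands to the CDN.
inner_key = b"user-content-key"
outer_key = b"connection-key!!"

plaintext = b"secret user document"

# What the CDN stores: content pre-encrypted under the inner key.
# The CDN can't read this.
stored_on_cdn = xor(plaintext, keystream(inner_key, len(plaintext)))

# What goes on the wire: the CDN injects the stored blob and applies
# only the outer encryption layer.
on_the_wire = xor(stored_on_cdn, keystream(outer_key, len(stored_on_cdn)))

# The client knows both keys from the handshake (the origin set the
# inner key to the content key), so it can strip both layers.
received = xor(on_the_wire,
               xor(keystream(outer_key, len(on_the_wire)),
                   keystream(inner_key, len(on_the_wire))))
assert received == plaintext
```

Because XOR layers commute, it doesn't matter that the inner encryption happened long before the outer one - the client just removes both.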