Varnish + AWS Load Balancer

TL;DR

Classic AWS Load Balancers only give you a DNS hostname instead of static IPs, and Varnish does not allow DNS hostnames in its backend config. If you can't move to a VPC, you can still hook Varnish up to a classic Load Balancer with a simple script that resolves the hostname into static IPs Varnish can use and updates the Varnish config regularly.

What is Varnish?

Varnish is a caching layer usually called a "web accelerator". Its purpose is to receive requests from clients, relay them to other servers to get a response, store that response, and serve the stored copy. On subsequent requests for the same resource, Varnish serves its stored response instead of relaying the request again, functioning as a cache between users and servers. This reduces the latency of processing and producing each response, frees up resources on your application servers, and increases the volume of requests you can handle.

Varnish sits in front of your infrastructure and is the first thing to handle a request. You can define rules and logic for how to serve or cache content and how to treat certain requests, making the tool highly adaptable to most standard web needs.

The configuration of Varnish can be simple or complex. Ideally, cache control should be delegated to the servers themselves, since Varnish reads the response headers and determines how to cache them. The piece of configuration I want to center this post on is setting up the servers behind Varnish. These are called backends, and they simply point to the original application servers that produce the responses. There's a lot more you can do with Varnish, but it's out of scope for now.

The simplest possible configuration is a single backend, meaning you have only one application server producing content. However, it's rare to have a single server dealing with all the load. Typically you scale the application horizontally, so more servers can handle requests at the same time and the request volume the application can handle grows. You can set up the backend servers in different ways, add probes to check their health, or tell Varnish how to spread requests across multiple backends, for example by rotating to a different backend each time. If the horizontal scaling grows significantly, you end up with a long list of servers to maintain, and if on top of that you need to replace or switch servers for any reason, this becomes a tedious task.

AWS Features

On a different topic, AWS provides great tools to handle infrastructure. It is common nowadays to choose AWS as the place to launch instances and serve applications. Along these lines, one of the tools available is Load Balancers, which are essentially managed EC2 instances with a pre-configured web server that forwards traffic to other EC2 servers. AWS also provides Auto Scaling options to scale up or down according to the traffic you have at any given moment. Auto Scaling groups can be paired with load balancers, providing a simple way of handling backends transparently. And you have two different ways of using these resources: inside a Virtual Private Cloud (VPC) or with the Classic configuration.

Now, to glue all this together, the setup you would imagine is a scaling group with your application servers, a load balancer on top of them, and a Varnish instance (or several, possibly behind their own load balancer and auto scaling group). Any request you receive hits Varnish first (or the load balancer in front of several Varnish instances), then Varnish relays it to the application load balancer, and finally to one of the backends. In this setup, Varnish has just one backend: the application load balancer.

The Problem

However, there's a catch in all this. When configuring Varnish, backend addresses can only be IP addresses; you cannot use hostnames. As far as I know, this was a decision by the Varnish developers: Varnish configurations are compiled and static, and re-resolving a name at runtime could cause a number of problems, not to mention hurt performance. On the other hand, AWS load balancers launched the classic way only offer you a hostname, because the underlying instances can change at any moment for any reason, taking their IPs with them. So in this setup you cannot use an AWS load balancer as part of a Varnish configuration.
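
To see the mismatch in practice, you can resolve the balancer's hostname yourself. Using the placeholder hostname and IPs from the rest of this post, the lookup looks roughly like this:

$ dig +short mybalancer.address.com
111.111.111.111
222.222.222.222

Run it again some time later and the list can be different, which is exactly what a compiled, static Varnish configuration can't follow on its own.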

There is a way around this. If you build all your infrastructure inside a VPC, load balancers you create inside it can be referenced with static IPs. This, however, means that if you were on the classic AWS setup, you need to move everything over, and VPCs have some cost for network traffic, so it's something to consider.

If moving everything seems undoable, there's another route, though it's sort of a cheat. It's simple enough and should allow you to use classic load balancers and their hostnames.

The idea is to resolve the load balancer's hostname into IPs beforehand and update the configuration when necessary. A CRON job can easily do this regularly and automatically for you. For this to work, the update interval needs to be short enough to keep any downtime negligible. The key is to check the IPs currently resolved and only perform an update when they change.

Here comes the good part

Before digging into the script that addresses this, you should consider isolating the backend settings of your Varnish configuration. Configuration files use VCL, a special syntax Varnish understands, and it allows you to split your configuration into modules. The backend list is the prime example of modularization here, and having it isolated makes the config update much easier. To do this, open up the configuration in your favorite editor and locate the backend definitions. They may look something like this:

import directors;    # needed for directors.round_robin() below

backend default { .host = "111.111.111.111"; .port = "http"; }
backend server1 { .host = "111.111.111.111"; .port = "http"; }
backend server2 { .host = "222.222.222.222"; .port = "http"; }
backend server3 { .host = "333.333.333.333"; .port = "http"; }
sub vcl_init {
        new server_cluster = directors.round_robin();
        server_cluster.add_backend(server1);
        server_cluster.add_backend(server2);
        server_cluster.add_backend(server3);
        return (ok);
}


Don't mind the ugly, invalid IPs; this is just an example. If you have something like this in a single Varnish configuration file, you should move the backend definitions and the vcl_init block out to a separate file such as backends.vcl (keep the import directors; line in the main file, before the include, since the generated file below won't contain it). In your main configuration file you can then reference the new one like this:

include "backends.vcl";

You may have a different configuration there, but the main idea is to extract and isolate whatever defines the backends. With this done, you can have a script that only handles this part and regenerates the backend list file on the fly. Otherwise you'd have to replace text inside the monolithic configuration file, which is not easy.
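
With the split in place, a quick sanity check (assuming the /etc/varnish paths used throughout this post) is to ask Varnish to compile the main file and confirm the include still resolves; the -C flag just compiles the VCL to C and prints it, so errors show up immediately:

$ varnishd -C -f /etc/varnish/default.vcl > /dev/null

If the include path or the VCL itself is broken, this fails with a compile error instead of you finding out later at reload time.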

The following is a script I made that does the work. I'm sure it can be improved, but it works well enough. You will need to customize the file generation for your case, which is mostly iterating over the balancer IPs (usually just two) and generating the lines needed.

#!/bin/bash

# Balancer address
BALANCER_HOSTNAME="mybalancer.address.com"

# File to replace with new IPs
VARNISH_BACKEND_FILE="/etc/varnish/backends.vcl"

# A temp file to write down a new backends.vcl
TMP_CONFIG_FILE="/tmp/tmp-backends.vcl"

# Simple timestamp
TIMESTAMP=$(date +%s)

# A name candidate to load a new configuration in varnish
CONFIG_NAME=default_$TIMESTAMP

# The main varnish configuration file
CONFIG_FILE=/etc/varnish/default.vcl

# Resolve command to fetch IPs
DIG_COMMAND="dig +short $BALANCER_HOSTNAME"

# Compare backend.vcl and tmp-backend.vcl and only return output if different
DIFF_COMMAND="diff -q $VARNISH_BACKEND_FILE $TMP_CONFIG_FILE"

# Execute the DNS lookup
readarray -t BACKEND_IPS <<< "$($DIG_COMMAND | sort)"

# Abort if the lookup returned nothing, so an empty backend list is never written
if [[ -z "${BACKEND_IPS[0]}" ]]; then
    echo "DNS lookup for $BALANCER_HOSTNAME returned no IPs, aborting"
    exit 1
fi

# Write down tmp-backends.vcl
printf 'backend default { .host = "%s"; .port = "http"; }\n' "${BACKEND_IPS[0]}" > $TMP_CONFIG_FILE
for i in "${!BACKEND_IPS[@]}"
do
    position=$(($i + 1))
    printf 'backend server%s { .host = "%s"; .port = "http"; }\n' "$position" "${BACKEND_IPS[i]}" >> $TMP_CONFIG_FILE
done
echo "sub vcl_init {
  new server_cluster = directors.round_robin();" >> $TMP_CONFIG_FILE
for i in "${!BACKEND_IPS[@]}"
do
    position=$(($i + 1))
    printf '  server_cluster.add_backend(server%s);\n' "$position" >> $TMP_CONFIG_FILE
done
echo "  return (ok);
}" >> $TMP_CONFIG_FILE

# Compare configurations
if [[ $($DIFF_COMMAND) ]]; then
    # Copy tmp-backends.vcl into backends.vcl
    cp $TMP_CONFIG_FILE $VARNISH_BACKEND_FILE

    # Load the new configuration with the new backends
    varnishadm vcl.load $CONFIG_NAME $CONFIG_FILE

    # Replace configurations
    varnishadm vcl.use $CONFIG_NAME
else
    # Do nothing, IPs didn't change
    echo "No changes found, keeping configs untouched"
fi

This script is quite simple and doesn't do much, but I'll break it down.

At the beginning it just sets up some common variables:

  • BALANCER_HOSTNAME is the address of the load balancer. Since this is a DNS name, it remains static; what changes over time are the IPs it resolves to.
  • VARNISH_BACKEND_FILE is the location of the modularized backend file.
  • DIG_COMMAND executes a lookup and returns the list of IPs the hostname currently resolves to.
  • TMP_CONFIG_FILE is simply a temp location to build the new file.
  • DIFF_COMMAND compares the current backend file with the new one built from whatever IPs the dig command fetched.
  • TIMESTAMP is self-explanatory.
  • CONFIG_NAME is the name for a new configuration to load into Varnish. I'll explain this below.
  • CONFIG_FILE is the main configuration location.
What the script essentially does is the following:
  1. Use dig to resolve the hostname into an array of IPs.
  2. Build a new backend file with these IPs.
  3. Compare the current backends with the new ones.
  4. If there are differences, replace the current backend file and update Varnish; otherwise do nothing.
The reason you need the main configuration file and a configuration name is the way Varnish works: to update its configuration you first load and compile it, and then tell Varnish to use it. In this process you cannot reuse a name, but you can load the same configuration file under different names and keep them stored while Varnish is running; inside Varnish you can then switch between configurations easily. That is why the TIMESTAMP is used, to generate a name that doesn't collide with any other.
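
If you want to do by hand what the script does in its last step, the sequence looks roughly like this (the configuration name is an arbitrary example; it just has to be one that hasn't been used yet):

$ varnishadm vcl.load default_1500000000 /etc/varnish/default.vcl
$ varnishadm vcl.use default_1500000000
$ varnishadm vcl.list

The vcl.list command shows every configuration currently loaded and marks the active one, which is handy for checking that the script is doing its job.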

Note that you are changing the backends.vcl file instead of default.vcl, but you load default.vcl into Varnish, which includes it. At the end of the script you can see the varnishadm commands that handle the configuration. Depending on how you have Varnish set up you could instead do something like "service varnish reload", but if you later inspect Varnish you'll see the exact same thing, just with an ugly hash as the configuration name. If you don't mind that, you can use that instead.

Finally, with the script in place, you can add a CRON job to run it every minute and resolve the hostname. This is the highest frequency you can get with standard CRON; you could update more often with a custom process. With a 1-minute check you can have up to 1 minute of downtime if the IPs change, which is pretty low, and considering balancers rarely change IPs, I think it's a good trade-off. Add the CRON task by executing:

$ sudo crontab -e

And adding the following:

* * * * * /path/to/script/resolve-balancer >/dev/null 2>&1

The crontab you use may be different. Varnish configurations are owned by root, so the script needs to run with those permissions. If you have a different setup, where the Varnish configs are owned by another user, you can omit sudo in the previous command and use the designated user. In any case, just make sure the script runs with enough privileges to edit the Varnish configuration and execute the Varnish commands. The redirections at the end of the CRON entry simply discard the script's output, if any, and prevent cron from mailing the result.

You can verify that the new task is being executed by running the following:

$ grep CRON /var/log/syslog

All CRON jobs are logged there.

Final thoughts

This can get fancier and more complex, with a template and everything. The script I provide is merely an example of how to solve the problem, and while it works, you should make sure it's suitable for your environment.

Additional features could include a sendmail command in the script to notify you of a change; you probably won't have to do anything when that happens, since Varnish will point to the proper IPs within the next minute. You could also add simple logging to keep track of it.
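
As a minimal sketch of the logging idea, a single line inside the branch of the script that detects a change would do, for example using the standard logger utility so the message ends up in syslog even though the CRON entry discards the script's output:

# A hypothetical addition inside the "IPs changed" branch of the script
logger -t resolve-balancer "Balancer IPs changed, loaded Varnish config $CONFIG_NAME"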

A problem this approach has is that if the balancer changes IPs a lot, the script will flood Varnish with newly compiled configurations, and you may need to clean them up periodically to prevent that, as in the sketch below. Also keep in mind that the frequency of the check equals the maximum downtime you may have when the IPs change.
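
A rough sketch of that cleanup, assuming a vcl.list output where inactive configurations are marked as available and the configuration name is the last column (the exact format varies between Varnish versions), could look like this:

# Discard every loaded configuration that is not the active one
varnishadm vcl.list | awk '/^available/ {print $NF}' | while read -r name; do
    varnishadm vcl.discard "$name"
done

You could run this from the same CRON script or as a separate periodic task; either way, double-check the vcl.list format on your Varnish version before trusting it.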

If tools don't feel like cooperating, you can always cheat them and force them to cooperate :)