I recently had to set up Varnish. I only knew it by name, so this project let me discover the tool in detail, get to know its settings, and learn which changes had to be made to the site to get the most out of this HTTP cache. I am sharing this feedback with you; do not hesitate to share your own experiences on the subject in the comments.
It’s been a long time since I posted on this blog, and feedback on a Varnish deployment is a very good reason to do so.
Varnish: An HTTP cache
Varnish is an HTTP caching system. In a typical architecture, it sits in front of the web front-ends, which themselves sit in front of the database server.
When a user requests a resource via a URL, Varnish first checks whether it handles this resource and, if so, whether it has it in cache.
If the page is in its cache, Varnish returns it directly, without going through the Apache server, for example, or running a MySQL query. Response times are therefore much faster, especially since Varnish can be configured to store its cache in RAM, so not even a disk read is needed to access the information.
If it does not have the page cached, it passes control to the front-ends, which process the request as they usually do. When the front-ends respond, Varnish takes the opportunity to cache that response.
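To make the hit/miss logic concrete, here is a toy Python model of a cache sitting in front of a backend. It is a sketch, not how Varnish is implemented: the class name, the `backend` callable returning a body and a TTL, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class HttpCacheSketch:
    """Toy model of an HTTP cache in front of a backend (not Varnish's code)."""

    def __init__(self, backend: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend      # backend(url) -> (body, ttl in seconds)
        self.clock = clock
        self.store: Dict[str, Tuple[str, float]] = {}  # url -> (body, expiry)
        self.backend_calls = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[1] > now:
            return entry[0]          # hit: the backend is not contacted
        self.backend_calls += 1      # miss or expired: ask the backend
        body, ttl = self.backend(url)
        if ttl > 0:
            self.store[url] = (body, now + ttl)  # keep it for the next visitors
        return body


fake_now = [0.0]
cache = HttpCacheSketch(lambda url: (f"<html>{url}</html>", 600),
                        clock=lambda: fake_now[0])
cache.get("/home")
cache.get("/home")
print(cache.backend_calls)  # 1: the second request was a hit
fake_now[0] = 601.0         # the 600-second TTL has elapsed
cache.get("/home")
print(cache.backend_calls)  # 2
```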
Installation, operation and configuration of Varnish
Varnish is available in the Linux distribution repositories; under Debian, for example, you can install it with a simple apt-get install varnish.
Varnish runs as a daemon and is configured through a .vcl file. This VCL file lets you define the backends, as well as how Varnish behaves in each execution phase of a request.
For Varnish, the backends are the web front-ends sitting behind it. They are declared in the VCL file as follows:
backend web1 {
  .host = "192.168.1.10";
  .port = "80";
  .connect_timeout = 1s;
  .first_byte_timeout = 30s;
  .probe = {
    .url = "/";
    .timeout = 15s;
    .interval = 15s;
    .window = 5;
    .threshold = 2;
  }
}
In the same way, you can list all your backends (web1, web2, web3… if you have several). For each one, you define its IP address, the port used to reach it, and the health-check (probe) parameters: Varnish requests .url every .interval and counts a probe as failed if no valid answer arrives within .timeout. With .window = 5 and .threshold = 2, a backend stays healthy as long as at least 2 of its last 5 probes succeeded; otherwise it is marked sick and temporarily taken out of the pool until it recovers.
If you want to see what the lifecycle of a Varnish request looks like, have a look at the diagram on the official website.
As you can see, there are several steps (we will come back to them in more detail later):
- vcl_recv: a user request arrives
- vcl_hash: Varnish builds the hash key that identifies the object in its cache
- vcl_hit: the resource was found in the cache
- vcl_miss: the resource was not found in the cache
- vcl_pass: the request is passed straight to the backend and the response will not be cached, typically because the configuration forced a pass
- vcl_fetch: the request was handed over to the backend (the web front-ends) and its response has been received
- vcl_deliver: the response is sent to the user
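For a request that goes through a cache lookup, the steps above chain together roughly as in the following Python sketch. It is a simplification: the pass and pipe paths, restarts and errors are left out, and `vcl_steps` is only an illustrative helper. Note also that in Varnish 4 and later, vcl_fetch was replaced by vcl_backend_response.

```python
from typing import List, Set


def vcl_steps(url: str, cached_urls: Set[str]) -> List[str]:
    """Simplified order of VCL subroutines for a request that does a cache lookup."""
    steps = ["vcl_recv", "vcl_hash"]          # request arrives, cache key is built
    if url in cached_urls:
        steps.append("vcl_hit")               # object found in cache
    else:
        steps += ["vcl_miss", "vcl_fetch"]    # go to the backend and get its response
    steps.append("vcl_deliver")               # response sent to the user
    return steps


print(vcl_steps("/logo.png", {"/logo.png"}))
# ['vcl_recv', 'vcl_hash', 'vcl_hit', 'vcl_deliver']
print(vcl_steps("/news/", {"/logo.png"}))
# ['vcl_recv', 'vcl_hash', 'vcl_miss', 'vcl_fetch', 'vcl_deliver']
```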
In the VCL configuration file, you define what Varnish should do at each of these steps with a subroutine, as follows:
sub vcl_recv { ... }
Resource expiration management
To manage the expiration of your various resources, you have to specify in your HTTP headers the max-age you want for each of your pages; this tells Varnish how long to keep them in cache. This is done with the HTTP Cache-Control header, for example:
cache-control: public, max-age=600
This tells Varnish to cache the resource for 600 seconds before it expires. How you then set these expirations depends on your project. In my case, I was using Symfony 1, and a class called from Symfony filters let me set timeouts ranging from 0 to 86400 seconds depending on the URL.
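The Symfony 1 filter itself is not shown here, but the idea can be sketched in Python: map URL prefixes to a max-age and derive the Cache-Control value from it. The prefixes, durations and the `cache_control_for` name are hypothetical; adapt them to your own site.

```python
from typing import List, Tuple

# Hypothetical rules: (URL prefix, max-age in seconds); the first match wins.
TTL_RULES: List[Tuple[str, int]] = [
    ("/myAccount", 0),    # personal pages: never cached
    ("/news/", 600),      # frequently updated: 10 minutes
    ("/images/", 86400),  # static media: one day
]
DEFAULT_TTL = 300  # hypothetical fallback for every other URL


def cache_control_for(path: str) -> str:
    """Return the Cache-Control value to send for `path`."""
    ttl = next((seconds for prefix, seconds in TTL_RULES if path.startswith(prefix)),
               DEFAULT_TTL)
    if ttl == 0:
        # Neither Varnish nor the browser should keep this response
        return "private, max-age=0"
    return f"public, max-age={ttl}"


print(cache_control_for("/news/today"))        # public, max-age=600
print(cache_control_for("/myAccount/orders"))  # private, max-age=0
```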
Resources that should not be cached
You can tell Varnish not to cache certain resources by forcing a pass, as follows:
if (req.url ~ "^/myAccount") { return(pass); }
This way, nothing under my account is cached. Also set the page's expiry (max-age) to 0 to prevent browsers from caching it (with Cache-Control set to public and a positive max-age, browsers are allowed to cache it too). Also note that Varnish does not cache anything served over HTTPS, since it does not handle TLS itself.
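Varnish evaluates the `~` operator as a regular expression, so a URL pattern has to be anchored rather than written like a shell glob: `/myAccount*` means "/myAccoun" followed by zero or more "t", anywhere in the URL. Python's `re` module behaves like Varnish's PCRE for these simple patterns, which makes the difference easy to check; the sample URLs below are made up.

```python
import re

paths = ["/myAccount/orders", "/myAccoun", "/help/myAccount-faq"]

# Glob-style pattern: matches "/myAccoun" plus optional "t" anywhere in the URL
loose = [p for p in paths if re.search(r"/myAccount*", p)]
# Anchored pattern: only URLs that start with /myAccount
anchored = [p for p in paths if re.search(r"^/myAccount", p)]

print(loose)     # ['/myAccount/orders', '/myAccoun', '/help/myAccount-faq']
print(anchored)  # ['/myAccount/orders']
```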
Varnish and cookies
As soon as a request carries a cookie, Varnish treats it as specific to that visitor. With the default configuration, it does not cache such a request at all, and if your configuration adds the cookie to the hash that makes up the cache key, each cookie value gets its own cache entry.
The problem is that if you have a cookie on every page of your site, Varnish becomes useless: either nothing is cached, or a different cache key is generated for each user. The latter causes two problems: on the one hand, the cache is not shared; on the other, the same page may be cached tens or hundreds of times, wasting memory. Fortunately, you can work around this in the Varnish configuration.
For pages that do not need the cookie (pages meant to be identical for all your visitors, which rules out, for example, a "my account" area), you can tell Varnish to ignore the cookie by removing it from the request with unset req.http.Cookie. This is also useful for images and other media files. An example in vcl_recv:
# Static files (except URLs starting with /index.php): ignore the cookie
if (req.url ~ "\.(jpeg|jpg|png|gif|ico|swf|js|css|gz|rar|txt|bzip|pdf)(\?.*|)$" && req.url !~ "^/index\.php") {
  unset req.http.Cookie;
  return (lookup);
}
# Pages that are identical for every visitor
if (req.url ~ "^/myStaticPage/") {
  unset req.http.Cookie;
  return (lookup);
}
This means that for all JPEGs, GIFs, PDFs, etc., as well as for URLs starting with /myStaticPage/, the cookie is ignored and these elements can be shared in the cache even when a cookie is present. Be careful, though: any processing on those URLs that relies on cookies will no longer work.
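The effect of removing the cookie on the number of cache entries can be simulated in Python. This assumes a configuration where the cookie is part of the hash (Varnish's default hash only uses the URL and the Host header, and by default requests with cookies are not cached at all); `recv`, `cache_key`, `distinct_entries` and the shortened extension list are simplified, hypothetical stand-ins for the VCL above.

```python
import hashlib
from typing import Dict, Iterable

STATIC_EXTENSIONS = (".jpeg", ".jpg", ".png", ".gif", ".css", ".js", ".pdf")


def recv(url: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Mimic the vcl_recv rule: drop the Cookie header for static files."""
    headers = dict(headers)       # work on a copy
    path = url.split("?", 1)[0]   # ignore the query string, like (\?.*|)$
    if path.endswith(STATIC_EXTENSIONS):
        headers.pop("Cookie", None)
    return headers


def cache_key(headers: Dict[str, str], url: str) -> str:
    """Hash of Host + URL, plus the cookie if it is still there."""
    h = hashlib.sha256()
    for part in (headers.get("Host"), url, headers.get("Cookie")):
        if part is not None:
            h.update(part.encode() + b"\0")  # separator between parts
    return h.hexdigest()


def distinct_entries(url: str, session_ids: Iterable[str], strip_cookie: bool) -> int:
    """Count how many cache entries one URL produces for a set of visitors."""
    keys = set()
    for sid in session_ids:
        headers = {"Host": "www.example.com", "Cookie": f"sid={sid}"}
        if strip_cookie:
            headers = recv(url, headers)
        keys.add(cache_key(headers, url))
    return len(keys)


visitors = [str(i) for i in range(100)]
print(distinct_entries("/logo.png", visitors, strip_cookie=False))  # 100
print(distinct_entries("/logo.png", visitors, strip_cookie=True))   # 1
```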
Integrating dynamic elements into a cached page: ESI or Ajax?
To take full advantage of Varnish, the ideal is to cache as many elements as possible. However, your pages may contain dynamic elements, for example a shopping basket or the name of the logged-in user displayed on every page. In that case, you need to load them through ESI or Ajax so that they are fetched in real time.
Since Varnish works at the resource level, these dynamic elements must first be reachable through their own URL.
You will also need to configure your VCL and your HTTP headers so that these URLs are not cached (max-age=0 and/or a return(pass) in the VCL). Once this is done, you can incorporate these elements into your cached page.
Two solutions are available to you: ESI and Ajax.
For the first solution, you simply add the following tag to your page:
<esi:include src="/myFolder/myPageNotCache"/>
When Varnish delivers the cached page, it makes an ESI request to load the dynamic content. For this to happen, ESI processing must be enabled for the parent page (in Varnish 3, with set beresp.do_esi = true; in vcl_fetch). This solution is elegant. However, the big problem with ESI in Varnish is that the includes are (for now) not fetched in parallel. This is worth keeping an eye on: if future versions of Varnish can process ESI calls in parallel, they would immediately become more attractive.
This causes performance problems and stalls while your page loads, especially if you have several ESI calls. ESI is the solution I chose initially, but I then switched to Ajax calls. (I did not test the opposite approach, namely a dynamic page into which cached elements are included via ESI.)
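To see why sequential includes hurt, here is a Python sketch of ESI expansion: each `<esi:include src="..."/>` tag is replaced by the content fetched from its URL, one tag after the other, so the page waits for the sum of all fragment fetch times. This is only an illustration; Varnish's real ESI parser handles more syntax than this single regular expression, and `expand_esi` and the `fetch` callable are hypothetical.

```python
import re
from typing import Callable

# Matches the simple self-closing form used above: <esi:include src="..."/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def expand_esi(page: str, fetch: Callable[[str], str]) -> str:
    """Replace each ESI include with the fragment fetched from its src URL.

    re.sub calls `fetch` once per tag, in document order, one at a time,
    so the total time is the sum of all fragment fetches.
    """
    return ESI_INCLUDE.sub(lambda match: fetch(match.group(1)), page)


cached_page = '<p>Welcome</p><esi:include src="/myFolder/myPageNotCache"/>'
print(expand_esi(cached_page, lambda src: f"<span>fragment from {src}</span>"))
# <p>Welcome</p><span>fragment from /myFolder/myPageNotCache</span>
```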
With Ajax, you proceed as usual: your HTML contains a DOM element (for example a div or span tag) into which an Ajax call inserts the HTML returned by an uncached URL (in our previous example, /myFolder/myPageNotCache).
Purge Varnish Cache
Once your cache is properly configured, you may occasionally need to flush certain elements to refresh a page before its cache expires.
There are several methods in Varnish to do this.
– You can request a cache flush with an HTTP request that uses the PURGE method instead of GET. For security reasons, however, you must restrict who can send these purges by filtering on authorized IP addresses; otherwise, anyone could issue a PURGE request. An example VCL to handle purging (Varnish 3 syntax):
acl purge {
  "localhost";
  "127.0.0.1";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return (lookup);
  }
  # Other requests continue through the rest of vcl_recv
}

sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    purge;
    error 200 "Not in cache.";
  }
}

sub vcl_pass {
  if (req.request == "PURGE") {
    error 502 "PURGE on a passed object";
  }
}
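To send a purge from a script or a back office, any HTTP client that lets you choose the method will do. Here is a minimal Python sketch using the standard `http.client` module; `send_purge` is a hypothetical helper, and passing the site's Host header matters because Varnish's default hash includes the host. With the VCL above, a 200 status means the purge was handled, while a 405 means the client IP is not in the ACL.

```python
import http.client
from typing import Optional


def send_purge(host: str, port: int, path: str,
               host_header: Optional[str] = None, timeout: float = 5.0) -> int:
    """Send an HTTP PURGE request for `path` and return the response status."""
    headers = {"Host": host_header} if host_header else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PURGE", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()


# Example (needs a running Varnish on localhost):
# status = send_purge("127.0.0.1", 80, "/myStaticPage/", host_header="www.example.com")
```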
– You can also tell Varnish to stop using its cached objects by banning them.
This forces Varnish to build a new cache, which it can then use. The downside is that the old objects stay in RAM until they are requested again or expire, unless a thread dedicated to this cleanup, the "ban lurker", removes them first; the lurker also keeps the list of ban rules from growing so long that it becomes slow to evaluate. Note that the ban lurker can only process bans that test obj.* fields, not req.* fields such as req.url.
The advantage of bans over purges is that you can use regular expressions,
and thus ban a whole set of URLs rather than a single one. Bans are commands sent to Varnish's management interface (the CLI), so they can be issued from the shell.
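Bans are cheap to add because Varnish does not search the cache when a ban is created: it records the rule and checks objects against it later, when they are looked up (or in the background with the ban lurker), and a ban only applies to objects cached before it. Here is a simplified Python model of that idea; `BanListSketch` is a hypothetical name, and it ignores the ban lurker and everything else a real ban list does.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BanListSketch:
    """Toy model of bans: rules are recorded, objects are checked lazily."""
    clock: int = 0
    bans: List[Tuple[int, re.Pattern]] = field(default_factory=list)
    cached_at: Dict[str, int] = field(default_factory=dict)  # url -> insert time

    def store(self, url: str) -> None:
        self.clock += 1
        self.cached_at[url] = self.clock

    def ban_regex(self, pattern: str) -> None:
        # Creating a ban deletes nothing: the rule is only recorded
        self.clock += 1
        self.bans.append((self.clock, re.compile(pattern)))

    def lookup(self, url: str) -> bool:
        """Return True on a hit, False if missing or invalidated by a ban."""
        inserted = self.cached_at.get(url)
        if inserted is None:
            return False
        for ban_time, rx in self.bans:
            # A ban only applies to objects cached before it was created
            if ban_time > inserted and rx.search(url):
                del self.cached_at[url]  # evicted now that it is looked up
                return False
        return True


cache = BanListSketch()
cache.store("/images/a/FolderName/x.jpg")
cache.store("/images/b/Other/y.jpg")
cache.ban_regex(r"^/images/.*/FolderName/.*\.jpg$")
cache.store("/images/a/FolderName/z.jpg")  # cached after the ban
print(cache.lookup("/images/a/FolderName/x.jpg"))  # False
print(cache.lookup("/images/b/Other/y.jpg"))       # True
print(cache.lookup("/images/a/FolderName/z.jpg"))  # True
```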
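You can perfectly well call them from PHP through exec to build a cache administration back office. Here are some code examples to ban a URL, a set of URLs matching a regex, or the whole cache. They send the commands to the management port with nc, which assumes that port is not protected by a secret file (varnishd's -S option); if it is, use varnishadm, which handles the authentication: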
// IP_VARNISH must hold the host and the management port separated by a space,
// e.g. '127.0.0.1 6082'. $url is not escaped: only pass trusted values.

// Ban a url
exec('echo "ban req.http.host == \"www.example.com\" && req.url == \"'.$url.'\"" | nc -q 1 '.IP_VARNISH);

// Ban urls by regex. For example, to ban all images of a folder from the cache,
// $url will be: /images/.*/FolderName/.*.jpg$
exec('echo "ban req.http.host == \"www.example.com\" && req.url ~ \"'.$url.'\"" | nc -q 1 '.IP_VARNISH);

// Clear the cache completely
exec('echo "ban.url /*" | nc -q 1 '.IP_VARNISH);
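Going through the shell with exec and echo works, but an unescaped URL could inject shell commands. As a hedged alternative, here is a Python sketch that builds the same ban commands and passes them to varnishadm without a shell, as a single argument. It assumes varnishadm is installed, that the management interface listens on 127.0.0.1:6082 and that the secret file is /etc/varnish/secret (common Debian defaults; adjust to your setup); the helper names are hypothetical. Backslashes and double quotes are escaped because the values end up inside double-quoted CLI strings.

```python
import subprocess
from typing import List


def cli_quote(value: str) -> str:
    """Quote a value for the Varnish CLI, escaping backslashes and quotes."""
    if "\n" in value:
        raise ValueError("newlines are not allowed in a ban value")
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def ban_url(host: str, url: str) -> str:
    return f"ban req.http.host == {cli_quote(host)} && req.url == {cli_quote(url)}"


def ban_url_regex(host: str, pattern: str) -> str:
    return f"ban req.http.host == {cli_quote(host)} && req.url ~ {cli_quote(pattern)}"


def varnishadm_argv(command: str, admin: str = "127.0.0.1:6082",
                    secret: str = "/etc/varnish/secret") -> List[str]:
    # Assumed defaults; no shell is involved, so the URL cannot run commands
    return ["varnishadm", "-T", admin, "-S", secret, command]


def send(command: str) -> str:
    """Run the command through varnishadm (must be installed and reachable)."""
    result = subprocess.run(varnishadm_argv(command), capture_output=True,
                            text=True, check=True)
    return result.stdout


print(ban_url("www.example.com", "/myStaticPage/"))
# ban req.http.host == "www.example.com" && req.url == "/myStaticPage/"
```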