By Cal Henderson, May 22nd 2006.
Updated 2011-05-03: This article originally appeared on Think Vitamin, but was lost in a reorganization. This copy was grabbed from the Google cache. Many things still apply, 5 years later. It predates Steve's book by more than 2 years, but is partly based on conversations I had with him at Yahoo.
With our so-called "Web 2.0" applications and their rich content and interaction, we expect our applications to increasingly make use of CSS and JavaScript. To keep these applications nice and snappy to use, we need to optimize the size and nature of the content required to render the page, so we deliver the best possible experience. In practice, this means a combination of making our content as small and fast to download as possible, while avoiding unnecessarily refetching unmodified resources.
This is complicated a little by the nature of CSS and JavaScript resources. In contrast to image assets, CSS and JavaScript source code is very likely to change many times as time goes by. When these resources change, we need our clients to download them all over again, invalidating the version in their local cache (and any versions stored in other caches along the way). In this article, we'll look at ways we can make the whole experience as fast as possible for our users - the initial page load, subsequent page loads and ongoing resource loading as the application evolves and content changes.
I believe strongly in making things as simple as possible for developers, so we'll also be looking at ways we can set up our systems to automatically take care of these optimization issues for us. With a little up front work, we can get the best of both worlds - an environment that makes development easy with great end-user performance - all without changing the way we work.
The old school of thought was that we could achieve optimal performance by combining multiple CSS and JavaScript files into fewer, larger blocks. Rather than having ten 5k JavaScript files, we combine them into a single 50k file. While the total size of the code is still the same, we avoid having the overhead associated with multiple HTTP requests. Each request has a setup and teardown phase on both the client and server, incurs request and response header size overhead, and resource overhead on the server side in the form of more processes or threads (and perhaps more CPU time for on-the-fly gzipped content).
The parallelization aspect is also important. By default, both Internet Explorer and Mozilla/Firefox will only download two resources from a single domain at once when using persistent connections (as suggested in the HTTP 1.1 spec, section 8.1.4). This means that while we're waiting to download those JavaScript files, 2 at a time, we're not loading image assets - the page our users see during the loading phase will be missing its images.
However, there are a couple of downsides to this approach. By bundling all of our resources together, we force the user to download everything up front. By chunking content into multiple files we can spread out the cost of loading across several pages, amortizing the speed hit across a session (or avoiding some of the cost completely, depending on the path the user chooses). If we make the first page slow to speed up subsequent pages, we might find that we have more users who never wait around to request a second page.
The big downside to the single file approach has not often, historically, been considered. In an environment where we often have to change our resources, any change to a single-file system will require the client to re-download a copy of the entire CSS or JavaScript working set. If our application has a single monolithic 100k JavaScript source file, any tiny change to our code will force all clients to suck down the 100k all over again.
The alternative approach lies somewhere in the middle - we split our CSS and JavaScript resources into multiple sub-files, while at the same time keeping that number functionally low. This compromise comes at a cost - we need to be able to develop applications with our code split out into logical chunks to increase development efficiency, while delivering merged files for performance. With a few additions to our build system (the set of tools which turn your development code into production code, ready for deployment), this needn't be a compromise we have to make.
For an application environment with distinct development and production environments, you can use a few simple techniques to keep your code manageable. In your development environment, code can be split into many logical components to make separation clear. In Smarty (A PHP templating language) we can create a simple function to manage the loading of our JavaScript:
SMARTY:
{insert_js files="foo.js,bar.js,baz.js"}
PHP:
function smarty_insert_js($args){
    foreach (explode(',', $args['files']) as $file){
        echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
    }
}
OUTPUT:
<script type="text/javascript" src="/javascript/foo.js"></script>
<script type="text/javascript" src="/javascript/bar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>
So far, so easy. But then we instruct our build process to merge certain files together into single resources. In our example, imagine we merged foo.js and bar.js into foobar.js, since they are nearly always loaded together. We can then record this fact in our application configuration and modify our template function to use this information.
SMARTY:
{insert_js files="foo.js,bar.js,baz.js"}
PHP:
# map of where we can find .js source files after the build process
# has merged as necessary
$GLOBALS['config']['js_source_map'] = array(
    'foo.js' => 'foobar.js',
    'bar.js' => 'foobar.js',
    'baz.js' => 'baz.js',
);

function smarty_insert_js($args){
    if ($GLOBALS['config']['is_dev_site']){
        $files = explode(',', $args['files']);
    }else{
        $files = array();
        foreach (explode(',', $args['files']) as $file){
            $files[$GLOBALS['config']['js_source_map'][$file]]++;
        }
        $files = array_keys($files);
    }
    foreach ($files as $file){
        echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
    }
}
OUTPUT:
<script type="text/javascript" src="/javascript/foobar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>
The source code in our templates doesn't need to change between development and production, but allows us to keep files separated while developing and merged in production. For bonus points, we can write our merging process in PHP and use the same configuration block to perform the merge process, allowing us to keep a single configuration file and avoid having to keep anything in sync. For super-bonus points, we could analyze the occurrence of scripts and style sheets together on pages we serve, to determine which files would be best to merge (files that nearly always appear together are good candidates for merging).
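As a sketch of that bonus-points idea, a build-time merge step driven by the same js_source_map configuration might look something like this (the config file name and directory paths are assumptions for illustration):

<?php
# hypothetical build step: merge development source files into the
# combined files named in the js_source_map configuration
require 'config.php'; # assumed to define $GLOBALS['config']['js_source_map']

$source_dir = 'src/javascript';    # where development sources live
$output_dir = 'htdocs/javascript'; # where deployable files go

# invert the map: merged filename => list of source files
$merges = array();
foreach ($GLOBALS['config']['js_source_map'] as $source => $merged){
    $merges[$merged][] = $source;
}

# concatenate each group of sources into its merged file
foreach ($merges as $merged => $sources){
    $out = '';
    foreach ($sources as $source){
        $out .= file_get_contents("$source_dir/$source")."\n";
    }
    file_put_contents("$output_dir/$merged", $out);
}
?>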
For CSS, a useful model to start from is that of a master and subsection relationship. A single master style sheet controls style across your entire application, while multiple sub-sheets control various distinct feature areas. In this way, most pages will load only two sheets, one of which is cached the first time any page is requested (the master sheet).
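For a hypothetical photo area, a page would then pull in just two sheets (the file names here are made up for illustration):

<link href="/css/master.css" rel="stylesheet" type="text/css" />
<link href="/css/photos.css" rel="stylesheet" type="text/css" />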
For small CSS and JavaScript resource sets, this approach may be slower for the first request than a single large resource, but if you keep the number of components low then you'll probably find it's actually faster, since the data size per page is much lower. The painful loading costs are spread out around different application areas, so the number of parallel loads is kept to a minimum while also keeping the resources-per-page size low.
When we talk about asset compression, most people think immediately of mod_gzip. Beware, however - mod_gzip is actually evil, or at the least a resource-hogging nightmare. The idea behind it is simple - browsers request resources and send along a header to show what kind of content encodings they accept. It looks something like this:
Accept-Encoding: gzip,deflate
When a server encounters this header, it can gzip or deflate (compress) the content it's sending to the client, where the client will then decompress it. This burns CPU time on both the client and server, while reducing the amount of data transferred. All well and good. The way mod_gzip works, however, is to create a temporary file on disk in which to compress the source data, serve that file out, then delete it. For high volume systems, you very quickly become bound by disk IO. We can avoid this by using mod_deflate instead (Apache 2 only), which does all the compression in memory - sensible. For Apache 1 users, you can instead create a RAM disk and have mod_gzip write its temporary files there - not quite as fast as pure in-memory compression, but not nearly as slow as writing to disk.
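If you go the RAM disk route, point mod_gzip's temporary directory at it - something along these lines, assuming a RAM disk mounted at /mnt/ramdisk (check the exact directive against your mod_gzip version's documentation):

mod_gzip_temp_dir /mnt/ramdisk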
Even so, we can avoid the compression overhead completely by pre-compressing the relevant static resources and using mod_gzip to serve people the compressed version where appropriate. If we add this compression into our build process, it all happens transparently to us. The number of files that need compressing is typically quite low - we don't compress images, since they're already compressed and we gain little if any size benefit - so we only need to compress our JavaScript and CSS (and any other uncompressed static content). Configuration options tell mod_gzip where to look for pre-compressed files.
mod_gzip_can_negotiate Yes
mod_gzip_static_suffix .gz
AddEncoding gzip .gz
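Pre-compressing at build time is easy enough to script yourself. A minimal sketch in PHP, assuming your deployable files live under htdocs/ (the paths are illustrative):

<?php
# hypothetical build step: write a .gz copy next to each JavaScript and
# CSS file, for mod_gzip to serve to clients that accept gzip
foreach (array('htdocs/javascript/*.js', 'htdocs/css/*.css') as $pattern){
    foreach (glob($pattern) as $file){
        $gz = gzencode(file_get_contents($file), 9); # maximum compression
        file_put_contents("$file.gz", $gz);
    }
}
?>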
Newer versions of mod_gzip (starting with version 1.3.26.1a) can pre-compress files for you automatically by adding a single extra configuration option. You'll need to make sure that Apache has the correct permissions to create and overwrite the gzipped files for this to work.
mod_gzip_update_static Yes
However, it's not that simple. Certain versions of Netscape 4 (specifically 4.06 to 4.08) identify themselves as being able to interpret gzipped content (they send a header saying they do), but they cannot correctly decompress it. Most other versions of Netscape 4 have issues with loading compressed JavaScript and CSS in different and exciting ways. We need to detect these agents on the server side and make sure they get served an uncompressed version. This is fairly easy to work around, but Internet Explorer (versions 4 through 6) has some more interesting issues. When loading gzipped JavaScript, Internet Explorer will sometimes incorrectly decompress the resource, or halt decompression halfway through, presenting half a file to the client. If you rely on your JavaScript working, you need to avoid sending gzipped content to Internet Explorer. In the cases where Internet Explorer does receive gzipped JavaScript correctly, some older 5.x versions won't cache the file, regardless of its e-tag headers.
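If you're using mod_deflate on Apache 2, the stock recipe from the Apache documentation covers the Netscape cases with a pair of BrowserMatch lines - a sketch, not part of the original setup:

# Netscape 4.x can only cope with gzipped HTML
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 can't cope with gzip at all
BrowserMatch ^Mozilla/4\.0[678] no-gzip

Internet Explorer also identifies itself as Mozilla/4, so it gets caught by the first line. The usual recipe re-enables it with a third line (BrowserMatch \bMSIE !no-gzip !gzip-only-text/html), but given the JavaScript problems described above you may prefer to leave that out and keep serving Internet Explorer uncompressed content.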
Since gzip compression of content is so problematic, we can instead turn our attention to compressing content without changing its format. There are many JavaScript compression scripts available, most of which use a regular expression driven rule set to reduce the size of JavaScript source. There are several things which can be done to make the source smaller - removing comments, collapsing whitespace, shortening privately scoped variable names and removing optional syntax.
Unfortunately, most of these scripts either obtain a fairly low compression rate, or are destructive under certain circumstances (or both). Without understanding the full parse tree, it's difficult for a compressor to distinguish between a comment and what looks like a comment inside a quoted string. Adding closures to the mix, it's not easy to find which variables have a private lexical scope using regular expressions, so some variable name shortening techniques will break certain kinds of closure code.
One compressor does avoid this fate - the Dojo Compressor (there's a ready-to-use version here) works by using Rhino (Mozilla's JavaScript engine implemented in Java) to build a parse tree, which it then reduces before serializing it to a file. The Dojo Compressor can give pretty good savings for a low cost - a single compression at build time. By building this compression into our build process, it all happens transparently for us. We can add as much whitespace and as many comments as we like to our JavaScript in our development environment, without worrying about bloating our production code.
Compared to JavaScript, CSS is relatively simple to compress. Because of a general lack of quoted strings (typically paths and font names) we can mangle the whitespace using regular expressions. In the cases where we do have quoted strings, we can nearly always collapse a whitespace sequence into a single space (since we don't tend to find multiple spaces or tabs in URL paths or font names). A simple Perl script should be all we need:
#!/usr/bin/perl
use strict;
use warnings;

my $data = '';
open F, $ARGV[0] or die "Can't open source file: $!";
$data .= $_ while <F>;
close F;

$data =~ s!/\*(.*?)\*/!!gs;  # remove comments (including multi-line ones)
$data =~ s!\s+! !g;          # collapse space
$data =~ s!\} !}\n!g;        # add line breaks
$data =~ s!\n$!!;            # remove last break
$data =~ s! \{ ! {!g;        # trim inside brackets
$data =~ s!; \}!}!g;         # trim inside brackets

print $data;
We can then feed individual CSS files through the script to compress them like so:
perl compress.pl site.source.css > site.compress.css
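If the rest of your build process is PHP (as sketched earlier for merging), the same transformations port directly and you can skip the shell-out to Perl - a rough equivalent:

<?php
# the same regex transformations as the Perl script above
function compress_css($data){
    $data = preg_replace('!/\*.*?\*/!s', '', $data); # remove comments
    $data = preg_replace('!\s+!', ' ', $data);       # collapse space
    $data = preg_replace('!\} !', "}\n", $data);     # add line breaks
    $data = preg_replace('!\n$!', '', $data);        # remove last break
    $data = preg_replace('! \{ !', ' {', $data);     # trim inside brackets
    $data = preg_replace('!; \}!', '}', $data);      # trim inside brackets
    return $data;
}

file_put_contents('site.compress.css',
    compress_css(file_get_contents('site.source.css')));
?>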
With these simple plaintext optimizations we can reduce the amount of data sent over the wire by as much as 50% (depending upon your coding style - it might be much less), which can translate to a much faster experience for our users. But what we'd really like to do is avoid users having to even request files unless completely necessary - and that's where an intimate knowledge of HTTP caching comes in handy.
When a user agent requests a resource from a server for the first time, it caches the response to avoid making the same request in the future. How long it stores this response for is influenced by two factors - the agent configuration and any cache control response headers from the server. All browsers have subtly different configuration options and behaviors, but most will cache a given resource for at least the length of a session, unless explicitly told otherwise.
It's quite likely you already send out anti-caching headers for dynamic content pages to avoid the browser caching pages which constantly change. In PHP, you can achieve this with a pair of function calls:
<?php
header("Cache-Control: private");
header("Cache-Control: no-cache", false);
?>
Sounds too easy? It is - some agents will ignore this header under certain circumstances. To really convince a browser not to cache a document, you'll need to be a little more forceful:
<?php
# 'Expires' in the past
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
# Always modified
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
# HTTP/1.1
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
# HTTP/1.0
header("Pragma: no-cache");
?>
This is fine for content we don't want to be cached, but for content that doesn't change with every request we want to encourage the browser to cache it aggressively. The "If-Modified-Since" request header allows us to get part of the way there. If a client sends an "If-Modified-Since" header with its request, Apache (or your web server of choice) can respond with status code 304 ("Not Modified"), telling the browser that its cached copy of the file is already up to date. With this mechanism, we can avoid sending the contents of a file to the browser, but we still incur the overhead of an HTTP request. Hmmm.
Similar to the if-modified-since mechanism are entity tags. Under Apache, each response for a static resource is given an "ETag" header containing a checksum generated from the file's modified time, size and inode number. A browser can then send that value back in an "If-None-Match" header on later requests to check whether its cached copy is still valid, without downloading the resource again. E-tags suffer from the same problem as the if-modified-since mechanism - the client still needs to perform an HTTP request to determine the validity of the locally cached copy.
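To make the mechanics concrete, here's roughly what the exchange looks like (the e-tag value and dates are made up):

GET /css/main.css HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Last-Modified: Mon, 22 May 2006 12:00:00 GMT
ETag: "4f2-40f6ac3e"
...the contents of main.css...

GET /css/main.css HTTP/1.1
Host: www.example.com
If-Modified-Since: Mon, 22 May 2006 12:00:00 GMT
If-None-Match: "4f2-40f6ac3e"

HTTP/1.1 304 Not Modified

The second response has no body - the browser keeps using its cached copy - but we still paid for the round trip.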
In addition, you need to be careful with if-modified-since and e-tags if you serve content from multiple servers. With two load-balanced web servers, a single resource could be requested from either server by a single agent - and could be requested from each at different times. This is great - it's why we load balance. However, if the two servers generate different e-tags or modified dates for the same files, then browsers won't be able to properly cache content. By default, e-tags are generated using the inode number of the file, which will vary from server to server. You can turn this off using a single Apache configuration option:
FileETag MTime Size
With this option, Apache will use only the modification time and file size to determine the e-tag. This, unfortunately, leads us to the other problem with e-tags, which can affect if-modified-since too (though not nearly as badly). Since the e-tag relies on the modified time of the file, we need those times to be in sync. If we're pushing files to multiple web servers, there's always a chance that the time at which the files are pushed are subtly different by a second or two. In this case, the e-tags generated by two servers will still be different. We could change the configuration to generate e-tags only from the file size, but this means that we'll generate the same e-tag if we change a file's contents without changing its size. Not ideal.
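If you can't keep modification times in sync across servers, a blunter option (not part of the original setup, and only sensible once you're relying on the explicit expiry headers discussed below) is to stop sending e-tags altogether:

FileETag None
Header unset ETag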
The problem here is that we are approaching the issue from the wrong direction. These possible caching strategies all revolve around the client asking the server if its cached copy is fresh. If we could notify the client when we change a file, it would know that its own cached copy was fresh, until we told it otherwise. But the web doesn't work that way - the client makes requests to the server.
But that's not quite true - before fetching any JavaScript or CSS files, the client makes a request to the server for the page which will be loading those files via <script> or <link> tags. We can use the response from the server to notify the client of any changes in those resources. This is all a little cryptic, so let's spell it out - if we change the filenames of JavaScript and CSS files when we change their contents, we can tell the client to cache every URL forever, since the content of any given URL will never change.
If we are sure that a given resource will never change, then we can send out some seriously aggressive caching headers. In PHP, we just need a couple of lines:
<?php
header("Expires: ".gmdate("D, d M Y H:i:s", time()+315360000)." GMT");
header("Cache-Control: max-age=315360000");
?>
Here we tell the browser that the content will expire in 10 years (there are 315,360,000 seconds in 10 years, more or less) and that it can keep it around for 10 years. Of course, we're probably not serving our JavaScript and CSS via PHP - we'll address that in a few moments.
Manually changing the filenames of resources when the contents are modified is a dangerous task. What happens if you rename the file, but not the templates pointing to it? What happens if you change some templates but not others? What happens if you change the templates but don't rename the file? Most likely of all, what happens if you modify a resource but forget to rename it or change any references to it? In the best of these cases, users will not see the new content and be stuck with the old versions. In the worst case, no valid resource is found and your site stops working. This sounds like a dumb idea.
Luckily computers are really good at this sort of thing - dull repetitive tasks which need to be done exactly right, over and over again, when some kind of change occurs.
The first step in making this process as painless as possible is to realize that we don't need to rename files at all. The URLs we serve content from and where the content is located on disk don't have to have anything to do with each other. Using Apache's mod_rewrite, we can create a simple rule to redirect certain URLs to certain files.
RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L]
This rule matches any URL with one of the specified extensions which also contains a 'version' nugget. The rule then rewrites these URLs to a path without the version nugget. Some examples:
URL                     ->  Path on disk
/images/foo.v2.gif      ->  /images/foo.gif
/css/main.v1.27.css     ->  /css/main.css
/javascript/md5.v6.js   ->  /javascript/md5.js
With this rule in place, we can change the URL (by changing the version number) without changing where the file lives on disk. Because the URL has changed, the browser treats it as a different resource. For bonus points, you can combine this with the script grouping function from earlier to produce a list of versioned <script> tags as needed.
At this point, you might ask why we don't just add a query string to the end of the resource - /css/main.css?v=4. According to the letter of the HTTP caching specification, user agents should never cache URLs with query strings. While Internet Explorer and Firefox ignore this, Opera and Safari don't - to make sure all user agents can cache your resources, we need to keep query strings out of their URLs.
Now that we can change our URLs without moving the file, it would be nice to be able to have the URLs updated automatically. In a small production environment (or a development environment, for people with large production environments), we can do this really easily using a template function. This example is for Smarty, but applies equally well to other templating engines.
SMARTY:
<link href="{version src='/css/group.css'}" rel="stylesheet" type="text/css" />
PHP:
function smarty_version($args){
    $stat = stat($GLOBALS['config']['site_root'].$args['src']);
    $version = $stat['mtime'];
    echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}
OUTPUT:
<link href="/css/group.v1234567890.css" rel="stylesheet" type="text/css" />
For each linked resource, we determine the file's location on disk, check its mtime (the date and time the file was last modified on disk) and insert that into the URL as the version number. This works great for low traffic sites (where stat operations are cheap) and for development environments, but it doesn't scale well to high volume deployments - each call to stat requires a disk read.
The solution is fairly simple. In a large system we already have a version number for each resource, in the form of the source control revision number (you're already using source control, right?). At the point when we go to build our site for deployment, we simply check the revision numbers of all of our resource files and write them to a static configuration file.
<?php
$GLOBALS['config']['resource_versions'] = array(
    '/images/foo.gif' => '2.1',
    '/css/main.css' => '1.27',
    '/javascript/md5.js' => '6.1.4',
);
?>
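The config file above shows CVS-style revision numbers; with Subversion you'd get a plain integer per file, which works just as well, since all we need is a value that changes when the file does. A sketch of a build step that could generate the file, assuming Subversion and illustrative paths:

<?php
# hypothetical build step: ask Subversion for the last-changed revision
# of each static resource and write out a config file mapping URLs to versions
$resources = array_merge(
    glob('htdocs/images/*.gif'),
    glob('htdocs/css/*.css'),
    glob('htdocs/javascript/*.js')
);

$versions = array();
foreach ($resources as $file){
    $info = shell_exec('svn info '.escapeshellarg($file));
    if (preg_match('!^Last Changed Rev: (\d+)$!m', $info, $m)){
        # strip the htdocs prefix so keys match the URLs we serve
        $versions[preg_replace('!^htdocs!', '', $file)] = $m[1];
    }
}

file_put_contents('config/resource_versions.php',
    "<?php\n\$GLOBALS['config']['resource_versions'] = "
    .var_export($versions, true).";\n");
?>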
We can then modify our templating function to use these version numbers when we're operating in production.
<?php
function smarty_version($args){
    if ($GLOBALS['config']['is_dev_site']){
        $stat = stat($GLOBALS['config']['site_root'].$args['src']);
        $version = $stat['mtime'];
    }else{
        $version = $GLOBALS['config']['resource_versions'][$args['src']];
    }
    echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}
?>
In this way, we don't need to rename any files, or even remember when we modify resources - the URL will be automatically changed everywhere whenever we push out a new revision - lovely. We're almost where we want to be.
When we talked about sending very-long-period cache headers with our static resources earlier, we noted that since this content isn't usually served through PHP, we can't easily add the cache headers. We have a couple of obvious choices for dealing with this; inserting PHP into the process or letting Apache do the work.
Getting PHP to do our work for us is fairly simple. All we need to do is change the rewrite rule for the static files to be routed through a PHP script, then have the PHP script output headers before outputting the content of the requested resource.
Apache:
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /redir.php?path=$1$2 [L]
PHP:
header("Expires: ".gmdate("D, d M Y H:i:s", time()+315360000)." GMT");
header("Cache-Control: max-age=315360000");
# ignore paths with a '..'
if (preg_match('!\.\.!', $_GET[path])){ go_404(); }
# make sure our path starts with a known directory
if (!preg_match('!^(javascript|css|images)!', $_GET[path])){ go_404(); }
# does the file exist?
if (!file_exists($_GET[path])){ go_404(); }
# output a mediatype header
$ext = array_pop(explode('.', $_GET[path]));
switch ($ext){
case 'css':
header("Content-type: text/css");
break;
case 'js' :
header("Content-type: text/javascript");
break;
case 'gif':
header("Content-type: image/gif");
break;
case 'jpg':
header("Content-type: image/jpeg");
break;
case 'png':
header("Content-type: image/png");
break;
default:
header("Content-type: text/plain");
}
# echo the file's contents
echo implode('', file($_GET[path]));
function go_404(){
header("HTTP/1.0 404 File not found");
exit;
}
While this works, it's not a great solution. PHP demands more memory and execution time than if we did everything in Apache. In addition, we have to be careful to protect against exploits made possible by sending us doctored values for the path query parameter. To avoid all this headache, we can have Apache add the headers directly. The RewriteRule directive allows us to set environment variables when a rule is matched, while the Header directive lets us add headers only when a given environment variable is set. Combining these two directives, we can easily chain the rewrite rule together with the header settings.
RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L,E=VERSIONED_FILE:1]
Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" env=VERSIONED_FILE
Header add "Cache-Control" "max-age=315360000" env=VERSIONED_FILE
Because of Apache's order of execution, we need to add the RewriteRule line to the main configuration file (httpd.conf) and not a per-directory (.htaccess) configuration file, otherwise the Header lines get run first, before the environment variable gets set. The Header lines can either go in the main configuration file or in an .htaccess file - it makes no difference.
By combining the above techniques, we can build a flexible development environment and a fast and performant production environment. Of course, this is far from the last word on speed. There are further techniques we could look at (separate serving of static content, multiple domain names for increased concurrency) and different ways of approaching the ones we've talked about (building an Apache filter to modify outgoing URLs in HTML source to add versioning information on the fly). Tell us about techniques and approaches that have worked well for you by leaving a comment.
Cal's new book, Building Scalable Web Sites, contains more tips and tricks to help you develop and manage the next generation of web applications.
Copyright © 2006 Cal Henderson & Think Vitamin.
The text of this article is all rights reserved. No part of these publications shall be reproduced, stored in a retrieval system, or transmitted by any means - electronic, mechanical, photocopying, recording or otherwise - without written permission from the publisher, except for the inclusion of brief quotations in a review or academic work.
All source code in this article is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. That means you can copy it and use it (even commerically), but you can't sell it and you must use attribution.