Revisioning Assets using <base>

Posted by Rodney Rehm on Friday, September 27. 2013

One of the first performance optimization tricks you learn is setting a far-future cache expiration date. This prevents browsers from repeatedly fetching, or even revalidating, static resources. The downside of this approach to reducing network overhead is that you need to change a resource's URL in order to have the browser download it again. This post covers the problems encountered when using <base> to achieve exactly that.

TL;DR: Do not use <base href="…"> for cache busting, ever!

Cache Busting

Cache Busting refers to changing the URL a resource can be downloaded from in order to bypass a browser's cache. There are a variety of ways to go about this.

A popular approach simply adds a version number to the query string: /some-resource.js?version=1.2.3. The string 1.2.3 can be anything you like, really. Since a static file cannot change its content depending on the query string, foo.jpg and foo.jpg?version=1.2.3 return exactly the same thing. A web server looks up foo.jpg on its file system, realizes that ?version=1.2.3 has no further impact, and simply ignores it. In other words, we don't need to rename anything at the file system level. We "just" need to change the name of the resources in our HTML, CSS, JavaScript, SVG, et al.
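To make the query string approach concrete, here is a minimal sketch of a helper that appends a version marker to a resource URL. The function name addVersion and the parameter name version are illustrative assumptions, not part of any particular tool.

```javascript
// Append a version marker to a URL's query string.
// `addVersion` and the `version` parameter name are hypothetical.
function addVersion(url, version) {
  // keep a fragment, if any, at the very end of the URL
  var hashIndex = url.indexOf('#');
  var fragment = hashIndex === -1 ? '' : url.slice(hashIndex);
  var rest = hashIndex === -1 ? url : url.slice(0, hashIndex);
  // use "&" if the URL already has a query string, "?" otherwise
  var separator = rest.indexOf('?') === -1 ? '?' : '&';
  return rest + separator + 'version=' + encodeURIComponent(version) + fragment;
}

console.log(addVersion('/some-resource.js', '1.2.3'));
// /some-resource.js?version=1.2.3
```

The helper only touches the reference to the resource; the file on disk keeps its name.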

We have to ask whether ?version=1.2.3 refers to the individual resource or to the whole package of resources. In the latter case an update to one resource (say you added a ; to a JavaScript file) means that an unchanged CSS file would also need to be downloaded again. The former requires you to somehow keep track of (or reproducibly calculate) the version of each individual file. This is often done by hashing the file's content (for example using md5 or sha1).
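A per-file version can be calculated reproducibly from the file's content. The following is a rough sketch using Node.js' built-in crypto module; the helper name contentVersion is an assumption made for illustration, and it defaults to sha1 for no particular reason.

```javascript
var crypto = require('crypto');

// Derive a version string from a file's content (hypothetical helper).
// The same content always yields the same version, any change yields a new one.
function contentVersion(content, algorithm) {
  return crypto.createHash(algorithm || 'sha1').update(content).digest('hex');
}

var before = contentVersion('var a = 1');
var after = contentVersion('var a = 1;'); // someone added a ";"
console.log(before !== after); // true
```

In a build script the content would typically come from reading the file, for example with fs.readFileSync().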

Using this query string thing may be a problem in some cases, as Steve Souders explains. I've seen tools embed the hash in the file's name, so some-resource.js becomes some-resource.THEHASH.js. This adds the overhead of hashing and renaming every single file you've got. Depending on the number of files you're dealing with, this can take anything from a couple of seconds to several minutes.
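Embedding the hash in the file's name boils down to a small string operation. This is only a sketch of the renaming step, with revFileName being a made-up name; real tools also have to rewrite every reference to the renamed file.

```javascript
// Insert a hash in front of the file extension (hypothetical helper):
// some-resource.js -> some-resource.THEHASH.js
function revFileName(fileName, hash) {
  var slashIndex = fileName.lastIndexOf('/');
  var dotIndex = fileName.lastIndexOf('.');
  // no extension (the last dot belongs to a directory or a dotfile): append the hash
  if (dotIndex <= slashIndex + 1) {
    return fileName + '.' + hash;
  }
  return fileName.slice(0, dotIndex) + '.' + hash + fileName.slice(dotIndex);
}

console.log(revFileName('some-resource.js', 'THEHASH'));
// some-resource.THEHASH.js
```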

I work in a CI/CD (Continuous Integration, Continuous Deployment) environment. I usually don't particularly care about how long the CI server has to crunch numbers in order to build my apps. What I do care about is how long the stuff takes to compile on my local development system. 5 minutes is a cigarette break, 15 minutes is unacceptable.

Hello <base>

There's this little-used HTML element <base>. It allows you to change the reference point (base URL) for all relative URLs. The idea is to move all assets into a versioned directory 1.2.3/ and simply add <base href="1.2.3/"> to the HTML files and be done with it. It is so damn simple (and fast) that it seemed obvious to try this out first.
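How the base URL changes resolution can be tried out with the standard URL constructor, which is available in current browsers and in Node.js. The document URL and the file names below are example values chosen for illustration.

```javascript
// The document lives here (example value).
var documentUrl = 'http://example.org/directory/index.html';

// <base href="1.2.3/"> is itself resolved against the document's URL ...
var baseUrl = new URL('1.2.3/', documentUrl).href;

// ... and relative URLs in the document are then resolved against that base.
function resolve(relativeUrl) {
  return new URL(relativeUrl, baseUrl).href;
}

console.log(baseUrl);                   // http://example.org/directory/1.2.3/
console.log(resolve('css/styles.css')); // http://example.org/directory/1.2.3/css/styles.css
console.log(resolve('/favicon.ico'));   // http://example.org/favicon.ico
```

Note that URLs starting with a slash only take the origin from the base, so they do not end up in the versioned directory.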

Too Easy To Be Simple

Well, it works. Mostly. While the cache busting part of the exercise was indeed dead simple, adding <base> cost a couple of hours of head-scratching. Most of the things listed here are fairly obvious when you think about it. While I understand what happened, I didn't expect any of it.

When you change the document's base URL, every relative URL respects that. Even <a href="#some-anchor">. Having pointed the document elsewhere using <base href="kansas/">, all my anchor links, such as <a href="#table-of-contents">, ended up pointing to kansas/#table-of-contents. Logical, but unexpected. More on Internet Explorer's point of view later. To "fix" this, I added a bit of JavaScript that looks somewhat like this:

$(document.body).on('click', 'a[href^="#"]', function(event) {
  event.preventDefault();
  event.stopImmediatePropagation();
  location.hash = this.hash;
});

Yes, accessing .hash on an HTMLAnchorElement is perfectly legal, see URLUtils to learn more. In my case it was only necessary to address <a href="#fragment">, but please be aware that that is the canonical equivalent of <a href="index.html#fragment"> and <a href="http://example.org/directory/index.html#fragment">. The selector shown above will not capture these!
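If those other spellings need to be caught as well, one option is to test each link in JavaScript instead of relying on the selector. The following is a sketch under the assumption that a link counts as "in-page" when its href, resolved against the document's own URL (deliberately ignoring <base>), differs from that URL only by the fragment. The helper names are made up.

```javascript
// Serialize a URL object without its fragment.
function withoutFragment(url) {
  return url.href.replace(/#.*$/, '');
}

// Does rawHref (the attribute as written) point to a fragment of the
// current document when resolved against documentUrl? (hypothetical helper)
function isInPageLink(rawHref, documentUrl) {
  if (rawHref.indexOf('#') === -1) {
    return false;
  }
  var target = new URL(rawHref, documentUrl);
  return withoutFragment(target) === withoutFragment(new URL(documentUrl));
}

var doc = 'http://example.org/directory/index.html';
console.log(isInPageLink('#fragment', doc));            // true
console.log(isInPageLink('index.html#fragment', doc));  // true
console.log(isInPageLink('other.html#fragment', doc));  // false
```

In a click handler this would be called with the link's getAttribute('href') and location.href.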

Our apps add links like <a href="#">do something</a> that trigger some JavaScript behavior. Having added the <base>, our automated tests immediately tripped over event handlers that forgot to properly keep the browser from following the link using event.preventDefault();. Not sure if this can be counted as a win…

Thus far we only had to fix a bit of URL resolution - nothing fancy. Let's have a look at inline SVGs. Some areas were all black in Chrome and invisible in Firefox. It took a while to realize that inlining an SVG changes its document URL as well. Gradients and filters and everything else that we pulled in through referencing elements stopped working. The reason was <radialGradient id="gradient"> being linked to the document's URL, but fill="url(#gradient)" being resolved against <base href="kansas/"> and thus ending up looking for the gradient in the wrong place: fill="url(kansas/#gradient)".

SVG element references can be fixed "easily" by prefixing the fragment URLs with location.pathname. Since pathname doesn't include the query string, you're headed for the next round of WTF when things suddenly go black or invisible again - just because someone added ?return= to the URL. Using the fully qualified URL without the fragment fixes this for good:

// document's URL without the fragment
var baseUrl = location.href.replace(/#.*$/, '');
// find the <rect>s to fix
var elements = document.querySelectorAll('rect');
for (var i = 0, length = elements.length; i < length; i++) {
  var node = elements[i];
  var value = node.getAttribute('fill');
  // skip <rect>s without a fill attribute (getAttribute returns null)
  if (value === null) {
    continue;
  }
  // inject the document's URL into url(#something) values
  var _value = value.replace(/url\((['"]?)#/i, 'url($1' + baseUrl + '#');
  if (value !== _value) {
    node.setAttribute('fill', _value);
  }
}
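The string replacement at the heart of that loop can also be pulled out into a small function, which makes it possible to try it without a DOM. This is a sketch, not the code we shipped; absolutizeFragmentRefs is a made-up name. It uses a replacer function so that a "$" in the document's URL is not mistaken for a replacement pattern.

```javascript
// Rewrite url(#id) references so they point at baseUrl#id (hypothetical helper).
function absolutizeFragmentRefs(value, baseUrl) {
  // a replacer function keeps "$" characters in baseUrl literal
  return value.replace(/url\((['"]?)#/gi, function (match, quote) {
    return 'url(' + quote + baseUrl + '#';
  });
}

var base = 'http://example.org/directory/index.html?return=/home';
console.log(absolutizeFragmentRefs('url(#gradient)', base));
// url(http://example.org/directory/index.html?return=/home#gradient)
```

Values that do not contain a url(#…) reference, such as a plain color, pass through unchanged.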

As with links, a bit of JavaScript was all we needed to "fix" this. Since I was only working with a handful of SVGs that already had some JavaScript initialization going, I didn't mind adding another couple of lines. I was a bit disappointed that adding xml:base="…" to the SVG didn't do anything, though.

The above holds true for IE10, but not so much for IE9. Internet Explorer 9 will erroneously resolve <base href="kansas/"> to http://kansas/. Also IE9 doesn't show the described SVG referencing issue. Fixing the invalid base resolution is quite simple:

While <base href="kansas/">…<img src="some.png" alt=""> has the image's source resolve to http://kansas/some.png, the base itself properly resolves to http://example.org/kansas/. So if we use <base> to resolve itself to an absolute URL, we trick IE9 into proper base resolution. Since SVGs are resolving fill="url(#fragment)" just fine in Internet Explorer (9 and 10), we need to keep it from injecting the document's URL - that's why we use window.BASE_DOCUMENT_URL to store the URL to inject.
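A minimal sketch of the idea described in this paragraph could look like the following. The function name fixBase and the svgNeedsDocumentUrl flag are assumptions made for illustration; how to decide whether the browser needs the document's URL injected (that is, whether it is not Internet Explorer) is deliberately left to the caller.

```javascript
// Sketch only: `fixBase` and `svgNeedsDocumentUrl` are hypothetical names.
function fixBase(baseElement, win, svgNeedsDocumentUrl) {
  // Reading .href yields the absolute URL; writing it back makes the
  // <base> absolute, so relative URLs resolve against the right place.
  baseElement.href = baseElement.href;
  // URL to inject into url(#fragment) references; left empty where
  // the references already resolve correctly.
  win.BASE_DOCUMENT_URL = svgNeedsDocumentUrl
    ? win.location.href.replace(/#.*$/, '')
    : '';
  return win.BASE_DOCUMENT_URL;
}
```

In a page this would be called as fixBase(document.getElementsByTagName('base')[0], window, flag) before any SVG fixing runs, and the SVG code would then inject window.BASE_DOCUMENT_URL instead of computing the URL itself.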

The SVG problem got me thinking about the CSS :target pseudo class. Tests confirmed that it is not resolved against <base>, so I didn't have to touch the CSS. If you, like me, didn't see Captain Obvious here, remind yourself that #some-id:target is a CSS selector, not a URL. I'm not sure if :local-link, a pseudo-class to select links »whose target's absolute URL matches the element's own document URL«, is going to resolve against <base> or not. Right now it isn't implemented anywhere, so let's leave it at that…

So, do we still think that using <base> for cache busting is a beautiful thing? While I managed to get the <base>-based cache busting working for us, I wouldn't recommend this approach to anyone. You can see how ugly things get with this hack. I'm already looking forward to throwing all of this away in a couple of weeks. It's an unmaintainable mess, prone to come back to bite us where the sun don't shine…

Maybe you're (much) better served with using grunt-rev and grunt-usemin instead. (At least that's what we're going to try next :)
