Tuesday, November 03, 2015

A simple explanation of etags

I recently worked on an app where I was responsible for some of the performance improvements on the networking side. This applied to both the speed and size of the data being transferred. It was an app that retrieved a lot of content over HTTP.

After adding gzip support, the next thing I wanted to add was support for etags (entity tags), as they were the caching method supported by the server.

Amongst the other developers on the team there was some scepticism about the value of this. They'd not used etags before and so some assumed it wouldn't be worth doing.
It turns out this scepticism was due to ignorance. As they'd not used them before they didn't know how they worked.

With so much of software development involving data retrieved over HTTP, I'm still surprised when developers aren't familiar with some of the more common HTTP headers.

While working on this, my son (age 8) asked what I was doing at work. Here's how I explained it to him and how I've since explained it to other developers who have found the simplicity helpful.

Imagine I wanted to know a list of your friends.

You tell me that they are Archie, Bob and Charlie. Additionally you tell me that these are represented by special code 1.

The next time I ask you for a list of friends I can say that if they can still be represented by special code 1 then you can just tell me that they haven't changed.

That time you respond that they haven't changed.

Some time passes and I ask again, still stating that I know what they are if special code 1 applies.

This time you reply that they are now Archie, Bob and Dan and these are represented by special code 2.

Some more time passes and I can ask the question again, this time stating that I know the answer if it can still be represented by special code 2.

So, in the above example the "special code" is an etag.
In the HTTP world the server responds with the content requested and an etag value in the ETag response header.
On subsequent requests I include the previously returned etag in the If-None-Match request header.
If the content no longer matches that tag then a full response is returned, like the first one, with the new content and a new etag. If the content hasn't changed (and does still have the same tag) then the server responds with a 304 (Not Modified) status and no body.
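
To make the exchange concrete, here is a minimal sketch in Python of the decision a server makes. It is only an illustration: the function names (make_etag, handle_get) are made up for this post, and hashing the content is just one assumed way of producing a tag. Real servers are free to use version numbers or anything else, and the real header also allows lists of tags and weak tags, which this ignores.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: derive the tag from a hash of the content.
    # Etag values are quoted strings, hence the surrounding quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a request for `body`.

    Only a single, exact tag is compared in this sketch.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already has this content: no body is sent.
        return 304, {"ETag": etag}, b""
    # First request, or the content has changed: send everything.
    return 200, {"ETag": etag}, body


# The list-of-friends conversation from above.
friends = b"Archie, Bob, Charlie"
status1, headers1, body1 = handle_get(friends)       # first time asking
tag1 = headers1["ETag"]                              # "special code 1"

status2, headers2, body2 = handle_get(friends, tag1)  # nothing has changed

friends = b"Archie, Bob, Dan"                        # the list changes
status3, headers3, body3 = handle_get(friends, tag1)  # still asking with code 1
tag2 = headers3["ETag"]                              # "special code 2"
```

The second call is the interesting one: the status is 304 and the body is empty, so the only cost is the headers.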

The value of all this is that you can be spared the network overhead of the server returning data that is the same as what you've already received. This saves the time and cost (if on a mobile network or other connection where you pay for data) of the data transfer. You also avoid having to process data that matches what you already have.

There are downsides too, though. You have to track the different etags for different requests. You may have to store the previous responses. (Especially if you want to use etags beyond a single use of the app.) You also have to write your code such that it can handle a 304 response. This may mean that you can't treat a response with no body as an error. (Something I see a lot of people do.)
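
The client-side bookkeeping could look something like the sketch below. Everything here is hypothetical: EtagCache and the fetch callable are names I've invented for illustration, the cache only lives in memory (so it wouldn't survive beyond a single run of an app), and the actual HTTP call is passed in rather than tied to any particular library.

```python
from typing import Callable, Dict, Tuple

# (status, headers, body) as returned by whatever performs the HTTP call.
Response = Tuple[int, Dict[str, str], bytes]
# A callable taking (url, request_headers); a stand-in for a real HTTP client.
Fetch = Callable[[str, Dict[str, str]], Response]


class EtagCache:
    """Remembers the last etag and body seen for each URL."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        request_headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Tell the server which version we already hold.
            request_headers["If-None-Match"] = cached[0]

        status, headers, body = self._fetch(url, request_headers)

        if status == 304 and cached is not None:
            # An empty body is expected here and is not an error.
            return cached[1]
        if status == 200:
            # Simplification: real header names are case-insensitive.
            etag = headers.get("ETag")
            if etag is not None:
                self._entries[url] = (etag, body)
            else:
                self._entries.pop(url, None)
            return body
        raise RuntimeError(f"unexpected status {status} for {url}")


# A fake server for demonstration; it always holds version "v1".
calls = []

def fake_fetch(url: str, request_headers: Dict[str, str]) -> Response:
    calls.append(dict(request_headers))
    if request_headers.get("If-None-Match") == '"v1"':
        return 304, {"ETag": '"v1"'}, b""
    return 200, {"ETag": '"v1"'}, b"Archie, Bob, Charlie"


cache = EtagCache(fake_fetch)
first = cache.get("/friends")    # full download
second = cache.get("/friends")   # 304, answered from the stored copy
```

The caller gets the same bytes back both times and never has to know whether they came over the network or from the stored copy.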

On the team mentioned above there was concern about the impact that adding the extra request header would have on the total amount of data sent. In reality this overhead was much smaller than the several kilobytes of data we were saving on each not modified response. By adding support for etags we were able to substantially reduce the amount of data the app consumed.
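
For a rough feel for the trade-off, the arithmetic can be sketched as below. The numbers are assumptions chosen for illustration, not measurements from the app: a made-up 16-character tag and a 4 KB body standing in for "several kilobytes". It also ignores response headers and any header compression, and it only applies when the content is unchanged. When the content has changed, the extra header is pure overhead.

```python
def request_header_cost(etag: str) -> int:
    # Bytes one extra HTTP/1.1 header line adds to a request:
    # name, colon and space, value, then CRLF.
    return len(f"If-None-Match: {etag}\r\n".encode("ascii"))


def bytes_saved(body_size: int, etag: str) -> int:
    # Net saving when the server answers 304 instead of resending the body.
    return body_size - request_header_cost(etag)


etag = '"5d8c72a5edda8d6a"'   # hypothetical tag value
body_size = 4 * 1024          # assumed body size in bytes

overhead = request_header_cost(etag)
saving = bytes_saved(body_size, etag)
```

With these assumed values the header costs a few tens of bytes per request while each not modified response saves roughly the whole body.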

This isn't the only way to avoid re-downloading data that hasn't changed, and there are situations where it isn't appropriate or even possible, but I think it's probably the simplest and definitely one to be aware of.

