Posted in Better Programming, Software Engineering

Learn the basics of Web Caching

Caching is a mechanism by which responses from a web server, such as pages and images, are stored so that when a client requests the same resource again, the response is served from the cache instead of being requested from the web server again.

Why is it important to use caching?

  • Reduce the number of requests sent to the server
  • Reduce the latency of responses for the client by serving content from a nearby cache instead of a remote server
  • Reduce network bandwidth usage by minimizing the number of times a resource is sent over the network from the web server

There are two main types of web caches. Let’s take a look at them.

Browser Cache

If you use Chrome on a Mac, you can find the contents of the browser cache under the following path in your home directory.

~/Library/Caches/Google/Chrome/

The browser cache stores pages, files, and images so that they load faster on the user’s next visit. When a user clicks the back or forward button in the browser, the content is typically served directly from the cache. Cached content is refreshed once it expires after a certain amount of time or, depending on the browser's settings, in every browser session.

How is the browser cache controlled?

There are several caching headers that define cache policy. Let’s look at the Cache-Control header in the following example, which is set to private. This means that only a private cache, such as the user's browser cache, can store the response.

Source: redbot.com
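
To make the private directive concrete, here is a minimal Python sketch of how a cache might read a Cache-Control header value. The function names parse_cache_control and storable_by_shared_cache are made up for illustration, and the parsing is simplified: it assumes that directive arguments do not contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into a {directive: argument} mapping."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        # "max-age=600" has an argument, "private" does not.
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def storable_by_shared_cache(header_value: str) -> bool:
    """A shared cache must not store responses marked private or no-store."""
    directives = parse_cache_control(header_value)
    return "private" not in directives and "no-store" not in directives


print(parse_cache_control("private, max-age=600"))
print(storable_by_shared_cache("private, max-age=600"))
```

With the header value "private, max-age=600", the second call reports that a shared cache may not store the response, while a browser cache still can.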

The Cache-Control header accepts several directives, and a few other headers also influence caching.

Caching Header | Description
--- | ---
Cache-Control: no-store | Nothing should be cached about the request or response
Cache-Control: no-cache | The response may be stored, but the cache must send a validation request to the server before serving it from cache
Cache-Control: private | The response is only applicable to a single user and must not be stored by a shared cache
Cache-Control: public | The response can be stored by any cache
Expires | This header contains the date/time after which the response is considered stale. Ex: Expires: Wed, 22 Sep 2021 12:00:00 GMT
ETag | The entity-tag given in an ETag header field is used for cache validation. One or more entity-tags, indicating one or more stored responses, can be used in an If-None-Match header by the client for response validation
Last-Modified | The timestamp given in a Last-Modified header can be used by the client in an If-Modified-Since header field for response validation
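
The validation headers are easier to follow with an example. The sketch below shows how a server might answer a conditional request: 304 (Not Modified) when the client's stored copy is still valid, 200 otherwise. The function name conditional_status and the plain dict of request headers are assumptions for illustration, not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Drop the weak-validator prefix so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the client's stored copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may list several entity-tags; "*" matches any of them.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates:
            return 304
        current = _opaque(etag)
        return 304 if any(_opaque(tag) == current for tag in candidates) else 200

    # If-Modified-Since is only consulted when If-None-Match is absent.
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution.
        return 304 if last_modified.replace(microsecond=0) <= since else 200

    return 200


modified = datetime(2021, 9, 22, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status({"If-None-Match": '"abc"'}, '"abc"', modified))
```

A 304 response carries no body, which is how validation saves bandwidth: the cache keeps using the copy it already has.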

For further reading, please refer to this detailed article on HTTP Caching.

Proxy Cache

Most web services these days place a proxy server in front of their web servers as a gateway that handles requests first. When a server acts as a caching proxy, it stores content and shares it among many users, which is why this type of cache is also known as a shared cache. When a user sends a request, the proxy server checks whether it holds a recent copy of the resource. If it does, the copy is sent back to the user; otherwise the proxy requests the resource from the origin server and caches the response.
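
The lookup flow described above can be sketched in a few lines of Python. The CachingProxy class and the fetch_from_origin callback are hypothetical names, and the sketch deliberately ignores freshness and cache policy: it keeps every response forever, which a real proxy would not do.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy shared cache: serve a stored copy, else ask the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # counts how often the origin was contacted

    def get(self, url: str) -> bytes:
        if url in self._store:
            return self._store[url]  # cache hit: the origin is not contacted
        self.origin_requests += 1    # cache miss: forward to the origin
        body = self._fetch(url)
        self._store[url] = body      # keep the response for later users
        return body


proxy = CachingProxy(lambda url: f"content of {url}".encode())
proxy.get("/logo.png")  # first user: fetched from the origin
proxy.get("/logo.png")  # second user: served from the shared cache
print(proxy.origin_requests)
```

Two users asked for the same resource, but the origin server was contacted only once.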

CDNs (Content Delivery Networks) are one of the most popular kinds of caching proxy. A CDN is a large network of servers distributed geographically around the world so that content can be served from a server close to the user sending the request. When configured properly, a CDN can also help a web service mitigate DDoS (Distributed Denial of Service) attacks.

What is cached?

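HTTP caches usually cache responses to GET requests. These can be HTML documents, images, style sheets, or files such as media and JavaScript. Responses to authenticated requests (those carrying an Authorization header) are not stored by shared caches unless the response explicitly allows it, and HTTPS traffic cannot be cached by an intermediary that does not terminate the encrypted connection. It is also possible to cache permanent redirects and error responses such as 404 (Not Found).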

  • If the cached content is fresh (it has not expired according to the Expires header or the max-age directive), it is served directly from the cache. There are other ways to determine freshness and perform cache validation, but we won’t go into the details here. I encourage you to read up on them if you’re interested.
  • If the content is stale, the cache validates it with the origin server before reusing it. The must-revalidate Cache-Control directive tells the cache that it must not serve stale content without a successful validation.
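
The two cases above can be sketched as a small decision function. This is a simplified illustration, not a complete freshness algorithm: the name cache_decision and its two return labels are made up, only max-age and no-cache are considered, and a response without max-age is treated as stale even though a real cache could fall back to Expires or a heuristic.

```python
def cache_decision(cache_control: str, age_seconds: int) -> str:
    """Decide whether a stored response can be reused without contacting the server."""
    directives = {}
    for part in cache_control.split(","):
        name, sep, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value if sep else None

    # no-cache always requires validation, even for a fresh response.
    if "no-cache" in directives:
        return "revalidate"

    # Fresh means the response's age is still below its max-age lifetime.
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit() and age_seconds < int(max_age):
        return "serve-from-cache"

    # Stale (or no max-age given): check with the origin server first.
    return "revalidate"


print(cache_decision("public, max-age=600", age_seconds=30))
print(cache_decision("public, max-age=600", age_seconds=900))
```

The first call concerns a response stored 30 seconds ago with a 10-minute lifetime, so it can be served from the cache. The second is past its lifetime and has to be revalidated.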

The primary cache key consists of the request method and the target URI (Uniform Resource Identifier). Because HTTP caches are mostly limited to GET, many caches simply decline to store responses to other methods and use the URI alone as the primary key.
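
As a rough illustration of the cache key, the sketch below builds a key from the method and URI and refuses anything other than GET. The helper name cache_key is hypothetical, and real caches may normalize URIs or add secondary keys that are not shown here.

```python
from typing import Optional, Tuple


def cache_key(method: str, uri: str) -> Optional[Tuple[str, str]]:
    """Build the primary cache key, or return None if the request is not cached."""
    method = method.upper()
    if method != "GET":
        return None  # this toy cache only stores responses to GET
    return (method, uri)


# The same content under two different URIs occupies two cache entries.
print(cache_key("GET", "/img/logo.png") == cache_key("GET", "/img/logo.png?v=2"))
print(cache_key("POST", "/img/logo.png"))
```

Because the URI is part of the key, two URLs that point at the same content are cached separately, so the second request still goes to the server.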

Caching Best Practices

  • Consistent URLs – Use the same URL to serve the same content to users across different pages and sites.
  • Library of content – Keep images and other shared content, such as style sheets, in a single source-of-truth library and refer to that library from every page or site.
  • Avoid bulk modifications – Updating many files at the same time sets the Last-Modified date of all of them to a very recent one, so change only the files that need it.
  • Cache control – Use the appropriate cache control policies. If the response is private to the user, allow only private caching; for generic content, set the caching policy to public.
  • Use caching validators – Use the validation headers we learnt about in the table above, such as ETag and Last-Modified, so that caches can validate their content without having to download the resources from the server unnecessarily.
  • Max-age cache control – Set a long max-age in Cache-Control for pages and images that will be updated only rarely.
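
Several of these practices come down to sending the right response headers. Here is a minimal sketch of how a server might assemble them. The function caching_headers is a hypothetical helper, and deriving the ETag from a truncated SHA-256 hash of the body is just one possible choice, assumed here for illustration.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def caching_headers(body: bytes, modified_at: datetime, private: bool, max_age: int) -> dict:
    """Assemble Cache-Control plus both validators for a response."""
    scope = "private" if private else "public"
    return {
        "Cache-Control": f"{scope}, max-age={max_age}",
        # Content-derived validator: it changes only when the body changes.
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
        # HTTP dates are expressed in GMT; modified_at must be a UTC datetime.
        "Last-Modified": format_datetime(modified_at, usegmt=True),
    }


headers = caching_headers(
    b"body { margin: 0 }",
    modified_at=datetime(2021, 9, 22, 12, 0, 0, tzinfo=timezone.utc),
    private=False,
    max_age=86400,  # one day, for a rarely updated style sheet
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Deriving the ETag from the content means that re-uploading an unchanged file keeps its validator, which avoids the bulk-modification problem mentioned above.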

I hope you enjoyed learning about the basics of web caching! In the next article, we will learn how to implement a simple cache from scratch.