Making the most of Content Caching

ZXTM’s Content Caching capability allows the ZXTM Traffic Manager to identify web page responses that are the same for each request and to remember (‘cache’) the content. The content may be ‘static’, such as a file on disk on the web server, or it may have been generated by an application running on the web server.

Why use Content Caching?

When another client asks for content that ZXTM has cached in its internal web cache, ZXTM can return the content directly to the client without having to forward the request to a back-end web server.

This has the effect of reducing the load on the back-end web servers, particularly if ZXTM has detected that it can cache content generated by complex applications which consume resources on the web server machine.

What are the pitfalls?

A content cache may store a document that should not be cached

ZXTM conforms to the recommendations of RFC 2616, which describe how web browsers and servers can specify cache behaviour. However, if a web server is misconfigured and does not provide the correct cache control information, a TrafficScript rule can be used to override ZXTM's default caching logic.

A content cache may need a very large amount of memory to be effective

ZXTM allows you to specify precisely how much memory you wish to use for your cache, and to impose fine limits on the sizes of files to be cached and the duration that they should be cached for. When running on a 64-bit platform, ZXTM can easily overcome the 2-4 GB memory limits of legacy 32-bit platforms.

How does it work?

Not all web content can be cached. Information in the HTTP request and the HTTP response drives ZXTM’s decisions as to whether or not a request should be served from the web cache, and whether or not a response should be cached.

Requests

  • Only HTTP GET and HEAD requests are cacheable. Requests using any other method are not cacheable.
  • The Cache-Control header in an HTTP request can force ZXTM to ignore the web cache and to contact a back-end node instead.
  • Requests that use HTTP basic-auth are uncacheable.

Responses

  • The Cache-Control header in an HTTP response can indicate that an HTTP response should never be placed in the web cache.

    The header can also use the max-age value to specify how long the response may remain in the cache. This may cause a response to be cached for less than the configured webcache!time parameter.

  • HTTP responses can use the Expires header to control how long to cache the response for. Note that using the Expires header is less efficient than using the max-age value in the Cache-Control response header.
  • The Vary HTTP response header controls how variants of a resource are cached, and which variant is served from the cache in response to a new request.

If a web application wishes to prevent ZXTM from caching a response, it should add a ‘Cache-Control: no-cache’ header to the response.
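
If the application itself cannot be changed, the same header could be added on its behalf by a response rule. The following is only a minimal sketch: the /account/ path is a hypothetical example, and it relies on response rewriting taking place before ZXTM tests the response for cacheability.

```
# Response rule (sketch): mark everything under a hypothetical /account/
# path as uncacheable on behalf of an application that cannot be changed
if( string.startsWith( http.getPath(), "/account/" ) ) {
   http.setResponseHeader( "Cache-Control", "no-cache" );
}
```

Unlike a purely internal cache setting, the header remains in the response, so browsers and intermediate proxies that honour it will see it too.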

Debugging ZXTM's Cache Behaviour

You can use the global setting webcache!verbose if you wish to debug your cache behaviour. This setting is found in the Cache Settings section of the System, Global Settings page. If you enable this setting, ZXTM will add a header named ‘X-Cache-Info’ to the HTTP response to indicate how the cache policy has taken effect. You can inspect this header using ZXTM's access logging, or using the LiveHTTPHeaders extension in Mozilla-based browsers.

X-Cache-Info values

  • X-Cache-Info: cached
  • X-Cache-Info: caching
  • X-Cache-Info: not cacheable; request had a content length
  • X-Cache-Info: not cacheable; request wasn't a GET or HEAD
  • X-Cache-Info: not cacheable; request specified "Cache-Control: no-store"
  • X-Cache-Info: not cacheable; request contained Authorization header
  • X-Cache-Info: not cacheable; response had too large vary data
  • X-Cache-Info: not cacheable; response file size too large
  • X-Cache-Info: not cacheable; response code not cacheable
  • X-Cache-Info: not cacheable; response contains "Vary: *"
  • X-Cache-Info: not cacheable; response specified "Cache-Control: no-store"
  • X-Cache-Info: not cacheable; response specified "Cache-Control: private"
  • X-Cache-Info: not cacheable; response specified "Cache-Control: no-cache"
  • X-Cache-Info: not cacheable; response specified max-age <= 0
  • X-Cache-Info: not cacheable; response specified "Cache-Control: no-cache=..."
  • X-Cache-Info: not cacheable; response has already expired
  • X-Cache-Info: not cacheable; response is 302 without expiry time

Overriding ZXTM's default cache behaviour

Several TrafficScript cache control functions are available to facilitate the control of ZXTM’s caching behaviour. In most cases, these functions eliminate the need to manipulate headers in the HTTP requests and responses.

  • http.cache.disable()

    Invoking http.cache.disable() in a response rule prevents ZXTM from caching the response.

  • http.cache.enable()

    Invoking http.cache.enable() in a response rule reverts the effect of a previous call to http.cache.disable(). It causes ZXTM’s default caching logic to take effect.
    Note that it is possible to force ZXTM to cache a response that would normally be uncacheable by rewriting the headers of that response using TrafficScript (response rewriting occurs before cacheability testing) - see below for an example.

  • http.cache.setkey()

    The http.cache.setkey() function is used to differentiate between different versions of the same request, in much the same way that the Vary response header functions. It is used in request rules, but may also be used in response rules.

    It is more flexible than the RFC 2616 Vary support, because it lets you partition requests on any calculated value: for example, different content based on whether the source address is internal or external, or whether the client’s User-Agent header indicates an IE or Gecko-based browser.

http.cache.enable() and http.cache.disable() allow you to easily implement either a 'default on' or a 'default off' policy: either you cache everything cacheable unless you explicitly disallow it, or you only let ZXTM cache things you explicitly allow.

For example, you may have identified a particular set of transactions, out of a large working set, that account for 90% of your web server usage; you wish to cache just those requests, and not let less painful transactions knock them out of the cache. Alternatively, you may be trying to cache a web-based application that is not HTTP compliant, in that it does not properly mark pages that are not cacheable, and caching them would break the application. In this scenario, you wish to enable caching only for particular code paths that you have tested and found not to break the application. An example TrafficScript rule implementing a 'default off' policy might be:

# Only cache what we explicitly allow
http.cache.disable();
if( string.regexmatch( http.getPath(), "^/sales/(order|view)[.]asp" )) {
  # these are our most painful pages for the DB, and are cacheable
  http.cache.enable();
}
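
The opposite, 'default on' policy could be sketched in the same way. ZXTM's default logic already caches everything it considers cacheable, so the response rule only needs to name the exceptions. The /account/ and /checkout/ paths below are hypothetical placeholders, not part of the original example:

```
# Cache everything cacheable, except what we explicitly disallow
if( string.regexmatch( http.getPath(), "^/(account|checkout)/" ) ) {
  # hypothetical per-user pages that must never be served from the cache
  http.cache.disable();
}
```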

As an example of http.cache.setkey(), suppose that your web service returns different versions of your home page depending on whether the client is coming from an internal network (10.0.0.0/8) or an external network. If you were to put a content cache in front of your web service, you would normally need to arrange for your web server to send a Cache-Control: no-cache header with each response so that the page was not cached.

With ZXTM 4.0, you can override that behaviour. Use the following request rule to manipulate the request and set a 'cache key' so that ZXTM caches the two different versions of your page:

# We're only concerned about the home page...
if( http.getPath() != "/" ) break;

# Set the cache key depending on where the client is located
$client = request.getRemoteIP();
if( string.ipmaskmatch( $client, "10.0.0.0/8" ) ) {
   http.cache.setkey( "internal" );
} else {
   http.cache.setkey( "external" );
}

# Remove the Cache-Control response header - it's no longer needed!
http.removeResponseHeader( "Cache-Control" );
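
The same technique can be applied to the User-Agent case mentioned earlier. The rule below is a rough sketch only: it assumes that looking for the substring "MSIE" in the User-Agent header is a good enough test for Internet Explorer on your site, and the key names "ie" and "other" are arbitrary labels.

```
# Request rule (sketch): keep separate cached copies of the home page
# for Internet Explorer and for all other browsers
if( http.getPath() != "/" ) break;

$ua = http.getHeader( "User-Agent" );
if( string.contains( $ua, "MSIE" ) ) {
   http.cache.setkey( "ie" );
} else {
   http.cache.setkey( "other" );
}
```

Keeping the number of distinct keys small matters here, because each key value results in a separate cached copy of the page.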

Forcing pages to be cached

You may have an application, say a JSP page, that declares itself uncacheable, but you know that under certain circumstances it is in fact cacheable, and you want to force ZXTM to cache the page because it makes heavy use of resources on the web server.

You can force ZXTM to cache such pages by rewriting their response headers; any TrafficScript rewrites happen before the content caching logic is invoked, so you can perform extremely fine-grained caching control by manipulating the HTTP response headers of pages you wish to cache.

In this example, we have a JSP page that sets a 'Cache-Control: no-cache' header, which prevents ZXTM from caching the page. We can make this response cacheable by removing the Cache-Control header (and potentially the Expires header as well), for example:

if( http.getPath() == "/testpage.jsp" ) {
  # We know this request is cacheable, even though this JSP sets a Cache-Control header
  http.removeResponseHeader( "Cache-Control" );
}

Granular cache timeouts

For extra control, you may wish instead to use the http.setResponseHeader() function to set a Cache-Control header with a max-age= parameter to specify exactly how long this particular piece of content should be cached for, or to add a Vary header to specify which parts of the input request this response depends on (e.g. user language, or cookie). You can use these methods to set cache parameters on entire sets of URLs (e.g. all *.jsp) or on individual requests for maximum flexibility.
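
As an illustration, a response rule along these lines might cover all JSP pages. The 30-second lifetime and the choice of Accept-Language as the varying request header are assumptions made for the sake of the example, not recommendations:

```
# Response rule (sketch): cache every .jsp page for 30 seconds,
# keeping one cached variant per client language
if( string.endsWith( http.getPath(), ".jsp" ) ) {
   http.setResponseHeader( "Cache-Control", "max-age=30" );
   http.setResponseHeader( "Vary", "Accept-Language" );
}
```

Because max-age can cause a response to be cached for less than the configured webcache!time, this approach is best suited to shortening the lifetime of selected content.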

Forcing ZXTM to revalidate its cache

The easiest method is to use 'Control-F5' in MSIE, or 'Shift-reload' in Firefox. Both send a request with a Cache-Control: no-cache header, which tells ZXTM not to answer from the cache but instead to send the request on to the origin server. The cache is then updated with the response to this request.

Owen Garrett [Zeus Dev Team], 20 October 2005

Comments:

This public messageboard is not a forum for technical support. To report technical support problems, please contact our dedicated Support team using the instructions at the bottom of this page.

Comment from: Aaron [Visitor]
Does ZXTM's cache reside in memory or on disk?
Permalink 03 March 2006 @ 22:59
Comment from: Owen Garrett [Zeus Dev Team]
It's entirely in memory (and you can tune how much memory you want to use). If you have a large, well-used disk-backed cache, 99% of the hot content will sit in memory buffers anyway. Memory isn't expensive, and with ZXTM's 64-bit support, you can have pretty much as much memory as you want. Initial testing with a disk-based cache, and experience managing the various caches on ZWS, led us to this design.
Permalink 07 March 2006 @ 11:07
Comment from: Jason Mckelliget [Visitor] · http://www.ioko.com
How would the caching mechanism handle dynamic content? We have a CMS that allows in context editing of text, images etc. Once approved the content is published to the delivery site. We have over 40 sites with frequently changing content. How would the ZXTM caching mechanism handle this? would we just have to tune the TTL to something we can live with or can the device actively detect content changes. Thanks
Permalink 24 May 2006 @ 12:24
Comment from: Jacques Talbot [Visitor]
We need to cache 80 million (indeed) small files (average 20KB), very static (updated monthly). Obviously in-memory caching is not an option, so ZXTM does not do the job. Any idea if some other Zeus stuff can handle this TB of files?
Permalink 01 June 2006 @ 12:59
Comment from: Owen Garrett [Zeus Dev Team]
Hi Jason,

You're correct - you tune the cache TTL globally for a virtual server (defaults to 600 seconds). You can then set a shorter TTL using a cache-control max-age response header. Either your web application can set this header in the response, or you can programmatically set it in a TrafficScript rule. You can also programmatically control which content is cached and which is not.

ZXTM will not attempt to refresh the content from the web server until the TTL runs out. The intention of content caching is to reduce the load on the web server. If your content changes frequently, you can set a low TTL - several seconds would be sensible.

If you want, you can force ZXTM to clear out its content cache; for example, if you've made a large content change. Use the 'clear content cache' button in the UI; the next ZXTM release will include a SOAP method to do the same.
Permalink 05 June 2006 @ 12:11
Comment from: Owen Garrett [Zeus Dev Team]
Hi Jacques,

Content caching is valuable for 'hot' content that is requested frequently (i.e., several times per minute). If you've got lots of warm content, there is little benefit in caching it, as it is unlikely that a cached version will be reused before it expires.

If you find that a small proportion of your 80 million files is used very frequently, then this 'hot' portion can be usefully cached. However, if your 80 million files are evenly used, then caching will not help you.

You need to take a look at your request distribution and determine how spikey it is. For example:

* What is the total size of the different files requested per minute? Your cache should be at least twice this amount - is this possible?
* What proportion of these files are requested in the next minute? This will give you an estimate of your cache hit rate.
* If your cache is too small, try looking at 30-second intervals or less.

If your hit rate is 85%, you'll reduce traffic to 15% of the original, which would be great.

If your hit rate is 20%, you'll only save 20% of your server-side traffic, which probably is not worth it.

I'd recommend taking a look at Zeus Web Server for your requirements, with the content stored as files on local disk if possible, to avoid the round trip get.

One of our customers has solved a similar problem. They were hosting very large numbers of image files, and the full image set was too big to store locally on one web server. They split the images into several smaller sets (for example, by filename or directory name) so that they could easily determine which set an image was in, and set up pairs of web servers that hosted each image set on local disk. A TrafficScript rule on ZXTM was used to load-balance each request across the correct pair of web servers.

Get in touch if you'd like any more information - our consultancy team would be happy to discuss possible solutions.
Permalink 05 June 2006 @ 13:20
Comment from: Simon [Visitor]
Is there a way to set Cache-control: no-cache after an item has been cached in ZXTM? We would like to cache pages in ZXTM but not allow other up stream proxies to cache our content.
Permalink 11 October 2006 @ 00:56
Comment from: michael [Zeus Dev Team]
Hi Simon,

this functionality has been added in ZXTM 4.2. There is a virtual server-specific configuration option, webcache!control_out, whose value will be added to every cached HTTP response for the given virtual server as the Cache-Control header.

This can be configured in Virtual Servers > NAME > Content Caching .

Note, however, that there is no guarantee that up-stream proxies respect this setting (well-behaved ones do, of course).
Permalink 22 March 2008 @ 18:52