Dave Johnson on open web technologies, social software and Java
Displaying a blog page can take dozens of database queries, and database queries can be expensive. They take time, consume CPU cycles and typically use network bandwidth. Roller's built-in caching system addresses this problem by caching generated pages and feeds. By default, Roller caches pages and feeds in memory using a Least Recently Used (LRU) algorithm, and the default caches are sized for a 100-blog system. If you are running a site with more blogs or a very high-traffic site, you should consider changing the caching configuration. First, let's discuss how the caches work.
Cache invalidation and expiration
When Roller generates a page, it puts a copy of that page in a cache. The next time that a request comes in for that page, Roller returns the page from the cache. When a blog changes, Roller invalidates the blog's cache entries, i.e. it throws that blog's pages out of the cache. And by default, when the cache is full and we need to add a new entry to the cache, we push out the least recently used entry in the cache to make room; that's the LRU algorithm I mentioned before.
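To make the idea concrete, here is a minimal sketch of an LRU page cache with per-blog invalidation. It is not Roller's actual code: the class name LruPageCache, its methods and the "blog:path" key format are all made up for illustration, and it leans on the access-order mode of java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an LRU page cache; not Roller's actual classes. */
class LruPageCache {
    private final Map<String, String> pages;

    LruPageCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.pages = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Push out the least recently used entry once we are over capacity.
                return size() > maxEntries;
            }
        };
    }

    /** Keys are assumed to look like "blogHandle:pagePath". */
    synchronized void put(String blog, String path, String html) {
        pages.put(blog + ":" + path, html);
    }

    /** Returns the cached page, or null on a cache miss. */
    synchronized String get(String blog, String path) {
        return pages.get(blog + ":" + path);
    }

    /** Throws every cached page of the given blog out of the cache. */
    synchronized void invalidate(String blog) {
        pages.keySet().removeIf(key -> key.startsWith(blog + ":"));
    }

    synchronized int size() {
        return pages.size();
    }
}
```

With a capacity of two, putting pages for three blogs in a row evicts whichever of the first two was touched least recently, and calling invalidate("alice") removes only alice's pages.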
Sometimes, a blog page includes things that change frequently, like a list of referrers, a server-side hit counter or data from some other source. We don't want to invalidate a blog's cache entries every time a hit is counted. That would defeat the purpose of the cache. So, by default Roller uses an expiring cache that automatically invalidates cache entries after a timeout period.
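Here is a rough sketch of how an expiring cache can work, again with made-up names (ExpiringPageCache is not a Roller class). Each entry remembers when it was stored and is treated as a miss once it is older than the timeout. The clock is passed in as an assumption of this sketch so that the behavior is easy to test; it leaves out the LRU size limit to keep things short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a cache whose entries expire after a timeout. */
class ExpiringPageCache {
    private static final class Entry {
        final String html;
        final long storedAtMillis;

        Entry(String html, long storedAtMillis) {
            this.html = html;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock; // supplies the current time in milliseconds

    ExpiringPageCache(long timeoutSeconds, LongSupplier clock) {
        this.timeoutMillis = timeoutSeconds * 1000L;
        this.clock = clock;
    }

    synchronized void put(String key, String html) {
        entries.put(key, new Entry(html, clock.getAsLong()));
    }

    /** Returns the cached page, or null if it is missing or has timed out. */
    synchronized String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.storedAtMillis >= timeoutMillis) {
            entries.remove(key); // expired: drop it so the page gets regenerated
            return null;
        }
        return entry.html;
    }
}
```

In a real application you would pass System::currentTimeMillis as the clock. A page with a hit counter then shows a count that is at most one timeout period out of date, without any invalidation on each hit.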
To configure the Roller caches, you add properties to your roller-custom.properties properties override file. You can learn more about this override in Section 6 of the Roller 4.0 Installation Guide and you can find a complete list of the properties you can override in Section 11.
First, let's cover the default caching mechanism. If you're running a large and high-traffic site, you might want to consider using the non-expiring cache or setting the cache timeout very high (4, 6 or 12 hours). Here's how you tell all caches to use the non-expiring cache:
cache.defaultFactory=org.apache.roller.util.cache.LRUCacheFactoryImpl
However, if you do that, then blogs that use Roller's built-in hit counter or that display referrers will not be updated as often as your users would like. So, you might want to consider removing the #showReferrersList() macro from any themes in use on your site.
Configuring Roller's four page and feed caches
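You can configure caching differently for the different types of pages and feeds produced by Roller. There are four separately configurable caches. Here are their names and an explanation of each:
weblogpage: the weblog page cache, which holds all the weblog content
weblogfeed: the feed cache, which holds XML feeds such as RSS and Atom
sitewide: the site-wide cache, which holds all content for the site-wide front-page weblog
planet: the planet cache, which holds planet feeds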
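And for each one of these caches you can configure these properties:
enabled: true to turn the cache on, false to turn it off
size: the maximum number of entries the cache can hold
timeout: how long an entry stays in the cache, in seconds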
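Cache property names begin with cache. followed by the cache name and then the property name, as in cache.weblogpage.enabled. Here are the default settings: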
# Weblog page cache (all the weblog content)
cache.weblogpage.enabled=true
cache.weblogpage.size=400
cache.weblogpage.timeout=3600

# Feed cache (xml feeds like rss, atom, etc)
cache.weblogfeed.enabled=true
cache.weblogfeed.size=200
cache.weblogfeed.timeout=3600

# Site-wide cache (all content for site-wide frontpage weblog)
cache.sitewide.enabled=true
cache.sitewide.size=50
cache.sitewide.timeout=1800

# Planet cache (planet feeds)
cache.planet.enabled=true
cache.planet.size=10
cache.planet.timeout=1800
The default cache configurations above are set up for a 100-weblog system. To some extent, this is guesswork. For example, we've decided to cache 4 pages and 2 feeds for each blog. That's how we arrived at cache.weblogpage.size=400 and cache.weblogfeed.size=200. And we've decided to cache weblog pages and feeds for one hour, and site-wide and planet content for 30 minutes. Timeouts are in seconds, so that's how we arrived at cache.weblogpage.timeout=3600 and cache.weblogfeed.timeout=3600.
You might decide to do things a little differently on your Roller system. Copy the properties above to your roller-custom.properties file and set them to values you think are appropriate for the number of weblogs, average page size, traffic levels and JVM heap size of your Roller installation.
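If you want a starting point for a bigger site, the arithmetic behind the defaults is easy to repeat. The sketch below is only that, a sketch: CacheSizing and suggest are hypothetical names, and scaling the sizes linearly with the number of blogs is an assumption that ignores page size and heap limits, so check the results against your own JVM memory before using them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that repeats the sizing arithmetic behind the defaults. */
class CacheSizing {
    /**
     * Suggests page and feed cache settings for a site.
     * Sizes are entries per cache; timeouts are converted from hours to seconds.
     */
    static Map<String, Integer> suggest(int blogs, int pagesPerBlog,
                                        int feedsPerBlog, int timeoutHours) {
        int timeoutSeconds = timeoutHours * 3600;
        Map<String, Integer> props = new LinkedHashMap<>();
        props.put("cache.weblogpage.size", blogs * pagesPerBlog);
        props.put("cache.weblogpage.timeout", timeoutSeconds);
        props.put("cache.weblogfeed.size", blogs * feedsPerBlog);
        props.put("cache.weblogfeed.timeout", timeoutSeconds);
        return props;
    }

    public static void main(String[] args) {
        // Example: 1000 blogs, 4 pages and 2 feeds per blog, 4 hour timeout.
        suggest(1000, 4, 2, 4).forEach((name, value) ->
                System.out.println(name + "=" + value));
    }
}
```

Calling suggest(100, 4, 2, 1) gives back the default page and feed values shown above, which is a quick sanity check that the arithmetic matches.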
Roller's default cache configuration will work well without modification for a small to medium-sized Roller installation, but for large, high-traffic sites you should increase cache sizes and think carefully about timeouts. And if you're running Roller in a cluster, you might want to consider using a distributed caching system like memcached. I'll discuss that in my next HOWTO.