
HOWTO: Configure caching in Apache Roller

Since the early days, Roller has included a pluggable caching system for blog pages and feeds. In Roller 2.1 (early 2006), Sun's Allen Gilliland rewrote the whole cache system and made it much more flexible and much easier to configure. But, apart from comments in the configuration file, we never provided any documentation for the cache system. In this post, I'll start to correct that. I'll explain the basics of how the cache works and how to configure it.

Overview

Displaying a blog page can take dozens of database queries, and database queries can be expensive: they take time, consume CPU cycles and typically use network bandwidth. Roller's built-in caching system addresses this problem by caching generated pages and feeds. By default, Roller caches pages and feeds in memory using a Least Recently Used (LRU) algorithm, and the default caches are sized for a 100-blog system. If you are running a site with more blogs or a very high-traffic site, you should consider changing the caching configuration. First, let's discuss how the caches work.

Cache invalidation and expiration

When Roller generates a page, it puts a copy of that page in a cache. The next time that a request comes in for that page, Roller returns the page from the cache. When a blog changes, Roller invalidates the blog's cache entries, i.e. it throws that blog's pages out of the cache. And by default, when the cache is full and we need to add a new entry to the cache, we push out the least recently used entry in the cache to make room; that's the LRU algorithm I mentioned before.

Sometimes, a blog page includes things that change frequently, like a list of referrers, a server-side hit counter or data from some other source. We don't want to invalidate a blog's cache entries every time a hit is counted. That would defeat the purpose of the cache. So, by default Roller uses an expiring cache that automatically invalidates cache entries after a timeout period.

Cache configuration

To configure the Roller caches, you add properties to your roller-custom.properties override file. You can learn more about this override file in Section 6 of the Roller 4.0 Installation Guide, and you can find a complete list of the properties you can override in Section 11.

First, let's cover the default caching mechanism. If you're running a large and high-traffic site, you might want to consider using the non-expiring cache or setting the cache timeout very high (4, 6 or 12 hours). Here's how you tell all caches to use the non-expiring cache:


   cache.defaultFactory=org.apache.roller.util.cache.LRUCacheFactoryImpl

However, if you do that, then blogs that use Roller's built-in hit counter or that display referrers will not be updated as often as your users would like. So, you might want to consider removing the #showReferrersList() macro from any themes in use on your site.
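
If you would rather keep the default expiring cache and just stretch the timeouts, a roller-custom.properties fragment along these lines should do it. This is a sketch, not a recommendation: the four-hour figure is simply one of the values suggested above, and the property names are the per-cache timeout properties described in the next section.

```properties
# Keep the default (expiring) cache, but hold entries for 4 hours.
# Timeouts are in seconds: 4 * 60 * 60 = 14400
cache.weblogpage.timeout=14400
cache.weblogfeed.timeout=14400
```

With a long timeout, hit counters and referrer lists still refresh eventually, just less often.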

Configuring Roller's four page and feed caches

You can configure caching differently for the different types of pages and feeds produced by Roller. There are four separately configurable caches. Here are their names and an explanation of each:

  1. weblogpage: this cache is used to cache weblog pages
  2. weblogfeed: this one is for weblog RSS and Atom feeds
  3. sitewide: this is for the aggregated front-page blog and its RSS/Atom feeds
  4. planet: this is for feeds produced by Roller's built-in aggregator

And for each one of these caches you can configure these properties:

  • enabled: for debugging purposes you can completely disable a cache by setting its enabled property to 'false'
  • size: sets the total number of entries allowed in a cache. Each entry holds one page or feed response.
  • timeout: the number of seconds that an entry is allowed to remain in the cache. After this time expires, the entry will be removed from the cache.
  • factory: sets the classname of the cache factory to be used for this cache. If it is not set, the default cache factory is used.
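
As an illustration, here is what a couple of per-cache overrides might look like in roller-custom.properties. The choices are arbitrary, not recommended settings: the feed cache is switched off for a debugging session, and the site-wide cache alone is given the non-expiring factory mentioned earlier.

```properties
# Temporarily disable the feed cache while debugging
cache.weblogfeed.enabled=false

# Use the non-expiring LRU factory for the site-wide cache only
cache.sitewide.factory=org.apache.roller.util.cache.LRUCacheFactoryImpl
```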

Cache property names follow the pattern cache.<cache-name>.<property>. The best way to understand how this works is to look at the default cache configuration used by Roller:


# Weblog page cache (all the weblog content)
cache.weblogpage.enabled=true
cache.weblogpage.size=400
cache.weblogpage.timeout=3600

# Feed cache (xml feeds like rss, atom, etc)
cache.weblogfeed.enabled=true
cache.weblogfeed.size=200
cache.weblogfeed.timeout=3600

# Site-wide cache (all content for site-wide frontpage weblog)
cache.sitewide.enabled=true
cache.sitewide.size=50
cache.sitewide.timeout=1800

# Planet cache (planet feeds)
cache.planet.enabled=true
cache.planet.size=10
cache.planet.timeout=1800

The default cache configuration above is set up for a 100-weblog system. To some extent, this is guesswork. For example, we've decided to cache 4 pages and 2 feeds for each blog. That's how we arrived at cache.weblogpage.size=400 and cache.weblogfeed.size=200. And we've decided to cache both blog pages and feeds for one hour. That's how we arrived at cache.weblogpage.timeout=3600 and cache.weblogfeed.timeout=3600.

You might decide to do things a little differently on your Roller system. Copy the properties above to your roller-custom.properties file and set them to values you think are appropriate for the number of weblogs, average page size, traffic levels and JVM heap size of your Roller installation.
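
For example, if you kept the same rule of thumb (4 pages and 2 feeds per blog) for a hypothetical 500-blog site, the size overrides would work out as follows. Treat the numbers as a starting point only; whether your JVM heap can hold that many cached pages depends on your average page size.

```properties
# Hypothetical 500-blog site, same per-blog ratios as the defaults
# 500 blogs * 4 pages = 2000
cache.weblogpage.size=2000
# 500 blogs * 2 feeds = 1000
cache.weblogfeed.size=1000
```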

Conclusion

Roller's default cache configuration will work well without modification for a small to medium-size Roller installation, but for large high-traffic sites you should increase cache sizes and think carefully about timeouts. And if you're running Roller in a cluster, you might want to consider using a distributed caching system like memcached. I'll discuss that in my next HOWTO.

Comments:

Can't wait for the memcached one -- Good post!!

Posted by German Eichberger on March 04, 2008 at 04:58 PM EST #

How can I add a hit counter (value from cacheInfo) to the main page?

Posted by Anil Samuel on March 25, 2008 at 12:00 AM EDT #

Also, can the blog have view counts against each posting?

Posted by Anil Samuel on March 25, 2008 at 03:46 AM EDT #

Anil, the cache info is not intended to be used as a hit counter, but if you wanted to display cache-info on a blog page and you can code in Java then you could write your own plugin model.

I'd like to see better blog statistics in Roller, but that has not been a priority for us because the big Roller sites are on the open internet where we can use services like Google Analytics.

- Dave

Posted by Dave Johnson on March 25, 2008 at 11:27 AM EDT #

How can we use Roller with a distributed cache like Tangosol? It supports memcached, but does it provide any plugin for Oracle Tangosol?

Posted by varun krishan on November 20, 2008 at 01:51 PM EST #

As I explain above, Roller's caching mechanism is pluggable. By implementing two Java interfaces, you can add support for any caching system. Please direct further questions to the Roller dev mailing list.

Posted by Dave Johnson on November 23, 2008 at 12:22 PM EST #
