Dave Johnson on open web technologies, social software and software development
Since the early days, Roller has included a pluggable caching system for blog pages and feeds. In Roller 2.1 (early 2006), Sun's Allen Gilliland rewrote the whole cache system and made it much more flexible and much easier to configure. But, apart from comments in the configuration file, we never provided any documentation for the cache system. In this post, I'll start to correct that. I'll explain the basics of how the cache works and how to configure it.
Overview
Displaying a blog page can take dozens of database queries, and database queries can be expensive. They take time, consume CPU cycles and typically use network bandwidth. Roller's built-in caching system addresses this problem by caching generated pages and feeds. By default, Roller caches pages and feeds in memory using a Least Recently Used (LRU) algorithm, and the default caches are sized for a 100-blog system. If you are running a site with more blogs or a very high-traffic site, you should consider changing the caching configuration. First, let's discuss how the caches work.
Cache invalidation and expiration
When Roller generates a page, it puts a copy of that page in a cache. The next time a request comes in for that page, Roller returns the page from the cache. When a blog changes, Roller invalidates the blog's cache entries, i.e. it throws that blog's pages out of the cache. And by default, when the cache is full and we need to add a new entry, we push out the least recently used entry to make room; that's the LRU algorithm I mentioned before.
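To make the LRU idea concrete, here is a minimal sketch built on the JDK's LinkedHashMap. It is not Roller's actual cache code: the class names LruPageCache and LruPageCacheDemo and the page keys are made up for illustration, and invalidation is shown for a single key, whereas Roller throws out all of a blog's entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; NOT Roller's actual cache implementation.
class LruPageCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruPageCache(int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}

class LruPageCacheDemo {
    public static void main(String[] args) {
        LruPageCache<String, String> cache = new LruPageCache<>(2);
        cache.put("/blog1/", "<html>blog1</html>");
        cache.put("/blog2/", "<html>blog2</html>");
        cache.get("/blog1/");                        // touch blog1, so blog2 is now least recently used
        cache.put("/blog3/", "<html>blog3</html>");  // cache is full: blog2 is pushed out
        System.out.println(cache.keySet());          // prints [/blog1/, /blog3/]
        cache.remove("/blog1/");                     // simplified invalidation when blog1 changes
        System.out.println(cache.containsKey("/blog1/")); // prints false
    }
}
```

The access-order constructor is what turns a plain insertion-ordered map into an LRU structure; without it, the oldest inserted entry would be evicted even if it was read a moment ago.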
Sometimes, a blog page includes things that change frequently, like a list of referrers, a server-side hit counter or data from some other source. We don't want to invalidate a blog's cache entries every time a hit is counted. That would defeat the purpose of the cache. So, by default Roller uses an expiring cache that automatically invalidates cache entries after a timeout period.
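The expiring behavior can be sketched the same way. This is only an illustration under stated assumptions, not Roller's implementation: the class ExpiringPageCache is hypothetical, it leaves out the LRU size limit, and the current time is passed in explicitly so the logic is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; NOT Roller's actual expiring cache.
class ExpiringPageCache {
    private static final class Entry {
        final String page;
        final long createdMillis;

        Entry(String page, long createdMillis) {
            this.page = page;
            this.createdMillis = createdMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long timeoutMillis;

    ExpiringPageCache(long timeoutSeconds) {
        this.timeoutMillis = timeoutSeconds * 1000L;
    }

    // Stores a generated page along with the time it was cached.
    void put(String key, String page, long nowMillis) {
        entries.put(key, new Entry(page, nowMillis));
    }

    // Returns the cached page, or null if it is missing or older than the timeout.
    String get(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.createdMillis >= timeoutMillis) {
            entries.remove(key); // expired: drop it so the page gets regenerated
            return null;
        }
        return e.page;
    }
}
```

With a 3600-second timeout, a page cached at time zero is still served just before the hour is up and is treated as a miss from the hour mark on, so a hit counter on that page is at most an hour stale.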
Cache configuration
To configure the Roller caches, you add properties to your roller-custom.properties override file. You can learn more about this override file in Section 6 of the Roller 4.0 Installation Guide, and you can find a complete list of the properties you can override in Section 11.
First, let's cover the default caching mechanism. If you're running a large and high-traffic site, you might want to consider using the non-expiring cache or setting the cache timeout very high (4, 6 or 12 hours). Here's how you tell all caches to use the non-expiring cache:
cache.defaultFactory=org.apache.roller.util.cache.LRUCacheFactoryImpl
However, if you do that, then blogs that use Roller's built-in hit counter or that display referrers will not be updated as often as your users would like. So, you might want to consider removing the #showReferrersList() macro from any themes in use on your site.
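If you go the other route and keep the expiring cache with a very long timeout, the hour values need converting. The small sketch below assumes timeouts are given in seconds, which matches the default of 3600 for one hour shown later in this post. The class TimeoutProperties is a hypothetical helper, not part of Roller.

```java
import java.time.Duration;

// Hypothetical helper, NOT part of Roller. Assumes timeout values are in seconds,
// as the default value of 3600 for one hour suggests.
class TimeoutProperties {
    // Builds one property line, e.g. for the weblogpage or weblogfeed cache.
    static String timeoutLine(String cacheName, int hours) {
        long seconds = Duration.ofHours(hours).getSeconds();
        return "cache." + cacheName + ".timeout=" + seconds;
    }

    public static void main(String[] args) {
        // The 4, 6 and 12 hour values mentioned above
        for (int hours : new int[] {4, 6, 12}) {
            System.out.println(timeoutLine("weblogpage", hours));
        }
    }
}
```

Only one of the printed lines would go into roller-custom.properties; the loop is just there to show the three values side by side.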
Configuring Roller's four page and feed caches
You can configure caching differently for the different types of pages and feeds produced by Roller. There are four separately configurable caches. Here are their names and an explanation of each:
And for each one of these caches you can configure these properties:
Cache property names follow the pattern cache... The best way to understand how this works is to look at the default cache configuration used by Roller:
# Weblog page cache (all the weblog content)
cache.weblogpage.enabled=true
cache.weblogpage.size=400
cache.weblogpage.timeout=3600

# Feed cache (xml feeds like rss, atom, etc)
cache.weblogfeed.enabled=true
cache.weblogfeed.size=200
cache.weblogfeed.timeout=3600

# Site-wide cache (all content for site-wide frontpage weblog)
cache.sitewide.enabled=true
cache.sitewide.size=50
cache.sitewide.timeout=1800

# Planet cache (planet feeds)
cache.planet.enabled=true
cache.planet.size=10
cache.planet.timeout=1800
The default cache configurations above are set up for a 100-weblog system. To some extent, this is guesswork. For example, we've decided to cache 4 pages and 2 feeds for each blog. That's how we arrived at cache.weblogpage.size=400 and cache.weblogfeed.size=200. And we've decided to cache weblog pages and feeds for one hour, which is how we arrived at cache.weblogpage.timeout=3600 and cache.weblogfeed.timeout=3600. The site-wide and planet caches use a shorter timeout of 30 minutes (1800 seconds).
You might decide to do things a little differently on your Roller system. Copy the properties above to your roller-custom.properties file and set them to values you think are appropriate for the number of weblogs, average page size, traffic levels and JVM heap size of your Roller installation.
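As a starting point, the same 4-pages and 2-feeds per blog rule of thumb can be scaled to a larger site. The sketch below is only arithmetic: the class CacheSizing is hypothetical, and the result ignores page size, traffic and heap size, which you still have to weigh yourself.

```java
// Hypothetical helper, NOT part of Roller. It only scales the rule of thumb
// behind the defaults: 4 cached pages and 2 cached feeds per blog.
class CacheSizing {
    static final int PAGES_PER_BLOG = 4;
    static final int FEEDS_PER_BLOG = 2;

    static int pageCacheSize(int blogs) {
        return blogs * PAGES_PER_BLOG;
    }

    static int feedCacheSize(int blogs) {
        return blogs * FEEDS_PER_BLOG;
    }

    // Builds lines suitable for pasting into roller-custom.properties.
    static String toProperties(int blogs) {
        return "cache.weblogpage.size=" + pageCacheSize(blogs) + "\n"
             + "cache.weblogfeed.size=" + feedCacheSize(blogs) + "\n";
    }

    public static void main(String[] args) {
        // A 1000-blog site, ten times the size the defaults were tuned for
        System.out.print(toProperties(1000));
    }
}
```

For 1000 blogs this prints cache.weblogpage.size=4000 and cache.weblogfeed.size=2000. Since every cached page lives in memory, check the numbers against your JVM heap before using them.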
Conclusion
Roller's default cache configuration will work well without modification for a small to medium-sized Roller installation, but for large high-traffic sites you should increase cache sizes and think carefully about timeouts. And if you're running Roller in a cluster, you might want to consider using a distributed caching system like memcached. I'll discuss that in my next HOWTO.
Dave Johnson in General
07:15AM Mar 03, 2008
Comments [6]
Tags:
apacheroller
blogging
caching
java
memcached
Posted by German Eichberger on March 04, 2008 at 04:58 PM EST #
Posted by Anil Samuel on March 25, 2008 at 12:00 AM EDT #
Posted by Anil Samuel on March 25, 2008 at 03:46 AM EDT #
Anil, the cache info is not intended to be used as a hit counter, but if you wanted to display cache-info on a blog page and you can code in Java then you could write your own plugin model.
I'd like to see better blog statistics in Roller, but that has not been a priority for us because the big Roller sites are on the open internet where we can use services like Google Analytics.
- Dave
Posted by Dave Johnson on March 25, 2008 at 11:27 AM EDT #
Posted by varun krishan on November 20, 2008 at 01:51 PM EST #
Posted by Dave Johnson on November 23, 2008 at 12:22 PM EST #