
Rome + Texen = Planet Roller

After a couple days of hacking with the Rome Fetcher and Velocity Texen, Planet Roller is born.

Planet Roller is currently a command-line tool that reads a configuration file of newsfeed subscription data, then generates an aggregated weblog with an RSS feed and an OPML listing of all subscriptions. It's essentially a Java version of Planet Planet. I've got it set up to run every 30 minutes. Yes, I'm aware that the RSS gets a warning on validation. No, I haven't added newsfeed autodiscovery yet. Yes, I stole David Edmondson's Planet Sun theme. No, I haven't done any testing on the OPML. Enough questions already! I need to get back to work.

I'll be adding a couple more details to this post as the night progresses.

OK, I'm back. Did I mention that Planet Roller is a community aggregator? "A Community Aggregator is a portal-like web application that displays weblog posts from a group of closely related but separately hosted weblogs and provides synthetic newsfeeds so that readers may subscribe to the group as a whole."

Configuring Planet Roller

Currently, Planet Roller is just a simple command-line tool designed to run as a scheduled task. It reads a list of newsfeed subscriptions from an XML file, as shown below. Eventually, there will also be a UI for Planet Roller so that you don't have to shell into a server and edit an XML file to add and delete subscriptions.

<planet-config>
   <main-page>control.vm</main-page>
   <admin-name>Dave Johnson</admin-name>
   <admin-email>dave.johnson@rollerweblogger.org</admin-email>
   <site-url>http://rollerweblogger.org/planet</site-url>
   <output-dir>/nfs/ank/home1/r/roller/public_html/planet</output-dir>
   <template-dir>/nfs/ank/home1/r/roller/planet-roller/templates</template-dir>
   <cache-dir>/nfs/ank/home1/r/roller/planet-roller/cache</cache-dir>
   <subscription id="dave">
      
      <feed-url>http://rollerweblogger.org/rss/roller</feed-url>
      <site-url>http://rollerweblogger.org/page/roller</site-url>
   </subscription>
   <subscription id="lance">
      
      <feed-url>http://www.brainopolis.com/roller/rss/lance</feed-url>
      <site-url>http://www.brainopolis.com/roller/page/lance</site-url>
   </subscription>
   <subscription id="matt">
      
      <feed-url>http://raibledesigns.com/rss/rd</feed-url>
      <site-url>http://raibledesigns.com/page/rd</site-url>
   </subscription>
   <subscription id="anil">
      
      <feed-url>http://www.busybuddha.org/blog/rss/anil</feed-url>
      <site-url>http://www.busybuddha.org/blog/page/anil</site-url>
   </subscription>
   <subscription id="henri">
      
      <feed-url>http://blog.generationjava.com/roller/rss/bayard</feed-url>
      <site-url>http://blog.generationjava.com/roller/page/bayard</site-url>
   </subscription>
   <subscription id="pat">
      
      <feed-url>http://blogs.sun.com/roller/rss/pat</feed-url>
      <site-url>http://blogs.sun.com/roller/page/pat</site-url>
   </subscription>
   <group handle="roller">
      
      <description>Other folks who are blogging Roller</description>
      <max-page-entries>30</max-page-entries>
      <max-feed-entries>30</max-feed-entries>
      <subscription-ref refid="dave"/>
      <subscription-ref refid="lance"/>
      <subscription-ref refid="pat"/>
      <subscription-ref refid="matt"/>
      <subscription-ref refid="anil"/>
      <subscription-ref refid="henri"/>
   </group>
   <group handle="trijug">
      
      <description>Triangle Java User Group Bloggers</description>
      <max-page-entries>40</max-page-entries>
      <max-feed-entries>40</max-feed-entries>
      <subscription-ref refid="dave"/>
   </group>
</planet-config>

The configuration file contains three types of information: 1) configuration information for the planet site itself, 2) newsfeed subscriptions, and 3) groups. Groups allow a single Planet Roller site to host different aggregations. In the above configuration file, I've defined two groups, "Planet Roller" and "Planet TriJUG". Note that one subscription can appear in more than one group.
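
To make the subscription/group relationship concrete, here is a minimal sketch of how such a file could be read with the JDK's DOM parser. It is not Planet Roller's actual code: the class name PlanetConfigSketch, the method feedsByGroup, and the cut-down SAMPLE config are all made up for illustration, and only the element and attribute names come from the config above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical config reader for illustration; not Planet Roller's actual code. */
public class PlanetConfigSketch {

    /** A cut-down config in the same shape as the one above. */
    public static final String SAMPLE =
        "<planet-config>"
      + "<subscription id=\"dave\"><feed-url>http://rollerweblogger.org/rss/roller</feed-url></subscription>"
      + "<subscription id=\"lance\"><feed-url>http://www.brainopolis.com/roller/rss/lance</feed-url></subscription>"
      + "<group handle=\"roller\"><subscription-ref refid=\"dave\"/><subscription-ref refid=\"lance\"/></group>"
      + "<group handle=\"trijug\"><subscription-ref refid=\"dave\"/></group>"
      + "</planet-config>";

    /** Maps each group handle to the feed URLs of the subscriptions it references. */
    public static Map<String, List<String>> feedsByGroup(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse config", e);
        }

        // Pass 1: index every subscription's feed URL by its id attribute.
        Map<String, String> feedUrlById = new LinkedHashMap<>();
        NodeList subs = doc.getElementsByTagName("subscription");
        for (int i = 0; i < subs.getLength(); i++) {
            Element sub = (Element) subs.item(i);
            String feedUrl = sub.getElementsByTagName("feed-url").item(0).getTextContent().trim();
            feedUrlById.put(sub.getAttribute("id"), feedUrl);
        }

        // Pass 2: resolve each group's subscription-ref elements against the index.
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList groups = doc.getElementsByTagName("group");
        for (int i = 0; i < groups.getLength(); i++) {
            Element group = (Element) groups.item(i);
            List<String> feeds = new ArrayList<>();
            NodeList refs = group.getElementsByTagName("subscription-ref");
            for (int j = 0; j < refs.getLength(); j++) {
                String refid = ((Element) refs.item(j)).getAttribute("refid");
                String url = feedUrlById.get(refid);
                if (url == null) {
                    throw new IllegalArgumentException("Unknown subscription: " + refid);
                }
                feeds.add(url);
            }
            result.put(group.getAttribute("handle"), feeds);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(feedsByGroup(SAMPLE));
    }
}
```

Because groups only hold references, the "dave" subscription is declared once and shows up under both the "roller" and "trijug" handles.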

Customizing Planet Roller File Generation

The command-line version of Planet Roller uses the Texen feature of Velocity to generate whatever files you want in your Planet Roller site. I included templates for HTML, RSS, and OPML, but you can tweak these and/or add whatever you want.

You tell Planet Roller which templates to use by specifying a Texen control template in the <main-page> element of the config file. Specify the templates directory in the <template-dir> element. The control template does not generate anything itself; it controls the file generation process, determining which files are generated and which template is used for each. Here is Planet Roller's current control template:

#set ($groupHandles = $planet.groupHandles)
#foreach ($groupHandle in $groupHandles)
    #set ($outputFile = $strings.concat([$groupHandle, ".html"]))
    $generator.parse("html.vm", $outputFile, "groupHandle", $groupHandle)
    #set ($outputFile = $strings.concat([$groupHandle, ".rss"]))
    $generator.parse("rss.vm", $outputFile, "groupHandle", $groupHandle)
    #set ($outputFile = $strings.concat([$groupHandle, ".opml"]))
    $generator.parse("opml.vm", $outputFile, "groupHandle", $groupHandle)
#end

The control template loops through the groups defined in the config file and for each, generates an HTML file using the html.vm template, an RSS file using the rss.vm template, and an OPML file using the opml.vm template. You can provide your own control template, or just hack the one that comes with Planet Roller.
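
The naming scheme in that loop is simple enough to restate in plain Java. This is only a sketch of the file-naming logic, with a made-up class name (OutputFilesSketch); the real work is done by Velocity's Texen generator, which this code does not call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical restatement of the control template's naming loop. */
public class OutputFilesSketch {

    // One output file per extension; html.vm, rss.vm and opml.vm are the matching templates.
    private static final String[] EXTENSIONS = {"html", "rss", "opml"};

    /** Returns the output file names for the given group handles, in generation order. */
    public static List<String> outputFiles(List<String> groupHandles) {
        List<String> files = new ArrayList<>();
        for (String handle : groupHandles) {
            for (String ext : EXTENSIONS) {
                files.add(handle + "." + ext);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        for (String file : outputFiles(Arrays.asList("roller", "trijug"))) {
            System.out.println(file);
        }
    }
}
```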

Based on the above configuration data and control template, when Planet Roller runs, you'll end up with six files:

  • roller.html
  • roller.rss
  • roller.opml
  • trijug.html
  • trijug.rss
  • trijug.opml

Let's look at the RSS template, so you can get a feel for how the templates work.


<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
  #set($group = $planet.getGroup($groupHandle))
  
  <link>$planet.configuration.url/${group.handle}.html</link>
  <description>$utilities.textToHTML($group.description,true)</description>
  <lastBuildDate>$utilities.formatRfc822Date($date)</lastBuildDate>
  <generator>Roller Planet 1.1-dev</generator>
  #set($entries = $planet.getAggregation($group, 30))
  #foreach( $entry in $entries )
  <item>
    
    <description>$utilities.textToHTML($entry.content,true)</description>
    <category>$utilities.textToHTML($entry.category,true)</category>
    <link>$entry.permalink</link>
    <pubDate>$utilities.formatRfc822Date($entry.published)</pubDate>
    #if($entry.author)<dc:creator>$utilities.textToHTML($entry.author,true)</dc:creator>#end
  </item>
  #end
</channel>
</rss>
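
The template leans on $utilities.formatRfc822Date for its date elements, since RSS 2.0 expects RFC 822 style dates. The implementation of that utility isn't shown here, so the following is only a sketch of one way to produce such a date in Java; the class name Rfc822Sketch and the choice to always format in GMT are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical stand-in for an RFC 822 date formatter; not Planet Roller's utility class. */
public class Rfc822Sketch {

    /** Formats a date the way RSS 2.0 date elements expect, e.g. "Thu, 01 Jan 1970 00:00:00 +0000". */
    public static String formatRfc822(Date date) {
        // Locale.US keeps day and month names in English regardless of the server's locale.
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        // Assumption: always emit GMT so output does not depend on the server's time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        System.out.println(formatRfc822(new Date()));
    }
}
```

A new SimpleDateFormat is created on each call because the class is not thread-safe.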

And here is the OPML template:


#set($group = $planet.getGroup($groupHandle))
<opml version="1.1">
<head>
   
   <dateCreated>$utilities.formatRfc822Date($date)</dateCreated>
   <dateModified>$utilities.formatRfc822Date($date)</dateModified>
   <ownerName>$planet.config.adminName</ownerName>
   <ownerEmail>$planet.config.adminEmail</ownerEmail>
</head>
<body>
#foreach($sub in $group.subscriptions)
   <outline htmlUrl="$utilities.textToHTML($sub.siteUrl)" xmlUrl="$utilities.textToHTML($sub.feedUrl)" text="$utilities.textToHTML($sub.title)" />
#end
</body>
</opml>

Within a template, you have access to the configuration through the $planet object, plus there are a couple of other objects that you'll find helpful in generating files. Here are the objects that are available in a template:

  • groupHandle: a string that contains the "handle" of the current group, which you can use to get the group object from the planet.
  • planet: the planet object allows you to access groups via $planet.getGroup($groupHandle) and aggregations for groups via $planet.getAggregation($group, N) where N is the max number of entries to be returned.
  • planet.configuration: this object contains configuration information, such as
  • date: current date
  • utilities: text-to-HTML, date formatting, and other utilities
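
To give a feel for what a call like $planet.getAggregation($group, N) has to do, here is a minimal sketch of merging entries from several feeds and keeping the first N. The internals aren't covered in this post, so everything in it is an assumption: the class AggregationSketch, its Entry type, and in particular the newest-first ordering by publication date.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

/** Hypothetical aggregation logic for illustration; not Planet Roller's actual code. */
public class AggregationSketch {

    /** A pared-down feed entry: just a title and a publication date. */
    public static class Entry {
        public final String title;
        public final Date published;

        public Entry(String title, Date published) {
            this.title = title;
            this.published = published;
        }
    }

    /** Merges all feeds, sorts newest first (an assumption), and keeps at most maxEntries. */
    public static List<Entry> aggregate(List<List<Entry>> feeds, int maxEntries) {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> feed : feeds) {
            all.addAll(feed);
        }
        all.sort(Comparator.comparing((Entry e) -> e.published).reversed());
        return new ArrayList<>(all.subList(0, Math.min(maxEntries, all.size())));
    }

    public static void main(String[] args) {
        List<Entry> dave = Arrays.asList(
                new Entry("dave: older post", new Date(1000L)),
                new Entry("dave: newest post", new Date(4000L)));
        List<Entry> lance = Arrays.asList(
                new Entry("lance: middle post", new Date(2000L)));

        for (Entry e : aggregate(Arrays.asList(dave, lance), 2)) {
            System.out.println(e.title);
        }
    }
}
```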

Running Planet Roller

You can run Planet Roller from a simple script, like the one below:

#!/bin/bash
_CP=.:./lib/planet-roller-1.1-dev.jar
_CP=${_CP}:./lib/rollerbeans.jar
_CP=${_CP}:./lib/commons-logging.jar
_CP=${_CP}:./lib/jaxen-full.jar
_CP=${_CP}:./lib/jdom.jar
_CP=${_CP}:./lib/dom4j-1.4.jar
_CP=${_CP}:./lib/rome-0.5.jar
_CP=${_CP}:./lib/rome-fetcher-0.5.jar
_CP=${_CP}:./lib/velocity-1.4.jar
_CP=${_CP}:./lib/velocity-dep-1.4.jar
java -classpath "${_CP}" org.roller.tools.planet.PlanetTool "$1"

If you want Planet Roller to run on a schedule, schedule it. For example, on UNIX you can use cron. I use the following cron task to run Planet Roller on the 6th and 36th minute of every hour:

   6,36 * * * * (cd ~roller/planet-roller; ./planet-roller.sh)

Planet Roller uses the Rome Fetcher library to retrieve, parse, and cache newsfeed data to disk. Fetcher uses HTTP conditional GET and ETags to ensure that feeds are downloaded only when they have actually been updated.
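
For readers who haven't met conditional GET before, here is a minimal sketch of the idea using only the JDK. It does not use the Rome Fetcher API; the class ConditionalGetSketch, its in-memory cache of a single feed, and the tiny local demo server are all assumptions made so the example can run on its own. It also handles only the ETag / If-None-Match pair, not Last-Modified / If-Modified-Since.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical ETag-based conditional GET client; not the Rome Fetcher implementation. */
public class ConditionalGetSketch {

    private String cachedEtag;  // ETag from the last full download, if any
    private String cachedBody;  // feed content from the last full download
    private int downloads;      // number of times the full body was transferred

    /** Returns the feed body, re-downloading only if the server says it has changed. */
    public String fetch(URL feedUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) feedUrl.openConnection();
        if (cachedEtag != null) {
            conn.setRequestProperty("If-None-Match", cachedEtag);
        }
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            conn.disconnect();
            return cachedBody;  // 304: nothing new, reuse the cached copy
        }
        try (InputStream in = conn.getInputStream()) {
            cachedBody = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        cachedEtag = conn.getHeaderField("ETag");
        downloads++;
        return cachedBody;
    }

    /** Starts a throwaway local server that serves one unchanging feed with an ETag. */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/feed", exchange -> {
            String etag = "\"v1\"";
            byte[] body = "<rss/>".getBytes(StandardCharsets.UTF_8);
            if (etag.equals(exchange.getRequestHeaders().getFirst("If-None-Match"))) {
                exchange.sendResponseHeaders(304, -1);  // -1 means no response body
            } else {
                exchange.getResponseHeaders().set("ETag", etag);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Fetches the demo feed twice and reports how many full downloads happened. */
    public static int downloadsAfterTwoFetches() {
        try {
            HttpServer server = startDemoServer();
            try {
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/feed");
                ConditionalGetSketch client = new ConditionalGetSketch();
                client.fetch(url);
                client.fetch(url);
                return client.downloads;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Full downloads: " + downloadsAfterTwoFetches());
    }
}
```

The second fetch sends the stored ETag back in If-None-Match, so an unchanged feed costs a short 304 response rather than a full transfer.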

That's enough for now. Tomorrow, I'll tell you about Planet Roller internals.

Comments:

This is great, it's exactly what I've been looking for.

I'm sorry if I missed this info, but is the source for Planet Roller available? Is it packaged with the main Roller distribution? And can it be run without Roller?

Posted by John Sawers on March 05, 2005 at 04:31 AM EST #

That's a really good idea. Planet Roller "tool" runs without Roller and is now available for download. Enjoy!

Posted by Dave Johnson on March 05, 2005 at 01:47 PM EST #

Welcome

This is just one entry in the weblog Blogging Roller. You may want to visit the main page of the weblog
