Blogging Roller

Dave Johnson on open web technologies, social software and software development


Couple of notes from the Triangle blogger con

I attended the Triangle Bloggers Conference 2005 on Saturday morning in Chapel Hill. The meeting was held in a classroom large enough to accommodate the approximately 150 people in attendance, with power at every seat and wireless internet. The agenda was divided into three portions, but the conference was really one long, seamless, and very interesting conversation between audience members and the speakers. The themes were using blogs to build community, building a larger readership for your blog, and using blogs in grassroots journalism. Here are a couple of the things I wrote down (these are not 100% accurate quotes):

  • David Hoggard: You've heard of 'dancing like nobody is watching' -- you've got to blog like nobody is reading if you want to get your authentic voice out there.
  • Dave Winer: Why do you care about being popular? Bloggers don't need readers. Bloggers are documenting the human knowledge base and making expertise available that was previously only available to the press and big institutions.
  • Ruby Sinreich: Blogs will never replace the mainstream media; their role is to watchdog the media, and that is a good place to be. Few people will get their news from blogs, but those who do are journalists, politicians, activists -- people who can make a difference.
  • Matt Gross: On the Blog for America site, approximately 5% of visitors would click on the comments link and 1% would leave a comment.

I also got a chance to talk to folks about corporate blogs at SAS and IBM (both have some internal Roller sites) and student blogs at UNC. I also spent some time talking to Roch Smith, the man behind the Greensboro 101 community aggregator. All in all, it was a great experience. I learned a lot about blogging and I feel a little more connected to my hometown and the Triangle in general. Thanks to Anton Zuiker, Paul Jones, and everybody else who helped put it together. For more information, check here and here.

Tags: Blogging

Rome + Texen = Planet Roller

After a couple days of hacking with the Rome Fetcher and Velocity Texen, Planet Roller is born.

Planet Roller is currently a command-line tool that reads a configuration file of newsfeed subscription data, then generates an aggregated weblog with an RSS feed and an OPML listing of all subscriptions. It's essentially a Java version of Planet Planet. I've got it set up to run every 30 minutes. Yes, I'm aware that the RSS gets a warning on validation. No, I haven't added newsfeed autodiscovery yet. Yes, I stole David Edmondson's Planet Sun theme. No, I haven't done any testing on the OPML. Enough questions already! I need to get back to work.

I'll be adding a couple more details to this post as the night progresses.

OK, I'm back. Did I mention that Planet Roller is a community aggregator? "A Community Aggregator is a portal-like web application that displays weblog posts from a group of closely related but separately hosted weblogs and provides synthetic newsfeeds so that readers may subscribe to the group as a whole."
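
To make that definition concrete, here is a minimal Java sketch of the merge step at the heart of any community aggregator: pool the entries from several feeds, sort them newest first, and keep the top N. The Entry class and the aggregate method are hypothetical names made up for this illustration. This is not Planet Roller's actual code.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AggregationSketch {

    /** Hypothetical entry type: just a title and a publication time. */
    public static final class Entry {
        public final String title;
        public final Instant published;

        public Entry(String title, Instant published) {
            this.title = title;
            this.published = published;
        }
    }

    /** Pools the entries of all feeds, sorts newest first, keeps at most max. */
    public static List<Entry> aggregate(List<List<Entry>> feeds, int max) {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> feed : feeds) {
            all.addAll(feed);
        }
        all.sort(Comparator.comparing((Entry e) -> e.published).reversed());
        return new ArrayList<>(all.subList(0, Math.min(max, all.size())));
    }

    public static void main(String[] args) {
        List<Entry> feedA = List.of(
                new Entry("a1", Instant.parse("2005-02-10T00:00:00Z")),
                new Entry("a2", Instant.parse("2005-02-12T00:00:00Z")));
        List<Entry> feedB = List.of(
                new Entry("b1", Instant.parse("2005-02-11T00:00:00Z")));

        // Ask for the two most recent entries across both feeds.
        for (Entry e : aggregate(List.of(feedA, feedB), 2)) {
            System.out.println(e.published + " " + e.title);
        }
    }
}
```

A real aggregator also has to deal with entries that have no date, duplicate entries, and feeds that fail to load. The sketch ignores all of that.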

Configuring Planet Roller

Currently, Planet Roller is just a simple command-line tool that is designed to run as a scheduled task. It reads a list of newsfeed subscriptions from an XML file, as shown below. Eventually, there will also be a UI for Planet Roller so that you don't have to shell into a server and edit an XML file to add and delete subscriptions.

<planet-config>
   <main-page>control.vm</main-page>
   <admin-name>Dave Johnson</admin-name>
   <admin-email>dave.johnson@rollerweblogger.org</admin-email>
   <site-url>http://rollerweblogger.org/planet</site-url>
   <output-dir>/nfs/ank/home1/r/roller/public_html/planet</output-dir>
   <template-dir>/nfs/ank/home1/r/roller/planet-roller/templates</template-dir>
   <cache-dir>/nfs/ank/home1/r/roller/planet-roller/cache</cache-dir>
   <subscription id="dave">

      <feed-url>http://rollerweblogger.org/rss/roller</feed-url>
      <site-url>http://rollerweblogger.org/page/roller</site-url>
   </subscription>
   <subscription id="lance">

      <feed-url>http://www.brainopolis.com/roller/rss/lance</feed-url>
      <site-url>http://www.brainopolis.com/roller/page/lance</site-url>
   </subscription>
   <subscription id="matt">

      <feed-url>http://raibledesigns.com/rss/rd</feed-url>
      <site-url>http://raibledesigns.com/page/rd</site-url>
   </subscription>
   <subscription id="anil">

      <feed-url>http://www.busybuddha.org/blog/rss/anil</feed-url>
      <site-url>http://www.busybuddha.org/blog/page/anil</site-url>
   </subscription>
   <subscription id="henri">

      <feed-url>http://blog.generationjava.com/roller/rss/bayard</feed-url>
      <site-url>http://blog.generationjava.com/roller/page/bayard</site-url>
   </subscription>
   <subscription id="pat">

      <feed-url>http://blogs.sun.com/roller/rss/pat</feed-url>
      <site-url>http://blogs.sun.com/roller/page/pat</site-url>
   </subscription>
   <group handle="roller">

      <description>Other folks who are blogging Roller</description>
      <max-page-entries>30</max-page-entries>
      <max-feed-entries>30</max-feed-entries>
      <subscription-ref refid="dave"/>
      <subscription-ref refid="lance"/>
      <subscription-ref refid="pat"/>
      <subscription-ref refid="matt"/>
      <subscription-ref refid="anil"/>
      <subscription-ref refid="henri"/>
   </group>
   <group handle="trijug">

      <description>Triangle Java User Group Bloggers</description>
      <max-page-entries>40</max-page-entries>
      <max-feed-entries>40</max-feed-entries>
      <subscription-ref refid="dave"/>
   </group>
</planet-config>

The configuration file contains three types of information: 1) configuration information for the planet site itself, 2) newsfeed subscriptions, and 3) groups. Groups allow a single Planet Roller site to host different aggregations. In the above configuration file, I've defined two groups: "Planet Roller" and "Planet TriJUG". Note that one subscription can appear in more than one group.
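
If you want to see how the subscription-ref indirection resolves, here is a minimal sketch that reads a config in this shape with the JDK's built-in DOM parser and maps each group handle to the feed URLs it references. It assumes the file is well-formed XML with each subscription-ref written as an empty element. The class name, the method name, and the shortened example.org URLs are made up for the illustration, and this is not how Planet Roller itself reads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PlanetConfigSketch {

    /** Maps each group handle to the feed URLs of the subscriptions it references. */
    public static Map<String, List<String>> feedsByGroup(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse config", e);
        }

        // Pass 1: index every subscription's feed URL by its id attribute.
        Map<String, String> feedUrlById = new LinkedHashMap<>();
        NodeList subs = doc.getElementsByTagName("subscription");
        for (int i = 0; i < subs.getLength(); i++) {
            Element sub = (Element) subs.item(i);
            String url = sub.getElementsByTagName("feed-url").item(0).getTextContent().trim();
            feedUrlById.put(sub.getAttribute("id"), url);
        }

        // Pass 2: resolve each group's subscription-ref elements against the index.
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList groups = doc.getElementsByTagName("group");
        for (int i = 0; i < groups.getLength(); i++) {
            Element group = (Element) groups.item(i);
            List<String> feeds = new ArrayList<>();
            NodeList refs = group.getElementsByTagName("subscription-ref");
            for (int j = 0; j < refs.getLength(); j++) {
                String refId = ((Element) refs.item(j)).getAttribute("refid");
                feeds.add(feedUrlById.get(refId));
            }
            result.put(group.getAttribute("handle"), feeds);
        }
        return result;
    }

    public static void main(String[] args) {
        // A cut-down config with made-up URLs, just to exercise the method.
        String xml = "<planet-config>"
                + "<subscription id=\"dave\"><feed-url>http://example.org/dave.rss</feed-url></subscription>"
                + "<subscription id=\"lance\"><feed-url>http://example.org/lance.rss</feed-url></subscription>"
                + "<group handle=\"roller\"><subscription-ref refid=\"dave\"/><subscription-ref refid=\"lance\"/></group>"
                + "<group handle=\"trijug\"><subscription-ref refid=\"dave\"/></group>"
                + "</planet-config>";
        System.out.println(feedsByGroup(xml));
    }
}
```

Note how the "dave" subscription is defined once but ends up in both groups, which is the whole point of keeping subscriptions and groups separate.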

Customizing Planet Roller File Generation

The command-line version of Planet Roller uses the Texen feature of Velocity to generate whatever files you want in your Planet Roller site. I included templates for HTML, RSS, and OPML, but you can tweak these and/or add whatever you want.

You tell Planet Roller which templates to use by specifying a Texen control template in the main-page element of the config file. Specify the templates directory in the template-dir element. The control template does not generate anything itself; it controls the file generation process, determining which files are generated and which template is used for each. Here is Planet Roller's current control template:

#set ($groupHandles = $planet.groupHandles)
#foreach ($groupHandle in $groupHandles)
    #set ($outputFile = $strings.concat([$groupHandle, ".html"]))
    $generator.parse("html.vm", $outputFile, "groupHandle", $groupHandle)
    #set ($outputFile = $strings.concat([$groupHandle, ".rss"]))
    $generator.parse("rss.vm",  $outputFile, "groupHandle", $groupHandle)
    #set ($outputFile = $strings.concat([$groupHandle, ".opml"]))
    $generator.parse("opml.vm", $outputFile, "groupHandle", $groupHandle)
#end

The control template loops through the groups defined in the config file and for each, generates an HTML file using the html.vm template, an RSS file using the rss.vm template, and an OPML file using the opml.vm template. You can provide your own control template, or just hack the one that comes with Planet Roller.
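
If Velocity syntax is new to you, the loop in the control template boils down to the plain Java below. It only computes the output file names (group handle plus extension) and does not run any templates. The class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputFilesSketch {

    /** Mirrors the control template: one .html, .rss, and .opml file per group handle. */
    public static List<String> outputFiles(List<String> groupHandles) {
        String[] extensions = {".html", ".rss", ".opml"};
        List<String> files = new ArrayList<>();
        for (String handle : groupHandles) {
            for (String ext : extensions) {
                // Same idea as $strings.concat([$groupHandle, ".html"]) in the template.
                files.add(handle + ext);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        for (String file : outputFiles(List.of("roller", "trijug"))) {
            System.out.println(file);
        }
    }
}
```

Adding a fourth output format would mean one more template and one more #set/$generator.parse pair in the control template, or one more entry in the extensions array here.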

Based on the above configuration data and control template, when Planet Roller runs, you'll end up with six files:

  • roller.html
  • roller.rss
  • roller.opml
  • trijug.html
  • trijug.rss
  • trijug.opml

Let's look at the RSS template, so you can get a feel for how the templates work.


<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
  #set($group = $planet.getGroup($groupHandle))
  
  <link>$planet.configuration.url/${group.handle}.html</link>
  <description>$utilities.textToHTML($group.description,true)</description>
  <lastBuildDate>$utilities.formatRfc822Date($date)</lastBuildDate>
  <generator>Roller Planet 1.1-dev</generator>
  #set($entries = $planet.getAggregation($group, 30))
  #foreach( $entry in $entries )
  <item>

    <description>$utilities.textToHTML($entry.content,true)</description>
    <category>$utilities.textToHTML($entry.category,true)</category>
    <link>$entry.permalink</link>
    <pubDate>$utilities.formatRfc822Date($entry.published)</pubDate>
    #if($entry.author)<dc:creator>$utilities.textToHTML($entry.author,true)</dc:creator>#end
  </item>
  #end
</channel>
</rss>

And here is the OPML template:


#set($group = $planet.getGroup($groupHandle))
<opml version="1.1">
<head>

   <dateCreated>$utilities.formatRfc822Date($date)</dateCreated>
   <dateModified>$utilities.formatRfc822Date($date)</dateModified>
   <ownerName>$planet.config.adminName</ownerName>
   <ownerEmail>$planet.config.adminEmail</ownerEmail>
</head>
<body>
#foreach($sub in $group.subscriptions)
   <outline htmlUrl="$utilities.textToHTML($sub.siteUrl)" xmlUrl="$utilities.textToHTML($sub.feedUrl)" text="$utilities.textToHTML($sub.title)" />
#end
</body>
</opml>

Within a template, you have access to the configuration through the $planet object, plus there are a couple of other objects that you'll find helpful in generating files. Here are the objects that are available in a template:

  • groupHandle: a string that contains the "handle" of the current group, which you can use to get the group object from the planet.
  • planet: the planet object allows you to access groups via $planet.getGroup($groupHandle) and aggregations for groups via $planet.getAggregation($group, N) where N is the max number of entries to be returned.
  • planet.configuration: this object contains configuration information, such as the admin name, admin email, and site URL
  • date: current date
  • utilities: text-to-HTML, date formatting, and other utilities
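
The date formatting matters because RSS expects dates in the RFC 822 style, with a four-digit year preferred. I'm not showing the source of $utilities.formatRfc822Date here, so the following is only a sketch of what such a helper could look like, using the JDK's built-in RFC 1123 formatter (RFC 1123 is the four-digit-year update of the RFC 822 date format). The class and method names are hypothetical.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class Rfc822DateSketch {

    /** Formats a date-time in the RFC 822/1123 style used by RSS pubDate. */
    public static String formatRfc822(ZonedDateTime when) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(when);
    }

    public static void main(String[] args) {
        ZonedDateTime when = ZonedDateTime.of(2005, 2, 13, 21, 30, 0, 0, ZoneOffset.UTC);
        System.out.println(formatRfc822(when));
    }
}
```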

Running Planet Roller

You can run Planet Roller from a simple script, like the one below:

#!/bin/bash
_CP=.:./lib/planet-roller-1.1-dev.jar
_CP=${_CP}:./lib/rollerbeans.jar
_CP=${_CP}:./lib/commons-logging.jar
_CP=${_CP}:./lib/jaxen-full.jar
_CP=${_CP}:./lib/jdom.jar
_CP=${_CP}:./lib/dom4j-1.4.jar
_CP=${_CP}:./lib/rome-0.5.jar
_CP=${_CP}:./lib/rome-fetcher-0.5.jar
_CP=${_CP}:./lib/velocity-1.4.jar
_CP=${_CP}:./lib/velocity-dep-1.4.jar
java -classpath "${_CP}" org.roller.tools.planet.PlanetTool "$@"

If you want Planet Roller to run on a schedule, schedule it. For example, on UNIX you can use cron. I use the following cron task to run Planet Roller on the 6th and 36th minute of every hour:

   6,36 * * * * (cd ~roller/planet-roller; ./planet-roller.sh)

Planet Roller uses the Rome Fetcher library to retrieve, parse, and cache newsfeed data to disk. Fetcher uses HTTP conditional GET and ETags to ensure that feeds are only downloaded when they have actually been updated.
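
For the curious, here is roughly what a conditional GET looks like at the HTTP level. This sketch does not use the Rome Fetcher API. It uses the JDK's own java.net.http classes (Java 11 and later) and invented class and method names, purely to show the idea: send back the ETag and Last-Modified values saved from the previous fetch, and treat a 304 Not Modified response as "keep using the cached copy".

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConditionalGetSketch {

    /**
     * Builds a GET request that carries the validators saved from the last
     * successful fetch. Either validator may be null if the server never sent it.
     */
    public static HttpRequest conditionalRequest(String feedUrl, String etag, String lastModified) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(feedUrl)).GET();
        if (etag != null) {
            builder.header("If-None-Match", etag);
        }
        if (lastModified != null) {
            builder.header("If-Modified-Since", lastModified);
        }
        return builder.build();
    }

    /** A 304 means the feed is unchanged, so the cached copy is still good. */
    public static boolean useCachedCopy(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) {
        // Building the request does not touch the network; the validators are made up.
        HttpRequest request = conditionalRequest(
                "http://rollerweblogger.org/rss/roller",
                "\"abc123\"",
                "Sun, 13 Feb 2005 21:30:00 GMT");
        System.out.println(request.headers().map());
        System.out.println("304 -> use cache? " + useCachedCopy(304));
        System.out.println("200 -> use cache? " + useCachedCopy(200));
    }
}
```

When the server answers 200 instead, you parse the new body and save whatever ETag and Last-Modified headers came back with it for the next run.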

That's enough for now. Tomorrow, I'll tell you about Planet Roller internals.

Tags: blogging
