Going to California

I'll be based in North Carolina, as I am now, but I'm going to be in the SF Bay Area for my first week at Sun. I'll be arriving on Sunday around 2PM, so I'll start off with some free time and I might have a night free later in the week. Any Rollers or bloggers want to get together for a geek dinner one night? I will be in Palo Alto, but I'll have a car and I know my way around, so I could make it as far as the city. If interested, drop me an email at dave.johnson@rollerweblogger.org.


Bryan Bell themes

I've ported a couple of Bryan Bell's free themes to Roller. I'm still tweaking them, but take a look: I'm hosting them temporarily at Brushed Metal and Movable Manilla. On his site, Bell says "I pretty much feel that the 'free' themes I created are public domain" and "I'd like to encourage anyone to port my themes to whatever platform they use. I do however ask for clear attribution to remain on the theme, and that you in-turn provide your derivative work to other users of your favored blogging tool." I think that means that we can distribute them with Roller as long as we keep the attribution.


Friday photo

Our home improvement projects are finally complete, including our new screened-in porch. Screens are a must here in Raleigh, where swarming mosquitoes will literally eat you alive within minutes of your walking outside. We've been enjoying bug-free living for about a month now. Two big ceiling fans make it comfortable even in August.

our newly completed screened-in porch

del.icio.us

After reading Chris and Rafe's posts about coping with a day without del.icio.us, I decided it was time to give it a try. I've tried a number of different ways to keep track of articles that I want to blog about, read at a later time, or just remember for future reference. Over the years I've tried keeping them in a browser bookmark folder, tracking them on a VoodooPad page, and even saving them as draft blog entries, but del.icio.us seems like a much better approach. You can find my del.icio.us link blog here.


Securing Pebble, Roller, and Java web apps in general

Simon Brown has been posting a number of good recommendations for securing Pebble that apply to just about any Java web app. If you are running a public Roller server, you should at least implement recommendations #1 and #2.

Additionally, for Roller site admins: if you are running Roller 0.9.8, make sure you are running with the latest security patch; see the Roller project blog for details. If you are running Roller 0.9.9 from CVS, make sure you have updated your site since August 2, 2004.


Parsing RSS with .Net

How do you do it? I need to provide some examples to show how to parse RSS with Java and C#. I have written simple parsers using the common XML parsing techniques such as DOM, SAX, and Pull. I have also written some examples that use parser libraries, but I have yet to find a good and free RSS parser library for .Net. Lazy-web, please help me out here.

When you assume...

If you assume that RSS is XML and you are just interested in getting titles, descriptions, links, and dates, then it is pretty easy to write a simple parser that can handle most forms of RSS, including RSS 1.0, RSS 2.0, and some forms of funky RSS. If you need to handle more than those basic elements, then I recommend that you use a parser library.
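Here is a minimal sketch of that shortcut, assuming only item titles, links, and descriptions matter. It uses a different technique from the namespace-aware parsers later in this post: it matches elements by XPath local-name() through System.Xml's SelectNodes and ignores namespaces altogether, so RSS 0.9x, 1.0, and 2.0 items all look alike. The class and method names are hypothetical. Because namespaces are ignored, an atom:link or dc:title that happens to come first would be picked up instead of the plain element; that is the price of the shortcut.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (not from the post): pulls basic item fields out of
// RSS 0.9x, 1.0, or 2.0 by matching element local names only.
public static class LocalNameRssSketch
{
    private static readonly string[] Fields = { "title", "link", "description" };

    // Returns one dictionary per item; a missing field maps to null.
    public static List<Dictionary<string, string>> ParseItems(XmlDocument doc)
    {
        var result = new List<Dictionary<string, string>>();
        // local-name() drops prefixes and namespace URIs, so the same query
        // finds unqualified RSS 2.0 items and namespaced RSS 1.0 items.
        foreach (XmlNode item in doc.SelectNodes("//*[local-name()='item']"))
        {
            var fields = new Dictionary<string, string>();
            foreach (string name in Fields)
            {
                // First direct child with this local name, in any namespace
                XmlNode child = item.SelectSingleNode("*[local-name()='" + name + "']");
                fields[name] = child == null ? null : child.InnerText;
            }
            result.Add(fields);
        }
        return result;
    }
}
```

To use it, load the feed with XmlDocument.Load and pass the document in; each entry in the returned list is one item.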

Parser libraries

Python programmers are blessed with a great newsfeed parser library: Pilgrim's regex-based Universal Feed Parser, which can parse any feed, even if it is not valid XML. I don't think Pilgrim's parser will port easily to Jython, the Java implementation of Python, because Jython is missing some important Python libraries and it uses Java's regular expressions, which differ from Python's built-in ones. The same probably goes for IronPython, the .Net implementation of Python. By the way, Lazy-web, would you please port Pilgrim's parser to Jython?

So, Java developers don't have the Universal Feed Parser, but we do have two active projects that are developing full-featured RSS (and Atom) parsers: Informa (used by Javablogs.com) and Rome. .Net developers have RSS.Net, but it is incomplete and development seems to have completely stagnated back in November of 2003.

So how do you parse RSS with .Net? I started looking around and digging into source code. I found that Dare built his C#-based RSS parser for RssBandit on top of an SGML parser. Joe built his C#-based RSS parser for Aggie using good old System.Xml. I guess you just have to do it by hand, so here goes...

My examples

Now it's time for the lazy web to point and laugh at my feeble efforts to build simple RSS parsers in C#. I have two examples for your ridicule. After you are done laughing, please, .Net heads, help me out and tell me what I am doing wrong and where I can make improvements.

First, here is a simple C# RSS parser method that uses a DOM-based approach. It extracts the basic elements of title, description, link, and pubDate from the channel and item levels and it puts them into a dictionary (just like Pilgrim's parser does). It can handle RSS 1.0, RSS 2.0, and some forms of funky RSS. Have a look:

using System.Collections;
using System.Xml;

// Class name is arbitrary; it just gives the methods a home.
public class DomRssParser {

    public IDictionary ParseFeed(string fileName) {
        XmlDocument feedDoc = new XmlDocument();
        feedDoc.Load(fileName);
        XmlElement root = feedDoc.DocumentElement;

        // RSS 0.9x/2.0 (<rss> root) elements have no namespace;
        // RSS 1.0 (<rdf:RDF> root) puts them in the RSS 1.0 namespace.
        bool isRss2 = root.Name.Equals("rss");
        string defaultNS = isRss2 ? null : "http://purl.org/rss/1.0/";
        string contentNS = "http://purl.org/rss/1.0/modules/content/";
        string dcNS = "http://purl.org/dc/elements/1.1/";
        string xhtmlNS = "http://www.w3.org/1999/xhtml";

        XmlElement channel = (XmlElement)root.GetElementsByTagName("channel").Item(0);
        IDictionary feedMap = new Hashtable();
        feedMap.Add("title", GetChildText(channel, "title", defaultNS));
        feedMap.Add("pubDate", GetChildText(channel, "pubDate", defaultNS));
        feedMap.Add("dc:date", GetChildText(channel, "date", dcNS));
        feedMap.Add("description", GetChildText(channel, "description", defaultNS));
        feedMap.Add("link", GetChildText(channel, "link", defaultNS));

        // RSS 2.0 nests items inside <channel>; RSS 1.0 makes them siblings of it.
        XmlNodeList items = isRss2
            ? channel.GetElementsByTagName("item")
            : root.GetElementsByTagName("item");
        IList itemList = new ArrayList();
        feedMap.Add("items", itemList);
        foreach (XmlElement item in items) {
            IDictionary itemMap = new Hashtable();
            itemList.Add(itemMap);
            itemMap.Add("title", GetChildText(item, "title", defaultNS));
            itemMap.Add("link", GetChildText(item, "link", defaultNS));
            itemMap.Add("guid", GetChildText(item, "guid", defaultNS));
            itemMap.Add("pubDate", GetChildText(item, "pubDate", defaultNS));
            itemMap.Add("dc:date", GetChildText(item, "date", dcNS));
            itemMap.Add("description", GetChildText(item, "description", defaultNS));
            itemMap.Add("content:encoded", GetChildText(item, "encoded", contentNS));
            // xhtml:body holds markup, so keep its inner XML rather than just its text
            XmlElement body = GetChild(item, "body", xhtmlNS);
            itemMap.Add("body", body == null ? null : body.InnerXml);
        }
        return feedMap;
    }

    // Text of the first direct child element with the given name,
    // "" if that child is empty, or null if there is no such child.
    private string GetChildText(XmlElement element, string childName, string namespaceURI) {
        XmlElement child = GetChild(element, childName, namespaceURI);
        // InnerText joins all text and CDATA children, so content split
        // across several nodes is not truncated to the first one.
        return child == null ? null : child.InnerText;
    }

    // Searches direct children only. GetElementsByTagName searches all
    // descendants, so a channel with no <description> would otherwise
    // report the first item's description as its own.
    private XmlElement GetChild(XmlElement element, string childName, string namespaceURI) {
        foreach (XmlNode node in element.ChildNodes) {
            XmlElement child = node as XmlElement;
            if (child == null) {
                continue;
            }
            // With no namespace given, match on the element name as written
            bool matches = namespaceURI == null
                ? child.Name == childName
                : child.LocalName == childName && child.NamespaceURI == namespaceURI;
            if (matches) {
                return child;
            }
        }
        return null;
    }
}

And here is the same thing, but using a pull-parser approach based on XmlTextReader:

using System;
using System.Collections;
using System.Xml;

// Class name is arbitrary; it just gives the methods a home.
public class PullRssParser {

    // Element names (as written, prefix included) collected at each level
    private static readonly string[] ItemFields =
        { "title", "link", "description", "content:encoded", "body", "pubDate", "dc:date" };
    private static readonly string[] ChannelFields =
        { "title", "link", "description", "pubDate", "dc:date" };

    public IDictionary ParseFeed(string fileName) {
        XmlTextReader reader = new XmlTextReader(fileName);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        IDictionary feedMap = new Hashtable();
        IList items = new ArrayList();
        IDictionary itemMap = null;
        feedMap.Add("items", items);
        try {
            while (reader.Read()) {
                bool isStart = reader.NodeType == XmlNodeType.Element;
                bool isEnd = reader.NodeType == XmlNodeType.EndElement;
                string name = reader.Name;
                if (isEnd && name == "item") {
                    itemMap = null;
                }
                else if (isStart && name == "item") {
                    itemMap = new Hashtable();
                    items.Add(itemMap);
                }
                else if (isStart && (name == "image" || name == "textInput" || name == "textinput")) {
                    // Skip images and text inputs: their title, link, and description
                    // would otherwise clobber the channel's. An empty element such as
                    // RSS 1.0's <image rdf:resource="..."/> has no end tag to wait for.
                    if (!reader.IsEmptyElement) {
                        while (reader.Read()) {
                            if (reader.NodeType == XmlNodeType.EndElement && reader.Name == name) {
                                break;
                            }
                        }
                    }
                }
                else if (isStart && itemMap != null && Array.IndexOf(ItemFields, name) >= 0) {
                    // Indexer instead of Add(): Hashtable.Add throws on a repeated element
                    itemMap[name] = ReadText(reader);
                }
                else if (isStart && itemMap == null && Array.IndexOf(ChannelFields, name) >= 0) {
                    feedMap[name] = ReadText(reader);
                }
            }
        }
        finally {
            reader.Close();  // release the file even if the feed is malformed
        }
        return feedMap;
    }

    // Reads the text content of the element the reader is positioned on.
    private string ReadText(XmlTextReader reader) {
        // <title/> has no content; calling Read() here would swallow the next element
        if (reader.IsEmptyElement) {
            return "";
        }
        reader.Read();
        // Text or CDATA carries the value; child markup (e.g. in xhtml:body) yields ""
        bool isText = reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA;
        return isText ? reader.Value : "";
    }
}

Have some better examples of parsing RSS with .Net? Please point me to them.


That was fun.

I haven't had as much fun watching the hits roll in since Weblogger.com threatened to sue me. Yesterday was much, much more fun, of course. Thanks to all who commented, linked, welcomed, and trackbacked me. One thing is for sure: you made my mom and dad feel a whole lot better about my leaving the seemingly safe sanctuary of SAS.

I'm venturing into new territory as a blogger. I have always kept my employer a secret. I never wanted anybody to google for HAHT or SAS and end up on my blog. I was a little worried about getting fired for blogging. It still happens even to those who try to be careful. Now, everybody knows who I work for and that changes things for me. On the positive side, blogging about my work with Roller, blogging technologies, Sun, and Java will give me lots of interesting material to work with - and then there's that evangelism thing. On the negative side, there are probably some topics that I had better avoid. Even at a company with a clueful policy on public discourse, you can still screw up and do damage to your career.

I'm confident that I'll do just fine in this new territory. I tend to be conservative in my output, perhaps too conservative. I'm also biased in favor of Sun and always have been. I'm a shareholder too. There's my full disclosure for you. I've been working with Sun hardware and software since the Sun3 timeframe. In fact, I proposed to the woman who became my wife as a direct result of a SPARCstation sale. I went down to Jamaica in '91 to install a SPARCstation-based system and to do a training workshop on the open source GRASS GIS software, got a great job offer at the Univ. of the West Indies, came home and asked Andi to marry me. We had a great honeymoon in Jamaica that lasted about a year and a half. I hope my honeymoon at Sun will last a lot longer than that.


Full time Roller!

It's official. Roller is now my full time job. I just accepted a job with Sun Microsystems to "design, develop, and deploy the primary blogging system for Sun in conjunction with other engineers" and to evangelize blogging both inside and outside of Sun. Needless to say, I'm thrilled. I'm honored to be working for Sun and with great folks like Will Snow, John Hoffman, Tim Bray, Patrick Chanezon, and Danese Cooper. I'm excited to be working for a company that feels the same way as I do about the value of blogs and wikis, open source software, and encouraging employees to speak with an honest and authentic voice to customers, to partners, and to each other.

What does this mean to Roller? Only good things. Sun wants many of the same things for Roller that other Roller users want including high performance, high availability, great user interface, support for standards, and better support for large communities of bloggers. Thanks to Sun I'll be working full time to help make these things happen. Since Roller will continue on as an open source project, you can help too (and I hope you will).


Those pesky Autoruns

I've been using the Sysinternals freeware Process Explorer for Windows and other Sysinternals utilities for years now, but I never noticed this one. AutoRuns "shows you what programs are configured to run during system bootup or login" and allows you to delete or disable any of them. Via Jonathan Hardwick.


Friday photo

Over the past couple of years, I've been scanning my photo collection using an HP slide/negative scanner. My dad, who is an excellent photographer, has been scanning his collection as well. So, to add a little life to this tired old blog, I'm going to start taking advantage of my .Mac account (no longer active) and posting a photo each week from my collection or my dad's. Here is the first one:

Jamaican carwash - My old VW Golf in the carwash close to Ocho Rios, Jamaica

RSS link vs. guid vs. source elements

I've been researching newsfeed formats for various reasons. I've been using Rome to convert to and from various formats, and that revealed a problem with Roller's RSS feed. After re-reading the loosey-goosey RSS specs, I'm thinking that I got it wrong in the Roller RSS feeds. What do you think? Currently, Roller uses the following elements for links:

  • <guid isPermaLink="true"> - the permalink
  • <link> - the (optional) source link, i.e. the one link that the blog entry is about
After looking at some Radio-generated RSS 2.0 feeds, it seems like Roller should do this:
  • <link> - the permalink
  • <guid isPermaLink="true"> - the permalink, same as <link>
  • <source url="[url of source]"> - the (optional) source link
Having the permalink in two places (link and guid) seems silly. Should we drop guid entirely? Putting the source link in the source element seems like the right thing to do, but the spec says the source url "links to the XMLization of the source"; that is, the source url should point to an RSS feed. Is that the common usage of the source element?
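A minimal C# sketch of the proposed item layout, assuming System.Xml's XmlDocument is used to build the feed; ProposedRssItem, Build, and AppendText are hypothetical names, not Roller code. It writes the permalink into both link and guid (with isPermaLink="true") and puts the optional source link into a source element. Per the RSS 2.0 spec, the text of source is the title of the channel the item came from and its url attribute points at that channel's feed, so following the spec literally would mean passing a feed URL, not an ordinary web page.

```csharp
using System.Xml;

// Hypothetical sketch of the proposed RSS 2.0 item layout.
public static class ProposedRssItem
{
    // sourceUrl and sourceTitle are optional; pass null to omit <source>.
    public static XmlElement Build(XmlDocument doc, string title,
                                   string permalink, string sourceUrl, string sourceTitle)
    {
        XmlElement item = doc.CreateElement("item");
        AppendText(doc, item, "title", title);

        // The permalink appears twice: as <link> and as <guid isPermaLink="true">
        AppendText(doc, item, "link", permalink);
        XmlElement guid = AppendText(doc, item, "guid", permalink);
        guid.SetAttribute("isPermaLink", "true");

        // RSS 2.0: <source url="feed of the originating channel">channel title</source>
        if (sourceUrl != null)
        {
            XmlElement source = AppendText(doc, item, "source", sourceTitle ?? "");
            source.SetAttribute("url", sourceUrl);
        }
        return item;
    }

    // Creates <name>text</name>, appends it to parent, and returns it.
    private static XmlElement AppendText(XmlDocument doc, XmlElement parent, string name, string text)
    {
        XmlElement child = doc.CreateElement(name);
        child.InnerText = text;  // InnerText escapes markup characters
        parent.AppendChild(child);
        return child;
    }
}
```

Dropping guid entirely, the other option raised above, would just mean deleting the two guid lines.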

Can't tell you yet

I'm sitting on some very big news for Roller and for me, but I have to tell some other folks about it before I can tell you.


Socialtext Closes Series A Financing

Enterprise wiki-blogging meets venture capitalists

RSS in Thunderbird

"receiving and reading RSS feeds" in Thunderbird

atomflow

"atom storage/query core"

The Balkanization of the Internet

"how often do you actually visit sites in other countries?"

IBM Touts New Eclipse Package for Linux

Eclipse 3.0 and the IBM JRE

Red Hat launches employee blogs.

Jesus points out that Red Hat has launched employee blogs, or perhaps I should say employee blog. Unlike Sun's employee blog system, where each employee has a personal blog with its own personal theme, Red Hat's Movable Type based blog system appears to be configured for two group blogs. One group blog, Red Hat People, is for regular employees and one, Red Hat Executives, is for executives. Interesting approach. Hey, aren't executives people too?


Knowledge management on your keychain.

Thank goodness for referrers. They bring in the porn spam, sure, but they also bring in wonderful news of the world. How could I have missed this incredible technological achievement:

Le Danois: A wiki and weblog placed on a USB key, is that possible? The answer seems to be yes. I have put a bundle of Roller weblogger, JSPWiki, HSQLDB (file based database) and Tomcat on the USB key and I am currently testing it.
