Javablogs 1.2 using Informa.

Charles Miller posted an interesting write-up on the recent Javablogs 1.2 improvements. I found it especially interesting that Javablogs now uses the Informa RSS parser, which takes the strict XML approach to RSS feed parsing - if the XML is invalid, it ignores the feed. They are considering using Mark Pilgrim's ultra-liberal Universal Feed Parser, which is much more forgiving, but it is written in Python and may need to be run in a separate process. Wouldn't it be nice if Informa provided ultra-liberal parsing capabilities? Hmmm... I wonder... how hard would it be to port Pilgrim's parser to Jython?
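To make the "strict" approach concrete, here is a minimal sketch of what rejecting a broken feed looks like with a standard XML parser. It uses only the JDK's JAXP classes and is not Informa's actual API; the class name StrictFeedCheck and the method parseStrict are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Informa: shows the all-or-nothing
// behaviour of a standard XML parser when handed a feed.
class StrictFeedCheck {

    // Returns the parsed document, or empty if the feed is not well-formed XML.
    static Optional<Document> parseStrict(String feedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return Optional.of(builder.parse(new InputSource(new StringReader(feedXml))));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Strict approach: any parse failure means the whole feed is ignored.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String good = "<rss version=\"2.0\"><channel><title>Tom &amp; Jerry</title></channel></rss>";
        // A bare ampersand is enough to make the document malformed.
        String bad = "<rss version=\"2.0\"><channel><title>Tom & Jerry</title></channel></rss>";

        System.out.println("good feed accepted: " + parseStrict(good).isPresent());
        System.out.println("bad feed accepted:  " + parseStrict(bad).isPresent());
    }
}
```

A liberal parser would try to salvage the second feed's title instead of discarding the whole thing, which is the trade-off being discussed here.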

Comments:

Many of the arguments in favour of liberal parsing don't actually apply to Javablogs: if your RSS feed doesn't appear on Javablogs because it's broken, that should be incentive for you to fix the feed. (That said, a lot of Roller feeds seem to be spitting out invalid UTF-8.) I've found a couple of bugs in Informa where perfectly valid feeds don't get read because they have a namespace declaration. Luckily they're easy to patch. Open Source is cool that way.

Posted by Charles Miller on April 14, 2004 at 10:13 PM EDT #

And Niko (Informa's project manager) is pretty good at getting patches integrated too. The problem with 'liberal' parsing is that invalid feeds are often actually invalid XML - Informa saves a load of time by using standard XML parsers, which assume they are dealing with valid XML. I don't know of a liberal XML parser, so I expect integrating a liberal parser into Informa would be a lot of work. If Mark Pilgrim's feed parser could be easily ported, however, extending Informa to support user-defined parsers isn't much work at all - then you get all the benefits of Informa's nice feed API.

Posted by Sam Newman on April 15, 2004 at 07:33 AM EDT #

I did some research yesterday, but as a complete J/Python novice I couldn't figure out how to run a Python script in Jython. My one previous attempt to run Python (the wxAtom client) failed completely.

Posted by Lance on April 17, 2004 at 02:15 PM EDT #
