Sunday August 24, 2003
Vanity Foul: Dedicated to the wanderings of an egotistical mind.
Atom and Digester
As I mentioned in my previous post, I'm lazy. So I'm using Digester to parse the AtomAPI XML - but it's falling down in one regard: <content>can contain <html></content>. Digester sees this as just more XML, so the contents of the content tag only get assigned if there is no HTML. Some general web searching, plus a search of the Digester-user list, turned up one other person with a similar concern earlier this year. I've sent him an email asking if he can supply more information on his solution. If not, or if that doesn't solve the issue, I'm afraid I'll have to resort to "hand-parsing" the AtomAPI. I'm lazy; I don't wanna do that. But again, it may become necessary, as the AtomAPI allows for adding arbitrary vendor-specific tags - <mt:fooBar> for example - and my Digester rules don't cover that at the moment. I do plan on re-expressing the rules via Digester's rules.xml capability though; that may allow for the necessary flexibility....
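For illustration, here is a minimal sketch of what the "hand-parsing" fallback might look like with plain SAX from the JDK. It is not the Digester-based solution mentioned above: the class name ContentCapturingHandler and the helper extractContent are hypothetical, and the sketch assumes the element of interest is literally named content and that the parser is not namespace-aware. Instead of treating markup nested inside <content> as structure, it re-serializes those tags into a string, so tags such as <b> or a vendor tag like <mt:fooBar> end up in the captured text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects everything inside <content> as raw text. */
class ContentCapturingHandler extends DefaultHandler {
    private final StringBuilder content = new StringBuilder();
    // 0 = outside <content>, 1 = directly inside it, >1 = inside nested markup
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            // Nested element: write the tag back out instead of interpreting it.
            content.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                content.append(' ').append(atts.getQName(i))
                       .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            content.append('>');
            depth++;
        } else if ("content".equals(qName)) {
            depth = 1; // assumption: the element is named exactly "content"
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            content.append("</").append(qName).append('>');
        }
        if (depth > 0) {
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            content.append(escape(new String(ch, start, length)));
        }
    }

    // Re-escape characters the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    String getContent() {
        return content.toString();
    }

    /** Parses the given XML and returns the captured body of <content>. */
    static String extractContent(String xml) throws Exception {
        ContentCapturingHandler handler = new ContentCapturingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry><title>t</title>"
                + "<content>Hello <b>bold</b> world</content></entry>";
        System.out.println(extractContent(xml));
    }
}
```

The depth counter is the whole trick: once inside <content>, every start and end tag only adjusts the counter and is copied through, so the handler does not need a rule for each tag it might encounter.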
Trackback URL: http://www.brainopolis.com/roller/trackback/lance/Weblog/atom_and_digester
If you can't find a solution with Digester, I've got a delegating SAX handler that would make this pretty easy to do... it was originally designed to chop large XML files into chunks, so it's used to ignoring (and maintaining) nested tags.
Posted by Jason Carreira on August 24, 2003 at 08:14 PM CDT
Website: http://freeroller.net/page/jcarreira #
Posted by Mark Mascolino on August 24, 2003 at 08:18 PM CDT
Website: http://people.etango.com/~markm/ #
Mark, that's pretty much the solution I'm looking at. I found someone who's already done it, just waiting on a reply.
Posted by Lance on August 24, 2003 at 08:43 PM CDT #