<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.6944.0">
<TITLE>rexml/httpwrite, why must you mock me?</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->
<P><FONT SIZE=2 FACE="Arial">I got struck by a little inspiration last night, and thought "Hey, MouseHole, how would you like to have some fun manipulating RSS feeds?"</FONT></P>
<P><FONT SIZE=2 FACE="Arial">MouseHole didn't seem too impressed by the idea.</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">Firstly, after littering my code with debug messages, I found that MouseHole only wanted to deal with the text/html and application/xhtml+xml MIME types. Fair enough - if it wants to parse everything as HTML, it should probably obey that restriction. So I fiddled with the checking:</FONT></P>
<P><FONT SIZE=2 FACE="Arial"> unless script.match req.request_uri</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> logger.info "Skipping #{script.name} - no match"</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> next</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> unless script.document_converter.handles_type?(res.content_type)</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> logger.info "Skipping #{script.name} - wrong content type"</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> next</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> logger.info "Executing #{script.name}"</FONT>
<BR><FONT SIZE=2 FACE="Arial"> script.execute( req, res )</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">And I added in this bit of code (which I think is a pretty messy way of going about this, but it works in a pinch):</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial"> class HtmlDocumentConverter</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def parse_string(body)</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> parse_xhtml(HTree.parse(body))</FONT>
<BR> <FONT SIZE=2 FACE="Arial">end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def output_string(document, stream)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> document.write(stream)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def parse_xhtml( htree )</FONT>
<BR><FONT SIZE=2 FACE="Arial"> htree.each_child do |child|</FONT>
<BR><FONT SIZE=2 FACE="Arial"> if child.respond_to? :qualified_name</FONT>
<BR><FONT SIZE=2 FACE="Arial"> if child.qualified_name == 'html'</FONT>
<BR><FONT SIZE=2 FACE="Arial"> return HTree::Doc.new( child ).to_rexml</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end </FONT>
<BR><FONT SIZE=2 FACE="Arial"> def handles_type?(type)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> [</FONT>
<BR><FONT SIZE=2 FACE="Arial"> /^text\/html/, </FONT>
<BR><FONT SIZE=2 FACE="Arial"> /^application\/xhtml\+xml/</FONT>
<BR><FONT SIZE=2 FACE="Arial"> ].any? {|x| x === type }</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end </FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> </FONT>
<BR><FONT SIZE=2 FACE="Arial"> class XmlDocumentConverter</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def parse_string(body)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> REXML::Document.new(body)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def output_string(document, stream)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> document.write(stream)</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def handles_type?(type)</FONT>
<BR> <FONT SIZE=2 FACE="Arial"> p type</FONT>
<BR><FONT SIZE=2 FACE="Arial"> /^text\/xml/ === type</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> end</FONT>
</P>
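<P><FONT SIZE=2 FACE="Arial">One detail of the content-type check is easy to trip over: in a regular expression, "+" is a quantifier, so the "+" in application/xhtml+xml has to be escaped to match literally. Here is a minimal standalone sketch of the check. HTML_TYPES and handles_html_type? are hypothetical names made up for illustration, not part of MouseHole.</FONT></P>
<PRE>
```ruby
# Hypothetical standalone version of the content-type check; the constant
# and method names are illustrative and not part of MouseHole.
HTML_TYPES = [
  %r{\Atext/html},
  %r{\Aapplication/xhtml\+xml}   # "+" escaped so it matches literally
].freeze

def handles_html_type?(type)
  HTML_TYPES.any? { |pattern| pattern === type }
end

puts handles_html_type?("text/html; charset=utf-8")
puts handles_html_type?("application/xhtml+xml")
puts handles_html_type?("text/xml")

# Unescaped, "+" means "one or more of the preceding l",
# so the real MIME type does not match.
puts(/^application\/xhtml+xml/ === "application/xhtml+xml")
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">The first two checks should come out true and the last two false.</FONT></P>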
<P><FONT SIZE=2 FACE="Arial"> class UserScript</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial"> attr_accessor :document, :matches, :db, :request, :response, :mtime, :active, :install_url, :document_converter</FONT>
<BR><FONT SIZE=2 FACE="Arial"> def document_converter s = nil; s ? @document_converter = s : (@document_converter || HtmlDocumentConverter.new) end</FONT></P>
<P><FONT SIZE=2 FACE="Arial"> def name s = nil; s ? @name = s : @name; end</FONT>
<BR><FONT SIZE=2 FACE="Arial"> ……</FONT>
</P>
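<P><FONT SIZE=2 FACE="Arial">The one-line document_converter method doubles as getter and setter, which is what lets a script declare its converter in the same style as name and version. A minimal sketch of that idiom on its own, using made-up stand-in classes (DemoScript, DemoHtmlConverter and DemoXmlConverter are assumptions for illustration, not MouseHole classes):</FONT></P>
<PRE>
```ruby
# Illustrative stand-ins; these are not MouseHole's real classes.
class DemoHtmlConverter; end
class DemoXmlConverter; end

class DemoScript
  # With an argument: store it. Without one: return the stored converter,
  # falling back to a fresh DemoHtmlConverter when none was set.
  def document_converter(s = nil)
    s ? @document_converter = s : (@document_converter || DemoHtmlConverter.new)
  end
end

script = DemoScript.new
puts script.document_converter.class   # default converter

script.document_converter(DemoXmlConverter.new)
puts script.document_converter.class   # converter set by the script
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">Note that the default is not cached, so each call without a stored converter builds a new object.</FONT></P>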
<P><FONT SIZE=2 FACE="Arial">Which allowed me to write this rule for rewriting the slashdot RSS feed:</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">MouseHole.script do</FONT>
<BR> <FONT SIZE=2 FACE="Arial">name "Slashdot Fullfeed RSS"</FONT>
<BR> <FONT SIZE=2 FACE="Arial">namespace "</FONT><A HREF="http://members.iinet.net.au/~soxbox/"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://members.iinet.net.au/~soxbox/</FONT></U></A><FONT SIZE=2 FACE="Arial">"</FONT>
<BR> <FONT SIZE=2 FACE="Arial">description "Converts the slashdot RSS feed to a full-content feed"</FONT>
<BR> <FONT SIZE=2 FACE="Arial">include_match "</FONT><A HREF="http://rss.slashdot.org/Slashdot/slashdot"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://rss.slashdot.org/Slashdot/slashdot</FONT></U></A><FONT SIZE=2 FACE="Arial">"</FONT>
<BR> <FONT SIZE=2 FACE="Arial">document_converter XmlDocumentConverter.new</FONT>
<BR> <FONT SIZE=2 FACE="Arial">version "0.1"</FONT>
</P>
<P> <FONT SIZE=2 FACE="Arial">rewrite do |req,res|</FONT>
<BR> <FONT SIZE=2 FACE="Arial">p "rewriting"</FONT>
<BR> <FONT SIZE=2 FACE="Arial">document.each_element('//item') do |e|</FONT>
<BR> <FONT SIZE=2 FACE="Arial">e.each_element('description') {|x| x.remove}</FONT>
<BR> <FONT SIZE=2 FACE="Arial">doc = read_xhtml_from(e.attributes['rdf:about'] + "&mode=nocomment")</FONT>
<BR> <FONT SIZE=2 FACE="Arial">desc = REXML::Element.new('description')</FONT>
<BR> <FONT SIZE=2 FACE="Arial">doc.each_element('//div[@class="intro"]') do |x|</FONT>
<BR> <FONT SIZE=2 FACE="Arial">s = ""</FONT>
<BR> <FONT SIZE=2 FACE="Arial">x.write(s)</FONT>
<BR> <FONT SIZE=2 FACE="Arial">s.gsub!("&amp;mdash;","&amp;#8212;")</FONT>
<BR> <FONT SIZE=2 FACE="Arial">desc.text = s</FONT>
<BR> <FONT SIZE=2 FACE="Arial">end</FONT>
<BR> <FONT SIZE=2 FACE="Arial">e << desc</FONT>
<BR> <FONT SIZE=2 FACE="Arial">end</FONT>
<BR> <FONT SIZE=2 FACE="Arial">end</FONT>
<BR><FONT SIZE=2 FACE="Arial">end</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">Then I noticed something strange - my rewritten feed had no content in the &lt;link/&gt; elements. Apparently, the reason for this is the rexml/httpwrite.rb code - which seems to be designed to ensure that HTML elements that aren't meant to have content don't end up having any content... Why does this code exist? Shouldn't it be the task of the tree builder to put the right things in the right nodes? Otherwise, wouldn't it be better to have something walking the tree and trimming the bad nodes before it gets output?</FONT></P>
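<P><FONT SIZE=2 FACE="Arial">For what it's worth, the tree-walking alternative could look roughly like the sketch below. It uses plain REXML, and both the trim_empty_elements name and the list of empty elements are assumptions made for illustration (this is not how httpwrite.rb is written). The point is that the trimming becomes an explicit step that only HTML documents would go through, so an RSS link element keeps its text unless the step is applied.</FONT></P>
<PRE>
```ruby
require "rexml/document"

# HTML elements that may not have content (illustrative subset only).
EMPTY_HTML_ELEMENTS = %w[br hr img input link meta].freeze

# Hypothetical helper: walk the tree and strip all children from
# elements that HTML defines as empty.
def trim_empty_elements(element)
  element.elements.to_a.each do |child|
    if EMPTY_HTML_ELEMENTS.include?(child.name)
      child.children.each { |node| node.remove }
    else
      trim_empty_elements(child)
    end
  end
end

# Build a tiny feed-like tree: an item with a link that carries text.
doc = REXML::Document.new
item = doc.add_element("item")
item.add_element("link").text = "http://example.org/story"

untouched = item.elements["link"].text
trim_empty_elements(doc.root)   # only appropriate when the tree is HTML
trimmed = item.elements["link"].text

puts untouched.inspect
puts trimmed.inspect
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">Before the walk the link text is intact; after it the element is empty, which is the same symptom I saw in the rewritten feed.</FONT></P>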
</BODY>
</HTML>