rexml/httpwrite, why must you mock me?

Daniel Sheppard daniels at pronto.com.au
Tue Sep 27 01:45:04 EDT 2005


I got struck by a little inspiration last night, and thought "Hey,
MouseHole, how would you like to have some fun manipulating RSS feeds?"

MouseHole didn't seem too impressed by the idea.

Firstly, after littering my code with debug messages, I found that
MouseHole only wanted to deal with the text/html and
application/xhtml+xml mime types. Fair enough - if it wants to parse
everything as HTML, it should probably obey that restriction. So I
fiddled with the checking:

    unless script.match req.request_uri
        logger.info "Skipping #{script.name} - no match"
        next
    end
    unless script.document_converter.handles_type?(res.content_type)
        logger.info "Skipping #{script.name} - wrong content type"
        next
    end
    logger.info "Executing #{script.name}"
    script.execute( req, res )
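
The content-type check can be tried on its own, outside MouseHole. What
follows is only a minimal sketch: HtmlTypes, XmlTypes and converter_for
are hypothetical names made up for the illustration, and it assumes the
Content-Type value may carry a trailing parameter such as a charset,
which is why the patterns anchor only at the start of the string.

```ruby
# Hypothetical stand-ins for converter classes; only handles_type? is sketched.
class HtmlTypes
  PATTERNS = [
    %r{^text/html},
    %r{^application/xhtml\+xml} # "+" is escaped so it matches a literal plus
  ].freeze

  def handles_type?(type)
    PATTERNS.any? { |pattern| pattern === type }
  end
end

class XmlTypes
  def handles_type?(type)
    %r{^text/xml} === type
  end
end

# Return the first converter that claims the given Content-Type, or nil.
def converter_for(type, converters)
  converters.find { |converter| converter.handles_type?(type) }
end
```

Matching on a prefix keeps "text/html; charset=utf-8" and plain
"text/html" on the same path, at the cost of also accepting any longer
type that happens to start with the same characters.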

And I added in this bit of code (which I think is a pretty messy way of
going about this, but it works in a pinch):

    class HtmlDocumentConverter
        def parse_string(body)
            parse_xhtml(HTree.parse(body))
        end
        def output_string(document, stream)
            document.write(stream)
        end
        # Return the <html> element of the HTree parse as a REXML document.
        def parse_xhtml( htree )
            htree.each_child do |child|
                if child.respond_to? :qualified_name
                    if child.qualified_name == 'html'
                        return HTree::Doc.new( child ).to_rexml
                    end
                end
            end
        end
        def handles_type?(type)
            [
                /^text\/html/,
                /^application\/xhtml\+xml/    # '+' escaped to match a literal plus
            ].any? {|x| x === type }
        end
    end
    
    class XmlDocumentConverter
        def parse_string(body)
            REXML::Document.new(body)
        end
        def output_string(document, stream)
            document.write(stream)
        end
        def handles_type?(type)
            p type    # debug output
            /^text\/xml/ === type
        end
    end

    class UserScript

        attr_accessor :document, :matches, :db, :request, :response, :mtime, :active, :install_url, :document_converter
        def document_converter s = nil; s ? @document_converter = s : (@document_converter || HtmlDocumentConverter.new) end
        def name s = nil; s ? @name = s : @name; end
        ......

Which allowed me to write this rule for rewriting the slashdot RSS feed:

MouseHole.script do
	name "Slashdot Fullfeed RSS"
	namespace "http://members.iinet.net.au/~soxbox/"
	description "Converts the slashdot RSS feed to a full-content feed"
	include_match "http://rss.slashdot.org/Slashdot/slashdot"
	document_converter XmlDocumentConverter.new
	version "0.1"

	rewrite do |req,res|
		p "rewriting"
		document.each_element('//item/') do |e|
			e.each_element('description') {|x| x.remove}
			doc = read_xhtml_from(e.attributes['rdf:about'] + "&mode=nocomment")
			desc = REXML::Element.new('description')
			doc.each_element('//div[@class="intro"]') do |x|
				s = ""
				x.write(s)
				s.gsub!("—","—")
				desc.text = s
			end
			e << desc
		end
	end
end
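
The element swapping in the rewrite block can be tried with plain REXML,
without MouseHole or a network fetch. This is only a sketch:
replace_descriptions is a hypothetical helper, the block passed to it
stands in for the page fetch done by read_xhtml_from, and the sample
feed is made up (it is not the real Slashdot markup).

```ruby
require 'rexml/document'

# Replace every <item>'s <description> with the text returned by the block.
def replace_descriptions(document)
  document.each_element('//item') do |item|
    item.each_element('description') { |old| old.remove }
    desc = REXML::Element.new('description')
    desc.text = yield(item) # the block supplies the new content
    item << desc
  end
  document
end

# Made-up sample feed, only for illustration.
feed = REXML::Document.new(<<XML)
<rss><channel>
  <item><title>One</title><link>http://example.com/1</link><description>short</description></item>
</channel></rss>
XML

replace_descriptions(feed) { |item| "Full text for #{item.elements['link'].text}" }
feed.write($stdout)
```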

Then I noticed something strange - my rewritten feed had no content in
the <link/> elements. Apparently, the reason for this is the
rexml/httpwrite.rb code - which seems to be designed to ensure that HTML
elements that aren't meant to have content don't end up having any
content... Why does this code exist? Shouldn't it be the task of the
tree builder to put the right things in the right nodes? Otherwise,
wouldn't it be better to have something walking the tree and trimming
the bad nodes before it gets output?
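
A rough sketch of that tree-walking idea, using plain REXML. It is not
what the existing writer code does: trim_void_elements is a hypothetical
helper, and the list of empty elements is a partial one picked for
illustration. Run only on documents known to be HTML, it would leave the
<link/> content of an RSS feed alone.

```ruby
require 'rexml/document'

# Elements that HTML defines as empty; a partial, illustrative list.
VOID_HTML_ELEMENTS = %w[area base br col hr img input link meta param].freeze

# Walk the tree once and drop any children of void elements, so the
# writer never needs to know about HTML rules.
def trim_void_elements(node)
  node.each_element do |child|
    if VOID_HTML_ELEMENTS.include?(child.name)
      child.to_a.each(&:remove) # to_a copies the child list, so removal is safe
    else
      trim_void_elements(child)
    end
  end
  node
end

page = REXML::Document.new('<html><head><link rel="alternate">stray</link></head></html>')
trim_void_elements(page)
page.write($stdout)
```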
