Speeding up MouseHole

why the lucky stiff why at hobix.com
Sun Sep 4 17:16:46 EDT 2005


Okay, I've had some good success speeding things up.  I've attached a 
benchmark script I'm playing with.  You can take a look at that later.

For now, here are my benchmarks: 50 iterations of cleaning Boing Boing's 
home page, parsing it and building the HTML string again.  (I use Boing 
Boing since it's riddled with heinous Javascript, unclosed tags and 
the like.)  This is on my Pentium-M 1.6GHz laptop running Gentoo Linux.

                        user     system      total        real
tidy+rexml:        19.010000   0.030000  19.040000 ( 20.467225) * 1.2 *
cached tidy+rexml: 19.580000   0.020000  19.600000 ( 20.823490)
htree+rexml:       48.630000   0.030000  48.660000 ( 51.742457) * 1.1 *
htree only:        14.490000   0.010000  14.500000 ( 15.717621)
tidy+smart:         3.150000   0.010000   3.160000 (  3.217260) * 2.0 *

The quickest combination is Tidy <http://tidy.rubyforge.org> for 
cleaning the HTML and XML-Smart 
<http://raa.ruby-lang.org/project/ruby-xml-smart/> for parsing the XML, 
altering it and displaying it.  Tidy is a Ruby/DL wrapper around the C 
Tidy library.  XML-Smart is a C extension.

I'd like to move to Tidy and away from HTree.  Replacing HTree with Tidy 
should improve speed by a factor of roughly 2.5 to 3.  There are 
binaries for most operating systems: <http://tidy.sourceforge.net/>

I'd like to move to XML-Smart in time as well.  It's generally faster 
than REXML by a factor of 5 to 6.  However, we'd have to fill in some 
gaps in the API, it's still beta software, and there are no binaries 
readily available.  If we can solve these problems, then I think we're 
looking at MouseHole 2.0.
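
For reference, the speedup factors can be worked out from the "total" 
column of the table above.  A small sketch (the speedup helper is just 
for illustration):

```ruby
# "total" column (seconds for 50 iterations) from the benchmark table.
totals = {
  'htree+rexml' => 48.66,
  'tidy+rexml'  => 19.04,
  'tidy+smart'  => 3.16
}

# How many times faster `to` is than `from`, rounded to one decimal.
def speedup(totals, from, to)
  (totals[from] / totals[to]).round(1)
end

cleaner = speedup(totals, 'htree+rexml', 'tidy+rexml')
parser  = speedup(totals, 'tidy+rexml', 'tidy+smart')
both    = speedup(totals, 'htree+rexml', 'tidy+smart')

puts "HTree -> Tidy (REXML kept):      #{cleaner}x"  # 2.6x
puts "REXML -> XML-Smart (Tidy kept):  #{parser}x"   # 6.0x
puts "htree+rexml -> tidy+smart:       #{both}x"     # 15.4x
```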

So, for now, I'm checking in support for Tidy, but holding off on other XML 
libs.  I've looked at XML-Simple, libxml bindings, etc.  Nothing else 
has the speed, the XPath support and the ease of scripting.
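
To illustrate what the XPath support and ease of scripting look like, 
here is a small sketch of the parse, alter and rebuild step using REXML. 
The sample markup and the choice to strip script tags are assumptions 
for the example, not taken from MouseHole itself, and the input is 
assumed to be well-formed already (i.e. after cleaning).

```ruby
require 'rexml/document'

# Well-formed markup, as it would look after the cleaning step.
xhtml = '<html><body><p>Hello</p><script>alert(1)</script></body></html>'

doc = REXML::Document.new(xhtml)

# Select every script element with XPath and drop it from the tree.
REXML::XPath.each(doc, '//script') { |node| node.remove }

# Build the HTML string again.
output = doc.to_s
puts output
```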

_why


More information about the Mousehole-scripters mailing list