Speeding up MouseHole
why the lucky stiff
why at hobix.com
Sun Sep 4 17:16:46 EDT 2005
Okay, I've had some good success speeding things up. I've attached a
benchmark script I'm playing with. You can take a look at that later.
For now, here are my benchmarks: 50 iterations of cleaning Boing Boing's
home page, parsing it and building the HTML string again. (I use Boing
Boing since it's riddled with heinous JavaScript, unclosed tags and
the like.) This is on my Pentium-M 1.6GHz laptop running Gentoo Linux.
                        user     system      total        real
tidy+rexml:        19.010000   0.030000  19.040000 ( 20.467225)  * 1.2 *
cached tidy+rexml: 19.580000   0.020000  19.600000 ( 20.823490)
htree+rexml:       48.630000   0.030000  48.660000 ( 51.742457)  * 1.1 *
htree only:        14.490000   0.010000  14.500000 ( 15.717621)
tidy+smart:         3.150000   0.010000   3.160000 (  3.217260)  * 2.0 *
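For anyone who wants to reproduce numbers of this shape, here is a minimal sketch of a harness built on Ruby's standard Benchmark library. It is not the attached script. The pipeline lambdas are hypothetical stand-ins that only do cheap string work; in a real run each one would call Tidy, HTree, REXML or XML-Smart on the page source.

```ruby
require 'benchmark'

ITERATIONS = 50

# Hypothetical stand-ins for the real pipelines (tidy+rexml, htree+rexml,
# tidy+smart, ...). Each takes the raw HTML string and returns a string.
PIPELINES = {
  'placeholder a' => ->(html) { html.upcase.downcase },
  'placeholder b' => ->(html) { html.squeeze(' ') }
}

# Runs every pipeline `iterations` times over the same input and prints
# a user/system/total/real table. Returns a hash of label => Benchmark::Tms.
def run_benchmarks(html, pipelines, iterations)
  results = {}
  Benchmark.bm(20) do |bm|
    pipelines.each do |label, pipeline|
      results[label] = bm.report("#{label}:") do
        iterations.times { pipeline.call(html) }
      end
    end
  end
  results
end

# Sample input; a real run would load a saved copy of the page instead.
sample_html = '<html><body><p>unclosed  paragraph<br></body></html>'
run_benchmarks(sample_html, PIPELINES, ITERATIONS)
```

Feeding every pipeline the same saved string keeps network time out of the measurement, so the table reflects only cleaning and parsing.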
The quickest combination is Tidy <http://tidy.rubyforge.org> for
cleaning the HTML and XML-Smart
<http://raa.ruby-lang.org/project/ruby-xml-smart/> for parsing the XML,
altering it, and writing it back out. Tidy is a Ruby/DL wrapper around
the C Tidy library. XML-Smart is a C extension.
I'd like to move to Tidy and away from HTree. Replacing HTree with Tidy
will improve speed by a factor of roughly 2.5 to 3 (48.66s total for
htree+rexml versus 19.04s for tidy+rexml above). There are Tidy binaries
for most operating systems: <http://tidy.sourceforge.net/>
I'd like to move to XML-Smart in time as well. It's generally faster
than REXML by a factor of 5 to 6. However, we'd have to fill in some
gaps in the API, it's still beta software, and there are no binaries
readily available. If we can solve these problems, then I think we're
looking at MouseHole 2.0.
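The speedup factors can be checked against the "total" column of the benchmark table. This is only a small arithmetic sketch; the `TOTALS` hash and the `speedup` helper are names made up for illustration, and the figures are copied from the table.

```ruby
# "total" CPU seconds for 50 iterations, copied from the benchmark table.
TOTALS = {
  'tidy+rexml'  => 19.04,
  'htree+rexml' => 48.66,
  'tidy+smart'  => 3.16
}

# How many times faster `fast` is than `slow`, rounded to two decimals.
def speedup(slow, fast)
  (TOTALS.fetch(slow) / TOTALS.fetch(fast)).round(2)
end

htree_to_tidy  = speedup('htree+rexml', 'tidy+rexml')  # 48.66 / 19.04
rexml_to_smart = speedup('tidy+rexml', 'tidy+smart')   # 19.04 / 3.16

puts "HTree -> Tidy:      #{htree_to_tidy}x"
puts "REXML -> XML-Smart: #{rexml_to_smart}x"
```

On these totals the first ratio comes out at about 2.56 and the second at about 6.03.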
So, for now, I'm checking in support for Tidy, but holding off on other
XML libs. I've looked at XML-Simple, the libxml bindings, etc. Nothing
else has the speed, the XPath support and the ease of scripting all at
once.
_why