Release Name: 0.2.0

Notes:
Spidr is a versatile Ruby web spidering library that can spider a single
site, multiple domains, or specific links, or crawl indefinitely. Spidr is
designed to be fast and easy to use.


Changes:

=== 0.2.0 / 2009-10-10

* Added URI.expand_path.
* Added Spidr::Page#search.
* Added Spidr::Page#at.
* Added Spidr::Page#title.
* Added Spidr::Agent#failures=.
* Added an HTTP session cache to Spidr::Agent, per suggestion of falter.
  * Added Spidr::Agent#get_session.
  * Added Spidr::Agent#kill_session.
* Added Spidr.proxy=.
* Added Spidr.disable_proxy!.
* Aliased Spidr::Page#txt? to Spidr::Page#plain_text?.
* Aliased Spidr::Page#ok? to Spidr::Page#is_ok?.
* Aliased Spidr::Page#redirect? to Spidr::Page#is_redirect?.
* Aliased Spidr::Page#unauthorized? to Spidr::Page#is_unauthorized?.
* Aliased Spidr::Page#forbidden? to Spidr::Page#is_forbidden?.
* Aliased Spidr::Page#missing? to Spidr::Page#is_missing?.
* Split URL filtering code out of Spidr::Agent and into Spidr::Filtering.
* Split URL / Page event code out of Spidr::Agent and into Spidr::Events.
* Split pause! / continue! / skip_link! / skip_page! methods out of
  Spidr::Agent and into Spidr::Actions.
* Fixed a bug in Spidr::Page#code, where it was not returning an Integer.
* Made sure Spidr::Page#doc returns Nokogiri::XML::Document objects for
  RSS/RDF/Atom pages as well.
* Fixed the handling of the Location header in Spidr::Page#links
  (thanks falter).
* Fixed a bug in Spidr::Page#to_absolute where trailing '/' characters on
  URI paths were not being preserved (thanks falter).
* Fixed a bug where the URI query was not being sent with the request in
  Spidr::Agent#get_page (thanks Damian Steer).
* Fixed a bug where SSL sessions were not being properly set up
  (thanks falter).
* Switched Spidr::Agent#history to be a Set, to improve search-time of the
  history (thanks falter).
* Switched Spidr::Agent#failures to a Set.
* Allow a block to be passed to Spidr::Agent#run, which will receive all
  pages visited.
* Allow Spidr::Agent#start_at and Spidr::Agent#continue! to pass blocks to
  Spidr::Agent#run.
* Made Spidr::Agent#visit_page public.
* Moved to YARD-based documentation.
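
The switch of the history to a Set is easier to see in a small standalone sketch. The class below is not Spidr's code: CrawlHistory, record and visited? are hypothetical names used only for illustration, and it relies solely on Ruby's standard set and uri libraries. It shows how a Set-backed history answers "has this URL been visited?" with a hash lookup rather than a scan of an Array, and how duplicate URLs are dropped automatically.

```ruby
require 'set'
require 'uri'

# Hypothetical stand-in for an agent's visit history (not Spidr's actual code).
class CrawlHistory
  def initialize
    # A Set uses hash lookups, so membership checks do not slow down
    # linearly as the history grows, unlike Array#include?.
    @visited = Set.new
  end

  # Records a URL. Returns true if it was new, false if already recorded.
  def record(url)
    # Set#add? returns nil when the element is already present.
    !@visited.add?(URI(url)).nil?
  end

  # Returns true if the URL has been recorded before.
  def visited?(url)
    @visited.include?(URI(url))
  end

  # Number of distinct URLs recorded.
  def size
    @visited.size
  end
end

history = CrawlHistory.new
history.record('http://example.com/')
history.record('http://example.com/about')
history.record('http://example.com/') # duplicate, ignored by the Set
```

After the three calls above the history holds two distinct URLs. The same reasoning would apply to a Set of failed URLs, assuming failures are looked up the same way.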