

Release Name: scrubyt-0.2.3

Thanks to feedback from all of you, I managed to find a lot of bugs and to write up a nice feature request list. Most of the bugs are fixed, and some shiny new features have been added. Stability was also improved by adding new tests and thoroughly refactoring the code.
The new features make this release much more powerful than the previous one. Support for sites requiring login, submitting forms by clicking a button, filling in text areas, dealing with variable-size results, smart handling of attribute lookup, https, and custom proxy settings, plus tons of bugfixes, make this release capable of much more than was possible in 0.2.0.
I have also added some shiny new examples, such as scraping reddit, logging in to rubyforge, and automatic wordpress commenting.

Keep the great feedback coming! Thanks to everyone for their help.

Changes:
* [FIX] Cookies (and other stuff) are now taken into consideration
* [NEW] select_indices feature. Example: table do (row '1').select_indices(:last) end
  This will select only the last row. It is also possible to specify a Range, an array of indices, or other constants like :first, :every_odd etc. More to come in the future!
* [FIX] next page problem fixed
* [FIX] Fetching of https sites
* [FIX] Next page works incorrectly when given an absolute path
* [FIX] Fixing exporting if the pattern parameters are parenthesized
* [NEW] Possibility to submit forms by clicking a button
* [NEW] Added new unit test suite: pattern_test
* [NEW] Possibility to set a proxy for fetching the input document
* [NEW] Added possibility to choose an option from a selection list (Credit: Zaheed Haque)
* [FIX] Image pattern example lookup fix
* [NEW] Possibility to prefilter the document before passing it to Hpricot (Credit: Demitrious Kelly)
* [FIX] corrected gem dependencies (Credit: Tim Fletcher)
* [FIX] remove duplicates only if there are more examples present
* [NEW] new examples: wordpress comment (Credit: Zaheed Haque), rubyforge login, reddit and more
* [FIX] if there is no scraper defined, exit with a message rather than raise an exception
* [NEW] smart handling of attribute lookup: try to look up the attribute in the parent, but if it is not there, traverse up until it is found (this is useful e.g. if an image is inside a span and the span is inside an <a> tag)
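
To make the selector forms accepted by select_indices more concrete, here is a small plain-Ruby sketch of how such a selection could behave on a list of matched rows. It is only an illustration, not scRUBYt's actual code: the helper name select_by_indices is hypothetical, and reading :every_odd as "the 1st, 3rd, 5th ... item, counting from 1" is an assumption.

```ruby
# Hypothetical helper, NOT part of scRUBYt: picks elements of +items+
# according to one of the selector forms named in the changelog.
def select_by_indices(items, selector)
  case selector
  when :first     then items.first(1)
  when :last      then items.last(1)
  # Assumption: "odd" counts positions from 1, i.e. zero-based 0, 2, 4, ...
  when :every_odd then items.each_slice(2).map(&:first)
  when Range      then items[selector] || []            # out-of-range start gives []
  when Array      then selector.map { |i| items[i] }.compact
  when Integer    then [items[selector]].compact
  else raise ArgumentError, "unsupported selector: #{selector.inspect}"
  end
end

rows = ["row 1", "row 2", "row 3", "row 4", "row 5"]

p select_by_indices(rows, :last)       # only the last row
p select_by_indices(rows, 1..2)        # a Range of zero-based indices
p select_by_indices(rows, [0, 4])      # an explicit array of indices
p select_by_indices(rows, :every_odd)  # every other row, starting with the first
```

Every branch returns an array, so the result can be handled the same way whether one row or several were selected.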
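
The attribute lookup described in the last item can be pictured as a walk up the element tree. The sketch below uses a tiny stand-in Node struct instead of real Hpricot elements, and the lookup_attribute helper is a hypothetical name. It is meant only to show the idea, not how scRUBYt implements it. As a simplification, this sketch also checks the starting element itself before moving to its ancestors.

```ruby
# Minimal stand-in for a parsed HTML element (hypothetical; scRUBYt
# itself works on Hpricot elements, which are not used here).
Node = Struct.new(:name, :attributes, :parent)

# Returns the value of +attr_name+ from +node+ or its nearest ancestor
# that carries it, or nil if no element on the way to the root has it.
def lookup_attribute(node, attr_name)
  current = node
  while current
    value = current.attributes[attr_name]
    return value if value
    current = current.parent   # not found here, go one level up
  end
  nil
end

# The case from the changelog: an image inside a span inside an <a> tag.
link = Node.new("a",    { "href" => "/gallery/1" }, nil)
span = Node.new("span", {},                         link)
img  = Node.new("img",  { "src" => "thumb.png" },   span)

puts lookup_attribute(img, "href")   # found on the enclosing <a>
puts lookup_attribute(img, "src")    # found on the image itself
```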