Discussion Forums: open-discussion

By: vinay hk
selecting text while scraping
2008-10-31 06:33
Hi,

I am using scrubyt, and below is my code to scrape Google.

require 'rubygems'
require 'scrubyt'

Scrubyt.logger = Scrubyt::Logger.new
google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit

  # Construct the wrapper: one "link" record per result item
  link "//div[3]/div/ol/li" do
    head "/h3[@class='r']"   # result title
    des "/div[@class='s']"   # result description (includes child-node text)
  end
  next_page "Next", :limit => 2
end


This outputs something like the following:

# Ruby Programming Language
# A dynamic, interpreted, open source programming language with a focus on simplicity and productivity. Site includes news, downloads, documentation, ...www.ruby-lang.org/ - 12k - Cached - Similar pagesDownloadsDocumentationin Twenty MinutesWhat's RubyDownload RubyLibrariesAbout RubySecurityMore results from ruby-lang.org »
# Ruby (programming language) - Wikipedia, the free encyclopedia
# Ruby is a dynamic, reflective, general purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. ...en.wikipedia.org/wiki/Ruby_(programming_language) - 118k - Cached - Similar pages

Since <div class='s'> contains both text and child nodes, I am getting all the text of <div class='s'> as well as the text of its child nodes.

How can I filter this? I don't want the child nodes' text. Can anybody help with this? What procedure should I follow?
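
To make the question concrete, here is a minimal sketch of the difference between "all descendant text" and "only the element's own text", written with REXML rather than scrubyt. The markup is a made-up, simplified stand-in for one result description, and the variable names are only for illustration. Whether scrubyt's pattern syntax accepts a text() step is not something this sketch confirms; it only shows the XPath idea.

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for one result's <div class='s'>.
html = <<~HTML
  <div class="s">Ruby is a dynamic language.<span>en.wikipedia.org</span><a href="#">Cached</a></div>
HTML

doc = REXML::Document.new(html)

# Every text node under the div, children included (what the post is seeing).
all_text = REXML::XPath.match(doc, "//div[@class='s']//text()").map(&:value).join

# Only text nodes that are direct children of the div.
own_text = REXML::XPath.match(doc, "//div[@class='s']/text()").map(&:value).join.strip

puts all_text
puts own_text
```

The second query drops the text of the span and a elements because text() after a single slash matches only the div's immediate text nodes.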