Discussion Forums: open-discussion

By: Jimmy O Regan
RE: Sample scrapers
2004-10-20 23:23
No problem. It's serendipitous that your reply came today -- I only thought to check back today too :)

By: Lau Taarnskov
RE: Sample scrapers
2004-10-20 22:54
Great.

I have enabled a wiki and posted your scraper:
http://rssscraper.rubyforge.org/wiki/wiki.pl?ScraperList

Hopefully the list of custom scrapers will grow.

P.S. Sorry 'bout the late reply. I don't check this forum often, because it hasn't been very active(!).

By: Jimmy O Regan
Sample scrapers
2004-07-17 12:40
Damn. I was expecting to find more samples here. I suppose I'd better get the ball rolling. This scrapes Ben Okopnik's blog <http://okopnik.freeshell.org/blog/blog.cgi>

require 'cgi'

class BenScanner < RSSscraper::AbstractScanner
  def initialize
    @url_string = 'http://okopnik.freeshell.org/blog/blog.cgi'
    @url_proper = 'http://okopnik.freeshell.org/blog/'
    # Captures, in order: date, title, content, comments link, comments link text.
    @postsRE = /div class="HeadText"> \n\n([^<]*)\n\n<\/div>\n<\/td>\n<\/tr>\n\n<tr>\n<td bgcolor="#fdfded" class="UsualText" valign="middle"> \n\n<br><b>([^<]*)<\/b><p>\n\n([^\]]*)\n\n<p>\[ <a href="([^\s\t\r\n\f]*)">([^<]*)<\/a>/m
  end

  # Builds one item hash per post matched on the fetched page.
  def find_items
    items = Array.new
    request_feed.scan(@postsRE).each { |date, title, content, comments_link, comments|
      items << { :title => title,
                 :description => CGI::escapeHTML(content),
                 :comments_link => @url_proper + comments_link,
                 :comments => @url_proper + comments_link,
               }
    }
    items
  end
end

class Ben < RSSscraper::AbstractScraper

  def scanner
    BenScanner.new
  end

  def description
    {
      :link => 'http://okopnik.freeshell.org/blog/blog.cgi',
      :title => 'The Bay of Tranquility',
      :description => 'Ben Okopnik\'s blog.',
      :language => 'en-us',
      :generator => generator_string
    }
  end
end
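
For anyone who wants to try the pattern-matching step without RSSscraper installed, here is a minimal standalone sketch. It assumes a hypothetical, heavily simplified page layout (SAMPLE_HTML), a made-up base URL (BASE_URL) and a much shorter regex (POSTS_RE); none of these come from the real blog or from the RSSscraper API. It only illustrates how String#scan with capture groups returns one array per post, which is then turned into an item hash much like find_items does above.

```ruby
require 'cgi'

# Hypothetical, simplified markup; the real blog page is more involved.
SAMPLE_HTML = <<~HTML
  <div class="HeadText">2004-07-17</div>
  <b>First post</b><p>Hello & welcome</p>
  [ <a href="blog.cgi?id=1">2 comments</a> ]
  <div class="HeadText">2004-07-18</div>
  <b>Second post</b><p>More text</p>
  [ <a href="blog.cgi?id=2">0 comments</a> ]
HTML

# Assumed base URL, for illustration only.
BASE_URL = 'http://example.org/blog/'

# Five capture groups: date, title, content, comments link, link text.
POSTS_RE = /<div class="HeadText">([^<]*)<\/div>\s*<b>([^<]*)<\/b><p>([^<]*)<\/p>\s*\[ <a href="(\S*)">([^<]*)<\/a>/m

# scan yields one array of captures per match; map each to an item hash.
def find_items(html)
  html.scan(POSTS_RE).map do |_date, title, content, link, _link_text|
    {
      title: title,
      description: CGI.escapeHTML(content),
      comments: BASE_URL + link
    }
  end
end

items = find_items(SAMPLE_HTML)
items.each { |item| puts "#{item[:title]}: #{item[:comments]}" }
```

The content is passed through CGI.escapeHTML so that characters such as & are safe to embed in the feed's description element, the same reason the scraper above escapes it.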