Discussion Forums: help

By: Kaspar Fischer
Problem with caching and regular expression
2007-04-27 22:26
Hi there,

I am trying to write my own screen scraper and have run into two problems. Here is what I have so far:

<code>
class Transparency4Scanner < RSSscraper::AbstractScanner
  def initialize
    @url_string = 'http://transparency.org/news_room/latest_news/press_releases'
    # Matches one <p> block: link, title, an optional middle line, and the trailing text.
    @mainRE = /<p>\s*<a href="([^"]*)">([^<]*)<\/a><br \/>\s*(([^<]*)<br \/>)?\s*([^<]*)<\/p>/m
  end

  def find_items()
    items = Array.new
    request_feed.scan(@mainRE).each{ |a,b,c,d|
      items << { :title => b,
                 :description => c,
                 :date => d,
                 :link => a
               }
    }
    items
  end
end

class Transparency4 < RSSscraper::AbstractScraper
  def scanner
    Transparency4Scanner.new
  end

  def description
    {
      :link => 'http://transparency.org/news_room/latest_news/press_releases',
      :title => 'Transparency International News',
      :description => 'News articles from Transparency.org.',
      :language => 'en-us'
    }
  end
end
</code>

The problem is that PHP's preg_match() function understands the regular expression but Ruby does not. What is wrong? I have to admit, I am a Ruby-newby (which sounds nice but is sad ;-).
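
One thing that can be checked outside RSSscraper is how Ruby's String#scan hands back capture groups. The sketch below is only a reduced test case: SAMPLE_HTML is made-up markup (the real page may look different), and NESTED_RE, FLAT_RE and the variable names are hypothetical. It shows that the pattern above has five capture groups because the optional group is itself a capture, so a block taking |a,b,c,d| would receive the outer group (including the trailing <br />) as c and the inner group as d, and would never see the fifth. Whether this is the actual cause of the mismatch with preg_match() is an assumption.

```ruby
# Hypothetical sample markup; the real press-release page may differ.
SAMPLE_HTML = <<~HTML
  <p>
  <a href="/news/1">First release</a><br />
  Short summary<br />
  27 April 2007</p>
HTML

# The pattern from the post: the optional group is a capture too, so there are five groups.
NESTED_RE = /<p>\s*<a href="([^"]*)">([^<]*)<\/a><br \/>\s*(([^<]*)<br \/>)?\s*([^<]*)<\/p>/m

# Same pattern with the optional group made non-capturing via (?: ... ): four groups.
FLAT_RE = /<p>\s*<a href="([^"]*)">([^<]*)<\/a><br \/>\s*(?:([^<]*)<br \/>)?\s*([^<]*)<\/p>/m

# With capture groups present, scan returns one array of captures per match.
nested = SAMPLE_HTML.scan(NESTED_RE)
flat   = SAMPLE_HTML.scan(FLAT_RE)

puts nested.first.length  # number of captures with the nested group
puts flat.first.length    # number of captures with the non-capturing group

# With four captures, the block parameters line up with the intended fields.
items = flat.map do |link, title, description, date|
  { :title => title, :description => description, :date => date, :link => link }
end

p items
```

If the optional middle line is missing from a paragraph, the corresponding capture is nil rather than an empty string, so the description may need a fallback.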

Another issue I am having is that something is caching the feed: either my web browser (Firefox 2.0.0.x or Safari) or the Ruby server started with 'ruby scrape_server.rb 2000'. As a result, I have to rename the file (and its classes) after each change to get different output. How can I disable this sort of caching?
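
One possibility worth ruling out, stated here as an assumption and not a diagnosis, is that the long-running server process pulls in scraper files with require, which evaluates a given file only once per process. The sketch below does not use scrape_server.rb at all; the file name demo_scraper.rb, the DemoScraper module and the RESULTS hash are hypothetical. It only illustrates the difference between Kernel#require and Kernel#load after a file has been edited.

```ruby
require "tmpdir"

# Collects the observations so they can be inspected afterwards.
RESULTS = {}

Dir.mktmpdir do |dir|
  path = File.join(dir, "demo_scraper.rb")

  # First version of the (hypothetical) scraper file.
  File.write(path, "module DemoScraper; def self.version; 1; end; end\n")
  RESULTS[:first_require] = require(path)   # file is evaluated
  RESULTS[:after_first] = DemoScraper.version

  # Edit the file while the process keeps running.
  File.write(path, "module DemoScraper; def self.version; 2; end; end\n")
  RESULTS[:second_require] = require(path)  # already loaded, so the file is not re-read
  RESULTS[:after_second_require] = DemoScraper.version

  # load always re-evaluates the file, so the edited definition takes effect.
  load path
  RESULTS[:after_load] = DemoScraper.version
end

p RESULTS
```

If the server does behave this way, restarting it after each edit would pick up the changes without renaming anything; browser caching would be a separate matter.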

Many thanks for any help,
Kaspar