Discussion Forums: help

By: Kaspar Fischer
Problem with caching and regular expression
2007-04-27 22:26
Hi there,

I am trying to write my own screen scraper and have run into two problems. Here's what I have so far:

class Transparency4Scanner < RSSscraper::AbstractScanner
  def initialize
    @url_string = ''
    @mainRE = /<p>\s*<a href="([^"]*)">([^<]*)<\/a><br \/>\s*(([^<]*)<br \/>)?\s*([^<]*)<\/p>/m
  end

  def find_items()
    items = []
    # scan yields one array of captures per match
    request_feed.scan(@mainRE).each{ |a,b,c,d|
      items << { :title => b,
                 :description => c,
                 :date => d,
                 :link => a
      }
    }
    items
  end
end

class Transparency4 < RSSscraper::AbstractScraper
  def scanner
    Transparency4Scanner.new
  end

  def description
    { :link => '',
      :title => 'Transparency International News',
      :description => 'News articles from',
      :language => 'en-us'
    }
  end
end

The problem is that PHP's preg_match() function understands the regular expression but Ruby does not. What is wrong? I have to admit, I am a Ruby-newby (which sounds nice but is sad ;-).
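
One thing that may be worth checking, offered as a sketch rather than a diagnosis: the pattern contains five capture groups, because the optional outer group and the group nested inside it each count, while the block takes only four parameters. Ruby's String#scan yields one array per match with one entry per group, and a block with fewer parameters silently drops the extra entries. PHP's preg_match() hands back the full array of groups, so the same pattern can appear to behave differently there. The HTML snippet and the variable names below are made up for illustration; the real feed markup may differ.

```ruby
# Hypothetical sample input; the real page markup may differ.
html = %q{<p><a href="/news/1">First headline</a><br />
Some summary text<br />
2007-04-27</p>}

# Same pattern as in the post, unchanged.
main_re = /<p>\s*<a href="([^"]*)">([^<]*)<\/a><br \/>\s*(([^<]*)<br \/>)?\s*([^<]*)<\/p>/m

# Each match is an array with one entry per capture group.
captures = html.scan(main_re).first
puts captures.length   # 5

# Name all five groups; the outer optional group is ignored on purpose.
items = []
html.scan(main_re).each do |link, title, _outer, description, date|
  items << { :title => title,
             :description => description,
             :date => date,
             :link => link }
end

p items.first
```

With four block parameters, the third one receives the outer group (the description plus its trailing `<br />`), the fourth receives the description, and the date in the fifth group is never used.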

Another issue I am having is that either my web browser (Firefox 2.0.0.x or Safari) or the Ruby server ('ruby scrape_server.rb 2000') is caching the feed, so I have to rename the file (and its classes) after each change to see updated output. How can I disable this caching?
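
A possible explanation on the server side, stated as an assumption because the post does not show how scrape_server.rb loads scraper files: Ruby's `require` loads a given file only once per process, so a long-running server would keep the old class definitions until it is restarted, whereas `load` re-reads the file every time. The sketch below demonstrates that difference with a temporary file; the file name and the `$demo_version` variable are hypothetical.

```ruby
require 'tmpdir'

results = {}

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo_scraper.rb')   # hypothetical file name

  File.write(path, "$demo_version = 1\n")
  results[:first_require] = require(path)    # file is loaded

  # Simulate editing the scraper while the process keeps running.
  File.write(path, "$demo_version = 2\n")
  results[:second_require] = require(path)   # skipped: already loaded
  results[:after_require] = $demo_version    # still the old value

  load path                                  # load always re-reads the file
  results[:after_load] = $demo_version
end

p results
```

If this is what is happening, restarting the server after each edit should pick up the changes without renaming anything; if the output still does not change after a restart, the browser cache is the more likely culprit.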

Many thanks for any help,