[Rubygems-developers] Adoption
Patrick May
patrick at hexane.org
Fri Oct 22 23:39:27 EDT 2004
Hello,
To help anyone interested in writing an raa => rubygems bridge, there
is a gzip'd yaml file of the raa online:
http://www.narf-lib.org/raa-yaml.yml.gz
created using the script attached below. I'll update the gzip on
narf-lib.org nightly from the latest HTML up on raa.ruby-lang.org. Let
me know if I can clean up the project attributes more; I'm sure there's
stuff I've missed.
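For anyone consuming the file, a minimal sketch of reading it from Ruby might look like the following. It assumes the archive has already been downloaded to a local path and that it holds a YAML list of project hashes as written by the script. The helper name load_raa_projects and the sample record are hypothetical and used only for illustration.

```ruby
require 'yaml'
require 'zlib'
require 'tmpdir'

# Read a gzip'd YAML file and return the parsed list of project hashes.
# (load_raa_projects is a hypothetical helper, not part of raa2yaml.rb.)
def load_raa_projects(path)
  Zlib::GzipReader.open(path) { |gz| YAML.load(gz.read) }
end

# Made-up record, shaped like the hashes that raa2yaml.rb builds.
sample = [
  { "name"    => "example-lib",
    "version" => "0.1.0",
    "owner"   => { "email"  => "someone@example.org",
                   "name"   => "Someone",
                   "raa-id" => "1" } }
]

# Write the sample to a temporary .gz file, then read it back.
Dir.mktmpdir do |dir|
  path = File.join(dir, "raa-yaml.yml.gz")
  Zlib::GzipWriter.open(path) { |gz| gz.write(sample.to_yaml) }

  load_raa_projects(path).each do |project|
    puts "#{project["name"]} #{project["version"]} (#{project["owner"]["name"]})"
  end
end
```

A bridge would presumably walk the same list and map each hash onto whatever fields a gem specification needs. That mapping is left open here.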
Of course, if anyone working on RAA has a problem with the load this
script causes, then I'll stop running it.
~ patrick
------------------------------
raa2yaml.rb
------------------------------
ALL_PROJECTS = "http://raa.ruby-lang.org/all.html"
PROJECT_DETAIL = "http://raa.ruby-lang.org/project/{name}/"
RAA_YAML = "raa-yaml.yml"
GZIP_CMD = "gzip -9f " + RAA_YAML
#ALL_PROJECTS = "all_projects.html"
#PROJECT_DETAIL = "./{name}.html"
require 'open-uri'
require 'yaml'
require 'date'
# require 'parsedate'  # unused below; removed from the standard library in Ruby 1.9
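# Fetch a URL (or local file) and return its contents as a string.
def file_get_contents( path, message=nil )
  puts message if message
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
  URI.open( path ) { |f|
    f.read
  }
end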
# Return one { "name" => ... } hash per project linked from the index page.
def project_list
  all_html = file_get_contents( ALL_PROJECTS, "get all projects" )
  # strip out everything but the names
  all_html.sub!( /^.*?<a href="project\/([^>]*)\/">/m, "" )
  all_html.sub!( /<\/table>.*<p class=\"count\">.*$/m, "" )
  all_html.gsub!( /<\/th>\s*<th>/m, "\t" )
  all_html.gsub!( /<\/a>.*?<a href=\"project\/([^>]*)\/">/m, "\n" )
  all_html.sub!( /<\/a>.*$/m, "" )
  # one name per line now; String is no longer Enumerable, so split explicitly
  all_html.split( "\n" ).collect{ |project|
    { "name" => project.strip }
  }
end
# Fetch a project's detail page and add its attributes to the hash.
def fill_in_details( project )
  detail_page = PROJECT_DETAIL.gsub( /\{\s*name\s*\}/i, project["name"] )
  project_html = file_get_contents( detail_page,
                                    "get details for #{project["name"]}..." )
  # escape the name so characters such as "+" or "." are matched literally
  project["version"] = $1 if
    project_html =~ /#{Regexp.escape( project["name"] )}\s+\/\s+([\w.]+)/m
  ["Short description",
   "Category",
   "Status",
   "Created",
   "Last update",
   "Owner",
   "Homepage",
   "Download",
   "License",
   "Dependency",
   "Description"
  ].each{ |attribute|
    if project_html =~ /<th>#{attribute}:\s*<\/th>\s*<td>(.*?)<\/td>/m
      value = $1
      project[attribute.downcase.gsub(/\s+/, "_")] = value
    end
  }
  # clean up attributes: strip markup, skipping any the page did not have
  ["download",
   "category",
   "homepage"].each{ |attribute|
    next unless project[attribute]
    project[attribute] = project[attribute].gsub( /<[^>]*>/, "" )
  }
  # split the owner cell into email, name, and RAA id when it matches
  if project["owner"] =~ /<a href=\"mailto:([^"]*)\">(.*)<\/a>.*id=(\d+)/m
    project["owner"] = {"email" => $1,
                        "name" => $2,
                        "raa-id" => $3}
  end
  project
end
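# write every project, with details filled in, as one YAML document
open( RAA_YAML, "w" ) { |f|
  f.puts project_list.collect{ |project|
    fill_in_details( project )
  }.to_yaml
}
# compress in place, leaving raa-yaml.yml.gz
`#{GZIP_CMD}`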