[Rubygems-developers] Adoption

Patrick May patrick at hexane.org
Fri Oct 22 23:39:27 EDT 2004


Hello,

To help anyone interested in writing an RAA => RubyGems bridge, there 
is a gzipped YAML dump of the RAA online:

   http://www.narf-lib.org/raa-yaml.yml.gz

created using the script attached below.  I'll regenerate it on 
narf-lib.org nightly from the latest HTML on raa.ruby-lang.org.  Let 
me know if the project attributes need more cleaning up; I'm sure 
there's stuff I've missed.
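
A minimal sketch of reading the dump back from Ruby. It assumes the file is still served at the URL above as raw .gz bytes and holds the array of project hashes written by the script attached below. The helper names `load_raa_yaml` and `fetch_raa_projects` are hypothetical, and `URI.open` plus the `permitted_classes:` keyword need Ruby 2.6 or later.

```ruby
require 'date'
require 'open-uri'
require 'yaml'
require 'zlib'

# Hypothetical helper: gunzip an IO and parse the YAML inside it.
# Date and Time are permitted only as a precaution in case some dates
# were written unquoted; everything else should be plain strings,
# hashes, arrays and nils.
def load_raa_yaml( gzipped_io )
  gz = Zlib::GzipReader.new( gzipped_io )
  begin
    YAML.safe_load( gz.read, permitted_classes: [Date, Time] )
  ensure
    gz.finish   # end the gzip stream but leave gzipped_io itself open
  end
end

# Hypothetical helper: download the nightly dump (URL from this message)
# and return the array of project hashes, one per RAA entry.
def fetch_raa_projects( url = "http://www.narf-lib.org/raa-yaml.yml.gz" )
  URI.open( url ) { |f| load_raa_yaml( f ) }
end
```

Each element is a hash keyed by the lowercased attribute names ("name", "version", "short_description", "owner", ...), so a bridge can iterate over `fetch_raa_projects` and pick out the fields it needs.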

Of course, if anyone working on RAA has a problem with the load this 
script causes, then I'll stop running it.

~ patrick

------------------------------
   raa2yaml.rb
------------------------------
ALL_PROJECTS   = "http://raa.ruby-lang.org/all.html"
PROJECT_DETAIL = "http://raa.ruby-lang.org/project/{name}/"
RAA_YAML       = "raa-yaml.yml"
GZIP_CMD       = "gzip -9f " + RAA_YAML

#ALL_PROJECTS   = "all_projects.html"
#PROJECT_DETAIL = "./{name}.html"

require 'open-uri'
require 'yaml'

def file_get_contents( path, message=nil )
   puts message if message
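   # URI.open fetches http:// URLs and falls back to a plain file
   # open for local paths like the commented-out ones above
   URI.open( path ) { |f|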
     f.read
   }
end

def project_list
   all_html = file_get_contents( ALL_PROJECTS, "get all projects" )

   # strip out everything but the names
   all_html.sub!(  /^.*?<a href="project\/([^>]*)\/">/m, "" )
   all_html.sub!(  /<\/table>.*<p class=\"count\">.*$/m, "" )
   all_html.gsub!( /<\/th>\s*<th>/m, "\t" )
   all_html.gsub!( /<\/a>.*?<a href=\"project\/([^>]*)\/">/m, "\n" )
   all_html.sub!(  /<\/a>.*$/m, "" )

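   # one name per line; drop any blank lines left over
   names = all_html.split( "\n" ).collect{ |line| line.strip }
   names.reject{ |name| name.empty? }.collect{ |name|
     { "name" => name }
   }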
end

def fill_in_details( project )
   detail_page = PROJECT_DETAIL.gsub( /\{\s*name\s*\}/i, project["name"] )
   project_html = file_get_contents( detail_page, "get details for #{project["name"]}..." )

   # escape the name so characters like "+" or "." match literally
   project["version"] = $1 if
     project_html =~ /#{Regexp.escape( project["name"] )}\s+\/\s+([\w.]+)/m


   ["Short description",
    "Category",
    "Status",
    "Created",
    "Last update",
    "Owner",
    "Homepage",
    "Download",
    "License",
    "Dependency",
    "Description"
   ].each{ |attribute|
     if project_html =~ /<th>#{attribute}:\s*<\/th>\s*<td>(.*?)<\/td>/m
       value = $1
       project[attribute.downcase.gsub(/\s+/, "_")] = value
     end
   }

   # clean up attributes: strip HTML tags, skipping any the page lacked
   ["download",
    "category",
    "homepage"].each{ |attribute|
     next unless project[attribute]
     project[attribute] = project[attribute].gsub( /<[^>]*>/, "" )
   }

   project["owner"] =~ /<a href=\"mailto:([^"]*)\">(.*)<\/a>.*id=(\d+)/m
   project["owner"] = {"email"  => $1,
                       "name"   => $2,
                       "raa-id" => $3}
   project
end

open( RAA_YAML, "w" ) { |f|
   f.puts project_list.collect{ |project|
     fill_in_details( project )
   }.to_yaml
}

system( GZIP_CMD ) or warn "failed to run: #{GZIP_CMD}"
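
An alternative way to build the name list, offered as a sketch rather than a drop-in replacement for `project_list`: scan for every `<a href="project/NAME/">` link instead of deleting the surrounding HTML. It assumes such links appear only in the project table of all.html; if they show up elsewhere on the page, trim it at `</table>` first, as `project_list` does. `project_list_from_links` is a hypothetical name.

```ruby
# Hypothetical alternative to project_list's sub!/gsub! chain.
# Returns one { "name" => ... } hash per distinct project link of the
# form <a href="project/NAME/">, in the order the links appear.
def project_list_from_links( all_html )
  names = all_html.scan( %r{<a href="project/([^/"]+)/">} ).flatten
  names.map { |name| name.strip }.reject { |name| name.empty? }.uniq.map { |name|
    { "name" => name }
  }
end
```

Calling `project_list_from_links( file_get_contents( ALL_PROJECTS ) )` would return the same kind of `{ "name" => ... }` hashes that `fill_in_details` expects.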

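The script's last step writes raa-yaml.yml and then shells out to `gzip -9f`. A sketch of doing both in-process with Ruby's zlib instead, so the script no longer depends on an external gzip binary. `write_raa_yaml_gz` is a hypothetical helper; its default path mirrors the raa-yaml.yml.gz that `gzip -9f raa-yaml.yml` produces.

```ruby
require 'yaml'
require 'zlib'

# Hypothetical helper: serialize +projects+ to YAML and write it straight
# into a gzip file at the same compression level as gzip -9, so no
# uncompressed raa-yaml.yml is left behind.
def write_raa_yaml_gz( projects, path = "raa-yaml.yml.gz" )
  Zlib::GzipWriter.open( path, Zlib::BEST_COMPRESSION ) { |gz|
    gz.write( projects.to_yaml )
  }
  path
end
```

For example, `write_raa_yaml_gz( project_list.collect{ |project| fill_in_details( project ) } )` would replace both the `open( RAA_YAML, "w" )` block and the gzip call.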

