[Rubygems-developers] Why does an install command an update of the Gem source index?

Jim Weirich jim at weirichhouse.org
Fri Jun 3 07:44:51 EDT 2005

On Thursday 02 June 2005 07:35 pm, Hugh Sasse wrote:
> But I think the source_index method in
> lib/rubygems/remote_installer.rb should look for a SHA1 sum of
> yaml.Z before it decides whether to get it or not. Supporting
> If-Modified-Since and/or Etags would help also.  Or it could at
> least check that the size hasn't changed.  

RubyGems does check for the size.  I thought it also used if-modified-since, 
but don't see it in the code.  I know we tried it at one point.  IIRC, it may 
have had trouble behind certain proxies, and I don't recall the workaround 
off the top of my head.
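
As an illustration of the two checks discussed here, the following is a minimal Ruby sketch. It assumes the client already knows the size the server reports for the index, for example from a Content-Length header. It also assumes that sending If-Modified-Since is acceptable, which the proxy caveat above may not guarantee. The helper names `index_stale?` and `conditional_index_request` are hypothetical and are not part of RubyGems.

```ruby
require 'net/http'
require 'time'

# Hypothetical helper: the cached index is stale when it is missing or when
# its byte size differs from the size the server reports.
def index_stale?(local_path, remote_size)
  !File.exist?(local_path) || File.size(local_path) != remote_size
end

# Hypothetical helper: build a GET for the index that carries an
# If-Modified-Since header taken from the cached file's mtime, so a server
# that honours it can reply 304 Not Modified instead of resending the file.
def conditional_index_request(path, local_path)
  req = Net::HTTP::Get.new(path)
  if File.exist?(local_path)
    # Time#httpdate (from 'time') renders the RFC 1123 form in GMT.
    req['If-Modified-Since'] = File.mtime(local_path).httpdate
  end
  req
end
```

A `Net::HTTPNotModified` response to such a request would mean the cached copy can be reused. Given the proxy trouble mentioned above, a client would probably keep the size check as a fallback rather than rely on the header alone.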

> Rubygems-0.8.10 has a
> cache, but it seems to need to figure out whether it is the system
> or the user cache, and I've not understood what is happening there
> yet.  [...]

We do cache the specs locally, in a system-wide cache and a user-specific 
cache for those times the system-wide cache is non-writable (which is often 
the case for Unix systems and almost never the case in Windows systems).  
Unfortunately, there are several gem releases every day and the cache goes 
out of date pretty fast.  I consider that a good thing, but it does present 
some special challenges.
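
The system-wide/user-specific split described above can be sketched as follows. Both directory arguments are assumptions supplied by the caller and are not RubyGems' actual cache paths. The name `choose_cache_dir` is hypothetical.

```ruby
require 'fileutils'

# Hypothetical helper: prefer the system-wide cache when this process can
# write to it (typical on Windows). Otherwise fall back to a per-user cache
# (typical on Unix when gems are installed system-wide), creating it if needed.
def choose_cache_dir(system_dir, user_dir)
  if File.directory?(system_dir) && File.writable?(system_dir)
    system_dir
  else
    FileUtils.mkdir_p(user_dir)
    user_dir
  end
end
```

A call might look like `choose_cache_dir(system_cache, File.join(Dir.home, ".gem"))`, where both locations are illustrative only.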

Making the spec listing download more efficient has been a topic of discussion 
in the past.  We are well aware of the current limitations.   Let me share 
some of my ideas in this area.

Being able to incrementally update the cache would be a big win.  With the 
number of gems growing each day, just updating the ones that changed would be 
fairly zippy.  The key is to do it in a way that doesn't suddenly require 
everyone in the world to update their copy of RubyGems at the same time 
(because the server protocol has changed).

Another feature of the current server protocol that I really like is that it 
can be implemented with a standard static file server.  In other words, you 
can dump the gems in a directory served by an Apache server, run an update 
script to update the metadata in that directory and you have a gem server.  
This is perfect for RubyForge and is also how I run my personal gem server 
(http://onestepback.org/betagems).  I think we can upgrade the protocol 
without requiring a dynamic server configuration.
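
The following sketch shows how an update script could keep producing only static files while still supporting the version map proposed below. It assumes the script already has a list of `[name, version]` pairs for the gems in the directory. It uses `Gem::Version` for ordering. The output file name `latest_versions.yaml` and both helper names are hypothetical.

```ruby
require 'rubygems'
require 'yaml'

# Hypothetical update-script step: reduce every [name, version] pair in the
# served directory to the highest version of each gem.
def build_version_map(pairs)
  pairs.each_with_object({}) do |(name, version), map|
    current = map[name]
    if current.nil? || Gem::Version.new(version) > Gem::Version.new(current)
      map[name] = version
    end
  end
end

# Write the map as an ordinary file that any static web server can serve.
# The file name is an assumption, not an existing RubyGems convention.
def write_version_map(dir, pairs)
  path = File.join(dir, "latest_versions.yaml")
  File.write(path, build_version_map(pairs).to_yaml)
  path
end
```

`Gem::Version` compares version segments numerically, so "0.10.0" correctly sorts above "0.5.4", which a plain string comparison would get wrong.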

Here's the plan.  In addition to the current yaml file, make every individual 
gem spec available on the server as well.  Also keep a small index that maps 
each gem name to its latest version.  (While the total number of gem-version 
combinations grows rapidly, the total number of unique gems (ignoring 
versions) grows much more slowly ... currently it's under 300.)  When gems 
determines that it is time to update the cache, it first attempts to download 
the version map index.  It then does individual downloads of only the gem 
specs that are out of date.  If too many specs are out of date, it gives up, 
decides that it would be more efficient to download the entire yaml file in a 
single request, and falls back to the old method.  If it fails to get the 
version map index, it also falls back to the old method.  This allows it to 
work with old servers that don't yet support the new protocol.  Of course, 
compressed versions of each of the files can be made available, and gems will 
attempt to get the compressed versions first (as it does with the yaml file 
today).
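
The client-side decision in this plan can be sketched as a pure function, leaving the downloads themselves out. It assumes the version map arrives as a `name => version` hash, or `nil` when the server does not provide one. The cut-off `max_individual` and its default of 50 are arbitrary placeholders for "at some point it gives up". The method names are hypothetical.

```ruby
require 'rubygems'

# Hypothetical planner for the incremental scheme.
# remote_map: name => latest version from the server's version map,
#             or nil if that index could not be fetched (an older server).
# local_map:  name => version currently held in the local spec cache.
# Returns [:full_index, []] or [:individual_specs, names_to_fetch].
def plan_cache_update(remote_map, local_map, max_individual: 50)
  # No version map: fall back to downloading the whole yaml index.
  return [:full_index, []] if remote_map.nil?

  stale = remote_map.select do |name, version|
    cached = local_map[name]
    cached.nil? || Gem::Version.new(cached) < Gem::Version.new(version)
  end.keys.sort

  # Too many individual fetches: one bulk download is assumed cheaper.
  stale.size > max_individual ? [:full_index, []] : [:individual_specs, stale]
end

# Try the compressed file first, then the plain one; the ".Z" suffix here
# mirrors the yaml.Z naming and is an assumption for the per-spec files.
def candidate_paths(name)
  ["#{name}.Z", name]
end
```

For example, with a remote map of `{"alpha" => "0.5.4", "beta" => "0.8.10", "gamma" => "1.2.0"}` and a local cache holding `alpha` 0.5.4 and `beta` 0.8.9, the planner asks for `beta` and `gamma` individually. With `max_individual: 1`, it falls back to the full index instead.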

Anyways, that's the plan.  All we need is for someone to implement it.

-- Jim Weirich    jim at weirichhouse.org     http://onestepback.org
"Beware of bugs in the above code; I have only proved it correct, 
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)
