[Rubygems-developers] Why does an install command an update of the Gem source index?

Hugh Sasse hgs at dmu.ac.uk
Fri Jun 3 08:45:54 EDT 2005


On Fri, 3 Jun 2005, Jim Weirich wrote:

> On Thursday 02 June 2005 07:35 pm, Hugh Sasse wrote:
>> But I think the source_index method in
>> lib/rubygems/remote_installer.rb should look for a SHA1 sum of
>> yaml.Z before it decides whether to get it or not. Supporting
>> If-Modified-Since and/or ETags would help also.  Or it could at
>> least check that the size hasn't changed.
>
> RubyGems does check for the size.  I thought it also used if-modified-since,
> but don't see it in the code.  I know we tried it at one point.  IIRC, it may
> have had trouble behind certain proxies, and I don't recall the workaround

It should be possible to try the conditional request and fall back to
a plain GET when it misbehaves.  I might have a play with that at the
weekend; roughly the sketch below.
> off the top of my head.
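
    require 'net/http'
    require 'time'

    # A sketch only, not what remote_installer.rb does today (the
    # method name is my own invention): fetch the index with
    # If-Modified-Since, but fall back to a plain GET when the response
    # looks wrong, as it can behind broken proxies.
    def fetch_index(host, path, last_fetched)
      headers = { 'If-Modified-Since' => last_fetched.httpdate }
      res = Net::HTTP.start(host) { |http| http.get(path, headers) }
      case res
      when Net::HTTPNotModified then nil        # cached copy still good
      when Net::HTTPSuccess     then res.body   # fresh index data
      else
        # Conditional GET went wrong somewhere; retry the old way.
        Net::HTTP.start(host) { |http| http.get(path) }.body
      end
    end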
>
>> Rubygems-0.8.10 has a
>> cache, but it seems to need to figure out whether it is the system
>> or the user cache, and I've not understood what is happening there
>> yet.  [...]
>
> We do cache the specs locally, in a system-wide cache and a user-specific
> cache for those times the system-wide cache is non-writable (which is often

I couldn't see that without deeper exploration.  Thanks.

> the case for Unix systems and almost never the case on Windows systems).
> Unfortunately, there are several gem releases every day and the cache goes
> out of date pretty fast.  I consider that a good thing, but it does present
> some special challenges.
>
> Making the spec listing download more efficient has been a topic of discussion
> in the past.  We are well aware of the current limitations.   Let me share

I certainly intended no negativity.  It's an interesting problem: as
things get bigger there is more data, and more things that may need
updating.

> some of my ideas in this area.
>
> Being able to incrementally update the cache would be a big win, especially
> with the number of gems growing each day, just updating the ones that changed

Agreed.  I can't see how to do this without holding N back issues of
the data, or making RCS/CVS/SVN a requirement on the server, which
is an unpleasant constraint.  Even the differential HTTP documents don't
have much advice about this.  The least painful version I can see is a
rolling window of delta files on the server; roughly the sketch below.
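
    require 'yaml'

    # Sketch only: after each index rebuild, write a delta file recording
    # what changed since the previous build, and keep the last N around.
    # A client holding an older index fetches just the deltas newer than
    # its copy.  File layout and names are invented for illustration.
    def write_delta(old_index_file, new_index_file, delta_file)
      old_specs = YAML.load_file(old_index_file)  # "name-version" => spec
      new_specs = YAML.load_file(new_index_file)
      delta = {
        'added'   => new_specs.reject { |key, spec| old_specs.key?(key) },
        'removed' => old_specs.keys - new_specs.keys
      }
      File.open(delta_file, 'w') { |f| f.write(delta.to_yaml) }
    end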

> would be fairly zippy.  The key is to do it in a way where you don't suddenly
> require everyone in the world to update their copy of RubyGems at the same
> time (because the server protocol is changed).

Good point.
>
> Another feature of the current server protocol that I really like is that it
> can be implemented with a standard static file server.  In other words, you
> can dump the gems in a directory served by an apache server, run an update
> script to update the metadata in that directory and you have a gem server.
> This is perfect for RubyForge and is also how I run my personal gem server
> (http://onestepback.org/betagems).  I think we can upgrade the protocol
> without requiring a dynamic server configuration.

This is good, as it would encourage mirroring.  [I wonder how easy
that is now: %x{gemserver --mirror $URL}??]  But I still think it
might be worth exploring not sending the indices at all, and instead
adding commands to the server (where there is one) for querying the
index; something like the sketch below.
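
    require 'webrick'
    require 'yaml'

    # Sketch of the kind of query command I mean: a tiny servlet that
    # answers "what is the latest version of gem X?" so a client need
    # not pull the whole index.  The path, file name and parameter name
    # are all made up.
    latest = YAML.load_file('latest_versions.yaml')  # "name" => "version"
    server = WEBrick::HTTPServer.new(:Port => 8808)
    server.mount_proc('/latest') do |req, res|
      res['Content-Type'] = 'text/plain'
      res.body = latest[req.query['gem']] || 'unknown'
    end
    trap('INT') { server.shutdown }
    server.start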

>
> Here's the plan.  In addition to the current yaml file, make every individual
> gem spec available on the server as well.  Also keep a small index that maps
> gem name to its latest version.  (While the total number of gem-version
> combinations grows rapidly, the total number of unique gems (ignoring
> versions) grows much more slowly ... currently it's under 300).   When gems

That duplication would be necessary for a purely file-based server,
and a server you could query would have to support the same API.

> determines that it is time to update the cache, it first attempts to download
> the version map index.  It then does individual downloads of only the gem
> specs that are out of date.  At some point it gives up and decides that it
> would be more efficient to download the entire yaml file in a single download

I suppose the cutoff point would depend on connection speed, and on
the per-request overhead, to some extent.  As a rough model (all the
figures below are illustrative guesses):
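
    # Fetch specs singly only while that beats one big download.
    # All numbers are illustrative guesses, not measurements.
    full_index  = 300_000   # bytes for the whole yaml file
    spec_size   = 1_500     # bytes per individual gem spec
    per_request = 500       # request overhead per single fetch
    cutoff      = full_index / (spec_size + per_request)   # => 150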

> and then it falls back to the old method.  If it fails to get the version map
> index, it also fails back to the old method.  This allows it to work with old
> servers that don't yet support the new protocol.  Of course, compressed
> versions of each of the files can be made available, and gems will attempt to
> get the compressed versions first (as it does with the yaml file today).
>
> Anyways, that's the plan.  All we need is for someone to implement it.
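
If I have understood the plan, the client side would go roughly like
this (a sketch only; the method names and the cache API are my
guesses, not anything in the current code):

    require 'yaml'

    MAX_SINGLE_FETCHES = 150  # cutoff; would want tuning per connection

    # 1. Try the small name => latest-version map; on failure assume an
    #    old-style server and take the whole yaml index.
    # 2. Fetch individual specs for out-of-date gems, unless so many
    #    are stale that one big download is cheaper.
    def update_cache(cache)
      map = fetch_version_map rescue nil        # hypothetical helper
      return fetch_full_yaml_index unless map   # old server: fall back

      stale = map.select { |name, ver| cache.version_of(name) != ver }
      return fetch_full_yaml_index if stale.size > MAX_SINGLE_FETCHES

      stale.each { |name, ver| cache.store(fetch_spec(name, ver)) }
    end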

That seems like a good plan.  I also wonder if it might be possible
to use ypath to record updates to yaml.Z, so one only needs to
download the update code newer than one's own yaml index.  The catch
is that it would probably need to be executable code, and may thus be
a potential security hole.

The only other thing I can think of is to split the yaml.Z file,
allow people to get just the parts they need, and re-assemble it:

http://www.eng.cse.dmu.ac.uk/~hgs/ruby/splitter.rb

The splitting would need to be done on Gem boundaries, rather than
on size as that script does; roughly the sketch below.
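
    require 'yaml'

    # Sketch only: split the index on gem boundaries instead of raw
    # size.  Assumes the index loads as a hash of "name-version" => spec;
    # the chunk size and part-file names are arbitrary choices of mine.
    PER_CHUNK = 100

    index = YAML.load_file('yaml')   # the uncompressed source index
    chunk, part = {}, 0
    index.keys.sort.each do |key|
      chunk[key] = index[key]
      if chunk.size >= PER_CHUNK
        File.open("yaml.part#{part}", 'w') { |f| f.write(chunk.to_yaml) }
        chunk, part = {}, part + 1
      end
    end
    unless chunk.empty?
      File.open("yaml.part#{part}", 'w') { |f| f.write(chunk.to_yaml) }
    end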

40 kbits a second, allowing about 8 kbits/sec of overhead, is about
4 kbytes/sec.  So if each chunk is about 40 kbytes, that's about 10
secs per gem to update, since an update means fetching the whole
chunk containing it.  Would that be sufficiently useful to Lothar?

>
         Hugh

