From gsinclair at soyabean.com.au Tue Nov 9 02:04:33 2004 From: gsinclair at soyabean.com.au (Gavin Sinclair) Date: Tue Nov 9 02:06:08 2004 Subject: [Rubygems-developers] Updating source index is slow Message-ID: <159322863853.20041109180433@soyabean.com.au> Updating the source index from the rubyforge server is slow. This could be sped up if there were a pure-Ruby rsync implementation available (for client and server). Discuss. Cheers, Gavin From patrick at hexane.org Tue Nov 9 21:47:39 2004 From: patrick at hexane.org (Patrick May) Date: Tue Nov 9 21:47:38 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <159322863853.20041109180433@soyabean.com.au> Message-ID: On Tuesday, November 9, 2004, at 02:04 AM, Gavin Sinclair wrote: > Updating the source index from the rubyforge server is slow. This > could be sped up if there were a pure-Ruby rsync implementation > available (for client and server). Discuss. Does the rubygems client need the entire source index? It seems to me that one would be more likely to ask for a list of packages, then drill individually into the metadata for particular pages. I may be ignorant here -- I'm assuming the client is hitting http://gems.rubyforge.org/yaml ~ Patrick From chad at chadfowler.com Wed Nov 10 07:39:17 2004 From: chad at chadfowler.com (Chad Fowler) Date: Wed Nov 10 07:39:17 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: Message-ID: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> On 09-Nov-04, at 9:47 PM, Patrick May wrote: > > On Tuesday, November 9, 2004, at 02:04 AM, Gavin Sinclair wrote: > >> Updating the source index from the rubyforge server is slow. This >> could be sped up if there were a pure-Ruby rsync implementation >> available (for client and server). Discuss. > > Does the rubygems client need the entire source index? 
It seems to me > that one would be more likely to ask for a list of packages, then > drill individually into the metadata for particular pages. > > I may be ignorant here -- I'm assuming the client is hitting > http://gems.rubyforge.org/yaml > > You're right. It grabs (and caches) all of the gem metadata. Your idea for a solution might be the most pragmatic (as opposed to implementing rsync in ruby ;). How big an issue does it seem to be? For me, the gem repository doesn't update (and therefore I don't have to redownload the yaml) enough to be a serious drag, but I can see how as more gems are released it will get more annoying on the client. Chad From batsman.geo at yahoo.com Wed Nov 10 07:55:17 2004 From: batsman.geo at yahoo.com (Mauricio =?iso-8859-1?Q?Fern=E1ndez?=) Date: Wed Nov 10 07:55:12 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: <20041110125517.GA32125@student.ei.uni-stuttgart.de> On Wed, Nov 10, 2004 at 07:39:17AM -0500, Chad Fowler wrote: > > On 09-Nov-04, at 9:47 PM, Patrick May wrote: > > > > >On Tuesday, November 9, 2004, at 02:04 AM, Gavin Sinclair wrote: > > > >>Updating the source index from the rubyforge server is slow. This > >>could be sped up if there were a pure-Ruby rsync implementation > >>available (for client and server). Discuss. > > > >Does the rubygems client need the entire source index? It seems to me > >that one would be more likely to ask for a list of packages, then > >drill individually into the metadata for particular pages. > > > >I may be ignorant here -- I'm assuming the client is hitting > >http://gems.rubyforge.org/yaml > > > > > > You're right. It grabs (and caches) all of the gem metadata. Your > idea for a solution might be the most pragmatic (as opposed to > implementing rsync in ruby ;). > > How big an issue does it seem to be? 
For me, the gem repository > doesn't update (and therefore I don't have to redownload the yaml) > enough to be a serious drag, but I can see how as more gems are > released it will get more annoying on the client. Take a look at http://gems.rubyforge.org/usage/usage_200411.html yaml.z and yaml account for over 50% of the traffic... -- Hassle-free packages for Ruby? RPA is available from http://www.rubyarchive.org/ From hgs at dmu.ac.uk Wed Nov 10 08:11:16 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 08:11:18 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: On Wed, 10 Nov 2004, Chad Fowler wrote: >> I may be ignorant here -- I'm assuming the client is hitting >> http://gems.rubyforge.org/yaml >> >> > > You're right. It grabs (and caches) all of the gem metadata. Your idea for [...] > How big an issue does it seem to be? For me, the gem repository doesn't > update (and therefore I don't have to redownload the yaml) enough to be a > serious drag, but I can see how as more gems are released it will get more > annoying on the client. A quick glance through the code suggests that it doesn't try to use compression by default. HTTP headers like: "accept-encoding" => "gzip;q=1.0, " + "identity; q=0.5, " + "*;q=0 " which we can handle with things in the standard library now, something like: if answer.key?("content-encoding") case answer["content-encoding"] when 'gzip' gz = Zlib::GzipReader.new(StringIO.new(answer.read_body)) body = gz.read [...] end [...] 
end [Cut, pasted and pruned from my hacked version of rubric] > > Chad Hugh From rich at infoether.com Wed Nov 10 08:17:59 2004 From: rich at infoether.com (Richard Kilmer) Date: Wed Nov 10 08:17:54 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: Message-ID: yaml.Z is a compressed file...that is the one downloaded by default (not the source 'yaml' file). I will look into downloading just the updated gemspecs and see if that can be used to speed things up. It would obviously be a cgi...wanted to stay away from that...but CPU is cheeper than bandwidth right now! -rich On 11/10/04 8:11 AM, "Hugh Sasse Staff Elec Eng" wrote: > On Wed, 10 Nov 2004, Chad Fowler wrote: > >>> I may be ignorant here -- I'm assuming the client is hitting >>> http://gems.rubyforge.org/yaml >>> >>> >> >> You're right. It grabs (and caches) all of the gem metadata. Your idea for > [...] >> How big an issue does it seem to be? For me, the gem repository doesn't >> update (and therefore I don't have to redownload the yaml) enough to be a >> serious drag, but I can see how as more gems are released it will get more >> annoying on the client. > > A quick glance through the code suggests that it doesn't try to use > compression by default. HTTP headers like: > > "accept-encoding" => "gzip;q=1.0, " + "identity; q=0.5, " + "*;q=0 " > > which we can handle with things in the standard library now, something like: > > if answer.key?("content-encoding") > case answer["content-encoding"] > when 'gzip' > gz = Zlib::GzipReader.new(StringIO.new(answer.read_body)) > body = gz.read > [...] > end > [...] 
> end > > [Cut, pasted and pruned from my hacked version of rubric] >> >> Chad > > Hugh > _______________________________________________ > Rubygems-developers mailing list > Rubygems-developers@rubyforge.org > http://rubyforge.org/mailman/listinfo/rubygems-developers > From batsman.geo at yahoo.com Wed Nov 10 08:21:53 2004 From: batsman.geo at yahoo.com (Mauricio =?iso-8859-1?Q?Fern=E1ndez?=) Date: Wed Nov 10 08:21:48 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: <20041110132153.GA1545@student.ei.uni-stuttgart.de> On Wed, Nov 10, 2004 at 01:11:16PM +0000, Hugh Sasse Staff Elec Eng wrote: > A quick glance through the code suggests that it doesn't try to use > compression by default. HTTP headers like: > > "accept-encoding" => "gzip;q=1.0, " + "identity; q=0.5, " + "*;q=0 " > > which we can handle with things in the standard library now, something like: The stats indicate the following: yaml.Z 7178 hits 287305K (42.87%) yaml 94 hits 54518K (8.13%) Even if all the traffic associated to the index was compressed, it would account for over 40% of the total traffic... -- Hassle-free packages for Ruby? RPA is available from http://www.rubyarchive.org/ From gsinclair at soyabean.com.au Wed Nov 10 08:24:08 2004 From: gsinclair at soyabean.com.au (Gavin Sinclair) Date: Wed Nov 10 08:24:14 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: <63431934699.20041111002408@soyabean.com.au> On Wednesday, November 10, 2004, 11:39:17 PM, Chad wrote: > On 09-Nov-04, at 9:47 PM, Patrick May wrote: >> >> On Tuesday, November 9, 2004, at 02:04 AM, Gavin Sinclair wrote: >> >>> Updating the source index from the rubyforge server is slow. 
This >>> could be sped up if there were a pure-Ruby rsync implementation >>> available (for client and server). Discuss. >> >> Does the rubygems client need the entire source index? It seems to me >> that one would be more likely to ask for a list of packages, then >> drill individually into the metadata for particular pages. >> >> I may be ignorant here -- I'm assuming the client is hitting >> http://gems.rubyforge.org/yaml >> >> > You're right. It grabs (and caches) all of the gem metadata. Your > idea for a solution might be the most pragmatic (as opposed to > implementing rsync in ruby ;). Nice and pragmatic, but I think the client should have all the metadata, all else being equal, to enable greater searches. Especially in GUI clients. Aside: I'm getting annoyed with being asked about dependencies one by one. The entire dependency tree for a gem is knowable upfront, so it would be nice to OK them all at once. That can only happen if the client has the metadata. I agree implementing rsync is not worth it. But were it to exist by some other means... :) > How big an issue does it seem to be? For me, the gem repository > doesn't update (and therefore I don't have to redownload the yaml) > enough to be a serious drag, but I can see how as more gems are > released it will get more annoying on the client. I notice it a lot. A substantial download every time a gem gets added... (about 30 of them since RubyConf). Gavin From jim at weirichhouse.org Wed Nov 10 08:42:10 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Wed Nov 10 08:40:54 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: <200411100842.10784.jim@weirichhouse.org> On Wednesday 10 November 2004 07:39 am, Chad Fowler wrote: > You're right. ?It grabs (and caches) all of the gem metadata. ? 
Your > idea for a solution might be the most pragmatic (as opposed to > implementing rsync in ruby ;). Do we use HEAD to get the time stamp of the file? Then we would just need to download it whenever it changes. -- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From hgs at dmu.ac.uk Wed Nov 10 09:15:37 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 09:15:43 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: Message-ID: On Wed, 10 Nov 2004, Richard Kilmer wrote: > yaml.Z is a compressed file...that is the one downloaded by default (not the > source 'yaml' file). gzip generally "beats" compress though, and I'm fairly sure that would be true for YAML data... The HTTP 1.1 spec seems to have no support for bzip2, unfortunately. Hugh From eivind at FreeBSD.org Wed Nov 10 09:38:33 2004 From: eivind at FreeBSD.org (Eivind Eklund) Date: Wed Nov 10 09:38:41 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <200411100842.10784.jim@weirichhouse.org> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> Message-ID: <20041110143833.GB8888@FreeBSD.org> On Wed, Nov 10, 2004 at 08:42:10AM -0500, Jim Weirich wrote: > On Wednesday 10 November 2004 07:39 am, Chad Fowler wrote: > > You're right. ?It grabs (and caches) all of the gem metadata. ? Your > > idea for a solution might be the most pragmatic (as opposed to > > implementing rsync in ruby ;). > > Do we use HEAD to get the time stamp of the file? Then we would just need to > download it whenever it changes. You already do something quite a bit like this; a GET is started, and from this GET you retrieve the Content-Length. 
If this match the cached data, you use that; if not, you start a new GET and retrieve the file. This is present from 0.8.0 and up; older clients will always download the complete file. I don't know your client distribution, but this may be the primary source of the load described elsewhere in the thread. Eivind. From rich at infoether.com Wed Nov 10 09:58:27 2004 From: rich at infoether.com (Richard Kilmer) Date: Wed Nov 10 09:58:21 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: Message-ID: On 11/10/04 9:15 AM, "Hugh Sasse Staff Elec Eng" wrote: > On Wed, 10 Nov 2004, Richard Kilmer wrote: > >> yaml.Z is a compressed file...that is the one downloaded by default (not the >> source 'yaml' file). > > gzip generally "beats" compress though, and I'm fairly sure that > would be true for YAML data... The HTTP 1.1 spec seems > to have no support for bzip2, unfortunately. It uses zlib. > > Hugh > _______________________________________________ > Rubygems-developers mailing list > Rubygems-developers@rubyforge.org > http://rubyforge.org/mailman/listinfo/rubygems-developers > From hgs at dmu.ac.uk Wed Nov 10 10:32:12 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 10:32:10 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: Message-ID: On Wed, 10 Nov 2004, Richard Kilmer wrote: > On 11/10/04 9:15 AM, "Hugh Sasse Staff Elec Eng" wrote: > >> On Wed, 10 Nov 2004, Richard Kilmer wrote: >> >>> yaml.Z is a compressed file...that is the one downloaded by default (not the >>> source 'yaml' file). >> >> gzip generally "beats" compress though, and I'm fairly sure that >> would be true for YAML data... The HTTP 1.1 spec seems >> to have no support for bzip2, unfortunately. > > It uses zlib. and its inflate method, which refers to compressed files, without specifying gzip, and it doesn't use GZipReader, suggesting it isn't gzip. 
There is ambiguity here, though: "compressed" in the zlib docs could mean any data compression scheme and it could mean Unix compress. Inflate/deflate is a compression type reported by PKZIP (I think) and InfoZIP. Extensions of .Z mean Unix compress; ZIP and gzip use .ZIP and .gz, respectively. Are you telling me that it is definitely using GZIP now? If so, why isn't the file yaml.gz rather than yaml.Z? Hugh From rich at infoether.com Wed Nov 10 10:44:47 2004 From: rich at infoether.com (Richard Kilmer) Date: Wed Nov 10 10:44:47 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: Message-ID: On 11/10/04 10:32 AM, "Hugh Sasse Staff Elec Eng" wrote: > >> It uses zlib. > > and its inflate method, which refers to compressed files, without > specifying gzip, and it doesn't use GZipReader, suggesting it isn't > gzip. There is ambiguity here, though: "compressed" in the zlib docs > could mean any data compression scheme and it could mean Unix > compress. Inflate/deflate is a compression type reported by PKZIP > (I think) and InfoZIP. Extensions of .Z mean Unix compress; ZIP and > gzip use .ZIP and .gz, respectively. Right... > > Are you telling me that it is definitely using GZIP now? If so, why > isn't the file yaml.gz rather than yaml.Z? Because I am a goof. > > Hugh From hgs at dmu.ac.uk Wed Nov 10 11:39:34 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 11:41:16 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: Message-ID: On Wed, 10 Nov 2004, Richard Kilmer wrote: > I will look into downloading just the updated gemspecs and see if that can > be used to speed things up. It would obviously be a cgi...wanted to stay > away from that...but CPU is cheeper than bandwidth right now! While looking around at the compression issues for HTTP I found RFC3229 "Delta encoding in HTTP" which might be of use, but looks less than obvious to do correctly. 
However, if we have good control over the server to support this, through our contacts at RubyForge, then that's more than half the battle, because RubyGems is the client in this case and we have full control over what that does. Hugh From hgs at dmu.ac.uk Wed Nov 10 12:11:42 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 12:11:52 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <20041110143833.GB8888@FreeBSD.org> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> Message-ID: On Wed, 10 Nov 2004, Eivind Eklund wrote: > On Wed, Nov 10, 2004 at 08:42:10AM -0500, Jim Weirich wrote: >> Do we use HEAD to get the time stamp of the file? Then we would just need to >> download it whenever it changes. > > You already do something quite a bit like this; a GET is started, and > from this GET you retrieve the Content-Length. If this match the cached > data, you use that; if not, you start a new GET and retrieve the file. I really don't like this algorithm. The program uses open-uri for most of its work, which is great for getting things going, proof-of-concept and so on, but for a scalable application I think we should aim to move to Net::HTTP and follow the protocol more closely. [1] Why? A Get is started. Well, when the server responds with the content length it also responds with everything else as well, and in the case of a 200 response, that includes the whole of the resource. To the best of my knowledge (and because of my attempts to get this working with Rubric I've read it fairly recently) there is no way in the protocol to stop a get in the middle. Indeed (indent munged): open(uri_str, :proxy => @http_proxy, :content_length_proc => lambda {|t| size = t; raise "break"}) {|i| } doesn't tell the server to stop sending the contents. 
If the server detects something has stopped, then whether it does so "in time" is rather like a race condition. Running over a 56k modem this is rather likely to be too late. Suppose the contents change, but the length doesn't. At present we would be unable to detect this. "if not, you start a new GET and retrieve the file" Then you get it again. Ouch. I think we should be using the HEAD method, and the Etag, Last-Modified and any other applicable headers, which really necessitates using Net::HTTP. Much more tedious to program, but much more courteous to the server('s owners). > > This is present from 0.8.0 and up; older clients will always download > the complete file. I don't know your client distribution, but this may > be the primary source of the load described elsewhere in the thread. Getting it twice really doesn't help this. > > Eivind. > Hugh [1] Please note: I am really in favour of this project, and think criticism that is intended to be constructive is a valid part of "first make it work, then make it work right, then make it fast". http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast This approach (open-uri) is perfectly valid for early stages of the project, but I think we need to move beyond it if downloads are becoming a problem. From chad at chadfowler.com Wed Nov 10 12:49:53 2004 From: chad at chadfowler.com (chad@chadfowler.com) Date: Wed Nov 10 12:46:17 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> Message-ID: <24112.207.40.151.167.1100108993.squirrel@207.40.151.167> > On Wed, 10 Nov 2004, Eivind Eklund wrote: > >> On Wed, Nov 10, 2004 at 08:42:10AM -0500, Jim Weirich wrote: >>> Do we use HEAD to get the time stamp of the file? Then we would just >>> need to >>> download it whenever it changes. 
>> >> You already do something quite a bit like this; a GET is started, and >> from this GET you retrieve the Content-Length. If this match the cached >> data, you use that; if not, you start a new GET and retrieve the file. > > I really don't like this algorithm. The program uses open-uri for > most of its work, which is great for getting things going, > proof-of-concept and so on, but for a scalable application I think > we should aim to move to Net::HTTP and follow the protocol more > closely. [1] Why? > > A Get is started. Well, when the server responds with the content > length it also responds with everything else as well, and in the > case of a 200 response, that includes the whole of the resource. To > the best of my knowledge (and because of my attempts to get this > working with Rubric I've read it fairly recently) there is no way in > the protocol to stop a get in the middle. Indeed (indent munged): > > open(uri_str, :proxy => @http_proxy, :content_length_proc => > lambda {|t| size = t; raise "break"}) {|i| } > > doesn't tell the server to stop sending the contents. If the server > detects something has stopped, then whether it does so "in time" is > rather like a race condition. Running over a 56k modem this is > rather likely to be too late. > > Suppose the contents change, but the length doesn't. At present we > would be unable to detect this. > > "if not, you start a new GET and retrieve the file" Then you get it > again. Ouch. > Yea, your logic makes sense. It could be that the current code is going to be slow whether we download the new index or not. It's definitely not "good". > I think we should be using the head method, and the Etag, > Last-Modified and any other applicable headers, which really > necessitates using Net::HTTP. Much more tedious to program, but > much more courteous to the server('s owners). 
> >> I actually tried to do if-modified-since originally, and I ran into problems with RubyForge not responding correctly (very weird stuff that Tom Copeland and I couldn't figure out). My ruby code was working on every other server I tried, but I was taking too long to get it to work, so Rich stepped in and whipped up the current incarnation. I think using If-Modified-Since is the right way to go. We wouldn't actually need to use the HEAD method in this case, since the "don't send data" behavior is built into the HTTP spec when using If-Modified-Since. >> This is present from 0.8.0 and up; older clients will always download >> the complete file. I don't know your client distribution, but this may >> be the primary source of the load described elsewhere in the thread. > > Getting it twice really doesn't help this. >> >> Eivind. >> > > Hugh > > [1] Please note: I am really in favour of this project, and > think criticism that is intended be constructive is a valid part of > "first make it work, then make it work right, then make it fast". > > http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast > I totally agree. We're in stages 2 and 3 right now. > This approach (open-uri) is perfectly valid for early stages of the > project, but I think we nee to move beyond it if downloads are > becoming a problem. > _______________________________________________ We can actually use open-uri with the If-Modified-Since approach. I think that would be ideal. Thanks for your comments and ideas, Hugh. 
Chad From eivind at FreeBSD.org Wed Nov 10 12:50:42 2004 From: eivind at FreeBSD.org (Eivind Eklund) Date: Wed Nov 10 12:50:52 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> Message-ID: <20041110175042.GE8888@FreeBSD.org> On Wed, Nov 10, 2004 at 05:11:42PM +0000, Hugh Sasse Staff Elec Eng wrote: > On Wed, 10 Nov 2004, Eivind Eklund wrote: > >On Wed, Nov 10, 2004 at 08:42:10AM -0500, Jim Weirich wrote: > >>Do we use HEAD to get the time stamp of the file? Then we would just > >>need to > >>download it whenever it changes. > > > >You already do something quite a bit like this; a GET is started, and > >from this GET you retrieve the Content-Length. If this match the cached > >data, you use that; if not, you start a new GET and retrieve the file. > > I really don't like this algorithm. The main problem is IMO the use of size as a discriminator. The rest isn't quite as bad as it sounds like. > The program uses open-uri for > most of its work, which is great for getting things going, > proof-of-concept and so on, but for a scalable application I think > we should aim to move to Net::HTTP and follow the protocol more > closely. [1] Why? > > A Get is started. Well, when the server responds with the content > length it also responds with everything else as well, and in the > case of a 200 response, that includes the whole of the resource. To > the best of my knowledge (and because of my attempts to get this > working with Rubric I've read it fairly recently) there is no way in > the protocol to stop a get in the middle. Well, the protocol is layered. There is no way to tell it at HTTP layer; however, there is at the TCP layer. 
> Indeed (indent munged): > > open(uri_str, :proxy => @http_proxy, :content_length_proc => > lambda {|t| size = t; raise "break"}) {|i| } > > doesn't tell the server to stop sending the contents. If the server > detects something has stopped, then whether it does so "in time" is > rather like a race condition. Running over a 56k modem this is > rather likely to be too late. No. This is going to be blocked by Nagle's algorithm in the TCP stack. The net result is that you get two three-way handshakes plus two or three extra 1500 byte packets (assuming Ether MTU) plus three extra roundtrip delays in Nagle acceleration. The problem would be with FAST networks, because Nagle would outrace the process time slicing in the system, so the reset above would come after there was a bunch of data in the pipeline. > Suppose the contents change, but the length doesn't. At present we > would be unable to detect this. Correct. However, I believe the file presently monotonically grows (because old versions are not removed), so this may not be an issue. > "if not, you start a new GET and retrieve the file" Then you get it > again. Ouch. > > I think we should be using the head method, and the Etag, > Last-Modified and any other applicable headers, which really > necessitates using Net::HTTP. Much more tedious to program, but > much more courteous to the server('s owners). Either that, or run an rsync implementation. I think the latter would be best, but more work. > >This is present from 0.8.0 and up; older clients will always download > >the complete file. I don't know your client distribution, but this may > >be the primary source of the load described elsewhere in the thread. > > Getting it twice really doesn't help this. Sure. But it may be a minor issue; it's hard to tell. 
> [1] Please note: I am really in favour of this project, I'm in favour of RubyGems as long as RubyGems gets done right :-) I especially see the commitment to becoming repackager friendly as important, as repackaging is crucial for making software effectively usable for many users on many of the relevant platforms. > and > think criticism that is intended to be constructive is a valid part of > "first make it work, then make it work right, then make it fast". > > http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast One more thing around this: Published interfaces and non-controlled data have important influence on how it's appropriate to think around things. For instance, the present RubyGem client versions will continue loading the server until everybody's upgraded. And the "File called .Z but is really .gz" API bug (and that file is an API) will need to be supported for a long while. I got my thinking changed quite a bit when I started thinking specifically about "published" vs "non-published" interfaces. It helped a lot of stuff get organized. See http://www.martinfowler.com/bliki/PublishedInterface.html for Martin Fowler's quick comments on the same. Eivind. From jim at weirichhouse.org Wed Nov 10 12:54:33 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Wed Nov 10 12:54:27 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com><200411100842.10784.jim@weirichhouse.org><20041110143833.GB8888@FreeBSD.org> Message-ID: <40329.192.223.163.6.1100109273.squirrel@weirichhouse.org> Hugh Sasse Staff Elec Eng said: > On Wed, 10 Nov 2004, Eivind Eklund wrote: >> You already do something quite a bit like this; a GET is started, and >> from this GET you retrieve the Content-Length. If this match the cached >> data, you use that; if not, you start a new GET and retrieve the file. > > I really don't like this algorithm. 
The program uses open-uri for > most of its work, which is great for getting things going, > proof-of-concept and so on, but for a scalable application I think > we should aim to move to Net::HTTP and follow the protocol more > closely. [...] I concur. -- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From hgs at dmu.ac.uk Wed Nov 10 13:37:23 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 13:38:21 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <24112.207.40.151.167.1100108993.squirrel@207.40.151.167> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> <24112.207.40.151.167.1100108993.squirrel@207.40.151.167> Message-ID: On Wed, 10 Nov 2004 chad@chadfowler.com wrote: > > Yea, your logic makes sense. It could be that the current code is going Thank you. > to be slow whether we download the new index or not. It's definitely not > "good". > >> I think we should be using the head method, and the Etag, >> Last-Modified and any other applicable headers, which really [...] > > I actually tried to do if-modified-since originally, and I ran into > problems with RubyForge not responding correctly (very weird stuff that > Tom Copeland and I couldn't figure out). My ruby code was working on Have you got this recorded anywhere? I've not seen any traffic on this, and while it may well baffle me as well, there may be some who recognise what is going on. > every other server I tried, but I was taking too long to get it to work, > so Rich stepped in and whipped up the current incarnation. I think using > If-Modified-Since is the right way to go. 
We wouldn't actually need to > use the HEAD method in this case, since the "don't send data" behavior is > built into the HTTP spec when using If-Modified-Since. OK. If the resource gets mirrored to a server that only supports Etag it would be good to have, but YAGNI for now is fine with me. > [...] >> >> [1] Please note: I am really in favour of this project, and >> think criticism that is intended be constructive is a valid part of >> "first make it work, then make it work right, then make it fast". >> >> http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast >> > > I totally agree. We're in stages 2 and 3 right now. > [...] > > We can actually use open-uri with the If-Modified-Since approach. I think > that would be ideal. But it won't really support the ETag & size comparison methods, or will it? Maybe we can just drop those and use If-Modified-Since, then? Can it support GZIP also? Maybe there's a case for sending patches for open-uri to the appropriate person, but it's outside the scope of this project.... > > Thanks for your comments and ideas, Hugh. I'm glad they've been received in the spirit they were intended. :-) > > Chad > Thank you, Hugh From hgs at dmu.ac.uk Wed Nov 10 14:06:13 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 14:06:23 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <20041110175042.GE8888@FreeBSD.org> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> <20041110175042.GE8888@FreeBSD.org> Message-ID: On Wed, 10 Nov 2004, Eivind Eklund wrote: > On Wed, Nov 10, 2004 at 05:11:42PM +0000, Hugh Sasse Staff Elec Eng wrote: >> A Get is started. Well, when the server responds with the content >> length it also responds with everything else as well, and in the >> case of a 200 response, that includes the whole of the resource. 
To >> the best of my knowledge (and because of my attempts to get this >> working with Rubric I've read it fairly recently) there is no way in >> the protocol to stop a get in the middle. > > Well, the protocol is layered. There is no way to tell it at HTTP > layer; however, there is at the TCP layer. When we stop reading the server will stop sending? Classic producer consumer problem: I suppose it must... > >> Indeed (indent munged): >> >> open(uri_str, :proxy => @http_proxy, :content_length_proc => >> lambda {|t| size = t; raise "break"}) {|i| } >> >> doesn't tell the server to stop sending the contents. If the server >> detects something has stopped, then whether it does so "in time" is >> rather like a race condition. Running over a 56k modem this is >> rather likely to be too late. > > No. This is going to be blocked by Nagle's algorithm in the TCP stack. It's about time I looked that up... Oh, RFC 896. It seems to abolish sliding windows which made Kermit really fast, but I see the point I think... > The net result is that you get two three-way handshakes plus two or > three extra 1500 byte packets (assuming Ether MTU) plus three extra > roundtrip delays in Nagle acceleration. > > The problem would be with FAST networks, because Nagle would outrace the > process time slicing in the system, so the reset above would come after > there was a bunch of data in the pipeline. Yes, good point. I was forgetting about the handshaking during the transfer and was only thinking about waiting for the data to end. > >> Suppose the contents change, but the length doesn't. At present we >> would be unable to detect this. > > Correct. However, I believe the the file presently monotonically grows > (because old versions are not removed), so this may not be an issue. That structure may ba a problem later... [...] >> necessitates using Net::HTTP. Much more tedious to program, but >> much more courteous to the server('s owners). > > Either that, or run an rsync implementation. 
I think the latter would > be best, but more work. I don't know enough about that protocol to comment. [...] >> http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast > > One more thing around this: [...] > I got my thinking changed quite a bit when I started thinking > specifically about "published" vs "non-published" interfaces. It helped > a lot of stuff get organized. See > http://www.martinfowler.com/bliki/PublishedInterface.html > for Martin Fowler's quick comments on the same. Interesting stuff. I wonder if its worth raising an RCR for a "published" keyword in Ruby? Paul Graham seems to suggest that the more a language allows a program to express ideas about itself, the more powerful it is, in this article: http://www.paulgraham.com/avg.html He's arguing for lisp macros, but I think it applies to this and to design by contract. > > Eivind. > Thank you, Hugh From hgs at dmu.ac.uk Wed Nov 10 14:28:23 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Wed Nov 10 14:31:13 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> Message-ID: On Wed, 10 Nov 2004, Chad Fowler wrote: > > On 09-Nov-04, at 9:47 PM, Patrick May wrote: > >> I may be ignorant here -- I'm assuming the client is hitting >> http://gems.rubyforge.org/yaml >> >> > > You're right. It grabs (and caches) all of the gem metadata. Your idea for While my brain is still fizzing with this stuff: Is there a reason why the Yaml itself can't be a gem? That would allow us to use the versioning information gems already have for this, which might help with all the Last-Modified weirdness you were getting. If it were a gem, then could we make gem update 'patch' what we have, rather than do a full install? Maybe not immediately, but in the future? 
        Hugh

From tom.clarke at gmail.com Wed Nov 10 14:39:46 2004
From: tom.clarke at gmail.com (Tom Clarke)
Date: Wed Nov 10 14:39:40 2004
Subject: [Rubygems-developers] Updating source index is slow
In-Reply-To:
References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com>
Message-ID:

Hi,

I've been lurking here for a bit, and just let me say that ruby gems
is looking great and everything I wanted for the now defunct
RAAInstall (which I gave life to and subsequently killed).

The issue I encountered with raa install was that for an RAA-sized
database (which gems isn't yet) it was very difficult to get the
information down to a reasonable size. I ended up producing a
drastically compressed format (in terms of what data I put in the
download file). The gems I've seen are comparatively verbose.

I would suggest that the end solution probably needs to be either of:
* A Gentoo-like system where each package has its own file and can be
synced with rsync or a clone (this has the benefit of being easy to
mirror in an efficient manner)
* A client-server system where package details are queried on demand.

Probably not necessary yet, but perhaps something to think about.

One question I had: is anyone working on making DBI into gems?

-Tom

On Wed, 10 Nov 2004 19:28:23 +0000 (WET), Hugh Sasse Staff Elec Eng wrote:
> On Wed, 10 Nov 2004, Chad Fowler wrote:
>
> >
> > On 09-Nov-04, at 9:47 PM, Patrick May wrote:
> >
> >> I may be ignorant here -- I'm assuming the client is hitting
> >> http://gems.rubyforge.org/yaml
> >>
> >>
> >
> > You're right. It grabs (and caches) all of the gem metadata. Your idea for
>
> While my brain is still fizzing with this stuff:
>
> Is there a reason why the Yaml itself can't be a gem? That would allow us
> to use the versioning information gems already have for this, which
> might help with all the Last-Modified weirdness you were getting.
>
> If it were a gem, then could we make gem update 'patch' what we have,
> rather than do a full install?
Maybe not immediately, but in the > future? > > Hugh > > > > > _______________________________________________ > Rubygems-developers mailing list > Rubygems-developers@rubyforge.org > http://rubyforge.org/mailman/listinfo/rubygems-developers > From bitserf at gmail.com Wed Nov 10 15:37:33 2004 From: bitserf at gmail.com (leon breedt) Date: Wed Nov 10 15:37:27 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <200411100842.10784.jim@weirichhouse.org> <20041110143833.GB8888@FreeBSD.org> Message-ID: <270bd0c4041110123767ce6aa6@mail.gmail.com> On Wed, 10 Nov 2004 17:11:42 +0000 (WET), Hugh Sasse Staff Elec Eng wrote: > I really don't like this algorithm. The program uses open-uri for > most of its work, which is great for getting things going, > proof-of-concept and so on, but for a scalable application I think > we should aim to move to Net::HTTP and follow the protocol more > closely. [1] Why? Another reason for moving away from open-uri would be that the HTTP implementation in open-uri uses an internal Buffer class, which uses Tempfile or a StringIO. This makes partial downloads using Range: for servers that support it problematic without modifying open-uri.rb. Leon From gsinclair at soyabean.com.au Wed Nov 10 19:55:34 2004 From: gsinclair at soyabean.com.au (Gavin Sinclair) Date: Wed Nov 10 19:55:44 2004 Subject: [Rubygems-developers] Optimisation suggestion (was: Updating source index is slow) In-Reply-To: References: Message-ID: <162473420072.20041111115534@soyabean.com.au> On Thursday, November 11, 2004, 12:17:59 AM, Richard wrote: > I will look into downloading just the updated gemspecs and see if that can > be used to speed things up. It would obviously be a cgi...wanted to stay > away from that...but CPU is cheeper than bandwidth right now! Here's a way you could do it. 
Have an index file that looks like this: # Gem Name MD5 1 rake-0.4.4 fadflhdflkashflaewjhalewkjhf 2 rake-0.4.8 fkasjhflkjehflkaht4luhdlkfjl .... (Or use YAML.) That can obviously be downloaded quickly. Then each gemspec is in its own file (rake-0.4.4.gemspec, etc.) and is downloaded separately. The client sees what's in its current index, and assumes that it has all the corresponding gemspec files. It examines the fresh index and sees which numbers it's missing. It sends a request to the server, containing a list of the numbers. The server responds with a datastream of the corresponding gemspecs. The client then writes all the gemspec files. As a further optimisation, the index file could include the project name and summary, making the list/search command much faster. The 'gem check' command can verify the soundness of the client-side information, and if necessary throw it all out and get fresh information from the server. Does all this sound plausible? Gavin From patrick at hexane.org Thu Nov 11 00:19:09 2004 From: patrick at hexane.org (Patrick May) Date: Thu Nov 11 00:19:10 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <63431934699.20041111002408@soyabean.com.au> Message-ID: <372CB4F8-33A1-11D9-8966-000A95848050@hexane.org> Hello, On Wednesday, November 10, 2004, at 08:24 AM, Gavin Sinclair wrote: >>> Does the rubygems client need the entire source index? It seems to >>> me >>> that one would be more likely to ask for a list of packages, then >>> drill individually into the metadata for particular pages. > >> You're right. It grabs (and caches) all of the gem metadata. Your >> idea for a solution might be the most pragmatic (as opposed to >> implementing rsync in ruby ;). > > Nice and pragmatic, but I think the client should have all the > metadata, all else being equal, to enable greater searches. > Especially in GUI clients. 

Well, god forbid we have *2* cron jobs running on the server :-)

Kidding aside, the client and server are dependent on each other.
Rubygems has a variety of (excellent!) client tools. The server should
support a variety of different ways of accessing the data. Some folks
drill down, some folks grab it all. Everyone uses rubygems.

I think your idea about the index + MD5s is good; it might be worth
squeezing dependency data in there.

Ah, the sweaty smell of compromise.

~ patrick

From eivind at FreeBSD.ORG Thu Nov 11 08:36:39 2004
From: eivind at FreeBSD.ORG (Eivind Eklund)
Date: Thu Nov 11 08:36:31 2004
Subject: [Rubygems-developers] Updating source index is slow
In-Reply-To:
References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com>
Message-ID: <20041111133639.GA34804@FreeBSD.org>

On Wed, Nov 10, 2004 at 02:39:46PM -0500, Tom Clarke wrote:
> Hi,
>
> I've been lurking here for a bit, and just let me say that ruby gems
> is looking great and everything I wanted for the now defunct
> RAAInstall (which I gave life to and subsequently killed).
>
> The issue I encountered with raa install was that for an RAA-sized
> database (which gems isn't yet) it was very difficult to get the
> information down to a reasonable size. I ended up producing a
> drastically compressed format (in terms of what data I put in the
> download file). The gems I've seen are comparatively verbose.
>
> I would suggest that the end solution probably needs to be either of:
> * A Gentoo-like system where each package has its own file and can be
> synced with rsync or a clone (this has the benefit of being easy to
> mirror in an efficient manner)
> * A client-server system where package details are queried on demand.

It also works to have large metadata and have the users sync it
occasionally using the rsync algorithm in some fashion. We do this for
FreeBSD using cvsup, and it scales to the 12000 ports we maintain there.

WRT packaging DBI: This will require binary packages if it is to work well, I think. David Ross is coordinating a project to try to make this work for Windows; he's probably primarily targetting RPA (as he is a fan of ours), but it should probably be possible to use a lot of the results for RubyGems, too. The problem turned out to be much more icky than at least I thought; I've been used to doing binary packages and having that be only a minor part of the problem in packaging. However, the issue of platforms without a standardized build environment (Windows, especially) turns this nasty. Eivind. From tom.clarke at gmail.com Thu Nov 11 09:10:25 2004 From: tom.clarke at gmail.com (Tom Clarke) Date: Thu Nov 11 09:10:26 2004 Subject: [Rubygems-developers] Updating source index is slow In-Reply-To: <20041111133639.GA34804@FreeBSD.org> References: <893EBF75-3315-11D9-B946-000D9337C01C@chadfowler.com> <20041111133639.GA34804@FreeBSD.org> Message-ID: When thinking about this in the past, I thought that having windows binaries and source for everything else might just work. Since when people talk about platforms without a standardized build environment they usually mean windows. Putting binary packages aside, does gems support the necessary features to do source packages for dbi - that is being able to pass compilation options (perhaps like gentoo USE="mysql")? -Tom > WRT packaging DBI: This will require binary packages if it is to work well, > I think. David Ross is coordinating a project to try to make this work for > Windows; he's probably primarily targetting RPA (as he is a fan of ours), but > it should probably be possible to use a lot of the results for RubyGems, too. > > The problem turned out to be much more icky than at least I thought; I've > been used to doing binary packages and having that be only a minor part > of the problem in packaging. However, the issue of platforms without a > standardized build environment (Windows, especially) turns this nasty. 
>
> Eivind.
> _______________________________________________
>
>
> Rubygems-developers mailing list
> Rubygems-developers@rubyforge.org
> http://rubyforge.org/mailman/listinfo/rubygems-developers
>

From gsinclair at soyabean.com.au Tue Nov 23 22:09:20 2004
From: gsinclair at soyabean.com.au (Gavin Sinclair)
Date: Tue Nov 23 22:10:04 2004
Subject: [Rubygems-developers] Found a quirk with 'update'
Message-ID: <1851118735116.20041124140920@soyabean.com.au>

While updating my gems, so I'd get the latest rails, I found this
anomaly:

  Attempting remote installation of 'actionmailer'
  Install required dependency actionpack? [Yn] y
  ...actionpack and actionmailer get installed...
  Attempting remote installation of 'actionpack'
  Successfully installed actionpack, version 0.9.5

That is, actionpack gets installed twice: first as a dependency, then
in its own right. Nothing harmful, just duplication.

Gavin

From bitserf at gmail.com Wed Nov 24 15:59:25 2004
From: bitserf at gmail.com (leon breedt)
Date: Wed Nov 24 16:01:56 2004
Subject: [Rubygems-developers] wiki spam?
Message-ID: <270bd0c404112412594875bec9@mail.gmail.com>

the sample gemspec is not intended to contain URLs for things like
"Generic Allegra"? :)

http://rubygems.rubyforge.org/wiki/wiki.pl?CreateAGemInTenMinutes

leon

From chad at chadfowler.com Wed Nov 24 16:13:21 2004
From: chad at chadfowler.com (Chad Fowler)
Date: Wed Nov 24 16:12:50 2004
Subject: [Rubygems-developers] wiki spam?
In-Reply-To: <270bd0c404112412594875bec9@mail.gmail.com>
References: <270bd0c404112412594875bec9@mail.gmail.com>
Message-ID:

Actually I put that there to advertise my new miracle drug ;)

Thanks for catching it.

Chad

On 24-Nov-04, at 3:59 PM, leon breedt wrote:

> the sample gemspec is not intended to contain URLs for things like
> "Generic Allegra"?
:) > > http://rubygems.rubyforge.org/wiki/wiki.pl?CreateAGemInTenMinutes > > leon > _______________________________________________ > Rubygems-developers mailing list > Rubygems-developers@rubyforge.org > http://rubyforge.org/mailman/listinfo/rubygems-developers From jim at weirichhouse.org Fri Nov 26 21:34:08 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Fri Nov 26 21:31:56 2004 Subject: [Rubygems-developers] FYI: I'm looking at the remote installer Message-ID: <200411262134.08357.jim@weirichhouse.org> Just so folks are aware ... I'm working in the remote installer area, adding tests, cleaning code, swabbing decks, etc. -- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From chad at chadfowler.com Fri Nov 26 21:42:29 2004 From: chad at chadfowler.com (Chad Fowler) Date: Fri Nov 26 21:41:53 2004 Subject: [Rubygems-developers] FYI: I'm looking at the remote installer In-Reply-To: <200411262134.08357.jim@weirichhouse.org> References: <200411262134.08357.jim@weirichhouse.org> Message-ID: On 26-Nov-04, at 9:34 PM, Jim Weirich wrote: > Just so folks are aware ... I'm working in the remote installer area, > adding > tests, cleaning code, swabbing decks, etc. > > Cool. I've been on the verge of the same. Most importantly, I hoped to make the local/remote thing more seamless. Specifically, I want to install dependencies remotely when doing an install of a local gem. 
I hope that, or something that makes that easier, works its way in :) Chad From jim at weirichhouse.org Fri Nov 26 22:12:56 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Fri Nov 26 22:10:47 2004 Subject: [Rubygems-developers] FYI: I'm looking at the remote installer In-Reply-To: References: <200411262134.08357.jim@weirichhouse.org> Message-ID: <200411262212.56817.jim@weirichhouse.org> On Friday 26 November 2004 09:42 pm, Chad Fowler wrote: > On 26-Nov-04, at 9:34 PM, Jim Weirich wrote: > > Just so folks are aware ... I'm working in the remote installer area, > > adding > > tests, cleaning code, swabbing decks, etc. > > Cool. I've been on the verge of the same. Most importantly, I hoped > to make the local/remote thing more seamless. Specifically, I want to > install dependencies remotely when doing an install of a local gem. I > hope that, or something that makes that easier, works its way in :) I'm not working specifically for that, but one thing that I have done is pull the remote access code /out/ of the remote installer into a "fetcher" object (primarily to make improvements to fetching easier). This may make the remote installer much closer to the local installer. Perhaps close enough so that they may be merged. -- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From jim at weirichhouse.org Fri Nov 26 23:51:34 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Fri Nov 26 23:49:23 2004 Subject: [Rubygems-developers] Detecting server changes Message-ID: <200411262351.34913.jim@weirichhouse.org> There was some discussion a while back on the efficiency of detecting changes on the server. I was doing a little experimentation and got the following results ... 

                 user     system      total        real
HEAD         0.020000   0.000000   0.020000 (  0.115111)
IMS(no)      0.000000   0.000000   0.000000 (  0.109897)
IMS(yes)     0.180000   0.020000   0.200000 (  1.952789)
NMod         0.160000   0.000000   0.160000 (  2.156582)
open-uri     0.290000   0.030000   0.320000 (  2.112810)
open-uri/sz  0.000000   0.000000   0.000000 (  0.146552)

HEAD        -- Just request the headers.
IMS(no)     -- Use If-Modified-Since (where the answer is no).
IMS(yes)    -- Use If-Modified-Since (where the answer is yes).
NMod        -- No IMS header (normal HTTP get)
open-uri    -- Using open-uri straight up
open-uri/sz -- Using open-uri with the break on content length.

The shortcuts (HEAD, IMS(no) and open-uri/sz) all take approximately
the same amount of time, and all are much faster than downloading the
entire compressed source.

Based on these benchmarks, I don't think there is a compelling reason
to switch from the open-uri/sz technique that is currently in use
(although the size technique can fail with non-monotonically increasing
file sizes, in which case the IMS technique looks attractive).

But there is a problem with the current code. On many systems, the
cache is in an area that can only be written while running under sudo.
That means the cache is updated on installs, but not on general
queries. This forces the gem command to download the yaml source on
every command unless a recent install has occurred.

I'm suggesting that we store the cached results in the user's home
directory (or the appropriate directory on Windows). This allows a gem
list command to update the cache without running under sudo. We could
act intelligently and only store in the home directory if the normal
cache directory is write protected. We could also check both the
system-wide cache and the user-specific cache when comparing against
the server.

Any thoughts before I dive into this?

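
The IMS variant measured above can be sketched with Net::HTTP roughly
as follows. This is a minimal sketch, not the RubyGems fetcher: the
helper names conditional_get and index_unchanged? are made up for
illustration, and it assumes the server honours If-Modified-Since and
answers 304 Not Modified when the index has not changed (which, per
earlier messages in this thread, RubyForge did not always do).

```ruby
require 'net/http'
require 'time'
require 'uri'

# Hypothetical helper: build a GET that asks the server to send the index
# only if it changed after +cached_at+ (when our local copy was written).
def conditional_get(uri, cached_at)
  request = Net::HTTP::Get.new(uri)
  request['If-Modified-Since'] = cached_at.httpdate
  request
end

# Hypothetical helper: a 304 reply means the cached index is still current.
def index_unchanged?(response)
  response.is_a?(Net::HTTPNotModified)
end

# Usage (performs a real request; cache_path is a placeholder):
#   uri = URI('http://gems.rubyforge.org/yaml')
#   res = Net::HTTP.start(uri.host, uri.port) do |http|
#     http.request(conditional_get(uri, File.mtime(cache_path)))
#   end
#   # download res.body into the cache unless index_unchanged?(res)
```

With a 304 the server sends no body, so the "answer is no" case costs
about one round trip, which matches the IMS(no) row in the table.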
-- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From halostatue at gmail.com Sat Nov 27 00:41:55 2004 From: halostatue at gmail.com (Austin Ziegler) Date: Sat Nov 27 00:41:15 2004 Subject: [Rubygems-developers] Detecting server changes In-Reply-To: <200411262351.34913.jim@weirichhouse.org> References: <200411262351.34913.jim@weirichhouse.org> Message-ID: <9e7db9110411262141278a3db5@mail.gmail.com> On Fri, 26 Nov 2004 23:51:34 -0500, Jim Weirich wrote: > All of the shortcuts (HEAD, IMS(no) and open-uri/sz) all take approximately > the same amount of time, and all are much faster than downloading the entire > compressed source. > Based on that, I don't think there is a strong compeling reason based on these > benchmarks to switch from the open-uri/sz technique that is in use currently. > (although the size technique can fail with non-monotonically increasing file > sizes, in which case the IMS technique looks attractive). The reason for changing this is not the performance on the gem list command, really, but the performance and load on the server. Be kind to the GemHost :) Either HEAD or IMS is preferable because of this. -austin -- Austin Ziegler * halostatue@gmail.com * Alternate: austin@halostatue.ca From curt at hibbs.com Sat Nov 27 01:21:59 2004 From: curt at hibbs.com (Curt Hibbs) Date: Sat Nov 27 01:21:21 2004 Subject: [Rubygems-developers] Detecting server changes In-Reply-To: <200411262351.34913.jim@weirichhouse.org> Message-ID: Jim Weirich wrote: > > I'm suggesting that we store the cached results in the users's > home directory > (or appropriate directory on windows). The user's home directory on windows is contained in the environment variable USERPROFILE. 
Curt From chad at chadfowler.com Sat Nov 27 07:12:25 2004 From: chad at chadfowler.com (Chad Fowler) Date: Sat Nov 27 07:11:48 2004 Subject: [Rubygems-developers] Detecting server changes In-Reply-To: <9e7db9110411262141278a3db5@mail.gmail.com> References: <200411262351.34913.jim@weirichhouse.org> <9e7db9110411262141278a3db5@mail.gmail.com> Message-ID: <99A212A0-406D-11D9-9BED-000D9337C01C@chadfowler.com> On 27-Nov-04, at 12:41 AM, Austin Ziegler wrote: > On Fri, 26 Nov 2004 23:51:34 -0500, Jim Weirich > wrote: >> All of the shortcuts (HEAD, IMS(no) and open-uri/sz) all take >> approximately >> the same amount of time, and all are much faster than downloading the >> entire >> compressed source. > >> Based on that, I don't think there is a strong compeling reason based >> on these >> benchmarks to switch from the open-uri/sz technique that is in use >> currently. >> (although the size technique can fail with non-monotonically >> increasing file >> sizes, in which case the IMS technique looks attractive). > > The reason for changing this is not the performance on the gem list > command, really, but the performance and load on the server. Be kind > to the GemHost :) > > Either HEAD or IMS is preferable because of this. > > Right. We're using HEAD now. Jim, I think you're right about the cache going into users' home directories (unless it's writable in the global spot???). This has been bugging me too. Chad From jim at weirichhouse.org Sun Nov 28 17:32:25 2004 From: jim at weirichhouse.org (Jim Weirich) Date: Sun Nov 28 17:31:45 2004 Subject: [Rubygems-developers] RubyGems Build Monitor Message-ID: <200411281732.25715.jim@weirichhouse.org> I've setup a build monitor for Ruby Gems. Everytime a rubygems check-in is made, the build monitor will run all the unit tests and functional tests against the new build. This should happen automatically, without any special effort from the rubygems developers. 
If you wish to see the result of the build, visit: http://onestepback.org/cgi-bin/rubygems_build.cgi You can also request and/or cancel a build there. If this proves fairly stable over the next week or so, I plan to add in auto-emailing to developers when the build is broken. -- -- Jim Weirich jim@weirichhouse.org http://onestepback.org ----------------------------------------------------------------- "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas) From hgs at dmu.ac.uk Mon Nov 29 07:13:47 2004 From: hgs at dmu.ac.uk (Hugh Sasse Staff Elec Eng) Date: Mon Nov 29 07:13:34 2004 Subject: [Rubygems-developers] Detecting server changes In-Reply-To: <200411262351.34913.jim@weirichhouse.org> References: <200411262351.34913.jim@weirichhouse.org> Message-ID: On Fri, 26 Nov 2004, Jim Weirich wrote: > There was some discussion a while back on the efficiency of detecting changes > on the server. I was doing a little experimentation and got the following > results ... > [interesting study trimmed] Thanks for doing that study. > > All of the shortcuts (HEAD, IMS(no) and open-uri/sz) all take approximately > the same amount of time, and all are much faster than downloading the entire > compressed source. > > Based on that, I don't think there is a strong compeling reason based on these > benchmarks to switch from the open-uri/sz technique that is in use currently. > (although the size technique can fail with non-monotonically increasing file > sizes, in which case the IMS technique looks attractive). And the IMS one agrees more with the specs for how we should be doing this, and if the web server changes then our assumptions may break. But if we flag that up somewhere, we should be able to find it if/when that happens months down the road. > > But, there is a problem with the current code. On many systems, the cache is > in an area that can only be written while running under sudo. 
That means the > cache is updated on installs, but not on general queries. This forces the > gem command to download the yaml source on every command unless a recent > install has occurred. > > I'm suggesting that we store the cached results in the users's home directory > (or appropriate directory on windows). This allows a gem list command to > update the cache without running under sudo. We could act intelligently and > only store in the home directory if the normal cache directory is write > protected. We could also check both the system-wide cache and the > user-specific cache when comparing against the server. More efficient, but less secure. If gems is installed centrally then any users can use it, even if they are not trusted to touch system files. If the data from home directories becomes trusted then some of our students could DOS attack the gem database. Unless you are proposing that the user's own database may be updated from the system one, but not the other way round. Is it sensible to have a special process to do the database mods that can be requested by normal users, so that their processes never writes (possibly suspect) data to the database? Maybe not root, but user gem or rubygems (8 chars in a username is pretty std for unix) so that this user owns the files rubygems depends on. > Any thoughts before I dive into this? > > -- -- Jim Weirich jim@weirichhouse.org > http://onestepback.org Hugh From halostatue at gmail.com Mon Nov 29 08:43:29 2004 From: halostatue at gmail.com (Austin Ziegler) Date: Mon Nov 29 08:42:44 2004 Subject: [Rubygems-developers] Fwd: ruwiki gem install problems In-Reply-To: <9e7db9110411290539316bb300@mail.gmail.com> References: <41AB0027.2010706@gmx.net> <9e7db9110411290539316bb300@mail.gmail.com> Message-ID: <9e7db9110411290543103e0358@mail.gmail.com> What do you guys think? 
---------- Forwarded message ---------- From: Austin Ziegler Date: Mon, 29 Nov 2004 08:39:47 -0500 Subject: Re: ruwiki gem install problems To: ruby-talk@ruby-lang.org On Mon, 29 Nov 2004 19:53:37 +0900, Henrik Horneber wrote: > Hi! > > I was trying to install ruwiki via gem. Bizarre. I get this as well, and I experience the same as you with respect to being able to access the package via alternative tools, including Archive::Tar::Minitar directly, so this appears to be a bug with RubyGems, but I don't know exactly what the problem is. I will be taking it up with the RubyGems developers directly. Alternatively, it could be something to do with differing versions of Ruby on Windows. I don't recall which version of the Ruby Windows installer I have on my main Ruwiki development machine and the machine I just tested on (the one at work). Can anyone else verify this? However, in researching this issue, I discovered an issue that I have to fix (something in the Google redirect doesn't work as it should), so I will be releasing 0.9.1 in the next two days (it may not be today because I have not had nearly enough sleep, and I'm not sure where the gem problem originates from). In the meantime, change line 83 of lib/ruwiki/wiki/tokens/00default.rb to read: "http://www.google.com/url?sa=D&q=#{CGI.escape(uri)}" Apparently, Google's redirector doesn't recognise the ; form of CGI parameter escaping, so you must use the older & form. Mauricio, if you could update this on the RPA distribution, that would be fine. > C:\temp\archive-tar-minitar-0.5.1>install > Checking for test cases: > > Loaded suite Unnamed TestSuite > Started > .F................... > Finished in 0.406 seconds. 
> > 1) Failure: > test_each_works(TC_Tar__Input) > [./tests/tc_tar.rb:509:in `test_each_works' > ./tests/tc_tar.rb:504:in `each_with_index' > ./tests/tc_tar.rb:504:in `each' > ./lib/archive/tar/minitar.rb:679:in `each_entry' > ./lib/archive/tar/minitar.rb:605:in `loop' > ./lib/archive/tar/minitar.rb:605:in `each_entry' > ./lib/archive/tar/minitar.rb:587:in `each' > ./lib/archive/tar/minitar.rb:679:in `each' > ./tests/tc_tar.rb:504:in `each_with_index' > ./tests/tc_tar.rb:504:in `test_each_works' > ./tests/tc_tar.rb:502:in `open' > ./tests/tc_tar.rb:502:in `test_each_works']: > <1072911600> expected but was > <1072933200>. I could not reproduce this. Ruwiki doesn't directly need Archive::Tar::Minitar; it is mostly needed for support for RubyGem and RPA installations (e.g., the ruwiki command). -austin -- Austin Ziegler * halostatue@gmail.com * Alternate: austin@halostatue.ca -- Austin Ziegler * halostatue@gmail.com * Alternate: austin@halostatue.ca From rich at infoether.com Mon Nov 29 09:19:18 2004 From: rich at infoether.com (Richard Kilmer) Date: Mon Nov 29 09:18:31 2004 Subject: [Rubygems-developers] Detecting server changes In-Reply-To: <200411262351.34913.jim@weirichhouse.org> Message-ID: I think this strategy of writing to the center if possible, then falling back to the user's home dir is the right one. I suppose you could then use mtime one each database to determine which is the newest and just compare that to the rubyforge db. -rich On 11/26/04 11:51 PM, "Jim Weirich" wrote: > > I'm suggesting that we store the cached results in the users's home directory > (or appropriate directory on windows). This allows a gem list command to > update the cache without running under sudo. We could act intelligently and > only store in the home directory if the normal cache directory is write > protected. We could also check both the system-wide cache and the > user-specific cache when comparing against the server. 
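
The fallback strategy discussed above (write to the central cache when
possible, otherwise to the user's home directory, and trust whichever
copy has the newest mtime) might look something like this. It is only a
sketch under stated assumptions: writable_cache_dir, newest_cache and
user_home are hypothetical names, not RubyGems API, and the sketch does
not address the concern raised earlier about untrusted per-user data
flowing back into the system-wide cache.

```ruby
# Hypothetical helper: write to the system-wide cache directory when we
# are allowed to, otherwise fall back to the per-user directory.
def writable_cache_dir(system_dir, user_dir)
  File.writable?(system_dir) ? system_dir : user_dir
end

# Hypothetical helper: of the cache files that exist, return the most
# recently modified one, or nil when there is no cache at all.
def newest_cache(*paths)
  paths.select { |path| File.exist?(path) }.max_by { |path| File.mtime(path) }
end

# Hypothetical helper: the user's home directory, taken from USERPROFILE
# (set on Windows) or HOME (set elsewhere).
def user_home(env = ENV)
  env['USERPROFILE'] || env['HOME']
end
```

The mtime of the file returned by newest_cache is then the natural
value to compare against the server's copy.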
From chad at chadfowler.com Mon Nov 29 11:36:18 2004 From: chad at chadfowler.com (chad@chadfowler.com) Date: Mon Nov 29 10:31:16 2004 Subject: [Rubygems-developers] Fwd: ruwiki gem install problems In-Reply-To: <9e7db9110411290543103e0358@mail.gmail.com> References: <41AB0027.2010706@gmx.net> <9e7db9110411290539316bb300@mail.gmail.com> <9e7db9110411290543103e0358@mail.gmail.com> Message-ID: <24099.68.208.204.7.1101746178.squirrel@68.208.204.7> This appears to be a Windows-only problem. I've been able to try it on Windows here at work, but I'm not going to be able to fully dig in while I'm here. I also confirmed that I can extract the gem with minitar. For some reason this error isn't responding to Jim's --backtrace as I had hoped either. Chad > What do you guys think? > > > ---------- Forwarded message ---------- > From: Austin Ziegler > Date: Mon, 29 Nov 2004 08:39:47 -0500 > Subject: Re: ruwiki gem install problems > To: ruby-talk@ruby-lang.org > > > On Mon, 29 Nov 2004 19:53:37 +0900, Henrik Horneber wrote: >> Hi! >> >> I was trying to install ruwiki via gem. > > Bizarre. I get this as well, and I experience the same as you with > respect to being able to access the package via alternative tools, > including Archive::Tar::Minitar directly, so this appears to be a bug > with RubyGems, but I don't know exactly what the problem is. I will be > taking it up with the RubyGems developers directly. Alternatively, it > could be something to do with differing versions of Ruby on Windows. I > don't recall which version of the Ruby Windows installer I have on my > main Ruwiki development machine and the machine I just tested on (the > one at work). > > Can anyone else verify this? 
> > However, in researching this issue, I discovered an issue that I have > to fix (something in the Google redirect doesn't work as it should), > so I will be releasing 0.9.1 in the next two days (it may not be today > because I have not had nearly enough sleep, and I'm not sure where the > gem problem originates from). > > In the meantime, change line 83 of lib/ruwiki/wiki/tokens/00default.rb to > read: > "http://www.google.com/url?sa=D&q=#{CGI.escape(uri)}" > > Apparently, Google's redirector doesn't recognise the ; form of CGI > parameter escaping, so you must use the older & form. Mauricio, if > you could update this on the RPA distribution, that would be fine. > > > >> C:\temp\archive-tar-minitar-0.5.1>install >> Checking for test cases: >> >> Loaded suite Unnamed TestSuite >> Started >> .F................... >> Finished in 0.406 seconds. >> >> 1) Failure: >> test_each_works(TC_Tar__Input) >> [./tests/tc_tar.rb:509:in `test_each_works' >> ./tests/tc_tar.rb:504:in `each_with_index' >> ./tests/tc_tar.rb:504:in `each' >> ./lib/archive/tar/minitar.rb:679:in `each_entry' >> ./lib/archive/tar/minitar.rb:605:in `loop' >> ./lib/archive/tar/minitar.rb:605:in `each_entry' >> ./lib/archive/tar/minitar.rb:587:in `each' >> ./lib/archive/tar/minitar.rb:679:in `each' >> ./tests/tc_tar.rb:504:in `each_with_index' >> ./tests/tc_tar.rb:504:in `test_each_works' >> ./tests/tc_tar.rb:502:in `open' >> ./tests/tc_tar.rb:502:in `test_each_works']: >> <1072911600> expected but was >> <1072933200>. > > I could not reproduce this. > > Ruwiki doesn't directly need Archive::Tar::Minitar; it is mostly > needed for support for RubyGem and RPA installations (e.g., the ruwiki > command). 
> > -austin > -- > Austin Ziegler * halostatue@gmail.com > * Alternate: austin@halostatue.ca