Gem file size limits
drbrain at segment7.net
Tue Jan 17 20:41:53 EST 2012
On Jan 17, 2012, at 3:10 PM, Evan Phoenix wrote:
> I believe that rubygems.org needs to limit the max size of a .gem file which will be allowed.
> This serves two purposes:
> 1) It protects users from themselves. The top 19 of 20 gems sorted by size are all huge because they accidentally packaged all previous versions within themselves. This issue needs to be fixed on the gem build side also, but there is no reason to allow these gems.
> 2) Cost. Rubygems.org is becoming increasingly expensive to run and thus we need to begin thinking of ways to keep it mean and lean.
> I think we can all agree that some kind of limit makes sense. At the moment, there is nothing preventing a user from using rubygems.org as their personal backup and pushing terabytes in a .gem file. Clearly we can't operate if people do that.
> So the natural question I have for all of you is: what makes sense as the size limit? To help you with this decision, here is some data for you to chew on:
> 1) The top 1000 gems, sorted by size: https://gist.github.com/1629309
> 2) A histogram of gem sizes by megabyte: https://gist.github.com/1629435
For additional data, here's the sum of space consumed by each gem for all its releases:
> You can see from the histogram that 96% of gems are less than one megabyte, and 98% are 3 megs or less. It seems like that fact should inform our decision.
> To start the decision, let me throw out a starting point: 10 megs.
Most of the gems listed in the top 10 contain embedded third-party code, SDKs, etc.
At position 726 in the list the total consumption for a gem reaches 10MB, so 98% of authors use less than 10MB total.
> Looking at the biggest non-accidental gems, they're almost all jruby related and contain huge .jar files. We've pinged others about removing the impediment to pushing gems with maven deps, so that devs would use that functionality rather than packaging the jars within the gems themselves.
For some perspective on possible limits, here's the list of gems that used more than 5MB for any release:
For 34264 releases:
A 10MB limit would block 0.7% of gems
A 5MB limit would block 1.3% of gems
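For anyone who wants to try other limits, the percentages above come down to a simple count. Here is a minimal sketch of that calculation. The sizes in release_sizes are made-up sample values, not the real rubygems.org data, and percent_blocked is a hypothetical helper name used only for illustration.

```python
MB = 1024 * 1024


def percent_blocked(release_sizes, limit_bytes):
    """Return the percentage of releases strictly larger than limit_bytes."""
    if not release_sizes:
        return 0.0
    blocked = sum(1 for size in release_sizes if size > limit_bytes)
    return 100.0 * blocked / len(release_sizes)


# Hypothetical sample data (bytes per release), NOT the actual index.
release_sizes = [
    238 * 1024, 134 * 1024, 2 * MB, 600 * 1024,
    374272, 17 * MB, 6 * MB, 1 * MB,
]

for limit_mb in (5, 10):
    pct = percent_blocked(release_sizes, limit_mb * MB)
    print(f"A {limit_mb}MB limit would block {pct:.1f}% of releases")
```

Running the same function over the real list of release sizes would give the figures for any candidate limit.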
On IRC Gregory Brown suggested we cross-reference this list with the popularity of the gem, but I don't have download counts handy.
From the most-downloaded-today list on rubygems.org:
mime-types uses 238KB total
multi_json uses 134KB total
treetop uses 2MB total
json uses 32MB total
thor uses 20MB total
Neither thor nor json uses more than 5MB for any single release.
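The difference between a gem's total across all releases and its largest single release matters here, since a per-file limit only looks at the latter. A rough sketch of how both numbers could be computed from (name, size) pairs follows. The input data and the summarize helper are hypothetical, not taken from the real index.

```python
from collections import defaultdict

MB = 1024 * 1024


def summarize(releases):
    """Return (total bytes per gem, largest release per gem) as two dicts."""
    totals = defaultdict(int)
    largest = defaultdict(int)
    for name, size in releases:
        totals[name] += size
        largest[name] = max(largest[name], size)
    return dict(totals), dict(largest)


# Hypothetical (gem name, release size in bytes) pairs.
releases = [
    ("json", 4 * MB), ("json", 3 * MB),
    ("thor", 2 * MB), ("thor", 1 * MB),
]

totals, largest = summarize(releases)
limit = 5 * MB
for name in sorted(totals):
    over = largest[name] > limit
    print(f"{name}: total {totals[name]} bytes, "
          f"largest release {largest[name]} bytes, over limit: {over}")
```

In this sample a gem can exceed 5MB in total while every individual release stays under the limit, which is the situation described for thor and json.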
Some other known-popular gems:
rake uses 29MB total
rails uses 20MB total
activerecord uses 48MB total
actionpack uses 68MB total
actionmailer uses 9MB total
activeresource uses 5MB total
activesupport uses 28MB total
bundler uses 10MB total
There was one anomalous release of actionpack, 2.3.6, which was 17MB due to garbage in tmp/test. Releases 2.3.10 and 2.3.11 were 1MB each, and the rest were smaller. The largest release in the 3 series is 600KB, and the 3.2 RC is 374272 bytes.