[Rubygems-developers] Suggestions: categories and querying

Eivind Eklund eivind at FreeBSD.org
Fri Sep 17 17:40:40 EDT 2004

On Fri, Sep 17, 2004 at 09:30:30AM -0400, Chad Fowler wrote:
> On Sep 17, 2004, at 6:01 AM, Eivind Eklund wrote:
> >There are two places that could do this well at the moment: RAA (if
> >somebody adopted doing the librarian work for it), and RPA (which has
> >Mauricio as its librarian already).  I think RubyGems' best bet is to
> >NOT add categorization at all at this time, but instead cooperate
> >closely with one of the above, and help them generate really good
> >categorization, and when good categories are available, start helping
> >authors find categories for their software.
> >
> >Anything else is doomed to chaos and a false sense of being helpful.
> Thanks for the long and obviously well thought out response, Eivind.  I 
> can't say I completely agree with you, but I _do_ agree that RubyGems 
> should not add any kind of categorization right now (or possibly ever). 
>  I also believe that rpa-base should not add categorization.  I think 
> it's in the scope of something at the RPA level, but should be 
> completely left out of the _packages_ themselves.

I agree with keeping them out of the packages.  They're at a level
higher up.

> I would be open to adding keywords to gems, but I would want to think 
> it through a lot more.  Keywords may be single-level hierarchies, but 
> being single-level (and therefore not _really_ hierarchies), they don't 
> carry with them the same commitment to a structure that may or may not 
> be right.  They can be used to help someone find a library or 
> application without forcing a rigid classification system.

I'm not sure they can be used to help people find something.  I'm afraid
that people will THINK that they can be used to help find something, and
therefore will add them and avoid thinking about the hard problems
associated with getting a good solution.

> Finally, I'm not convinced that a hierarchy is the way to go at all.  I 
> would even go so far as to say that hierarchical classification for 
> this kind of computer-based purpose is obsolete.

This does not match my experience.  I find the organization of a
physical library much better than computer based searches based on
keywords.  It is just expensive to maintain.

> And, as you've pointed out, they are almost unusable for
> self-organizing system/communities.

Again, I respectfully disagree.  In my opinion, they're expensive to
maintain and give a high payoff.  As I said: I think the lack of them
for software is possibly THE primary flaw of software development today.

I hope you'll allow me another mini-essay - you're striking a lot of
issues close to my heart with the areas you tackle, so I've got a lot to
say :-)

All of human activity - really, all of life - is a self-organizing
system.  The activity of the human part of this system is based on
perceived expense and what benefits the individual gets from it.  In
larger contexts, the organization comes from the activity of a number of
individuals.  This activity is directed by the interaction between the
individuals and the world, including each other.  Suffering from
abstraction asphyxiation yet?  Thought so - I'll try to get a little
more down to the nitty-gritty.  Then we'll go up again and look at the
forces included, how these real-world examples make things work, and try
to construct an example of how this could work for a RubyGems library
(or RPA).

Remember: It's always an interaction between a culture and a technology
- because the culture shapes the technology, and the technology shapes
the culture.

Two examples of fairly self-organizing hierarchical taxonomies, made
using network technology and a self-replicating culture: Wikipedia and
the Open Directory Project.  The latter has constructed over 460,000
categories and categorized many millions of sites by volunteer feedback;
the former has, in just a few years, built the largest encyclopedia in
the history of mankind, where hierarchical *and* crosscutting
organization is visible all over the place.

One thing that is clearly visible in both these projects is that they
have a strong interaction between technology and culture, and that the
technology has been designed with the explicit goal of shaping the
culture - of making some behaviour rewarding, and other behaviour
non-rewarding.  And they've both made tools that make *collaboration*
work nicely - not having every user "sit on his own hilltop, use the
tools, and spread his data to the world", but letting users that want to
help fix up where things can be improved do so easily.

They also foster a sense of "doing something for the world" by doing
such fixups, and the ability to do a group of such fixups at the same
time, getting into a state of fixing, fixing, fixing - wow - the world
is noticeably better than it was just ten minutes ago!

This is also something that has been there since the inception of both
projects.  They've tried to keep things good all the way, and have built
their infrastructure for it.  The clearly more successful of the two
(Wikipedia) has also built the infrastructure to foster a sense of
community, and to make it possible for the members to communicate among
themselves about the work.

The infrastructure (at least for Wikipedia) is also made so that while
it is extremely easy to do damage, it is also very easy to fix up, and
the community can keep track of that and fix it as necessary.

I think it is possible to make the same happen with RubyGems and RPA.
We just need to make the infrastructure that makes it EASY for people to
help, and make it non-rewarding to damage the dataset.  Wikipedia does
this by making it easy for people to see what changes happen, and
keeping history so it is easy to revert vandalism.  So: Vandalism really
makes little difference, and disappears quickly.
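
As a rough illustration of the history-and-revert idea, here is a
minimal Ruby sketch.  It is only a sketch under assumptions: the class,
method, and field names (AssignmentHistory, record, revert, and so on)
are made up for this example and are not part of RubyGems, RPA, or
Wikipedia's software.  Changes are only ever appended, so a revert is
just another recorded change and nothing is lost.

```ruby
# Hypothetical sketch: an append-only history of category assignments.
# None of these names exist in RubyGems or RPA.
Change = Struct.new(:id, :user, :package, :category, :action)

class AssignmentHistory
  def initialize
    @changes = []
  end

  # Record that a user added a package to a category or removed it.
  def record(user, package, category, action)
    change = Change.new(@changes.size + 1, user, package, category, action)
    @changes << change
    change
  end

  # Most recent changes first, so reviewers can see what happened.
  def recent(limit = 10)
    @changes.last(limit).reverse
  end

  # Undo a change by appending its inverse; the history stays intact.
  def revert(id, reviewer)
    change = @changes.find { |c| c.id == id }
    raise ArgumentError, "no change #{id}" if change.nil?
    inverse = change.action == :add ? :remove : :add
    record(reviewer, change.package, change.category, inverse)
  end

  # Replay the history to get the current package => categories mapping.
  def current
    state = Hash.new { |hash, key| hash[key] = [] }
    @changes.each do |c|
      if c.action == :add
        state[c.package] |= [c.category]
      else
        state[c.package] -= [c.category]
      end
    end
    state
  end
end
```

With something like this, reverting a bad assignment is a single call,
for example history.revert(bad_change.id, "reviewer"), which is the kind
of cheap cleanup that makes vandalism unrewarding.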

You also need to motivate people to contribute.  There are a few
different aspects to this. 

First of all, it is making the right things easy and the wrong things
harder.  This is done in Wikipedia etc. by the ease of entering things
and the number of ways people can help fix, but I think this property
will be missing from any system where every free software author assigns
the categories (or keywords, or whatever you call them) to his software
locally.  (I'll describe a system that I think would actually work for
RubyGems below.)

Second, it is increasing the reward for doing the right thing.  It
shouldn't just be easy to do the right thing - it should feel good.  One
of the ways to do this is to do a so-called "step up" to a larger goal
that the person feels more about.  For Wikipedia, the step up goal is to
"Spread education to everyone".  Another way is to give people positive
feedback on good behaviour.  An example of this is Ward's signature and
his "Thanks for your careful attention to detail!" on the c2.com Wiki.
(This also slots the submitters into a role when they submit stuff - a
very effective technique for manipulation, as people don't want to let
that positive role down.)

Third, make doing stuff into a habit - because people then do it a lot
and get good at it.  Wiki, Wikipedia, and the Open Directory all do this
- because people can work on more than just their own stuff.

Now, putting all of this together into a working design for how to get
RubyGems properly categorized:

* Set up a collection site for RubyGems.  You want people to upload
  their gems there, so all gems are available in a central location for
  categorization.  (They don't have to be available for general
  download, but they must be available to labellers for inspection.)

* Make an interface where the authors are profusely thanked, and told
  how this helps the entire Ruby community and will hopefully make all
  of the world a better place.  Also indicate where the author can help
  categorize his own and other packages.

* Make each category include a description of what should go into the
  category, in addition to the category name, and extra keywords that
  the category should also show up for.

* Make the category assignment system so that you FIRST search for
  categories by keywords (+ to enforce a keyword, normally OR the
  keywords to make sure that people get ALL the possibly relevant
  categories).  AFTER you have searched, you can choose "Add Category" at
  the BOTTOM of the form.  And there is a new search form, with your
  entered keywords, just above the place where you press to add a
  category.

* Only allow adding categories from the next higher level in the
  hierarchy, where you'll see all the already existing subcategories.

* Make the "Add Category" go through a separate confirm page before
  getting to the information entering page for categories.  On this
  page, explain how important using the existing categories is, and that
  adding a new category is a fairly big deal - but also the right thing
  to do if it is the right thing to do.  Add a search box with the
  keywords here, too.  And say "Thank you for your attention to detail.
  This categorization system is made to help Ruby users find the
  software they need, and by maintaining its quality, you make the world
  better for everyone (and hopefully help make Ruby a viable language
  for all your own use, too, by getting more people to help.)"  Or
  something like that.

* Make the category addition page require a list of search keywords that
  can match the category (any that are not in the name of the category
  already), and a large box with "Description".  Disallow adding
  categories with too short a description.

* When the user is through adding a category, allow him to search for
  other software that should ALSO be added to the category, to make sure
  that the category is good.

* Have a separate page with Recent Changes, which include lists of new
  categories and what software packages have been added to what
  categories.  This allows separate review.

* Make any view of a package show various levels of detail of the
  package, including its source code and change frequency, so that
  labellers can determine how to categorize the package.

* Make it easy to merge categories, and to remove (and restore, with
  contents) categories.
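
The search-before-add part of the list above could be sketched roughly
like this in Ruby.  This is a minimal illustration, not a proposed
implementation: CategoryRegistry, its methods, and the 40-character
description threshold are all assumptions invented for the example (the
list only says "too short").  Search terms are ORed unless prefixed with
"+", and a category can only be added under an existing parent with a
sufficiently long description.

```ruby
# Hypothetical sketch: none of these names come from RubyGems or RPA.
Category = Struct.new(:name, :parent, :description, :keywords)

class CategoryRegistry
  MIN_DESCRIPTION_LENGTH = 40 # assumed threshold for "too short"

  def initialize
    @categories = {}
  end

  # Add a category under an existing parent (nil means top level).
  def add(name, parent:, description:, keywords: [])
    if description.strip.length < MIN_DESCRIPTION_LENGTH
      raise ArgumentError, "description too short"
    end
    unless parent.nil? || @categories.key?(parent)
      raise ArgumentError, "unknown parent #{parent}"
    end
    raise ArgumentError, "#{name} already exists" if @categories.key?(name)
    @categories[name] =
      Category.new(name, parent, description, keywords.map(&:downcase))
  end

  # Existing subcategories, to be shown before a new one is added.
  def children(parent)
    @categories.values.select { |c| c.parent == parent }.map(&:name)
  end

  # Terms are ORed by default; a leading "+" makes a term mandatory.
  # When mandatory terms are given, the plain terms do not filter further.
  def search(query)
    terms = query.downcase.split
    required, optional = terms.partition { |t| t.start_with?("+") }
    required = required.map { |t| t[1..-1] }
    matches = @categories.values.select do |c|
      words = terms_for(c)
      next false unless required.all? { |t| words.include?(t) }
      required.any? || optional.any? { |t| words.include?(t) }
    end
    matches.map(&:name)
  end

  private

  # Words a category matches on: its name plus its extra keywords.
  def terms_for(category)
    category.name.downcase.split + category.keywords
  end
end
```

A labeller would call search first (say, registry.search("sax yaml")),
and only fall through to add, from the parent category's page, when
nothing suitable turns up.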

I think the above (along with a manifesto describing how important
categorizing is) should make distributed volunteers create good
categorization.

And I hope that if you implement this, you'll let RPA (and any other
packagers) hang on the same framework ;-)

