[Rubygems-developers] Suggestions: categories and querying

Chad Fowler chad at chadfowler.com
Fri Sep 17 19:40:46 EDT 2004


Eivind (sorry for top posting)...

Thank you for this excellent set of ideas.  At a high (and potentially 
irrelevant) level, I think we disagree on some key points.  But in 
terms of what needs to be done for Rubygems, I think we agree.  I am 
working on a site (http://rubygems.org) for which one of the aims is to 
provide this kind of keyword, categorization capability.  I will use 
your ideas below as a starting point for that part of a feature list.  
Very well thought out.

Thanks,
Chad

On Sep 17, 2004, at 5:40 PM, Eivind Eklund wrote:

> On Fri, Sep 17, 2004 at 09:30:30AM -0400, Chad Fowler wrote:
>>
>> On Sep 17, 2004, at 6:01 AM, Eivind Eklund wrote:
>>
>>> There are two places that could do this well at the moment: RAA (if
>>> somebody took on the librarian work for it), and RPA (which has
>>> Mauricio as its librarian already).  I think RubyGems' best bet is to
>>> NOT add categorization at all at this time, but instead cooperate
>>> closely with one of the above, and help them generate really good
>>> categorization, and when good categories are available, start helping
>>> authors find categories for their software.
>>>
>>> Anything else is doomed to chaos and a false sense of being helpful.
>>
>> Thanks for the long and obviously well thought out response, Eivind.
>> I can't say I completely agree with you, but I _do_ agree that
>> RubyGems should not add any kind of categorization right now (or
>> possibly ever).  I also believe that rpa-base should not add
>> categorization.  I think it's in the scope of something at the RPA
>> level, but should be completely left out of the _packages_
>> themselves.
>
> I agree with keeping them out of the packages.  They're at a level
> higher up.
>
>> I would be open to adding keywords to gems, but I would want to think
>> it through a lot more.  Keywords may be single-level hierarchies, but
>> being single-level (and therefore not _really_ hierarchies), they
>> don't carry with them the same commitment to a structure that may or
>> may not be right.  They can be used to help someone find a library or
>> application without forcing a rigid classification system.
>
> I'm not sure they can be used to help people find something.  I'm
> afraid that people will THINK that they can be used to help find
> something, and therefore will add them and avoid thinking about the
> hard problems associated with getting a good solution.
>
>> Finally, I'm not convinced that a hierarchy is the way to go at all.
>> I would even go so far as to say that hierarchical classification for
>> this kind of computer-based purpose is obsolete.
>
> This does not match my experience.  I find the organization of a
> physical library much better than computer-based keyword searches.
> It is just expensive to maintain.
>
>> And, as you've pointed out, they are almost unusable for
>> self-organizing systems/communities.
>
> Again, I respectfully disagree.  In my opinion, they're expensive to
> maintain and give a high payoff.  As I said: I think the lack of them
> for software is possibly THE primary flaw of software development
> today.
>
> I hope you'll allow me another mini-essay - you're striking a lot of
> issues close to my heart with the areas you tackle, so I've got a lot
> to say :-)
>
>
> All of human activity - really, all of life - is a self-organizing
> system.  The activity of the human part of this system is based on
> perceived expense and on what benefits individuals get from it.  In
> larger contexts, the organization comes from the activity of a number
> of individuals.  This activity is directed by the interaction between
> the individuals and the world, including each other.  Suffering from
> abstraction asphyxiation yet?  Thought so - I'll try to get a little
> more down to the nitty-gritty.  Then we'll go up again and look at the
> forces involved, how these real-world examples make things work, and
> try to construct an example of how this could work for a RubyGems
> library (or RPA).
>
> Remember: It's always an interaction between a culture and a
> technology - because the culture shapes the technology, and the
> technology shapes the culture.
>
> Two examples of fairly self-organizing hierarchical taxonomies, made
> using network technology and a self-replicating culture: Wikipedia and
> the Open Directory Project.  The latter has constructed over 460,000
> categories and categorized many millions of sites through volunteer
> feedback; the former has, in just a few years, built the largest
> encyclopedia in the history of mankind, where hierarchical *and*
> crosscutting organization is visible all over the place.
>
> One thing that is clearly visible in both these projects is that they
> have a strong interaction between technology and culture, and that the
> technology has been designed with the explicit goal of shaping the
> culture - of making some behaviour rewarding, and other behaviour
> non-rewarding.  And they've both made tools that make *collaboration*
> work nicely - not having every user "sit on his own hilltop, use the
> tools, and spread his data to the world", but letting users who want
> to help fix up whatever can be improved do so easily.
>
> They also foster a sense of "doing something for the world" through
> such fixups, and offer the ability to do a group of fixups at the same
> time, getting into a state of fixing, fixing, fixing - wow - the world
> is noticeably better than it was just ten minutes ago!
>
> This is also something that has been there since the inception of both
> projects.  They've tried to keep things good all the way, and have
> built their infrastructure for it.  The clearly more successful of the
> two (Wikipedia) has also built the infrastructure to foster a sense of
> community, and to make it possible for the members to communicate
> among themselves about the work.
>
> The infrastructure (at least for Wikipedia) is also made so that while
> it is extremely easy to do damage, it is also very easy to fix up, and
> the community can keep track of that and fix it as necessary.
>
> I think it is possible to make the same happen with RubyGems and RPA.
> We just need to build the infrastructure that makes it EASY for people
> to help, and make it non-rewarding to damage the dataset.  Wikipedia
> does this by making it easy for people to see what changes happen, and
> keeping history so it is easy to revert vandalism.  So: Vandalism
> really makes little difference, and disappears quickly.
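
A minimal Ruby sketch of the "keep history so vandalism is easy to revert" idea, applied to category assignments. Everything in it (the Change struct, the CategoryLog class and its assign, recent and revert methods, and the example package and user names) is hypothetical and not part of RubyGems or RPA; it assumes an in-memory store holding one list of category names per package.

```ruby
# Hypothetical sketch: every change to a package's categories is logged,
# so a "Recent Changes" page can show it and bad edits can be reverted.
Change = Struct.new(:id, :package, :before, :after, :user)

class CategoryLog
  attr_reader :changes

  def initialize
    @assignments = {} # package name => array of category names
    @changes = []     # full history, oldest first
  end

  def categories_for(package)
    @assignments.fetch(package, []).dup
  end

  # Replace a package's categories and record the old and new values.
  def assign(package, categories, user)
    before = categories_for(package)
    @assignments[package] = categories.dup
    change = Change.new(@changes.size + 1, package, before, categories.dup, user)
    @changes << change
    change
  end

  # Newest changes first, for review by other volunteers.
  def recent(limit = 10)
    @changes.last(limit).reverse
  end

  # Restore the package's categories to what they were before the given
  # change. The revert is logged like any other change.
  def revert(change_id, user)
    change = @changes.find { |c| c.id == change_id }
    raise ArgumentError, "unknown change #{change_id}" unless change
    assign(change.package, change.before, user)
  end
end

# Example data only.
log = CategoryLog.new
log.assign("rake", ["Build Tools"], "alice")
bad = log.assign("rake", ["Spam"], "vandal")
log.revert(bad.id, "bob")
p log.categories_for("rake") # => ["Build Tools"]
```

Because a revert is recorded as an ordinary change, it also shows up in the recent list and can itself be undone; that is one simple design choice among several.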
>
> You also need to motivate people to contribute.  There are a few
> different aspects to this.
>
> First of all, it is a matter of making the right things easy and the
> wrong things harder.  This is done in Wikipedia etc. by the ease of
> entering things and the number of ways people can help fix, but I
> think this property will be missing from any system where every free
> software author assigns the categories (or keywords, or whatever you
> call them) to his software locally.  (I'll describe a system that I
> think would actually work for RubyGems below.)
>
> Second, it is increasing the reward for doing the right thing.  It
> shouldn't just be easy to do the right thing - it should feel good.
> One way to do this is a so-called "step up" to a larger goal that the
> person cares more about.  For Wikipedia, the step-up goal is to
> "Spread education to everyone".  Another way is to give people
> positive feedback on good behaviour.  An example of this is Ward's
> signature and his "Thanks for your careful attention to detail!" on
> the c2.com Wiki.  (This also slots the submitters into a role when
> they submit stuff - a very effective technique for manipulation, as
> people don't want to let that positive role down.)
>
> Third, make doing stuff into a habit - because people then do it a lot
> and get good at it.  Wiki, Wikipedia, and the Open Directory all do
> this - because people can work on more than just their own stuff.
>
>
> Now, putting all of this together into a working design for how to get
> RubyGems properly categorized:
>
> * Set up a collection site for RubyGems.  You want people to upload
>   their gems there, so all gems are available in a central location for
>   categorization.  (They don't have to be available for general
>   download, but they must be available for inspection for labellers).
>
> * Make an interface where the authors are profusely thanked, and told
>   about how this helps the entire Ruby community, and this hopefully
>   will make all of the world a better place.  Also indicate where the
>   author can help categorize his own and other packages.
>
> * Make each category include a description of what should go into the
>   category, in addition to the category name, and extra keywords that
>   the category should also show up for.
>
> * Make the category assignment system so that you FIRST search for
>   categories by keywords (+ to enforce a keyword, normally OR the
>   keywords to make sure that people get ALL the possibly relevant
>   categories).  AFTER you have searched, you can choose "Add Category"
>   at the BOTTOM of the form.  And there is a new search form, with your
>   entered keywords, just above the place where you press to add a
>   category.
>
> * Only allow adding categories from the next higher level in the
>   hierarchy, where you'll see all the already existing subcategories.
>
> * Make "Add Category" go through a separate confirmation page before
>   the page where the category information is entered.  On this page,
>   explain how important using the existing categories is, and that
>   adding a new category is a fairly big deal - but also the right thing
>   to do if it is the right thing to do.  Add a search box with the
>   keywords here, too.  And say "Thank you for your attention to detail.
>   This categorization system is made to help Ruby users find the
>   software they need, and by maintaining its quality, you make the
>   world better for everyone (and hopefully help make Ruby a viable
>   language for all your own use, too, by getting more people to help.)"
>   Or something like that.
>
> * Make the category addition page require a list of search keywords
>   that can match the category (any that are not in the name of the
>   category already), and a large box with "Description".  Disallow
>   adding categories with too short a description.
>
> * When the user is through adding a category, allow him to search for
>   other software that should ALSO be added to the category, to make
>   sure that the category is good.
>
> * Have a separate page with Recent Changes, which include lists of new
>   categories and what software packages have been added to what
>   categories.  This allows separate review.
>
> * Let anyone looking at a package see various levels of detail of the
>   package, including its source code and change frequency, in order to
>   determine how to categorize the package.
>
> * Make it easy to merge categories, and to remove (and restore, with
>   contents) categories.
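
The search and category-addition rules in the list above can be sketched in Ruby roughly as follows. The Category class, search_categories, the 40-character MIN_DESCRIPTION threshold and the example categories are all hypothetical; the sketch assumes an in-memory category tree, treats a leading "+" as "this keyword must match", and ORs all other keywords.

```ruby
# Hypothetical category model: each category has a name, a description,
# extra search keywords, and a parent (nil for the top level).
class Category
  MIN_DESCRIPTION = 40 # assumed minimum length for a description

  attr_reader :name, :description, :keywords, :parent, :children

  def initialize(name, description, keywords, parent = nil)
    @name = name
    @description = description
    @keywords = keywords.map(&:downcase)
    @parent = parent
    @children = []
  end

  # Words a search can match: the words of the name plus the keywords.
  def terms
    name.downcase.split + keywords
  end

  # Subcategories are only added from the parent level, and must come
  # with search keywords and a real description.
  def add_subcategory(child_name, child_description, child_keywords)
    if child_description.strip.length < MIN_DESCRIPTION
      raise ArgumentError, "description is too short"
    end
    raise ArgumentError, "at least one keyword is required" if child_keywords.empty?

    child = Category.new(child_name, child_description, child_keywords, self)
    @children << child
    child
  end

  # Yield this category and every category below it.
  def each_category(&block)
    yield self
    children.each { |child| child.each_category(&block) }
  end
end

# Unmarked keywords are ORed, so people see ALL possibly relevant
# categories; a keyword prefixed with "+" must match. Results are ordered
# by how many unmarked keywords they match.
def search_categories(root, query)
  words = query.downcase.split
  required = words.select { |w| w.start_with?("+") }
                  .map { |w| w.delete_prefix("+") }
                  .reject(&:empty?)
  optional = words.reject { |w| w.start_with?("+") }

  hits = []
  root.each_category do |category|
    terms = category.terms
    next unless required.all? { |w| terms.include?(w) }

    score = optional.count { |w| terms.include?(w) }
    next if required.empty? && score.zero?

    hits << [category, score]
  end
  hits.sort_by { |_category, score| -score }.map(&:first)
end

# Example data only.
root = Category.new("Ruby", "All Ruby software.", [])
web = root.add_subcategory("Web", "Libraries and frameworks for building web applications.", %w[http html])
web.add_subcategory("Templating", "Libraries that render text or HTML from templates.", %w[erb html])
root.add_subcategory("Database", "Libraries for talking to relational and other databases.", %w[sql])

p search_categories(root, "+html erb").map(&:name) # => ["Templating", "Web"]
```

In this sketch, once a "+" keyword is given, the unmarked keywords only influence ordering rather than filtering; that is an assumed design choice, not something the list above specifies.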
>
> I think the above (along with a manifesto describing how important
> categorizing is) should make distributed volunteers create good
> categorization.
>
> And I hope that if you implement this, you'll let RPA (and any other
> packagers) hang on the same framework ;-)
>
> Eivind.


