[Nitro] Complex selections

James Britt james.britt at gmail.com
Wed Mar 23 10:32:36 EST 2005


George Moschovitis wrote:
...
> Can you suggest an API for this? I dont have something available, but
> I would like to add this functionality. Can you post the SQL query you
> would actually use and then we can extract some kind of API.

The pseudo-code I posted turned out to be essentially the code I'm using 
right now.  It works, but it's very slow, and really only does part of 
what I want.

   def find_by_tags
     tag_set = context.params[ 'tags' ].to_s.split( ' ' )
     tag_set.map!{ |t| t.downcase }
     @resources = []
     Resource.all.each { |r|
         r.tags.each { |t|
            @resources.push( r) if tag_set.include? t.name.downcase
         }
     }
     @resources.uniq!
   end

Given a tag set of ['foo', 'bar', 'baz'], this finds all resources that 
have any or all of these as tags.

There is no way this is even remotely as efficient as actual SQL.  My 
production DB has several hundred resources, each with an average of 
three tags.  I can't have code walk every item.

The SQL I have used in the first version is pretty gnarly, and part of 
the issue is that MySQL does not quite support sub-selects, and the 
support differs from 3.x to 4.x.  I ended up (due to a requirement to 
use a 3.x MySQL instance) using 2 calls.    Getting this right across 
multiple databases so as to hide the SQL details from the user can be 
tricky.

To make things more complex, what I'm really looking for is a 
combination of AND and OR in the selection.

For example, given a URL such as
http://foo.com/find_by_tags/foo,bar+baz

I want to locate resources that have  foo OR bar AND baz

My partial-Rails version (I ditched a lot of the built-in stuff) 
constructs chunks of SQL based on assorted parameters, joins them up, 
and does a find_by_sql to locate things.

In the end, given the nature of the query, that may be the only way to 
get reasonable performance over large data sets.


> 
> Btw this is similar to another feature I want to add: join queries.
> 
> Let me demonstrate with an example.
> 
> class User
>   property :name, String
>   has_many :comments, Comment
> end
> 
> class Comment
> 
>   belongs_to :user, User
> end
> 
> u = User['george']
> 
> u.comments(:join => User)

If there are common usage patterns then it may be reasonable to 
construct SQL templates for each database adapter and wrap the calls in 
high-level Ruby constructs.

For example, the ~rails version has this helper method:

def make_tag_union_post_id_set( tags )
  raise( "make_tag_union_post_id_set has nil tag array" )  unless tags
  raise( "make_tag_union_post_id_set given something not an array:
       #{tags.class}" )  unless tags.class == Array
  raise( "make_tag_union_post_id_set given empty tag
          array" )  unless tags.size > 0
  tag_set_str = tags.map { |t| "'#{t}'" }.join( ', ')
  sql = "select t.* from tags t  where t.tag in ( #{tag_set_str} ) "
  tags = Tag.find_by_sql( sql )
  tags.map{ |t| t.post_id }
end

There's a similar method for intersection that boils down to this:

tag_set_str = tags.map { |t| "'#{t}'" }.join( ', ')
sql = "select t.*, count(tag) as tag_count
        from tags t  where t.tag in ( #{tag_set_str} )
        group by t.post_id  "
tags = Tag.find_by_sql( sql )
id_set = [ -1 ]
tags.each{ |t|
   id_set << t.post_id  if t.tag_count.to_i == tcnt.to_i
}

(This is based on a SQL hack I found on a MySQL chat board, as MySQL
has poor INTERSECTION support.  Code for PostgreSQL would likely be more 
straightforward.)


These are used to find a set of post/resource IDs, which is used for 
another SQL select to get the final results.

Perhaps then there is a way to parameterize such calls to dynamically 
build SQL appropriate to the objects|properties of interest, given that

parent_obj.child_obj_set[]

is a common structure, as is selection of parent_obj based on some 
criteria applied to child_obj_set properties.

James



More information about the Nitro-general mailing list