[Nitro] OG vs Active Record

Robert Mela rob at robmela.com
Mon Aug 20 13:15:56 EDT 2007

I'm finding an inherent trap in ORMs.

Consider the case where a Country has Provinces/Regions/States, and each 
of those has Cities.

Processing all cities in a country in the old days was a three-way join 
with control-break processing.  One query, one result set.
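That single-result-set style can be sketched in plain Ruby (the rows below are made up, standing in for one three-way-joined result set ordered by country and province):

```ruby
# Made-up rows standing in for: SELECT ... JOIN ... ORDER BY country, province.
rows = [
  ["FR", "Normandy", "Rouen"],
  ["FR", "Normandy", "Caen"],
  ["FR", "Brittany", "Rennes"],
  ["DE", "Bavaria",  "Munich"],
]

# Control-break processing: emit a country/province header only when the
# value changes relative to the previous row.
out = []
last_country = last_province = nil
rows.each do |country, province, city|
  out << "country: #{country}" if country != last_country
  out << "  province: #{province}" if province != last_province || country != last_country
  out << "    city: #{city}"
  last_country, last_province = country, province
end

puts out
```

One pass over one result set, however many countries, provinces, and cities there are.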

ORMs have some inherent traps which are only partially solved by preload.

In the simplest case, a preload two levels deep can be as efficient as a 
single query with control-break processing.  For example:

countries.each do |country|
   country.provinces.each do |province|
       province.cities.each do |city|
         # process each city
       end
   end
end

Without using preload, 100 countries of 100 provinces of 100 cities 
would result in one fetch of 100 countries; then, for each country, one 
fetch of 100 provinces (100 fetches); then, for each province, one fetch 
of 100 cities (10,000 fetches).  That's more than 10,000 round-trips to 
the database.

With preloads, of course, it's a single round trip.
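The arithmetic can be checked with a toy sketch (plain Ruby, no database; the names and counts are illustrative only), where each call to a stub `fetch` stands in for one round-trip:

```ruby
$fetches = 0

# Stand-in for one database round-trip.
def fetch(rows)
  $fetches += 1
  rows
end

countries = (1..3).to_a  # 3 countries x 3 provinces x 3 cities, to keep counts small

# Lazy loading: one fetch for the countries, then one per country for its
# provinces, then one per province for its cities.
$fetches = 0
fetch(countries).each do |_country|
  fetch((1..3).to_a).each do |_province|   # provinces of this country
    fetch((1..3).to_a)                     # cities of this province
  end
end
lazy = $fetches   # 1 + 3 + 9 = 13

# Preloading: one bulk fetch per level, regardless of row counts.
$fetches = 0
fetch(countries)  # all countries
fetch([])         # all provinces in one query
fetch([])         # all cities in one query
eager = $fetches  # 3

puts "lazy: #{lazy} round-trips, eager: #{eager}"
```

Scale the per-level counts up to 100 and the lazy side becomes the 10,000-plus round-trips described above, while the eager side stays at three.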

Unfortunately, preloads don't work when there's a lot of application 
logic in the traversal.   For example:

countries.each do |country|
    country.roundish_provinces_with_lots_of_lakes.each do |province|
        province.cities_with_more_than_ten_macdonalds.each do |city|
          # process each city
        end
    end
end

There is no way to tie a preload to this.    If 
"roundish_provinces_with_lots_of_lakes" involves additional joins, and 
"cities_with_more_than_ten_macdonalds" requires a lookup into a 
MacDonald's count table for each entry, you could be looking at 
thousands of round-trips to the database with no opportunity for 
precaching or preload.
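One escape hatch, sketched here with made-up rows and column names: do a handful of bulk fetches yourself, then apply the "roundish, lots of lakes, more than ten McDonald's" logic in Ruby against the preloaded data, so lookups become hash reads instead of per-row queries:

```ruby
# Hypothetical rows, each array standing in for one bulk SELECT per table.
provinces = [
  { id: 1, country_id: 1, roundish: true,  lakes: 12 },
  { id: 2, country_id: 1, roundish: false, lakes: 30 },
]
cities = [
  { id: 1, province_id: 1, macdonalds: 14 },
  { id: 2, province_id: 1, macdonalds: 2  },
]

# Group children once; every lookup below is an in-memory hash read.
cities_by_province = cities.group_by { |c| c[:province_id] }

lakey = provinces.select { |p| p[:roundish] && p[:lakes] >= 10 }
result = lakey.flat_map do |p|
  (cities_by_province[p[:id]] || []).select { |c| c[:macdonalds] > 10 }
end

result.map { |c| c[:id] }   # => [1]
```

The thresholds and column names are assumptions for illustration; the point is that the filters run over data already in memory, not one query per parent row.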

If you want speed you need to be very careful about how you use ORMs 
with large datasets.

I recently found an ActiveRecord retrieval that made 8,000 individual 
database calls totaling 4 megabytes of generated SQL.   All calls were 
to the same, self-referential table.

The table only has 6,000 rows...

I wound up yanking acts_as_tree and a homegrown acts_as_hierarchy, 
fetching the entire table at once -- all 6,000 rows -- and building my 
own non-Object Oriented tree inside the Model.   Fetching 6,000 rows 
across a network *and* building the internal tree structure takes 3 
milliseconds.   Navigating the structure is microseconds.    Before that 
the operation took anywhere from 10 seconds to two minutes....
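A sketch of that "fetch once, build the tree yourself" approach: the [id, parent_id, name] row shape is an assumption here, and a real version would get the rows from a single SELECT over the self-referential table.

```ruby
Node = Struct.new(:id, :name, :children)

# Stand-ins for the 6,000 rows fetched in one query; parent_id nil marks a root.
rows = [
  [1, nil, "root"],
  [2, 1,   "a"],
  [3, 1,   "b"],
  [4, 2,   "a1"],
]

# Pass 1: one Node per row, indexed by id.
nodes = {}
rows.each { |id, _pid, name| nodes[id] = Node.new(id, name, []) }

# Pass 2: wire each node under its parent; rows with no parent become roots.
roots = []
rows.each do |id, pid, _name|
  pid ? nodes[pid].children << nodes[id] : roots << nodes[id]
end

roots.first.children.map(&:name)   # => ["a", "b"]
```

Two linear passes over the result set, so the build cost stays proportional to the row count, and navigation afterward never touches the database.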

I've been thinking about that experience and realized that, as systems 
get large, it makes a lot of sense to have the models be as simple as 
possible -- doing little more than enforcing simple constraints -- and 
have most business and application logic at the controller level. 

If that's not possible, then there needs to be a ModelLogic layer... MVC 
becomes MMVC... which despite initial appearances does not parse as a 
Roman Numeral.

Jonathan Buch wrote:
> Hi,
>> I'm relatively new to Ruby and am trying to decide between diving into
>> ActiveRecord and OG.... I'd appreciate any advice.
>> The usage case/scenario is:  Single 'amateur' developer, large
>> numbers of databases, large databases (10'sGB/table, 10's tables/DB),
>> multiple ruby scripts accessing the dbserver directly (no web
>> interface - at this stage.)
> Multi-database is not as thoroughly tested as single-database, but the
> number of tables (and their sizes) does not make Og slow.  It's just a
> question of how much you need to transfer between Og and the DB.
> As for startup time, if you stay below 100 tables it'll still be quite
> ok on a halfway capable computer.  There are a few tweaks you can do,
> but doing them means your app and DB design is 'finished'.
>> I understand that this is a little like asking a parent if their child
>> is beautiful, but was hoping for some frank feedback on the following.
>> 1) Am I right that OG largely overcomes the issues discussed in this
>> (long) AR thread (I summarize a key issue below):
>> http://rubyurl.com/xWb
> Yes, by using Og basically _all_ your information is in the model file.
> Even the DB layout is there (which is what differentiates Og from AR).
>> and here:
>> http://aralbalkan.com/764
> Yep, this is exactly what Og does.  :)
>> A summary of a key AR issue (for me) is:
>> <quote>
>> Whether its in migrations or in the DB schema, I have to specify all
>> the columns as well as their lengths and unique constraints to have
>> the database created.  Then I have to specify the validates_* rules
>> in the model to match.  Finally I have to keep both of them in sync
>> over time.
>> Why not do it all inside the model and have that set up all the
>> validates_* rules in a single step?
>> </quote>
> This certainly could be done and it has been asked for already on irc
> some year ago.  I wasn't too sure about that back then, and I still
> hesitate on that.  We could make it an option, `Og.auto_validations`
> which then does the trick....  G?
> Personally, I wouldn't use it, largely because I tend to see some of the
> validations as useless and time-consuming (`uniq` for example, which is
> again done at the DB level, where it actually belongs).
> Some of the validations, like 'belongs_to' (meaning a record shouldn't
> exist when its 'main' object disappears), can only be enforced via
> certain DBs (like psql) with foreign key constraints.  There's just so
> much stuff the DB does better, that's what I think, anyway.  :P
> Anyway, like I said, the idea isn't all too bad, I'm just not sure
> if it's worth it, as the models/validations don't get changed so
> often (I draw stuff on real paper before doing DB work :P).
>> 2) Has anyone compared the performance of AR vs OG. The following
>> thread about AR's CPU usage surprised me:
>> http://rubyurl.com/vFz
>> I see the same performance on my system (script attached).  Has anyone
>> run this test under OG?
> I'd love it if you could adapt that script (I haven't had a look at it
> yet) and run it on Og.
> I would think that the performance of AR and Og could be quite comparable.
> (Though I could be horribly wrong; when I started with Nitro I thought
> we were slower than Rails, which turned out to be quite a wrong assumption ;))
> But, I think you'll see the same 'behaviour' with Og and MySQL.  MySQL
> is just too old, too optimized and there's too much movement behind it
> for Og to really stress it.  At least that's my guess.  ;)
>> 3) How well is MySQL supported (bearing in mind that OG is at 0.5), or
>> is most development done with Postgresql?  On MySQL, how about the
>> MyISAM and Memory Engine/tables?
> Og is at 0.50, so about 50 revisions (actually I think more like 35
> public ones..), not the version number '0.5'.
> MySQL is (and always has been) actually the 'premier' DB used by the head
> maintainer.   It's just always those weird other people like me who just
> use PostgreSQL.  ;)  (Which means, that I optimize Og for my psql usage,
> only run the specs with it etc, which means that sometimes the psql part
> is better supported than the mysql/sqlite one).
> In-memory SQLite is actually quite nicely supported, and I know one from
> IRC who used it to speed up Og quite a bit (synchronizing the mem with
> a real backend now and then).
> I know nothing about MySQL memory engines, so can't comment.
> Hope that helps,
> Jo
