From thibaut.barrere at gmail.com Sat Jul 14 18:03:16 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Sun, 15 Jul 2007 00:03:16 +0200
Subject: [Activewarehouse-discuss] Tiny patch for the delimited parser (error message)
Message-ID: <4a68b8cf0707141503r1d249dfeud39f79ee10703654@mail.gmail.com>

Hi!

I started using AW-ETL a couple of weeks ago, and I'm very happy with it so far. I had been using SSIS a lot just before, and I must say that AW is spot on, a lot easier to manipulate and maintain. So well, thanks!

I've attached a really tiny patch for something which confused me when I first started using it: when working with the delimited parser, I got error messages talking about 'The number of rows' whereas it was more likely the number of columns or fields.

FWIW, and in case it is useful to newcomers, here are a couple of caveats I've met (not caused by AW):
- FasterCSV, which is used by the underlying delimited parser, cannot handle quotes escaped with a backslash (\").
- if you reference the Rails environment in your .ctl files, $KCODE will be set to 'U' by Rails - this will impact the way FasterCSV understands the data as well (I'm manually setting $KCODE back to 'NONE' to process latin-1 data in some cases)
- getting the bulk load to work with utf-8 and mysql wasn't trivial (see http://bugs.mysql.com/bug.php?id=10195 for more details)

cheers

Thibaut
--
http://www.dotnetguru2.org/tbarrere/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: delimited_parser_error_message.diff
Type: application/octet-stream
Size: 757 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070715/d7dccfe3/attachment.obj

From anthonyeden at gmail.com Sun Jul 15 13:10:14 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Sun, 15 Jul 2007 13:10:14 -0400
Subject: [Activewarehouse-discuss] Tiny patch for the delimited parser (error message)
In-Reply-To: <4a68b8cf0707141503r1d249dfeud39f79ee10703654@mail.gmail.com>
References: <4a68b8cf0707141503r1d249dfeud39f79ee10703654@mail.gmail.com>
Message-ID:

I'm glad that it is serving you well. You're absolutely correct on the patch: it should be fields, not rows, thanks. I'll apply the patch for the 0.9 release.

Thanks as well for the tips you've provided... I'll add them to the ETL documentation.

V/r
Anthony

On 7/14/07, Thibaut Barrère wrote:
>
> Hi!
>
> I've started using AW-ETL a couple of weeks ago, and I'm very happy with
> it so far. I've been using SSIS a lot just before, and I must say that AW is
> spot on, a lot easier to manipulate and maintain. So well, thanks!
>
> I've attached a really tiny patch for something which confused me when I
> first started using it: I got error messages talking about 'The number of
> rows' whereas it was more likely the number of columns or fields, when
> working with the delimited parser.
>
> FWIW and if it can be useful to newcomers, here are a couple of caveats
> I've met (not caused by AW):
> - FasterCSV, which is used by the underlying delimited parser, cannot
> handle quotes escaped with a backslash \" .
> - if you reference the Rails environment in your .ctl files, $KCODE will
> be set to 'U' by Rails - this will impact the way FasterCSV understands the
> data as well (I'm manually setting $KCODE back to 'NONE' to process latin-1
> data in some cases)
> - getting the bulkload to work utf-8 and mysql wasn't trivial (see http://bugs.mysql.com/bug.php?id=10195
> for more details)
>
> cheers
>
> Thibaut
> --
> http://www.dotnetguru2.org/tbarrere/
>
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
Cell: 808 782-5046
Current Location: Melbourne, FL

From thibaut.barrere at gmail.com Mon Jul 16 15:20:52 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Mon, 16 Jul 2007 21:20:52 +0200
Subject: [Activewarehouse-discuss] Errors handling, data and process reliability, raw thoughts!
Message-ID: <4a68b8cf0707161220l43f6ff55ld7aa2fe346626c0d@mail.gmail.com>

Hi,

I'd like to share my thoughts about the way I handle errors (and clean-ups) with AW-ETL, and I'd be interested in any type of feedback from other users on this topic (or things available in AW-ETL that will help me make things better).

What I'm doing today:
- set_error_threshold 1 in control files
- custom transform to ensure that the required fields are available: ensure_fields_presence(:required_fields => required_fields, :available_fields => available_fields). Stops the process on any error. Helps detect CSV format changes, for instance.
- block transforms with begin/rescue to put a default value (e.g. date conversions...)
- custom clean-up transform which returns decode_table[value] || value
- use after_read(:print_row) and before_write(:print_row) for debugging purposes
- look at etl.log

What I'm thinking about:
- at the end of each .ctl file, run a quality screen as described by Ralph Kimball (most likely a set of RSpec specifications, calling the database to assert various things, using ActiveRecord for instance)
- adding a special "No matching dimension record found" record in each dimension table (like "Unknown date", "Somewhere in the future", "Unknown customer")
- fill a log file or log table with all the errors (in my case, the source system can be fixed, provided I give the source system maintainer accurate data to fix it)
- flag (special column) or remove offending records in the table itself
- unit-test my .ctl files (mocking input and output, using RSpec) while developing them
- generate a mail which gives statistics about what went wrong and what went fine

Things I think could be worth adding to AW-ETL:
- ability to define a call-back to be called on each error (for a given transform, or for the whole control file). This call-back could return nil if it chooses to eliminate the row, throw an exception on blocking errors, or just return the row. The call-back could also be used to log things (in a file or in the database).
- or anything else to allow error manipulations, providing the same level of processing - what's currently in there?

Well, all this is a bit raw. I'd be interested in any feedback from the users and/or from Anthony. FWIW, I've enclosed a couple of transforms and tools I've written so far.

regards,

Thibaut
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dimension_migration_helper.rb
Type: application/octet-stream
Size: 352 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070716/d63816f8/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ensure_fields_presence.rb
Type: application/octet-stream
Size: 358 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070716/d63816f8/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clean_up_transform.rb
Type: application/octet-stream
Size: 287 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070716/d63816f8/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trim_row_processor.rb
Type: application/octet-stream
Size: 298 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070716/d63816f8/attachment-0003.obj

From anthonyeden at gmail.com Mon Jul 16 15:42:04 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Mon, 16 Jul 2007 15:42:04 -0400
Subject: [Activewarehouse-discuss] Errors handling, data and process reliability, raw thoughts!
In-Reply-To: <4a68b8cf0707161220l43f6ff55ld7aa2fe346626c0d@mail.gmail.com>
References: <4a68b8cf0707161220l43f6ff55ld7aa2fe346626c0d@mail.gmail.com>
Message-ID:

On 7/16/07, Thibaut Barrère wrote:
>
> Hi,
>
> I'd like to share my thoughts about the way I handle errors (and
> clean-ups) with AW-ETL, and I'd be interested in any type of feedback from
> other users on this topic (or things available in AW-ETL that will help me
> make things better).
>
> What I'm doing today:
> - set_error_threshold 1 in control files

That's pretty aggressive, although given the lack of error hooks (as you describe below) I can understand.
Another thing you can do is use command line switches when developing or testing ETL scripts. The command line options are documented in http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html#command-line_options

> - custom transform to ensure that the required fields are available:
> ensure_fields_presence(:required_fields => required_fields,:available_fields
> => available_fields). Stop the process if any error. Help detecting CSV
> format changes, for instance.

Definitely going to add this.

> - block transforms with begin/rescue to put a default value (eg: date
> conversions...)
> - custom clean-up transform which returns decode_table[value] || value
> - use after_read(:print_row) and before_write(:print_row) for debugging
> purposes
> - look at etl.log
>
> What I'm thinking about:
> - at the end of each .ctl file, run a quality screen like described by
> Ralph Kimball (most likely a set of RSpec specifications, calling the
> database to assert various things, using ActiveRecord for instance)

A first stab at screens is in the Subversion repository. Currently it launches right before the post processes; however, I think I'm going to make it work *after* the post processes. This, in addition to temp table support (also added in 0.9), should make for a much more stable and safe environment (since screens can be fatal and die, leaving the temp tables intact and not affecting the production tables).

> - adding a special "No matching dimension record found" record in each
> dimension table (like "Unknown date", "Somewhere in the future", "Unknown
> customer")

I do this right now in pretty much every dimension, using an Enumerable source as the very first record going into the database.

> - fill a log file or log table with all the errors (in my case, the source
> system can be fixed, provided I give the source system maintainer accurate
> data to fix it)

We probably need to support an error log per control file model, à la Oracle SQL*Loader.
> - flag (special column) or remove offending records in the table itself

Is this in the source or the destination? This is where being able to set the threshold to a higher number in conjunction with an error callback mechanism would probably work well.

> - unit-test my .ctl files (mocking input and output, using RSpec) while
> developing them

This would be really cool... I have no idea how to get there.

> - generate a mail which gives statistics about what went wrong and what went
> fine

Probably a good idea, although maybe the callbacks below could just be hooked into some sort of notification system.

> Things I think could be worth adding to AW-ETL:
> - ability to define a call-back to be called on each error (for a given
> transform, or for the whole control file). This call-back could return nil
> if it chooses to eliminate the row, throw an exception on blocking errors,
> or just return the row. The call-back could also be used to log things (in
> file or in database).

Agreed.

> - or anything else to allow error manipulations, providing the same level of
> processing - what's currently in there ?

Other than the screens coming in 0.9, not much. I like the callback methods.

> Well, all this is a bit raw. I'd be interested in any feedback from the
> users and/or from Anthony. FWIW, I've enclosed a couple of transforms and
> tools I've written so far.

Thanks for enclosing those scripts.

V/r
Anthony

From thibaut.barrere at gmail.com Mon Jul 16 16:10:26 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Mon, 16 Jul 2007 22:10:26 +0200
Subject: [Activewarehouse-discuss] Errors handling, data and process reliability, raw thoughts!
In-Reply-To: <4a68b8cf0707161309wa2b671bg87ab92b52463b938@mail.gmail.com>
References: <4a68b8cf0707161220l43f6ff55ld7aa2fe346626c0d@mail.gmail.com> <4a68b8cf0707161309wa2b671bg87ab92b52463b938@mail.gmail.com>
Message-ID: <4a68b8cf0707161310m6bace159u6209c6b88d6e4e3@mail.gmail.com>

> > - set_error_threshold 1 in control files
>
> That's pretty aggressive, although given the lack of error hooks (as you
> describe below) I can understand. Another thing you can do is use command
> line switches when developing or testing ETL scripts. The command line
> options are documented in http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html#command-line_options
>

Thanks for the link.

> A first stab at screens is in the Subversion repository. Currently it
> launches right before the post processes, however I think I'm going to make
> it work *after* the post processes. This, in addition with temp table
> support (also added in 0.9) should make for a much more stable and safe
> environment (since screens can be fatal and die leaving the temp tables
> intact and not affecting the production tables).
>

I'll have a look, thanks! Those are very valuable features when going to production, and I'm about to get my first gig signed, using AW.

> > - adding a special "No matching dimension record found" record in each
> > dimension table (like "Unknown date", "Somewhere in the future", "Unknown
> > customer")
>
> I do this right now in pretty much every dimension, using an Enumerable
> source as the very first record going into the database.
>

Hadn't thought about using it this way - thanks for the tip.

> > - unit-test my .ctl files (mocking input and output, using RSpec) while
> > developing them
>
> This would be really cool...I have no idea how to get there.
>

I'll try to prototype something and report back - I expect it will require at least a bit of refactoring in AW-ETL. Clearly this would add a lot of stability to the whole process.
best

Thibaut

From thibaut.barrere at gmail.com Tue Jul 17 02:30:32 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Tue, 17 Jul 2007 08:30:32 +0200
Subject: [Activewarehouse-discuss] role-playing dimensions
Message-ID: <4a68b8cf0707162330y44253700qd1f41e468d9a8226@mail.gmail.com>

Hi,

I'm starting to rely on two or more dates for one fact record (e.g. creation date, modification date).
I'm curious here: do you define two views backed by the same date_dimension in order to query role-playing dimensions like this properly?
Or do you use some kind of subquery mechanism, alias, something else? (I'm using mysql)

-- Thibaut

From thibaut.barrere at gmail.com Tue Jul 17 18:55:58 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 18 Jul 2007 00:55:58 +0200
Subject: [Activewarehouse-discuss] unit testing a control file
Message-ID: <4a68b8cf0707171555o4e61ab7ewa76c62c64aebaf98@mail.gmail.com>

Hi,

here's my first and very raw (but quite successful) attempt at writing an automated test for a real .ctl file, without requiring any modifications to the .ctl file itself or to AW-ETL.

The idea here is to mock the sources and destinations of a real .ctl file: I replace all the sources by one mock source (e.g. an array of hashes), and replace all the destinations by one mock destination, on which I assert that the requirements are met. The connection to :etl_execution and the job creation are also mocked, to avoid a database dependency and focus on the transforms.
With the plumbing removed, the test can be written:

  describe "test.ctl" do
    it "should store :first_name + :last_name under :name" do
      load 'test.ctl', [ {:first_name => 'john', :last_name => 'barry'} ]
      @destination.should_receive(:write).with(
        { :first_name => 'john', :last_name => 'barry', :name => 'john barry' }
      )
    end
  end

This specification is written with RSpec and takes advantage of the mocking abilities it provides (which will be very practical for mocking dimension foreign-key look-ups as well).

The plumbing was surprisingly easy to write, although it definitely requires polishing (for instance, running two tests in a row won't work today!).

I think it would be reasonably easy to make AW-ETL more test-friendly by refactoring some areas in order to remove part or all of the plumbing I wrote to get this to work.

What do you think?

cheers

Thibaut
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ctl
Type: application/octet-stream
Size: 221 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070718/54355704/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etl_spec.rb
Type: application/octet-stream
Size: 1435 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070718/54355704/attachment-0001.obj

From public at fireraven.com Tue Jul 17 19:20:24 2007
From: public at fireraven.com (Public@FireRaven.com)
Date: Tue, 17 Jul 2007 19:20:24 -0400
Subject: [Activewarehouse-discuss] Using activewarehouse-etl with JRuby?
Message-ID: <7135dc48e3188379d92998d45eba6e67@fireraven.com>

Has anyone used activewarehouse-etl with JRuby?
If so, can you tell me how to configure the database for use with the JRuby JDBC adapter?

From anthonyeden at gmail.com Tue Jul 17 23:35:19 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Tue, 17 Jul 2007 23:35:19 -0400
Subject: [Activewarehouse-discuss] role-playing dimensions
In-Reply-To: <4a68b8cf0707162330y44253700qd1f41e468d9a8226@mail.gmail.com>
References: <4a68b8cf0707162330y44253700qd1f41e468d9a8226@mail.gmail.com>
Message-ID:

I use role-playing dimensions for that.

V/r
Anthony

On 7/17/07, Thibaut Barrère wrote:
>
> Hi,
>
> I'm starting to rely on two or more dates for one fact record (ie:
> creation date, modification date).
> I'm curious here: do you define two views relying on the same
> date_dimension behind to query properly with role-playing dimensions like
> this ?
> Or do you use some kind of subquery mechanism, alias, something else ?
> (I'm using mysql)
>
> -- Thibaut
>

From rick at rickbradley.com Wed Jul 18 16:08:28 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Wed, 18 Jul 2007 15:08:28 -0500
Subject: [Activewarehouse-discuss] foreign keys with AW-ETL
Message-ID: <4acc49400707181308i39b7f36as583103746fd69d43@mail.gmail.com>

Hi all,

I'm trying to puzzle through some of the AW-ETL capabilities and usages and am stumped (even having looked at the ETL source code) over something. Let me try to describe what I'm trying to accomplish and how I'm trying to go about it and hopefully someone can set me straight on what I'm doing wrong.

I've got a source database and a destination database.
They both have the same simple schema with two tables. The schema definition is like so:

  ActiveRecord::Schema.define(:version => 1) do
    create_table :tab01, :force => true do |t|
      t.column :tab02_id, :integer
      t.column :name, :string, :limit => 20
      t.column :data, :string, :limit => 20
    end

    create_table :tab02, :force => true do |t|
      t.column :value, :integer
      t.column :some_text, :string, :limit => 20
    end
  end

The column "tab02_id" in table tab01 contains the value of an id in table tab02.

I set up my source database with the following initial data (specified in "YAML fixture" format):

tab01.yml:

  record_001:
    id: 1
    tab02_id: 2
    name: adam
    data: x
  record_002:
    id: 2
    tab02_id: 1
    name: cain
    data: y

tab02.yml:

  record_001:
    id: 1
    value: 12
    some_text: bacon
  record_002:
    id: 2
    value: 13
    some_text: eggs

And I set up my destination database with the following initial data:

tab02.yml:

  record_001:
    id: 1
    value: 14
    some_text: ham

That is, I already have a record in tab02, which, on conversion, should force the 2 new incoming records from source to come in afterwards (i.e., get "id" values 2 and 3). My goal is to maintain the foreign key relationship, and I therefore want to see the tab02_id values for the converted records having values 3 and 2 for the "adam" and "cain" records (respectively).
I've specified this more formally with a test:

  require File.dirname(__FILE__) + '/../test_helper.rb'

  class ConversionTest < Test::Unit::TestCase
    def test_conversion_should_copy_data_from_source_to_destination
      with_db 'data_destination_development' do
        assert_equal 2, ActiveRecord::Base.connection.select_value("select count(*) from tab01").to_i
        assert_equal 3, ActiveRecord::Base.connection.select_value("select count(*) from tab02").to_i
      end
    end

    def test_conversion_should_leave_source_data_intact
      with_db 'data_source_development' do
        assert_equal 2, ActiveRecord::Base.connection.select_value("select count(*) from tab01").to_i
        assert_equal 2, ActiveRecord::Base.connection.select_value("select count(*) from tab02").to_i
      end
    end

    def test_primary_keys_should_be_distinct
      with_db 'data_destination_development' do
        assert_equal [1, 2], ActiveRecord::Base.connection.select_values("select id from tab01").sort.map(&:to_i)
        assert_equal [1, 2, 3], ActiveRecord::Base.connection.select_values("select id from tab02").sort.map(&:to_i)
      end
    end

    def test_both_tables_should_be_converted
      with_db 'data_destination_development' do
        assert_equal ['adam', 'cain'], ActiveRecord::Base.connection.select_values("select name from tab01 order by id")
        assert_equal ['14', '12', '13'], ActiveRecord::Base.connection.select_values("select value from tab02 order by id")
        assert_equal ['ham', 'bacon', 'eggs'], ActiveRecord::Base.connection.select_values("select some_text from tab02 order by id")
      end
    end

    def test_foreign_should_be_preserved
      with_db 'data_destination_development' do
        assert_equal ['3', '2'], ActiveRecord::Base.connection.select_values("select tab02_id from tab01 order by id")
      end
    end
  end

(yes, there's a little scaffolding to allow this to be run after an ETL run -- we're basically learning the AW-ETL tool step by step by using tests to verify that it works (or that we understand how it works, more like))

Anyway, the tests all pass except for the last one, which fails like so:
  /usr/local/bin/ruby -Ilib "/usr/local/lib/ruby/gems/1.8/gems/rake-0.7.3/lib/rake/rake_test_loader.rb" "/Users/rick/svn/centerstone/conversion/data_management/scenarios/009_multiple_tables_with_foreign_key/conversion_test.rb"
  Loaded suite /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.3/lib/rake/rake_test_loader
  Started
  ...F.
  Finished in 0.088458 seconds.

    1) Failure:
  test_foreign_should_be_preserved(ConversionTest)
      [/Users/rick/svn/centerstone/conversion/data_management/scenarios/009_multiple_tables_with_foreign_key/conversion_test.rb:35:in `test_foreign_should_be_preserved'
       /Users/rick/svn/centerstone/conversion/data_management/scenarios/009_multiple_tables_with_foreign_key/../test_helper.rb:12:in `with_db'
       /Users/rick/svn/centerstone/conversion/data_management/scenarios/009_multiple_tables_with_foreign_key/conversion_test.rb:34:in `test_foreign_should_be_preserved']:
  <["3", "2"]> expected but was
  <["2", "1"]>.

  5 tests, 10 assertions, 1 failures, 0 errors

"What's the ETL conversion?" is probably the most important question. Here's the .ctl files I'm using (they are processed in numeric order, if that makes a difference -- I tried them with tab01 before tab02, and here they are now with tab02 before tab01 -- in case the tab02 data had to exist before the tab01 foreign key could be updated):

001_tab02.ctl:

  source :in2, {
    :adapter => 'postgresql',
    :database => 'data_source_development',
    :table => 'tab02',
    :username => 'rick'
  },
  [
    :id,
    :value,
    :some_text
  ]

  destination :out2, { # writing directly to database..
    :adapter => 'postgresql', # this is default, anyway
    :database => 'data_destination_development',
    :table => 'tab02',
    :username => 'rick',
    :natural_key => [:value]
  },
  {
    :order => [:value, :some_text],
  }

  before_write :check_exist, :table => "tab02", :columns => [:value]

002_tab01.ctl:

  source :in, {
    :adapter => 'postgresql',
    :database => 'data_source_development',
    :table => 'tab01',
    :username => 'rick'
  },
  [
    :id,
    :tab02_id,
    :name,
    :data
  ]

  destination :out, { # writing directly to database..
    :adapter => 'postgresql', # this is default, anyway
    :database => 'data_destination_development',
    :table => 'tab01',
    :username => 'rick',
    :natural_key => [:name]
  },
  {
    :order => [:tab02_id, :name, :data],
  }

  transform :tab02_id, :foreign_key_lookup, {
    :resolver => SQLResolver.new('tab02', 'value')
  }

  before_write :check_exist, :table => "tab01", :columns => [:name]

So, the $64 question is, what am I missing to allow the foreign key references to stay intact? We don't have an ActiveRecord model in this instance so I didn't feel right using ActiveRecordResolver.

Thanks for any help!

Rick

From thibaut.barrere at gmail.com Wed Jul 18 16:26:39 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 18 Jul 2007 22:26:39 +0200
Subject: [Activewarehouse-discuss] Using activewarehouse-etl with JRuby?
In-Reply-To: <7135dc48e3188379d92998d45eba6e67@fireraven.com>
References: <7135dc48e3188379d92998d45eba6e67@fireraven.com>
Message-ID: <4a68b8cf0707181326r752e32a5kd9f4fd762fb6baaa@mail.gmail.com>

Hi

> Has anyone used activewarehouse-etl with JRuby? If so, can you tell me how to configure the database for use with the JRuby JDBC adapter?

Not running JRuby + AW-ETL myself, but did you have a look at http://www.headius.com/jrubywiki/index.php/Running_Rails_with_ActiveRecord-JDBC?
-- Thibaut

From thibaut.barrere at gmail.com Thu Jul 19 01:20:44 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Thu, 19 Jul 2007 07:20:44 +0200
Subject: [Activewarehouse-discuss] Using activewarehouse-etl with JRuby?
In-Reply-To:
References: <4a68b8cf0707181326r752e32a5kd9f4fd762fb6baaa@mail.gmail.com>
Message-ID: <4a68b8cf0707182220y344047a8nd4afe4a6b175d907@mail.gmail.com>

Hi Martin,

> I did, but I was not sure if the YML file which activewarehouse-etl uses could be configured the same way as the Rails database.yml file?

For the most part I think it's interpreted the same way, although some parameters may not be passed through (for instance, I'm not sure that 'encoding' is actually used in aw-etl).

From anthonyeden at gmail.com Thu Jul 19 07:56:23 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Thu, 19 Jul 2007 07:56:23 -0400
Subject: [Activewarehouse-discuss] Using activewarehouse-etl with JRuby?
In-Reply-To: <4a68b8cf0707182220y344047a8nd4afe4a6b175d907@mail.gmail.com>
References: <4a68b8cf0707181326r752e32a5kd9f4fd762fb6baaa@mail.gmail.com> <4a68b8cf0707182220y344047a8nd4afe4a6b175d907@mail.gmail.com>
Message-ID:

Version 0.9 is going to do away with passing the database parameters in the control file and replace that with a reference to the named connection in database.yml in the directory where your ETL scripts reside. At that point it will respect all of the AR config parameters from the YAML file.
V/r
Anthony

On 7/19/07, Thibaut Barrère wrote:
>
> Hi Martin,
>
> > I did, but I was not sure if the YML file which activewarehouse-etl uses
> > could be configured the same way as the Rails database.yml file?
>
> For most of it I think it's interpreted the same way - some parameters may
> not be passed (for instance, I'm not sure that 'encoding' is actually used
> in aw-etl).
>

From rick at rickbradley.com Thu Jul 19 09:17:10 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Thu, 19 Jul 2007 08:17:10 -0500
Subject: [Activewarehouse-discuss] foreign keys with AW-ETL
In-Reply-To: <4acc49400707181308i39b7f36as583103746fd69d43@mail.gmail.com>
References: <4acc49400707181308i39b7f36as583103746fd69d43@mail.gmail.com>
Message-ID: <4acc49400707190617h3d47c07at64dad1384cea3a3d@mail.gmail.com>

Let me try this a different way (as I realize yesterday's mail was probably too much info)...

How might one write a control file that will bring over a foreign key reference between two tables intact?

Rick

From rick at rickbradley.com Mon Jul 23 14:57:28 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Mon, 23 Jul 2007 13:57:28 -0500
Subject: [Activewarehouse-discuss] ETL evaluation article
Message-ID: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com>

Hi all,

We've been undertaking an evaluation of ETL tools, including ActiveWarehouse-ETL (as you may have inferred from my previous posts to the list), and I wrote up our thoughts on the applications we've looked at.
Since we were doing this somewhat "reproducibly" in AW-ETL's case I thought this might be of interest to the discussion on unit testing ETLs, etc.

Any comments are appreciated.

http://www.rickbradley.com/articles/2007/07/23/evaluating-activewarehouse-etl

Best,
Rick

From thibaut.barrere at gmail.com Mon Jul 23 16:48:06 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Mon, 23 Jul 2007 22:48:06 +0200
Subject: [Activewarehouse-discuss] ETL evaluation article
In-Reply-To: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com>
References: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com>
Message-ID: <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com>

Hi Rick,

After reading your article, I think I understand your foreign keys question a bit better. I'll try to rephrase your need in my own words. Tell me if I have misunderstood.

You have a parent -> child structure in a source database.
You want to add the content of the source database into a target database with the same schema.
The target database will most likely already contain records.
Some of the source database records are already in the target database.
You want to preserve the parent -> child relationship.

Here are a few steps I've used to handle such cases (not with AW-ETL yet,
but for http://www.villanao.com for instance, with SSIS):

- define a reliable natural key for the parent table (put a uniqueness
  constraint on it if it's compatible with your environment)
- start the copy with the parent table
  - load each record from the source database
  - check for its existence within the destination table using the natural
    key (using the CheckExistProcessor in AW-ETL for instance)
  - insert it in the destination only if it's not already there (you can
    generate a valid surrogate key with the SurrogateKeyProcessor in AW-ETL)
- copy the children table
  - load each record
  - depending on your case, check if the record already exists (you'll need
    a natural key if you need to do this)
  - perform a look-up to find the parent surrogate key in the target
    database (using the ForeignKeyLookupTransform in AW-ETL on the
    destination database - not sure if it's easy to do)
  - insert in the destination table

This is the raw idea - obviously the CheckExistProcessor could eat up your
time and you may want to improve performance with other techniques.

Does it suit your need? I think pretty much everything needed for this
scenario is already available in AW-ETL today.

hope this helps

Thibaut Barrère
--
[blog] http://www.dotnetguru2.org/tbarrere

On 7/23/07, Rick Bradley wrote:
>
> Hi all,
> We've been undertaking an evaluation of ETL tools, including
> ActiveWarehouse-ETL (as you may have inferred from my previous posts
> to the list), and I wrote up our thoughts on the applications we've
> looked at. Since we were doing this somewhat "reproducibly" in
> AW-ETL's case I thought this might be of interest to the discussion on
> unit testing ETLs, etc.
>
> Any comments are appreciated.
>
> http://www.rickbradley.com/articles/2007/07/23/evaluating-activewarehouse-etl
>
> Best,
> Rick
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070723/f12157c2/attachment.html

From rick at rickbradley.com Mon Jul 23 17:52:00 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Mon, 23 Jul 2007 16:52:00 -0500
Subject: [Activewarehouse-discuss] ETL evaluation article
In-Reply-To: <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com>
References: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com> <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com>
Message-ID: <4acc49400707231452s374dfd20o6e1de9afb54d21ee@mail.gmail.com>

Thibaut,
Thanks for the reply. I think we're on basically the same page about
the problem I'm working on; I just can't seem to get the foreign key
transform to have an impact when it runs. I'm using the following
transform line in the child control file (posted in the original
overly long message):

  transform :tab02_id, :foreign_key_lookup, { :resolver =>
    SQLResolver.new('tab02', 'value') }

... where tab02 is the name of the "parent" table, and the
tab01.tab02_id column is the child's reference to the parent row. It
just never seems to do the lookup, or doesn't assign the new foreign
key value.

I went ahead and added a surrogate key declaration to the parent
table's control file:

  before_write :surrogate_key, :table => "tab02", :column => 'id'

but ultimately, after trying a number of variations on placement and ways
of specifying the connection, etc., I haven't had any success in getting
the relationship to be maintained across the conversion.

Thanks,
Rick

From anthonyeden at gmail.com Mon Jul 23 21:37:38 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Mon, 23 Jul 2007 21:37:38 -0400
Subject: [Activewarehouse-discuss] ETL evaluation article
In-Reply-To: <4acc49400707231452s374dfd20o6e1de9afb54d21ee@mail.gmail.com>
References: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com> <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com> <4acc49400707231452s374dfd20o6e1de9afb54d21ee@mail.gmail.com>
Message-ID:

First let me say that you have a good article there, and I will be going
through it with a fine-toothed comb to get a better idea of how I can
improve AW-ETL.

Now, onto the airing of dirty laundry: I'll just go out on a limb and say
that FK lookup is broken in 0.8. I should be spanked for the lack of
testing in AW-ETL, but I promise I'm working on making that better. :-)

Anyhow, I believe I've fixed the FK lookups for 0.9; however, I'm still not
happy with the way they work. In the ETL execution tables I store
information about the natural key and CRC for each row inserted into the
destination. The goal of this is to eventually go to that database (or a
cached version of it) to determine whether or not the record exists. I
believe this can also be augmented to provide the foreign key lookup
support by adding the primary key of the row in the execution tables.

Anyhow, I'll be the first to admit that this needs to be fixed, and I'm
working on it. However, this being an open source project, all I can say is
that if I'm not fast enough then feel free to patch to your heart's delight
and I will gladly apply patches that make this function correctly. :-)

V/r
Anthony

On 7/23/07, Rick Bradley wrote:
>
> Thibaut,
> Thanks for the reply. I think we're on basically the same page about
> the problem I'm working on, I just can't seem to get the foreign key
> transform to have an impact when it runs.
> I'm using the following
> transform line in the child control file (posted in the original
> overly long message):
>
> transform :tab02_id, :foreign_key_lookup, { :resolver =>
> SQLResolver.new('tab02', 'value') }
>
> ... where tab02 is the name of the "parent" table, and the
> tab01.tab02_id column is the child's reference to the parent row. It
> just never seems to do the lookup, or doesn't assign the new foreign
> key value.
>
> I went ahead and added a surrogate key declaration to the parent
> table's control file:
>
> before_write :surrogate_key, :table => "tab02", :column => 'id'
>
> but ultimately trying a number of variations on placement and ways of
> specifying the connection, etc., I haven't had any success in getting
> the relationship to be maintained across the conversion.
>
> Thanks,
> Rick
>

--
Cell: 808 782-5046
Current Location: Melbourne, FL
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070723/df3d1650/attachment-0001.html

From rick at rickbradley.com Tue Jul 24 10:19:12 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Tue, 24 Jul 2007 09:19:12 -0500
Subject: [Activewarehouse-discuss] ETL evaluation article
In-Reply-To:
References: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com> <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com> <4acc49400707231452s374dfd20o6e1de9afb54d21ee@mail.gmail.com>
Message-ID: <4acc49400707240719y5d1f6c9bq909387d6243d035d@mail.gmail.com>

Anthony,
Thanks for the quick response! I'll pull a more recent svn and see if
our suite runs to completion there wrt foreign keys, etc.

We're looking seriously at using AW-ETL to handle our data management
-- if nothing else using it for integrating legacy database data and
file data into production databases and deferring some of the
decisions about building reporting databases to some later date. If
we can make it work for our local production schema then it looks like
we'll adopt it.

That means that in the immediate short term I'll have a little time to
devote to getting AW-ETL to work on our critical path (which includes
this foreign key business). If we can get over that hump then we'll
have more time in general to contribute to AW-ETL.

I think, if I can get a tighter feedback loop (either from the list,
or in an IRC channel, or private email, or whatever) then I could work
through getting some improvements sent in, even if they are just
documentation patches to clear up some things that were causing us
headaches.

Any thoughts?

Best,
Rick

From rick at rickbradley.com Tue Jul 24 12:13:05 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Tue, 24 Jul 2007 11:13:05 -0500
Subject: [Activewarehouse-discuss] ETL evaluation article
In-Reply-To: <4acc49400707240719y5d1f6c9bq909387d6243d035d@mail.gmail.com>
References: <4acc49400707231157k527b6602j1f9abc2cdcd544b1@mail.gmail.com> <4a68b8cf0707231348n75bf2bffoc4b1c425b0f114d2@mail.gmail.com> <4acc49400707231452s374dfd20o6e1de9afb54d21ee@mail.gmail.com> <4acc49400707240719y5d1f6c9bq909387d6243d035d@mail.gmail.com>
Message-ID: <4acc49400707240913p28aa2a56n7b5f9543d1f3779e@mail.gmail.com>

Or, even more succinctly, to start with, if I wish to send test or
docs or code patches against SVN trunk somewhere, where should I send
them?

Thanks,
Rick

From rick at rickbradley.com Wed Jul 25 16:39:31 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Wed, 25 Jul 2007 15:39:31 -0500
Subject: [Activewarehouse-discuss] bugs in foreign key transform, + questions (was: "ETL evaluation article")
Message-ID: <4acc49400707251339q42141e69gd4cff31b17270731@mail.gmail.com>

I've updated my tests to get in sync with the 0.9.* changes and have
my non-foreign key tests all running properly again and am now
focusing on getting foreign key references to work. I've found two
show-stopper bugs, one which is trivial to patch, the other one I'm
not sure where to patch yet.

I've attached a patch against trunk for the first bug.

The second bug is that if I specify the foreign key transform without
an ActiveRecord::Base connection object then I get an error when
running the transform since there is no active connection. My current
workaround is to manually connect in the .ctl file right before the
transform and then pass in that connection to the transform. E.g.:

  $config = YAML::load(IO.read(File.dirname(__FILE__) + '/database.yml'))

  def conn
    ActiveRecord::Base.establish_connection($config['data_destination_development'])
    ActiveRecord::Base.connection
  end

  transform :tab02_id, :foreign_key_lookup, { :resolver =>
    SQLResolver.new('tab02', 'some_text', conn) }

(otherwise I get the following error in the logs:

  Error transforming from localhost/data_source_development/tab01 on
  line 1: ActiveRecord::ConnectionNotEstablished)

All this said, after looking at the foreign key transform code a
number of times I still can't see that it's designed to do what I'm
expecting it to do (i.e., help me maintain a foreign key relationship
when converting data, as I diagrammed in the article I posted). Can
anyone educate me on what I should be doing to take advantage of
AW-ETL in this regard?

Best,
Rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foreign-key-resolver.patch
Type: application/octet-stream
Size: 597 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070725/add5bbbc/attachment.obj

From anthonyeden at gmail.com Thu Jul 26 09:34:22 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Thu, 26 Jul 2007 06:34:22 -0700
Subject: [Activewarehouse-discuss] bugs in foreign key transform, + questions (was: "ETL evaluation article")
In-Reply-To: <4acc49400707251339q42141e69gd4cff31b17270731@mail.gmail.com>
References: <4acc49400707251339q42141e69gd4cff31b17270731@mail.gmail.com>
Message-ID:

On 7/25/07, Rick Bradley wrote:
> I've updated my tests to get in sync with the 0.9.* changes and have
> my non-foreign key tests all running properly again and am now
> focusing on getting foreign key references to work. I've found two
> show-stopper bugs, one which is trivial to patch, the other one I'm
> not sure where to patch yet.
>
> I've attached a patch against trunk for the first bug.

Thanks, I've applied your patch.

> The second bug is that if I specify the foreign key transform without
> an ActiveRecord::Base connection object then I get an error when
> running the transform since there is no active connection. My current
> workaround is to manually connect in the .ctl file right before the
> transform and then pass in that connection to the transform.
> E.g.:
>
> $config = YAML::load(IO.read(File.dirname(__FILE__) + '/database.yml'))
>
> def conn
>   ActiveRecord::Base.establish_connection($config['data_destination_development'])
>   ActiveRecord::Base.connection
> end
>
> transform :tab02_id, :foreign_key_lookup, { :resolver =>
>   SQLResolver.new('tab02', 'some_text', conn) }
>
> (otherwise I get the following error in the logs:
>
> Error transforming from localhost/data_source_development/tab01 on
> line 1: ActiveRecord::ConnectionNotEstablished)

I've updated it so it will accept a symbol referencing a connection
defined in the database.yml file.

> All this said, after looking at the foreign key transform code a
> number of times I still can't see that it's designed to do what I'm
> expecting it to do (i.e., help me maintain a foreign key relationship
> when converting data, as I diagrammed in the article I posted). Can
> anyone educate me on what I should be doing to take advantage of
> AW-ETL in this regard?

Is there any chance you can send me a test case (test code and
associated test data) that I can use for this? I could just build one
up given what you've provided in a previous email, but if you already
have one ready to go that would simplify things for me.

Here is a use case for resolving foreign keys for a user table,
assuming that the natural key is the username.

1.) The user table is loaded. Each record has both a surrogate key
    (sequenced primary key) and a natural key.
2.) A table is loaded that needs to have a foreign key reference to
    the user table. The natural key would be loaded into a field called
    :user_id.

In the fact ETL control file I would include the following:

  transform :user_id, :fk_lookup, :resolver =>
    SQLResolver.new(:users, :username, :operational_database)

This will use the SQLResolver to look up the foreign key in the users
table using the username as the natural key in the operational
database defined in database.yml.
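
To make the use case above concrete, here is a minimal sketch in plain
Ruby of what a natural-key-to-surrogate-key lookup does. It is not AW-ETL
code: HashResolver, fk_lookup and the in-memory USERS table are
hypothetical stand-ins for SQLResolver, the transform, and the users
table in the operational database.

```ruby
# Hypothetical stand-in for the users table: natural key (username)
# mapped to surrogate key (sequenced primary key).
USERS = { 'alice' => 1, 'bob' => 2 }.freeze

# Hypothetical resolver (NOT AW-ETL's SQLResolver): it resolves a natural
# key against an in-memory hash instead of running a SQL query.
class HashResolver
  def initialize(table)
    @table = table
  end

  # Returns the surrogate key, or raises KeyError if no parent row exists.
  def resolve(natural_key)
    @table.fetch(natural_key)
  end
end

# Returns a copy of the row in which the field's natural key has been
# replaced by the surrogate key found by the resolver.
def fk_lookup(row, field, resolver)
  row.merge(field => resolver.resolve(row[field]))
end

row = { user_id: 'bob', amount: 10 }
resolved = fk_lookup(row, :user_id, HashResolver.new(USERS))
```

Here resolved[:user_id] holds the surrogate key of the 'bob' row, while
the original row is left untouched. Raising on an unknown natural key is
a choice made for this sketch; a real load might prefer to log and skip
the child row instead.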
After the lookup the user_id would contain the surrogate key rather
than the natural key.

HTH.

V/r
Anthony Eden

--
Cell: 808 782-5046
Current Location: Seattle, WA

From rick at rickbradley.com Thu Jul 26 17:05:30 2007
From: rick at rickbradley.com (Rick Bradley)
Date: Thu, 26 Jul 2007 16:05:30 -0500
Subject: [Activewarehouse-discuss] bugs in foreign key transform, + questions (was: "ETL evaluation article")
In-Reply-To:
References: <4acc49400707251339q42141e69gd4cff31b17270731@mail.gmail.com>
Message-ID: <4acc49400707261405n7fe1fc1av3105c7bd019e5745@mail.gmail.com>

> I've updated it so it will accept a symbol referencing a connection
> defined in the database.yml file.

Anthony,
This doesn't appear to work. I've attached a patch which makes the
connection specification bit work. Note that there's another problem
(one that is not biting me, so no big concern of mine at the moment),
which is that ActiveRecord::Base.connection (the fall-through option
here) seems to always be nil. I see that there's a point in engine.rb
where all the connections are closed, but I'm not sure if this is
related.

Thanks,
Rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etl_connection_problem.patch
Type: application/octet-stream
Size: 929 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070726/999f695d/attachment.obj

From anthonyeden at gmail.com Fri Jul 27 11:20:35 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Fri, 27 Jul 2007 08:20:35 -0700
Subject: [Activewarehouse-discuss] bugs in foreign key transform, + questions (was: "ETL evaluation article")
In-Reply-To: <4acc49400707261405n7fe1fc1av3105c7bd019e5745@mail.gmail.com>
References: <4acc49400707251339q42141e69gd4cff31b17270731@mail.gmail.com> <4acc49400707261405n7fe1fc1av3105c7bd019e5745@mail.gmail.com>
Message-ID:

Rick,

Thanks again for the patch, I'll apply it when I'm done vacationing.
For 0.9 I've tried to remove the use of AR::Base.connection, favoring
instead a hash of connections defined in ETL::Engine; that's why the
AR::Base connection may not be created. I do see, though, that I need to
make sure that AR::Base.connection is still functioning as expected, so
I'll work on it a bit when I get back.

V/r
Anthony

On 7/26/07, Rick Bradley wrote:
> > I've updated it so it will accept a symbol referencing a connection
> > defined in the database.yml file.
>
> Anthony,
> This doesn't appear to work. I've attached a patch which makes the
> connection specification bit work. Note that there's another problem
> (one that is not biting me, so no big concern of mine at the moment),
> which is that ActiveRecord::Base.connection (the fall-through option
> here) seems to always be nil. I see that there's a point in engine.rb
> where all the connections are closed, but I'm not sure if this is
> related.
>
> Thanks,
> Rick
>
>

--
Cell: 808 782-5046
Current Location: Seattle, WA

From jeroen at kanji.nl Sun Jul 29 06:43:32 2007
From: jeroen at kanji.nl (Jeroen Roodnat)
Date: Sun, 29 Jul 2007 12:43:32 +0200
Subject: [Activewarehouse-discuss] Using Rails Model as source for AW-ETL
Message-ID: <440E5EB9-EC33-45EF-9D6B-9751CA71C8EF@kanji.nl>

I wanted to use a Rails Model as the input source for AW-ETL
because I have some Model methods defined which I need in my
datawarehouse.

I hacked together a small extension that will allow you to do
this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model_source.rb
Type: text/x-ruby-script
Size: 1018 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070729/aa5933bf/attachment.bin
-------------- next part --------------

How to use:

1. Place model_source.rb in the lib/etl/control/source directory of
the AW-etl gem installation
2.
edit lib/etl/control/source.rb and replace lines:
   def source_types
     [:file, :database]
   end

with:
   def source_types
     [:file, :database, :model]
   end

3. In your .ctl script you can now configure the model as the source;
here is mine as an example:
   source :in, {:model => 'Mutation',:order=>"mutations.id"},
     [:date,:purchase_value,:sales_value,:sell,:product_id,:location_id]

You can specify both db fields and model methods as the source input.

The extension assumes your ctl files are in the RAILS_ROOT/etl directory.

4. Run etl as normal. Put RAILS_ENV=production before etl if you
want to run in Rails production mode.

I am in no way a ruby guru, so I am sure it can be improved :-)

regards

Jeroen Roodnat

From jeroen at kanji.nl Sun Jul 29 06:56:28 2007
From: jeroen at kanji.nl (Jeroen Roodnat)
Date: Sun, 29 Jul 2007 12:56:28 +0200
Subject: [Activewarehouse-discuss] Using Rails Model as source for AW-ETL
In-Reply-To: <440E5EB9-EC33-45EF-9D6B-9751CA71C8EF@kanji.nl>
References: <440E5EB9-EC33-45EF-9D6B-9751CA71C8EF@kanji.nl>
Message-ID: <4E4C6747-4438-450F-8A6A-4C6B2C45001A@kanji.nl>

Small error: line 2 of the instructions should read:

2. edit lib/etl/control/control.rb and replace lines:
   def source_types
     [:file, :database]
   end

with:
   def source_types
     [:file, :database, :model]
   end

On 29-jul-2007, at 12:43, Jeroen Roodnat wrote:

> I wanted to use a Rails Model as the input source for AW-ETL
> because I have some Model methods defined which I need in my
> datawarehouse.
>
> I hacked together a small extension that will allow you to do
> this.
> How to use:
>
> 1. Place model_source.rb in the lib/etl/control/source directory of
> the AW-etl gem installation
> 2. edit lib/etl/control/source.rb and replace lines:
> def source_types
> [:file, :database]
> end
>
> with:
> def source_types
> [:file, :database, :model]
> end
>
> 3.
> In your .ctl script you can now configure the model as the source; here
> is mine as an example:
> source :in, {:model => 'Mutation',:order=>"mutations.id"},
> [:date,:purchase_value,:sales_value,:sell,:product_id,:location_id]
>
> You can specify both db fields and model methods as the source input.
>
> The extension assumes your ctl files are in the RAILS_ROOT/etl directory.
>
> 4. Run etl as normal. Put RAILS_ENV=production before etl if you
> want to run in Rails production mode.
>
> I am in no way a ruby guru, so I am sure it can be improved :-)
>
> regards
>
> Jeroen Roodnat

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070729/6bc1ee75/attachment-0001.html
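
The model_source.rb attachment is not preserved in this archive, so the
following is only a rough sketch, in plain Ruby, of the idea described in
the message above: each source row is built by calling the configured
field names on a model object, so database columns and model methods can
be mixed. The Mutation struct, its margin method and model_rows are all
hypothetical; a real Rails model would inherit from ActiveRecord::Base and
this is not the attached extension.

```ruby
# Hypothetical stand-in for a Rails model; a Struct keeps the sketch
# runnable without Rails or a database.
Mutation = Struct.new(:id, :purchase_value, :sales_value) do
  # A computed "model method" that is not a database column.
  def margin
    sales_value - purchase_value
  end
end

# Builds one hash per record by calling each requested field name on it,
# so columns and model methods are read the same way.
def model_rows(records, fields)
  records.map do |record|
    fields.each_with_object({}) do |field, row|
      row[field] = record.public_send(field)
    end
  end
end

records = [Mutation.new(1, 10, 15), Mutation.new(2, 20, 18)]
rows = model_rows(records, [:id, :sales_value, :margin])
```

Each element of rows is a hash keyed by field name, which is the shape a
row takes as it moves through an ETL pipeline. Using public_send means
only public methods of the model can be listed as fields.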