From endersonmaia at gmail.com Mon Sep 3 14:02:15 2007
From: endersonmaia at gmail.com (Enderson Maia)
Date: Mon, 3 Sep 2007 15:02:15 -0300
Subject: [Activewarehouse-discuss] Filter report result
Message-ID: <228840e70709031102k6fd5aa38u3094e226ac08df9e@mail.gmail.com>

I tried to use query in the ActiveWarehouse::Cube instance in the report,
but I couldn't find a way to show just the facts where the aggregate value
is <> 0.

That way, products that have no sales in the Cube view would not appear in
the list.

Is there a way to do this ?

--
Enderson Maia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070903/2e7c353d/attachment.html

From chris.d.williams at gmail.com Mon Sep 3 18:05:20 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Mon, 3 Sep 2007 18:05:20 -0400
Subject: [Activewarehouse-discuss] How do you create a SQLResolver with a multi-field lookup
Message-ID:

I am trying to create my fact table and need to do a foreign_key_lookup.
The table I need to do the lookup in has multiple fields to match to get
the id. What is the correct syntax for creating the SQLResolver?

Thanks!
CW

From thibaut.barrere at gmail.com Tue Sep 4 01:27:09 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Tue, 4 Sep 2007 07:27:09 +0200
Subject: [Activewarehouse-discuss] How do you create a SQLResolver with a multi-field lookup
In-Reply-To:
References:
Message-ID: <4a68b8cf0709032227s66570964ie237f124f8824bfc@mail.gmail.com>

Hi Chris

I had a look at the source (etl/lib/foreign_key_lookup_transform.rb) and
I don't think you can achieve this out of the box today (anyone else ?).
It could be achieved by doing something like:

1/ use a first step to build the 'composite' key as an array

    transform(:my_composite_key) { |n,v,r| [ r[:first_key], r[:second_key] ] }

2/ patch SQLResolver (see foreign_key_lookup_transform.rb) to handle
multiple fields and values, and make it support an array as a value

    @connection.select_value("SELECT id FROM #{table_name} WHERE ....")

Other alternatives include writing your own transform (either as a real
transform or as a block transform):

    transform(:my_composite_key) do |n,v,r|
      connection.select_value("SELECT id FROM MY_TABLE WHERE first_key = #{@connection.quote(r[:first_key])} and second_key = #{@connection.quote(r[:second_key])}")
    end

In any case, check out the source code, which is rather easy to understand.

Hope this helps!

Thibaut

From ottercat at gmail.com Tue Sep 4 09:51:45 2007
From: ottercat at gmail.com (Matt Williams)
Date: Tue, 4 Sep 2007 09:51:45 -0400
Subject: [Activewarehouse-discuss] Question about correllation
Message-ID: <5e79bbab0709040651u243c80ecw8f090762b263a6e1@mail.gmail.com>

Greetings!

I had the pleasure of attending Anthony's session at Erubycon and was
introduced to the package (and data warehousing) there. So, I'm very much
a novice. I've purchased the Kimball books (both the Data Warehouse Toolkit
and the ETL book) and am starting to work my way through them.

I'm participating in the Rails Rumble this weekend and am planning to enter
a data warehousing app which gathers information from sources such as
vmstat, ps, syslog, apache logs, and rails logs and then does correlation
and the like from them. Which brings me to my question...

Is it better to place data for which I want to do correlation (and using
activewarehouse) within the same data warehouse, or is it better to silo
the data?
By correlation, I mean in terms of being able to say "plot for me the CPU
utilization from time X until time Y and show me the processes running at
that time" or "show me the memory characteristics and any syslog messages
from the mail daemon".

Obviously there is overlap in terms of the host/server from which the
different pieces of data are being gathered, and the time dimension and,
depending on the correlation, on other dimensions, but beyond that I'm
looking for wisdom as to whether it is a better practice to keep discrete
facts and silos separate or, because I'm wanting to work closely with them,
to keep them closely coupled. I expect it's probably going to be in the
next bit I read in the Kimball & Ross book, but I'm curious what others
think.

Thanks,
Matt

--
I can say to myself and the world, "Look at all I am doing, am I not being
busy? Am I not contributing? Am I not having an impact on all those around
me and with whom I come into contact? See, my life has meaning."
To which the Tao responds, "You are doing, yes, but you are not being.
Slow down, go with the flow, work with life, not against it. By being, you
do. By doing, you cease to be."
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070904/70c4a12c/attachment-0001.html

From anthonyeden at gmail.com Tue Sep 4 10:22:57 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Tue, 4 Sep 2007 10:22:57 -0400
Subject: [Activewarehouse-discuss] Question about correllation
In-Reply-To: <5e79bbab0709040651u243c80ecw8f090762b263a6e1@mail.gmail.com>
References: <5e79bbab0709040651u243c80ecw8f090762b263a6e1@mail.gmail.com>
Message-ID:

On 9/4/07, Matt Williams wrote:
> Is it better to place data for which I want to do correlation (and using
> activewarehouse) within the same data warehouse, or is it better to silo the
> data?
>
> By correlation, I mean in terms of being able to say "plot for me the CPU
> utilization from time X until time Y and show me the processes running at
> that time" or show me the memory characteristics and any syslog messages
> from the mail daemon.
>
> Obviously there is overlap in terms of the host/server from which the
> different pieces of data are being gathered, and the time dimension and,
> depending on the correlation, on other dimensions, but beyond that I'm
> looking for wisdom as to whether it is a better practice to keep discrete
> facts and silos separate or because I'm wanting to work closely with them to
> keep them closely coupled. I expect it's probably going to be in the next
> bit I read in the Kimball & Ross book, but I'm curious what others think.

I think it's better to have them in the same warehouse, but perhaps I'm
unaware of specific requirements for you. Is there a reason you think
silos might be better for you?

If for some reason you do decide to go with silos, please make sure to read
up on conforming your models. This is covered in Chapter 3 of TDWTK in the
section called Data Warehouse Bus Architecture. By having conformed models
you can drill across facts when they share the same conformed dimensions,
which is valuable IMO.

V/r
Anthony

--
Cell: 808 782-5046
Current Location: Norfolk, VA

From chris.d.williams at gmail.com Tue Sep 4 17:30:24 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Tue, 4 Sep 2007 17:30:24 -0400
Subject: [Activewarehouse-discuss] How do you create a SQLResolver with a multi-field lookup
In-Reply-To: <4a68b8cf0709032227s66570964ie237f124f8824bfc@mail.gmail.com>
References: <4a68b8cf0709032227s66570964ie237f124f8824bfc@mail.gmail.com>
Message-ID:

After I sent the email, I found the source for the SQLResolver. I think
the easiest way to do it right now is to write a custom transform. I will
post it to the group when I am done for feedback and maybe inclusion in
AW-ETL.
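
As a rough idea of the SQL such a multi-field lookup transform might build,
here is a minimal sketch. It assumes the natural key is spread over several
columns; the helper name composite_lookup_sql, the table name and the
quoting lambda are hypothetical and not part of AW-ETL.

```ruby
# Hypothetical helper, not part of AW-ETL: builds the SELECT used to resolve
# a surrogate key (id) from several natural-key columns.
# `quote` is any callable that turns a Ruby value into a quoted SQL literal;
# in a real transform this would be the database connection's quote method.
def composite_lookup_sql(table, keys, quote)
  raise ArgumentError, "at least one key column is required" if keys.empty?
  conditions = keys.map { |column, value| "#{column} = #{quote.call(value)}" }
  "SELECT id FROM #{table} WHERE #{conditions.join(' AND ')}"
end

# Stand-in quoting for this example only; real code should rely on the
# adapter's own quoting.
naive_quote = lambda { |value| "'" + value.to_s.gsub("'", "''") + "'" }

sql = composite_lookup_sql("customer_dimension",
                           [[:first_key, "A1"], [:second_key, "O'Brien"]],
                           naive_quote)
puts sql
```

Passing the key columns as an ordered list of pairs keeps the WHERE clause
in a predictable order, which makes the generated SQL easier to log and
compare.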
I already wrote a custom row processor so writing the transform shouldn't
be too bad.

Thanks for the suggestion!
CW

On 9/4/07, Thibaut Barrère wrote:
> Hi Chris
>
> I had a look at the source (etl/lib/foreign_key_lookup_transform.rb)
> and I don't think you can achieve this out of the box today (anyone else
> ?).
>
> It could be achieved by doing something like:
> 1/ use a first step to build the 'composite' key as an array
> transform(:my_composite_key) { |n,v,r| [ r[:first_key], r[:second_key] ] }
> 2/ patch SQLResolver (see foreign_key_lookup_transform.rb) to handle
> multiple fields and values, and make it support an array as a value
> @connection.select_value("SELECT id FROM #{table_name} WHERE ....")
>
> Other alternatives include writing your own transform (either as a real
> transform or as a block transform):
>
> transform(:my_composite_key) do |n,v,r|
> connection.select_value("SELECT id FROM MY_TABLE WHERE first_key =
> #{@connection.quote(r[:first_key])} and second_key =
> #{@connection.quote(r[:second_key])}")
> end
>
> In any case, check out the source code, which is rather easy to understand.
>
> Hope this helps!
>
> Thibaut
>

From thibaut.barrere at gmail.com Wed Sep 5 02:44:45 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 5 Sep 2007 08:44:45 +0200
Subject: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
Message-ID: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com>

Hi,

Today I'm using Excel with Microsoft Query to establish joins between
tables and then go back to Excel to create pivot tables etc. It works,
but I'd like to find something a bit more polished for end-users.

Do you have suggestions about (both open-source and commercial) tools
to let non-technical end-users query a dimensional model with grace ?
I'm using MySQL as the back-end.

any hint welcome!
cheers

Thibaut

From activewarehouse at munkyboy.com Wed Sep 5 12:19:53 2007
From: activewarehouse at munkyboy.com (Michael Luu)
Date: Wed, 5 Sep 2007 09:19:53 -0700
Subject: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
In-Reply-To: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com>
References: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com>
Message-ID:

A nice looking commercial tool is Tableau
http://www.tableausoftware.com/
I've only played around with the demo but it looks nice

A big open source project is Pentaho
http://www.pentaho.com/
I'm not sure how easily you can detach their Visualization stuff from
the rest of their package...

Personally, I'm building all the visualization for my project using
Flex and Flex charting. It's probably more work than you were looking
for but it is another option.

Cheers,
Mike

On Sep 4, 2007, at 11:44 PM, Thibaut Barrère wrote:

> Hi,
>
> Today I'm using Excel with Microsoft Query to establish joins between
> tables and then go back to Excel to create pivot tables etc. It works,
> but I'd like to find something a bit more polished for end-users.
>
> do you have suggestions about (both open-source and commercial) tools
> to let non-technical end-users query a dimension model with grace ?
> I'm using MySQL as the back-end.
>
> any hint welcome!
>
> cheers
>
> Thibaut
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss

From thibaut.barrere at gmail.com Wed Sep 5 15:03:50 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 5 Sep 2007 21:03:50 +0200
Subject: [Activewarehouse-discuss] Rows with errors are sent to the destination without all the transforms applied
Message-ID: <4a68b8cf0709051203g35ae5c91n12035174e4b4b5fa@mail.gmail.com>

Hi

When a foreign key lookup fails (and possibly in other cases ?), an error
is logged to etl.log, and the error count is incremented.

But I also noticed that the row seems to be sent to the destination without
the rest of the transforms applied, which can be very confusing at times.

Did anyone else meet this behaviour ?

cheers

Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070905/ad304337/attachment.html

From thibaut.barrere at gmail.com Wed Sep 5 15:13:04 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 5 Sep 2007 21:13:04 +0200
Subject: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
In-Reply-To:
References: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com>
Message-ID: <4a68b8cf0709051213t78cf2d7epe958952037b4f9b9@mail.gmail.com>

> A nice looking commercial tool is Tableau
> http://www.tableausoftware.com/

Thanks - I've requested a trial. It's around 1300$ per user but seems
rather convenient and efficient.

> A big open source project is Pentaho
> http://www.pentaho.com/
> I'm not sure how easily you can detach their Visualization stuff from
> the rest of their package...

I've seen this one - did anyone on the list manage to use it easily ?
> Personally, I'm building all the visualization for my project using
> Flex and Flex charting. It's probably more work than you were looking
> for but it is another option.

That's a good suggestion as well.

For those interested in flash rendering, I've seen
http://www.fusioncharts.com/ which is very nice (but only works in flash).
For easy bitmap rendering (with ajax support), there is also
http://big.faceless.org/products/download.jsp

Both take a simple XML format as the input, which can be generated with
xml builder in ruby.

If anyone has more suggestions (for both graphic and non graphic stuff),
I'm all ears. Maybe worth gathering on a wiki ?

-- Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070905/8be86733/attachment.html

From anthonyeden at gmail.com Wed Sep 5 15:24:18 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Wed, 5 Sep 2007 15:24:18 -0400
Subject: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
In-Reply-To: <4a68b8cf0709051213t78cf2d7epe958952037b4f9b9@mail.gmail.com>
References: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com> <4a68b8cf0709051213t78cf2d7epe958952037b4f9b9@mail.gmail.com>
Message-ID:

We're using XML/SWF Charts (http://www.maani.us/xml_charts/) for our
web charting needs, through the Ziya plugin (http://ziya.liquidrail.com/).

V/r
Anthony

On 9/5/07, Thibaut Barrère wrote:
>
>
> > A nice looking commercial tool is Tableau
> > http://www.tableausoftware.com/
>
> Thanks - I've requested a trial. It's around 1300$ per user but seems rather
> convenient and efficient.
>
> > A big open source project is Pentaho
> > http://www.pentaho.com/
> > I'm not sure how easily you can detach their Visualization stuff from
> > the rest of their package...
>
>
> I've seen this one - did anyone on the list manage to use it easily ?
>
> > Personally, I'm building all the visualization for my project using
> > Flex and Flex charting. It's probably more work than you were looking
> > for but it is another option.
>
> That's a good suggestion as well.
>
> For those interested in flash rendering, I've seen
> http://www.fusioncharts.com/ which is very nice (but only works in flash).
> For easy bitmap rendering (with ajax support), there is also
> http://big.faceless.org/products/download.jsp
>
> Both take a simple XML format as the input, which can be generated with xml
> builder in ruby.
>
> If anyone has more suggestions (for both graphic and non graphic stuff), I'm
> all ears. Maybe worth gathering on a wiki ?
>
> -- Thibaut
>
>
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
Cell: 808 782-5046
Cell: 321 505-0025
Current Location: Melbourne, FL

From ottercat at gmail.com Wed Sep 5 15:30:53 2007
From: ottercat at gmail.com (Matt Williams)
Date: Wed, 5 Sep 2007 15:30:53 -0400
Subject: [Activewarehouse-discuss] Fwd: Reporting tools on Windows (outside a web application ?)
In-Reply-To: <5e79bbab0709051230r7198e185rcd179b7be9aa19fe@mail.gmail.com>
References: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com> <4a68b8cf0709051213t78cf2d7epe958952037b4f9b9@mail.gmail.com> <5e79bbab0709051230r7198e185rcd179b7be9aa19fe@mail.gmail.com>
Message-ID: <5e79bbab0709051230q55ef0077tad0144773a04338a@mail.gmail.com>

(I didn't forward this to the list)

---------- Forwarded message ----------
From: Matt Williams
Date: Sep 5, 2007 3:30 PM
Subject: Re: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
To: Thibaut Barrère

I'm planning to use tools from the Simile project (http://simile.mit.edu)
this weekend during the rails rumble with activewarehouse -- it's not going
to provide all the sorts of graphs I'll ultimately desire, but for a 48hr
sprint, it's got enough to get started.....

On 9/5/07, Thibaut Barrère wrote:
>
> > A nice looking commercial tool is Tableau
> > http://www.tableausoftware.com/
>
>
> Thanks - I've requested a trial. It's around 1300$ per user but seems
> rather convenient and efficient.
>
> > A big open source project is Pentaho
> > http://www.pentaho.com/
> > I'm not sure how easily you can detach their Visualization stuff from
> > the rest of their package...
>
>
> I've seen this one - did anyone on the list manage to use it easily ?
>
> > Personally, I'm building all the visualization for my project using
> > Flex and Flex charting. It's probably more work than you were looking
> > for but it is another option.
>
>
> That's a good suggestion as well.
>
> For those interested in flash rendering, I've seen
> http://www.fusioncharts.com/ which is very nice (but only works in flash).
>
> For easy bitmap rendering (with ajax support), there is also
> http://big.faceless.org/products/download.jsp
>
> Both take a simple XML format as the input, which can be generated with
> xml builder in ruby.
>
> If anyone has more suggestions (for both graphic and non graphic stuff),
> I'm all ears. Maybe worth gathering on a wiki ?
>
> -- Thibaut
>
>
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
I can say to myself and the world, "Look at all I am doing, am I not being
busy? Am I not contributing? Am I not having an impact on all those around
me and with whom I come into contact? See, my life has meaning."
To which the Tao responds, "You are doing, yes, but you are not being.
Slow down, go with the flow, work with life, not against it. By being, you
do. By doing, you cease to be."

--
I can say to myself and the world, "Look at all I am doing, am I not being
busy? Am I not contributing? Am I not having an impact on all those around
me and with whom I come into contact? See, my life has meaning."
To which the Tao responds, "You are doing, yes, but you are not being.
Slow down, go with the flow, work with life, not against it. By being, you
do. By doing, you cease to be."
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070905/10fffa9e/attachment-0001.html

From thibaut.barrere at gmail.com Thu Sep 6 05:03:05 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Thu, 6 Sep 2007 11:03:05 +0200
Subject: [Activewarehouse-discuss] Reporting tools on Windows (outside a web application ?)
In-Reply-To:
References: <4a68b8cf0709042344v57c2ec51u79b89cf7a7b16eb1@mail.gmail.com> <4a68b8cf0709051213t78cf2d7epe958952037b4f9b9@mail.gmail.com>
Message-ID: <4a68b8cf0709060203l73f7d3e2r72d142cabc830793@mail.gmail.com>

Thanks everyone. Very good suggestions. I knew Maani but didn't know about
Ziya. Simile has interesting stuff (XML analysis as well!)

best

Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070906/eeb76b51/attachment.html

From endersonmaia at gmail.com Thu Sep 6 16:49:03 2007
From: endersonmaia at gmail.com (Enderson Maia)
Date: Thu, 6 Sep 2007 17:49:03 -0300
Subject: [Activewarehouse-discuss] etl source error
Message-ID: <228840e70709061349k505a9c7byb990aacf5d2b8635@mail.gmail.com>

I'm having this problem, but I think everything is fine.

$ etl produto_dimension.ctl
Using AdapterExtensions
Starting ETL process
initializing ETL engine
Processing produto_dimension.ctl
/usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:69:in `source': A source was specified but no matching type was found (ETL::ControlError)
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:63:in `each'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:63:in `source'
        from produto_dimension.ctl:5:in `get_binding'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:232:in `get_binding'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:12:in `create'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:262:in `parse'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/control/control.rb:286:in `resolve'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/../lib/etl/engine.rb:292:in `process_control'
         ... 11 levels...
        from /usr/lib/ruby/gems/1.8/gems/activesupport-1.4.2/lib/active_support/dependencies.rb:495:in `require'
        from /usr/lib/ruby/gems/1.8/gems/activewarehouse-etl-0.9.0/bin/etl:28
        from /usr/bin/etl:16:in `load'
        from /usr/bin/etl:16

I'm trying to connect to an AR source. Here is my ctl file.

vmd_db = ActiveRecord::Base.configurations['vmd'].symbolize_keys

source :in, vmd_db.merge({
    :database => "VMD",
    :table => "PRODU",
    :join => "GRPRC ON (PRODU.Cod_GrpPrc = GRPRC.Cod_GrpPrc)",
    :select => "PRODU.Cod_Produt, PRODU.Des_Produt, GRPRC.Des_GrpPrc",
    :order => "PRODU.Des_Produt"}),
  [
    :Cod_Produt,
    :Des_Produt,
    :Des_GrpPrc
  ]

columns = [:id, :nome, :grupo_nome]

destination :out, {
    :file => 'output/produto_dimension.txt'
  },
  {
    :order => :columns
  }

post_process :bulk_import, {
    :file => outfile,
    :truncate => true,
    :columns => columns,
    :target => :warehouse,
    :table => "produto_dimension"}

Don't know what to do!

--
Enderson Maia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070906/4b019edc/attachment.html

From thibaut.barrere at gmail.com Fri Sep 7 10:01:58 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Fri, 7 Sep 2007 16:01:58 +0200
Subject: [Activewarehouse-discuss] quick tip - how to track the source of the data in the destination database
Message-ID: <4a68b8cf0709070701v686eff1nbda8a4c4a257b128@mail.gmail.com>

Hi,

a quick tip I've used today which seems to work to keep a trace of the
source of the data after aggregation of multiple sources:

    source :some_file
    source :some_other_file
    ..
    transform(:data_origin) { |n,v,r| Engine.current_source.to_s }

Is there a more recommended way of achieving this ?

cheers

Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070907/f7c14779/attachment.html

From anthonyeden at gmail.com Fri Sep 7 10:40:23 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Fri, 7 Sep 2007 10:40:23 -0400
Subject: [Activewarehouse-discuss] quick tip - how to track the source of the data in the destination database
In-Reply-To: <4a68b8cf0709070701v686eff1nbda8a4c4a257b128@mail.gmail.com>
References: <4a68b8cf0709070701v686eff1nbda8a4c4a257b128@mail.gmail.com>
Message-ID:

Nope, this is a good strategy if you want to keep the data origin in
each table. Another option is to create an audit dimension, that way
you can add additional attributes without cluttering up your other
dimensions and facts.

V/r
Anthony

On 9/7/07, Thibaut Barrère wrote:
> Hi,
>
> a quick tip I've used today which seems to work to keep a trace of the
> source of the data after aggregation of multiple sources:
>
> source :some_file
> source :some_other_file
> ..
> transform(:data_origin) { |n,v,r| Engine.current_source.to_s }
>
> Is there a more recommended way of achieving this ?
>
> cheers
>
> Thibaut
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
Cell: 808 782-5046
Cell: 321 505-0025
Current Location: Melbourne, FL

From chris.d.williams at gmail.com Fri Sep 7 16:04:40 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Fri, 7 Sep 2007 16:04:40 -0400
Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE)
Message-ID:

I have changes I want to make (or have made in the past) with ETL and
AE. Since these are gems, how does one get the current trunk out of
SVN, make changes, and use it locally? I am new to ruby/rails and I'm
not quite sure how to go about this.
To make things even more complex, I am running on Windows, which I know
most Ruby/Rails folks don't use.

In the past I have made the changes in a separate directory and just
copied the new files over to the gem directory. That just doesn't seem
right, but maybe it is.

Any suggestions would be great.

Thanks
CW

From thibaut.barrere at gmail.com Sat Sep 8 04:27:50 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Sat, 8 Sep 2007 10:27:50 +0200
Subject: [Activewarehouse-discuss] quick tip - how to track the source of the data in the destination database
In-Reply-To:
References: <4a68b8cf0709070701v686eff1nbda8a4c4a257b128@mail.gmail.com>
Message-ID: <4a68b8cf0709080127n3fefa084n3d68b1466331066b@mail.gmail.com>

> Nope, this is a good strategy if you want to keep the data origin in
> each table.

I've been looking for something like:

    source :my_source, ...
    Engine.current_source.name # => :my_source (instead of ./output/file-0001.csv)

Having an abstract name instead of the full file path would make it easier
to write transforms (conforming, or whatever):

    transform(:customer_code) { |n,v,r| source == :first_source ? "LEGACY-PRODUCT-#{v}" : v }

I had a first look at the code but it seems that the notion of the source
name is lost after instantiation, so it may require more work than it seems.

> Another option is to create an audit dimension, that way
> you can add additional attributes without cluttering up your other
> dimensions and facts.

Do you achieve this in a single pass (adding audit records before the
matching table is loaded ?) or do you take two passes (one for data load,
then another for audit) ?

-- Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/38771816/attachment.html

From thibaut.barrere at gmail.com Sat Sep 8 05:59:14 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Sat, 8 Sep 2007 11:59:14 +0200
Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE)
In-Reply-To:
References:
Message-ID: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com>

Hi Chris,

> I have changes I want to make (or have made in the past) with ETL and
> AE. Since these are gems, how does one get the current trunk out of
> SVN, make changes, and use it locally? I am new to ruby/rails and I'm
> not quite sure how to go about this.

You have several ways, I'll outline the one I prefer. I'm using Piston [1],
which will let you physically copy the content of AW-ETL trunk into
/vendor/activewarehouse-etl. You can then patch the content (add new
transforms, tweak some behaviours - AW-ETL code is very hackable without
requiring monkey patching) which will be committed to your own repository,
but at the same time Piston will let you retrieve the updates from the
AW-ETL repository when you want them. This also makes it a lot easier to
submit patches to AW.

Here is my current layout under /vendor so you get a rough idea:

    activewarehouse-etl
    adapter_extensions
    fastercsv-1.2.0
    ruport-1.0.2
    plugins
    plugins/activewarehouse
    plugins/rspec
    plugins/rspec_on_rails
    rails => freeze rails 1.2.3

I only use Piston for libraries I wish to hack (aw-etl,aw,rspec...) while
following their evolution closely (and at my pace). For the other ones (like
fastercsv,ruport here), I use "gem unpack gemname" [2] which will unpack the
content of the gem. This way all my dependencies are under SVN, which makes
it very easy to deploy on a new machine (and ensure I really know what I
depend upon - using gem uninstall xxx after gathering everything under SVN).
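
To give a rough idea of how a plain Ruby script (outside Rails) could pick
up libraries vendored this way, here is a minimal sketch. It assumes each
library sits in <root>/vendor/<name>/lib; the helper name vendor_lib_paths
and the /apps/warehouse root are made up for the example.

```ruby
# Hypothetical helper: maps vendored library names to their lib directories.
# Assumes each library was copied or unpacked to <root>/vendor/<name>/lib.
def vendor_lib_paths(root, names)
  names.map { |name| File.join(root, "vendor", name, "lib") }
end

paths = vendor_lib_paths("/apps/warehouse",
                         %w(activewarehouse-etl adapter_extensions fastercsv-1.2.0))

# Put the vendored copies ahead of any installed gems, keeping the listed
# order and skipping directories that are already on the load path.
paths.reverse_each do |path|
  $LOAD_PATH.unshift(path) unless $LOAD_PATH.include?(path)
end

puts paths
```

Prepending (rather than appending) matters here: a later require then finds
the vendored copy before an installed gem of the same name.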

Once it's done, you'll have to run /activewarehouse-etl/bin/etl instead of
the regular etl (it's worth changing your PATH variable to let an etl.cmd be
picked first). I've attached a patch you'll need to apply to get things
working (LOAD_PATH / require tweaks).

You'll also want to tell Rails to load what you need:

    config.load_paths += %W(adapter_extensions Text-1.1.2 ruport-1.0.2).map { |gem| "#{RAILS_ROOT}/vendor/#{gem}/lib" }

> To make things even more complex, I am running on Windows, which I
> know most Ruby/Rails folks don't use.

I'm developing on Mac OS X, but deploying to Windows (for AW-ETL
processing). I'd advise avoiding the Time class, because ruby Time is not
cross platform (ie: dates before 1970 won't be supported by ruby under w32,
whereas it works like a charm on Mac OSX).

cheers!

Thibaut
--
http://www.dotnetguru2.org/tbarrere

[1] - http://piston.rubyforge.org
[2] - http://blog.nanorails.com/articles/2006/03/28/freeze-all-your-ruby-gems-on-a-shared-host
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/59d4f3b2/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bin_etl.patch
Type: application/octet-stream
Size: 615 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/59d4f3b2/attachment-0001.obj

From chris.d.williams at gmail.com Sat Sep 8 12:08:15 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Sat, 8 Sep 2007 12:08:15 -0400
Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE)
In-Reply-To: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com>
References: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com>
Message-ID:

What database are you using on Windows? Have you run into the end-of-line
issue yet?
I have a patch, which I posted to the mailing list, to correct the EOL
characters with MySQL. The last field in my csv files before the bulk load
had an extra control character on it since MySQL was just looking for \n,
not \r\n. If you're not doing bulk loads from a csv I guess you wouldn't
see it.

I will try piston and see if I can get it up and running.

CW

On 9/8/07, Thibaut Barrère wrote:
> Hi Chris,
>
>
> > I have changes I want to make (or have made in the past) with ETL and
> > AE. Since these are gems, how does one get the current trunk out of
> > SVN, make changes, and use it locally? I am new to ruby/rails and I'm
> > not quite sure how to go about this.
>
> You have several ways, I'll outline the one I prefer. I'm using Piston [1],
> which will let you physically copy the content of AW-ETL trunk into
> /vendor/activewarehouse-etl. You can then patch the content (add new
> transforms, tweak some behaviours - AW-ETL code is very hackable without
> requiring monkey patching) which will be committed to your own repository,
> but at the same time Piston will let you retrieve the updates from the
> AW-ETL repository when you want them. This also makes it a lot easier to
> submit patches to AW.
>
> Here is my current layout under /vendor so you get a rough idea:
>
> activewarehouse-etl
> adapter_extensions
> fastercsv-1.2.0
> ruport-1.0.2
> plugins
> plugins/activewarehouse
> plugins/rspec
> plugins/rspec_on_rails
> rails => freeze rails 1.2.3
>
> I only use Piston for libraries I wish to hack (aw-etl,aw,rspec...) while
> following their evolution closely (and at my pace). For the other ones (like
> fastercsv,ruport here), I use "gem unpack gemname" [2] which will unpack the
> content of the gem. This way all my dependencies are under SVN, which makes
> it very easy to deploy on a new machine (and ensure I really know what I
> depend upon - using gem uninstall xxx after gathering everything under SVN).
> > Once it's done, you'll have to run /activewarehouse-etl/bin/etl instead of > the regular etl (it's worth changing your PATH variable to let a etl.cmd be > picked first). I've attached a patch you'll need to apply to get things > working (LOAD_PATH / require tweaks). > > You'll also want to tell Rails to load what you need: > > config.load_paths += %W(adapter_extensions Text-1.1.2 ruport-1.0.2).map { > |gem| "#{RAILS_ROOT}/vendor/#{gem}/lib" } > > > > To make things even more complex is that I am running on Windows which I > know most Ruby/Rails folks don't use. > > I'm developing on Mac OS X, but deploying to Windows (for AW-ETL > processing). I'd advise to avoid the Time class, because ruby Time is not > cross platform (ie: dates before 1970 won't be supported by ruby under w32, > whereas it works like a charms on Mac OSX). > > cheers! > > Thibaut > -- > http://www.dotnetguru2.org/tbarrere > > [1] - http://piston.rubyforge.org > [2] - > http://blog.nanorails.com/articles/2006/03/28/freeze-all-your-ruby-gems-on-a-shared-host > > > From ottercat at gmail.com Sat Sep 8 12:12:48 2007 From: ottercat at gmail.com (Matt Williams) Date: Sat, 8 Sep 2007 12:12:48 -0400 Subject: [Activewarehouse-discuss] how to append to a table with ETL Message-ID: <5e79bbab0709080912r4cae9d70l117879b92514b2d6@mail.gmail.com> I must be missing something -- I'm wanting to be able to add updates to dimension tables, as well as fact tables, without having to wipe the table. I see that there's a switch in the ETL doc to allow for truncating the table as well as to conditionally set whether or not to load a record, however, the "reference" app -- the load of the svn data does a truncate each time and I didn't see anything pointing me in a direction otherwise. It's probably lack of sleep on my part. Matt -- I can say to myself and the world, "Look at all I am doing, am I not being busy? Am I not contributing? Am I not having an impact on all those around me and with whom I come into contact? 
See, my life has meaning." To which the Tao responds, "You are doing, yes, but you are not being. Slow down, go with the flow, work with life, not against it. By being, you do. By doing, you cease to be." -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/d24841ba/attachment.html From thibaut.barrere at gmail.com Sat Sep 8 16:08:01 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Sat, 8 Sep 2007 22:08:01 +0200 Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE) In-Reply-To: References: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com> Message-ID: <4a68b8cf0709081308u560adf22v6862e547dcb00642@mail.gmail.com> > What database are you using on Windows? Have you run in to the > end-of-line issue yet? I have a patch, which I posted to the mailing > list, to correct the EOL characters with MySQL. The last field in my > csv files before the bulk load had an extra control character on them > since MySQL was just look for \n, not \r\n. If you not doing bulk > loads from a csv I guess you wouldn't see it. I'm using MySQL and using the file destination, then the bulk processor, but I didn't meet any issue with line terminators. I've checked the generated files and their lines end by CR+LF. Neither of my fields has extra characters at the end. I think the issue may lie outside AW-ETL - we should compare our MySQL settings. I have (from MySQL Administrator GUI tool) - server 5.0.45-community-nt - client 5.1.11 - InnoDB tables, encoded in latin1 Do you have something different ? -- Thibaut ps: if you do bulk upload with MySQL, I had to increase Startup Variables / Advanced Networking / Max packet size in order not to get a lost connection message. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/058c194e/attachment.html From thibaut.barrere at gmail.com Sat Sep 8 16:19:04 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Sat, 8 Sep 2007 22:19:04 +0200 Subject: [Activewarehouse-discuss] how to append to a table with ETL In-Reply-To: <5e79bbab0709080912r4cae9d70l117879b92514b2d6@mail.gmail.com> References: <5e79bbab0709080912r4cae9d70l117879b92514b2d6@mail.gmail.com> Message-ID: <4a68b8cf0709081319t20d41b2aw2c9f8d0c73ff5635@mail.gmail.com> Hi Matt, I didn't test that myself because I do wipe everything each night, but here's one thing you can do if you wish to add (not update) new records and only new records: - remove the post_process :truncate - add a check exist processor which will check if the row already exists in the database and remove it from the output if it is present (you can specify multiple columns to compose the key identifying a record) cheers Thibaut -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070908/160f0427/attachment.html From chris.d.williams at gmail.com Sat Sep 8 19:32:21 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Sat, 8 Sep 2007 19:32:21 -0400 Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE) In-Reply-To: <4a68b8cf0709081308u560adf22v6862e547dcb00642@mail.gmail.com> References: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com> <4a68b8cf0709081308u560adf22v6862e547dcb00642@mail.gmail.com> Message-ID: I don't have the MySQL Admin GUI or at least I couldn't find it. I have MySQL-Front as a frontend to look at my database. Here are my MySQL settings that I could find... - server 5.0.41-community-nt - client - ?? Not sure..how do I check - Not sure what I selected for tables when I installed..how can I tell? 
I was using utf8 but switched to latin1 and that didn't help. If you look back a few messages in the mailing list, you will see my patch. The MySQL command can take in a argument to define the end of line chars. Thanks CW On 9/8/07, Thibaut Barr?re wrote: > > > > What database are you using on Windows? Have you run in to the > > end-of-line issue yet? I have a patch, which I posted to the mailing > > list, to correct the EOL characters with MySQL. The last field in my > > csv files before the bulk load had an extra control character on them > > since MySQL was just look for \n, not \r\n. If you not doing bulk > > loads from a csv I guess you wouldn't see it. > > I'm using MySQL and using the file destination, then the bulk processor, but > I didn't meet any issue with line terminators. I've checked the generated > files and their lines end by CR+LF. Neither of my fields has extra > characters at the end. > > I think the issue may lie outside AW-ETL - we should compare our MySQL > settings. > > I have (from MySQL Administrator GUI tool) > - server 5.0.45-community-nt > - client 5.1.11 > - InnoDB tables, encoded in latin1 > > Do you have something different ? > > -- Thibaut > > ps: if you do bulk upload with MySQL, I had to increase Startup Variables / > Advanced Networking / Max packet size in order not to get a lost connection > message. 
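A note on the end-of-line discussion above, as a minimal sketch rather than the actual patch posted to the list: MySQL's LOAD DATA statement accepts a LINES TERMINATED BY clause, so a file written with CR+LF endings can be loaded without a stray control character on the last field. The helper names below (detect_line_terminator, load_data_sql) are hypothetical and not part of AW-ETL or adapter_extensions; the comma field delimiter and the LOCAL keyword are assumptions as well.

```ruby
# Hypothetical helpers for illustration only; not part of ActiveWarehouse ETL.

# Guess the line terminator from a sample of the file contents.
# The sample should be read in binary mode so "\r\n" is not translated.
def detect_line_terminator(sample)
  sample.include?("\r\n") ? "\r\n" : "\n"
end

# Build a MySQL LOAD DATA statement that names the terminator explicitly.
# Assumes comma-delimited fields; file and table names are not escaped here.
def load_data_sql(file, table, columns, terminator)
  # MySQL expects the escape sequences \r and \n spelled out in the statement.
  escaped = terminator.gsub("\r") { '\r' }.gsub("\n") { '\n' }
  "LOAD DATA LOCAL INFILE '#{file}' INTO TABLE #{table} " \
  "FIELDS TERMINATED BY ',' LINES TERMINATED BY '#{escaped}' " \
  "(#{columns.join(', ')})"
end

terminator = detect_line_terminator("1,2,8.0\r\n3,4,7.5\r\n")
puts load_data_sql('output/weekly_charge_facts.txt', 'weekly_charge_facts',
                   [:contract_id, :employee_id, :labor_hours], terminator)
```

Stating the terminator explicitly keeps the load independent of the platform the file was written on, which may be why the same control file behaves differently on two Windows setups.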
>

From thibaut.barrere at gmail.com Sun Sep 9 06:13:03 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Sun, 9 Sep 2007 12:13:03 +0200
Subject: [Activewarehouse-discuss] How to use trunk versions of ETL and Adapter Extensions (AE)
In-Reply-To:
References: <4a68b8cf0709080259m75c530a5v7cee7038aa0ea871@mail.gmail.com> <4a68b8cf0709081308u560adf22v6862e547dcb00642@mail.gmail.com>
Message-ID: <4a68b8cf0709090313g2f564a6dq6ecbab17ed54303e@mail.gmail.com>

> I don't have the MySQL Admin GUI or at least I couldn't find it

=> http://dev.mysql.com/downloads/gui-tools/5.0.html (google mysql gui)

> If you look back a few messages in the mailing list, you will see my
> patch. The MySQL command can take in a argument to define the end of
> line chars.

Yep I saw it (before my previous reply) - but everything seems to work without the patch on my side. We must have missed something somewhere. I'll report back if I discover anything more.

-- Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070909/d634f9ed/attachment.html

From thibaut.barrere at gmail.com Sun Sep 9 12:20:00 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Sun, 9 Sep 2007 18:20:00 +0200
Subject: [Activewarehouse-discuss] Here comes a block processor
Message-ID: <4a68b8cf0709090920w424ba4c6ud6995adc298c74d3@mail.gmail.com>

Hi,

I came up with this implementation, which seems to work well for me and the tests, so I submit it here for review.

The block processor can be used both as a pre/post processor and as an after_read / before_write row processor, but you may want to change that behaviour.

I've decided to make :block the default processor when no processor name is given; it seems like a nice way of doing convention over configuration.

Typical use (with or without the explicit :block):

  pre_process { puts "Well, I'm here. And I'm being called. I don't need to return anything because I'm a global processor" }

  after_read do |row|
    row[:timestamp] = Time.now
    row[:should_be_removed] ? nil : row # return nil to remove the row
  end

  before_write(:block) do |row|
    [row,row,row] # want more rows here for the same price
  end

  post_process { puts "Yep - the demo is complete" }

For added yummy and to make it easier to write tests, you'll find a MockSource and a MockDestination, available during the tests:

  # in the control file
  source :in, { :type => :mock, :name => :block_processed_input }

  # in the test
  MockSource[:block_processed_input] = [{:key => 'test'},{:key => 'another-test'}]

cheers!

Thibaut
--
LoGeek
http://www.dotnetguru2.org/tbarrere
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070909/42a5095d/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: block_processor.patch
Type: application/octet-stream
Size: 10691 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070909/42a5095d/attachment-0001.obj

From ottercat at gmail.com Mon Sep 10 10:20:25 2007
From: ottercat at gmail.com (Matt Williams)
Date: Mon, 10 Sep 2007 10:20:25 -0400
Subject: [Activewarehouse-discuss] Found a couple of glitches w/ etl doc...
Message-ID: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com>

Discovered, I think, Sunday morning at 0-dark-thirty.

Section 5.2.1:

The example uses :check_exist_processor --> there should be no _processor

Also, as an aside, has anyone started an archive of ctl scripts? I'm not saying the ones I did this weekend are super-faboo, but I'd be willing to donate them. vmstat parsing is working (with cron scripts to collect), and ps data is almost there.

Matt

--
I can say to myself and the world, "Look at all I am doing, am I not being busy? Am I not contributing?
Am I not having an impact on all those around me and with whom I come into contact? See, my life has meaning." To which the Tao responds, "You are doing, yes, but you are not being. Slow down, go with the flow, work with life, not against it. By being, you do. By doing, you cease to be." -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070910/fec062c6/attachment.html From chris.d.williams at gmail.com Mon Sep 10 13:14:48 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Mon, 10 Sep 2007 13:14:48 -0400 Subject: [Activewarehouse-discuss] Found a couple of glitches w/ etl doc... In-Reply-To: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com> References: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com> Message-ID: Anthony, is there any way to add/create a wiki somewhere for AW? That would probably be a good place for users to start placing contributions like that. CW On 9/10/07, Matt Williams wrote: > Discovered, I think, sunday morning at 0-dark-thirty. > > Section 5.2.1: > > The example uses :check_exist_processor --> there should be no _processor > > > Also, as an aside, has anyone started an archive of ctl scripts? I'm not > saying the ones I did this weekend are super-faboo, but I'd be willing to > donate them. vmstat parsing is working (with cron scripts to collect), and > ps data is almost there. > > Matt > > -- > I can say to myself and the world, "Look at all I am doing, am I not being > busy? Am I not contributing? Am I not having an impact on all those around > me and with whom I come into contact? See, my life has meaning." > To which the Tao responds, "You are doing, yes, but you are not being. Slow > down, go with the flow, work with life, not against it. By being, you do. By > doing, you cease to be." 
> _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > > From thibaut.barrere at gmail.com Mon Sep 10 13:35:14 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Mon, 10 Sep 2007 19:35:14 +0200 Subject: [Activewarehouse-discuss] Found a couple of glitches w/ etl doc... In-Reply-To: References: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com> Message-ID: <4a68b8cf0709101035t76e21e09g3ac6d8ed39c8fec2@mail.gmail.com> Mephisto uses http://www.stikipad.com/ as a wiki, and it's quite nice (at least from a reader point of view). http://mephisto.stikipad.com/ -- Thibaut From anthonyeden at gmail.com Mon Sep 10 13:39:53 2007 From: anthonyeden at gmail.com (Anthony Eden) Date: Mon, 10 Sep 2007 13:39:53 -0400 Subject: [Activewarehouse-discuss] Found a couple of glitches w/ etl doc... In-Reply-To: <4a68b8cf0709101035t76e21e09g3ac6d8ed39c8fec2@mail.gmail.com> References: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com> <4a68b8cf0709101035t76e21e09g3ac6d8ed39c8fec2@mail.gmail.com> Message-ID: Speaking of which, I just upgraded by Stikipad account, so you should see http://wiki.activewarehouse.org/ pointing to a Stikipad site shortly. V/r Anthony On 9/10/07, Thibaut Barr?re wrote: > Mephisto uses http://www.stikipad.com/ as a wiki, and it's quite nice > (at least from a reader point of view). 
> > http://mephisto.stikipad.com/ > > -- Thibaut > _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > -- Cell: 808 782-5046 Cell: 321 505-0025 Current Location: Melbourne, FL From thibaut.barrere at gmail.com Mon Sep 10 14:10:16 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Mon, 10 Sep 2007 20:10:16 +0200 Subject: [Activewarehouse-discuss] Found a couple of glitches w/ etl doc... In-Reply-To: References: <5e79bbab0709100720h1c3d37e6nb4b6c1f90d170f0b@mail.gmail.com> <4a68b8cf0709101035t76e21e09g3ac6d8ed39c8fec2@mail.gmail.com> Message-ID: <4a68b8cf0709101110j22d93f5eve5469ce9d1276662@mail.gmail.com> great! From thibaut.barrere at gmail.com Mon Sep 10 18:03:43 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Tue, 11 Sep 2007 00:03:43 +0200 Subject: [Activewarehouse-discuss] Modelisation - advice wanted Message-ID: <4a68b8cf0709101503xc3cd61coa03a7d60034e7c3f@mail.gmail.com> Hi, I'm a bit hesitant on a specific point, I'd be happy to get some outside look from the list. I have a Customer dimension, with a date of registration attribute (the date the customer entered in the source system). My users need to be able to query the warehouse to know how many customers were registered, sliced by date dimension. Today it works fine with everything in the dimension record (although it feels strange because the registration is actually a fact - but it works). I've used AW-ETL all the way long, and I'm now playing around with AW, which will only let me use a cube to report on a fact and pivot on dimensions (for good reasons - the same reasons my model feels strange on this point). Should I create a new fact table which would solely contain two fields (the customer id and the date of creation id) for that purpose ? (It could also be a view ?) what do you think ? 
thanks!

Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070911/e3c0aff8/attachment.html

From chris.d.williams at gmail.com Mon Sep 10 20:09:57 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Mon, 10 Sep 2007 20:09:57 -0400
Subject: [Activewarehouse-discuss] How do you create a SQLResolver with a multi-field lookup
In-Reply-To: <4a68b8cf0709032227s66570964ie237f124f8824bfc@mail.gmail.com>
References: <4a68b8cf0709032227s66570964ie237f124f8824bfc@mail.gmail.com>
Message-ID:

How does one create a connection object in the ETL code? Here is what I tried, but I get an error. I decided to just inline the lookup in the ETL script rather than trying to update the current foreign key lookup.

Thanks!!

  transform(:contract_id) do |n,v,r|
    connection = ETL::Engine.connection(:warehouse)
    connection.select_value("SELECT id FROM contract_dimension WHERE charge_number = #{@connection.quote(r[:charge_number])} and secondary_number = #{@connection.quote(r[:secondary_number])}")
  end

Here is the error...
Error transforming from input/actuals_unix.csv on line 1: undefined local variable or method `connection' for # weekly_charge_facts.ctl:71:in `get_binding' ../../vendor/activewarehouse-etl/bin/../lib/etl/transform/block_transform.rb:9:in `call' ../../vendor/activewarehouse-etl/bin/../lib/etl/transform/block_transform.rb:9:in `transform' ../../vendor/activewarehouse-etl/bin/../lib/etl/engine.rb:357:in `process_control' ../../vendor/activewarehouse-etl/bin/../lib/etl/engine.rb:355:in `each' ../../vendor/activewarehouse-etl/bin/../lib/etl/engine.rb:355:in `process_control' ../../vendor/activewarehouse-etl/bin/../lib/etl/engine.rb:354:in `each' ../../vendor/activewarehouse-etl/bin/../lib/etl/engine.rb:354:in `process_control' c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure' c:/ruby/lib/ruby/1.8/benchmark.rb:307:in `realtime' On 9/4/07, Thibaut Barr?re wrote: > Hi Chris > > I had a look at the source (etl/lib/foreign_key_lookup_transform.rb) > and I don't think you can achieve out of the box today (anyone else > ?). > > It could be achieve by doing something like: > 1/ use a first step to build the 'composite' key as an array > transform(:my_composite_key) { |n,v,r| [ r[:first_key], r[:second_key] ] } > 2/ patch SQLResolver (see foreign_key_lookup_transform.rb) to handle > multiple fields and values, and make it support an array as a value > @connection.select_value("SELECT id FROM #{table_name} WHERE ....") > > Other alternative include writing your own transform (either as a real > transform or as a block transform): > > transform(:my_composite_key) do |n,v,r| > connection.select_value("SELECT id FROM MY_TABLE WHERE first_key = > #{@connection.quote(r[:first_key])} and second_key = > #{@connection.quote(r[: second_key])") > end > > In all case, check out the source code which is rather easy to understand > > hope this helps! 
> > Thibaut > From anthonyeden at gmail.com Mon Sep 10 22:53:30 2007 From: anthonyeden at gmail.com (Anthony Eden) Date: Mon, 10 Sep 2007 22:53:30 -0400 Subject: [Activewarehouse-discuss] Modelisation - advice wanted In-Reply-To: <4a68b8cf0709101503xc3cd61coa03a7d60034e7c3f@mail.gmail.com> References: <4a68b8cf0709101503xc3cd61coa03a7d60034e7c3f@mail.gmail.com> Message-ID: I would consider registrations facts. You even said yourself that is feels strange because the registration is actually a fact. I can easily imagine additional dimensions for registration facts - who referred them for example. Additionally, if your customer dimension begins exhibiting the properties of a slowly changing dimension then you will have to work even harder to calculate those registrations without double counting. And just because you create a fact for querying doesn't mean you have to remove that attribute from the dimension. :-) V/r Anthony On 9/10/07, Thibaut Barr?re wrote: > Hi, > > I'm a bit hesitant on a specific point, I'd be happy to get some outside > look from the list. > > I have a Customer dimension, with a date of registration attribute (the date > the customer entered in the source system). > My users need to be able to query the warehouse to know how many customers > were registered, sliced by date dimension. > Today it works fine with everything in the dimension record (although it > feels strange because the registration is actually a fact - but it works). > > I've used AW-ETL all the way long, and I'm now playing around with AW, which > will only let me use a cube to report on a fact and pivot on dimensions (for > good reasons - the same reasons my model feels strange on this point). > > Should I create a new fact table which would solely contain two fields (the > customer id and the date of creation id) for that purpose ? (It could also > be a view ?) > > what do you think ? > > thanks! 
> > Thibaut > > > _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > > -- Cell: 808 782-5046 Cell: 321 505-0025 Current Location: Melbourne, FL From chris.d.williams at gmail.com Tue Sep 11 20:08:05 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Tue, 11 Sep 2007 20:08:05 -0400 Subject: [Activewarehouse-discuss] Need some Fact ETL assistance Message-ID: I am trying to create my fact table and having some naming issues with what I have in my input, foreign_key_lookup, and output columns. I am in-lining the ETL so I can try and talk through what I am trying to do. I am trying to do a foreign key lookup. The value in the row is defined as :employee_perm_number. The foreign key lookup is working fine. Where I am having problems is that the column in the fact table is called employee_id. I tried doing a rename after the foreign key lookup but that doesn't work. Seems like rename occurs before the transform. I did change the name in my :in source from :employee_perm_number to employee_id and that worked, just doesn't seem clean to me. Seems like I have the problem alot since none of my column names in my fact table for the fk's match my :in source. Any suggestions? 
Thanks
CW

# Control file for creating the file revision facts from a Subversion log (in XML format)

log_file = 'input/actuals_unix.csv'
source :in, {
  :file => log_file,
  :parser => :delimited,
  :skip_lines => 1
},
[
  :week_ending_date,
  :employee_perm_number,
  :employee_name,
  :project_number,
  :primary_number,
  :secondary_number,
  :employee_home_group_number,
  :charge_group_number,
  :pay_type_cd,
  :labor_hours,
  :obs,
  :obs_name
]

#transform :date_id, :string_to_date
#transform :date_id, :foreign_key_lookup, {
#  :resolver => SQLResolver.new('date_dimension', 'sql_date_stamp', :warehouse)
#}

transform :employee_perm_number, :foreign_key_lookup, {
  :resolver => SQLResolver.new('employee_dimension', 'employee_perm_number', :warehouse)
}

#rename :employee_perm_number, :employee_id

#transform(:charge_number) {|n,v,r| r[:charge_number] = "D" + r[:project_number].rjust(4, '0') + r[:primary_number].rjust(4, '0') }

#transform(:charge_number_composite_key) {|n,v,r| { "charge_number" => r[:charge_number], "secondary_number" => r[:secondary_number] }}

#transform :charge_number_composite_key, :foreign_key_lookup, {
#  :resolver => SQLResolver.new('employee_dimension', :charge_number_composite_key, :warehouse)
#}

#rename :employee_id, :charge_number_composite_key

transform :labor_hours, :type, :type => :float

outfile = 'output/weekly_charge_facts.txt'
columns = [:contract_id,:employee_id,:labor_hours]

destination :out, {
  :file => outfile
},
{
  :order => columns,
}

post_process :bulk_import, {
  :file => outfile,
  :truncate => true,
  :columns => columns,
  :target => :warehouse,
  :table => 'weekly_charge_facts',
  :line_separator => '\r\n'
}

From anthonyeden at gmail.com Tue Sep 11 20:15:59 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Tue, 11 Sep 2007 20:15:59 -0400
Subject: [Activewarehouse-discuss] Need some Fact ETL assistance
In-Reply-To:
References:
Message-ID:

There are a couple of possible solutions:

1.) copy :employee_perm_number, :employee_id and then execute the transform on :employee_id

2.) Use before_write :rename, :source => :employee_perm_number, :dest => :employee_id (rename is just a shortcut to the RenameProcessor).

Personally I prefer the first option.

HTH.

V/r
Anthony

On 9/11/07, Chris Williams wrote:
> I am trying to create my fact table and having some naming issues with
> what I have in my input, foreign_key_lookup, and output columns. I am
> in-lining the ETL so I can try and talk through what I am trying to
> do.
>
> I am trying to do a foreign key lookup. The value in the row is
> defined as :employee_perm_number. The foreign key lookup is working
> fine. Where I am having problems is that the column in the fact table
> is called employee_id. I tried doing a rename after the foreign key
> lookup but that doesn't work. Seems like rename occurs before the
> transform. I did change the name in my :in source from
> :employee_perm_number to employee_id and that worked, just doesn't
> seem clean to me. Seems like I have the problem alot since none of my
> column names in my fact table for the fk's match my :in source.
>
> Any suggestions?
> > Thanks > CW > > # Control file for creating the file revision facts from a Subversion > log (in XML format) > > log_file = 'input/actuals_unix.csv' > source :in, { > :file => log_file, > :parser => :delimited, > :skip_lines => 1 > }, > [ > :week_ending_date, > :employee_perm_number, > :employee_name, > :project_number, > :primary_number, > :secondary_number, > :employee_home_group_number, > :charge_group_number, > :pay_type_cd, > :labor_hours, > :obs, > :obs_name > ] > > #transform :date_id, :string_to_date > #transform :date_id, :foreign_key_lookup, { > # :resolver => SQLResolver.new('date_dimension', 'sql_date_stamp', :warehouse) > #} > > transform :employee_perm_number, :foreign_key_lookup, { > :resolver => SQLResolver.new('employee_dimension', > 'employee_perm_number', :warehouse) > } > > #rename :employee_perm_number, :employee_id > > #transform(:charge_number) {|n,v,r| r[:charge_number] = "D" + > r[:project_number].rjust(4, '0') + r[:primary_number].rjust(4, '0') } > > #transform(:charge_number_composite_key) {|n,v,r| { "charge_number" => > r[:charge_number], "secondary_number" => r[:secondary_number] }} > > #transform :charge_number_composite_key, :foreign_key_lookup, { > # :resolver => SQLResolver.new('employee_dimension', > :charge_number_composite_key, :warehouse) > #} > > #rename :employee_id, :charge_number_composite_key > > transform :labor_hours, :type, :type => :float > > outfile = 'output/weekly_charge_facts.txt' > columns = [:contract_id,:employee_id,:labor_hours] > > destination :out, { > :file => outfile > }, > { > :order => columns, > } > > post_process :bulk_import, { > :file => outfile, > :truncate => true, > :columns => columns, > :target => :warehouse, > :table => 'weekly_charge_facts', > :line_separator => '\r\n' > } > _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > -- Cell: 808 782-5046 
Cell: 321 505-0025 Current Location: Melbourne, FL From chris.d.williams at gmail.com Tue Sep 11 20:17:40 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Tue, 11 Sep 2007 20:17:40 -0400 Subject: [Activewarehouse-discuss] Need some Fact ETL assistance In-Reply-To: References: Message-ID: I need to draft emails and let them sit before I send them. I thought I tried the copy but then moved it before the transform and it worked fine. Thanks for the quick response! CW On 9/11/07, Anthony Eden wrote: > There a couple possible solutions: > > 1.) copy :employee_perm_number, :employee_id and then execute the > transform on :employee_id > > 2.) Use before_write :rename, :source => :employee_perm_number, :dest > => :employee_id (rename is just a shortcut to the RenameProcessor. > > Personally I prefer the first option. > > HTH. > > V/r > Anthony > > On 9/11/07, Chris Williams wrote: > > I am trying to create my fact table and having some naming issues with > > what I have in my input, foreign_key_lookup, and output columns. I am > > in-lining the ETL so I can try and talk through what I am trying to > > do. > > > > I am trying to do a foreign key lookup. The value in the row is > > defined as :employee_perm_number. The foreign key lookup is working > > fine. Where I am having problems is that the column in the fact table > > is called employee_id. I tried doing a rename after the foreign key > > lookup but that doesn't work. Seems like rename occurs before the > > transform. I did change the name in my :in source from > > :employee_perm_number to employee_id and that worked, just doesn't > > seem clean to me. Seems like I have the problem alot since none of my > > column names in my fact table for the fk's match my :in source. > > > > Any suggestions? 
> > > > Thanks > > CW > > > > # Control file for creating the file revision facts from a Subversion > > log (in XML format) > > > > log_file = 'input/actuals_unix.csv' > > source :in, { > > :file => log_file, > > :parser => :delimited, > > :skip_lines => 1 > > }, > > [ > > :week_ending_date, > > :employee_perm_number, > > :employee_name, > > :project_number, > > :primary_number, > > :secondary_number, > > :employee_home_group_number, > > :charge_group_number, > > :pay_type_cd, > > :labor_hours, > > :obs, > > :obs_name > > ] > > > > #transform :date_id, :string_to_date > > #transform :date_id, :foreign_key_lookup, { > > # :resolver => SQLResolver.new('date_dimension', 'sql_date_stamp', :warehouse) > > #} > > > > transform :employee_perm_number, :foreign_key_lookup, { > > :resolver => SQLResolver.new('employee_dimension', > > 'employee_perm_number', :warehouse) > > } > > > > #rename :employee_perm_number, :employee_id > > > > #transform(:charge_number) {|n,v,r| r[:charge_number] = "D" + > > r[:project_number].rjust(4, '0') + r[:primary_number].rjust(4, '0') } > > > > #transform(:charge_number_composite_key) {|n,v,r| { "charge_number" => > > r[:charge_number], "secondary_number" => r[:secondary_number] }} > > > > #transform :charge_number_composite_key, :foreign_key_lookup, { > > # :resolver => SQLResolver.new('employee_dimension', > > :charge_number_composite_key, :warehouse) > > #} > > > > #rename :employee_id, :charge_number_composite_key > > > > transform :labor_hours, :type, :type => :float > > > > outfile = 'output/weekly_charge_facts.txt' > > columns = [:contract_id,:employee_id,:labor_hours] > > > > destination :out, { > > :file => outfile > > }, > > { > > :order => columns, > > } > > > > post_process :bulk_import, { > > :file => outfile, > > :truncate => true, > > :columns => columns, > > :target => :warehouse, > > :table => 'weekly_charge_facts', > > :line_separator => '\r\n' > > } > > _______________________________________________ > > 
Activewarehouse-discuss mailing list
> > Activewarehouse-discuss at rubyforge.org
> > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
> >
>
>
> --
> Cell: 808 782-5046
> Cell: 321 505-0025
> Current Location: Melbourne, FL
>

From chris.d.williams at gmail.com Tue Sep 11 20:52:57 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Tue, 11 Sep 2007 20:52:57 -0400
Subject: [Activewarehouse-discuss] Multi-column foreign key lookup update...
Message-ID:

I have been tweaking the SQLResolver to get it to work for multi-column foreign key lookups. I am looking for some input on what I have done. I guess my main question is whether it should be a new Resolver instead of an addition to the current SQLResolver.

ETL Comments

The object being transformed for the foreign key needs to be a hash in the format of column/value. Below is an example.

  transform(:charge_number_composite_key) {|n,v,r| { "charge_number" => r[:charge_number], "secondary_number" => r[:secondary_number] }}

Next you need to call the foreign key transform.

  # fetch the fk using the hash
  transform :charge_number_composite_key, :foreign_key_lookup, {
    :resolver => SQLResolver.new('contract_dimension', nil, :warehouse)
  }

The reason I am leaning towards this being a separate Resolver is that nil has to be passed in for the column name. With the multi-column lookup, it isn't needed since the column names are part of the hash. I thought about making this an array of columns and making the object to be transformed an array of values. I didn't do that since it would require the two arrays to match. I figured the hash keeps everything tied together better, without the risk of the arrays getting out of sync.

SQLResolver Update

All I had to do was update the resolve method to build the where clause. This probably looks familiar to Anthony from the ORUG list...

  def resolve(value)
    where_clause = "WHERE "
    if value.kind_of? Hash
      # quote each value through the connection, as the single-column branch does
      where_clause << value.collect { |k,v| "#{k} = #{@connection.quote(v)}" }.join(" AND ")
    else
      where_clause << "#{@field} = #{@connection.quote(value)}"
    end
    @connection.select_value("SELECT id FROM #{table_name} #{where_clause}")
  end

Thanks to Anthony and Thibaut for the help on my posts on this subject!

Thanks
CW

From chris.d.williams at gmail.com Tue Sep 11 22:35:22 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Tue, 11 Sep 2007 22:35:22 -0400
Subject: [Activewarehouse-discuss] AW Wiki Access
Message-ID:

Anthony, how does one get write access to the AW Wiki? I know you just stood it up but was wondering what the process was. I got my simple AW app up and running. The one item I would like to assist with on the Wiki is an updated walk through of your original tutorial. While SVN is up to date, the nice step by step is now out of sync. It seems like the Wiki would be a good place to maintain that tutorial so the AW community can help keep it up to date.

Thanks!
CW

From anthonyeden at gmail.com Wed Sep 12 07:34:38 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Wed, 12 Sep 2007 07:34:38 -0400
Subject: [Activewarehouse-discuss] AW Wiki Access
In-Reply-To:
References:
Message-ID:

I originally set it up with invite only for new authors, however I just opened it up to everyone. If spam becomes an issue then I'll lock it down again.

Feel free to put up an updated tutorial, that would be very helpful.

V/r
Anthony

On 9/11/07, Chris Williams wrote:
> Anthony, how does one get write access to the AW Wiki? I know you
> just stood it up but was wondering what the process was. I got my
> simple AW app up and running. The one item I would like to assist
> with on the Wiki is a updated walk through of your original tutorial.
> While SVN is up to date, the nice step by step is now out of sync. It
> seems like the Wiki would be a could place to maintain that tutorial
> so the AW community can help keep it up to date.
>
> Thanks!
> CW > _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > -- Cell: 808 782-5046 Cell: 321 505-0025 Current Location: Melbourne, FL From thibaut.barrere at gmail.com Thu Sep 13 19:21:25 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Fri, 14 Sep 2007 01:21:25 +0200 Subject: [Activewarehouse-discuss] Here comes a block processor In-Reply-To: <4a68b8cf0709090920w424ba4c6ud6995adc298c74d3@mail.gmail.com> References: <4a68b8cf0709090920w424ba4c6ud6995adc298c74d3@mail.gmail.com> Message-ID: <4a68b8cf0709131621k414d5bdey9b1b09c97a8763ad@mail.gmail.com> For those interested in this feature, I've just committed it into the trunk. cheers Thibaut -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070914/2b3d578c/attachment.html From thibaut.barrere at gmail.com Fri Sep 14 03:19:51 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Fri, 14 Sep 2007 09:19:51 +0200 Subject: [Activewarehouse-discuss] Wiki / updated content Message-ID: <4a68b8cf0709140019t4b9e8f3bm4ad8e561c8b326c1@mail.gmail.com> Hi, I've thrown a couple of troubleshooting info at the wiki. If anyone found glitches which can easily be overcome, add yours! 1. Troubleshooting 1.1. When using the delimited parser, I get an unquoted field error with FasterCSV 1.2. When using SQLServer, can't manage to connect to the database 1.3. When using MySQL on Windows, the bulk processor times out 1.4. 
When specifying a file in the file source, I get no error but the output contains no rows at all

-- Thibaut

From thibaut.barrere at gmail.com Sun Sep 16 05:00:37 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Sun, 16 Sep 2007 11:00:37 +0200
Subject: [Activewarehouse-discuss] Error handling for row transforms - your opinion is wanted!
Message-ID: <4a68b8cf0709160200jf2953bbqe3b03980813525a0@mail.gmail.com>

Hi,

I've noticed that when an error is raised in a row processor, the row still makes it to the next transforms, then to the destination. From my perspective this can lead to very strange things, especially when one transform builds its result based on a previous computation which has failed.

I'd like to know if this behaviour is intended, or if you would be interested in having the row removed from the pipeline when an error occurs (or even in allowing both in a configurable fashion).

What do you guys think ?

Here's a test-case (requires the trunk) to illustrate the issue:

Control file

  source :in, { :type => :mock, :name => :input }
  # trigger an error when required
  after_read { |row| throw "This row is corrupt" if row[:trigger_an_error]; row }
  transform(:field) { |n,v,r| "I've been added" }
  destination :out, { :type => :mock, :name => :output }

Testcase

  def test_error_in_row_processing_should_remove_the_row
    MockSource[:input] = [{ :first_name => 'John', :trigger_an_error => true},{:first_name => 'Gary'}]
    process 'control_with_error.ctl'
    # only rows not raising errors should make their way to the destination
    assert_equal [{:first_name => 'Gary',:field => "I've been added"}], MockDestination[:output]
  end

  1) Failure:
  test_error_in_row_processing_should_remove_the_row(ControlTest) [./test/control_test.rb:42]:
  <[{:field=>"I've been added", :first_name=>"Gary"}]> expected but was
  <[{:field=>"I've been added", :trigger_an_error=>true, :first_name=>"John"}, {:field=>"I've been added", :first_name=>"Gary"}]>.

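The behaviour being asked for (a row that raises is dropped instead of being passed downstream) can be illustrated outside the ETL engine. This is only a minimal sketch in plain Ruby: run_pipeline, check_row and add_field are hypothetical names made up for the illustration and are not part of activewarehouse-etl, and the sketch uses raise rather than throw to signal the error.

```ruby
# Hypothetical sketch: run_pipeline is NOT part of activewarehouse-etl.
# Each processor is a callable that takes a row (Hash) and returns a row.
def run_pipeline(rows, processors)
  kept = []
  errors = []
  rows.each do |row|
    begin
      # Work on a copy so a failing processor cannot leave a half-modified row behind.
      result = processors.inject(row.dup) { |current, processor| processor.call(current) }
      kept << result
    rescue StandardError => e
      # Record the error and skip the row instead of sending it to the destination.
      errors << "#{e.class}: #{e.message}"
    end
  end
  [kept, errors]
end

# Stand-ins for the after_read block and the transform in the control file.
check_row = lambda do |row|
  raise "This row is corrupt" if row[:trigger_an_error]
  row
end
add_field = lambda { |row| row.merge(:field => "I've been added") }

rows = [{ :first_name => 'John', :trigger_an_error => true }, { :first_name => 'Gary' }]
output, errors = run_pipeline(rows, [check_row, add_field])
```

With this shape, only the 'Gary' row reaches the output and the failure is kept in a separate list, which would leave room for a configurable policy (skip the row, or stop the whole run).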
Thibaut Barrère
--
LoGeek
[blog] http://www.dotnetguru2.org/tbarrere
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070916/f88e4bf5/attachment-0001.html

From anthonyeden at gmail.com Sun Sep 16 11:02:45 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Sun, 16 Sep 2007 17:02:45 +0200
Subject: [Activewarehouse-discuss] Error handling for row transforms - your opinion is wanted!
In-Reply-To: <4a68b8cf0709160200jf2953bbqe3b03980813525a0@mail.gmail.com>
References: <4a68b8cf0709160200jf2953bbqe3b03980813525a0@mail.gmail.com>
Message-ID:

I agree, an error should cause the row to not be included in the output. A fatal should kill the ETL process and a warning should be just that, a warning, so it makes sense that the error level allows the process to continue and skip the row.

V/r
Anthony

On 9/16/07, Thibaut Barrère wrote:
> Hi,
>
> I've noticed that when an error is raised in a row processor, the row still
> makes it to the next transforms, then to the destination.
> From my perspective this can lead to very strange things, especially when
> one transform builds its result based on a previous computation which has
> failed.
>
> I'd like to know if this behaviour is intended, or if you would be
> interested in having the row removed from the pipeline when an error occurs
> (or even in allowing both in a configurable fashion).
>
> What do you guys think ?
>
> Here's a test-case (requires the trunk) to illustrate the issue:
>
> Control file
>
> source :in, { :type => :mock, :name => :input }
> # trigger an error when required
> after_read { |row| throw "This row is corrupt" if row[:trigger_an_error];
> row }
> transform(:field) { |n,v,r| "I've been added" }
> destination :out, { :type => :mock, :name => :output }
>
> Testcase
>
> def test_error_in_row_processing_should_remove_the_row
> MockSource[:input] = [{ :first_name => 'John', :trigger_an_error =>
> true},{:first_name => 'Gary'}]
> process 'control_with_error.ctl'
> # only rows not raising errors should make their way to the destination
> assert_equal [{:first_name => 'Gary',:field => "I've been added"}],
> MockDestination[:output]
> end
>
> 1) Failure:
> test_error_in_row_processing_should_remove_the_row(ControlTest)
> [./test/control_test.rb:42]:
> <[{:field=>"I've been added", :first_name=>"Gary"}]> expected but was
> <[{:field=>"I've been added", :trigger_an_error=>true, :first_name=>"John"},
> {:field=>"I've been added", :first_name=>"Gary"}]>.
>
>
> Thibaut Barrère
> --
> LoGeek
> [blog] http://www.dotnetguru2.org/tbarrere
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
Cell: 321 473-4966
Current Location: Melbourne, FL

From thibaut.barrere at gmail.com Sun Sep 16 15:17:48 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Sun, 16 Sep 2007 21:17:48 +0200
Subject: [Activewarehouse-discuss] Error handling for row transforms - your opinion is wanted!
In-Reply-To:
References: <4a68b8cf0709160200jf2953bbqe3b03980813525a0@mail.gmail.com>
Message-ID: <4a68b8cf0709161217m71f96491qa296e81e9dfde1a8@mail.gmail.com>

I've created http://rubyforge.org/tracker/index.php?func=detail&aid=13990&group_id=2435&atid=9387 for this - I'll have a look later this week.

cheers Thibaut -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070916/b8df056e/attachment.html From thibaut.barrere at gmail.com Sun Sep 16 16:11:22 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Sun, 16 Sep 2007 22:11:22 +0200 Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models Message-ID: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com> Hi, I'm using ActiveRecord in my screens, and must admit I've become addicted because it's very handy: screen(:fatal) do well_known_customer = CustomerDimension.find_by_name("Acme") assert_not_nil well_known_customer, "Acme customer not found" assert_equal %w(chocolate ice-cream), well_known_customer.preferred_products.map(&:name) end I've spotted a limitation though: when using use_temp_tables, ActiveRecord tables are not affected and the target table still is customer_dimension instead of tmp_customer_dimension. This most likely impacts the following areas: - model source - foreign key look-up with ActiveRecord resolver - screens relying on ActiveRecord (example above) A work-around which seems to work is: CustomerDimension.set_table_name('tmp_customer_dimension') before the screen (in my case). Thoughts ? Thibaut -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070916/99184311/attachment.html

From anthonyeden at gmail.com Mon Sep 17 07:05:38 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Mon, 17 Sep 2007 13:05:38 +0200
Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models
In-Reply-To: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com>
References: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com>
Message-ID:

Use:

  ETL::Engine.table(table_name, connection)

And it will return a modified name if necessary. This method will also construct the temp table if necessary.

Word of warning, I am still seeing behavior with MySQL whereby indexes are not being retained in the temp table and thus will be lost when the temp table is renamed to the production table. You have been warned.

V/r
Anthony

On 9/16/07, Thibaut Barrère wrote:
> Hi,
>
> I'm using ActiveRecord in my screens, and must admit I've become addicted
> because it's very handy:
>
> screen(:fatal) do
> well_known_customer = CustomerDimension.find_by_name("Acme")
> assert_not_nil well_known_customer, "Acme customer not found"
> assert_equal %w(chocolate ice-cream),
> well_known_customer.preferred_products.map(&:name)
> end
>
> I've spotted a limitation though: when using use_temp_tables, ActiveRecord
> tables are not affected and the target table still is customer_dimension
> instead of tmp_customer_dimension.
>
> This most likely impacts the following areas:
> - model source
> - foreign key look-up with ActiveRecord resolver
> - screens relying on ActiveRecord (example above)
>
> A work-around which seems to work is:
>
> CustomerDimension.set_table_name('tmp_customer_dimension')
>
> before the screen (in my case).
>
> Thoughts ?
>
> Thibaut
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
>

--
Cell: 321 473-4966
Current Location: Melbourne, FL

From thibaut.barrere at gmail.com Mon Sep 17 07:15:04 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Mon, 17 Sep 2007 13:15:04 +0200
Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models
In-Reply-To:
References: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com>
Message-ID: <4a68b8cf0709170415v59ea3d2dh6360b69088867ce3@mail.gmail.com>

> Word of warning, I am still seeing behavior with MySQL whereby indexes
> are not being retained in the temp table and thus will be lost when
> the temp table is renamed to the production table. You have been
> warned.

Yep, my screens have detected this: the self-incrementing primary key is always set to 0.

Maybe I'll choose to use a notion of temporary database instead of temporary table (either using the new MySQL rename database, or a mysqldump). Another option would be to rename the tables to something like "production_customers" at the end, and have the views pointing to these (and the end-user would only see the views).

Just raw thoughts.

From anthonyeden at gmail.com Mon Sep 17 07:41:32 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Mon, 17 Sep 2007 13:41:32 +0200
Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models
In-Reply-To: <4a68b8cf0709170415v59ea3d2dh6360b69088867ce3@mail.gmail.com>
References: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com> <4a68b8cf0709170415v59ea3d2dh6360b69088867ce3@mail.gmail.com>
Message-ID:

On 9/17/07, Thibaut Barrère wrote:
> > Word of warning, I am still seeing behavior with MySQL whereby indexes
> > are not being retained in the temp table and thus will be lost when
> > the temp table is renamed to the production table. You have been
> > warned.
>
> Yep, my screens have detected this: the self-incrementing primary key
> is always set to 0.
>
> Maybe I'll choose to use a notion of temporary database instead of
> temporary table (either using the new MySQL rename database, or a
> mysqldump). Another option would be to rename the tables to something
> like "production_customers" at the end, and have the views pointing to
> these (and the end-user would only see the views).

I'm going to work on fixing the temp table functionality in MySQL.

V/r
Anthony

>
> Just raw thoughts.
>

--
Cell: 321 473-4966
Current Location: Melbourne, FL

From thibaut.barrere at gmail.com Mon Sep 17 07:59:12 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Mon, 17 Sep 2007 13:59:12 +0200
Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models
In-Reply-To:
References: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com> <4a68b8cf0709170415v59ea3d2dh6360b69088867ce3@mail.gmail.com>
Message-ID: <4a68b8cf0709170459x5a2f5393w7a167237e9ddf876@mail.gmail.com>

Great - keep me posted!

Raw ideas again, but what I've been doing at some point in my tests was traversing ObjectSpace to get all the subclasses of ActiveWarehouse::Fact or Dimension, and assign them table_name = "tmp_" + table_name (unless the table_name already begins with "tmp_", which can happen if you include a "common.rb" in each of your control files then call this method to ensure every AR in scope is properly "temporised"). Basically I think that unless specified otherwise, when use_temp_tables is called with true, we could handle the AR patching behind. If you find this (or something similar) interesting and not too much black-magic, I could work on a patch (I will do it for myself anyway when you'll have fixed the temp table functionality). Still on this topic, here's a use case which I'd like to improve: use_temp_tables # use a variable for further reference table = "date_dimension" ... destination :table => table # (handles temp table mapping alone as I seem to remember) # this won't work because the table is not translated to the temp table screen(:fatal) do connection.query("select distinct customer_type from #{table}") end Basically, I think we'd just need a small helper which would do : def table_name(table_name) use_temp_tables? ? "tmp_#{table_name}" : table_name end (that's the idea, not the implementation) to allow a more seamless use in the screen. Now all this is just polishing - but I think it brings value. 
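The naming rule behind that helper (prefix with "tmp_" only when temp tables are in use, and never prefix twice) can be written down as plain Ruby. This is a sketch of the idea only: the TempTableNaming module and its resolve method are hypothetical names invented for the illustration, the on/off flag is passed in explicitly rather than read from the engine, and nothing here is part of activewarehouse-etl or ActiveRecord.

```ruby
# Hypothetical helper: TempTableNaming is illustrative only and is NOT
# part of activewarehouse-etl or ActiveRecord.
module TempTableNaming
  PREFIX = "tmp_"

  # Returns the temp table name when temp tables are enabled, otherwise
  # the name unchanged. Names that already carry the prefix are left
  # alone, so calling the helper more than once is harmless.
  def self.resolve(table_name, use_temp_tables)
    name = table_name.to_s
    return name unless use_temp_tables
    name.start_with?(PREFIX) ? name : "#{PREFIX}#{name}"
  end
end

puts TempTableNaming.resolve("date_dimension", true)
puts TempTableNaming.resolve("tmp_date_dimension", true)
puts TempTableNaming.resolve(:date_dimension, false)
```

In a screen, the resolved name would then be interpolated into the query in place of the raw table variable.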
Thibaut

From anthonyeden at gmail.com Mon Sep 17 08:30:48 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Mon, 17 Sep 2007 14:30:48 +0200
Subject: [Activewarehouse-discuss] use_temp_tables limitation when used with ActiveRecord models
In-Reply-To: <4a68b8cf0709170459x5a2f5393w7a167237e9ddf876@mail.gmail.com>
References: <4a68b8cf0709161311m395efa49r5a063240c65505de@mail.gmail.com> <4a68b8cf0709170415v59ea3d2dh6360b69088867ce3@mail.gmail.com> <4a68b8cf0709170459x5a2f5393w7a167237e9ddf876@mail.gmail.com>
Message-ID:

On 9/17/07, Thibaut Barrère wrote:
> Great - keep me posted!
>
> Raw ideas again, but what I've been doing at some point in my tests
> was traversing ObjectSpace to get all the subclasses of
> ActiveWarehouse::Fact or Dimension, and assign them table_name =
> "tmp_" + table_name (unless the table_name already begins with "tmp_",
> which can happen if you include a "common.rb" in each of your control
> files then call this method to ensure every AR in scope is properly
> "temporised").
>
> Basically I think that unless specified otherwise, when
> use_temp_tables is called with true, we could handle the AR patching
> behind.
>
> If you find this (or something similar) interesting and not too much
> black-magic, I could work on a patch (I will do it for myself anyway
> when you'll have fixed the temp table functionality).

So this would allow you to easily use the AR objects...yes, I'd like to see this happen, and I think it could be mixed into AR::Base at runtime if temp tables are being used, something overriding table_name perhaps?

On a related note I've actually found a better way to create the temp table with MySQL, using two statements. It'll require an update to adapter extensions, but I think it's the right way to go.

> Still on this topic, here's a use case which I'd like to improve:
>
> use_temp_tables
> # use a variable for further reference
> table = "date_dimension"
> ...
> destination :table => table # (handles temp table mapping alone as I
> seem to remember)
>
> # this won't work because the table is not translated to the temp table
> screen(:fatal) do
> connection.query("select distinct customer_type from #{table}")
> end
>
> Basically, I think we'd just need a small helper which would do :
> def table_name(table_name)
> use_temp_tables? ? "tmp_#{table_name}" : table_name
> end
>
> (that's the idea, not the implementation)
>
> to allow a more seamless use in the screen.

Yes, I think that would be a good helper, although if possible have it get the table name from ETL::Engine.table_name.

> Now all this is just polishing - but I think it brings value.

Agreed.

V/r
Anthony

--
Cell: 321 473-4966
Current Location: Melbourne, FL

From endersonmaia at gmail.com Mon Sep 17 10:19:11 2007
From: endersonmaia at gmail.com (Enderson Maia)
Date: Mon, 17 Sep 2007 11:19:11 -0300
Subject: [Activewarehouse-discuss] bulk-import with update option
Message-ID: <228840e70709170719y94b876dw690e99e060e76f25@mail.gmail.com>

Is there a way to make bulk-import update records when a duplicate is found ?

If not, this is a feature-request! :)

It's possible with http://www.continuousthinking.com/tags/arext/rdoc/index.html and the import method.

I have a case where I need to search two tables for a result: the first time I make an INSERT, and then I make an INSERT with 'ON DUPLICATE KEY UPDATE'.

I didn't look at the aw-etl source, and I'm not that familiar with Ruby, but as I get better, maybe I'll try to make aw-etl work with UPDATE.

-- Enderson Maia

From chris.d.williams at gmail.com Mon Sep 17 17:23:44 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Mon, 17 Sep 2007 17:23:44 -0400
Subject: [Activewarehouse-discuss] Multiple Dimension Report Question
Message-ID:

I have three dimensions: date, employee, and contract number. My fact table captures labor hours for each employee. The report I am trying to create is the following.
By Contract Number Column is date Row is employee class (from the employee dimension) I have been able to create a report with Column is date and Row is employee class but can't figure out how to get this by contract number. Do I have to create a custom report class? Is this possible out of the box? Thanks CW From tfakes at zipperint.com Tue Sep 18 14:38:05 2007 From: tfakes at zipperint.com (Tom Fakes) Date: Tue, 18 Sep 2007 11:38:05 -0700 Subject: [Activewarehouse-discuss] Figured out the issues I was havingwith the demo app In-Reply-To: References: Message-ID: <446B70CD673A974BB5D2E25385B912EF2568D6@troon.zipperint.com> This is close, but I think there is one more thing that needs to be done to make a control file work on all platforms. Here's the problem. On Windows, with the previous patches, this now works: FileDestination eol => \n (the default) Bulk Import line_separator => \r\n But this control file will not work on Unixes I'd like the defaults to work on both platforms: FileDestination eol => \n => \r\n (windows) \n (others) Bulk Import line_separator => \n => \n (all platforms) Sadly, the Ruby file output, in text mode, converts \n to \r\n. The fix for this is to open the file in Binary mode to stop all output translation. A simple change in file_destination.rb - add 'b' to the mode strings returned: def mode append ? 'ab' : 'wb' end -----Original Message----- From: activewarehouse-discuss-bounces at rubyforge.org [mailto:activewarehouse-discuss-bounces at rubyforge.org] On Behalf Of Chris Williams Sent: Sunday, August 19, 2007 12:46 PM To: Anthony Eden; activewarehouse-discuss at rubyforge.org Subject: Re: [Activewarehouse-discuss] Figured out the issues I was havingwith the demo app Anthony, here are some patches I came up with for the activewarehouse-etl and adapter_extensions. With these patches and adding the :line_separator => '\r\n' to the bulk_import portion of the etl scripts, it fixed the problem on windows with mysql. 
From what I could tell, Postgres doesn't have a similar line_separator option and I made an attempt at a fix for sqlserver.

I am still working on creating some real test cases to pass on. Either way, here is a description of the changes.

activewarehouse-etl/lib/etl/processor/bulk_import_processor.rb
I created a new lines item in the hash. I did this since that option is line based, not field based.

adapter_extensions/lib/adapter_extensions/connection_adapters/mysql_adapter.rb
Added the additional commands to the MySQL copy command to define the line_separator.

adapter_extensions/lib/adapter_extensions/connection_adapters/sqlserver_adapter.rb
I added a -r to the bcp command to define the line_separator (not tested)

author_dimension.txt
Here is an example of a file that has \r\n for the end of line.

Let me know if you have any questions.
Thanks
CW

On 8/19/07, Anthony Eden wrote:
> On 8/19/07, Chris Williams wrote:
> > I think I have narrowed it down to the bulk importer but I have hit a
> > wall. I dumped the authors table from my MySQL database and the extra
> > character is \r. I tried changing the author_dimension.ctl to include
> > the :line_separator => '\r\n' to the post_process item but that didn't
> > help. Where is the conn.bulk_load defined? Is that outside the ETL
> > code base? I searched the code and didn't see bulk_load referenced.
>
> It's defined in the adapter_extensions library, which is another
> library under the ActiveWarehouse umbrella project.
> > V/r > Anthony > > -- > Cell: 808 782-5046 > Current Location: Melbourne, FL > From thibaut.barrere at gmail.com Tue Sep 18 15:40:03 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Tue, 18 Sep 2007 21:40:03 +0200 Subject: [Activewarehouse-discuss] various bulk load issues (MySQL, SQLServer) Message-ID: <4a68b8cf0709181240t2ae8c939ocb19fd11dd997a83@mail.gmail.com> Here's a bunch of issues I've spotted, related to adapter_extensions, or to the way the databases work, anyway: * bulk load time-out on MySQL/Windows Only happens on MySQL/Windows. As soon as the file to bulk load is bigger than the maximum packet size (see mysql startup variables), the bulk ends up with "connection lost" or something similar. Pretty annoying if you must increase the value as time passes (more data etc). It may be purely related to the existing adapter. * bulk load does not work at all on SQLServer Seems broken to me, the following patch fixes it at least partially: - def do_bulk_load(file, table_name, options={}) + def do_bulk_load(filename, table_name, options={}) Even after fixing this, I get various glitches (encoding issues etc) but I did not look into it for the moment. As well, does anyone knows if ADO.rb is going (or is already maybe somewhere ?) to be included in the standard Ruby distribution ? cheers Thibaut From anthonyeden at gmail.com Tue Sep 18 15:51:52 2007 From: anthonyeden at gmail.com (Anthony Eden) Date: Tue, 18 Sep 2007 21:51:52 +0200 Subject: [Activewarehouse-discuss] various bulk load issues (MySQL, SQLServer) In-Reply-To: <4a68b8cf0709181240t2ae8c939ocb19fd11dd997a83@mail.gmail.com> References: <4a68b8cf0709181240t2ae8c939ocb19fd11dd997a83@mail.gmail.com> Message-ID: On 9/18/07, Thibaut Barr?re wrote: > Here's a bunch of issues I've spotted, related to adapter_extensions, > or to the way the databases work, anyway: > > * bulk load time-out on MySQL/Windows > > Only happens on MySQL/Windows. 
As soon as the file to bulk load is > bigger than the maximum packet size (see mysql startup variables), the > bulk ends up with "connection lost" or something similar. Pretty > annoying if you must increase the value as time passes (more data > etc). It may be purely related to the existing adapter. It is an adapter issue. I actually tracked it down and I'm pretty sure I wrote up something on how to fix it, but I can't find it at the moment. I'll get back to you on it. > > * bulk load does not work at all on SQLServer > > Seems broken to me, the following patch fixes it at least partially: > - def do_bulk_load(file, table_name, options={}) > + def do_bulk_load(filename, table_name, options={}) > > Even after fixing this, I get various glitches (encoding issues etc) > but I did not look into it for the moment. Unfortunately I don't have a SQL Server instance up and running and easy to test on. I need to change that. > As well, does anyone knows if ADO.rb is going (or is already maybe > somewhere ?) to be included in the standard Ruby distribution ? I don't think it will, but I could be wrong. V/r Anthony -- Cell: 321 473-4966 Current Location: Berlin, Germany From thibaut.barrere at gmail.com Tue Sep 18 16:01:08 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Tue, 18 Sep 2007 22:01:08 +0200 Subject: [Activewarehouse-discuss] various bulk load issues (MySQL, SQLServer) In-Reply-To: References: <4a68b8cf0709181240t2ae8c939ocb19fd11dd997a83@mail.gmail.com> Message-ID: <4a68b8cf0709181301p5ae8813y6c6dd14fa897f114@mail.gmail.com> > It is an adapter issue. I actually tracked it down and I'm pretty sure > I wrote up something on how to fix it, but I can't find it at the > moment. I'll get back to you on it. thanks - I should be able to do some testing by the end of the week on my Windows Parallels instance if you find it back. > Unfortunately I don't have a SQL Server instance up and running and > easy to test on. I need to change that. 
Not promising anything here, but I have installed an instance of SQLServer Express [1] in my Windows VM, which I think has BCP bundled. I may have a look after my current datawarehouse is finished. Thibaut [1] http://technet.microsoft.com/fr-fr/library/ms345154.aspx From thibaut.barrere at gmail.com Tue Sep 18 18:24:55 2007 From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=) Date: Wed, 19 Sep 2007 00:24:55 +0200 Subject: [Activewarehouse-discuss] Snippet: a Rake task to copy a whole database (MySQL only) Message-ID: <4a68b8cf0709181524u572e4060ndefd5086b9879ee3@mail.gmail.com> Hi, in case it's useful to someone else, here's a Rake task I wrote to simulate a 'database rename' in MySQL (database rename is available in MySQL 5.1, so the Rake task is relevant only for 5.0 or before, where it's not available). I use it to publish my warehouse once everything is loaded and all the screens have passed - so the idea is a bit similar to using temp tables, except that the temp tables will perform better than dumping everything if you have a lot of data, not to forget that not everybody is granted the right to drop and create database (eg: issues on shared hosting for instance). 
Anyway, here's the snippet:

  def run_query(command,query)
    credentials = "--user=#{@username} --password=#{@password}"
    puts "Launching #{command} with '#{query}'"
    # abort the task if the external command fails
    raise "Error while running #{command} with '#{query}'" unless system("#{command} #{credentials} #{query}")
  end

  desc "Publish the warehouse (MySQL only)"
  task :publish_warehouse => :environment do
    config = ActiveRecord::Base.configurations[RAILS_ENV]
    database_name, @username, @password = %w(database username password).map { |e| config[e] }
    published_database_name = "#{database_name}_published"
    backup_file = "#{database_name}.bak"
    # dump the loaded warehouse, then recreate the published copy from the dump
    run_query 'mysqldump',"#{database_name} > #{backup_file}"
    run_query 'mysql', "-e \"drop database if exists #{published_database_name}\""
    run_query 'mysql', "-e \"create database #{published_database_name}\""
    run_query 'mysql', "#{published_database_name} < #{backup_file} "
  end

cheers
Thibaut

From thibaut.barrere at gmail.com Tue Sep 18 18:55:03 2007
From: thibaut.barrere at gmail.com (=?ISO-8859-1?Q?Thibaut_Barr=E8re?=)
Date: Wed, 19 Sep 2007 00:55:03 +0200
Subject: [Activewarehouse-discuss] Feature proposal: stop the processing as soon as the error threshold is reached
Message-ID: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com>

http://rubyforge.org/tracker/index.php?func=detail&aid=14055&group_id=2435&atid=9387

Today it seems that the processing is carried out in full, and only then is the number of errors checked to see if the process should be stopped. I propose to just stop as soon as the threshold is reached.

Your opinion ? (if ok I can work on a patch this week)

-- Thibaut
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/activewarehouse-discuss/attachments/20070919/d26ec4fb/attachment.html From chris.d.williams at gmail.com Tue Sep 18 19:32:30 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Tue, 18 Sep 2007 19:32:30 -0400 Subject: [Activewarehouse-discuss] Feature proposal: stop the processing as soon as the error threshold is reached In-Reply-To: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com> References: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com> Message-ID: Sounds good to me..for what it is worth :) On 9/18/07, Thibaut Barr?re wrote: > http://rubyforge.org/tracker/index.php?func=detail&aid=14055&group_id=2435&atid=9387 > > Today it seems that the processing is carried out, then the number of errors > is checked to see if the process should > be stopped. I propose to just stop as soon as the threshold is reached. > > Your opinion ? (if ok I can work on a patch this week) > > -- Thibaut > > _______________________________________________ > Activewarehouse-discuss mailing list > Activewarehouse-discuss at rubyforge.org > http://rubyforge.org/mailman/listinfo/activewarehouse-discuss > > From chris.d.williams at gmail.com Tue Sep 18 19:53:05 2007 From: chris.d.williams at gmail.com (Chris Williams) Date: Tue, 18 Sep 2007 19:53:05 -0400 Subject: [Activewarehouse-discuss] Figured out the issues I was havingwith the demo app In-Reply-To: <446B70CD673A974BB5D2E25385B912EF2568D6@troon.zipperint.com> References: <446B70CD673A974BB5D2E25385B912EF2568D6@troon.zipperint.com> Message-ID: If the mode is changed to ab/wb as the default, will the eol option be used at all? I didn't know that was even an option for the file destination. Anthony, should I open a ticket in the tracker to document these fixes so they can get integrated? Thanks CW On 9/18/07, Tom Fakes wrote: > This is close, but I think there is one more thing that needs to be done > to make a control file work on all platforms. 
>
> Here's the problem. On Windows, with the previous patches, this now
> works:
>   FileDestination eol => \n (the default)
>   Bulk Import line_separator => \r\n
>
> But this control file will not work on Unixes
>
> I'd like the defaults to work on both platforms:
>   FileDestination eol => \n => \r\n (windows) \n (others)
>   Bulk Import line_separator => \n => \n (all platforms)
>
> Sadly, the Ruby file output, in text mode, converts \n to \r\n. The fix
> for this is to open the file in Binary mode to stop all output
> translation.
>
> A simple change in file_destination.rb - add 'b' to the mode strings
> returned:
>
>   def mode
>     append ? 'ab' : 'wb'
>   end
>
>
>
> -----Original Message-----
> From: activewarehouse-discuss-bounces at rubyforge.org
> [mailto:activewarehouse-discuss-bounces at rubyforge.org] On Behalf Of
> Chris Williams
> Sent: Sunday, August 19, 2007 12:46 PM
> To: Anthony Eden; activewarehouse-discuss at rubyforge.org
> Subject: Re: [Activewarehouse-discuss] Figured out the issues I was
> having with the demo app
>
> Anthony, here are some patches I came up with for the
> activewarehouse-etl and adapter_extensions. With these patches and
> adding the :line_separator => '\r\n' to the bulk_import portion of the
> etl scripts, it fixed the problem on windows with mysql. From what I
> could tell, Postgres doesn't have a similar line_separator option and
> I made an attempt at a fix for sqlserver.
>
> I am still working on creating some real test cases to pass on.
> Either way, here is a description of the changes.
>
> activewarehouse-etl/lib/etl/processor/bulk_import_processor.rb
> I created a new lines item in the hash. I did this since that
> option is line based not field based.
>
> adapter_extensions/lib/adapter_extensions/connection_adapters/mysql_adapter.rb
> Added the additional commands to the MySQL copy command to define the
> line_separator.
>
> adapter_extensions/lib/adapter_extensions/connection_adapters/sqlserver_adapter.rb
> I added a -r to the bcp command to define the line_separator (not
> tested)
>
> author_dimension.txt
> Here is an example of a file that has the \r\n for the end of line.
>
> Let me know if you have any questions.
> Thanks
> CW
>
> On 8/19/07, Anthony Eden wrote:
> > On 8/19/07, Chris Williams wrote:
> > > I think I have narrowed it down to the bulk importer but I have hit a
> > > wall. I dumped the authors table from my MySQL database and the extra
> > > character is \r. I tried changing the author_dimension.ctl to include
> > > the :line_separator => '\r\n' to the post_process item but that didn't
> > > help. Where is the conn.bulk_load defined? Is that outside the ETL
> > > code base? I searched the code and didn't see bulk_load referenced.
> >
> > It's defined in the adapter_extensions library, which is another
> > library under the ActiveWarehouse umbrella project.
> >
> > V/r
> > Anthony
> >
> > --
> > Cell: 808 782-5046
> > Current Location: Melbourne, FL
> >
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss at rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>

From thibaut.barrere at gmail.com  Wed Sep 19 02:51:06 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 19 Sep 2007 08:51:06 +0200
Subject: [Activewarehouse-discuss] various bulk load issues (MySQL, SQLServer)
In-Reply-To:
References: <4a68b8cf0709181240t2ae8c939ocb19fd11dd997a83@mail.gmail.com>
Message-ID: <4a68b8cf0709182351t7c52b77bidd7fe98d474616b9@mail.gmail.com>

> It is an adapter issue. I actually tracked it down and I'm pretty sure
> I wrote up something on how to fix it, but I can't find it at the
> moment. I'll get back to you on it.

Ouch - today it seems that I have reached a limit: increasing the maximum
packet size no longer lets the bulk load work (the file size is around
17 megabytes).

I'm going to investigate further. I'm not sure whether it's related to the
same issue, but your write-up will be even more valuable to me now!

(I'm not blocked right now though, I can work on other stuff for the moment.)

cheers

Thibaut

From anthonyeden at gmail.com  Wed Sep 19 03:32:01 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Wed, 19 Sep 2007 09:32:01 +0200
Subject: [Activewarehouse-discuss] Feature proposal: stop the processing as soon as the error threshold is reached
In-Reply-To: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com>
References: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com>
Message-ID:

On 9/19/07, Thibaut Barrère wrote:
> http://rubyforge.org/tracker/index.php?func=detail&aid=14055&group_id=2435&atid=9387
>
> Today it seems that the processing is carried out, then the number of errors
> is checked to see if the process should
> be stopped. I propose to just stop as soon as the threshold is reached.
>
> Your opinion ? (if ok I can work on a patch this week)

+1

V/r
Anthony

--
Cell: 321 473-4966
Current Location: Berlin, Germany

From thibaut.barrere at gmail.com  Wed Sep 19 03:41:17 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Wed, 19 Sep 2007 09:41:17 +0200
Subject: [Activewarehouse-discuss] Feature proposal: stop the processing as soon as the error threshold is reached
In-Reply-To:
References: <4a68b8cf0709181555s3ede0715r219fb6a1d907dce2@mail.gmail.com>
Message-ID: <4a68b8cf0709190041u34133c56pcfca683edc865653@mail.gmail.com>

Hi guys,

thanks for the feedback - I'll have a look this week (I already have a
working test case to reproduce it, and I'd like to stop writing a huge
etl.log :-/ ).

Thibaut

From anthonyeden at gmail.com  Wed Sep 19 04:39:48 2007
From: anthonyeden at gmail.com (Anthony Eden)
Date: Wed, 19 Sep 2007 10:39:48 +0200
Subject: [Activewarehouse-discuss] Figured out the issues I was having with the demo app
In-Reply-To:
References: <446B70CD673A974BB5D2E25385B912EF2568D6@troon.zipperint.com>
Message-ID:

On 9/19/07, Chris Williams wrote:
> If the mode is changed to ab/wb as the default, will the eol option be
> used at all? I didn't know that was even an option for the file
> destination.
>
> Anthony, should I open a ticket in the tracker to document these fixes
> so they can get integrated?

Yes, go ahead and do that.

V/r
Anthony

--
Cell: 321 473-4966
Current Location: Berlin, Germany

From tfakes at zipperint.com  Wed Sep 19 14:40:49 2007
From: tfakes at zipperint.com (Tom Fakes)
Date: Wed, 19 Sep 2007 11:40:49 -0700
Subject: [Activewarehouse-discuss] Figured out the issues I was having with the demo app
In-Reply-To:
References: <446B70CD673A974BB5D2E25385B912EF2568D6@troon.zipperint.com>
Message-ID: <446B70CD673A974BB5D2E25385B912EF2568D9@troon.zipperint.com>

The eol option is still used: its default value of \n ends up in the file
as a single character on all platforms, which MySQL reads correctly on all
platforms. I don't know about other bulk loaders.

I don't know whether the Windows Ruby libraries convert any other character
sequences in text mode that would cause problems elsewhere.

Binary mode only makes a difference on Windows, but it doesn't break
anything on my Red Hat installation at least.

From thibaut.barrere at gmail.com  Thu Sep 20 04:20:09 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Thu, 20 Sep 2007 10:20:09 +0200
Subject: [Activewarehouse-discuss] Crossing borders - anyone used JRuby or other interpreters with AW-ETL ?
Message-ID: <4a68b8cf0709200120u41a0df99ye82ea7e018295450@mail.gmail.com>

Given that both Java and .NET are well established and pretty good at
handling file encodings and databases, I think that having AW-ETL work
with JRuby (already there) and IronRuby (John Lam's team is at work)
will widen the use cases of AW-ETL.

Other interpreters will also bring benefits (the Ruby 1.9 speed-up, for
instance).

I haven't tried anything so far (except Ruby 1.9, but it chokes on my
activesupport version for the moment), but I'd be interested in any
feedback on this topic.

cheers

Thibaut

From thibaut.barrere at gmail.com  Thu Sep 20 13:13:08 2007
From: thibaut.barrere at gmail.com (Thibaut Barrère)
Date: Thu, 20 Sep 2007 19:13:08 +0200
Subject: [Activewarehouse-discuss] Feature proposal - bulk import processor ability to split the file into chunks before loading
Message-ID: <4a68b8cf0709201013g37e95bbk72f45f1243537dcc@mail.gmail.com>

Hi,

to finally cope with bulk load issues on MySQL (lost connection etc.),
I've added the ability to split the file into chunks. It works this
way:

  post_process :bulk_import, { :file => bulk_file, :columns => target_fields,
    :field_separator => ',', :target => CONFIG, :table => table,
    :rows_per_chunk => 10000 }

rows_per_chunk defaults to false, which does not split the file at
all (the current behaviour).

Is this of interest to others, and should I commit it? Any comments or
remarks on naming or behaviour? I'm pretty sure the code can be
simplified (the first version of the patch is below if you care about the
implementation details).

cheers
-- Thibaut


@@ -21,6 +21,10 @@
       attr_accessor :field_enclosure
       # The line separator (defaults to a newline)
       attr_accessor :line_separator
+      # How many rows should be sent at a time (defaults to false => all rows in one chunk)
+      attr_accessor :rows_per_chunk
+      # Chunk file name (defaults to file + '.chunk' )
+      attr_accessor :chunk_file

       # Initialize the processor.
       #
@@ -33,7 +37,9 @@
       #   the bulk data file
       # * <tt>:field_separator</tt>: The field separator. Defaults to a comma
       # * <tt>:line_separator</tt>: The line separator. Defaults to a newline
-      # * <tt>:field_enclosure</tt>: The field enclosure charcaters
+      # * <tt>:field_enclosure</tt>: The field enclosure characters
+      # * <tt>:rows_per_chunk</tt>: How many rows should be sent at a time (defaults to false => all rows in one chunk)
+      # * <tt>:chunk_file</tt>: The chunk file name (defaults to file + '.chunk' ), when using rows_per_chunk
       def initialize(control, configuration)
         super
         @file = File.join(File.dirname(control.file), configuration[:file])
@@ -44,7 +50,8 @@
         @field_separator = (configuration[:field_separator] || ',')
         @line_separator = (configuration[:line_separator] || "\n")
         @field_enclosure = configuration[:field_enclosure]
-
+        @rows_per_chunk = (configuration[:rows_per_chunk] || false)
+        @chunk_file = (configuration[:chunk_file] || (@file + '.chunk' ))
         raise ControlError, "Target must be specified" unless @target
         raise ControlError, "Table must be specified" unless @table
       end
@@ -65,10 +72,34 @@
             options[:fields][:enclosed_by] = field_enclosure if field_enclosure
             options[:fields][:terminated_by] = line_separator if line_separator
           end
-          conn.bulk_load(file, table_name, options)
+          split_into_chunks(file,rows_per_chunk) do |new_file,rows_count|
+            puts "Bulk loading #{rows_count} rows..."
+            conn.bulk_load(new_file, table_name, options)
+          end
         end
       end
-
+
+      # Split the file into rows_per_chunk, yield a temporary chunk filename each time
+      def split_into_chunks(filename,rows_per_chunk)
+        if rows_per_chunk
+          File.open(filename) do |input|
+            while not input.eof?
+              rows_count = 0
+              File.open(chunk_file,'w') do |chunk|
+                while true
+                  chunk << input.gets
+                  rows_count += 1
+                  break if (input.lineno % rows_per_chunk == 0) || (input.eof?)
+                end
+              end
+              yield chunk_file,rows_count
+            end
+          end
+        else
+          yield filename
+        end
+      end
+
       def table_name
         ETL::Engine.table(table, ETL::Engine.connection(target))
       end

From chris.d.williams at gmail.com  Thu Sep 20 17:43:25 2007
From: chris.d.williams at gmail.com (Chris Williams)
Date: Thu, 20 Sep 2007 17:43:25 -0400
Subject: [Activewarehouse-discuss] Feature proposal - bulk import processor ability to split the file into chunks before loading
In-Reply-To: <4a68b8cf0709201013g37e95bbk72f45f1243537dcc@mail.gmail.com>
References: <4a68b8cf0709201013g37e95bbk72f45f1243537dcc@mail.gmail.com>
Message-ID:

My only comment: should 'w' or 'wb' be used to open the temp file? This
is related to the EOL issues I have been having between *nix and
Windows.

Thanks
CW
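
The chunk-splitting idea in this thread, combined with the 'w' versus 'wb' question, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was committed to AW-ETL: the standalone split_into_chunks helper, its chunk_file argument and the choice to yield nil as the row count when no splitting happens are all hypothetical. It opens both files in binary mode so that, on Windows, a "\n" is written as a single byte rather than being translated to "\r\n".

```ruby
# Hypothetical standalone helper (an assumption for illustration, not the
# committed AW-ETL implementation).
#
# Splits +filename+ into pieces of at most +rows_per_chunk+ lines and yields
# the path of the chunk file together with the number of rows written to it.
def split_into_chunks(filename, rows_per_chunk, chunk_file = "#{filename}.chunk")
  # A false/nil chunk size keeps the single-file behaviour; the row count
  # is unknown in that case, so nil is yielded for it.
  unless rows_per_chunk
    yield filename, nil
    return
  end

  # 'rb' and 'wb' disable newline translation, so line endings are copied
  # byte for byte on every platform.
  File.open(filename, 'rb') do |input|
    until input.eof?
      rows_count = 0
      File.open(chunk_file, 'wb') do |chunk|
        # Copy lines until the chunk is full or the input is exhausted.
        while rows_count < rows_per_chunk && (line = input.gets)
          chunk << line
          rows_count += 1
        end
      end
      yield chunk_file, rows_count
    end
  end
end
```

A caller would then wrap the bulk load in the block, for example split_into_chunks(file, 10000) { |path, rows| conn.bulk_load(path, table_name, options) }, where conn, table_name and options are the names used in the patch. The same chunk file is overwritten on each pass, so each chunk has to be loaded before the next one is produced.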