From Graham.Seaman at rhul.ac.uk Mon Feb 8 07:39:07 2010
From: Graham.Seaman at rhul.ac.uk (Seaman, Graham)
Date: Mon, 8 Feb 2010 12:39:07 -0000
Subject: [Umlaut-general] mysql, utf8, index limitation
Message-ID: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>

Hi,

I'm just starting my first umlaut install. I have mysql 5.0.84, and have created a db for umlaut set to UTF8 as recommended. I'm using umlaut 2.10.0 as recommended on the wiki.

The command

rake db:schema:load

fails to add the keywords index at line 108 in schema.rb:

add_index "keywords", ["term", "keyword_type"], :name => "kwd_term_idx"

with the error "specified key was too long; max key length is 1000 bytes". The same error appears with other indices (referent_values, referents..)

This is because both term and keyword_type are defined as varchar(255), and each utf8 character takes up to 3 bytes, giving a total required of 2 x 255 x 3 = 1530 bytes.

Writing directly in mysql it's possible to avoid this problem by specifying the number of characters to use for each key (say the first 166 from each of the two, which needs 996 bytes). But there doesn't seem to be any ActiveRecord syntax to express this.

Presumably I'm not the first to hit this problem; what have others done? I'd rather have a scripted fix than manual corrections to mysql each time.

Thanks
Graham

From rochkind at jhu.edu Mon Feb 8 09:39:25 2010
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 8 Feb 2010 09:39:25 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52A@JHEMTEXVS2.win.ad.jhu.edu>

Okay, I just followed the wiki instructions exactly, with the tagged umlaut 2.10.0 release. And the db:schema:load somehow worked fine for me. I've got MySQL 5.0.77. I wouldn't think your slightly newer MySQL would add problems, but maybe that's it? If I do a 'show index' on 'keywords', that index on term,keyword_type DOES seem to have been created.

Or maybe I didn't really create/use my database in utf-8 even though I tried? I did create the db with a 'default character set utf8' command. Wonder if my db/connection is somehow not really using utf8, so I didn't run into your issue?

I'm definitely not a mysql expert. But oddly, I can run "rake db:schema:load" just fine on a newly created db with 'default character set utf8'.

I'm not really sure what's going on, but if you figure it out and can help us fix the umlaut install procedure to be more reliable, that would definitely be awesome of you. But I can try to work on it more too, although it's hard when I can't reproduce it.

Jonathan
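The key-length arithmetic in the original report can be checked with a few lines of Ruby. This is only a sketch: the 1000-byte limit and the 3 bytes per utf8 character are taken from the error message and the report, and the constant and method names are made up for illustration.

```ruby
# Sketch of the key-length arithmetic from the original report.
# MAX_KEY_BYTES and BYTES_PER_CHAR come from the error message and the
# report; the method names are hypothetical, not part of Rails or Umlaut.
MAX_KEY_BYTES  = 1000
BYTES_PER_CHAR = 3 # MySQL reserves 3 bytes per character for utf8 columns

# Bytes an index needs when it covers the full width of each column.
def index_bytes(column_lengths)
  column_lengths.sum * BYTES_PER_CHAR
end

# Largest equal prefix (in characters) per column that still fits the limit.
def max_prefix_chars(column_count)
  MAX_KEY_BYTES / (BYTES_PER_CHAR * column_count)
end

keywords_index = [255, 255] # term and keyword_type are both varchar(255)

puts index_bytes(keywords_index)                  # 1530
puts index_bytes(keywords_index) > MAX_KEY_BYTES  # true
puts max_prefix_chars(keywords_index.size)        # 166
```

So a two-column index over full varchar(255) utf8 columns cannot fit, while a 166-character prefix on each column (996 bytes) does.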
From Graham.Seaman at rhul.ac.uk Mon Feb 8 09:49:11 2010
From: Graham.Seaman at rhul.ac.uk (Seaman, Graham)
Date: Mon, 8 Feb 2010 14:49:11 -0000
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52A@JHEMTEXVS2.win.ad.jhu.edu>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
 <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52A@JHEMTEXVS2.win.ad.jhu.edu>
Message-ID: <570A1E8C2E229F4B96A70E010278604003B40F74@exch-db-05.cc.rhul.local>

Sure, if it's not reproducible elsewhere I need to come up with a solution! Will report back when I have one.

One quick question: I'm using MyISAM, are you on InnoDB?

Thanks for taking the time
Graham
_______________________________________________
Umlaut-general mailing list
Umlaut-general at rubyforge.org
http://rubyforge.org/mailman/listinfo/umlaut-general

From rochkind at jhu.edu Mon Feb 8 09:00:51 2010
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 8 Feb 2010 09:00:51 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu>

Hmm, I have not run into this. I can't remember what the "keywords" index even is; it's possible this is an index on a column that isn't even used anymore. But I guess you're saying you're getting this error with all indexes, not just that one? Some of those indexes are definitely important.

I'm currently not at work due to snow craziness, but I'll try to take a look tomorrow when I'm back.

Is 1000 bytes the maximum length for any index key in mysql? I wonder how I've managed to create those indexes myself.

While a pain, you could create those indexes by hand with SQL instead of by loading the schema.rb. Comment out the add_index lines in schema.rb and then run it, and then add the indexes yourself with SQL, doing what you seem to have already figured out how to do in order to keep each key within the 1000 byte limit.

But yeah, it would be better to have a scripted solution. Not sure why I haven't run into this problem before; I'll try to take a look when I'm back at work.

Jonathan

From Graham.Seaman at rhul.ac.uk Mon Feb 8 10:10:45 2010
From: Graham.Seaman at rhul.ac.uk (Seaman, Graham)
Date: Mon, 8 Feb 2010 15:10:45 -0000
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
 <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu>
Message-ID: <570A1E8C2E229F4B96A70E010278604003B40F75@exch-db-05.cc.rhul.local>

There is a history of this problem on http://bugs.mysql.com/bug.php?id=451

It is a very long thread, starting in 2004 and still continuing in December 2009. As I read it, it says that the patch to fix the problem is not recommended to be included in mainstream releases since it adversely affects performance. I wonder if maybe some unix distributions include the patch and others don't? I've been trying it on gentoo linux; I'm going to have a try on CentOS and see if that behaves any differently.

Graham

PS I'm not getting it on all indices - only on compound indices where the sum of the lengths of the fields being indexed exceeds 1000/3 (about 333) characters. I guess I could fix this manually but would prefer to stick at it till I've found a proper fix.
From rochkind at jhu.edu Mon Feb 8 11:19:16 2010
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 8 Feb 2010 11:19:16 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <570A1E8C2E229F4B96A70E010278604003B40F75@exch-db-05.cc.rhul.local>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
 <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu>,
 <570A1E8C2E229F4B96A70E010278604003B40F75@exch-db-05.cc.rhul.local>
Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52B@JHEMTEXVS2.win.ad.jhu.edu>

So we could start by listing those failed indexes; some of them might be 'legacy' data that isn't actually used anymore. We could simply remove those indexes (and the columns they are on).
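One rough way to list the affected indexes is to add up the column widths for each one and compare the total against the limit. The sketch below is plain Ruby and assumes the 1000-byte limit and 3 bytes per utf8 character reported earlier in the thread. The `INDEXES` table is hand-written for illustration: apart from kwd_term_idx, its entries and column widths are hypothetical and not taken from Umlaut's schema.rb.

```ruby
# Hypothetical listing of indexes that would exceed the key limit.
# Only kwd_term_idx (two varchar(255) columns) comes from the thread;
# the other entries are invented examples, not Umlaut's real schema.
MAX_KEY_BYTES  = 1000
BYTES_PER_CHAR = 3

INDEXES = {
  "kwd_term_idx"      => { "term" => 255, "keyword_type" => 255 },
  "example_single"    => { "name" => 255 },              # hypothetical
  "example_short_two" => { "code" => 50, "kind" => 100 } # hypothetical
}

# Returns the names of indexes whose full-width key is too long.
def too_long(indexes)
  indexes.select { |_name, cols| cols.values.sum * BYTES_PER_CHAR > MAX_KEY_BYTES }.keys
end

puts too_long(INDEXES).inspect # ["kwd_term_idx"]
```

A single varchar(255) column needs 765 bytes and fits, which matches the observation that only compound indexes fail.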
Others are going to be on data that is still important, and has indexes that are important. But we could try to alter these indexes to not go over the mysql limit, by one or more of:

Perhaps some of them are compound indexes but don't really need to be compound, if we think about the use of that column. I'm no DBA, and I think I made most of those indexes (Ross, who's also no DBA, made the rest :) ). I may have made certain indexes compound indexes out of 'extra' caution, when really a single column index might suffice, depending on the nature of the data.

Alternately, depending on the nature of the data, we could make the size of the indexed columns smaller, so they don't exceed the 1000 byte limit (you CAN easily set the size of a column in ActiveRecord schema.rb).

Alternately, we can find a way to have the database creation scripts actually supply the index-truncation MySQL you mention. The index creation script doesn't have to be just 'schema.rb'; I'm not even sure that is the preferred way to do things anymore (I think there's some rake db:bootstrap task I haven't looked at, that didn't exist when I devised the original umlaut installation path). If it can't easily be done in schema.rb, there may be another method of easily allowing db bootstrapping that makes the MySQL index truncation declaration easier.

So those are my ideas! You seem to have a pretty good idea of what's going on in the code, Graham. If you come up with a good idea for a fix, I'll happily take the patch, or just plain give you commit rights to the repo, no problem.

Of course the 'depending on the nature of the data' questions about which indexes really ought to be there are something you might not know, not being familiar with the Umlaut code; I'm happy to answer questions or try to think about that if you identify questions.
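The third option, supplying the prefix lengths from a script, could look roughly like the sketch below. It only builds the MySQL `CREATE INDEX` statement as a string; `prefix_index_sql` is a made-up helper, not an ActiveRecord method. The assumption is that the resulting string would then be run from a migration or rake task with the connection's `execute` method, after commenting out the matching `add_index` line in schema.rb.

```ruby
# Hypothetical helper: build a MySQL CREATE INDEX statement that indexes
# only the first N characters of each column (a prefix index).
# Identifiers are backtick-quoted after stripping any backticks from them.
def prefix_index_sql(table, index_name, prefixes)
  quote = lambda { |name| "`#{name.to_s.delete('`')}`" }
  columns = prefixes.map { |column, chars| "#{quote.call(column)}(#{Integer(chars)})" }
  "CREATE INDEX #{quote.call(index_name)} ON #{quote.call(table)} (#{columns.join(', ')})"
end

sql = prefix_index_sql("keywords", "kwd_term_idx",
                       { "term" => 166, "keyword_type" => 166 })
puts sql
# CREATE INDEX `kwd_term_idx` ON `keywords` (`term`(166), `keyword_type`(166))

# In a migration this string would be handed to the database, e.g.
#   execute(sql)
# (assumed usage; not run here because it needs a live MySQL connection).
```

Later ActiveRecord releases accept a `:length` option on `add_index` for this purpose; whether the Rails 2.1 in use here supports it would need checking.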
From Graham.Seaman at rhul.ac.uk Mon Feb 8 12:27:21 2010
From: Graham.Seaman at rhul.ac.uk (Seaman, Graham)
Date: Mon, 8 Feb 2010 17:27:21 -0000
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52B@JHEMTEXVS2.win.ad.jhu.edu>
References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local>
 <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu>,
 <570A1E8C2E229F4B96A70E010278604003B40F75@exch-db-05.cc.rhul.local>
 <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52B@JHEMTEXVS2.win.ad.jhu.edu>
Message-ID: <570A1E8C2E229F4B96A70E010278604003B40F77@exch-db-05.cc.rhul.local>

It works fine on centos with mysql 5.0.77 (I've added notes on a couple of very minor gotchas to the wiki). So as far as I know it's only the gentoo version of mysql giving problems.

My personal motivation to do more about this has just vanished, I'm afraid: I think the fix needs to be to the gentoo mysql, not to umlaut. I'm going to carry on with the install on centos, see how far I get with that.

Graham
Thanks Graham _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general From rochkind at jhu.edu Mon Feb 8 13:00:52 2010 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Mon, 8 Feb 2010 13:00:52 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <570A1E8C2E229F4B96A70E010278604003B40F77@exch-db-05.cc.rhul.local> References: <570A1E8C2E229F4B96A70E010278604003B40F72@exch-db-05.cc.rhul.local> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF529@JHEMTEXVS2.win.ad.jhu.edu> <570A1E8C2E229F4B96A70E010278604003B40F75@exch-db-05.cc.rhul.local> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52B@JHEMTEXVS2.win.ad.jhu.edu>, <570A1E8C2E229F4B96A70E010278604003B40F77@exch-db-05.cc.rhul.local> Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52C@JHEMTEXVS2.win.ad.jhu.edu> Sweet, that's good enough, I'm just happy you're making progress. For reference, I'm using a RHEL/yum mysql package here. Very happy to help as I can with any other installation or customization hurdles you run into. I tried to make the installation process smooth and documented, but it could be better. Right now I think it's two or maybe three libraries that are actually using Umlaut in production. But I think it's relatively mature and stable software that is suitable for getting up without TOO much hacking. I'd love to see more libraries using it. (You will probably need to write an adapter for your ILS/OPAC, unless you use Horizon or Aleph.) Can you share any more about the nature of your curiosity in it, what you are considering doing with it? 
I'll also say that longer-term I'd like to find time to make an Umlaut 3.0 that is a bit more cleaned up; Right now, it will only run on Rails 2.1, not 2.2. (Need to work on some threading issues to get it up to 2.2). I'd also like to verify it running under Passenger, instead of the ancient mongrel that nobody uses anymore unless they have to (_should_ work fine, but might be some threading issues). And I'd like to make all the currently embedded javascript use an 'unobtrusive' technique, and provide both Prototype and JQuery versions of the js functionality scripts, so you can use either one if you want to embed Umlaut content in an external application. If only I had the time to do that, but some day. I do think it's stable and mature as it is, just some annoyances as above. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 12:27 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation It works fine on centos with mysql 5.0.77 (I've added notes on a couple of very minor gotchas to the wiki). So as far as I know it's only the gentoo version of mysql giving problems. My personal motivation to do more about this has just vanished, I'm afraid: I think the fix needs to be to the gentoo mysql, not to umlaut. I'm going to carry on with the install on centos, see how far I get with that. Graham -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 16:19 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation So we could start by listing those failed indexes; some of them might be 'legacy' data that isn't actually used anymore. We could simply remove those indexes (and the columns they are on). 
Others are going to be on data that is still important, and has indexes that are important. But we could try to alter these indexes to not go over the mysql limit, by one or more of: Perhaps some of them are compound indexes but don't really need to be compound, if we think about the use of that column. I'm no DBA, I think I made most of those indexes (Ross, who's also no DBA, made the rest :) ), I may have made certain indexes compound indexes out of 'extra' caution, when really a single column index might suffice, depending on the nature of the data. Alternately, depending on the nature of the data, we could make the size of the indexed columns smaller, so they don't exceed the 1000 byte limit (you CAN easily set the size of a column in ActiveRecord schema.rb). Alternately, we can find a way to have the database creation scripts actually supply the index-truncation MySQL you mention. The index creation script doesn't have to be just 'schema.rb', I'm not even sure that is the preferred way to do things anymore (I think there's some rake db:bootstrap task I haven't looked at, that didn't used to exist when I devised the original umlaut installation path). If it can't easily be done in schema.rb, there may be another method of easily allowing db bootstrapping that makes the MySQL index truncation declaration easier. So those are my ideas! You seem to have a pretty good idea of what's going on in the code, Graham. If you come up with a good idea for a fix, I'll happily take the patch, or just plain give you commit rights to the repo, no problem. Of course the 'depending on the nature of the data' questions about which indexes really ought to be there are something you might not know, not being familiar with the Umlaut code; I'm happy to answer questions or try to think about that if you identify questions.
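The third option above (having a script supply the index-truncation SQL) could be sketched roughly as follows. This is only an illustration under stated assumptions: `prefix_index_sql` is a hypothetical helper, not part of Umlaut or ActiveRecord; the 1000-byte limit and 3 bytes per utf8 character are the figures quoted in this thread; and the character budget is simply split evenly across the columns.

```ruby
# Hypothetical helper, not part of Umlaut or ActiveRecord.
MAX_KEY_BYTES  = 1000 # limit quoted in the MySQL error message
BYTES_PER_CHAR = 3    # bytes per utf8 character, as stated in the thread

# Builds a CREATE INDEX statement with per-column prefix lengths.
# columns is a Hash of column name => declared length in characters.
def prefix_index_sql(table, index_name, columns)
  budget_chars = MAX_KEY_BYTES / BYTES_PER_CHAR # 333 characters in total
  per_column   = budget_chars / columns.size    # naive even split
  parts = columns.map do |name, length|
    prefix = [length, per_column].min
    # Only add a prefix length when the column is longer than its share.
    prefix < length ? "`#{name}`(#{prefix})" : "`#{name}`"
  end
  "CREATE INDEX `#{index_name}` ON `#{table}` (#{parts.join(', ')})"
end

puts prefix_index_sql("keywords", "kwd_term_idx",
                      "term" => 255, "keyword_type" => 255)
```

For the keywords index this yields the 166-character prefixes suggested in the original report. In a Rails install script, the generated string could presumably be handed to the connection's `execute` method after the `add_index` lines are commented out of schema.rb. The even split is a simplification; short columns leave part of the budget unused.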
________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 10:10 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation There is a history of this problem on http://bugs.mysql.com/bug.php?id=451 It is a very long thread, starting in 2004 and still continuing in December 2009. As I read it, it says that the patch to fix the problem is not recommended to be included in mainstream releases since it adversely affects performance. I wonder if maybe some unix distributions include the patch and others don't? I've been trying it on gentoo linux; I'm going to have a try on CentOS and see if that behaves any differently. Graham PS I'm not getting it on all indices - only on compound indices where the sum of the lengths of the fields being indexed is greater than 1000/3 (about 333 characters). I guess I could fix this manually but would prefer to stick at it till I've found a proper fix. -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 14:01 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hmm, I have not run into this. I can't remember what the "keywords" index even is; it's possible this is an index on a column that isn't even used anymore. But I guess you're saying you're getting this error with all indexes, not just that one? Some of those indexes are definitely important. I'm currently not at work due to snow craziness, but I'll try to take a look tomorrow when I'm back. Is 1000 bytes the maximum length for any index key in mysql? I wonder how I've managed to create those indexes myself. While a pain, you could create those indexes by hand with SQL instead of by loading the schema.rb.
Comment out the add_index lines in schema.rb and then run it, and then add the indexes yourself with SQL, doing what you seem to have already figured out how to do in order to limit them to a 1000 byte prefix. But yeah, it would be better to have a scripted solution, not sure why I haven't run into this problem before, I'll try to take a look when I'm back at work. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 7:39 AM To: umlaut-general at rubyforge.org Subject: [Umlaut-general] mysql, utf8, index limitation Hi, I'm just starting my first umlaut install. I have mysql 5.0.84, and have created a db for umlaut set to UTF8 as recommended. I'm using umlaut 2.10.0 as recommended on the wiki. The command rake db:schema:load fails to add the keywords index at line 108 in schema.rb: add_index "keywords", ["term", "keyword_type"], :name => "kwd_term_idx" with the error "specified key was too long; max key length is 1000 bytes". The same error appears with other indices (referent_values, referents..) This is because both term and keyword_type are defined as varchar(255), and each utf8 character is 3 bytes, giving a total required of 1530 bytes. Writing directly in mysql it's possible to avoid this problem by specifying the number of characters to use for each key (say the first 166 from each of the two). But there doesn't seem to be any ActiveRecord syntax to express this. Presumably I'm not the first to hit this problem; what have others done? I'd rather have a scripted fix than manual corrections to mysql each time.
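The arithmetic in the report above can be checked with a few lines of Ruby. This is a minimal sketch: the helper names are made up for illustration, and the 3 bytes per character and 1000-byte limit are taken from the message and the MySQL error text rather than looked up from the server.

```ruby
# Hypothetical helpers for checking the index key size arithmetic.

# Worst-case key size in bytes for an index over columns with the
# given lengths (in characters).
def index_key_bytes(column_lengths, bytes_per_char = 3)
  column_lengths.inject(0) { |sum, chars| sum + chars * bytes_per_char }
end

# True when the key would exceed the limit reported in the error.
def key_too_long?(column_lengths, max_key_bytes = 1000)
  index_key_bytes(column_lengths) > max_key_bytes
end

full_columns = [255, 255] # term and keyword_type as varchar(255)
prefixes     = [166, 166] # first 166 characters of each column

puts "full columns: #{index_key_bytes(full_columns)} bytes, " \
     "too long: #{key_too_long?(full_columns)}"
puts "166-char prefixes: #{index_key_bytes(prefixes)} bytes, " \
     "too long: #{key_too_long?(prefixes)}"
```

Two full varchar(255) columns come to 2 x 255 x 3 = 1530 bytes, over the limit, while two 166-character prefixes come to 2 x 166 x 3 = 996 bytes, just under it. The same check could be run over every add_index line in schema.rb to list the indexes expected to fail.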
From Graham.Seaman at rhul.ac.uk Tue Feb 9 04:34:01 2010 From: Graham.Seaman at rhul.ac.uk (Seaman, Graham) Date: Tue, 9 Feb 2010 09:34:01 -0000 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52C@JHEMTEXVS2.win.ad.jhu.edu> References: <570A1E8C2E229F4B96A70E010278604003B40F77@exch-db-05.cc.rhul.local> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52C@JHEMTEXVS2.win.ad.jhu.edu> Message-ID: <570A1E8C2E229F4B96A70E010278604003B40F78@exch-db-05.cc.rhul.local> Hi Jonathan This is mainly investigative at the moment. Our library is in a process of transition, which makes it hard to predict exactly what we'll be using in a year's time. And we are very low on technical staff, which means long-term ease of maintenance is also an issue. At the moment we're running aleph, sfx, metalib+xerxes. But we're just about to start using Summon. The plan at the moment is for metalib+xerxes to stay around for a year or so at least. So one aspect of umlaut I want to look at is ease of integration with either/both of xerxes and summon. Another is maintainability: having had a little experience of supporting mongrel/rails I'm not wild about the idea of either. I will certainly look at trying it with passenger instead of mongrel.
Generally though the main point at the moment is just to get an idea of what umlaut can do: it's one thing looking at docs, and another altogether trying something out! Anyway, thanks for creating umlaut. And thanks in advance for all the queries I just know I'm going to be sending your way ;-) Graham
From rochkind at jhu.edu Tue Feb 9 11:35:16 2010 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Tue, 9 Feb 2010 11:35:16 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <570A1E8C2E229F4B96A70E010278604003B40F78@exch-db-05.cc.rhul.local> References: <570A1E8C2E229F4B96A70E010278604003B40F77@exch-db-05.cc.rhul.local> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52C@JHEMTEXVS2.win.ad.jhu.edu>, <570A1E8C2E229F4B96A70E010278604003B40F78@exch-db-05.cc.rhul.local> Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu> Makes sense. It should be very easy to 'integrate' Umlaut with Summon -- and I think it will in fact make a lot of sense to do so, as my understanding with Summon is that your link resolver pretty much IS the item detail page, so you'll want a nice one, which Umlaut can provide. All you should have to do is point Summon at Umlaut as the link resolver, there shouldn't be any development necessary. (So 'integrate' might not be quite the right word, but Summon is, as I understand it, set up to use an openurl link resolver of your choice, and Umlaut (backed by your SFX) should just work).
Same for Xerxes, except as an added bonus Xerxes now has a feature where Umlaut-provided services (full text link, library holdings information, etc) can be placed directly on a Xerxes record detail page via ajax. Feature's already there, you just tell Xerxes where your Umlaut's at, it just works (or ought to). I know NYU uses Aleph, but they also use Primo. I'm not sure if Scot at NYU has written an aleph adapter for Umlaut, or if his adapter talks to Primo instead. So writing an adapter for Aleph, so library holdings can be displayed directly on the Umlaut page, might be the one piece of development you might have to do. It shouldn't be too hard to do, and I can maybe help if necessary. So, while I'm biased of course, I think you're right that Umlaut could be quite useful in your setup. Definitely feel free to send any questions or issues you have to the list. I don't think it should really be that hard to support Umlaut running under mongrel, but it might work with Passenger too. I don't think you need to be scared of mongrel though. We'll see! Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Tuesday, February 09, 2010 4:34 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hi Jonathan This is mainly investigative at the moment. Our library is in a process of transition, which makes it hard to predict exactly what we'll be using in a years time. And we are very low on technical staff, which means long-term ease of maintenance is also an issue. At the moment we're running aleph, sfx, metalib+xerxes. But we're just about to start using Summon. The plan at the moment is for metalib+xerxes to stay around for a year or so at least. So one aspect of umlaut I want to look at is ease of integration with either/both of xerxes and summon. 
Another is maintainability: having had a little experience of supporting mongrel/rails I'm not wild about the idea of either. I will certainly look at trying it with passenger instead of mongrel. Generally though the main point at the moment is just to get an idea of what umlaut can do: it's one thing looking at docs, and another altogether trying something out! Anyway, thanks for creating umlaut. And thanks in advance for all the queries I just know I'm going to be sending your way ;-) Graham -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 18:01 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Sweet, that's good enough, I'm just happy you're making progress. For reference, I'm using a RHEL/yum mysql package here. Very happy to help as I can with any other installation or customization hurdles you run into. I tried to make the installation process smooth and documented, but it could be better. Right now I think it's two or maybe three libraries that are actually using Umlaut in production. But I think it's relatively mature and stable software that is suitable for getting up without TOO much hacking. I'd love to see more libraries using it. (You will probably need to write an adapter for your ILS/OPAC, unless you use Horizon or Aleph.) Can you share any more about the nature of your curiosity in it, what you are considering doing with it? I'll also say that longer-term I'd like to find time to make an Umlaut 3.0 that is a bit more cleaned up; Right now, it will only run on Rails 2.1, not 2.2. (Need to work on some threading issues to get it up to 2.2). I'd also like to verify it running under Passenger, instead of the ancient mongrel that nobody uses anymore unless they have to (_should_ work fine, but might be some threading issues). 
And I'd like to make all the currently embedded javascript use an 'unobtrusive' technique, and provide both Prototype and JQuery versions of the js functionality scripts, so you can use either one if you want to embed Umlaut content in an external application. If only I had the time to do that, but some day. I do think it's stable and mature as it is, just some annoyances as above. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 12:27 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation It works fine on centos with mysql 5.0.77 (I've added notes on a couple of very minor gotchas to the wiki). So as far as I know it's only the gentoo version of mysql giving problems. My personal motivation to do more about this has just vanished, I'm afraid: I think the fix needs to be to the gentoo mysql, not to umlaut. I'm going to carry on with the install on centos, see how far I get with that. Graham -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 16:19 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation So we could start by listing those failed indexes; some of them might be 'legacy' data that isn't actually used anymore. We could simply remove those indexes (and the columns they are on). Others are going to be on data that is still important, and has indexes that are important. But we could try to alter these indexes to not go over the mysql limit, by one or more of: Perhaps some of them are compound indexes but don't really need to be compound, if we think about the use of that column. 
I'm no DBA, I think I made most of those indexes (ross who's also no DBA made the rest :) ), I may have made certain indexes compound indexes out of 'extra' caution, when really a single column index might suffice, depending on the nature of the data. Alternately, depending on the nature of the data, we could make the size of the indexed columns smaller, so they don't exceed the 1000 byte limit (you CAN easily set the size of a column in ActiveRecord schema.rb). Alternately, we can find a way to have the database creation scripts actually supply the index-truncation MySQL you mention. The index creation script doesn't have to be just 'schema.rb', I'm not even sure that is the preferred way to do things anymore (I think there's some rake db:boostrap task I havent' looked at, that didn't used to exist when I devised the original umlaut installation path). If it can't easily be done in schema.rb, there may be another method of easily allowing db bootstrapping that makes the MySQL index truncation decleration easier. So those are my ideas! You seem to have a pretty good idea of what's going on in the code, Graham. If you come up with a good idea for a fix, I'll happily take the patch, or just plain give you commit rights to the repo, no problem. Of course the 'depending on the nature of the data' questions towards what indexes really ought to be there how is something you might not know, not being familiar with the Umlaut code; I'm happy to answer questions or try to think about that if you identify questions. 
________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 10:10 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation There is a history of this problem on http://bugs.mysql.com/bug.php?id=451 It is a very long thread, starting in 2004 and still continuing in December 2009. As I read it, it says that the patch to fix the problem is not recommended to be included in mainstream releases since it adversely affects performance. I wonder if maybe some unix distributions include the patch and others don't? I've been trying it on gentoo linux; I'm going to have a try on CentOs and see if that behaves any differently. Graham PS I'm not getting it on all indices - only on compound indices where the sum of the lengths of the fields being indexed > 1000/3. I guess I could fix this manually but would prefer to stick at it till I've found a proper fix. -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 14:01 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hmm, I have not run into this. I can't remember what the "keywords" index even is; it's possible this is an index on a column that isn't even used anymore. But I guess you're saying you're getting this error with all indexes,not just that one? Some of those indexes are definitely important. I'm currently not at work due to snow craziness, but I'll try to take a look tomorrow when I'm back? Is 1000 bytes the maximum length for any index key in mysql? I wonder how I've managed to create those indexes myself. While a pain, you could create those indexes by hand with SQL instead of by loading the schema.rb. 
Comment out the add_index lines in schema rb and then run it, and then add the indexes yourself with SQL, doing what you seem to have already figured out how to do in order to limit them to a 1000 byte prefix. But yeah, it would be better to have a scripted solution, not sure why I haven't run into this problem before, I'll try to take a look when I'm back at work. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 7:39 AM To: umlaut-general at rubyforge.org Subject: [Umlaut-general] mysql, utf8, index limitation Hi, I'm just starting my first umlaut install. I have mysql 5.0.84, and have created a db for umlaut set to UTF8 as recommended. I'm using umlaut 2.10.0 as recommended on the wiki. The command Rake db:schema:load Fails to add the keywords index at line 108 in schema.rb: Add_index "keywords", ["term", "keyword_type"], :name => "kwd_term_idx" With the error "specified key was too long; max key length is 1000 bytes". The same error appears with other indices (referent_values, referents..) This is because both term and keyword_type are defined as varchar (255), and each utf8 character is 3 bytes, giving a total required of 1530 bytes. Writing directly in mysql it's possible to avoid this problem by specifying the number of characters to use for each key (say the first 166 from each of the two). But there doesn't seem to be any ActiveRecord syntax to express this. Presumably I'm not the first to hit this problem; what have others done? I'd rather have a scripted fix than manual corrections to mysql each time. 
Thanks Graham _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general From muzzye at mskcc.org Tue Feb 9 13:10:18 2010 From: muzzye at mskcc.org (muzzye at mskcc.org) Date: Tue, 9 Feb 2010 13:10:18 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu> Message-ID: I don't have any experience with umlaut or Summon, though we are looking at Serials Solution very soon. But I'm curious, just viewing the Summon literature, it seems to me as though it's offering the same services that umlaut does. For our purposes, it's local content that we can't readily make available in SFX, and the Summon literature suggests it can incorporate this. With Summon in place, what more does umlaut bring users. Jonathan's email seems to suggest that libraries have both the SFX KB and Serials Solution. Perhaps they do, and that would explain umlaut's value. For us, we're considering Serial Solutions A-Z list, I think it's called 360. Eric On 2/9/10 11:35 AM, "Jonathan Rochkind" wrote: Makes sense. 
It should be very easy to 'integrate' Umlaut with Summon -- and I think it will in fact make a lot of sense to do so, as my understanding with Summon is that your link resolver pretty much IS the item detail page, so you'll want a nice one, with Umlaut can provide. All you should have to do is point Summon at Umlaut as the link resolver, there shouldn't be any development neccesary. (So 'integrate' might not be quite the right word, but Summon is, as I understand it, set up to use an openurl link resolver of your choice, and Umlaut (backed by your SFX) should just work). Same for Xerxes, except as an added bonus Xerxes now has a feature where Umlaut-provided services (full text link, library holdings information, etc) can be placed directly on a Xerxes record detail page via ajax. Feature's already there, you just tell Xerxes where your Umlaut's at, it just works (or ought to). I know NYU uses Aleph, but they also use Primo. I'm not sure if Scot at NYU has written an aleph adapter for Umlaut, or if his adapter talks to Primo instead. So writing an adapter for Aleph, so library holdings can be displayed directly on the Umlaut page, might be the one piece of development you might have to do. It shouldn't be too hard to do, and I can maybe help if necessary. So, while I'm biased of course, I think you're right that Umlaut could be quite useful in your setup. Definitely feel free to send any questions or issues you have to the list. I don't think it should really be that hard to support Umlaut running under mongrel, but it might work with Passenger too. I don't think you need to be scared of mongrel though. We'll see! 

Jonathan
________________________________________
From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk]
Sent: Tuesday, February 09, 2010 4:34 AM
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

Hi Jonathan

This is mainly investigative at the moment. Our library is in a process of transition, which makes it hard to predict exactly what we'll be using in a year's time. And we are very low on technical staff, which means long-term ease of maintenance is also an issue.

At the moment we're running aleph, sfx, metalib+xerxes. But we're just about to start using Summon. The plan at the moment is for metalib+xerxes to stay around for a year or so at least. So one aspect of umlaut I want to look at is ease of integration with either/both of xerxes and summon. Another is maintainability: having had a little experience of supporting mongrel/rails I'm not wild about the idea of either. I will certainly look at trying it with passenger instead of mongrel.

Generally though the main point at the moment is just to get an idea of what umlaut can do: it's one thing looking at docs, and another altogether trying something out!

Anyway, thanks for creating umlaut. And thanks in advance for all the queries I just know I'm going to be sending your way ;-)

Graham

-----Original Message-----
From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind
Sent: 08 February 2010 18:01
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

Sweet, that's good enough, I'm just happy you're making progress. For reference, I'm using a RHEL/yum mysql package here.

Very happy to help as I can with any other installation or customization hurdles you run into. I tried to make the installation process smooth and documented, but it could be better.

Right now I think it's two or maybe three libraries that are actually using Umlaut in production. But I think it's relatively mature and stable software that is suitable for getting up without TOO much hacking. I'd love to see more libraries using it. (You will probably need to write an adapter for your ILS/OPAC, unless you use Horizon or Aleph.) Can you share any more about the nature of your curiosity in it, what you are considering doing with it?

I'll also say that longer-term I'd like to find time to make an Umlaut 3.0 that is a bit more cleaned up. Right now, it will only run on Rails 2.1, not 2.2. (I need to work on some threading issues to get it up to 2.2.) I'd also like to verify it running under Passenger, instead of the ancient mongrel that nobody uses anymore unless they have to (it _should_ work fine, but there might be some threading issues). And I'd like to make all the currently embedded javascript use an 'unobtrusive' technique, and provide both Prototype and jQuery versions of the js functionality scripts, so you can use either one if you want to embed Umlaut content in an external application. If only I had the time to do that, but some day. I do think it's stable and mature as it is, just some annoyances as above.

Jonathan
________________________________________
From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk]
Sent: Monday, February 08, 2010 12:27 PM
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

It works fine on centos with mysql 5.0.77 (I've added notes on a couple of very minor gotchas to the wiki). So as far as I know it's only the gentoo version of mysql giving problems. My personal motivation to do more about this has just vanished, I'm afraid: I think the fix needs to be to the gentoo mysql, not to umlaut. I'm going to carry on with the install on centos, see how far I get with that.

Graham

-----Original Message-----
From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind
Sent: 08 February 2010 16:19
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

So we could start by listing those failed indexes; some of them might be on 'legacy' data that isn't actually used anymore. We could simply remove those indexes (and the columns they are on).

Others are going to be on data that is still important, and has indexes that are important. But we could try to alter these indexes to not go over the mysql limit, by one or more of:

Perhaps some of them are compound indexes but don't really need to be compound, if we think about the use of that column. I'm no DBA, I think I made most of those indexes (Ross, who's also no DBA, made the rest :) ), and I may have made certain indexes compound indexes out of 'extra' caution, when really a single column index might suffice, depending on the nature of the data.

Alternately, depending on the nature of the data, we could make the size of the indexed columns smaller, so they don't exceed the 1000 byte limit (you CAN easily set the size of a column in ActiveRecord schema.rb).

Alternately, we can find a way to have the database creation scripts actually supply the index-truncation MySQL you mention. The index creation script doesn't have to be just 'schema.rb'; I'm not even sure that is the preferred way to do things anymore (I think there's some rake db:bootstrap task I haven't looked at, that didn't used to exist when I devised the original umlaut installation path). If it can't easily be done in schema.rb, there may be another method of easily allowing db bootstrapping that makes the MySQL index truncation declaration easier.

So those are my ideas! You seem to have a pretty good idea of what's going on in the code, Graham.
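
The first step suggested above, listing the indexes that go over the limit, can be sketched in a few lines of Ruby. This is only an illustration: `oversized_indexes` and the `INDEXES` table are hypothetical names, not part of Umlaut or ActiveRecord, and only the `kwd_term_idx` entry comes from the thread. The other failing indexes (referent_values, referents, ...) would have to be filled in from the real schema.rb. The 1000-byte limit and 3 bytes per utf8 character are the figures quoted in the thread.

```ruby
# Hypothetical helper, not part of Umlaut: report which indexes would
# exceed a key-length limit, given the declared varchar length of each
# indexed column.
MAX_KEY_BYTES  = 1000   # limit quoted in the MySQL error message
BYTES_PER_CHAR = 3      # bytes reserved per utf8 character, per the thread

# Only this entry is taken from the thread; add the remaining indexes
# from your own schema.rb.
INDEXES = {
  "kwd_term_idx" => { "term" => 255, "keyword_type" => 255 }
}

# Returns the names of indexes whose full key would exceed max_bytes.
def oversized_indexes(indexes, max_bytes = MAX_KEY_BYTES, bytes_per_char = BYTES_PER_CHAR)
  too_big = indexes.select do |_name, columns|
    columns.values.inject(0) { |sum, len| sum + len * bytes_per_char } > max_bytes
  end
  too_big.map { |name, _columns| name }
end

puts oversized_indexes(INDEXES).inspect
```

If shrinking the columns turns out to be acceptable for the data, a smaller `:limit` on the string columns in schema.rb (for example 166 on both columns of this index, which gives 996 bytes) would bring the key under the limit. Whether real values fit in that length is a question about the data that this sketch cannot answer.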
If you come up with a good idea for a fix, I'll happily take the patch, or just plain give you commit rights to the repo, no problem. Of course, the 'depending on the nature of the data' questions about which indexes really ought to be there are something you might not know, not being familiar with the Umlaut code; I'm happy to answer questions or try to think about that if you identify questions.
________________________________________
From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk]
Sent: Monday, February 08, 2010 10:10 AM
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

There is a history of this problem on http://bugs.mysql.com/bug.php?id=451

It is a very long thread, starting in 2004 and still continuing in December 2009. As I read it, it says that the patch to fix the problem is not recommended to be included in mainstream releases since it adversely affects performance. I wonder if maybe some unix distributions include the patch and others don't? I've been trying it on gentoo linux; I'm going to have a try on CentOS and see if that behaves any differently.

Graham

PS I'm not getting it on all indices - only on compound indices where the sum of the lengths of the fields being indexed exceeds 1000/3 (about 333 characters). I guess I could fix this manually but would prefer to stick at it till I've found a proper fix.

-----Original Message-----
From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind
Sent: 08 February 2010 14:01
To: umlaut-general at rubyforge.org
Subject: Re: [Umlaut-general] mysql, utf8, index limitation

Hmm, I have not run into this. I can't remember what the "keywords" index even is; it's possible this is an index on a column that isn't even used anymore. But I guess you're saying you're getting this error with all indexes, not just that one?
Some of those indexes are definitely important. I'm currently not at work due to snow craziness, but I'll try to take a look tomorrow when I'm back.

Is 1000 bytes the maximum length for any index key in mysql? I wonder how I've managed to create those indexes myself.

While a pain, you could create those indexes by hand with SQL instead of by loading the schema.rb. Comment out the add_index lines in schema.rb and then run it, and then add the indexes yourself with SQL, doing what you seem to have already figured out how to do in order to limit them to a 1000 byte prefix.

But yeah, it would be better to have a scripted solution. Not sure why I haven't run into this problem before; I'll try to take a look when I'm back at work.

Jonathan
________________________________________
From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk]
Sent: Monday, February 08, 2010 7:39 AM
To: umlaut-general at rubyforge.org
Subject: [Umlaut-general] mysql, utf8, index limitation

Hi,

I'm just starting my first umlaut install. I have mysql 5.0.84, and have created a db for umlaut set to UTF8 as recommended. I'm using umlaut 2.10.0 as recommended on the wiki.

The command

rake db:schema:load

fails to add the keywords index at line 108 in schema.rb:

add_index "keywords", ["term", "keyword_type"], :name => "kwd_term_idx"

with the error "specified key was too long; max key length is 1000 bytes". The same error appears with other indices (referent_values, referents..)

This is because both term and keyword_type are defined as varchar(255), and each utf8 character is 3 bytes, giving a total required of 1530 bytes.

Writing directly in mysql it's possible to avoid this problem by specifying the number of characters to use for each key (say the first 166 from each of the two). But there doesn't seem to be any ActiveRecord syntax to express this.
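
The arithmetic in the message above, and the prefix-index workaround it describes, can be checked with a short Ruby sketch. It assumes the figures stated in the thread (a 1000-byte key limit and 3 bytes per utf8 character). The helper names `index_key_bytes`, `equal_prefix_length` and `prefixed_index_sql` are made up for illustration and are not part of Umlaut or ActiveRecord. The generated statement uses MySQL's column-prefix syntax, `column(length)`, inside `CREATE INDEX`. As a side note, 1000 bytes matches the documented maximum key length of the MyISAM storage engine, so the engine in use may be what differs between installs; the thread does not confirm this.

```ruby
# Illustrative helpers only; these names are not from Umlaut or ActiveRecord.
MAX_KEY_BYTES  = 1000   # from the error: "max key length is 1000 bytes"
BYTES_PER_CHAR = 3      # bytes per utf8 character, as stated in the thread

# Bytes needed to index the full declared length of each column.
def index_key_bytes(column_lengths)
  column_lengths.inject(0) { |sum, len| sum + len * BYTES_PER_CHAR }
end

# Largest equal per-column prefix (in characters) that fits in the limit.
def equal_prefix_length(column_count)
  MAX_KEY_BYTES / (BYTES_PER_CHAR * column_count)
end

# Build a MySQL CREATE INDEX statement with a prefix length on every column.
def prefixed_index_sql(table, index_name, columns, prefix)
  cols = columns.map { |c| "`#{c}`(#{prefix})" }.join(", ")
  "CREATE INDEX `#{index_name}` ON `#{table}` (#{cols})"
end

puts index_key_bytes([255, 255])   # 1530, over the 1000-byte limit
prefix = equal_prefix_length(2)
puts prefix                        # 166, i.e. 2 * 166 * 3 = 996 bytes
puts prefixed_index_sql("keywords", "kwd_term_idx", ["term", "keyword_type"], prefix)
```

Inside a Rails migration, a statement built this way could be passed to `execute` in place of the failing `add_index` call. That keeps the fix scripted, at the cost of tying that part of the schema to MySQL.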

Presumably I'm not the first to hit this problem; what have others done? I'd rather have a scripted fix than manual corrections to mysql each time.

Thanks
Graham

--
Eric Muzzy
Programmer Analyst, Virtual Library Services
MSK Research Library
Memorial Sloan-Kettering Cancer Center
muzzye at mskcc.org
telephone: 646-894-2573
fax: 646-422-2316

=====================================================================
Please note that this e-mail and any files transmitted with it may be privileged, confidential, and protected from disclosure under applicable law.
If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any reading, dissemination, distribution, copying, or other use of this communication or any of its attachments is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to this message and deleting this message, any attachments, and all copies and backups from your computer.

From rossfsinger at gmail.com Tue Feb 9 15:11:27 2010
From: rossfsinger at gmail.com (Ross Singer)
Date: Tue, 9 Feb 2010 15:11:27 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To:
References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu>
Message-ID: <23b83f161002091211ncf9529ex524fc7a2802af585@mail.gmail.com>

On Tue, Feb 9, 2010 at 1:10 PM, wrote:
> But I'm curious, just viewing the Summon literature, it seems to me as
> though it's offering the same services that umlaut does.

Well, not quite. Summon provides an aggregation of fulltext resources (mostly articles) and metadata for bibliographic records for searching. Summon's index is large, but not exhaustive -- in fact, like Google Scholar, I'm not sure customers actually know the entirety of what is indexed; who knows, Serials Solutions might not even know completely.

Summon doesn't really bother caring about a specific and known item (where Umlaut comes in). In the new, popular jargon, "Discovery to Delivery", Summon sits in the discovery camp and Umlaut falls under delivery.

Umlaut was also designed to find your known item in specific (and sometimes unorthodox) ways: locating a conference proceeding in a catalog tends to fail a lot, so Umlaut can take a variety of different approaches to find it.
Originally, it also had the capacity of searching the open web (via Google and Yahoo!) and leaned on a whole slew of heuristics to determine the probability of a result being a match in an open archive. It's reasonable to think that Summon could pretty easily be incorporated into this dragnet approach.

While there could be considered some overlap of functionality (Summon could, theoretically, resolve OpenURLs for things in its index, for example) there is a fairly distinct difference in the services they provide.

-Ross.

From muzzye at mskcc.org Tue Feb 9 15:59:10 2010
From: muzzye at mskcc.org (muzzye at mskcc.org)
Date: Tue, 9 Feb 2010 15:59:10 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To: <23b83f161002091211ncf9529ex524fc7a2802af585@mail.gmail.com>
Message-ID:

Thanks. I get that now. And as Jonathan mentioned, umlaut can be the link resolver to resources found by Summon. But SS also offers 360Link, which is its own proprietary link resolver and provides the Delivery end of Discovery. I guess this works in a closed system where everything talks OpenURL.

But umlaut seems to provide both discovery and delivery, maybe using OpenURL, and also not. The Umlaut wiki states that it doesn't do the work of a knowledge base, so it needs to be more flexible in how it can generate links from content. And therein lies the opportunity and challenge. It's hard to imagine that all the linking umlaut makes available is generated solely by dynamic web-based queries. I do see links like: http://findit.library.jhu.edu/link_router/index/19931364

Eric

From rochkind at jhu.edu Tue Feb 9 16:42:30 2010
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Tue, 9 Feb 2010 16:42:30 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To:
References: <23b83f161002091211ncf9529ex524fc7a2802af585@mail.gmail.com>,
Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52F@JHEMTEXVS2.win.ad.jhu.edu>

"I get that now. And as Jonathan mentioned, umlaut can be the link resolver to resources found by Summon. But SS also offers 360Link, which is its own proprietary link resolver and provides the Delivery end of Discovery. I guess this works in a closed system where everything talks OpenURL."

Currently, we're using Umlaut on top of SFX. You could also use it on top of SS 360Link; although an adapter would have to be written, it's quite feasible and I'd like to do it eventually.

If you're entirely happy with your proprietary link resolver functionality and interface, there'd be no reason to use Umlaut; Umlaut provides additional features on top of a (likely commercial) knowledge base. It's probably not reasonable to use Umlaut _without_ an external knowledge base (like 360Link or SFX), since it doesn't have its own internal knowledge base of library holdings and platform direct linking templates/algorithms, which is pretty important for link resolver functionality.

So to say "Umlaut does the same thing as Summon" is really the same as to say "360Link does the same thing as Summon." It doesn't, not really; that's why Summon users are still using 360Link -- or SFX -- or another link resolver of your choice -- such as Umlaut on top of a knowledge base.

The way I explain the niche of Umlaut is that it's focused on _known item services_.
If you've identified a correct citation, Umlaut (or any other link resolver) can display access, delivery, and other services from your library or from third party sources. Umlaut (or any other link resolver) can't answer questions like "Help me find articles on topic X", or "Find anything by Author A" or "There's this article I think is by a guy named Smith on commercial aviation but I can't remember the title." To answer questions like that, you need to use a discovery service like Summon, your catalog, Google Scholar, licensed databases, etc. Once you've found an exact citation through one of these means, Umlaut can find services for this known item, as any other link resolver can (SFX, 360Link, whatever); Umlaut just does it better, is the idea.

Hope this helps clear things up. I wrote a bit on how I see the role of a powerful link resolver like Umlaut in the tech infrastructure here:

http://bibwild.wordpress.com/2008/09/25/rethinking-link-resolvers/

From rochkind at jhu.edu Tue Feb 9 16:46:24 2010
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Tue, 9 Feb 2010 16:46:24 -0500
Subject: [Umlaut-general] mysql, utf8, index limitation
In-Reply-To:
References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu>,
Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu>

PS: Actually, the really pertinent question is why you would buy both SFX and SS 360Link simultaneously, as you seem to be considering. (Or would you give up SFX if you bought SS 360Link?) Those are two things that really ARE in the same market niche, and both are proprietary systems that cost money too.

Umlaut is in that same basic market niche too, the link resolver market niche, with 360Link and SFX. So a better question than "Why get Summon and have Umlaut" would be "Why have Umlaut if you already have SFX or 360Link?". The answer is that Umlaut is free software that provides a better interface, with more sophisticated features from a variety of sources, and that goes on top of SFX (or hypothetically 360Link).
Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of muzzye at mskcc.org [muzzye at mskcc.org] Sent: Tuesday, February 09, 2010 1:10 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation I don?t have any experience with umlaut or Summon, though we are looking at Serials Solution very soon. But I?m curious, just viewing the Summon literature, it seems to me as though it?s offering the same services that umlaut does. For our purposes, it?s local content that we can?t readily make available in SFX, and the Summon literature suggests it can incorporate this. With Summon in place, what more does umlaut bring users. Jonathan?s email seems to suggest that libraries have both the SFX KB and Serials Solution. Perhaps they do, and that would explain umlaut?s value. For us, we?re considering Serial Solutions A-Z list, I think it?s called 360. Eric On 2/9/10 11:35 AM, "Jonathan Rochkind" wrote: Makes sense. It should be very easy to 'integrate' Umlaut with Summon -- and I think it will in fact make a lot of sense to do so, as my understanding with Summon is that your link resolver pretty much IS the item detail page, so you'll want a nice one, with Umlaut can provide. All you should have to do is point Summon at Umlaut as the link resolver, there shouldn't be any development neccesary. (So 'integrate' might not be quite the right word, but Summon is, as I understand it, set up to use an openurl link resolver of your choice, and Umlaut (backed by your SFX) should just work). Same for Xerxes, except as an added bonus Xerxes now has a feature where Umlaut-provided services (full text link, library holdings information, etc) can be placed directly on a Xerxes record detail page via ajax. Feature's already there, you just tell Xerxes where your Umlaut's at, it just works (or ought to). I know NYU uses Aleph, but they also use Primo. 
I'm not sure if Scot at NYU has written an aleph adapter for Umlaut, or if his adapter talks to Primo instead. So writing an adapter for Aleph, so library holdings can be displayed directly on the Umlaut page, might be the one piece of development you might have to do. It shouldn't be too hard to do, and I can maybe help if necessary. So, while I'm biased of course, I think you're right that Umlaut could be quite useful in your setup. Definitely feel free to send any questions or issues you have to the list. I don't think it should really be that hard to support Umlaut running under mongrel, but it might work with Passenger too. I don't think you need to be scared of mongrel though. We'll see! Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Tuesday, February 09, 2010 4:34 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hi Jonathan This is mainly investigative at the moment. Our library is in a process of transition, which makes it hard to predict exactly what we'll be using in a years time. And we are very low on technical staff, which means long-term ease of maintenance is also an issue. At the moment we're running aleph, sfx, metalib+xerxes. But we're just about to start using Summon. The plan at the moment is for metalib+xerxes to stay around for a year or so at least. So one aspect of umlaut I want to look at is ease of integration with either/both of xerxes and summon. Another is maintainability: having had a little experience of supporting mongrel/rails I'm not wild about the idea of either. I will certainly look at trying it with passenger instead of mongrel. Generally though the main point at the moment is just to get an idea of what umlaut can do: it's one thing looking at docs, and another altogether trying something out! 
Anyway, thanks for creating umlaut. And thanks in advance for all the queries I just know I'm going to be sending your way ;-) Graham -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 18:01 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Sweet, that's good enough, I'm just happy you're making progress. For reference, I'm using a RHEL/yum mysql package here. Very happy to help as I can with any other installation or customization hurdles you run into. I tried to make the installation process smooth and documented, but it could be better. Right now I think it's two or maybe three libraries that are actually using Umlaut in production. But I think it's relatively mature and stable software that is suitable for getting up without TOO much hacking. I'd love to see more libraries using it. (You will probably need to write an adapter for your ILS/OPAC, unless you use Horizon or Aleph.) Can you share any more about the nature of your curiosity in it, what you are considering doing with it? I'll also say that longer-term I'd like to find time to make an Umlaut 3.0 that is a bit more cleaned up; Right now, it will only run on Rails 2.1, not 2.2. (Need to work on some threading issues to get it up to 2.2). I'd also like to verify it running under Passenger, instead of the ancient mongrel that nobody uses anymore unless they have to (_should_ work fine, but might be some threading issues). And I'd like to make all the currently embedded javascript use an 'unobtrusive' technique, and provide both Prototype and JQuery versions of the js functionality scripts, so you can use either one if you want to embed Umlaut content in an external application. If only I had the time to do that, but some day. I do think it's stable and mature as it is, just some annoyances as above. 
Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 12:27 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation It works fine on centos with mysql 5.0.77 (I've added notes on a couple of very minor gotchas to the wiki). So as far as I know it's only the gentoo version of mysql giving problems. My personal motivation to do more about this has just vanished, I'm afraid: I think the fix needs to be to the gentoo mysql, not to umlaut. I'm going to carry on with the install on centos, see how far I get with that. Graham -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 16:19 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation So we could start by listing those failed indexes; some of them might be 'legacy' data that isn't actually used anymore. We could simply remove those indexes (and the columns they are on). Others are going to be on data that is still important, and has indexes that are important. But we could try to alter these indexes to not go over the mysql limit, by one or more of: Perhaps some of them are compound indexes but don't really need to be compound, if we think about the use of that column. I'm no DBA, I think I made most of those indexes (ross who's also no DBA made the rest :) ), I may have made certain indexes compound indexes out of 'extra' caution, when really a single column index might suffice, depending on the nature of the data. Alternately, depending on the nature of the data, we could make the size of the indexed columns smaller, so they don't exceed the 1000 byte limit (you CAN easily set the size of a column in ActiveRecord schema.rb). 
Alternately, we can find a way to have the database creation scripts actually supply the index-truncation MySQL you mention. The index creation script doesn't have to be just 'schema.rb', I'm not even sure that is the preferred way to do things anymore (I think there's some rake db:bootstrap task I haven't looked at, that didn't used to exist when I devised the original umlaut installation path). If it can't easily be done in schema.rb, there may be another method of easily allowing db bootstrapping that makes the MySQL index truncation declaration easier. So those are my ideas! You seem to have a pretty good idea of what's going on in the code, Graham. If you come up with a good idea for a fix, I'll happily take the patch, or just plain give you commit rights to the repo, no problem. Of course the 'depending on the nature of the data' questions about which indexes really ought to be there are something you might not know, not being familiar with the Umlaut code; I'm happy to answer questions or try to think about that if you identify questions. ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 10:10 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation There is a history of this problem on http://bugs.mysql.com/bug.php?id=451 It is a very long thread, starting in 2004 and still continuing in December 2009. As I read it, it says that the patch to fix the problem is not recommended to be included in mainstream releases since it adversely affects performance. I wonder if maybe some unix distributions include the patch and others don't? I've been trying it on gentoo linux; I'm going to have a try on CentOS and see if that behaves any differently.
Graham PS I'm not getting it on all indices - only on compound indices where the sum of the lengths of the fields being indexed > 1000/3. I guess I could fix this manually but would prefer to stick at it till I've found a proper fix. -----Original Message----- From: umlaut-general-bounces at rubyforge.org [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan Rochkind Sent: 08 February 2010 14:01 To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hmm, I have not run into this. I can't remember what the "keywords" index even is; it's possible this is an index on a column that isn't even used anymore. But I guess you're saying you're getting this error with all indexes, not just that one? Some of those indexes are definitely important. I'm currently not at work due to snow craziness, but I'll try to take a look tomorrow when I'm back. Is 1000 bytes the maximum length for any index key in mysql? I wonder how I've managed to create those indexes myself. While a pain, you could create those indexes by hand with SQL instead of by loading the schema.rb. Comment out the add_index lines in schema.rb and then run it, and then add the indexes yourself with SQL, doing what you seem to have already figured out how to do in order to limit them to a 1000 byte prefix. But yeah, it would be better to have a scripted solution; I'm not sure why I haven't run into this problem before, and I'll try to take a look when I'm back at work. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Monday, February 08, 2010 7:39 AM To: umlaut-general at rubyforge.org Subject: [Umlaut-general] mysql, utf8, index limitation Hi, I'm just starting my first umlaut install. I have mysql 5.0.84, and have created a db for umlaut set to UTF8 as recommended. I'm using umlaut 2.10.0 as recommended on the wiki.
The command rake db:schema:load fails to add the keywords index at line 108 in schema.rb: add_index "keywords", ["term", "keyword_type"], :name => "kwd_term_idx" with the error "specified key was too long; max key length is 1000 bytes". The same error appears with other indices (referent_values, referents..) This is because both term and keyword_type are defined as varchar (255), and each utf8 character is 3 bytes, giving a total required of 1530 bytes. Writing directly in mysql it's possible to avoid this problem by specifying the number of characters to use for each key (say the first 166 from each of the two). But there doesn't seem to be any ActiveRecord syntax to express this. Presumably I'm not the first to hit this problem; what have others done? I'd rather have a scripted fix than manual corrections to mysql each time. Thanks Graham _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general -- Eric Muzzy
Programmer Analyst, Virtual Library Services MSK Research Library Memorial Sloan-Kettering Cancer Center muzzye at mskcc.org telephone: 646-894-2573 fax: 646-422-2316 ===================================================================== Please note that this e-mail and any files transmitted with it may be privileged, confidential, and protected from disclosure under applicable law. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any reading, dissemination, distribution, copying, or other use of this communication or any of its attachments is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to this message and deleting this message, any attachments, and all copies and backups from your computer. From muzzye at mskcc.org Wed Feb 10 09:57:08 2010 From: muzzye at mskcc.org (muzzye at mskcc.org) Date: Wed, 10 Feb 2010 09:57:08 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu> Message-ID: No, we're considering replacing one commercial product with the other, the new being SS though which package in their suite is open. Umlaut has always been something I want to consider using. We're meeting with the vendor today ... so this has been helpful. Thanks again. Eric On 2/9/10 4:46 PM, "Jonathan Rochkind" wrote: PS: Actually, the really pertinent question is why you would buy both SFX and SS 360Link simultaneously, as you seem to be considering? (Or would you give up SFX if you bought SS 360Link?). Those are two things that really ARE in the same market niche, and both proprietary systems that cost money too. Umlaut is in that same basic market niche too, the link resolver market niche, with 360Link and SFX. 
So a better question than "Why get Summon and have Umlaut" would be "Why have Umlaut if you already have SFX or 360Link?". The answer is that Umlaut is free software to provide a better interface with more sophisticated features of a variety of sources, that goes on top of SFX (or hypothetically 360Link). Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of muzzye at mskcc.org [muzzye at mskcc.org] Sent: Tuesday, February 09, 2010 1:10 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation I don't have any experience with umlaut or Summon, though we are looking at Serials Solutions very soon. But I'm curious, just viewing the Summon literature, it seems to me as though it's offering the same services that umlaut does. For our purposes, it's local content that we can't readily make available in SFX, and the Summon literature suggests it can incorporate this. With Summon in place, what more does umlaut bring users? Jonathan's email seems to suggest that libraries have both the SFX KB and Serials Solutions. Perhaps they do, and that would explain umlaut's value. For us, we're considering the Serials Solutions A-Z list, I think it's called 360. Eric On 2/9/10 11:35 AM, "Jonathan Rochkind" wrote: Makes sense. It should be very easy to 'integrate' Umlaut with Summon -- and I think it will in fact make a lot of sense to do so, as my understanding with Summon is that your link resolver pretty much IS the item detail page, so you'll want a nice one, which Umlaut can provide. All you should have to do is point Summon at Umlaut as the link resolver, there shouldn't be any development necessary. (So 'integrate' might not be quite the right word, but Summon is, as I understand it, set up to use an openurl link resolver of your choice, and Umlaut (backed by your SFX) should just work).
Same for Xerxes, except as an added bonus Xerxes now has a feature where Umlaut-provided services (full text link, library holdings information, etc) can be placed directly on a Xerxes record detail page via ajax. Feature's already there, you just tell Xerxes where your Umlaut's at, it just works (or ought to). I know NYU uses Aleph, but they also use Primo. I'm not sure if Scot at NYU has written an aleph adapter for Umlaut, or if his adapter talks to Primo instead. So writing an adapter for Aleph, so library holdings can be displayed directly on the Umlaut page, might be the one piece of development you might have to do. It shouldn't be too hard to do, and I can maybe help if necessary. So, while I'm biased of course, I think you're right that Umlaut could be quite useful in your setup. Definitely feel free to send any questions or issues you have to the list. I don't think it should really be that hard to support Umlaut running under mongrel, but it might work with Passenger too. I don't think you need to be scared of mongrel though. We'll see! Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham [Graham.Seaman at rhul.ac.uk] Sent: Tuesday, February 09, 2010 4:34 AM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Hi Jonathan This is mainly investigative at the moment. Our library is in a process of transition, which makes it hard to predict exactly what we'll be using in a years time. And we are very low on technical staff, which means long-term ease of maintenance is also an issue. At the moment we're running aleph, sfx, metalib+xerxes. But we're just about to start using Summon. The plan at the moment is for metalib+xerxes to stay around for a year or so at least. So one aspect of umlaut I want to look at is ease of integration with either/both of xerxes and summon. 
Another is maintainability: having had a little experience of supporting mongrel/rails I'm not wild about the idea of either. I will certainly look at trying it with passenger instead of mongrel. Generally though the main point at the moment is just to get an idea of what umlaut can do: it's one thing looking at docs, and another altogether trying something out! Anyway, thanks for creating umlaut. And thanks in advance for all the queries I just know I'm going to be sending your way ;-) Graham
From jonesa at newschool.edu Wed Feb 10 10:43:19 2010 From: jonesa at newschool.edu (Allen Jones) Date: Wed, 10 Feb 2010 10:43:19 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu> References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu>, <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu> Message-ID: Jonathan is correct, but more to the point, working with SerialsSolution is horrible. Before you can do any work, you have to sign a non-disclosure agreement and promise anything you find out about their XML API, you'll take to your grave. Even getting code examples that use their API was difficult. For the life of us, we could not get their support staff to work with us (New School) and NYU. Their answer was always, "if you all were serialssolutions customers, you could do what you wanted to do.". They're nice people who worked with us in our transition to sfx, but they weren't open. Having been a former serialssolutions customer, they are great if all you need is a header and footer slapped on your output screen. Don't think you'll do any development with them, but then again, I've said too much. LOL.
Allen Jones Director - Digital Library Programs The New School Libraries On Feb 9, 2010, at 4:46 PM, Jonathan Rochkind wrote: > PS: Actually, the really pertinent question is why you would buy > both SFX and SS 360Link simultaneously, as you seem to be > considering? (Or would you give up SFX if you bought SS 360Link?). > Those are two things that really ARE in the same market niche, and > both proprietary systems that cost money too. > > Umlaut is in that same basic market niche too, the link resolver > market niche, with 360Link and SFX. So a better question than "Why > get Summon and have Umlaut" would be "Why have Umlaut if you already > have SFX or 360Link?". The answer is that Umlaut is free software > to provide a better interface with more sophisticated features of a > variety of sources, that goes on top of SFX (or hypothetically > 360Link). > > Jonathan > ________________________________________ > From: umlaut-general-bounces at rubyforge.org > [umlaut-general-bounces at rubyforge.org] On Behalf Of muzzye at mskcc.org > [muzzye at mskcc.org] > Sent: Tuesday, February 09, 2010 1:10 PM > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > I don't have any experience with umlaut or Summon, though we are > looking at Serials Solution very soon. > > But I'm curious, just viewing the Summon literature, it seems to me > as though it's offering the same services that umlaut does. > For our purposes, it's local content that we can't readily make > available in SFX, and the Summon literature suggests it can > incorporate this. > With Summon in place, what more does umlaut bring users. > > Jonathan's email seems to suggest that libraries have both the SFX > KB and Serials Solution. > Perhaps they do, and that would explain umlaut's value. > For us, we're considering Serial Solutions A-Z list, I think it's > called 360. > > Eric > > > On 2/9/10 11:35 AM, "Jonathan Rochkind" wrote: > > Makes sense.
> > It should be very easy to 'integrate' Umlaut with Summon -- and I > think it will in fact make a lot of sense to do so, as my > understanding with Summon is that your link resolver pretty much IS > the item detail page, so you'll want a nice one, with Umlaut can > provide. All you should have to do is point Summon at Umlaut as the > link resolver, there shouldn't be any development neccesary. (So > 'integrate' might not be quite the right word, but Summon is, as I > understand it, set up to use an openurl link resolver of your > choice, and Umlaut (backed by your SFX) should just work). > > Same for Xerxes, except as an added bonus Xerxes now has a feature > where Umlaut-provided services (full text link, library holdings > information, etc) can be placed directly on a Xerxes record detail > page via ajax. Feature's already there, you just tell Xerxes where > your Umlaut's at, it just works (or ought to). > > I know NYU uses Aleph, but they also use Primo. I'm not sure if Scot > at NYU has written an aleph adapter for Umlaut, or if his adapter > talks to Primo instead. So writing an adapter for Aleph, so library > holdings can be displayed directly on the Umlaut page, might be the > one piece of development you might have to do. It shouldn't be too > hard to do, and I can maybe help if necessary. > > So, while I'm biased of course, I think you're right that Umlaut > could be quite useful in your setup. Definitely feel free to send > any questions or issues you have to the list. I don't think it > should really be that hard to support Umlaut running under mongrel, > but it might work with Passenger too. I don't think you need to be > scared of mongrel though. We'll see! 
> > Jonathan > ________________________________________ > From: umlaut-general-bounces at rubyforge.org [umlaut-general- > bounces at rubyforge.org] On Behalf Of Seaman, Graham > [Graham.Seaman at rhul.ac.uk] > Sent: Tuesday, February 09, 2010 4:34 AM > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > Hi Jonathan > > This is mainly investigative at the moment. Our library is in a > process > of transition, which makes it hard to predict exactly what we'll be > using in a years time. And we are very low on technical staff, which > means long-term ease of maintenance is also an issue. > > At the moment we're running aleph, sfx, metalib+xerxes. But we're just > about to start using Summon. The plan at the moment is for > metalib+xerxes to stay around for a year or so at least. So one aspect > of umlaut I want to look at is ease of integration with either/both of > xerxes and summon. Another is maintainability: having had a little > experience of supporting mongrel/rails I'm not wild about the idea of > either. I will certainly look at trying it with passenger instead of > mongrel. Generally though the main point at the moment is just to > get an > idea of what umlaut can do: it's one thing looking at docs, and > another > altogether trying something out! > > Anyway, thanks for creating umlaut. And thanks in advance for all the > queries I just know I'm going to be sending your way ;-) > > Graham > > -----Original Message----- > From: umlaut-general-bounces at rubyforge.org > [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan > Rochkind > Sent: 08 February 2010 18:01 > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > Sweet, that's good enough, I'm just happy you're making progress. > For > reference, I'm using a RHEL/yum mysql package here. > > Very happy to help as I can with any other installation or > customization > hurdles you run into. 
I tried to make the installation process smooth > and documented, but it could be better. > > Right now I think it's two or maybe three libraries that are actually > using Umlaut in production. But I think it's relatively mature and > stable software that is suitable for getting up without TOO much > hacking. I'd love to see more libraries using it. (You will probably > need to write an adapter for your ILS/OPAC, unless you use Horizon or > Aleph.) Can you share any more about the nature of your curiosity in > it, what you are considering doing with it? > > I'll also say that longer-term I'd like to find time to make an Umlaut > 3.0 that is a bit more cleaned up; Right now, it will only run on > Rails > 2.1, not 2.2. (Need to work on some threading issues to get it up to > 2.2). I'd also like to verify it running under Passenger, instead of > the ancient mongrel that nobody uses anymore unless they have to > (_should_ work fine, but might be some threading issues). And I'd > like > to make all the currently embedded javascript use an 'unobtrusive' > technique, and provide both Prototype and JQuery versions of the js > functionality scripts, so you can use either one if you want to embed > Umlaut content in an external application. > > If only I had the time to do that, but some day. I do think it's > stable and mature as it is, just some annoyances as above. > > Jonathan > ________________________________________ > From: umlaut-general-bounces at rubyforge.org > [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham > [Graham.Seaman at rhul.ac.uk] > Sent: Monday, February 08, 2010 12:27 PM > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > It works fine on centos with mysql 5.0.77 (I've added notes on a > couple > of very minor gotchas to the wiki). So as far as I know it's only the > gentoo version of mysql giving problems. 
My personal motivation to do > more about this has just vanished, I'm afraid: I think the fix needs > to > be to the gentoo mysql, not to umlaut. > > I'm going to carry on with the install on centos, see how far I get > with > that. > > Graham > > -----Original Message----- > From: umlaut-general-bounces at rubyforge.org > [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan > Rochkind > Sent: 08 February 2010 16:19 > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > So we could start by listing those failed indexes; some of them > might be > 'legacy' data that isn't actually used anymore. We could simply > remove > those indexes (and the columns they are on). Others are going to be > on > data that is still important, and has indexes that are important. > But we > could try to alter these indexes to not go over the mysql limit, by > one > or more of: > > Perhaps some of them are compound indexes but don't really need to be > compound, if we think about the use of that column. I'm no DBA, I > think > I made most of those indexes (ross who's also no DBA made the > rest :) ), > I may have made certain indexes compound indexes out of 'extra' > caution, > when really a single column index might suffice, depending on the > nature > of the data. Alternately, depending on the nature of the data, we > could > make the size of the indexed columns smaller, so they don't exceed the > 1000 byte limit (you CAN easily set the size of a column in > ActiveRecord > schema.rb). > > Alternately, we can find a way to have the database creation scripts > actually supply the index-truncation MySQL you mention. The index > creation script doesn't have to be just 'schema.rb', I'm not even sure > that is the preferred way to do things anymore (I think there's some > rake db:boostrap task I havent' looked at, that didn't used to exist > when I devised the original umlaut installation path). 
If it can't > easily be done in schema.rb, there may be another method of easily > allowing db bootstrapping that makes the MySQL index truncation > decleration easier. > > So those are my ideas! You seem to have a pretty good idea of what's > going on in the code, Graham. If you come up with a good idea for a > fix, > I'll happily take the patch, or just plain give you commit rights to > the > repo, no problem. Of course the 'depending on the nature of the data' > questions towards what indexes really ought to be there how is > something > you might not know, not being familiar with the Umlaut code; I'm happy > to answer questions or try to think about that if you identify > questions. > ________________________________________ > From: umlaut-general-bounces at rubyforge.org > [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham > [Graham.Seaman at rhul.ac.uk] > Sent: Monday, February 08, 2010 10:10 AM > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > There is a history of this problem on > http://bugs.mysql.com/bug.php?id=451 > It is a very long thread, starting in 2004 and still continuing in > December 2009. > > As I read it, it says that the patch to fix the problem is not > recommended to be included in mainstream releases since it adversely > affects performance. I wonder if maybe some unix distributions include > the patch and others don't? > > I've been trying it on gentoo linux; I'm going to have a try on CentOs > and see if that behaves any differently. > > Graham > PS I'm not getting it on all indices - only on compound indices where > the sum of the lengths of the fields being indexed > 1000/3. I > guess I > could fix this manually but would prefer to stick at it till I've > found > a proper fix. 
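The arithmetic behind the "> 1000/3" observation can be sketched in a few lines of Ruby. This is only an illustration under the thread's stated assumptions (a 1000-byte key limit and 3 bytes per utf8 character); the helper names `index_key_bytes`, `prefix_length` and `prefix_index_sql` are made up for this sketch and are not part of Umlaut or ActiveRecord.

```ruby
# Assumptions taken from the thread: MySQL reports a 1000-byte key limit,
# and each utf8 character may need up to 3 bytes.
MAX_KEY_BYTES  = 1000
BYTES_PER_CHAR = 3

# Worst-case key size in bytes for an index over columns with the given
# declared lengths (in characters). Hypothetical helper.
def index_key_bytes(column_lengths)
  column_lengths.inject(0) { |sum, len| sum + len } * BYTES_PER_CHAR
end

# Largest equal per-column prefix (in characters) that keeps a compound
# index over column_count columns within the limit. Hypothetical helper.
def prefix_length(column_count)
  MAX_KEY_BYTES / (BYTES_PER_CHAR * column_count)
end

# Builds a MySQL CREATE INDEX statement using prefix lengths, i.e. the
# col(N) syntax for indexing only the first N characters. Hypothetical helper.
def prefix_index_sql(table, index_name, columns, prefix)
  cols = columns.map { |c| "`#{c}`(#{prefix})" }.join(", ")
  "CREATE INDEX `#{index_name}` ON `#{table}` (#{cols})"
end

puts index_key_bytes([255, 255])   # two varchar(255) columns
puts prefix_length(2)
puts prefix_index_sql("keywords", "kwd_term_idx", %w[term keyword_type], prefix_length(2))
```

For the keywords index this gives 1530 bytes against the 1000-byte limit, and a prefix of 166 characters per column, which matches the figures quoted in the original report. One possible scripted route, not tested against Umlaut, would be to pass the generated statement to ActiveRecord's `execute` in place of the failing `add_index` line.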
> > -----Original Message----- > From: umlaut-general-bounces at rubyforge.org > [mailto:umlaut-general-bounces at rubyforge.org] On Behalf Of Jonathan > Rochkind > Sent: 08 February 2010 14:01 > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > Hmm, I have not run into this. I can't remember what the "keywords" > index even is; it's possible this is an index on a column that isn't > even used anymore. > > But I guess you're saying you're getting this error with all > indexes,not > just that one? Some of those indexes are definitely important. > > I'm currently not at work due to snow craziness, but I'll try to > take a > look tomorrow when I'm back? > > Is 1000 bytes the maximum length for any index key in mysql? I wonder > how I've managed to create those indexes myself. > > While a pain, you could create those indexes by hand with SQL > instead of > by loading the schema.rb. Comment out the add_index lines in schema > rb > and then run it, and then add the indexes yourself with SQL, doing > what > you seem to have already figured out how to do in order to limit > them to > a 1000 byte prefix. But yeah, it would be better to have a scripted > solution, not sure why I haven't run into this problem before, I'll > try > to take a look when I'm back at work. > > > Jonathan > ________________________________________ > From: umlaut-general-bounces at rubyforge.org > [umlaut-general-bounces at rubyforge.org] On Behalf Of Seaman, Graham > [Graham.Seaman at rhul.ac.uk] > Sent: Monday, February 08, 2010 7:39 AM > To: umlaut-general at rubyforge.org > Subject: [Umlaut-general] mysql, utf8, index limitation > > Hi, > > I'm just starting my first umlaut install. I have mysql 5.0.84, and > have > created a db for umlaut set to UTF8 as recommended. I'm using umlaut > 2.10.0 as recommended on the wiki. 
> > The command > Rake db:schema:load > > Fails to add the keywords index at line 108 in schema.rb: > > Add_index "keywords", ["term", "keyword_type"], :name => > "kwd_term_idx" > > With the error "specified key was too long; max key length is 1000 > bytes". The same error appears with other indices (referent_values, > referents..) > > This is because both term and keyword_type are defined as varchar > (255), > and each utf8 character is 3 bytes, giving a total required of 1530 > bytes. > > Writing directly in mysql it's possible to avoid this problem by > specifying the number of characters to use for each key (say the first > 166 from each of the two). But there doesn't seem to be any > ActiveRecord > syntax to express this. Presumably I'm not the first to hit this > problem; what have others done? I'd rather have a scripted fix than > manual corrections to mysql each time. > > Thanks > Graham > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > _______________________________________________ > Umlaut-general mailing 
list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > > > > -- > > Eric Muzzy > Programmer Analyst, Virtual Library Services > MSK Research Library > Memorial Sloan-Kettering Cancer Center > muzzye at mskcc.org > telephone: 646-894-2573 > fax: 646-422-2316 > > > > > > > ===================================================================== > > Please note that this e-mail and any files transmitted with it > may be > privileged, confidential, and protected from disclosure under > applicable law. If the reader of this message is not the intended > recipient, or an employee or agent responsible for delivering this > message to the intended recipient, you are hereby notified that > any > reading, dissemination, distribution, copying, or other use of > this > communication or any of its attachments is strictly prohibited. > If > you have received this communication in error, please notify the > sender immediately by replying to this message and deleting this > message, any attachments, and all copies and backups from your > computer. > > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general From rochkind at jhu.edu Wed Feb 10 11:40:02 2010 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 10 Feb 2010 11:40:02 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu>, Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D13006CAD89@JHEMTEXVS2.win.ad.jhu.edu> I think they're getting better, or at least some people at SerSol are getting better, at some of that stuff. A SerSol staff published an article about the 360Link API in the Code4Lib Journal. 
I said "This is surprising to me, as you're publishing information that previously people had to sign an NDA to get!" The person who wrote the article who worked at SerSol basically said "Yeah, that stuff is kind of a relic of an earlier time, we're trying to change it." At any rate, from the published article about the 360Link API, it _seems_ that it would be quite sufficient for writing an adapter for Umlaut to use the SerSol knowledge base. Of course, there are often bugs and idiosyncrasies that are not evident from the documentation, only evident once you start developing. But I'm fairly optimistic that a SerSol 360Link plugin for Umlaut is feasible. At MPOW we constantly consider switching to SerSol from SFX. The SerSol shared knowledge base seems to possibly be better than the SFX one. And once you're using Umlaut, it's IMO really the shared knowledge base you're paying for; you don't _need_ to customize the interface they provide you, you've got all the interface customization you want with Umlaut, using their APIs instead of their end-user interface. Jonathan ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Allen Jones [jonesa at newschool.edu] Sent: Wednesday, February 10, 2010 10:43 AM To: umlaut-general at rubyforge.org Cc: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation Jonathan is correct, but more to the point, working with SerialsSolution is horrible. Before you can do any work, you have to sign a non-disclosure agreement and promise anything you find out about their XML API, you'll take to your grave. Even getting code examples that use their API was difficult. For the life of us, we could not get their support staff to work with us (New School) and NYU. Their answer was always, "if you all were serialssolutions customers, you could do what you wanted to do."
They're nice people who worked with us in our transition to sfx, but they weren't open. Having been a former serialssolutions customer, they are great if all you need is a header and footer slapped on your output screen. Don't think you'll do any development with them, but then again, I've said too much. LOL. Allen Jones Director - Digital Library Programs The New School Libraries On Feb 9, 2010, at 4:46 PM, Jonathan Rochkind wrote: > PS: Actually, the really pertinent question is why you would buy > both SFX and SS 360Link simultaneously, as you seem to be > considering? (Or would you give up SFX if you bought SS 360Link?). > Those are two things that really ARE in the same market niche, and > both proprietary systems that cost money too. > > Umlaut is in that same basic market niche too, the link resolver > market niche, with 360Link and SFX. So a better question than "Why > get Summon and have Umlaut" would be "Why have Umlaut if you already > have SFX or 360Link?". The answer is that Umlaut is free software > to provide a better interface with more sophisticated features of a > variety of sources, that goes on top of SFX (or hypothetically > 360Link). > > Jonathan > ________________________________________ > From: umlaut-general-bounces at rubyforge.org [umlaut-general- > bounces at rubyforge.org] On Behalf Of muzzye at mskcc.org > [muzzye at mskcc.org] > Sent: Tuesday, February 09, 2010 1:10 PM > To: umlaut-general at rubyforge.org > Subject: Re: [Umlaut-general] mysql, utf8, index limitation > > I don't have any experience with umlaut or Summon, though we are > looking at Serials Solution very soon. > > But I'm curious, just viewing the Summon literature, it seems to me > as though it's offering the same services that umlaut does. > For our purposes, it's local content that we can't readily make > available in SFX, and the Summon literature suggests it can > incorporate this. > With Summon in place, what more does umlaut bring users.
> > Jonathan's email seems to suggest that libraries have both the SFX > KB and Serials Solution. > Perhaps they do, and that would explain umlaut's value. > For us, we're considering Serial Solutions A-Z list, I think it's > called 360. > > Eric > > > On 2/9/10 11:35 AM, "Jonathan Rochkind" wrote: > > Makes sense. > > It should be very easy to 'integrate' Umlaut with Summon -- and I > think it will in fact make a lot of sense to do so, as my > understanding with Summon is that your link resolver pretty much IS > the item detail page, so you'll want a nice one, which Umlaut can > provide. All you should have to do is point Summon at Umlaut as the > link resolver, there shouldn't be any development necessary. (So > 'integrate' might not be quite the right word, but Summon is, as I > understand it, set up to use an openurl link resolver of your > choice, and Umlaut (backed by your SFX) should just work). > > Same for Xerxes, except as an added bonus Xerxes now has a feature > where Umlaut-provided services (full text link, library holdings > information, etc) can be placed directly on a Xerxes record detail > page via ajax. Feature's already there, you just tell Xerxes where > your Umlaut's at, it just works (or ought to). > > I know NYU uses Aleph, but they also use Primo. I'm not sure if Scot > at NYU has written an aleph adapter for Umlaut, or if his adapter > talks to Primo instead. So writing an adapter for Aleph, so library > holdings can be displayed directly on the Umlaut page, might be the > one piece of development you might have to do. It shouldn't be too > hard to do, and I can maybe help if necessary. > > So, while I'm biased of course, I think you're right that Umlaut > could be quite useful in your setup. Definitely feel free to send > any questions or issues you have to the list. I don't think it > should really be that hard to support Umlaut running under mongrel, > but it might work with Passenger too.
I don't think you need to be > scared of mongrel though. We'll see! > > Jonathan
> > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general From rossfsinger at gmail.com Wed Feb 10 12:55:34 2010 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 10 Feb 2010 12:55:34 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <90FF863A96E1EC42B8B240D04C88FB1D13006CAD89@JHEMTEXVS2.win.ad.jhu.edu> References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu> <90FF863A96E1EC42B8B240D04C88FB1D13006CAD89@JHEMTEXVS2.win.ad.jhu.edu> Message-ID: <23b83f161002100955n74ad6309s74f647c700483d4d@mail.gmail.com> On Wed, Feb 10, 2010 at 11:40 AM, Jonathan Rochkind wrote: > At MPOW we constantly consider switching to SerSol from SFX. The SerSol shared knowledge base seems to possibly be better than the SFX one. And once you're using Umlaut, it's IMO really the shared knowledge base you're paying for; you don't _need_ to customize the interface they provide you, you've got all the interface customization you want with Umlaut, using their APIs instead of their end-user interface. Assuming, of course, the API is comprehensive, comprehensible and generally useful. Unfortunately in library land, this is a rather big assumption. -Ross.
From rochkind at jhu.edu Wed Feb 10 14:26:19 2010 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 10 Feb 2010 14:26:19 -0500 Subject: [Umlaut-general] mysql, utf8, index limitation In-Reply-To: <23b83f161002100955n74ad6309s74f647c700483d4d@mail.gmail.com> References: <90FF863A96E1EC42B8B240D04C88FB1D12F07AF52E@JHEMTEXVS2.win.ad.jhu.edu> <90FF863A96E1EC42B8B240D04C88FB1D12F07AF530@JHEMTEXVS2.win.ad.jhu.edu> <90FF863A96E1EC42B8B240D04C88FB1D13006CAD89@JHEMTEXVS2.win.ad.jhu.edu>, <23b83f161002100955n74ad6309s74f647c700483d4d@mail.gmail.com> Message-ID: <90FF863A96E1EC42B8B240D04C88FB1D13006CAD8A@JHEMTEXVS2.win.ad.jhu.edu> Yep, absolutely, like I said, from the write-up in the journal about the API, it LOOKS complete enough to work. So I'm optimistic enough to try it some time, but only trying it will tell for sure. ________________________________________ From: umlaut-general-bounces at rubyforge.org [umlaut-general-bounces at rubyforge.org] On Behalf Of Ross Singer [rossfsinger at gmail.com] Sent: Wednesday, February 10, 2010 12:55 PM To: umlaut-general at rubyforge.org Subject: Re: [Umlaut-general] mysql, utf8, index limitation On Wed, Feb 10, 2010 at 11:40 AM, Jonathan Rochkind wrote: > At MPOW we constantly consider switching to SerSol from SFX. The SerSol shared knowledge base seems to possibly be better than the SFX one. And once you're using Umlaut, it's IMO really the shared knowledge base you're paying for; you don't _need_ to customize the interface they provide you, you've got all the interface customization you want with Umlaut, using their APIs instead of their end-user interface. Assuming, of course, the API is comprehensive, comprehensible and generally useful. Unfortunately in library land, this is a rather big assumption. -Ross. _______________________________________________ Umlaut-general mailing list Umlaut-general at rubyforge.org http://rubyforge.org/mailman/listinfo/umlaut-general