From tristil at gmail.com Thu Feb 1 17:23:19 2007
From: tristil at gmail.com (Joseph Method)
Date: Thu, 1 Feb 2007 17:23:19 -0500
Subject: [Alexandria-list] Book Providers and Tests
Message-ID: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com>

The most recent commit has the book providers all passing their tests,
except for Barnes & Noble and MCU. To tell the truth, I got tired of
fixing book providers and commented out their tests and requires. If
someone else would like to fix these two, that would be great, and
they can be re-included for the release. Essentially, all of these
book providers didn't work because their regexes didn't match anything,
plus other basic issues (like amadeusbuch.at now being thalia.at!). As
things stand, the tests only certify that the book providers find a
*given isbn*. Additional tests have to be written to cover titles,
authors and keywords. While I'm on the topic, these are other tests
that need to be written:

* Alexandria starts up completely and then shuts down when given the
  close signal
* import an isbn list
* check that encoding is correct within book fields

My impression is that unit tests are mainly for testing backend
functionality. Is this true, or is it feasible to erect a full
Alexandria scaffold that will allow data entry through the Gtk
widgets? Is there a Ruby framework for this, and is it worth the
effort? Would anyone like to become the Testing Czar???

In any case, what I want to highlight about the current passing tests
is that 1) they don't tell you as much as you may think (they're only
as good as the assertions they make) and 2) at least within the
current release cycle, *tests must pass before a commit or a diff will
be accepted*. This means, for example, that any additional book
providers that don't pass tests will be rejected. As new tests are
written, this policy will cover whatever piece of functionality is
being tested.

I think this is a sound policy for a collaborative project and I hope
that there can be consensus about this, but when I say "policy" I am
actually just talking about what I will work on.

Oh, and a really boring, but helpful task for anyone who wants it: the
Amazon provider should be broken up into each of the separate language
sites, so people don't have to choose, and so we can see that they all
actually work (jp and de settings don't do anything currently).
Actually, there's more to it, because the providers should be capable
of activation/deactivation, but that's an additional step.

-- 
-J. Method

From costanti at science.unitn.it Thu Feb 1 18:34:20 2007
From: costanti at science.unitn.it (Marco Costantini)
Date: Fri, 2 Feb 2007 00:34:20 +0100
Subject: [Alexandria-list] new providers webster.it and bol.it
Message-ID: <200702020034.20822.costanti@science.unitn.it>

Dear all,

attached here is the code for two new book providers: the Italian
bookshops webster.it (also known as libreriauniversitaria.it) and
bol.it. Please add them to the svn.

These providers work well enough, and webster.it in particular includes
almost all Italian books. There are, however, the following small
problems, and maybe one of you can help.

Both: they don't handle accented letters very nicely (but this also
happens with other providers). In particular with bol.it, if the
accented letter is the last letter of the author(s) or of the title,
the regexes don't work well. Example: 9788806134747

Both: these providers include in their databases not only Italian
books, but also English ones, and webster.it also German ones. It would
be possible to select the database to search with a locale, but this
has not yet been implemented. Currently only Italian books are
searched. (The same happens for ibs.it, the already implemented Italian
book provider.)

Bol.it: only the search by isbn works. The search by
title/author/keyword doesn't work, for an unknown reason. (This could
be related to the fact that internally bol.it uses the first 12 digits
of the ean/isbn13 code.)

Webster.it: for books with more than one author, only the first author
is considered, and the other author(s) are ignored. I didn't implement
this, because it requires a bit more knowledge of Ruby than I have.
Examples in case someone wants to fix it:

9782067102392 no authors
9788804196471 one author
9788800470490 two authors
9788804559016 three authors

In Italian: autore=author, autori=authors.

Webster.it: the site contains both medium- and big-size images. The
code finds out whether there is a link to an image in the book page,
and if so downloads the medium image. For the book 9788830417588, there
is a link to the big image, and my code tries to download the medium
one, which is not present for this book. In response to a request for a
non-existing image, webster.it replies with a dummy HTML page. Then the
code executes

    return [ Book.new(...), image ]

but the returned image is not valid, and so the book is not added. I
think this should be fixed on the side of the main library. If
Book.new is called but the image is not valid, then the book should be
added and the image discarded.

Greetings,
Marco
-------------- next part --------------
# Copyright (C) 2007 Marco Costantini
# based on ibs_it.rb by Claudio Belotti
#
# Alexandria is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Alexandria is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with Alexandria; see the file COPYING.
# If not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.

require 'fileutils'
require 'net/http'
require 'open-uri'
#require 'cgi'

module Alexandria
class BookProviders
    class Webster_itProvider < GenericProvider
        BASE_URI = "http://www.libreriauniversitaria.it" # also "http://www.webster.it"
        CACHE_DIR = File.join(Alexandria::Library::DIR, '.webster_it_cache')
        REFERER = BASE_URI

        def initialize
            super("Webster_it", "Webster Italia")
            FileUtils.mkdir_p(CACHE_DIR) unless File.exists?(CACHE_DIR)
            # no preferences for the moment
            at_exit { clean_cache }
        end

        def search(criterion, type)
            req = BASE_URI + "/"
            req += case type
                when SEARCH_BY_ISBN
                    "BIT/"

                when SEARCH_BY_TITLE
                    "c_search.php?noinput=1&shelf=BIT&title_query="

                when SEARCH_BY_AUTHORS
                    "c_search.php?noinput=1&shelf=BIT&author_query="

                when SEARCH_BY_KEYWORD
                    "c_search.php?noinput=1&shelf=BIT&subject_query="

                else
                    raise InvalidSearchTypeError
            end

            if type == SEARCH_BY_ISBN
                req += Library.canonicalise_isbn(criterion)
            else
                req += CGI.escape(criterion)
            end
            p req if $DEBUG
            data = transport.get(URI.parse(req))
            if type == SEARCH_BY_ISBN
                to_book(data) rescue raise NoResultsError
            else
                begin
                    results = []
                    each_book_page(data) do |code, title|
                        results << to_book(transport.get(URI.parse(BASE_URI + "/BIT/" + code)))
                    end
                    return results
                rescue
                    raise NoResultsError
                end
            end
        end

        def url(book)
            return nil unless book.isbn
            BASE_URI + "/BIT/" + Library.canonicalise_isbn(book.isbn)
        end

        #######
        private
        #######

        def to_book(data)
            raise unless md = / • Titolo:<\/span> ([^<]+)/.match(data)
            title = CGI.unescape(md[1].strip)
            if md = /([^<]+)/.match(data)
                title += " " + CGI.unescape(md[1].strip)
            end

            authors = []
            if md = / • Autor([ei]):<\/span> ISBN:<\/span> ([^<]+)/.match(data)
            isbn = "978" + md[1].strip[0..8]
            isbn += String( Library.ean_checksum( Library.extract_numbers( isbn ) ) )

            raise unless md = / • Editore:<\/span> Pagine:<\/span> ([^<]+)/.match(data)
                edition = "p. " + CGI.unescape(md[1].strip)
            else
                edition = nil
            end

            publish_year = nil
            if md = / • Data di Pubblicazione:<\/span> ([^"]+)/.match(data)
                publish_year = CGI.unescape(md[1].strip).to_i
                publish_year = nil if publish_year == 0
            end

            if data =~ /javascript:popImage/
                cover_url = BASE_URI + "/data/images/BIT/" + isbn[9 .. 11] + "/" + isbn + "p.jpg" # use "g" instead of "p" for bigger image
                cover_filename = isbn + ".tmp"
                Dir.chdir(CACHE_DIR) do
                    File.open(cover_filename, "w") do |file|
                        file.write open(cover_url, "Referer" => REFERER ).read
                    end
                end

                medium_cover = CACHE_DIR + "/" + cover_filename
                if File.size(medium_cover) > 0
                    puts medium_cover + " has non-0 size" if $DEBUG
                    return [ Book.new(title, authors, isbn, publisher, publish_year, edition),medium_cover ]
                end
                puts medium_cover + " has 0 size, removing ..." if $DEBUG
                File.delete(medium_cover)
            end
            return [ Book.new(title, authors, isbn, publisher, publish_year, edition) ]
        end

        def each_book_page(data)
            raise if data.scan(/  REFERER ).read
                end
            end

            medium_cover = CACHE_DIR + "/" + cover_filename
            if File.size(medium_cover) > 0
                puts medium_cover + " has non-0 size" if $DEBUG
                return [ Book.new(title, authors, isbn, publisher, publish_year, edition),medium_cover ]
            end
            puts medium_cover + " has 0 size, removing ..." if $DEBUG
            File.delete(medium_cover)
            return [ Book.new(title, authors, isbn, publisher, publish_year, edition) ]
        end

        def each_book_page(data)
            raise if data.scan(//) { |a| yield a}.empty?
        end

        def clean_cache
            #FIXME begin ... rescue ... end?
            Dir.chdir(CACHE_DIR) do
                Dir.glob("*.tmp") do |file|
                    puts "removing " + file if $DEBUG
                    File.delete(file)
                end
            end
        end
    end
end
end

From marrakis at gmail.com Fri Feb 2 08:34:12 2007
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Fri, 2 Feb 2007 08:34:12 -0500
Subject: [Alexandria-list] Book Providers and Tests
In-Reply-To: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com>
References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com>
Message-ID:

Hi all,

I'm sorry Joseph, I didn't have any time to code on Alexandria last
week, and you have already achieved everything I would have done on the
book provider unit tests. I have already started to split the
amazon.com provider into multiple book providers, separated by language
and country.

I had some questions about the current book provider window. I think we
should have a way to select book providers by language, and the
possibility to disable the ones we don't want. I will work on that.

Another thing: for my tests, it would be cool to have a command-line
interface for Alexandria. I have already implemented something in my
own repository, but I think it's a good improvement, particularly if
we implement a network interface for Alexandria. And by the way, we
could use that to force the log level and the log file location, or
other options.

I will send you some patches shortly. But what do you think of that? Is
this a good idea? Do you see any other command-line options that should
be available?

See ya

Mathieu Leduc-Hamel

From tristil at gmail.com Fri Feb 2 10:30:04 2007
From: tristil at gmail.com (Joseph Method)
Date: Fri, 2 Feb 2007 10:30:04 -0500
Subject: [Alexandria-list] Book Providers and Tests
In-Reply-To:
References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com>
Message-ID: <167b6aa00702020730l63c3ecbcwe357990f91df1aa6@mail.gmail.com>

Mathieu, I thought of the command-line thing too. Yes, that's a great
idea. But instead of sending me a diff, why not become a developer and
commit it yourself? I just need your Rubyforge account name.

At first I didn't know what you meant by being able to select book
providers by language, but yes, if you mean being able to narrow the
selections. The default should be all languages, though. Take a look
at what's already covered in the preferences for book providers and
how they're implemented on the actual book provider. It would be great
to be able to turn off the book providers without having to go into
preferences, just by clicking a check box on the book provider.

-- 
-J. Method

From marrakis at gmail.com Fri Feb 2 10:36:10 2007
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Fri, 2 Feb 2007 10:36:10 -0500
Subject: [Alexandria-list] Book Providers and Tests
In-Reply-To: <167b6aa00702020730l63c3ecbcwe357990f91df1aa6@mail.gmail.com>
References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com> <167b6aa00702020730l63c3ecbcwe357990f91df1aa6@mail.gmail.com>
Message-ID:

Joseph:

Yeah, it would be great. I'm already marked as a developer on the
Alexandria website, but I think you need to enable something to give me
access to the Subversion repository...

For the book providers, yes, everything should be enabled by default,
but selecting by language could be implemented by adding a language
flag to each book provider.

Maybe we could add a status flag to book providers too, just to disable
the ones that are not so stable?

L.C. Karssen:

Yeah, we can think of many options like that on the command line, and
I'm planning to offer most of the options available in Alexandria right
now on the command line. Export would be a great candidate.

Mathieu Leduc-Hamel

From tristil at gmail.com Fri Feb 2 11:09:27 2007
From: tristil at gmail.com (Joseph Method)
Date: Fri, 2 Feb 2007 11:09:27 -0500
Subject: [Alexandria-list] Book Providers and Tests
In-Reply-To:
References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com> <167b6aa00702020730l63c3ecbcwe357990f91df1aa6@mail.gmail.com>
Message-ID: <167b6aa00702020809n159afed6xab680a76570de450@mail.gmail.com>

Hmm...
I think what you have to do is check out alexandria again with

    svn checkout svn+ssh://developername@rubyforge.org/var/svn/alexandria/trunk/alexandria

and merge your changes into that.

I didn't see LC Karssen's suggestion. L.C., would you copy it to the
list?

-- 
-J. Method

From lennart at karssen.org Fri Feb 2 11:38:27 2007
From: lennart at karssen.org (L.C. Karssen)
Date: Fri, 02 Feb 2007 17:38:27 +0100
Subject: [Alexandria-list] Book Providers and Tests
Message-ID: <1170434307.22161.19.camel@rubidium01>

Sorry, I forgot to include the list. Here's my answer to Mathieu:

On Fri, 2007-02-02 at 08:34 -0500, Mathieu Leduc-Hamel wrote:
> (...)
> Do You see any other command-line options that should
> be available?

What about the possibility to export to a specific file format? Or to
stdout? So that you could do

    alexandria --export --type=plain | grep -i dickens

to get a list of all books by Dickens in your library, which you could
then use for other interesting things. This way you could send yourself
a weekly e-mail with the titles of the books you have lent to other
people (and should be getting back). And if export to BibTeX is
implemented (I don't remember if that has been done already), you could
include such a command in a Makefile to generate the bibliography of
your LaTeX documents. Or you could use it to generate dynamic HTML
pages of your libraries...

Hmm, I'm getting a bit carried away here, I think :-).

Lennart.

-- 
----------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org

Please don't send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
----------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://rubyforge.org/pipermail/alexandria-list/attachments/20070202/7b2fa1fe/attachment-0001.bin

From bel8 at lilik.it Sat Feb 3 02:59:21 2007
From: bel8 at lilik.it (Claudio Belotti)
Date: Sat, 03 Feb 2007 08:59:21 +0100
Subject: [Alexandria-list] alexandria isbn bug in ibs_it
Message-ID: <45C440D9.50904@lilik.it>

Dear Joseph,

let me thank you for all the work you are doing on Alexandria.
I think you changed these lines of ibs_it.rb because of a failure in
the tests:

> - raise unless md = //i.match(data)
> + raise "No ISBN" unless md = //i.match(data)
> + #It was getting the isbn with a number tacked onto the front? Try removing first three digits.
> + isbn = isbn[3...isbn.length] #A place where Python has a much better idiom.

The ISBN I put in the test was failing because the provider is now
using the longer ISBN, i.e.

    9788851520663
    instead of 8851520666

But if you just cut the first 3 digits, the checksum fails!
So now when I add a new book and open it to check the details,
Alexandria complains because the ISBN could not be verified.

Do you agree to stay with the longer ISBN and to change the one used in
test_providers?

Claudio

From tristil at gmail.com Sat Feb 3 11:06:44 2007
From: tristil at gmail.com (Joseph Method)
Date: Sat, 3 Feb 2007 11:06:44 -0500
Subject: [Alexandria-list] alexandria isbn bug in ibs_it
In-Reply-To: <45C440D9.50904@lilik.it>
References: <45C440D9.50904@lilik.it>
Message-ID: <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com>

Oh, sure. I didn't understand what was going on! :) In general, do we
know if Alexandria is handling the 13 digit isbns well?

-- 
-J. Method

From tristil at gmail.com Sat Feb 3 15:39:02 2007
From: tristil at gmail.com (Joseph Method)
Date: Sat, 3 Feb 2007 15:39:02 -0500
Subject: [Alexandria-list] alexandria isbn bug in ibs_it
In-Reply-To: <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com>
References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com>
Message-ID: <167b6aa00702031239n726aa395ya4898bfcf2df396f@mail.gmail.com>

Claudio, what do you think about bug [#4931] in light of your changes?
Can it be closed?

-- 
-J. Method

From bel8 at lilik.it Sun Feb 4 05:16:10 2007
From: bel8 at lilik.it (Claudio Belotti)
Date: Sun, 04 Feb 2007 11:16:10 +0100
Subject: [Alexandria-list] alexandria isbn bug in ibs_it
In-Reply-To: <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com>
References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com>
Message-ID: <45C5B26A.7040206@lilik.it>

Joseph Method wrote:
> Oh, sure. I didn't understand what was going on! :) In general, do we
> know if Alexandria is handling the 13 digit isbns well?

It seems OK to me. Maybe we have to implement an isequal? that answers
true if the ISBN code is the same in its 13-digit or 10-digit form
(maybe it is already there; I haven't looked).

Claudio

From bel8 at lilik.it Sun Feb 4 05:18:58 2007
From: bel8 at lilik.it (Claudio Belotti)
Date: Sun, 04 Feb 2007 11:18:58 +0100
Subject: [Alexandria-list] alexandria isbn bug in ibs_it
In-Reply-To: <167b6aa00702031239n726aa395ya4898bfcf2df396f@mail.gmail.com>
References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com> <167b6aa00702031239n726aa395ya4898bfcf2df396f@mail.gmail.com>
Message-ID: <45C5B312.3040207@lilik.it>

Joseph Method wrote:
> Claudio, what do you think about bug [#4931] in light of your changes?
> Can it be closed?
in my case the details window freezes, I think we have to unencode the accented chars: see my message http://rubyforge.org/pipermail/alexandria-list/2007-January/001194.html or we have to find a library that's already handling à properly Claudio From tristil at gmail.com Sun Feb 4 12:33:20 2007 From: tristil at gmail.com (Joseph Method) Date: Sun, 4 Feb 2007 12:33:20 -0500 Subject: [Alexandria-list] alexandria isbn bug in ibs_it In-Reply-To: <45C5B312.3040207@lilik.it> References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com> <167b6aa00702031239n726aa395ya4898bfcf2df396f@mail.gmail.com> <45C5B312.3040207@lilik.it> Message-ID: <167b6aa00702040933i597cc799x776f55401a1a98fd@mail.gmail.com> Okay, but here is what I have found: if there is one book that has improper encoding, the properties window on any book (is this what you meant by details window?) will crash when editing the title. This is because it is trying to do book completion using the titles in all the libraries. If you remove the book with improper encoding you *should* be able to edit the properties. So for testing, you need to make sure that any books that were entered with improper encoding are deleted. In svn trunk I get many fewer encoding errors than I used to. What we need is a list of isbns that cause problems, one for each type of encoding problem (`) etc. to form the basis of a test. > in my case the details window freezes, I think we have to unencode the > accented chars: > > see my message > http://rubyforge.org/pipermail/alexandria-list/2007-January/001194.html > > or we have to find a library that's already handling à properly > > Claudio -- -J. 
Method From costanti at science.unitn.it Sat Feb 17 03:48:44 2007 From: costanti at science.unitn.it (Marco Costantini) Date: Sat, 17 Feb 2007 09:48:44 +0100 Subject: [Alexandria-list] Encoding (was: alexandria isbn bug in ibs_it) In-Reply-To: <167b6aa00702040933i597cc799x776f55401a1a98fd@mail.gmail.com> References: <45C440D9.50904@lilik.it> <45C5B312.3040207@lilik.it> <167b6aa00702040933i597cc799x776f55401a1a98fd@mail.gmail.com> Message-ID: <200702170948.44345.costanti@science.unitn.it> Dear all, I have solved most of the problems with encoding by adding data = data.convert("UTF-8", "iso-8859-15") or similar, after def to_book(data), to the various providers. This also solves the crashes, and most of the warning messages. Later today I will also add the conversion on the other way, for search by title/author/keyword. The provider renaud uses the HTML entities like è , and for those cases I'm going to add also CGI::unescapeHTML . The encoding problem will remain only for a few providers like ibs.it , that uses è . This is not too bad, because there is not the problem that the encoding is not compatible with utf-8; and if the data is exported as HTML page, the encoding is OK. However, for handling this, the method proposed by Claudio seems OK to me. Greetings, Marco On Sunday 04 February 2007 18:33, Joseph Method wrote: > Okay, but here is what I have found: if there is one book that has > improper encoding, the properties window on any book (is this what you > meant by details window?) will crash when editing the title. This is > because it is trying to do book completion using the titles in all the > libraries. If you remove the book with improper encoding you *should* > be able to edit the properties. So for testing, you need to make sure > that any books that were entered with improper encoding are deleted. > In svn trunk I get many fewer encoding errors than I used to. 
What we > need is a list of isbns that cause problems, one for each type of > encoding problem (`) etc. to form the basis of a test. > > > in my case the details window freezes, I think we have to unencode the > > accented chars: > > > > see my message > > http://rubyforge.org/pipermail/alexandria-list/2007-January/001194.html > > > > or we have to find a library that's already handling à properly > > > > Claudio From costanti at science.unitn.it Sat Feb 17 07:38:54 2007 From: costanti at science.unitn.it (Marco Costantini) Date: Sat, 17 Feb 2007 13:38:54 +0100 Subject: [Alexandria-list] Switching to EAN / ISBN13 In-Reply-To: <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com> References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com> Message-ID: <200702171338.54267.costanti@science.unitn.it> Dear all, starting from 1/1/2007, the standard requires to switch from ISBN (for instance 88-87554-09-9) to the so called EAN or ISBN 13 (for instance 978-88-87554-09-0). For more information see http://en.wikipedia.org/wiki/International_Standard_Book_Number and http://en.wikipedia.org/wiki/European_Article_Number (In this mail, "EAN" is short for "EAN or ISBN 13" and ISBN means "ISBN 10".) The EAN is used also for music CDs, movies DVDs, (and all the stuff that you buy at the supermarket), and so by switching to EAN we automatically get the advantage that Alexandria can be used to catalogate CDs and DVDs too. Note that some of the providers used by Alexandria, such as bol.it, sell also CDs or DVDs, and are also almost ready for being used by Alexandria also for cataloging that. By changing a few lines in my copy of Alexandria, I have already cataloged some CDs and DVDs. I think that Alexandria should focus on books, but we should avoid anything unnecessary that can prevent using Alexandria to catalogate something else. 
See also feature request #783, #866 http://rubyforge.org/tracker/index.php?func=detail&aid=783&group_id=205&atid=866 http://rubyforge.org/tracker/index.php?func=detail&aid=2332&group_id=205&atid=866 On Saturday 03 February 2007 17:06, Joseph Method wrote: > do we know if Alexandria is handling the 13 digit isbns well? Yes, apart the required changes that I describe below. * Internally, Alexandria must use only the EAN, and the data stored must contain only the EAN. The isbn must not be stored, because it can be obtained from the EAN. On this topic, see http://www.catb.org/~esr/writings/taoup/html/ch04s02.html#spot_rule * The text in the dialog box for adding a new book should be changed from "ISBN" to "EAN or ISBN" or something like this. It should be checked whether other points require the same change. * In file alexandria/library.rb there is the function canonicalise_isbn Besides this function, it's needed a function canonicalise_ean, that converts to EAN. * The function canonicalise_isbn must be fixed. Currently, it is def self.canonicalise_isbn(isbn) numbers = self.extract_numbers(isbn) canonical = if self.valid_ean?(isbn) # Looks like an EAN number -- extract the intersting part and # calculate a checksum. It would be nice if we could validate # the EAN number somehow. numbers[3 .. 11] + [self.isbn_checksum(numbers[3 .. 11])] ... this can be done only if numbers[0 .. 2] = [9,7,8] . For a valid EAN whose first digits are not 978, the output of canonicalise_isbn should be the EAN itself. * Each occurrence of canonicalise_isbn must be checked whether it is OK or it must be replaced by canonicalise_ean. * For each provider: the main library must pass to the functions search(criterion, type) and url(book) the EAN. The code for the provider must send the EAN to the provider, unless the provider supports only the ISBN: in this case, the code for that provider will use canonicalise_isbn for the conversion. The code for the provider must return the EAN. 
If the provider returns only the ISBN, canonicalise_ean will be used. I have already implemented part of this point. * The current option to export the library as a list of ISBNs should be replaced by the two options to export as a list of EANs, and as a list of ISBN-10. * The column ISBN should be replaced by two columns EAN and ISBN, of which only one will be visible by default. * When adding a new book, if the EAN or ISBN is not found, it would be useful if the error message reports which EAN has not been found. it will be especially useful in case that a long list of EANs or ISBNs is imported. (By the way, a recent change in the repository causes that Alexandria crashes when trying to import.) * Let's also be aware that existing data collected by Alexandria users is organized using ISBN, and that it should somehow switch to EAN. Greetings, Marco From tristil at gmail.com Mon Feb 19 00:05:16 2007 From: tristil at gmail.com (Joseph Method) Date: Mon, 19 Feb 2007 00:05:16 -0500 Subject: [Alexandria-list] Switching to EAN / ISBN13 In-Reply-To: <200702171338.54267.costanti@science.unitn.it> References: <45C440D9.50904@lilik.it> <167b6aa00702030806v134888aate94238f3da089ce9@mail.gmail.com> <200702171338.54267.costanti@science.unitn.it> Message-ID: <167b6aa00702182105p47d697b5u884153ce7480436a@mail.gmail.com> This is great stuff, Marco. I personally don't see any issue with adding other media, even within this release cycle if there are no issues. We can do some exciting things with music and dvds in later cycles, for example representing music indexed by rhythmbox, etc. In what you discuss, everything makes sense. If ean is more essential than isbn, we should switch to using that as the attribute. The nice thing about Ruby is that we can switch the isbn accessor to use the ean internally and also add an ean accessor for new and refactored code. Data migration is the biggest issue, then. 
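A minimal sketch of the ISBN-10/EAN conversions and of an isbn accessor derived from a stored EAN, as discussed above. The names IsbnSketch and BookSketch are hypothetical and are not Alexandria's actual code. For an EAN that does not start with 978 this sketch returns nil, whereas Marco's proposal is for canonicalise_isbn to return the EAN itself in that case.

```ruby
# Hypothetical helpers for illustration; not part of Alexandria.
module IsbnSketch
  # Keep only the decimal digits of a code such as "88-87554-09-9".
  def self.digits(code)
    code.to_s.scan(/\d/).map { |c| c.to_i }
  end

  # ISBN-10 check character: weights 10 down to 2 over nine digits, modulo 11.
  def self.isbn10_check(digits9)
    sum = 0
    digits9.each_with_index { |d, i| sum += d * (10 - i) }
    check = (11 - sum % 11) % 11
    check == 10 ? "X" : check.to_s
  end

  # EAN-13 check digit: alternating weights 1 and 3 over twelve digits, modulo 10.
  def self.ean_check(digits12)
    sum = 0
    digits12.each_with_index { |d, i| sum += d * (i.even? ? 1 : 3) }
    ((10 - sum % 10) % 10).to_s
  end

  # Prefix 978, drop the old check character, compute the new check digit.
  def self.isbn10_to_ean(isbn)
    body = [9, 7, 8] + digits(isbn)[0, 9]
    body.join + ean_check(body)
  end

  # Only possible for EANs starting with 978; returns nil otherwise.
  def self.ean_to_isbn10(ean)
    d = digits(ean)
    return nil unless d.length == 13 && d[0, 3] == [9, 7, 8]
    body = d[3, 9]
    body.join + isbn10_check(body)
  end
end

# Stores only the EAN; the ISBN-10 is derived on demand.
class BookSketch
  attr_reader :ean

  def initialize(code)
    d = IsbnSketch.digits(code)
    @ean = d.length == 13 ? d.join : IsbnSketch.isbn10_to_ean(code)
  end

  def isbn
    IsbnSketch.ean_to_isbn10(@ean)
  end
end
```

With the numbers quoted in this thread, IsbnSketch.isbn10_to_ean("88-87554-09-9") gives "9788887554090", and IsbnSketch.ean_to_isbn10("9788851520663") gives "8851520666" rather than the "8851520663" obtained by only cutting the first three digits.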
We can catch whether the yaml files store the old representation or not and just brute force update them to the new format. In the future, we can set a data format version number to help with later migrations. On 2/17/07, Marco Costantini wrote: > Dear all, > starting from 1/1/2007, the standard requires to switch from ISBN (for > instance 88-87554-09-9) to the so called EAN or ISBN 13 (for instance > 978-88-87554-09-0). For more information see > http://en.wikipedia.org/wiki/International_Standard_Book_Number > and > http://en.wikipedia.org/wiki/European_Article_Number > > (In this mail, "EAN" is short for "EAN or ISBN 13" and ISBN means "ISBN 10".) > > The EAN is used also for music CDs, movies DVDs, (and all the stuff that you > buy at the supermarket), and so by switching to EAN we automatically get the > advantage that Alexandria can be used to catalogate CDs and DVDs too. > Note that some of the providers used by Alexandria, such as bol.it, sell also > CDs or DVDs, and are also almost ready for being used by Alexandria also for > cataloging that. By changing a few lines in my copy of Alexandria, I have > already cataloged some CDs and DVDs. > > I think that Alexandria should focus on books, but we should avoid anything > unnecessary that can prevent using Alexandria to catalogate something else. > See also feature request #783, #866 > http://rubyforge.org/tracker/index.php?func=detail&aid=783&group_id=205&atid=866 > http://rubyforge.org/tracker/index.php?func=detail&aid=2332&group_id=205&atid=866 > > On Saturday 03 February 2007 17:06, Joseph Method wrote: > > do we know if Alexandria is handling the 13 digit isbns well? > > Yes, apart the required changes that I describe below. > > * Internally, Alexandria must use only the EAN, and the data stored must > contain only the EAN. The isbn must not be stored, because it can be obtained > from the EAN. 
On this topic, see > http://www.catb.org/~esr/writings/taoup/html/ch04s02.html#spot_rule > > * The text in the dialog box for adding a new book should be changed from > "ISBN" to "EAN or ISBN" or something like this. It should be checked whether > other points require the same change. > > * In file alexandria/library.rb there is the function canonicalise_isbn > Besides this function, it's needed a function canonicalise_ean, that converts > to EAN. > > * The function canonicalise_isbn must be fixed. Currently, it is > > def self.canonicalise_isbn(isbn) > numbers = self.extract_numbers(isbn) > > canonical = if self.valid_ean?(isbn) > # Looks like an EAN number -- extract the intersting part and > # calculate a checksum. It would be nice if we could validate > # the EAN number somehow. > numbers[3 .. 11] + [self.isbn_checksum(numbers[3 .. 11])] > ... > > this can be done only if numbers[0 .. 2] = [9,7,8] . For a valid EAN whose > first digits are not 978, the output of canonicalise_isbn should be the EAN > itself. > > * Each occurrence of canonicalise_isbn must be checked whether it is OK or it > must be replaced by canonicalise_ean. > > * For each provider: the main library must pass to the functions > search(criterion, type) and url(book) the EAN. > > The code for the provider must send the EAN to the provider, unless the > provider supports only the ISBN: in this case, the code for that provider > will use canonicalise_isbn for the conversion. > > The code for the provider must return the EAN. If the provider returns only > the ISBN, canonicalise_ean will be used. > > I have already implemented part of this point. > > * The current option to export the library as a list of ISBNs should be > replaced by the two options to export as a list of EANs, and as a list of > ISBN-10. > > * The column ISBN should be replaced by two columns EAN and ISBN, of which > only one will be visible by default. 
> > * When adding a new book, if the EAN or ISBN is not found, it would be useful > if the error message reports which EAN has not been found. it will be > especially useful in case that a long list of EANs or ISBNs is imported. > > (By the way, a recent change in the repository causes that Alexandria crashes > when trying to import.) > > * Let's also be aware that existing data collected by Alexandria users is > organized using ISBN, and that it should somehow switch to EAN. > > Greetings, > Marco > _______________________________________________ > Alexandria-list mailing list > Alexandria-list at rubyforge.org > http://rubyforge.org/mailman/listinfo/alexandria-list > -- -J. Method From costanti at science.unitn.it Wed Feb 21 18:17:38 2007 From: costanti at science.unitn.it (Marco Costantini) Date: Thu, 22 Feb 2007 00:17:38 +0100 Subject: [Alexandria-list] Book Providers and Tests, request of help for MCU In-Reply-To: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com> References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com> Message-ID: <200702220017.38387.costanti@science.unitn.it> Dear all, On Thursday 01 February 2007 23:23, Joseph Method wrote: > The most recent commit has the book providers all passing their tests, > except for Barnes & Noble and MCU. I have fixed Barnes & Noble, then I have fixed various small problems for the other providers. I have also added the new providers deastore.com, bol.it, webster.it, worldcat.org (Thank you Joseph for some fixes). For fixing MCU the help of someone is needed. It would be nice to have again MCU, the Spanish Culture Ministry, because according to Spanish law, from 1972 on, every publisher must give an ISBN to each published book, and must inform the Ministry about it. Hence MCU contains the data about every book published in Spain from 1972 on. 
This provider has switched from method "get" to method "post", and for that
WWW::Mechanize can be used; this is already done with provider deastore.
About mechanize, see
http://mechanize.rubyforge.org/
http://www.ntecs.de/blog-old/Blog/WWW-Mechanize.rdoc
http://rubyforge.org/projects/mechanize/

The up-to-date site of MCU is
http://www.mcu.es/libro/CE/AgenciaISBN/BBDDLibros/Sobre.html

About this provider, see the bugs
https://rubyforge.org/tracker/index.php?func=detail&aid=2518&group_id=205&atid=863
https://rubyforge.org/tracker/index.php?func=detail&aid=2533&group_id=205&atid=863

I've tried the following code, but I get an error page. Any cooperation is
welcome.

    criterion = "8496075850"
    # other possibility criterion = "84-89464-99-5"
    require 'rubygems'
    require 'mechanize'
    require 'pp'
    agent = WWW::Mechanize.new
    agent.user_agent_alias = 'Mac Safari'
    page = agent.get('http://www.mcu.es/busquedaisbn/cargarFiltroIsbn.do?cache=init&layout=busquedaisbn&language=$
    mcu_form = page.form('busquedaIsbnForm')
    mcu_form.fields.name('brscgi_WISB-C').value = criterion
    mcu_form.radiobuttons.name('brscgi_WDIS-C')[0].checked = false
    mcu_form.radiobuttons.name('brscgi_WDIS-C')[2].checked = true
    page = agent.submit(mcu_form)
    pp page

By the way, installing WWW::Mechanize may be cumbersome, because of the
dependencies. Here are the steps that work for me.

    sudo apt-get -y install ruby1.8-dev make gcc libc6-dev rdoc ragel libopenssl-ruby

    tar xfz rubygems-0.9.2.tgz
    cd rubygems-0.9.2/
    sudo ruby setup.rb
    cd
    sudo gem install rake
    sudo gem install rake
    2

    sudo gem install hpricot --source http://code.whytheluckystiff.net

    tar xfz mechanize-0.6.4.tgz
    cd mechanize-0.6.4/
    sudo ruby setup.rb
    cd

Greetings,
Marco

From costanti at science.unitn.it Wed Feb 21 18:54:47 2007
From: costanti at science.unitn.it (Marco Costantini)
Date: Thu, 22 Feb 2007 00:54:47 +0100
Subject: [Alexandria-list] Things like raise "No edition" instead of nil
Message-ID: <200702220054.47145.costanti@science.unitn.it>

Dear all,
in the code for the providers, there is in several places code like

    raise "No edition" unless md = /regexp/.match(data)
    edition = CGI.unescape(md[1].strip)

or similar for authors and so on.
However, the provider may not store information about the edition, or there
may be no author (in the sense that the book is anonymous, or is made of
small parts written by several people).

In this case, the nice behavior of Alexandria would be to collect the
available data even if partial, instead of rejecting everything. Furthermore,
currently if a book is not found, the error message is meaningless, such as
"No edition".

My proposal is to replace code like the above with something like

    if md = /regexp/.match(data)
      edition = CGI.unescape(md[1].strip)
    else
      edition = nil
    end

or something similar (suggestions are welcome). Is there in Ruby a more
compact way to do it?

nil should be used in this case. I have already changed some places in which
the result was "" or "n/a".

On this topic, there is also bug [#3130] "Problem importing Tellico library"
http://rubyforge.org/tracker/index.php?func=detail&aid=3130&group_id=205&atid=863
Does anyone want to fix it?

Also, currently, when the code executes return [ Book.new(...), image ] but
the returned image is not valid, the book is not added. This happens for
instance with book 9788830417588 with webster.it.
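On the question of a more compact form for optional fields: a minimal sketch, assuming a hypothetical helper optional_field (not part of Alexandria) and a made-up page fragment. It returns nil when the pattern does not match, so a missing field no longer aborts the whole lookup.

```ruby
require 'cgi'

# Hypothetical helper: first capture group, stripped and unescaped,
# or nil when the pattern does not match the page data.
def optional_field(data, regexp)
  md = regexp.match(data)
  md ? CGI.unescape(md[1].strip) : nil
end

# Made-up page fragment, for illustration only.
data = "Edizione: Tascabile%20economico"

edition   = optional_field(data, /Edizione:\s*(\S+)/)  # field present
publisher = optional_field(data, /Editore:\s*(\S+)/)   # field absent, so nil
```

Here edition is "Tascabile economico" and publisher is nil; the caller can then decide which fields, if any, are really required.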
If a Book.new is called, but the image is not valid, then the book should be
added, and the image discarded.

All the best,
Marco

From tristil at gmail.com Wed Feb 21 23:14:29 2007
From: tristil at gmail.com (Joseph Method)
Date: Wed, 21 Feb 2007 23:14:29 -0500
Subject: [Alexandria-list] Things like raise "No edition" instead of nil
In-Reply-To: <200702220054.47145.costanti@science.unitn.it>
References: <200702220054.47145.costanti@science.unitn.it>
Message-ID: <167b6aa00702212014i1b72467bh23c71f7a92c8f17f@mail.gmail.com>

Hi Marco,

This was a design decision from the old team. I guess they didn't want
the book search to return with too little data from a page, because it
might just be garbage unrelated to the search. Putting nil instead of
raising errors makes sense to me. Is this okay?

    md = /regex/.match(data)
    edition = CGI.unescape(md[1].strip) if md

I find it arbitrary right now what information is required for a book
to be valid, like binding as a required field. We can definitely
simplify this, and make it user-definable. What might be called for in
the future is a dialog that warns that not all of the wanted
information was found.

On 2/21/07, Marco Costantini wrote:
> Dear all,
> in the code for the providers, there is in several places a code like
>
> raise "No edition" unless md = /regexp/.match(data)
> edition = CGI.unescape(md[1].strip)
>
> Or similar for authors and so on.
> However, the provider may not store information about the edition, or there
> may be no author (in the sense that the book is anonymous, or is made by
> small parts written by several people).
>
> In this case, the nice behavior of Alexandria would be to collect the
> available data even if partial, instead to reject all. Furthermore, currently
> if a book is not found, the error message is meaningless, such as "No
> edition".
> > My proposal is to replace the code like above with something like > > if md = /regexp/.match(data) > edition = CGI.unescape(md[1].strip) > else > edition = nil > end > > or something similar (suggestions are welcome). Is there in ruby a more > compact way for it? > > nil should be used in this case. I have already changed some places, in which > the result was "" or "n/a". > > On this topic, there is also bug [#3130] "Problem importing Tellico library" > http://rubyforge.org/tracker/index.php?func=detail&aid=3130&group_id=205&atid=863 > Anyone wants to fix? > > Now also when the code executes return [ Book.new(...), image ] but the > returned image is not valid, and the book is not added. This happens for > instance with book 9788830417588 with webster.it. > If a Book.new is called, but the image is not valid, then the book should be > added, and the image discarded. > > All the best, > Marco > _______________________________________________ > Alexandria-list mailing list > Alexandria-list at rubyforge.org > http://rubyforge.org/mailman/listinfo/alexandria-list > -- -J. Method From tristil at gmail.com Wed Feb 21 23:33:58 2007 From: tristil at gmail.com (Joseph Method) Date: Wed, 21 Feb 2007 23:33:58 -0500 Subject: [Alexandria-list] Book Providers and Tests, request of help for MCU In-Reply-To: <200702220017.38387.costanti@science.unitn.it> References: <167b6aa00702011423n61c817car832ab257aa777dc0@mail.gmail.com> <200702220017.38387.costanti@science.unitn.it> Message-ID: <167b6aa00702212033g1842c124y8cd0174cf9c51eb2@mail.gmail.com> Thank you, Marco, for doing such a thorough job on the providers, and for thinking through the technical issues with the EAN. I'll take a look at MCU tomorrow. But is the url correct that agent retrieves? On 2/21/07, Marco Costantini wrote: > Dear all, > > On Thursday 01 February 2007 23:23, Joseph Method wrote: > > The most recent commit has the book providers all passing their tests, > > except for Barnes & Noble and MCU. 
> > I have fixed Barnes & Noble, then I have fixed various small problems for the > other providers. I have also added the new providers deastore.com, bol.it, > webster.it, worldcat.org (Thank you Joseph for some fixes). > > For fixing MCU the help of someone is needed. It would be nice to have again > MCU, the Spanish Culture Ministry, because according to Spanish law, from > 1972 on, every publisher must give an ISBN to each published book, and must > inform the Ministry about it. Hence MCU contains the data about every book > published in Spain from 1972 on. > > This provider has switched from method "get" to method "post", and for than > WWW:mechanize can be used, this is already done with provider deastore. > About mechanize, see > http://mechanize.rubyforge.org/ > http://www.ntecs.de/blog-old/Blog/WWW-Mechanize.rdoc > http://rubyforge.org/projects/mechanize/ > > The up-to-date site of MCU is > http://www.mcu.es/libro/CE/AgenciaISBN/BBDDLibros/Sobre.html > > About this provider, see the bugs > https://rubyforge.org/tracker/index.php?func=detail&aid=2518&group_id=205&atid=863 > https://rubyforge.org/tracker/index.php?func=detail&aid=2533&group_id=205&atid=863 > > I've tried the following code, but I get an error page. Any cooperation is > welcome. > > criterion = "8496075850" > # other possibility criterion = "84-89464-99-5" > require 'rubygems' > require 'mechanize' > agent = WWW::Mechanize.new > agent.user_agent_alias = 'Mac Safari' > page = > agent.get('http://www.mcu.es/busquedaisbn/cargarFiltroIsbn.do?cache=init&layout=busquedaisbn&language=$ > mcu_form = page.form('busquedaIsbnForm') > mcu_form.fields.name('brscgi_WISB-C').value = criterion > mcu_form.radiobuttons.name('brscgi_WDIS-C')[0].checked = false > mcu_form.radiobuttons.name('brscgi_WDIS-C')[2].checked = true > page = agent.submit(mcu_form) > pp page > > > By the way, installing WWW::Mechanize may be combersome, because of the > dependencies. Here are the steps that work for me. 
> > sudo apt-get -y install ruby1.8-dev make gcc libc6-dev rdoc ragel > libopenssl-ruby > > tar xfz rubygems-0.9.2.tgz > cd rubygems-0.9.2/ > sudo ruby setup.rb > cd > sudo gem install rake > sudo gem install rake > 2 > > sudo gem install hpricot --source http://code.whytheluckystiff.net > > tar mechanize-0.6.4.tgz > cd mechanize-0.6.4/ > sudo ruby setup.rb > cd > > Greetings, > Marco > > > -- -J. Method From tristil at gmail.com Thu Feb 22 00:04:31 2007 From: tristil at gmail.com (Joseph Method) Date: Thu, 22 Feb 2007 00:04:31 -0500 Subject: [Alexandria-list] Release plans Message-ID: <167b6aa00702212104n1c8d914at6b17293be94c9835@mail.gmail.com> I'd like to do a full-blown release (Sourceforge announcements, deb packaging, etcetera) by Sunday evening. This just means we release an official tarball for packagers to work with and for new users to download. I mainly want to do this to generate some new interest in Alexandria and to improve the quality of bug reports, since many come from Ubuntu, etc., where the old 6.1 Alexandria is paired with a broken libruby-gnome2 0.15. We could still do mini-releases for followup on currently unhandled bugs. Does anyone disagree with this plan? -- -J. Method From costanti at science.unitn.it Sat Feb 24 13:15:27 2007 From: costanti at science.unitn.it (Marco Costantini) Date: Sat, 24 Feb 2007 19:15:27 +0100 Subject: [Alexandria-list] Now Alexandria supports music CDs, movie DVDs, games, ... Message-ID: <200702241915.27411.costanti@science.unitn.it> Dear all, now Alexandria supports music CDs, movie DVDs, games, ... this require the svn version or the forthcoming version, and is done using the provider thalia.de. Of course thalia.de continue to support books. (thalia.de is the buyer of Amadeus Buch, a provider used in the past.) Simply enter the ISBN or EAN code, as usual, and if it is known by thalia.de, it will be cataloged. For CDs and DVDs, the EAN number, with 13 digits is required. 
If the number reported on the CD is shorter, just prepend some 0, to make it 13 digits long. Some Barcode scanners can be configured (by something like "enable EAN") to do this automatically. Regards, Marco From tristil at gmail.com Sat Feb 24 14:37:15 2007 From: tristil at gmail.com (Joseph Method) Date: Sat, 24 Feb 2007 14:37:15 -0500 Subject: [Alexandria-list] Now Alexandria supports music CDs, movie DVDs, games, ... In-Reply-To: <200702241915.27411.costanti@science.unitn.it> References: <200702241915.27411.costanti@science.unitn.it> Message-ID: <167b6aa00702241137x6c9c7b18uab83a101b704729d@mail.gmail.com> Great news. Do you expect any possible issues with image sizing, etc? As I mentioned before, I think this is a good direction for Alexandria to go in the future. It would be great to enable Amazon, etc. to do this as well. For the current release, I think this should be treated as an "unsupported feature". For later releases, we should consider how and whether to differentiate the data model and ui for CDs, DVDs, etc. For example, most functionality could depend on a common interface of Possession, but a MusicCD, which conforms to the Possession interface, would have special attributes, such as a track listing with links to open a music player. On 2/24/07, Marco Costantini wrote: > Dear all, > now Alexandria supports music CDs, movie DVDs, games, ... > this require the svn version or the forthcoming version, and is done using the > provider thalia.de. Of course thalia.de continue to support books. > (thalia.de is the buyer of Amadeus Buch, a provider used in the past.) > Simply enter the ISBN or EAN code, as usual, and if it is known by thalia.de, > it will be cataloged. > > For CDs and DVDs, the EAN number, with 13 digits is required. > If the number reported on the CD is shorter, just prepend some 0, to make it > 13 digits long. Some Barcode scanners can be configured (by something like > "enable EAN") to do this automatically. 
>
> Regards,
> Marco
> _______________________________________________
> Alexandria-list mailing list
> Alexandria-list at rubyforge.org
> http://rubyforge.org/mailman/listinfo/alexandria-list
>

--
-J. Method

From costanti at science.unitn.it Sun Feb 25 10:02:33 2007
From: costanti at science.unitn.it (Marco Costantini)
Date: Sun, 25 Feb 2007 16:02:33 +0100
Subject: [Alexandria-list] Now Alexandria supports music CDs, movie DVDs, games, ...
In-Reply-To: <167b6aa00702241137x6c9c7b18uab83a101b704729d@mail.gmail.com>
References: <200702241915.27411.costanti@science.unitn.it> <167b6aa00702241137x6c9c7b18uab83a101b704729d@mail.gmail.com>
Message-ID: <200702251602.33089.costanti@science.unitn.it>

Dear all,

On Saturday 24 February 2007 20:37, Joseph Method wrote:
> Great news. Do you expect any possible issues with image sizing, etc?

No, there aren't. The code used for that is the same code used for books
(both in the file book_providers/thalia.rb and in the rest of Alexandria).
Hence I expect that there are no issues other than those possible with books
and a given provider.

> As I mentioned before, I think this is a good direction for Alexandria
> to go in the future. It would be great to enable Amazon, etc. to do
> this as well.

With provider bol.it it is already possible to do the same. At the beginning
of book_providers/bol_it.rb there is a line

    LOCALE = "libri" # possible locales are: "libri", "inglesi", "video", "musica", "choco"

(Probably the name "locale" for the variable is not good.) By changing this
variable, you can catalog respectively Italian books, English books, video
DVDs, music CDs, and chocolate (yes, they sell chocolate too). It would be
nice to change this, and make the choice of this variable a selectable
preference, but I don't know how to do it.

Also with provider ibs_it it is possible to catalog music: I have a slight
modification of the file ibs_it that does the job.
> For the current release, I think this should be treated as an
> "unsupported feature".

I would say "experimental feature". Currently, Thalia simply returns information about a book, a CD, or a DVD, according to the EAN number entered, and Alexandria doesn't know that the object might not be a book. There isn't any option or feature to enable or disable, nor any part of the code specific to non-book objects.

> For later releases, we should consider how and
> whether to differentiate the data model and ui for CDs, DVDs, etc. For
> example, most functionality could depend on a common interface of
> Possession, but a MusicCD, which conforms to the Possession interface,
> would have special attributes, such as a track listing with links to
> open a music player.

An integration with the Ruby program MusicExtras (see http://musicextras.divineinvasion.net/ ) would be possible. Providers specific to music could also be added; see for instance http://www.cdquest.com/music/details.aspx?id=0028943876926 (0028943876926 is the EAN). Does any of you know a good provider for CDs or DVDs? Good in the sense of a wide database and easily parsable web pages.

On the other hand, in my opinion, Alexandria should continue to focus on books, and provide the cataloging of non-book objects just as a byproduct of being a good program for books. Let's be wary of the risks of "featuritis" (see http://en.wikipedia.org/wiki/Featuritis ), and let's think carefully before adding all this stuff. A track listing might not fit so well in a program for books.

Greetings,
Marco

From tristil at gmail.com Sun Feb 25 11:28:57 2007
From: tristil at gmail.com (Joseph Method)
Date: Sun, 25 Feb 2007 11:28:57 -0500
Subject: [Alexandria-list] Now Alexandria supports music CDs, movie DVDs, games, ...
In-Reply-To: <200702251602.33089.costanti@science.unitn.it>
References: <200702241915.27411.costanti@science.unitn.it> <167b6aa00702241137x6c9c7b18uab83a101b704729d@mail.gmail.com> <200702251602.33089.costanti@science.unitn.it>
Message-ID: <167b6aa00702250828j2a208481k68b720c0893702a9@mail.gmail.com>

> On the other hand, in my opinion, Alexandria should continue to focus on
> books, and provide the cataloging of non-book objects just as a byproduct of
> being a good program for books. Let's be wary of the risks of
> "featuritis" (see http://en.wikipedia.org/wiki/Featuritis ), and let's think
> carefully before adding all this stuff. A track listing might not fit so well in a
> program for books.
>
> Greetings,
> Marco
>

This can also be discussed as a question of scope: what is the scope of the Alexandria project? First, I think we can agree that we shouldn't just pile on features, but that we should add features incrementally, each time appraising whether and how well each one fits into the program.

But on the scope of the project, my personal view is that it can be quite wide. After this release, I will write an email to the list describing some ambitious proposals for the future. I will describe them in more depth then, but my overarching idea is that Alexandria should be part of a suite of standalone applications that operate on a database of "personal objects". Alexandria itself would still be the flagship application, but its role would be more clearly defined as a viewer and cataloger of all the various types of objects. Some functions already in Alexandria, like borrowing, would actually be moved out to another application. The goal would be to make Alexandria a reference database for all the other applications in Gnome to use.

In keeping with this vision, a long-term goal should be to store and display as much relevant information as possible about each type of object. For example, an anthology of stories should also list the contained stories, not just the title and authors.
For CDs, the tracks are this relevant information. But I agree that Alexandria should not be a music player! :)

--
-J. Method

From costanti at science.unitn.it Sun Feb 25 12:32:26 2007
From: costanti at science.unitn.it (Marco Costantini)
Date: Sun, 25 Feb 2007 18:32:26 +0100
Subject: [Alexandria-list] Release plans
In-Reply-To: <167b6aa00702212104n1c8d914at6b17293be94c9835@mail.gmail.com>
References: <167b6aa00702212104n1c8d914at6b17293be94c9835@mail.gmail.com>
Message-ID: <200702251832.26150.costanti@science.unitn.it>

Dear Joseph and dear all,

some comments before the new release.

* Consider using shipper for releasing, see http://www.catb.org/~esr/shipper/shipper.html

* The provider mcu works again (they switched back to the GET method, and it was easy to fix).

* Each of you is invited to test, and if necessary fix, his/her favorite provider, because each provider could have had slight changes.

* The encoding problem persists in the following way. If a user has data collected with previous versions of Alexandria, this data may contain wrong encoding. When this legacy data is used by the forthcoming version of Alexandria, the encoding problem will happen again. As a solution, I see exporting the data as a list of ISBNs and importing it again, or manually editing the files. The Unix command "file" can perhaps be used to select which files need manual editing.

* I'm preparing a FAQ for Alexandria; any comment is welcome.

###########

There is still something that should be done before a release to be advertised. Another "stop_gap" release for testing may be useful.

* The provider proxis still returns the ISBN-10 code instead of the ISBN-13. I didn't succeed in fixing it.

* I think it is necessary to complete the switch from ISBN-10 to ISBN-13. Almost all the calls to Library.canonicalise_isbn must be replaced by Library.canonicalise_ean, except in the directories book_providers and test. For this, each occurrence of Library.canonicalise_isbn must be manually checked.

* For new books that are not found on the providers, the misleading message "No Title" is returned. This should be fixed.

* Bugs 1633 and 8647, and feature requests 1672 and 1809 should be considered.

Greetings,
Marco

On Thursday 22 February 2007 06:04, Joseph Method wrote:
> I'd like to do a full-blown release (Sourceforge announcements, deb
> packaging, etcetera) by Sunday evening. This just means we release an
> official tarball for packagers to work with and for new users to
> download. I mainly want to do this to generate some new interest in
> Alexandria and to improve the quality of bug reports, since many come
> from Ubuntu, etc., where the old 6.1 Alexandria is paired with a
> broken libruby-gnome2 0.15. We could still do mini-releases for
> followup on currently unhandled bugs. Does anyone disagree with this
> plan?

From tristil at gmail.com Sun Feb 25 12:45:47 2007
From: tristil at gmail.com (Joseph Method)
Date: Sun, 25 Feb 2007 12:45:47 -0500
Subject: [Alexandria-list] Release plans
In-Reply-To: <200702251832.26150.costanti@science.unitn.it>
References: <167b6aa00702212104n1c8d914at6b17293be94c9835@mail.gmail.com> <200702251832.26150.costanti@science.unitn.it>
Message-ID: <167b6aa00702250945v1aab1e8egfde985805eccfb9@mail.gmail.com>

I agree that we should hold off on a real release. Setting a date was more to suggest that a release is imminent. :) I'll put up a candidate tarball once I get at a newly introduced bug (my fault).

> * Consider using shipper for releasing, see
> http://www.catb.org/~esr/shipper/shipper.html

Maybe. I might just do this manually, though.

> * The encoding problem persists in the following way. If a user has data
> collected with previous versions of Alexandria, this data may contain wrong
> encoding. When this legacy data is used by the forthcoming version of
> Alexandria, the encoding problem will happen again.

Yeah, this is a big problem.
If we handle this wrongly, we'll get tons of bugs saying "Alexandria doesn't work :(" ...

> * I'm preparing a FAQ for Alexandria; any comment is welcome.

Take a look at the information already in the help manual for this.

> * I think it is necessary to complete the switch from ISBN-10 to ISBN-13.
> Almost all the calls to Library.canonicalise_isbn must be replaced by
> Library.canonicalise_ean, except in the directories book_providers and test.
> For this, each occurrence of Library.canonicalise_isbn must be manually
> checked.

What if you make canonicalise_isbn an alias for canonicalise_isbn_ean and throw a deprecation message, e.g., "canonicalise_isbn is deprecated, please use canonicalise_isbn_ean"?

> * For new books that are not found on the providers, the misleading message
> "No Title" is returned. This should be fixed.
>
> * Bugs 1633 and 8647, and feature requests 1672 and 1809 should be considered.
>
> Greetings,
> Marco

--
-J. Method

From costanti at science.unitn.it Sun Feb 25 13:52:09 2007
From: costanti at science.unitn.it (Marco Costantini)
Date: Sun, 25 Feb 2007 19:52:09 +0100
Subject: [Alexandria-list] Release plans
In-Reply-To: <167b6aa00702250945v1aab1e8egfde985805eccfb9@mail.gmail.com>
References: <167b6aa00702212104n1c8d914at6b17293be94c9835@mail.gmail.com> <200702251832.26150.costanti@science.unitn.it> <167b6aa00702250945v1aab1e8egfde985805eccfb9@mail.gmail.com>
Message-ID: <200702251952.10250.costanti@science.unitn.it>

Dear all.

On Sunday 25 February 2007 18:45, Joseph Method wrote:
> > * The encoding problem persists in the following way. If a user has
> > data collected with previous versions of Alexandria, this data may
> > contain wrong encoding. When this legacy data is used by the forthcoming
> > version of Alexandria, the encoding problem will happen again.
>
> Yeah, this is a big problem. If we handle this wrongly, we'll get tons
> of bugs saying "Alexandria doesn't work :(" ...

and they may quit using Alexandria forever.
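One possible way to find the legacy files that need attention, as an alternative to the Unix command "file", is to check each file for byte sequences that are not valid UTF-8. The following Ruby sketch assumes that the forthcoming version expects UTF-8 and that the library is stored as ordinary files on disk; the method name invalid_utf8_files is hypothetical and not part of Alexandria. It only detects files that are not valid UTF-8 at the byte level, so text that was wrongly converted but still forms valid UTF-8 will not be reported.

```ruby
# Hypothetical helper, not part of Alexandria.
# Returns the paths whose contents are not valid UTF-8.
def invalid_utf8_files(paths)
  paths.reject do |path|
    # Read the raw bytes, then ask Ruby whether they form valid UTF-8.
    File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  end
end
```

The method could be given the result of a Dir.glob call over the directory that holds the library files; that location is not stated in this thread, so it is left to the caller.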
> > * I think it is necessary to complete the switch from ISBN-10 to
> > ISBN-13. Almost all the calls to Library.canonicalise_isbn must be
> > replaced by Library.canonicalise_ean, except in the directories
> > book_providers and test. For this, each occurrence of
> > Library.canonicalise_isbn must be manually checked.
>
> What if you make canonicalise_isbn an alias for canonicalise_isbn_ean
> and throw a deprecation message, e.g., "canonicalise_isbn is
> deprecated, please use canonicalise_isbn_ean"?

No, because both are needed. For instance, the providers Webster, Amazon, and Ls need ISBN-10. Hence most of the occurrences can be replaced, but a check is needed.

By the way, the function Library.canonicalise_isbn contains the conversion from UPC to ISBN. If someone knows how, can he/she add the conversion from UPC to EAN to the function canonicalise_isbn_ean?

All the best,
Marco
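For reference, the arithmetic behind the 13-digit codes discussed in this thread can be sketched in Ruby as follows. This is a minimal illustration using the standard EAN-13 check digit rule (weights 1 and 3 alternating) and the usual "978" prefix for turning an ISBN-10 into an ISBN-13. The method names are hypothetical and are not Alexandria's Library.canonicalise_* functions, and the UPC-to-ISBN conversion in Library.canonicalise_isbn is not reproduced here; only the zero-padding of a shorter code is shown.

```ruby
# Hypothetical helpers, not Alexandria's Library.canonicalise_* methods.

# Check digit for the first 12 digits of an EAN-13:
# weights alternate 1, 3, 1, 3, ... from the left.
def ean13_check_digit(first12)
  sum = first12.chars.each_with_index.sum do |ch, i|
    ch.to_i * (i.even? ? 1 : 3)
  end
  (10 - sum % 10) % 10
end

# True if the string is 13 digits with a correct check digit.
def valid_ean13?(code)
  return false unless code =~ /\A\d{13}\z/
  ean13_check_digit(code[0, 12]) == code[12].to_i
end

# Left-pad a shorter numeric code (for example a 12-digit UPC-A)
# with zeros to 13 digits.
def pad_to_ean13(code)
  digits = code.delete("^0-9")
  raise ArgumentError, "too long for EAN-13: #{code}" if digits.length > 13
  digits.rjust(13, "0")
end

# Convert an ISBN-10 to an ISBN-13: drop the old check digit,
# prefix "978", and compute the EAN-13 check digit.
# The ISBN-10 check digit itself is not verified here.
def isbn10_to_isbn13(isbn10)
  core = isbn10.delete("- ").upcase
  raise ArgumentError, "not an ISBN-10: #{isbn10}" unless core =~ /\A\d{9}[\dX]\z/
  first12 = "978" + core[0, 9]
  first12 + ean13_check_digit(first12).to_s
end
```

For example, pad_to_ean13 turns the 12-digit form of the CD code quoted earlier into 0028943876926, which valid_ean13? accepts.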