[#27145] Book search by UTF-8 string with WorldCat provider

Date: 2009-09-19 16:46
Priority: 3
Submitted By: Mamoru Tasaka (mtasaka)
Assigned To: Cathal Mc Ginley (cathalmagus)
Category: None
State: Open
Summary: Book search by UTF-8 string with WorldCat provider

Detailed description
It seems that the WorldCat provider accepts book searches by UTF-8
string.

With alexandria-0.6.5-7.fc12, a book search by UTF-8 string on the
WorldCat provider returns some results, but the titles of the found
books are not detected correctly.

How to reproduce:
* Launch alexandria
* Make sure that the WorldCat provider is at the top of the
  Providers list
* Select "Add book" -> "Select by title", enter the string
  "ハヤテ" (Hiragana string, in Romaji "hayate")

Result:
The output of "$ alexandria --debug" is available at:
http://mtasaka.fedorapeople.org/Bugfix/alexandria-WorldCat-Hiragana.log

An excerpt from this log:
----------------------------------------------------------------
WorldCat lookup
http://www.worldcat.org/search?q=ti%3A%E3%83%8F%E3%83%A4%E3%83%86&qt=advanced
D, [2009-09-19T04:53:05.443309 #13198] DEBUG -- <Obj Alexandria::UI::NewBookDialog>: update message : Searching
Provider 'WorldCat'...
Net::HTTPOK
10
D, [2009-09-19T04:53:07.230642 #13198] DEBUG -- <Obj Alexandria::BookProviders::WorldCatProvider>: Fetching book
from http://www.worldcat.org/oclc/232343379&referer=brief_results
I, [2009-09-19T04:53:08.751205 #13198]  INFO -- <Obj Alexandria::BookProviders::WorldCatProvider>: Found book
at WorldCat
----------------------------------------------------------------
Note that the title of the book should be shown between
"Found book" and "at", but here it is empty.

When searching for books with the string "hayate" (Latin alphabet),
"$ alexandria --debug" shows:
----------------------------------------------------------------
WorldCat lookup
http://www.worldcat.org/search?q=ti%3Ahayate&qt=advanced
D, [2009-09-20T01:44:34.550468 #2746] DEBUG -- <Obj Alexandria::UI::NewBookDialog>: update message : Searching
Provider 'WorldCat'...
Net::HTTPOK
10
D, [2009-09-20T01:44:44.762613 #2746] DEBUG -- <Obj Alexandria::BookProviders::WorldCatProvider>: Fetching book
from http://www.worldcat.org/oclc/294887634&referer=brief_results
I, [2009-09-20T01:44:49.517889 #2746]  INFO -- <Obj Alexandria::BookProviders::WorldCatProvider>: Found book Hayate
cross blade. 4 at WorldCat
----------------------------------------------------------------
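
Both query URLs in the excerpts above are consistent with the search
string being percent-encoded as UTF-8 bytes. The following is a minimal
sketch that reproduces them; worldcat_title_search_url is a hypothetical
helper written for illustration, not Alexandria's actual code, and it
only assumes the URL shape seen in the debug log.

```ruby
require 'cgi'

# Hypothetical helper (an assumption for illustration, not taken from
# Alexandria): build a title-search URL in the shape seen in the log.
def worldcat_title_search_url(title)
  # CGI.escape percent-encodes the ":" and every non-ASCII UTF-8 byte.
  query = CGI.escape("ti:#{title}")
  "http://www.worldcat.org/search?q=#{query}&qt=advanced"
end

puts worldcat_title_search_url("ハヤテ")
# prints http://www.worldcat.org/search?q=ti%3A%E3%83%8F%E3%83%A4%E3%83%86&qt=advanced
puts worldcat_title_search_url("hayate")
# prints http://www.worldcat.org/search?q=ti%3Ahayate&qt=advanced
```

Since the Katakana query is encoded correctly and returns results, the
request side appears fine; the missing title points at how the response
is handled.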

Followup

Message
Date: 2009-09-20 04:16
Sender: Mamoru Tasaka

A small correction

"ハヤテ" is actually Katakana, not Hiragana :)

Date: 2009-09-20 04:02
Sender: Cathal Mc Ginley

Thanks for the report, confirmed here too!

I can see that the pages have an extra div or span with class
"vernacular" for the Hiragana text. I thought that might be it,
but the correct value is found by the equivalent Palatina code.
So it might have to do with how the WorldCat provider downloads
the HTML (AbstractProvider#transport).

I'll look into it further.
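
One way a download step could affect title extraction, offered only as
an illustration and not as a diagnosis of this bug: a response body can
arrive as raw bytes with no character encoding attached, and text
matching may then behave differently until the bytes are interpreted as
UTF-8. The sketch below assumes markup of the form
<span class="vernacular">...</span>; the real WorldCat page structure is
not shown in this report, and extract_vernacular is a hypothetical
helper, not part of Alexandria. It relies on String#force_encoding,
which exists in Ruby 1.9 and later.

```ruby
# Hypothetical helper (an assumption for illustration): pull the text of
# an element with class "vernacular" out of downloaded HTML bytes.
def extract_vernacular(html_bytes)
  # Reinterpret the raw bytes as UTF-8 without changing them.
  html = html_bytes.dup.force_encoding(Encoding::UTF_8)
  # Give up if the bytes are not valid UTF-8.
  return nil unless html.valid_encoding?

  match = html.match(/class="vernacular"[^>]*>([^<]+)</)
  match && match[1].strip
end

# Simulate a body that arrived as binary (untagged) bytes.
raw = '<span class="vernacular">ハヤテ</span>'.b
puts extract_vernacular(raw)
```

If the bytes are left tagged as binary, the extracted text is not equal
to the same characters in a UTF-8 string, which is the kind of mismatch
worth ruling out in the transport code.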

Attached Files:
No Files Currently Attached

Changes:

Field        Old Value  Date              By
assigned_to  none       2009-09-20 04:02  cathalmagus