From rochkind at jhu.edu Thu Jun 19 11:08:23 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Thu, 19 Jun 2008 11:08:23 -0400
Subject: [Umlaut-general] URL parsing, duplicate query param keys
Message-ID: <485A7667.4090909@jhu.edu>

Hi Ross and Jason.

So, I've realized that Umlaut currently does not fully handle OpenURLs in URL query params (KEV style) that have duplicate param keys, which OpenURL allows. For instance, a KEV URL could have two or more rft_id values, specified separately with multiple query params all named rft_id.

This is tricky because the Rails HTTP query param parser doesn't like this situation; it just ignores all but one of those values. So to start with, to fix this I think I'm going to have to write my own query param parser rather than relying on Rails. This is doable. (Can anyone think of any other way to deal with this?)

But then I wonder what I should parse it _into_, so I can feed it to ropenurl and get a full OpenURL object out of it. Ross, can you give me some advice here? I use OpenURL::ContextObject.new_from_form_vars to create my context object, currently just passing it the Rails params hash. If I instead pass it a hash that's like the Rails params hash, except some of the values are arrays instead of strings (when there are multiple values for the same query param key), will ContextObject do the right thing with it? If not, is this the right approach, and should I fix ContextObject to do the right thing with that kind of hash?
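A minimal sketch of the kind of parser described above, assuming only Ruby's standard uri library. The method name parse_query_keeping_duplicates and the abbreviated sample query string are illustrative and not part of Umlaut or ropenurl; whether new_from_form_vars accepts array values is not confirmed here.

```ruby
require "uri"

# Parse a raw query string into a hash whose values are always arrays,
# so repeated keys such as rft_id keep every value instead of only one.
# (Hypothetical helper name, for illustration only.)
def parse_query_keeping_duplicates(query_string)
  URI.decode_www_form(query_string).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

# Abbreviated, illustrative KEV query string with a repeated rft_id key.
sample = "url_ver=Z39.88-2004&rft_id=info%3Aoclcnum%2F959213" \
         "&rft_id=urn%3AISBN%3A&rft.btitle=Oliver+Twist"

params = parse_query_keeping_duplicates(sample)
params["rft_id"]     # every rft_id value, in request order
params["rft.btitle"] # a one-element array
```

In a Rails controller the raw string would probably come from request.query_string rather than from the already-parsed params hash. Making every value an array keeps the consumer's logic uniform: it never has to guess whether a key holds a string or a list.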
Thanks for any advice,
Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

From rossfsinger at gmail.com Thu Jun 19 14:06:36 2008
From: rossfsinger at gmail.com (Ross Singer)
Date: Thu, 19 Jun 2008 14:06:36 -0400
Subject: [Umlaut-general] URL parsing, duplicate query param keys
In-Reply-To: <485A7667.4090909@jhu.edu>
References: <485A7667.4090909@jhu.edu>
Message-ID: <23b83f160806191106t6e640bb5r9533cf980c64b23@mail.gmail.com>

I *think* (but am not entirely sure) that new_from_form_vars was designed with regular Ruby CGI objects in mind. If that's right, this should work just fine. A simple way to know is to see whether the parser checks if the values are arrays first.

-Ross.

From rochkind at jhu.edu Thu Jun 19 14:21:18 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Thu, 19 Jun 2008 14:21:18 -0400
Subject: [Umlaut-general] URL parsing, duplicate query param keys
In-Reply-To: <23b83f160806191106t6e640bb5r9533cf980c64b23@mail.gmail.com>
References: <485A7667.4090909@jhu.edu> <23b83f160806191106t6e640bb5r9533cf980c64b23@mail.gmail.com>
Message-ID: <485AA39E.9030101@jhu.edu>

Hmm, is that a hint that there may already be a function in some "regular Ruby CGI object" to parse the query string for me in the way we need here, so I don't need to write one myself from scratch? Cool, I'll look into it.

Sounds like if new_from_form_vars does NOT play well with that, you agree that it's a reasonable approach to make it so? Cool, good, thanks.

Jonathan

From jronallo at gmail.com Fri Jun 20 22:12:43 2008
From: jronallo at gmail.com (Jason Ronallo)
Date: Fri, 20 Jun 2008 22:12:43 -0400
Subject: [Umlaut-general] trunk, branches, tags
Message-ID: <763570460806201912g65d43603kb57893e56a74f8d3@mail.gmail.com>

Hi,
Since I was asked today to commit some changes to a branch, I realized that the repo does not have a standard layout: no trunk, branches, or tags directories to work in. Would it be all right if I moved the U2 directory contents into trunk and created branches so I can start committing on my branch? I'll do it in two separate commits: 1. move U2 into trunk 2. remove U2 directory. I wanted to let you (Jonathan and Ross) know about this before I went ahead and committed this change. I know I'll have to check out the repo again and create a new project in my IDE that will use the new trunk directory. Is this all right with both of you? Please let me know so I can commit this.

BTW, I've used git-svn to grab the whole contents of the svn repo as a complete backup. From a bit that I read it may be possible to have both svn and git repos on rubyforge. This could give us a chance to try before we buy if you're interested.

Jason

From jronallo at gmail.com Sat Jun 21 22:10:17 2008
From: jronallo at gmail.com (Jason Ronallo)
Date: Sat, 21 Jun 2008 22:10:17 -0400
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
Message-ID: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com>

Hi,
I used the following OpenURL from WorldCat[1]

http://localhost:3000/resolve?url_ver=Z39.88-2004&rfr_id=info%3Asid%2Fworldcat.org%3Aworldcat&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&req_dat=%3Csessionid%3E&rft_id=info%3Aoclcnum%2F959213&rft_id=urn%3AISBN%3A&rft_id=urn%3AISSN%3A&rft.aulast=Dickens&rft.aufirst=Charles&rft.auinitm=&rft.btitle=Oliver+Twist%2C&rft.atitle=&rft.date=1941&rft.tpages=&rft.isbn=&rft.aucorp=&rft.place=New+York&rft.pub=Dodd++Mead+%26+Co.&rft.edition=&rft.series=&rft.genre=book&url_ver=Z39.88-2004

Somehow I ended up with an ISBN as part of the metadata. The metadata from the OpenURL does not contain an ISBN. And the book is from 1941 and AFAIK ISBNs weren't used until the late 1960s. Maybe my eyes are failing me?

I only have two services activated: my new Google Book Search service[2] and SFX (IUPUI). It looks to me as if SFX enhances the data with two different ISBNs, and the 10-digit ISBN shows up in the view. I find it very strange and confusing to have an ISBN show with a pre-ISBN book.

JHU's FindIt also enhances with a nonsensical ISBN:
http://findit.library.jhu.edu/go/414489 [3]
I get ISBN: 1-901843-62-9
JHU gives me: 3-458-34857-3

While both of these are in fact ISBNs for Oliver Twist, they are a step removed from the manifestation that was being requested. I think as we add more services we need to look for ways to do this differently. If I have an exact full view hit (from GBS or another foreground service) I may not want to present the Internet Archive hits as prominently since they are fuzzier matches.

Another case: I have an OCLCnum (or another identifier) as my only piece of metadata. A query against the GBS API would return nothing for this particular oclcnum. Using xOCLCNUM I can get back all related editions and their ISBNs and LCCNs. I am first interested in the identifiers related to the original oclcnum. If a GBS search for those fails (or if there is no oclcnum or lccn), only then am I interested in trying to get the user to a manifestation of one of the related identifiers from other editions. If I get results from these, I'll choose the best possible results (most viewable) to return. But I do not want to present these in the same way I would have if there was a match on the numbers for the original request. Does this make sense?

Is there somewhere to put this kind of enhancing metadata without touching the original referent? I want to enhance from all kinds of sources, but don't want it all lumped into the original referent. I also wonder whether the way it's being done now does screwy things with reusing cached responses.

If we can figure out what needs to happen with this, I'm happy to try to write the code to make it happen.

Jason

[1] http://www.worldcat.org/oclc/959213

[2] Right now it works by accepting the Rails request. I might commit a version without that.

[3] http://findit.library.jhu.edu/resolve?url_ver=Z39.88-2004&rfr_id=info%3Asid%2Fworldcat.org%3Aworldcat&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&req_dat=%3Csessionid%3E&rft_id=info%3Aoclcnum%2F959213&rft_id=urn%3AISBN%3A&rft_id=urn%3AISSN%3A&rft.aulast=Dickens&rft.aufirst=Charles&rft.auinitm=&rft.btitle=Oliver+Twist%2C&rft.atitle=&rft.date=1941&rft.tpages=&rft.isbn=&rft.aucorp=&rft.place=New+York&rft.pub=Dodd++Mead+%26+Co.&rft.edition=&rft.series=&rft.genre=book&url_ver=Z39.88-2004

From rossfsinger at gmail.com Mon Jun 23 10:06:35 2008
From: rossfsinger at gmail.com (Ross Singer)
Date: Mon, 23 Jun 2008 10:06:35 -0400
Subject: [Umlaut-general] trunk, branches, tags
In-Reply-To: <763570460806201912g65d43603kb57893e56a74f8d3@mail.gmail.com>
References: <763570460806201912g65d43603kb57893e56a74f8d3@mail.gmail.com>
Message-ID: <23b83f160806230706i24b1720bn8cec6ec5fec90e87@mail.gmail.com>

Jason,
I'm all for this.
The reason it is the way it is currently (I think) is that Tech was stuck with Umlaut 1 and the migrated SVN repo (from Tech to Rubyforge) kept this distinction. Since nobody's using U1 now, U2 is all there is - I say go for it.

As far as git, I think this is a good strategy. I really have no preferences regarding SCM. I'll screw things up no matter which one we choose, honestly. I like git (and bzr and its ilk) for purely preservation purposes, not necessarily out of a preference for their syntax (I have no preference in that regard; all SCM is awkward in different ways).

I guess this really depends on Jonathan.

-Ross.

From rochkind at jhu.edu Mon Jun 23 11:59:25 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 23 Jun 2008 11:59:25 -0400
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com>
References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com>
Message-ID: <485FC85D.1080507@jhu.edu>

Yes, it would seem to be SFX.

SFX is now doing matching on books in its KB, it looks like. (author/title? OCLCnum? I haven't looked into it enough). It has 3 matches for Oliver Twist. Umlaut has some routines I wrote (in the SFX adaptor) that try to guess which one of these, if any, is most likely to have full text. I wrote these routines with periodicals in mind, not books.

What do you mean by 'nonsensical' ISBN though? You just mean that it's getting ISBNs from SFX for alternate manifestations, right? (When you use the word 'nonsensical', I'm worried you're saying it's winding up with an ISBN that isn't for Oliver Twist at all, or isn't even a legal ISBN at all, or something).

I could change the SFX code to not enhance metadata from SFX when the format/genre given is book. You think that's a good idea? Some would consider it a bonus to get an ISBN for _some_ manifestation of the work from SFX, since with that ISBN you can find a lot more services than you could without an ISBN, no? But I admit

In general, with Internet Archive for instance, keep in mind that since we're doing author/title matching only, we can't do anything about distinguishing the exact manifestation from another version of the work anyway. And I'm not sure users clicking on an OpenURL that leads them to Umlaut are ever frequently _intending_ to ask for a specific version (instead of just the work) anyhow. It's unclear.
I think this stuff is always going to be messy due to the messy state of our metadata, and I'm not sure it's worth it to try to pin it down.

I think your ideas about somehow making the Internet Archive hits show up "less prominently" as a result of the GBS hits end up getting rather too micro-managing, requiring infrastructure that's not there in Umlaut right now, and I'm not sure it's worth the extra complexity.

There is at the moment no way of enhancing metadata without touching the original referent, no.

I'm in general not sure the added complexity to Umlaut is worth it, for what you are considering. It's complicated enough already.

Jonathan

From jronallo at gmail.com Mon Jun 23 12:03:55 2008
From: jronallo at gmail.com (Jason Ronallo)
Date: Mon, 23 Jun 2008 12:03:55 -0400
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <485FC85D.1080507@jhu.edu>
References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <485FC85D.1080507@jhu.edu>
Message-ID: <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com>

It's a nonsensical ISBN because it was appended to a manifestation from 1941. That doesn't make any sense. If someone is using the COinS on the page for their citation manager (Zotero) they get a reference to a 1941 book with an ISBN?

I think there's a difference in the kind of metadata you enhance with. If you only have an ISBN and you get metadata from Amazon or another service to supply author and title, that's great. Those things belong to the same manifestation.

I do get your point about most folks not caring what manifestation they get. But there are times when I do want the illustrated version of Oliver Twist instead of just a text version. If the infrastructure for doing more complex things between services is not there, that's fine.
By "more prominently" it could just be a matter of optionally adding the words "close match" to those hits that are not the exact manifestation looked for. This would be very easy to implement if the metadata for the manifestation was kept separate from any enhancing metadata from close matches belonging to other manifestations.

I do want to have more ISBNs. In fact I want a lot more than just one if they are available. This would come in very handy for the GBS service. Searching for just one ISBN is very poor for the GBS service. Lots more fulltext would be available if I could have a lot more identifiers to search on. If I had all the related ISBNs (and OCLCnums, LCCNs) I could get users to fulltext much more frequently.

Jason

From tennantr at oclc.org Mon Jun 23 13:09:19 2008
From: tennantr at oclc.org (Roy Tennant)
Date: Mon, 23 Jun 2008 10:09:19 -0700
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com>
Message-ID:

On 6/23/08 9:03 AM, "Jason Ronallo" wrote:

> I do want to have more ISBNs. In fact I want a lot more than just one
> if they are available. This would come in very handy for the GBS
> service. Searching for just one ISBN is very poor for the GBS service. Lots
> more fulltext would be available if I could have a lot more
> identifiers to search on.
> If I had all the related ISBNs (and
> OCLCnums, LCCNs) I could get users to fulltext much more frequently.

Then use the xISBN service, which is now free to OCLC cataloging subscribers:

Roy

From jronallo at gmail.com Mon Jun 23 14:09:10 2008
From: jronallo at gmail.com (Jason Ronallo)
Date: Mon, 23 Jun 2008 14:09:10 -0400
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To:
References: <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com>
Message-ID: <763570460806231109w4d29f4ebt1ff1d4b3407fba3c@mail.gmail.com>

On Mon, Jun 23, 2008 at 1:09 PM, Roy Tennant wrote:
> Then use the xISBN service, which is now free to OCLC cataloging
> subscribers:

Roy,
Oh, I hope to write an Umlaut service that uses xISBN (and go back and finish the next version of the ruby-xisbn gem). It will greatly increase the number of hits we can get through some web services. What I'm after here is how Umlaut should handle all those identifiers once it has them. It needs some way to pass those around so that all the services that could use them have access to them.

Jason

From rochkind at jhu.edu Mon Jun 23 16:42:53 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 23 Jun 2008 16:42:53 -0400
Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com>
References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <485FC85D.1080507@jhu.edu> <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com>
Message-ID: <48600ACD.7050200@jhu.edu>

Okay, since there's some controversy on this on the channel, let's go over the use cases and pros and cons for metadata enhancement of books from SFX. There are two basic situations I can think of:

1) An OpenURL request comes in with only an ISBN. If SFX happens to recognize that ISBN, it is certainly useful to get title, author, etc., for it. But that's a big "if". SFX only has records for certain e-books in its KB (and I'm also not sure if it has print ISBNs or only e-ISBNs). So even though this could be useful in a few cases, coverage isn't big enough to rely on it, so another solution would be required. Fortunately, there are a couple of other solutions: the WorldCat API and/or the Bowker ISBN API. (Jason, interested in working on either of those next? I think the WorldCat API is some low-hanging fruit here that would be extremely useful and easy.) So this case doesn't seem to justify the SFX metadata enhancement in light of the cons Jason has identified.

2) An OpenURL comes in with just title and author, and without an ISBN, and SFX provides an ISBN. This _is_ potentially useful, since that ISBN can be the key to more stuff from other services. Again though, we have pretty limited coverage so can't count on it. We may want to try to use the WorldCat or Bowker APIs even here, although we'd still run into the same problem: if you find an ISBN, do you add it to the metadata knowing that it is quite possibly for a different manifestation than what the user asked for? Using the WorldCat or Bowker APIs, you could more easily tell if it's a different manifestation or not, by comparing dates or publishers (although this wouldn't be 100%), so those seem preferable to SFX for this anyway. The issue is still there, though: even if you do know that it's a different manifestation, what do you _do_ with the info, and how do you store it in Umlaut as a "different manifestation ISBN"?

Another thing to keep in mind is that SFX will already happily deliver a fulltext link in these cases even if it is a different manifestation (in some cases of course there's no way for SFX (or Umlaut) to know if it IS a different manifestation or not, depending on the metadata supplied in the OpenURL). If no ISBN is provided but author and title are, and SFX finds a match, and that match is activated in SFX, the user will get a full text link.
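One way the storage question under case 2 might be approached is sketched below. This is only an assumption-laden illustration: the ReferentIdentifiers class, the Enhancement struct, and the date/publisher heuristic are hypothetical and do not exist in Umlaut, and the sample values are illustrative. Identifiers contributed by services are kept apart from the metadata of the original request, and one that cannot be confirmed as the same manifestation is treated as a close match.

```ruby
# Hypothetical record of one identifier contributed by a service.
Enhancement = Struct.new(:source, :identifier, :date, :publisher)

# Hypothetical holder that never overwrites the original referent.
class ReferentIdentifiers
  attr_reader :original, :enhancements

  def initialize(original)
    @original = original   # metadata taken from the incoming OpenURL
    @enhancements = []     # identifiers added later by services
  end

  def add(source, identifier, date: nil, publisher: nil)
    @enhancements << Enhancement.new(source, identifier, date, publisher)
  end

  # Rough heuristic only (not 100% reliable): treat an identifier as the
  # same manifestation when date and publisher are both known and agree.
  def same_manifestation?(enhancement)
    return false if enhancement.date.nil? || enhancement.publisher.nil?
    enhancement.date == @original[:date] &&
      enhancement.publisher == @original[:publisher]
  end

  def exact_identifiers
    @enhancements.select { |e| same_manifestation?(e) }.map(&:identifier)
  end

  def close_match_identifiers
    @enhancements.reject { |e| same_manifestation?(e) }.map(&:identifier)
  end
end

referent = ReferentIdentifiers.new({ date: "1941", publisher: "Dodd Mead & Co." })
referent.add("request", "info:oclcnum/959213", date: "1941", publisher: "Dodd Mead & Co.")
referent.add("SFX", "urn:isbn:1901843629") # date unknown, so only a close match
```

With a split like this, a service that wants as many identifiers as possible can read both lists, while the display layer can label anything from close_match_identifiers differently.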
Right now Umlaut is enhancing metadata from SFX whether or not there's a full text match. This could be changed to only enhance metadata for _books_ if there's actually a fulltext match. That makes some amount of sense---if we add an ISBN for a different manifestation, then at least we'll only be doing it when we're also providing full text for that different manifestation. But _my_ institution doesn't activate targets in SFX for ebooks at all, so this would, in my case, amount to the same thing as disabling metadata enhancement for books altogether. Overall, I think the idea of keeping track _for sure_ of what identifiers apply to manifestations and which don't--especially when the incoming OpenURL doesn't in fact _have_ any identifiers--is pretty much a lost cause. On the other hand, the book coverage of SFX is so minimal that I don't see SFX metadata enhancement being all that useful _anyway_. So I'm not too concerned with whether to enhance metadata for books from SFX either way, neither the pro nor the con seems that powerful. I do think we need to add a WorldCat API and/or Bowker API service adaptor to enhance metadata. We also all agree that we need an xISBN service adaptor, and that will require us to figure out how to store metadata for multiple variant versions of the work in Umlaut anyway, so that's the real unanswered question which needs an answer, more important than whether to enhance book metadata from SFX or not. Jonathan From rochkind at jhu.edu Tue Jun 24 18:11:45 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Tue, 24 Jun 2008 18:11:45 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <485FC85D.1080507@jhu.edu> <763570460806230903l79ce10f0y1199edd6eb6b3ad9@mail.gmail.com> <48600ABB.6030806@jhu.edu> <763570460806240647q66b7d9c0r9bd192d0ef828cbe@mail.gmail.com> <48611D29.505@jhu.edu> <763570460806240930n74ef2677l6e043803e3d2d213@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> Message-ID: <48617121.5090000@jhu.edu> [Redirected to list in case Ross is interested] Jason Ronallo wrote: > One enhancement that I could add now would be a link to a search of > GBS by author/title limiting to fulltext views. Would you be > interested in me adding that? > Possibly. Is that possible to do? I thought the GBS API only involved identifiers? Would you screen-scrape? Or you'd just give a link to a search in GBS native, without knowing if the search would find anything? Hmm, as a general philosophy I don't like providing links on Umlaut unless they can actually be pre-checked to make sure they go somewhere---this was sort of a precedent Ross set. I think we work with the API GBS gives us, even if it's not what we'd like. You'd do this secondary search just if the identifier-based search failed to find full text, I think? Right now, it's possible for you to find multiple hits for GBS, even with just identifier search, right? If you find multiple GBS hits, are they all put on the screen? I'm thinking if a full-text version is found, there's no need to link to other lesser versions in GBS. Although alternately, you could link to "see all at GBS" analogous to Internet Archive---if that's possible with GBS. What I don't like is providing a full text link AND an individual link to a "more information at GBS". You aren't doing that, are you?
Jonathan From jronallo at gmail.com Tue Jun 24 18:27:59 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Tue, 24 Jun 2008 18:27:59 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <48617121.5090000@jhu.edu> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48600ABB.6030806@jhu.edu> <763570460806240647q66b7d9c0r9bd192d0ef828cbe@mail.gmail.com> <48611D29.505@jhu.edu> <763570460806240930n74ef2677l6e043803e3d2d213@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> Message-ID: <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> On Tue, Jun 24, 2008 at 6:11 PM, Jonathan Rochkind wrote: > [Redirected to list in case Ross is interested] > > Jason Ronallo wrote: >> >> One enhancement that I could add now would be a link to a search of >> GBS by author/title limiting to fulltext views. Would you be >> interested in me adding that? >> > > Possibly. Is that possible to do? I thought the GBS API only involved > identifiers? Would you screen-scrape? Or you'd just give a link to a search > in GBS native, without knowing if the search would find anything? Hmm, as a > general philosophy I don't like providing links on Umlaut unless they can > actually be pre-checked to make sure they go somewhere---this was sort of a > precedent Ross set. I think we work with the API GBS gives us, even if it's > not what we'd like. I was suggesting a non-pre-checked link. I agree it wouldn't be great. It could get folks to more fulltext, though. The real solution is getting more related identifiers to search with. After the IA screen scraper I'm not interested in writing another one if I can help it. > Right now, it's possible for you to find multiple hits for GBS, even with > just identifier search, right?
If you find multiple GBS hits, are they all > put on the screen? I'm thinking if a full-text version is found, there's no > need to link to other lesser versions in GBS. Although alternately, you could > link to "see all at GBS" analogous to Internet Archive---if that's possible > with GBS. What I don't like is providing a full text link AND an individual > link to a "more information at GBS". You aren't doing that, are you? I ask GBS for all available identifiers. I can conduct a search with as many identifiers as I can fit into a URL, and GBS will respond with all that it knows about. Sometimes none; sometimes all of them. Right now I show all links that are not duplicates. I do deduplicate because if I'm asking for ISBN, OCLCnum and LCCN they will often all be associated with the same book. But sometimes you will have the ISBN and OCLCnum associated with different GBS ids. I can rewrite to only show side links (highlighted_link) if there is no fulltext view available. Easy enough. Jason From rochkind at jhu.edu Tue Jun 24 18:31:56 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Tue, 24 Jun 2008 18:31:56 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48600ABB.6030806@jhu.edu> <763570460806240647q66b7d9c0r9bd192d0ef828cbe@mail.gmail.com> <48611D29.505@jhu.edu> <763570460806240930n74ef2677l6e043803e3d2d213@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> Message-ID: <486175DC.1000107@jhu.edu> For IA, we basically show the one best link, then a "see also" link to see all others. Is there any way to duplicate that with Google?
I think that is actually the best way to do it. But failing that, I think we only show the one best link (i.e., one with actual full text search), yeah. I think the menu is starting to get cluttered up with non-essential stuff, and giving them several (or a dozen!) separate lines linking to different digitizations at Google (some more complete than others), I think we want to avoid. Tell me if you disagree. Jonathan
-- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Tue Jun 24 18:43:35 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Tue, 24 Jun 2008 18:43:35 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <486175DC.1000107@jhu.edu> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48611D29.505@jhu.edu> <763570460806240930n74ef2677l6e043803e3d2d213@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> Message-ID: <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> I agree. What you want should be easy to implement. It will definitely matter much more as we are able to get more hits returned from this service and others.
And this is the kind of case where if we have many fulltext hits and one matches an identifier from the original request, we could show that one and provide a link (or not) to other related fulltext versions. Jason
From rochkind at jhu.edu Tue Jun 24 18:50:14 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Tue, 24 Jun 2008 18:50:14 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48611D29.505@jhu.edu> <763570460806240930n74ef2677l6e043803e3d2d213@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> Message-ID: <48617A26.8090403@jhu.edu> I know you're pushing for xISBN Jason, really. I agree it's useful, although I don't think it's as vital as you do, but that's okay. But meanwhile some of my librarians who have been conducting actual interviews with users say one of the things they want most is a "search inside the book" link directly on the screen (OPAC and Find It). We talked before about how we could potentially do this with both GBS and IA after identifying the availability of search inside the book. So I'd really like to focus on that first, as some more 'low hanging fruit' with big improvement for relatively small effort. For IA, there's really only a good 'search inside the book' interface when they provide the 'flipbook' version. Can you tell from your XML results if a flipbook version is there? Can you figure out if there's a way to generate a URL into a flipbook with a particular query? Perhaps we should try asking Alexis, who has been helpful at solving our IA mysteries, if there's a way to do that. For GBS, the URL format to send a query into GBS is clear. And two of the three GBS format types promise search inside the book---the third, you can't tell if it'll be there or not, so I figure don't provide the box for those.
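A rough sketch of that GBS rule in Ruby, not the actual adaptor code. It assumes the three GBS format types are labelled `full`, `partial` and `noview`, that the first two are the ones promising search inside the book, and that a query goes into GBS as a `q` parameter next to the volume `id`. The method name, the labels and the URL pattern are all assumptions made for illustration.

```ruby
require 'cgi'

# Assumed labels for the two GBS format types that promise search inside.
SEARCHABLE_VIEWS = %w[full partial].freeze

# Hypothetical helper: returns a URL that runs `query` inside one GBS volume,
# or nil when we can't tell whether search inside is available, in which
# case no search box should be shown.
def gbs_search_inside_url(gbs_id, viewability, query)
  return nil unless SEARCHABLE_VIEWS.include?(viewability)
  # Assumed URL pattern: volume id plus the user's query.
  "http://books.google.com/books?id=#{CGI.escape(gbs_id)}&q=#{CGI.escape(query)}"
end
```

The view would only render the textbox when this returns a URL, so the third format type never gets a box.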
Actual view integration of that new type of Umlaut response is fairly straightforward---except it'll be a bit clunky, because the AJAX stuff goes and replaces the DIV this stuff is in every four seconds until bg processes are done. Which doesn't matter too much if all that's in the DIVs is links, but when there's a textbox in the div that the user could be typing in---bad. Have to think of a clever way around this. Jonathan
-- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rossfsinger at gmail.com Tue Jun 24 23:16:15 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Tue, 24 Jun 2008 23:16:15 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <48617A26.8090403@jhu.edu> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <486122D6.10503@jhu.edu> <763570460806240949m6ee7fe51wa0735d355e188401@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> <48617A26.8090403@jhu.edu> Message-ID: <23b83f160806242016s25932526ge9503cd0b571d84@mail.gmail.com> I'm really glad this conversation steered towards the philosophy of "link if it seems like there's something there" rather than "link to see if something's there". Jonathan, why do you think xISBN is such a low priority? I felt it was a *hugely* high priority in U1 because I assumed books would generally be defined by their ISBN (I mean, if we're talking OpenURL) and that the likelihood that I had that manifestation was low. Plus the xisbn gem made it dead simple. The 'search in this book' concept is pretty cool. Is GBS's coverage on par with Amazon's? Would it be worth taking the Amazon API, scraping their product page, and seeing if the "Search Inside this book!" icon is there? I mean, as a last resort after the other options? -Ross.
From rochkind at jhu.edu Wed Jun 25 10:07:55 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 10:07:55 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? Message-ID: <4862513B.5000006@jhu.edu> Here's an example of a false positive from the Internet Archive author/title keyword search: http://findit.library.jhu.edu/resolve?url_ver=Z39.88-2004&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&ctx_tim=2008-06-25T09%3A12%3A08-0400&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=&rfr_id=info%3Asid%2Fsfxit.com%3Acitation&rft.jtitle=world+economics&rft.genre=journal&rft.__citation_form=journal&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal http://findit.library.jhu.edu/go/421022 Note that while the words "world" and "economics" were in the title of the hit---it's a different work entirely. While some false positives are unavoidable with the author/title keyword technique, I'm thinking this one may suggest that we shouldn't do an IA lookup when there is no author information in the Referent. Searching on title alone will lead to a lot more false positives, which is what's happened here. I'm leaning toward changing the IA adaptor to only do a search if it has both author and title information. What do you think?
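A minimal sketch of the guard being proposed, assuming the adaptor can pull title and author out of the Referent as plain strings (or nil). The method name is hypothetical and this is not the current IA adaptor code; it only illustrates skipping the lookup when there is no author to search on.

```ruby
# Hypothetical guard for the IA adaptor: run the author/title keyword search
# only when both values are present, so a title-only Referent (like the
# "world economics" journal request) never triggers a lookup.
def ia_search_allowed?(title, author)
  # Treat nil, empty and whitespace-only strings as missing.
  present = ->(value) { !value.to_s.strip.empty? }
  present.call(title) && present.call(author)
end
```

With this in place the request above, which carries a title but no author, would skip IA entirely rather than risk a title-only false positive.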
-- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rossfsinger at gmail.com Wed Jun 25 10:10:48 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 25 Jun 2008 10:10:48 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <48624F67.1000701@jhu.edu> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48616056.5080604@jhu.edu> <763570460806241451y2b6e18cbr6fd05bd390539a13@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> <48617A26.8090403@jhu.edu> <23b83f160806242016s25932526ge9503cd0b571d84@mail.gmail.com> <48624F67.1000701@jhu.edu> Message-ID: <23b83f160806250710j5c20c2bamd303decc7f33aed7@mail.gmail.com> Redirecting this back to the list. We may want to change the mailing list options to make the list the reply-to address... I'm not saying the Amazon screen scraping is a priority. That was more of a question of, "how much is in GBS" and, if it's "not nearly as much as Amazon", we could put that in the development queue. Speaking of, do we want to formalize that a bit? Some kind of bug/feature tracking? I'm totally having a problem wrapping my head around creating new 'dummy' referents to store this complementary data. It's not that I'm opposed to this approach, I just don't understand it at all. That being said, I'm all about discussing how to include xISBN (or insert data enrichment service here) data consistently and coherently. -Ross. On Wed, Jun 25, 2008 at 10:00 AM, Jonathan Rochkind wrote: > Well, instead of arguing about how big a priority xISBN is, we could talk > about how to do it instead---by which I mean, where and how to store the > data for alternate versions. > > But first let me respond to the simpler question. 
It might be worth screen > scraping Amazon to see if there's a search-inside-the-book provided---if we > can also figure out a way to generate a direct link into search-inside for a > particular query. But I think we should focus on the more straightforward > ones first. > > As far as how to store the alternate manifestation stuff----after sleeping > on it, I'm still leaning toward creating a new Referent in the db, and > linking the original Referent to one or more new Referents of "other > versions". But now I don't have time to outline why I think this makes > sense---and what problems still exist. I'll try to write more on it today or > tomorrow though. > > Jonathan
We >>> talked >>> before about how we could potentially do this with both GBS and IA after >>> identifying the availability of search inside the book. So I'd really >>> like >>> to focus on that first, as some more 'low hanging fruit' with big >>> improvement for relatively small effort. >>> >>> For IA, there's really only a good 'search inside the book' interface >>> when >>> they provide the 'flipbook' version. Can you tell from your XML results >>> if a >>> flipbook version is there? Can you figure out if there's a way to >>> generate >>> a URL into a flipbook with a particular query? Perhaps we should try >>> asking >>> Alexis, who has been helpful at solving our IA mysteries, if there's a >>> way >>> to do that. For GBS, the URL format to send a query into GBS is clear. >>> And >>> two of the three GBS format types promise search inside the book---the >>> third, you cant' tell if it'll be there or not, so I figure don't provide >>> the box for those. >>> >>> Actual view integration of that new type of Umlaut response is fairly >>> straightforward---except it'll be a bit klunky, because the AJAX stuff >>> goes >>> and replaces the DIV this stuff is in every four seconds until bg >>> processes >>> are done. Which doesn't matter too much if all that's in the DIVs is >>> links, >>> but when there's a textbox in the div that the user could be typing >>> in---bad. Have to think of a clever way around this. >>> >>> Jonathan >>> >>> Jason Ronallo wrote: >>> >>>> >>>> I agree. What you want should be easy to implement. It will definitely >>>> matter much more as we are able to get more hits returned from this >>>> service and others. And this is the kind of case where if we have many >>>> fulltext hits and one matches an identifier from the original request, >>>> we could show that one and provide a link (or not) to other related >>>> fulltext versions. 
>>>> >>>> Jason >>>> >>>> On Tue, Jun 24, 2008 at 6:31 PM, Jonathan Rochkind wrote: >>>> >>>>> For IA, we basically show the one best link, then a "see also" link to see >>>>> all others. Is there any way to duplicate that with Google? I think that is >>>>> actually the best way to do it. >>>>> >>>>> But failing that, I think we only show the one best link (i.e., one with >>>>> actual full text search), yeah. I think the menu is starting to get >>>>> cluttered up with non-essential stuff, and giving them several (or a dozen!) >>>>> separate lines linking to different digitizations at Google (some more >>>>> complete than others), I think we want to avoid. >>>>> >>>>> Tell me if you disagree. >>>>> >>>>> Jonathan >>>>> >>>>> Jason Ronallo wrote: >>>>> >>>>>> On Tue, Jun 24, 2008 at 6:11 PM, Jonathan Rochkind wrote: >>>>>> >>>>>>> [Redirected to list in case Ross is interested] >>>>>>> >>>>>>> Jason Ronallo wrote: >>>>>>> >>>>>>>> One enhancement that I could add now would be a link to a search of >>>>>>>> GBS by author/title limiting to fulltext views. Would you be >>>>>>>> interested in me adding that? >>>>>>>> >>>>>>> >>>>>>> Possibly. Is that possible to do? I thought the GBS API only involved >>>>>>> identifiers? Would you screen-scrape? Or you'd just give a link to a search >>>>>>> in GBS native, without knowing if the search would find anything? Hmm, as a >>>>>>> general philosophy I don't like providing links on Umlaut unless they can >>>>>>> actually be pre-checked to make sure they go somewhere---this was sort of a >>>>>>> precedent Ross set. I think we work with the API GBS gives us, even if it's >>>>>>> not what we'd like. >>>>>>> >>>>>> >>>>>> I was suggesting a non-pre-checked link. I agree it wouldn't be great.
>>>>>> It could get folks to more fulltext, though. The real solution is >>>>>> getting more related identifiers to search with. After the IA screen >>>>>> scraper I'm not interested in writing another one if I can help it. >>>>>> >>>>>>> >>>>>>> Right now, it's possible for you to find multiple hits for GBS, even with >>>>>>> just identifier search, right? If you find multiple GBS hits, are they all >>>>>>> put on the screen? I'm thinking if a full-text version is found, there's no >>>>>>> need to link to other lesser versions in GBS. Although alternately, you could >>>>>>> link to "see all at GBS" analogous to Internet Archive---if that's possible >>>>>>> with GBS. What I don't like is providing a full text link AND an individual >>>>>>> link to a "more information at GBS". You aren't doing that, are you? >>>>>>> >>>>>> >>>>>> I ask GBS for all available identifiers. I can conduct a search with >>>>>> as many identifiers as I can fit into a URL, and GBS will respond with >>>>>> all that it knows about. Sometimes none; sometimes all of them. Right >>>>>> now I show all links that are not duplicates. I do deduplicate >>>>>> because if I'm asking for ISBN, OCLCnum and LCCN they will often all >>>>>> be associated with the same book. But sometimes you will have the ISBN >>>>>> and OCLCnum associated with different GBS ids. >>>>>> >>>>>> I can rewrite to only show side links (highlighted_link) if there is >>>>>> no fulltext view available. Easy enough.
>>>>>> >>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Jonathan Rochkind >>>>> Digital Services Software Engineer >>>>> The Sheridan Libraries >>>>> Johns Hopkins University >>>>> 410.516.8886 rochkind (at) jhu.edu >>>>> >>>>> >>>>> >>>>> >>> >>> -- >>> Jonathan Rochkind >>> Digital Services Software Engineer >>> The Sheridan Libraries >>> Johns Hopkins University >>> 410.516.8886 rochkind (at) jhu.edu >>> >>> _______________________________________________ >>> Umlaut-general mailing list >>> Umlaut-general at rubyforge.org >>> http://rubyforge.org/mailman/listinfo/umlaut-general >>> >>> > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > From jronallo at gmail.com Wed Jun 25 10:18:33 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 10:18:33 -0400 Subject: [Umlaut-general] mailing list settings Message-ID: <763570460806250718i2b458b45mf6f12437c6bf3c74@mail.gmail.com> Hi, I tried to get into the admin interface for the umlaut-general list and was unable to. Jonathan, I think you'll need to make the change or send me a/the admin password. I think you'll want to change this setting to "This list": "Where are replies to list messages directed? Poster is strongly recommended for most mailing lists." Jason From rochkind at jhu.edu Wed Jun 25 10:41:33 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 10:41:33 -0400 Subject: [Umlaut-general] mailing list settings In-Reply-To: <763570460806250718i2b458b45mf6f12437c6bf3c74@mail.gmail.com> References: <763570460806250718i2b458b45mf6f12437c6bf3c74@mail.gmail.com> Message-ID: <4862591D.70602@jhu.edu> I'm having trouble figuring out how to do this. Where do I get to that screen with those settings? I have no idea what the admin password is. Hmm. Jonathan Jason Ronallo wrote: > Hi, > I tried to get into the admin interface for the umlaut-general list > and was unable to. 
> > Jonathan, I think you'll need to make the change or send me a/the > admin password. > > I think you'll want to change this setting to "This list": > "Where are replies to list messages directed? Poster is strongly > recommended for most mailing lists." > > Jason > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Wed Jun 25 10:47:56 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 10:47:56 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <4862513B.5000006@jhu.edu> References: <4862513B.5000006@jhu.edu> Message-ID: <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> Yes, definite problem. There are two possibilities to correct this one: 1. As you suggest, don't search IA unless there is both title and author. This seems reasonable and easiest to do. How often would an incoming OpenURL for a book not have both author and title? Can't imagine that it's too often. 2. Don't search IA if there is a jtitle. How often do they have any real journal coverage? Currently metadata_helper doesn't use jtitle in any way but can return an atitle. So: first, if there is an atitle, use that; elsif there is a jtitle, return nil for the title. Hmmmm. The real fix might also belong somewhere else. From your COinS on the page it looks like "World economics" is the value of both rft.jtitle and rft.title. If it were just in jtitle then metadata_helper would find no appropriate title and not search since there is no atitle. What may be happening is that it filters down through to the last elsif and selects the rft.title.
So the fix might be in metadata_helper and/or in however you create the referent in the first place. Any reason to have that rft.title in there? Jason On Wed, Jun 25, 2008 at 10:07 AM, Jonathan Rochkind wrote: > Here's an example of a false positive from the Internet Archive author/title > keyword search: > > http://findit.library.jhu.edu/resolve?url_ver=Z39.88-2004&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&ctx_tim=2008-06-25T09%3A12%3A08-0400&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=&rfr_id=info%3Asid%2Fsfxit.com%3Acitation&rft.jtitle=world+economics&rft.genre=journal&rft.__citation_form=journal&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal > > http://findit.library.jhu.edu/go/421022 > > Note that while the words "world" and "economics" were in the title of the > hit---it's a different work entirely. While some false positives are > unavoidable with the author/title keyword technique, I'm thinking this one > may suggest that we shouldn't do an IA lookup when there is no author > information in the Referent. Searching on title alone will lead to a lot > more false positives, which is what's happened here. I'm leaning toward > changing the IA adaptor to only do a search if it has both author and title > information. > > What do you think?
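The two fixes discussed above (skip the IA search unless both an author and a title are known, and never fall back to rft.title when a jtitle is present) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Umlaut's actual metadata_helper: the method names `presence`, `search_title`, `search_author` and `search_internet_archive?` are hypothetical, and the referent metadata is assumed to be a plain Hash keyed by OpenURL field names without the `rft.` prefix.

```ruby
# Hypothetical sketch; none of these methods exist in Umlaut under these names.

# Return a stripped string, or nil when the value is missing or blank.
def presence(value)
  text = value.to_s.strip
  text.empty? ? nil : text
end

# Pick the title to search on. An atitle wins; a jtitle on its own means
# this is a journal-level citation, so no title is returned and the
# generic rft.title fallback is never reached.
def search_title(metadata)
  atitle = presence(metadata['atitle'])
  return atitle if atitle
  return nil if presence(metadata['jtitle'])

  presence(metadata['btitle']) || presence(metadata['title'])
end

# Pick an author to search on (full name first, then last name).
def search_author(metadata)
  presence(metadata['au']) || presence(metadata['aulast'])
end

# Only query Internet Archive when both a usable title and an author exist.
def search_internet_archive?(metadata)
  !search_title(metadata).nil? && !search_author(metadata).nil?
end
```

With the citation from the example URL, where jtitle and title both hold "world economics" and there is no author, `search_internet_archive?` would come back false, so the keyword search that produced the false positive would not run.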
> > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Wed Jun 25 10:50:52 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 10:50:52 -0400 Subject: [Umlaut-general] mailing list settings In-Reply-To: <4862591D.70602@jhu.edu> References: <763570460806250718i2b458b45mf6f12437c6bf3c74@mail.gmail.com> <4862591D.70602@jhu.edu> Message-ID: <48625B4C.1000402@jhu.edu> Actually, I figured it out, and I don't have the admin password, have to file a support request on rubyforge. Jonathan Jonathan Rochkind wrote: > I'm having trouble figuring out how to do this. Where do I get to that > screen with those settings? > > I have no idea what the admin password is. Hmm. > > Jonathan > > Jason Ronallo wrote: >> Hi, >> I tried to get into the admin interface for the umlaut-general list >> and was unable to. >> >> Jonathan, I think you'll need to make the change or send me a/the >> admin password. >> >> I think you'll want to change this setting to "This list": >> "Where are replies to list messages directed? Poster is strongly >> recommended for most mailing lists." 
>> >> Jason >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Wed Jun 25 10:51:06 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 10:51:06 -0400 Subject: [Umlaut-general] mailing list settings In-Reply-To: <4862591D.70602@jhu.edu> References: <763570460806250718i2b458b45mf6f12437c6bf3c74@mail.gmail.com> <4862591D.70602@jhu.edu> Message-ID: <763570460806250751m18a545ccjf39887a1e7b823d6@mail.gmail.com> Go here: http://rubyforge.org/mail/?group_id=4382 Click: Admin (next to Lists) Next to umlaut-general click: administrate If you didn't change your mailman administrator password search your email for: "Your mailing list password is" from mailman-owner at rubyforge.org HTH, Jason On Wed, Jun 25, 2008 at 10:41 AM, Jonathan Rochkind wrote: > I'm having trouble figuring out how to do this. Where do I get to that > screen with those settings? > > I have no idea what the admin password is. Hmm. > > Jonathan > > Jason Ronallo wrote: >> >> Hi, >> I tried to get into the admin interface for the umlaut-general list >> and was unable to. >> >> Jonathan, I think you'll need to make the change or send me a/the >> admin password. >> >> I think you'll want to change this setting to "This list": >> "Where are replies to list messages directed? Poster is strongly >> recommended for most mailing lists." 
>> >> Jason >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From jronallo at gmail.com Wed Jun 25 11:09:50 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 11:09:50 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <48625D36.7050807@jhu.edu> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> Message-ID: <763570460806250809j6801953ckb0d652ab2656de9d@mail.gmail.com> > I don't really know what's going on with the rft.title as you suggest. It > could be that whatever source generated the openurl supplied the journal > title as an rft.title. We don't really have control over what sources send > us. By all means I'm happy to have this stay in production. :) OK, so maybe not related to this problem but it does seem to be another error. The original link you gave does not have an rft.title (I don't see one), but your COinS does. And wherever it is inserting an rft.title contributed to this problem. So I think this is still a problem within Umlaut. Might this have other unintended consequences in other places as well? Jason From rossfsinger at gmail.com Wed Jun 25 11:41:54 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 25 Jun 2008 11:41:54 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me?
In-Reply-To: <486257FC.5030503@jhu.edu> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> <48617A26.8090403@jhu.edu> <23b83f160806242016s25932526ge9503cd0b571d84@mail.gmail.com> <48624F67.1000701@jhu.edu> <23b83f160806250710j5c20c2bamd303decc7f33aed7@mail.gmail.com> <486257FC.5030503@jhu.edu> Message-ID: <23b83f160806250841h3cfc7874n84c672b79acb04d1@mail.gmail.com> Redirecting this back to the list *again*. I don't really care, honestly. If the wiki works, there you go. I guess I thought if we had something to track and document the fixing of bugs, we would also have a system to handle dependencies and blah blah blah. Since Rubyforge is G-Forge (I think, right?), I assumed that we could use what's there (although, admittedly, it might suck). Ultimately, it doesn't matter much. I'm going to look into the mailing list options *now*, though. -Ross. On Wed, Jun 25, 2008 at 10:36 AM, Jonathan Rochkind wrote: > There is a laundry list of feature wishlist here: > > http://wiki.code4lib.org/index.php/Umlaut_wishlist > > But feel free to start a list in trac or whatever other tools are on > rubyforge or somewhere else, if you feel it would be valuable. Myself, I'm > happy with the laundry list on the wiki, but understand if someone wants > something more organized. > > I'll try to write more about the Referent thing soon. > > Jonathan > > Ross Singer wrote: >> >> Redirecting this back to the list. We may want to change the mailing >> list options to make the list the reply-to address... >> >> I'm not saying the Amazon screen scraping is a priority. That was >> more of a question of, "how much is in GBS" and, if it's "not nearly >> as much as Amazon", we could put that in the development queue. >> >> Speaking of, do we want to formalize that a bit? 
Some kind of >> bug/feature tracking? >> >> I'm totally having a problem wrapping my head around creating new >> 'dummy' referents to store this complementary data. It's not that I'm >> opposed to this approach, I just don't understand it at all. >> >> That being said, I'm all about discussing how to include xISBN (or >> insert data enrichment service here) data consistently and coherently. >> >> -Ross. >> >> On Wed, Jun 25, 2008 at 10:00 AM, Jonathan Rochkind >> wrote: >> >>> >>> Well, instead of arguing about how big a priority xISBN is, we could talk >>> about how to do it instead---by which I mean, where and how to store the >>> data for alternate versions. >>> >>> But first let me respond to the simpler question. It might be worth >>> screen >>> scraping Amazon to see if there's a search-inside-the-book provided---if >>> we >>> can also figure out a way to generate a direct link into search-inside >>> for a >>> particular query. But I think we should focus on the more >>> straightforward >>> ones first. >>> >>> As far as how to store the alternate manifestation stuff----after >>> sleeping >>> on it, I'm still leaning toward creating a new Referent in the db, and >>> linking the original Referent to one or more new Referents of "other >>> versions". But now I don't have time to outline why I think this makes >>> sense---and what problems still exist. I'll try to write more on it today >>> or >>> tommorow though. >>> >>> Jonathan >>> >>> Ross Singer wrote: >>> >>>> >>>> I'm really glad this conversation steered towards the philosophy of >>>> "link if it seems like there's something there" rather than "link to >>>> see if something's there". >>>> >>>> Jonathan, why do you think xISBN is such a low priority? I felt it >>>> was a *hugely* high priority in U1 because I assumed books would >>>> generally be defined by their ISBN (I mean, if we're talking OpenURL) >>>> and that the likelihood that I had that manifestation was low. 
Plus >>>> the xisbn gem made it dead simple. >>>> >>>> The 'search in this book' concept is pretty cool. Is GBS's coverage >>>> on par with Amazon's? Would it be worth taking the Amazon API, >>>> scraping their product page, and seeing the "Search Inside this book!" >>>> icon is there? I mean, as a last resort after the other options? >>>> >>>> -Ross. >>>> >>>> On Tue, Jun 24, 2008 at 6:50 PM, Jonathan Rochkind >>>> wrote: >>>> >>>> >>>>> >>>>> I know you're pushing for xISBN Jason, really. I agree it's useful, >>>>> although I don't think it's as vital as you do, but that's okay. >>>>> >>>>> But meanwhile some of my librarians who have been conducting actual >>>>> interviews with users say one of the things they want most is a "search >>>>> inside the book" link directly on the screen (OPAC and Find It). We >>>>> talked >>>>> before about how we could potentially do this with both GBS and IA >>>>> after >>>>> identifying the availability of search inside the book. So I'd really >>>>> like >>>>> to focus on that first, as some more 'low hanging fruit' with big >>>>> improvement for relatively small effort. >>>>> >>>>> For IA, there's really only a good 'search inside the book' interface >>>>> when >>>>> they provide the 'flipbook' version. Can you tell from your XML results >>>>> if a >>>>> flipbook version is there? Can you figure out if there's a way to >>>>> generate >>>>> a URL into a flipbook with a particular query? Perhaps we should try >>>>> asking >>>>> Alexis, who has been helpful at solving our IA mysteries, if there's a >>>>> way >>>>> to do that. For GBS, the URL format to send a query into GBS is clear. >>>>> And >>>>> two of the three GBS format types promise search inside the book---the >>>>> third, you cant' tell if it'll be there or not, so I figure don't >>>>> provide >>>>> the box for those. 
>>>>> >>>>> Actual view integration of that new type of Umlaut response is fairly >>>>> straightforward---except it'll be a bit klunky, because the AJAX stuff >>>>> goes >>>>> and replaces the DIV this stuff is in every four seconds until bg >>>>> processes >>>>> are done. Which doesn't matter too much if all that's in the DIVs is >>>>> links, >>>>> but when there's a textbox in the div that the user could be typing >>>>> in---bad. Have to think of a clever way around this. >>>>> >>>>> Jonathan >>>>> >>>>> Jason Ronallo wrote: >>>>> >>>>> >>>>>> >>>>>> I agree. What you want should be easy to implement. It will definitely >>>>>> matter much more as we are able to get more hits returned from this >>>>>> service and others. And this is the kind of case where if we have many >>>>>> fulltext hits and one matches an identifier from the original request, >>>>>> we could show that one and provide a link (or not) to other related >>>>>> fulltext versions. >>>>>> >>>>>> Jason >>>>>> >>>>>> On Tue, Jun 24, 2008 at 6:31 PM, Jonathan Rochkind >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> For IA, we basically show the one best link, then a "see also" link >>>>>>> to >>>>>>> see >>>>>>> all others. Is there any way to duplicate that with Google? I think >>>>>>> that >>>>>>> is >>>>>>> actually the best way to do it. >>>>>>> >>>>>>> But failing that, I think we only show the one best link (ie, one >>>>>>> with >>>>>>> actual full text search), yeah. I think the menu is starting to get >>>>>>> cluttered up with non-essential stuff, and giving them several (or a >>>>>>> dozen!) >>>>>>> seperate lines linking to different digitizations at google (some >>>>>>> more >>>>>>> complete than others), I think we want to avoid. >>>>>>> >>>>>>> Tell me if you disagree. 
>>>>>>> >>>>>>> Jonathan >>>>>>> >>>>>>> Jason Ronallo wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 24, 2008 at 6:11 PM, Jonathan Rochkind >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> [Redirected to list in case Ross is interested] >>>>>>>>> >>>>>>>>> Jason Ronallo wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> One enhancement that I could add now would be a link to a search >>>>>>>>>> of >>>>>>>>>> GBS by author/title limiting to fulltext views. Would you be >>>>>>>>>> interested in me adding that? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Possibly. Is that possible to do? I thought the GBS API only >>>>>>>>> involved >>>>>>>>> identifiers? Would you screen-scrape? Or you'd just give a link to >>>>>>>>> a >>>>>>>>> search >>>>>>>>> in GBS native, without knowing if the search would find anything? >>>>>>>>> Hmm, >>>>>>>>> as a >>>>>>>>> general philosophy I don't like providing links on Umlaut unless >>>>>>>>> they >>>>>>>>> can >>>>>>>>> actually be pre-checked to make sure they go somewhere---this was >>>>>>>>> sort >>>>>>>>> of >>>>>>>>> a >>>>>>>>> precedent Ross set. I think we work with the API GPS gives us, >>>>>>>>> even >>>>>>>>> if >>>>>>>>> it's >>>>>>>>> not what we'd like. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> I was suggesting a non-pre-checked link. I agree it wouldn't be >>>>>>>> great. >>>>>>>> It could get folks to more fulltext, though. The real solution is >>>>>>>> getting more related identifiers to search with. After the IA screen >>>>>>>> scraper I'm not interested in writing another one if I can help it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Right now, it's possible for you to find multiple hits for GBS, >>>>>>>>> even >>>>>>>>> with >>>>>>>>> just identifier search, right? If you find multiple GBS hits, are >>>>>>>>> they >>>>>>>>> all >>>>>>>>> put on the screen? 
I'm thinking of a full-text version is found, >>>>>>>>> there's >>>>>>>>> no >>>>>>>>> need to link to other lesser versions in GBS. Altough alternately, >>>>>>>>> you >>>>>>>>> could >>>>>>>>> link to "see all at GBS" analagous to Internet Archive---if that's >>>>>>>>> possible >>>>>>>>> with GBS. What I don't like is providing a full text link AND an >>>>>>>>> individual >>>>>>>>> link to a "more information at GBS". You aren't doing that, are >>>>>>>>> you? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> I ask GBS for all available identifiers. I can conduct a search with >>>>>>>> as many identifiers as I can fit into a URL, and GBS will respond >>>>>>>> with >>>>>>>> all that it knows about. Sometimes none; sometimes all of them. >>>>>>>> Right >>>>>>>> now I show all links that are not duplicates. I do deduplicate >>>>>>>> because if I'm asking for ISBN, OCLCnum and LCCN they will often all >>>>>>>> be associated with the same book. But sometimes you will have the >>>>>>>> ISBN >>>>>>>> and OCLCnum associated with different GBS ids. >>>>>>>> >>>>>>>> I can rewrite to only show side links (highlighted_link) if there is >>>>>>>> no fulltext view available. Easy enough. 
>>>>>>>> >>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jonathan Rochkind >>>>>>> Digital Services Software Engineer >>>>>>> The Sheridan Libraries >>>>>>> Johns Hopkins University >>>>>>> 410.516.8886 rochkind (at) jhu.edu >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> Jonathan Rochkind >>>>> Digital Services Software Engineer >>>>> The Sheridan Libraries >>>>> Johns Hopkins University >>>>> 410.516.8886 rochkind (at) jhu.edu >>>>> >>>>> _______________________________________________ >>>>> Umlaut-general mailing list >>>>> Umlaut-general at rubyforge.org >>>>> http://rubyforge.org/mailman/listinfo/umlaut-general >>>>> >>>>> >>>>> >>> >>> -- >>> Jonathan Rochkind >>> Digital Services Software Engineer >>> The Sheridan Libraries >>> Johns Hopkins University >>> 410.516.8886 rochkind (at) jhu.edu >>> >>> >>> >> >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > From rossfsinger at gmail.com Wed Jun 25 12:04:17 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 25 Jun 2008 12:04:17 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <763570460806250809j6801953ckb0d652ab2656de9d@mail.gmail.com> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250809j6801953ckb0d652ab2656de9d@mail.gmail.com> Message-ID: <23b83f160806250904v2f5cf01ajaaa8a11ce97e03cd@mail.gmail.com> rft.title is deprecated in Z39.88 (it's an OpenURL 0.1 legacy), but often kept for OpenURL 0.1 and hybrid OpenURLs. I have no objection to ditching it in favor of the more specific title key. 
I don't know if this might cause problems with more haphazard OpenURL resolvers in the wild, though. -Ross. On Wed, Jun 25, 2008 at 11:09 AM, Jason Ronallo wrote: >> I don't really know what's going on with the rft.title as you suggest. It >> could be that whatever source generated the openurl supplied the journal >> title as an rft.title. We don't really have control over what sources send >> us. > > By all means I'm happy to have this stay in production. :) > > OK, so maybe not related to this problem but it does seem to be > another error. The original link you gave does not have an rft.title > (I don't see one), but your COinS does. And wherever it is inserting > an rft.title contributed to this problem. So I think this is still a > problem within Umlaut. Might this have other unintended consequences > in other places as well? > > Jason > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Wed Jun 25 12:10:04 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 12:10:04 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> Message-ID: <48626DDC.7070605@jhu.edu> I'm confused about why you're bringing the COinS into the conversation. Let's ignore the COinS; it's got nothing to do with this, does it? Are you saying that an rft.title has appeared in the Referent, even though it was not in the original OpenURL? If so, that may be worth looking into. I don't see what the COinS has got to do with it.
Perhaps there's a bug or an anomaly in the COinS generating code or something, who knows, it's got nothing to do with InternetArchive service, does it? Jonathan Jason Ronallo wrote: >> I don't really know what's going on with the rft.title as you suggest. It >> could be that whatever source generated the openurl supplied the journal >> title as an rft.title. We don't really have control over what sources send >> us. >> > > By all means I'm happy to have this stay in production. :) > > OK, so maybe not related to this problem but it does seem to be > another error. The original link you gave does not have an rft.title > (I don't see one), but your COiNS does. And wherever it is inserting > an rft.title contributed to this problem. This might have other > unintended consquences other places as well? > > Jason > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Wed Jun 25 12:14:27 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 12:14:27 -0400 Subject: [Umlaut-general] enhancing metadata OR are my eyes failing me? In-Reply-To: <23b83f160806250841h3cfc7874n84c672b79acb04d1@mail.gmail.com> References: <763570460806211910j14bcf78j6de7fe2ad470baeb@mail.gmail.com> <48617121.5090000@jhu.edu> <763570460806241527n46da7856laf160d76ca2a0857@mail.gmail.com> <486175DC.1000107@jhu.edu> <763570460806241543m2e51d99at5fa78b84d73fbab4@mail.gmail.com> <48617A26.8090403@jhu.edu> <23b83f160806242016s25932526ge9503cd0b571d84@mail.gmail.com> <48624F67.1000701@jhu.edu> <23b83f160806250710j5c20c2bamd303decc7f33aed7@mail.gmail.com> <486257FC.5030503@jhu.edu> <23b83f160806250841h3cfc7874n84c672b79acb04d1@mail.gmail.com> Message-ID: <48626EE3.7070608@jhu.edu> I'm afraid I've lost the listserv admin password. I've filed a ticket on rubyforge to have them hopefully recover it for me. Jonathan Ross Singer wrote: > Redirecting this back to the list *again*. 
> > I don't really care, honestly. If the wiki works, there you go. I > guess I thought if we had something to track and document the fixing > of bugs, we would also have a system to handle dependencies and blah > blah blah. > > Since Rubyforge is G-Forge (I think, right?), I assumed that we could > use what's there (although, admittedly, it might suck). > > Ultimately, it doesn't matter much. I'm going to look into the > mailing list options *now*, though. > > -Ross. > > On Wed, Jun 25, 2008 at 10:36 AM, Jonathan Rochkind wrote: > >> There is a laundry list of feature wishlist here: >> >> http://wiki.code4lib.org/index.php/Umlaut_wishlist >> >> But feel free to start a list in trac or whatever other tools are on >> rubyforge or somewhere else, if you feel it would be valuable. Myself, I'm >> happy with the laundry list on the wiki, but understand if someone wants >> something more organized. >> >> I'll try to write more about the Referent thing soon. >> >> Jonathan >> >> Ross Singer wrote: >> >>> Redirecting this back to the list. We may want to change the mailing >>> list options to make the list the reply-to address... >>> >>> I'm not saying the Amazon screen scraping is a priority. That was >>> more of a question of, "how much is in GBS" and, if it's "not nearly >>> as much as Amazon", we could put that in the development queue. >>> >>> Speaking of, do we want to formalize that a bit? Some kind of >>> bug/feature tracking? >>> >>> I'm totally having a problem wrapping my head around creating new >>> 'dummy' referents to store this complementary data. It's not that I'm >>> opposed to this approach, I just don't understand it at all. >>> >>> That being said, I'm all about discussing how to include xISBN (or >>> insert data enrichment service here) data consistently and coherently. >>> >>> -Ross. 
>>> >>> On Wed, Jun 25, 2008 at 10:00 AM, Jonathan Rochkind >>> wrote: >>> >>> >>>> Well, instead of arguing about how big a priority xISBN is, we could talk >>>> about how to do it instead---by which I mean, where and how to store the >>>> data for alternate versions. >>>> >>>> But first let me respond to the simpler question. It might be worth >>>> screen >>>> scraping Amazon to see if there's a search-inside-the-book provided---if >>>> we >>>> can also figure out a way to generate a direct link into search-inside >>>> for a >>>> particular query. But I think we should focus on the more >>>> straightforward >>>> ones first. >>>> >>>> As far as how to store the alternate manifestation stuff----after >>>> sleeping >>>> on it, I'm still leaning toward creating a new Referent in the db, and >>>> linking the original Referent to one or more new Referents of "other >>>> versions". But now I don't have time to outline why I think this makes >>>> sense---and what problems still exist. I'll try to write more on it today >>>> or >>>> tommorow though. >>>> >>>> Jonathan >>>> >>>> Ross Singer wrote: >>>> >>>> >>>>> I'm really glad this conversation steered towards the philosophy of >>>>> "link if it seems like there's something there" rather than "link to >>>>> see if something's there". >>>>> >>>>> Jonathan, why do you think xISBN is such a low priority? I felt it >>>>> was a *hugely* high priority in U1 because I assumed books would >>>>> generally be defined by their ISBN (I mean, if we're talking OpenURL) >>>>> and that the likelihood that I had that manifestation was low. Plus >>>>> the xisbn gem made it dead simple. >>>>> >>>>> The 'search in this book' concept is pretty cool. Is GBS's coverage >>>>> on par with Amazon's? Would it be worth taking the Amazon API, >>>>> scraping their product page, and seeing the "Search Inside this book!" >>>>> icon is there? I mean, as a last resort after the other options? >>>>> >>>>> -Ross. 
>>>>> >>>>> On Tue, Jun 24, 2008 at 6:50 PM, Jonathan Rochkind >>>>> wrote: >>>>> >>>>> >>>>> >>>>>> I know you're pushing for xISBN Jason, really. I agree it's useful, >>>>>> although I don't think it's as vital as you do, but that's okay. >>>>>> >>>>>> But meanwhile some of my librarians who have been conducting actual >>>>>> interviews with users say one of the things they want most is a "search >>>>>> inside the book" link directly on the screen (OPAC and Find It). We >>>>>> talked >>>>>> before about how we could potentially do this with both GBS and IA >>>>>> after >>>>>> identifying the availability of search inside the book. So I'd really >>>>>> like >>>>>> to focus on that first, as some more 'low hanging fruit' with big >>>>>> improvement for relatively small effort. >>>>>> >>>>>> For IA, there's really only a good 'search inside the book' interface >>>>>> when >>>>>> they provide the 'flipbook' version. Can you tell from your XML results >>>>>> if a >>>>>> flipbook version is there? Can you figure out if there's a way to >>>>>> generate >>>>>> a URL into a flipbook with a particular query? Perhaps we should try >>>>>> asking >>>>>> Alexis, who has been helpful at solving our IA mysteries, if there's a >>>>>> way >>>>>> to do that. For GBS, the URL format to send a query into GBS is clear. >>>>>> And >>>>>> two of the three GBS format types promise search inside the book---the >>>>>> third, you cant' tell if it'll be there or not, so I figure don't >>>>>> provide >>>>>> the box for those. >>>>>> >>>>>> Actual view integration of that new type of Umlaut response is fairly >>>>>> straightforward---except it'll be a bit klunky, because the AJAX stuff >>>>>> goes >>>>>> and replaces the DIV this stuff is in every four seconds until bg >>>>>> processes >>>>>> are done. Which doesn't matter too much if all that's in the DIVs is >>>>>> links, >>>>>> but when there's a textbox in the div that the user could be typing >>>>>> in---bad. 
Have to think of a clever way around this. >>>>>> >>>>>> Jonathan >>>>>> >>>>>> Jason Ronallo wrote: >>>>>> >>>>>> >>>>>> >>>>>>> I agree. What you want should be easy to implement. It will definitely >>>>>>> matter much more as we are able to get more hits returned from this >>>>>>> service and others. And this is the kind of case where if we have many >>>>>>> fulltext hits and one matches an identifier from the original request, >>>>>>> we could show that one and provide a link (or not) to other related >>>>>>> fulltext versions. >>>>>>> >>>>>>> Jason >>>>>>> >>>>>>> On Tue, Jun 24, 2008 at 6:31 PM, Jonathan Rochkind >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> For IA, we basically show the one best link, then a "see also" link >>>>>>>> to >>>>>>>> see >>>>>>>> all others. Is there any way to duplicate that with Google? I think >>>>>>>> that >>>>>>>> is >>>>>>>> actually the best way to do it. >>>>>>>> >>>>>>>> But failing that, I think we only show the one best link (ie, one >>>>>>>> with >>>>>>>> actual full text search), yeah. I think the menu is starting to get >>>>>>>> cluttered up with non-essential stuff, and giving them several (or a >>>>>>>> dozen!) >>>>>>>> seperate lines linking to different digitizations at google (some >>>>>>>> more >>>>>>>> complete than others), I think we want to avoid. >>>>>>>> >>>>>>>> Tell me if you disagree. >>>>>>>> >>>>>>>> Jonathan >>>>>>>> >>>>>>>> Jason Ronallo wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Tue, Jun 24, 2008 at 6:11 PM, Jonathan Rochkind >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> [Redirected to list in case Ross is interested] >>>>>>>>>> >>>>>>>>>> Jason Ronallo wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> One enhancement that I could add now would be a link to a search >>>>>>>>>>> of >>>>>>>>>>> GBS by author/title limiting to fulltext views. Would you be >>>>>>>>>>> interested in me adding that? 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Possibly. Is that possible to do? I thought the GBS API only >>>>>>>>>> involved >>>>>>>>>> identifiers? Would you screen-scrape? Or you'd just give a link to >>>>>>>>>> a >>>>>>>>>> search >>>>>>>>>> in GBS native, without knowing if the search would find anything? >>>>>>>>>> Hmm, >>>>>>>>>> as a >>>>>>>>>> general philosophy I don't like providing links on Umlaut unless >>>>>>>>>> they >>>>>>>>>> can >>>>>>>>>> actually be pre-checked to make sure they go somewhere---this was >>>>>>>>>> sort >>>>>>>>>> of >>>>>>>>>> a >>>>>>>>>> precedent Ross set. I think we work with the API GPS gives us, >>>>>>>>>> even >>>>>>>>>> if >>>>>>>>>> it's >>>>>>>>>> not what we'd like. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> I was suggesting a non-pre-checked link. I agree it wouldn't be >>>>>>>>> great. >>>>>>>>> It could get folks to more fulltext, though. The real solution is >>>>>>>>> getting more related identifiers to search with. After the IA screen >>>>>>>>> scraper I'm not interested in writing another one if I can help it. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Right now, it's possible for you to find multiple hits for GBS, >>>>>>>>>> even >>>>>>>>>> with >>>>>>>>>> just identifier search, right? If you find multiple GBS hits, are >>>>>>>>>> they >>>>>>>>>> all >>>>>>>>>> put on the screen? I'm thinking of a full-text version is found, >>>>>>>>>> there's >>>>>>>>>> no >>>>>>>>>> need to link to other lesser versions in GBS. Altough alternately, >>>>>>>>>> you >>>>>>>>>> could >>>>>>>>>> link to "see all at GBS" analagous to Internet Archive---if that's >>>>>>>>>> possible >>>>>>>>>> with GBS. What I don't like is providing a full text link AND an >>>>>>>>>> individual >>>>>>>>>> link to a "more information at GBS". You aren't doing that, are >>>>>>>>>> you? 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> I ask GBS for all available identifiers. I can conduct a search with >>>>>>>>> as many identifiers as I can fit into a URL, and GBS will respond >>>>>>>>> with >>>>>>>>> all that it knows about. Sometimes none; sometimes all of them. >>>>>>>>> Right >>>>>>>>> now I show all links that are not duplicates. I do deduplicate >>>>>>>>> because if I'm asking for ISBN, OCLCnum and LCCN they will often all >>>>>>>>> be associated with the same book. But sometimes you will have the >>>>>>>>> ISBN >>>>>>>>> and OCLCnum associated with different GBS ids. >>>>>>>>> >>>>>>>>> I can rewrite to only show side links (highlighted_link) if there is >>>>>>>>> no fulltext view available. Easy enough. >>>>>>>>> >>>>>>>>> Jason >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> Jonathan Rochkind >>>>>>>> Digital Services Software Engineer >>>>>>>> The Sheridan Libraries >>>>>>>> Johns Hopkins University >>>>>>>> 410.516.8886 rochkind (at) jhu.edu >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> -- >>>>>> Jonathan Rochkind >>>>>> Digital Services Software Engineer >>>>>> The Sheridan Libraries >>>>>> Johns Hopkins University >>>>>> 410.516.8886 rochkind (at) jhu.edu >>>>>> >>>>>> _______________________________________________ >>>>>> Umlaut-general mailing list >>>>>> Umlaut-general at rubyforge.org >>>>>> http://rubyforge.org/mailman/listinfo/umlaut-general >>>>>> >>>>>> >>>>>> >>>>>> >>>> -- >>>> Jonathan Rochkind >>>> Digital Services Software Engineer >>>> The Sheridan Libraries >>>> Johns Hopkins University >>>> 410.516.8886 rochkind (at) jhu.edu >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Umlaut-general mailing list >>> Umlaut-general at rubyforge.org >>> http://rubyforge.org/mailman/listinfo/umlaut-general >>> >>> >> -- >> Jonathan Rochkind >> Digital Services Software Engineer >> The Sheridan Libraries >> Johns Hopkins University >> 410.516.8886 
rochkind (at) jhu.edu >> >> >> > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Wed Jun 25 12:26:07 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 12:26:07 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <48626DDC.7070605@jhu.edu> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> <48626DDC.7070605@jhu.edu> Message-ID: <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> The COiNS was just a clue that an rft.title was getting inserted in there somewhere, but since I can't test it right now it was just a hunch. Yes, from what I can tell the Referent gained an rft.title some place. The problem you found would not have surfaced if there was no rft.title. I didn't know that Referents with a more exact title like rft.jtitle in the original Referent could pick up an rft.title. If this is happening other places it becomes more difficult to write the kind of logic found in metadata_helper that is looking for the best title for a particular purpose. Not insurmountable, but something I hadn't considered. I'd like to try to discover where that is happening (just my curiosity at this point). If it is as part of COiNS generation, then Ross has given a good reason why it ought to be in the COiNS (and maybe other places), but not necessarily that it ought to become part of the original referent. If the original referent is v.1.0 why enhance it with a v.0.1 key? I'm just trying to understand what's going on here and why. 
Jason On Wed, Jun 25, 2008 at 12:10 PM, Jonathan Rochkind wrote: > I'm confused about why you're putting the COinS into the conversation? > Let's ignore the COinS, it's got nothing to do with this, does it? > > Are you saying that an rft.title has appeared in the Referent, even though > it was not in the original OpenURL? If so, that may be worth looking into. I > don't see what the COinS has got to do with it. Perhaps there's a bug or an > anomaly in the COinS generating code or something, who knows, it's got > nothing to do with InternetArchive service, does it? > > Jonathan > > Jason Ronallo wrote: >>> >>> I don't really know what's going on with the rft.title as you suggest. It >>> could be that whatever source generated the openurl supplied the journal >>> title as an rft.title. We don't really have control over what sources >>> send >>> us. >>> >> >> By all means I'm happy to have this stay in production. :) >> >> OK, so maybe not related to this problem but it does seem to be >> another error. The original link you gave does not have an rft.title >> (I don't see one), but your COiNS does. And wherever it is inserting >> an rft.title contributed to this problem. This might have other >> unintended consquences other places as well? >> >> Jason >> > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rossfsinger at gmail.com Wed Jun 25 12:40:08 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 25 Jun 2008 12:40:08 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? 
In-Reply-To: <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> <48626DDC.7070605@jhu.edu> <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> Message-ID: <23b83f160806250940x4096acd2j3b5d4b4673339219@mail.gmail.com> On Wed, Jun 25, 2008 at 12:26 PM, Jason Ronallo wrote: > If the original referent is v.1.0 why enhance it with a v.0.1 key? Because you can't assume targets are written well. In the case of resolver services that pre-date 1.0 and have been updated to support 1.0, it's possible they still only support .title. There's a tension in OpenURL between interoperability and metadata integrity; given the primary use case of OpenURL, I'd say interoperability should win the day. -Ross. From rochkind at jhu.edu Wed Jun 25 13:07:25 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 13:07:25 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> <48626DDC.7070605@jhu.edu> <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> Message-ID: <48627B4D.6080702@jhu.edu> Makes sense. I'm not sure where that's coming from either. Maybe some logic that adds the default rft.title should be rethought. However, either way, our services need to deal with a very broad domain of data, including bad data. An incoming openurl very well _could_ have both an rft.title and an rft.jtitle, so that can't be allowed to mess everything up when it does happen.
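A rough illustration of that kind of tolerance, as a minimal Ruby sketch: pick a search title when both keys may be present, preferring the more specific key and never returning an empty string. This is not Umlaut's actual MetadataHelper code; the method name pick_search_title, the key order, and the plain Hash input (keys written without the "rft." prefix) are all assumptions made for the example.

```ruby
# Hypothetical sketch only; NOT Umlaut's real MetadataHelper#get_search_title.
# `metadata` is assumed to be a plain Hash of referent values keyed without
# the "rft." prefix, e.g. { "jtitle" => "...", "title" => "..." }.

# Assumed preference order: the more specific keys win over the generic one.
PREFERRED_TITLE_KEYS = %w[jtitle btitle title].freeze

def pick_search_title(metadata)
  PREFERRED_TITLE_KEYS.each do |key|
    # nil.to_s is "", so a missing key and a blank value are treated alike.
    value = metadata[key].to_s.strip
    return value unless value.empty?
  end
  nil # a blank title is reported as nil, never as an empty string
end
```

With this shape, a referent carrying both a jtitle and a title yields the jtitle, a referent with only a title yields that title, and one with only blank values yields nil, so the caller can decide to skip the search instead of searching on an empty string.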
I'm confused as to what problem we're talking about that wouldn't have occurred if it weren't for the rft.title, and why that is? Jonathan Jason Ronallo wrote: > The COiNS was just a clue that an rft.title was getting inserted in > there somewhere, but since I can't test it right now it was just a > hunch. Yes, from what I can tell the Referent gained an rft.title some > place. The problem you found would not have surfaced if there was no > rft.title. I didn't know that Referents with a more exact title like > rft.jtitle in the original Referent could pick up an rft.title. If > this is happening other places it becomes more difficult to write the > kind of logic found in metadata_helper that is looking for the best > title for a particular purpose. > > Not insurmountable, but something I hadn't considered. I'd like to try > to discover where that is happening (just my curiosity at this point). > If it is as part of COiNS generation, then Ross has given a good > reason why it ought to be in the COiNS (and maybe other places), but > not necessarily that it ought to become part of the original referent. > If the original referent is v.1.0 why enhance it with a v.0.1 key? > > I'm just trying to understand what's going on here and why. > > Jason > > On Wed, Jun 25, 2008 at 12:10 PM, Jonathan Rochkind wrote: > >> I'm confused about why you're putting the COinS into the conversation? >> Let's ignore the COinS, it's got nothing to do with this, does it? >> >> Are you saying that an rft.title has appeared in the Referent, even though >> it was not in the original OpenURL? If so, that may be worth looking into. I >> don't see what the COinS has got to do with it. Perhaps there's a bug or an >> anomaly in the COinS generating code or something, who knows, it's got >> nothing to do with InternetArchive service, does it? >> >> Jonathan >> >> Jason Ronallo wrote: >> >>>> I don't really know what's going on with the rft.title as you suggest.
It >>>> could be that whatever source generated the openurl supplied the journal >>>> title as an rft.title. We don't really have control over what sources >>>> send >>>> us. >>>> >>>> >>> By all means I'm happy to have this stay in production. :) >>> >>> OK, so maybe not related to this problem but it does seem to be >>> another error. The original link you gave does not have an rft.title >>> (I don't see one), but your COiNS does. And wherever it is inserting >>> an rft.title contributed to this problem. This might have other >>> unintended consquences other places as well? >>> >>> Jason >>> >>> >> -- >> Jonathan Rochkind >> Digital Services Software Engineer >> The Sheridan Libraries >> Johns Hopkins University >> 410.516.8886 rochkind (at) jhu.edu >> >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> >> -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Wed Jun 25 13:25:28 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 13:25:28 -0400 Subject: [Umlaut-general] "Not available" Message-ID: <48627F88.1020009@jhu.edu> So as I'm adding _some_ ebooks into our link resolver with the IA and future GBS services, and also more tightly integrating my link resolver into my OPAC, an old complaint has come up again from the librarians. "It says 'not available', but we DO pay for it, it's just that Find It can't find it!" Which is true. Of course the real solution here is making Find It able to find more and more things, identify why it can't find it, and make it find it. Sometimes that can be done. 
But in the end, we have a horrible mess of metadata and disorganization here at Hopkins; for the foreseeable future there are going to be LOTS of things that Find It (Umlaut) cannot find (although fewer and fewer every day). Just the way it is (If I were King of the Library....). What if instead of 'not available', it said 'not found'? I don't think my users would even notice the difference, but the librarians would be satisfied. Jonathan -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rossfsinger at gmail.com Wed Jun 25 13:29:42 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Wed, 25 Jun 2008 13:29:42 -0400 Subject: [Umlaut-general] "Not available" In-Reply-To: <48627F88.1020009@jhu.edu> References: <48627F88.1020009@jhu.edu> Message-ID: <23b83f160806251029l7dd7b6d1qe4bd1cff18784fe8@mail.gmail.com> Interesting, and it brings back memories. Out of curiosity, why isn't this just a local template issue? Also, what are the things it's not finding? I'm not asking for an exhaustive list, just the trends. -Ross. On Wed, Jun 25, 2008 at 1:25 PM, Jonathan Rochkind wrote: > So as I'm adding _some_ ebooks into our link resolver with the IA and future > GBS services, and also more tightly integrating my link resolver into my > OPAC, an old complaint has come up again from the librarians. > > "It says 'not available', but we DO pay for it, it's just that Find It can't > find it!" Which is true. Of course the real solution here is making Find > It able to find more and more things, identify why it can't find it, and > make it find it. Sometimes that can be done. But in the end, we have a > horrible mess of metadata and disorganization here at hopkins, there are for > the foreseeable future going to be LOTS of things that Find It (Umlaut) can > not find (although less and less every day). Just the way it is (If I were > King of the Library....).
> > What if instead of 'not available', it said 'not found'. I don't think my > users would even notice the difference, but the librarians would be > satisfied. > > Jonathan > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From jronallo at gmail.com Wed Jun 25 13:31:22 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 13:31:22 -0400 Subject: [Umlaut-general] "Not available" In-Reply-To: <48627F88.1020009@jhu.edu> References: <48627F88.1020009@jhu.edu> Message-ID: <763570460806251031v6b566b92mea48cb0da3d6a9f9@mail.gmail.com> If that slight a change in wording would make your people happy, then make the change. If you can give examples of what it can't find, that'd be helpful in figuring out what's going on. Jason On Wed, Jun 25, 2008 at 1:25 PM, Jonathan Rochkind wrote: > So as I'm adding _some_ ebooks into our link resolver with the IA and future > GBS services, and also more tightly integrating my link resolver into my > OPAC, an old complaint has come up again from the librarians. > > "It says 'not available', but we DO pay for it, it's just that Find It can't > find it!" Which is true. Of course the real solution here is making Find > It able to find more and more things, identify why it can't find it, and > make it find it. Sometimes that can be done. But in the end, we have a > horrible mess of metadata and disorganization here at hopkins, there are for > the foreseeable future going to be LOTS of things that Find It (Umlaut) can > not find (although less and less every day). Just the way it is (If I were > King of the Library....). > > What if instead of 'not available', it said 'not found'. 
I don't think my > users would even notice the difference, but the librarians would be > satisfied. > > Jonathan > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From jronallo at gmail.com Wed Jun 25 13:43:37 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Wed, 25 Jun 2008 13:43:37 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <48627B4D.6080702@jhu.edu> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> <48626DDC.7070605@jhu.edu> <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> <48627B4D.6080702@jhu.edu> Message-ID: <763570460806251043w12196c90k7e3ed27a30d91a20@mail.gmail.com> On Wed, Jun 25, 2008 at 1:07 PM, Jonathan Rochkind wrote: > Makes sense. I'm not sure either where that's coming from. Maybe some logic > that adds the defualt rft.title that should be re-thought. > > However, either way, our services need to deal with a very broad domain of > data, including bad data. An incoming openurl very well _could_ have an > rft.title and an rft.jtitle, so that can't mess everything up when it does > happen. > > I'm confused as to what problem we're talking about that wouldn't have > occured if it weren't for the rft.title, and why that is? If you take a look at MetadataHelper#get_search_title, you'll see the logic used to pick the title to search by for the IA service. My assumption was that the last elsif there that picked rft.title would be safe, but I see in your case that's part of where things went wrong. 
I assumed that something with format journal wouldn't have an rft.title. I need to think through many more of the possibilities and combinations to shore that method up some. So besides the problem of searching for a title even without a creator, the logic of metadata_helper wasn't as good as it needs to be. Jason From rochkind at jhu.edu Wed Jun 25 14:28:35 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Wed, 25 Jun 2008 14:28:35 -0400 Subject: [Umlaut-general] Internet Archive search---search only if you actually have an author? In-Reply-To: <763570460806251043w12196c90k7e3ed27a30d91a20@mail.gmail.com> References: <4862513B.5000006@jhu.edu> <763570460806250747r75645be0l54415f3e8433b3ec@mail.gmail.com> <48625D36.7050807@jhu.edu> <763570460806250808t4eb0d75fnf18855da128c6d5f@mail.gmail.com> <48626DDC.7070605@jhu.edu> <763570460806250926t5a60d034ifdd305f14d01ae55@mail.gmail.com> <48627B4D.6080702@jhu.edu> <763570460806251043w12196c90k7e3ed27a30d91a20@mail.gmail.com> Message-ID: <48628E53.9000307@jhu.edu> I don't remember what there made things go wrong, if anything. The only thing I remember doing there was making sure that a blank (empty string) title or creator was never returned; nil was returned instead. That was really a different problem. Is that the change you're looking at? Otherwise, I'm not sure what you're referring to, but that's okay, I don't know if I need to. At any rate, no code should raise an exception or do something nonsensical when both an rft.title and an rft.jtitle are present, because it's always possible a foreign source will generate that, even if we changed whatever in Umlaut seems to be generating it. Jonathan Jason Ronallo wrote: > On Wed, Jun 25, 2008 at 1:07 PM, Jonathan Rochkind wrote: > >> Makes sense. I'm not sure either where that's coming from. Maybe some logic >> that adds the default rft.title that should be re-thought.
>> >> However, either way, our services need to deal with a very broad domain of >> data, including bad data. An incoming openurl very well _could_ have an >> rft.title and an rft.jtitle, so that can't mess everything up when it does >> happen. >> >> I'm confused as to what problem we're talking about that wouldn't have >> occured if it weren't for the rft.title, and why that is? >> > > If you take a look at MetadataHelper#get_search_title, you'll see the > logic used to pick the title to search by for the IA service. My > assumption was that the last elsif there that picked rft.title would > be safe, but I see in your case that's part of where things went > wrong. I assumed that something with format journal wouldn't have a > rft.title. I need to think through many more of the possibilities and > combinations to sure that method up some. > > So besides the problem of searching for a title even without a > creator, the logic of metadata_helper wasn't as good as it needs to > be. > > Jason > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Thu Jun 26 15:39:26 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Thu, 26 Jun 2008 15:39:26 -0400 Subject: [Umlaut-general] clean up "see also" section? Message-ID: <4863F06E.6050406@jhu.edu> As we add more stuff to the "See also" section it's getting kind of hard to read. Here's one idea for making it more clear, automatically including the "source" in greyed out text, instead of including it in the title/hyperlink of the individual response, which had become a pattern. What do you think? Compare: now: http://testbox.mse.jhu.edu/rochkind/umlaut_now.png suggested: http://testbox.mse.jhu.edu/rochkind/umlaut_suggested.png Improvement? Does it also need actual bullet markers of some kind in addition? 
[This is a screen shot of the development version on my computer, I've already made these changes in code to see how it looked, twas easy]. Jonathan -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Thu Jun 26 16:05:39 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Thu, 26 Jun 2008 16:05:39 -0400 Subject: [Umlaut-general] clean up "see also" section? In-Reply-To: <4863F06E.6050406@jhu.edu> References: <4863F06E.6050406@jhu.edu> Message-ID: <763570460806261305g1247690em7e26d9604bedb2a0@mail.gmail.com> I do like being able to scan just a few words for each entry and get the important part. Looks sharper. I might make the text a little darker for accessibility but as long as that choice is made in CSS do what works for your institution. I don't think I'd use bullets; but that's aesthetics here for me and not usability. Maybe a bit more spacing between each of the links, though, if you want nitpicks. Jason On Thu, Jun 26, 2008 at 3:39 PM, Jonathan Rochkind wrote: > As we add more stuff to the "See also" section it's getting kind of hard to > read. Here's one idea for making it more clear, automatically including the > "source" in greyed out text, instead of including it in the title/hyperlink > of the individual response, which had become a pattern. > > What do you think? Compare: > > now: http://testbox.mse.jhu.edu/rochkind/umlaut_now.png > > suggested: http://testbox.mse.jhu.edu/rochkind/umlaut_suggested.png > > Improvement? Does it also need actual bullet markers of some kind in > addition? [This is a screen shot of the development version on my computer, > I've already made these changes in code to see how it looked, twas easy]. 
> > Jonathan > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Thu Jun 26 17:40:28 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Thu, 26 Jun 2008 17:40:28 -0400 Subject: [Umlaut-general] search inside the book Message-ID: <48640CCC.8060500@jhu.edu> I meant to write you guys an essay about where to store xISBN and why, but instead I just hacked in a sample search-inside-the-book service that puts a search box right on the page. And did a simple screen scrape of Amazon to do it for Amazon. Plus for GBS. Plus emailed Alexis to see if there's a way to generate a search-inside query in a url to IA flipbooks. It's SWEET. I'll commit it to a branch tomorrow hopefully. It's got some peculiarities--like, because of the problem of ajax updating content in a div that includes a textbox without interrupting the user, no search-inside textbox is shown until ALL possible search-inside sources are done being checked. Tolerable I think. Jonathan Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From jronallo at gmail.com Thu Jun 26 22:04:27 2008 From: jronallo at gmail.com (Jason Ronallo) Date: Thu, 26 Jun 2008 22:04:27 -0400 Subject: [Umlaut-general] search inside the book In-Reply-To: <48640CCC.8060500@jhu.edu> References: <48640CCC.8060500@jhu.edu> Message-ID: <763570460806261904r7274baehf5d7bfe2a55a0b54@mail.gmail.com> Awesome. Are you unsure about it? Is that why you're going to commit to a branch? If you're happy with it why not just commit it to trunk? Is it any sort of bigger architectural change? I just made a fairly large commit to GBS making some of the changes you've suggested. 
Hopefully none of those conflict with what you've done in there. Jason On Thu, Jun 26, 2008 at 5:40 PM, Jonathan Rochkind wrote: > I meant to write you guys an essay about where to store xISBN and why, but > instead I just hacked in a sample search-inside-the-book service that puts a > search box right on the page. And did a simple screen scrape of Amazon to do > it for Amazon. Plus for GBS. Plus emailed Alexis to see if there's a way to > generate a search-inside query in a url to IA flipbooks. > > It's SWEET. I'll commit it to a branch tommorow hopefully. It's got some > pecularities--like becuase of the problem of ajax updating content in a div > that includes a textbox, without interrupting the user, no search-inside > textbox is shown until ALL possible search-inside sources are done being > checked. Tolerable I think. > > Jonathan > > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Fri Jun 27 09:43:55 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Fri, 27 Jun 2008 09:43:55 -0400 Subject: [Umlaut-general] search inside the book In-Reply-To: <763570460806261904r7274baehf5d7bfe2a55a0b54@mail.gmail.com> References: <48640CCC.8060500@jhu.edu> <763570460806261904r7274baehf5d7bfe2a55a0b54@mail.gmail.com> Message-ID: <4864EE9B.5040304@jhu.edu> Yeah, I'm not sure about it yet. And I have been in the bad habit of updating right from trunk SVN to my production copy. Which works when it's a bug fix that I'm getting. But when I'm getting some architectural changes instead, right before I'm about to go on vacation for two weeks--I could just not update before I go on vacation, but then, what if there's a bug fix I need? 
Clearly, this indicates the need for a more professional/best-practice repository use. Possibly moving to actually tagging and branching particular versions, etc. Blah. Jonathan Jason Ronallo wrote: > Awesome. Are you unsure about it? Is that why you're going to commit > to a branch? If you're happy with it why not just commit it to trunk? > Is it any sort of bigger architectural change? > > I just made a fairly large commit to GBS making some of the changes > you've suggested. Hopefully none of those conflict with what you've > done in there. > > Jason > > On Thu, Jun 26, 2008 at 5:40 PM, Jonathan Rochkind wrote: > >> I meant to write you guys an essay about where to store xISBN and why, but >> instead I just hacked in a sample search-inside-the-book service that puts a >> search box right on the page. And did a simple screen scrape of Amazon to do >> it for Amazon. Plus for GBS. Plus emailed Alexis to see if there's a way to >> generate a search-inside query in a url to IA flipbooks. >> >> It's SWEET. I'll commit it to a branch tommorow hopefully. It's got some >> pecularities--like becuase of the problem of ajax updating content in a div >> that includes a textbox, without interrupting the user, no search-inside >> textbox is shown until ALL possible search-inside sources are done being >> checked. Tolerable I think. 
>> >> Jonathan >> >> Jonathan Rochkind >> Digital Services Software Engineer >> The Sheridan Libraries >> Johns Hopkins University >> 410.516.8886 rochkind (at) jhu.edu >> >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> >> -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Fri Jun 27 10:39:32 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Fri, 27 Jun 2008 10:39:32 -0400 Subject: [Umlaut-general] xISBN, alternate version, Referent Message-ID: <4864FBA4.2070409@jhu.edu> Okay, here are my thoughts on how to deal with alternate version info getting/consuming services (like xISBN). Initial Postulates: 1) There will be multiple service (adaptors) that _produce_ alternate version information (eg, xISBN, thingISBN, other?), and there will be multiple service (adaptors) that _consume_ alternate version information (many). 2) Alternate version info consists of diverse citation information, not just ISBN (more on this later). So here's my thought process. Because of #1, we need to put alternate version info (and originally I was just thinking ISBN from xISBN here) in a standard location. Initially, I thought of two possibilities. It could be in a service_response----constructed according to standardized convention, so all producers will put it in the same place and consumers can look for it there. Say, a response with a certain ServiceType ("more_isbns"), and the isbn in the "key". Or, quite similarly really, it could instead be hanging off the Referent object, again in a standardized place. Say, we could create a new table, << Referent has_many :more_isbns >> or something. 
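
The two placements just described can be put side by side in a small sketch. This is plain Ruby with hypothetical stand-ins (`ServiceResponse` as a bare struct, a `SketchRequest` class, a placeholder ISBN); it is not Umlaut's actual model code, only an illustration of "same data, two standard locations".

```ruby
# Hypothetical stand-ins for illustration only; not Umlaut's real models.
ServiceResponse = Struct.new(:service_type, :key)

class SketchRequest
  attr_reader :service_responses, :more_isbns

  def initialize
    @service_responses = [] # option 1: a conventional ServiceResponse per ISBN
    @more_isbns = []        # option 2: a dedicated list hanging off the referent
  end
end

# Option 1 consumer: filter responses by the agreed-upon service type
# and read the ISBN out of the "key".
def isbns_from_responses(request)
  request.service_responses
         .select { |r| r.service_type == "more_isbns" }
         .map(&:key)
end

req = SketchRequest.new
req.service_responses << ServiceResponse.new("more_isbns", "123456789X")
req.service_responses << ServiceResponse.new("fulltext", "not an isbn")
req.more_isbns << "123456789X"

puts isbns_from_responses(req).inspect
puts req.more_isbns.inspect
```

Either way a consumer ends up with the same list; the difference is only whether it has to filter by convention or can read a dedicated association.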
Really these two solutions are basically the same thing, although I slightly lean toward putting things off the Referent model when they are really additional citation data for use by services, rather than a service link to present to the user directly. But either one could do the same thing.

But here's the issue, I was discussing it just there as if it were just "isbns", but really alternate version info isn't limited to ISBNs. xISBN for one already returns oclcnum and lccn in addition to ISBN. And returns them in units. xISBN might say "Here's one alt version I know of, it has oclcnum A and isbn B; here's another I know of, with LCCN X and ISBN Y". So already, we have a need to keep track of several possible alternate versions, each with several pieces of information. And that information isn't necessarily limited to identifiers---maybe we'd want to look up the edition statement for each one, and record that, so we could tell the user when presenting them alternate versions in certain contexts.

So we need to keep track of diverse data for an alternate version, not all of which we are necessarily thinking of now; we might have even more later. So we need to create a data structure to hold such data, such that each Referent can have multiple of those data structures. We certainly _could_ create such a data structure and hang it off Referent, or stuff it in a ServiceResponse somehow.

But then it struck me: We already HAVE such a data structure: The Referent model itself!

So why not create a new Referent object for each alternate version, and then create a reflexive (aka "self referential") association on Referent, so each "primary" referent can be related to 0-to-many "alternate version" Referents?

class Referent < ActiveRecord::Base
  # Both associations share one foreign key column on the referents table.
  has_many :alternate_versions,
           :class_name => "Referent", :foreign_key => "is_alternate_version_id"
  belongs_to :is_alternate_version_of,
             :class_name => "Referent", :foreign_key => "is_alternate_version_id"
end

Now you get an alternate version from xISBN?
You just create a Referent for it, and then attach it to the original.

co = ContextObject.new
co.referent.set_metadata("isbn", some_isbn)
co.referent.set_metadata("lccn", some_lccn)

alt_version = Referent.new_from_context_object( co )
orig_referent.alternate_versions << alt_version

Ta-da. Another service wants to check to see what alternate version info is available, no problem, just iterate through request.referent.alternate_versions. We could make some helper methods for that.

I can think of a few issues that might come up, but I'll let this lengthy email digest for a while before discussing those. :) Let me know, Ross and Jason, if you actually read this, and what you think.

Jonathan

-- 
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

From rossfsinger at gmail.com Fri Jun 27 11:42:10 2008
From: rossfsinger at gmail.com (Ross Singer)
Date: Fri, 27 Jun 2008 11:42:10 -0400
Subject: [Umlaut-general] xISBN, alternate version, Referent
In-Reply-To: <4864FBA4.2070409@jhu.edu>
References: <4864FBA4.2070409@jhu.edu>
Message-ID: <23b83f160806270842x71769924i5b46b54a0b2ba2c6@mail.gmail.com>

Would Amazon's similar items or recommendations work the same way?

-Ross.

On Fri, Jun 27, 2008 at 10:39 AM, Jonathan Rochkind wrote:
> Okay, here are my thoughts on how to deal with alternate version info
> getting/consuming services (like xISBN).
>
> Initial Postulates:
>
> 1) There will be multiple service (adaptors) that _produce_ alternate
> version information (eg, xISBN, thingISBN, other?), and there will be
> multiple service (adaptors) that _consume_ alternate version information
> (many).
>
> 2) Alternate version info consists of diverse citation information, not just
> ISBN (more on this later).
>
>
>
> So here's my thought process.
>
> Because of #1, we need to put alternate version info (and originally I was
> just thinking ISBN from xISBN here) in a standard location.
> > Jonathan > > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Fri Jun 27 11:56:04 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Fri, 27 Jun 2008 11:56:04 -0400 Subject: [Umlaut-general] xISBN, alternate version, Referent In-Reply-To: <23b83f160806270842x71769924i5b46b54a0b2ba2c6@mail.gmail.com> References: <4864FBA4.2070409@jhu.edu> <23b83f160806270842x71769924i5b46b54a0b2ba2c6@mail.gmail.com> Message-ID: <48650D94.9070508@jhu.edu> Hmm, good question. I'm not sure. Maybe? Do you have an opinion? Jonathan Ross Singer wrote: > Would Amazon's similar items or recommendations work the same way? > > -Ross. > > On Fri, Jun 27, 2008 at 10:39 AM, Jonathan Rochkind wrote: > >> Okay, here are my thoughts on how to deal with alternate version info >> getting/consuming services (like xISBN). >> >> Initial Postulates: >> >> 1) There will be multiple service (adaptors) that _produce_ alternate >> version information (eg, xISBN, thingISBN, other?), and there will be >> multiple service (adaptors) that _consume_ alternate version information >> (many). >> >> 2) Alternate version info consists of diverse citation information, not just >> ISBN (more on this later). >> >> >> >> So here's my thought process. >> >> Because of #1, we need to put alternate version info (and originally I was >> just thinking ISBN from xISBN here) in a standard location. Initially, I >> thought of two possibilities. It could be in a >> service_response----constructed according to standardized convention, so all >> producers will put it in the same place and consumers can look for it there. >> Say, a response with a certain ServiceType ("more_isbns"), and the isbn in >> the "key". 
>> >> Jonathan >> >> >> -- >> Jonathan Rochkind >> Digital Services Software Engineer >> The Sheridan Libraries >> Johns Hopkins University >> 410.516.8886 rochkind (at) jhu.edu >> >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> >> -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Fri Jun 27 15:19:09 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Fri, 27 Jun 2008 15:19:09 -0400 Subject: [Umlaut-general] search-inside box Message-ID: <48653D2D.7040403@jhu.edu> I've made a branch for my efforts to put a search-inside-the-book box on the link resolver menu. svn+ssh://jrochkind at rubyforge.org/var/svn/umlaut/branches/search-inside-branch In addition to just more testing, there are two things I know I want to do here: 1) Split the Amazon service adaptor in two, so an Amazon-metadata adaptor can run very quickly in the foreground, and then I can let the second half---finding Search Inside or Look Inside availability---run as a background service. 2) There's this service adaptor API method, response_url(), which was written to take a ServiceResponse. Turns out, to do what needs doing here, the parameters to that method should really be a ServiceType, as well as the array of HTTP request query parameters. I have some hacks in there now to work around that, but I think it really ought to be refactored and all service adaptors changed as needed. 
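
The refactoring in point 2 might look roughly like the sketch below. It is plain Ruby and purely illustrative: the `ServiceType` struct, the `SearchInsideAdaptor` class, the `search_base` field, the `"query"` parameter name and the example.org URL are all hypothetical stand-ins, not Umlaut's actual classes or API, and the query parameters are modelled as a hash. The only point is the shape of the signature: response_url() receives the service type plus the submitted query parameters, rather than a ServiceResponse.

```ruby
require "uri"

# Hypothetical stand-in for Umlaut's ServiceType; only what this sketch needs.
ServiceType = Struct.new(:name, :search_base)

# Hypothetical adaptor illustrating the proposed signature only.
class SearchInsideAdaptor
  # service_type:     the kind of response the user acted on
  # submitted_params: HTTP query params submitted from the search box
  def response_url(service_type, submitted_params)
    query = submitted_params["query"].to_s.strip
    # With no search terms, just link to the base page.
    return service_type.search_base if query.empty?

    "#{service_type.search_base}?#{URI.encode_www_form({ "q" => query })}"
  end
end

adaptor = SearchInsideAdaptor.new
type = ServiceType.new("search_inside", "http://example.org/search")
puts adaptor.response_url(type, { "query" => "white whale" })
```

Passing the submitted parameters in explicitly keeps the adaptor free of any dependency on the controller, which is presumably what the current hacks are working around.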
-- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rochkind at jhu.edu Mon Jun 30 13:27:34 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Mon, 30 Jun 2008 13:27:34 -0400 Subject: [Umlaut-general] storing something weird in referent Message-ID: <48691786.7030808@jhu.edu> If I want to store something in Referent/ReferentValue that isn't really part of an openURL at all.... looks like I can safely do that by setting "metadata" on the referent_value to false... and also set private_data to true? Not sure what private_data is for. Do you remember, Ross? I'm using this to store an asin, to let the amazon service work in waves, so I can make the AWS call in the foreground, but not do the slower HTTP request/scrape until a later wave, using the stored ASIN. referent.enhance_metadata("asin", asin, false, true) => would be metadata=>false, private_data=>true. Can you confirm from your memory whether this is a sane thing to do, Ross? Not entirely sure what private_data is intended for. Wonder if just setting metadata to false, but not using private_data, is sufficient to avoid confusing Umlaut into thinking this is actually an OpenURL referent value? Jonathan -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rossfsinger at gmail.com Mon Jun 30 15:30:27 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Mon, 30 Jun 2008 15:30:27 -0400 Subject: [Umlaut-general] storing something weird in referent In-Reply-To: <48691786.7030808@jhu.edu> References: <48691786.7030808@jhu.edu> Message-ID: <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com> Hey, we're replying to the list now by default. Nice. Ok, I'm going to lead this off with: I don't really like the idea of shoving data into the referent that isn't intended to be part of the referent. 
The expectation was that you could serialize the referent from that data without having to know what the actual rows meant. Private data is part of the OpenURL spec. Every ContextObject entity can have private data, which are key-value pairs that are meant for the target's eyes only (i.e., don't reserialize). I don't know of any instances of this actually being used. If you're using the Amazon ServiceAdapter, why can't you just use the data it stored in service_responses? -Ross. On Mon, Jun 30, 2008 at 1:27 PM, Jonathan Rochkind wrote: > If I want to store something in Referent/ReferentValue that isn't really > part of an openURL at all.... > > looks like I safely do that by setting "metadata" on the referent_value to > false... and also set private_data to true? Not sure what private_data is > for. Do you remember, Ross? > > I'm using this to store an asin, to let the amazon service work in waves, so > I can make the AWS call in the foreground, but not do the slower HTTP > request/scrape until a later wave, using the stored AWS. > > referent.enhance_metadata("asin", asin, false, true) => would be > metadata=>false, private_data=>true. > > Can you confirm from your memory whether this is a sane thing to do, ross? > Not entirely sure what private_data is intended for. Wonder if just setting > metadata to false, but not using private_data is sufficient to avoid > confusing Umlaut into thinking this is actually an OpenURL referent value? 
> > Jonathan > > -- > Jonathan Rochkind > Digital Services Software Engineer > The Sheridan Libraries > Johns Hopkins University > 410.516.8886 rochkind (at) jhu.edu > > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > From rochkind at jhu.edu Mon Jun 30 15:46:33 2008 From: rochkind at jhu.edu (Jonathan Rochkind) Date: Mon, 30 Jun 2008 15:46:33 -0400 Subject: [Umlaut-general] storing something weird in referent In-Reply-To: <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com> References: <48691786.7030808@jhu.edu> <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com> Message-ID: <48693819.9060607@jhu.edu> Well, you know how we all kept saying, gee, it's kind of too bad that the data object Umlaut uses to store a citation is tied so tightly to the OpenURL spec, it really ought to be a more generic "Umlaut Citation data" object? I think I've talked with you about that before Ross, and I think I even read you saying something about it in a recent thread. Well, a while ago I realized---okay, if we prefer this to be so, then why not just MAKE it so? Who says that Umlaut's Referent has to be so tightly tied to OpenURL Z39.88? I say it doesn't, and the Referent model object is just Umlaut's representation of citation metadata, and is welcome to have things in it that are not part of Z39.88 or do not directly relate to it. It also of course has a lot of stuff that is from Z39.88--it makes sense that Umlaut's internal citation data model would be based off of Z39.88, because of the nature of Umlaut, but it doesn't need to stop there. Of course, it is still important that you can tell the difference between data that is part of an OpenURL (so OpenURLs can be generated from the Referent), and data that is just internal Umlaut data that should not be included in an OpenURL. 
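
One way to keep that distinction explicit is a per-value flag, sketched here in plain Ruby. Everything in it is an assumption made for illustration: the `ReferentValue` struct, the `SketchReferent` class, the `openurl` flag and the placeholder identifiers are hypothetical, and none of this is Umlaut's actual Referent API. It only shows that values marked as internal can be stored alongside OpenURL values and still be left out when a KEV query string is generated.

```ruby
require "uri"

# Hypothetical stand-in for a stored referent value; not Umlaut's model.
ReferentValue = Struct.new(:key, :value, :openurl)

class SketchReferent
  def initialize
    @values = []
  end

  # openurl: true  => value belongs in a generated OpenURL
  # openurl: false => internal bookkeeping only, never serialized
  def add(key, value, openurl: true)
    @values << ReferentValue.new(key, value, openurl)
    self
  end

  # Look up an internal-only value by key (nil if absent).
  def internal(key)
    found = @values.find { |v| !v.openurl && v.key == key }
    found && found.value
  end

  # Build a KEV query string from the OpenURL values only.
  def to_kev
    pairs = @values.select(&:openurl).map { |v| ["rft.#{v.key}", v.value] }
    URI.encode_www_form(pairs)
  end
end

ref = SketchReferent.new
ref.add("isbn", "0123456789").add("btitle", "Example Title")
ref.add("asin", "B000000000", openurl: false) # placeholder ASIN, internal only

puts ref.to_kev
puts ref.internal("asin")
```

A later wave of a service could read the internal value back without it ever leaking into an outgoing OpenURL.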
Thanks for clearing up that private_data is in fact part of Z39.88, so by putting stuff there I'm not really doing what I thought I was doing. (Still confused: I thought OpenURL private_data was a single string, non-repeatable, but you've got the internal representation being repeatable with a key? Maybe I misunderstand OpenURL private data?). Anyhow, so the original question is: Why not just put it in a service response? I could. It still seems to me to make sense to distinguish between Referent and ServiceResponse based on the idea that Referent is where we store metadata about the citation (like an asin), and an individual ServiceResponse shouldn't just be a random tiny piece of data (like an asin) but should generally be a, well, service that is going to be presented to the user that the user can click on or whatever. For instance, when the Amazon service realizes that it knows the title of this citation that was only given as an ISBN in the OpenURL request---the service adds it to the Referent. It doesn't make a ServiceResponse that has nothing in it but a title. It would be a lot harder to work with if it did. To me, it makes sense to put an asin in the Referent too, even though it's not part of the OpenURL. But I guess not the way I was thinking of doing it. Hmm. I could create an extra column in Referent for asin? Jonathan Ross Singer wrote: > Hey, we're replying to the list now by default. Nice. > > Ok, I'm going to lead this off with: I don't really like the idea of > shoving data into the referent that isn't intended to be part of the > referent. The expectation was that you could serialize the referent > from that data without having to know what the actual rows meant. > > Private data is part of the OpenURL spec. Every ContextObject entity > can have private data, which are key-value pairs that are meant for > the target's eyes only (i.e., don't reserialize). I don't know of any > instances of this actually being used. 
>> >> Jonathan >> >> -- >> Jonathan Rochkind >> Digital Services Software Engineer >> The Sheridan Libraries >> Johns Hopkins University >> 410.516.8886 rochkind (at) jhu.edu >> >> _______________________________________________ >> Umlaut-general mailing list >> Umlaut-general at rubyforge.org >> http://rubyforge.org/mailman/listinfo/umlaut-general >> >> > _______________________________________________ > Umlaut-general mailing list > Umlaut-general at rubyforge.org > http://rubyforge.org/mailman/listinfo/umlaut-general > -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu From rossfsinger at gmail.com Mon Jun 30 16:02:35 2008 From: rossfsinger at gmail.com (Ross Singer) Date: Mon, 30 Jun 2008 16:02:35 -0400 Subject: [Umlaut-general] storing something weird in referent In-Reply-To: <48693819.9060607@jhu.edu> References: <48691786.7030808@jhu.edu> <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com> <48693819.9060607@jhu.edu> Message-ID: <23b83f160806301302m2e342834jbe3b8f2af46d34a4@mail.gmail.com> Yes, we discussed detaching the data model from Z39.88. However, it's not currently detached. So what we would be doing is shoehorning non-referent data into compartments designed for referent data. In my mind this is similar to what we've done with MARC, in the 9xx tags: shoving local data in there that doesn't make any sense out of context and probably wasn't the best place to put it in the first place, but hey, you work with the tools you have. However, in our case, the tool can be modified to do our bidding. In your example, you're not taking the title and shoving it into a service response because it makes more sense for you to enhance the referent's metadata with it, since it fits, legally, into the referent. The same can apply for ASIN, if you use rft_id=URN:ASIN:123456789X that would be perfectly fine and can be passed on, legally, in the referent. -Ross. 
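
Ross's suggestion of carrying the ASIN as an rft_id can be sketched in a few lines of plain Ruby. The helper names below are hypothetical, and the `urn:asin:` prefix simply follows the form in the example above rather than any registry lookup; the identifiers are placeholders. The sketch also serializes several rft_id values under the same key, which KEV allows.

```ruby
require "uri"

# Prefix taken from the rft_id=URN:ASIN:... example; matched case-insensitively.
ASIN_PREFIX = "urn:asin:"

# Wrap a bare ASIN as an rft_id value.
def asin_to_rft_id(asin)
  "#{ASIN_PREFIX}#{asin}"
end

# Return the first ASIN found among a referent's rft_id values, or nil.
def asin_from_rft_ids(rft_ids)
  rft_ids.each do |id|
    return id[ASIN_PREFIX.length..-1] if id.downcase.start_with?(ASIN_PREFIX)
  end
  nil
end

# Serialize all identifiers as repeated rft_id keys in a KEV query string.
def rft_ids_to_kev(rft_ids)
  URI.encode_www_form(rft_ids.map { |id| ["rft_id", id] })
end

ids = ["urn:isbn:123456789X", asin_to_rft_id("123456789X")]
puts rft_ids_to_kev(ids)
puts asin_from_rft_ids(ids)
```

Because the ASIN travels as an ordinary identifier, a later wave of the Amazon service could recover it from the referent without any non-OpenURL storage.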

On Mon, Jun 30, 2008 at 3:46 PM, Jonathan Rochkind wrote:
> Well, you know how we all kept saying, gee, it's kind of too bad that
> the data object Umlaut uses to store a citation is tied so tightly to
> the OpenURL spec, it really ought to be a more generic "Umlaut Citation
> data" object? I think I've talked with you about that before Ross, and
> I think I even read you saying something about it in a recent thread.
>
> Well, a while ago I realized---okay, if we prefer this to be so, then
> why not just MAKE it so? Who says that Umlaut's Referent has to be so
> tightly tied to OpenURL Z39.88? I say it doesn't, and the Referent
> model object is just Umlaut's representation of citation metadata, and
> is welcome to have things in it that are not part of Z39.88 or do not
> directly relate to it. It also of course has a lot of stuff that is
> from Z39.88--it makes sense that Umlaut's internal citation data model
> would be based off of Z39.88, because of the nature of Umlaut, but it
> doesn't need to stop there.
>
> Of course, it is still important that you can tell the difference
> between data that is part of an OpenURL (so OpenURLs can be generated
> from the Referent), and data that is just internal Umlaut data that
> should not be included in an OpenURL. Thanks for clearing up that
> private_data is in fact part of Z39.88, so by putting stuff there I'm
> not really doing what I thought I was doing. (Still confused that I
> thought OpenURL private_data was a single string, non-repeatable, but
> you've got the internal representation being repeatable with a key?
> Maybe I misunderstand OpenURL private data?).
>
> Anyhow, so the original question is: Why not just put it in a service
> response? I could. It still seems to me to make sense to distinguish
> between Referent and ServiceResponse based on the idea that Referent is
> where we store metadata about the citation (like an asin), and an
> individual ServiceResponse shouldn't just be a random tiny piece of
> data (like an asin) but should generally be a, well, service that is
> going to be presented to the user that the user can click on or
> whatever.
>
> For instance, when the Amazon service realizes that it knows the title
> of this citation that was only given as an ISBN in the OpenURL
> request---the service adds it to the Referent. It doesn't make a
> ServiceResponse that has nothing in it but a title. It would be a lot
> harder to work with if it did. To me, it makes sense to put an asin in
> the Referent too, even though it's not part of the OpenURL.
>
> But I guess not the way I was thinking of doing it. Hmm. I could create
> an extra column in Referent for asin?
>
> Jonathan
>
> Ross Singer wrote:
>>
>> Hey, we're replying to the list now by default. Nice.
>>
>> Ok, I'm going to lead this off with: I don't really like the idea of
>> shoving data into the referent that isn't intended to be part of the
>> referent. The expectation was that you could serialize the referent
>> from that data without having to know what the actual rows meant.
>>
>> Private data is part of the OpenURL spec. Every ContextObject entity
>> can have private data, which are key-value pairs that are meant for
>> the target's eyes only (i.e., don't reserialize). I don't know of any
>> instances of this actually being used.
>>
>> If you're using the Amazon ServiceAdapter, why can't you just use the
>> data it stored in service_responses?
>>
>> -Ross.
>>
>> On Mon, Jun 30, 2008 at 1:27 PM, Jonathan Rochkind wrote:
>>
>>> If I want to store something in Referent/ReferentValue that isn't
>>> really part of an openURL at all....
>>>
>>> looks like I safely do that by setting "metadata" on the
>>> referent_value to false... and also set private_data to true? Not
>>> sure what private_data is for. Do you remember, Ross?
>>>
>>> I'm using this to store an asin, to let the amazon service work in
>>> waves, so I can make the AWS call in the foreground, but not do the
>>> slower HTTP request/scrape until a later wave, using the stored AWS.
>>>
>>> referent.enhance_metadata("asin", asin, false, true) => would be
>>> metadata=>false, private_data=>true.
>>>
>>> Can you confirm from your memory whether this is a sane thing to do,
>>> Ross? Not entirely sure what private_data is intended for. Wonder if
>>> just setting metadata to false, but not using private_data is
>>> sufficient to avoid confusing Umlaut into thinking this is actually
>>> an OpenURL referent value?
>>>
>>> Jonathan
>>>
>>> --
>>> Jonathan Rochkind
>>> Digital Services Software Engineer
>>> The Sheridan Libraries
>>> Johns Hopkins University
>>> 410.516.8886 rochkind (at) jhu.edu
>>>
>>
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886 rochkind (at) jhu.edu
>

From rochkind at jhu.edu Mon Jun 30 17:14:27 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 30 Jun 2008 17:14:27 -0400
Subject: [Umlaut-general] storing something weird in referent
In-Reply-To: <23b83f160806301302m2e342834jbe3b8f2af46d34a4@mail.gmail.com>
References: <48691786.7030808@jhu.edu>
 <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com>
 <48693819.9060607@jhu.edu>
 <23b83f160806301302m2e342834jbe3b8f2af46d34a4@mail.gmail.com>
Message-ID: <48694CB3.8010106@jhu.edu>

Beautiful solution, thanks!

I actually was wondering if I was allowed to put any legal URI/URN in the rft_id, or only certain ones. But any one will do, perhaps?

Except now I realize I really could use the Amazon item URL too, not just the asin. Although I may not need it. I'll keep messing with it.

I agree with you that my original proposed solution was an ugly shoehorning; I didn't understand what was going on.

However, to me, creating a new column in Referent itself would not be a shoehorning, so long as it was a new column which clearly held non-Z39.88 data. Equally acceptable to me would be if we made another flag in ReferentValue for non-z3988 or what have you. If just the fact that the data is still called Referent is what bothers you, imagine that it were changed to Citation and CitationValue instead of Referent and ReferentValue. It would just so happen that many of the CitationValues (those with the z3988 field set to true, for instance) had 1-to-1 correspondence with Z39.88 values. To me, that would be perfectly acceptable and not the ugly shoehorning that my original suggestion indeed was.

But for now, I don't think that will be necessary.

And certainly putting things in ServiceResponse is another option. I briefly contemplated the idea of a "metadata" Service Type Value, that would just be an arbitrary key and a value, functioning much like ReferentValue does. But I still think this would be a bit confusing. Plus, Referent and ServiceResponse have different caching semantics. At present, Referent can, at least theoretically, be used between any requests that target the same citation. ServiceResponses are, at present, not re-used between Requests, and Requests are not re-used between sessions.
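
The flag idea floated above can be sketched roughly as follows. This is only an illustration under stated assumptions: CitationValue, its z3988 field, and openurl_pairs are hypothetical names taken from or invented for this message, not existing Umlaut models, and the real ReferentValue schema may look quite different.

```ruby
# Hypothetical stand-in for a ReferentValue row with one extra flag
# saying whether the value belongs in a serialized OpenURL.
CitationValue = Struct.new(:key, :value, :z3988)

# Build KEV-style referent pairs from only the values flagged as
# Z39.88 data; internal-only values are skipped.
def openurl_pairs(values)
  values.select(&:z3988).map { |v| ["rft.#{v.key}", v.value] }
end

values = [
  CitationValue.new("isbn", "123456789X", true),
  CitationValue.new("btitle", "An Example Title", true),
  CitationValue.new("asin", "123456789X", false) # internal to Umlaut
]

pairs = openurl_pairs(values)
```

With a flag like this, serialization never needs to know what an individual row means, only whether it is marked as OpenURL data.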

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

From rochkind at jhu.edu Mon Jun 30 17:22:44 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 30 Jun 2008 17:22:44 -0400
Subject: [Umlaut-general] storing something weird in referent
In-Reply-To: <48694CB3.8010106@jhu.edu>
References: <48691786.7030808@jhu.edu>
 <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com>
 <48693819.9060607@jhu.edu>
 <23b83f160806301302m2e342834jbe3b8f2af46d34a4@mail.gmail.com>
 <48694CB3.8010106@jhu.edu>
Message-ID: <48694EA4.6060802@jhu.edu>

I'm still a bit confused about the "private data" though, thinking that openurl private data was just a single blob of text string--but you have it stored as multiple key/value pairs.
Can you give me an example of an OpenURL with multiple key/value pairs in private data, in a way that can be extracted without knowing anything more except that it's an OpenURL and its format? I didn't think that could be done. I thought the private data (rft_dat, right?) was just a single string value.

But maybe I'm confused because it can only be done in the XML formats, not the KEV formats? Or because it is hypothetically possible in OpenURL, but not actually in SAP1/2? Or something?

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

From mneedlem at ufl.edu Mon Jun 30 17:58:46 2008
From: mneedlem at ufl.edu (Mark Needleman)
Date: Mon, 30 Jun 2008 17:58:46 -0400
Subject: [Umlaut-general] pivate data
Message-ID: <48695716.4020601@ufl.edu>

Jonathan

from the OpenURL standard:

Entities may also be described by Private Data Descriptors.
Because the nature of Private Data is not specified by this Standard, there is no infrastructure in the OpenURL Framework to support Private Data: none of the core components explicitly deal with Private Data, and Community Profiles do not contain any information to facilitate the use of Private Data in Applications.

According to the table in the standard, the number of occurrences is unbounded, so you can have multiple ones or bunch everything into one instance. It looks like the definitions for the SAP profiles do restrict the use of private data for each context object type to a maximum of one, although I may be reading this wrong. It's been a while since I looked at the standard, and to be honest the NISO OpenURL standard is not one of our best examples of well-written prose.

Mark Needleman wrote:
> Jonathan

From mneedlem at ufl.edu Mon Jun 30 17:39:45 2008
From: mneedlem at ufl.edu (Mark Needleman)
Date: Mon, 30 Jun 2008 17:39:45 -0400
Subject: [Umlaut-general] storing something weird in referent
In-Reply-To: <48694EA4.6060802@jhu.edu>
References: <48691786.7030808@jhu.edu>
 <23b83f160806301230v437069f8o56360a2d76c58434@mail.gmail.com>
 <48693819.9060607@jhu.edu>
 <23b83f160806301302m2e342834jbe3b8f2af46d34a4@mail.gmail.com>
 <48694CB3.8010106@jhu.edu>
 <48694EA4.6060802@jhu.edu>
Message-ID: <486952A1.1090500@ufl.edu>

Jonathan

Unless things have changed: when I was on the NISO committee that developed the OpenURL standard, and the 0.1 ExLibris standard before it, private data (pid in 0.1 terms) was exactly that, data that was understood by the client and server but not necessarily by anyone else.

In the 0.1 standard there was a rule that if you had a pid you had to have an sid (source id), so you could tell where the private data came from and thus how to interpret it, but there were no rules for how it had to be formatted.

I believe that carried along to the NISO standard, but I would have to check to make sure. It's been a while since I looked at the standard -
and i dont recall offhand if reft_dat is defined as repeatable in the NISO standard mark Jonathan Rochkind wrote: > I'm still a bit confused about the "private data" though, thinking > that openurl private data was just a single blob of text string--but > you have it stored as multiple key/value pairs. Can you give me an > example of an openurl with multiple key/value pairs in private data, > in a way that can be extracted without knowing anything more except > that that its an openurl and it's format? I didn't think that could be > done. I thought the private data (rft_dat right?) was just a single > string value. > > But maybe I'm confused because it can only be done in the XML formats, > not the KEV formats? Or because it is hypothetically possible in > openurl, but not actually in SAP1/2? Or something? > > Jonathan > > Jonathan Rochkind wrote: >> Beautiful solution, thanks! >> >> I actually was wondering if I was allowed to put any legal URI/URN in >> the rft_id, or only certain ones. But any one will do, perhaps? >> >> Except now I realize I really could use the Amazon item URL too, not >> just the asin. Although I may not need it. I'll keep messing with it. >> >> I agree with you that my original proposed solution was an ugly >> shoehorning; I didn't understand what was going on. >> >> However, to me, creating a new column in Referent itself would not be >> a shoehorning, so long as it was a new column which clearly held >> non-z39.88 data. Equally acceptable to me would be if we made >> another flag in ReferentValue for non-z3988 or what have you. If >> just the fact that the data is still called Referent is what bothers >> you, imagine that it were changed to Citation and CitationValue >> instead of Referent and ReferentValue. It would just so happen that >> many of the CitationValues (those with the z3988 field set to true >> for instance) had 1-to-1 correspondence with z39.88 values. 
To me, >> that would be perfectly acceptable and not the ugly shoehorning that >> my original suggestion indeed was. >> >> But for now, I don't think that will be neccesary. >> >> And certainly putting things in ServiceResponse is another option. I >> briefly contemplated the idea of "metadata" Service Type Value, that >> would just be an arbitrary key and a value. Functioning much like >> ReferentValue does. But I still think this would be a bit confusing. >> Plus, Referent and ServiceResponse have different cacheing semantics. >> At present, Referent can, at least theoretically, be used between any >> requests that target the same citation. ServiceResponses are, at >> present, not re-used between Requests, and Requests are not re-used >> between sessions. >> >> Jonathan >> >> Ross Singer wrote: >>> Yes, we discussed detaching the data model from Z39.88. However, it's >>> not currently detached. So what we would be doing is shoehorning >>> non-referent data into compartments designed for referent data. In my >>> mind this is similar to what we've done with MARC, in the 9xx tags: >>> shoving local data in there that doesn't make any sense out of context >>> and probably wasn't the best place to put it in the first place, but >>> hey, you work with the tools you have. >>> >>> However, in our case, the tool can be modified to do our bidding. >>> >>> In your example, you're not taking the title and shoving it into a >>> service response because it makes more sense for you to enhance the >>> referent's metadata with it, since it fits, legally, into the >>> referent. >>> >>> The same can apply for ASIN, if you use rft_id=URN:ASIN:123456789X >>> that would be perfectly fine and can be passed on, legally, in the >>> referent. >>> >>> -Ross. 
>>> >>> On Mon, Jun 30, 2008 at 3:46 PM, Jonathan Rochkind >>> wrote: >>> >>>> Well, you know how we all kept saying, gee, it's kind of too bad >>>> that the >>>> data object Umlaut uses to store a citation is tied so tightly to the >>>> OpenURL spec, it really ought to be a more generic "Umlaut Citation >>>> data" >>>> object? I think I've talked with you about that before Ross, and I >>>> think I >>>> even read you saying something about it in a recent thread. >>>> >>>> Well, a while ago I realized---okay, if we prefer this to be so, >>>> then why >>>> not just MAKE it so? Who says that Umlaut's Referent has to be so >>>> tightly >>>> tied to OpenURL Z39.88? I say it doens't, and the Referent model >>>> object is >>>> just Umlaut's representation of citation metadata, and is welcome >>>> to have >>>> things in it that are not part of Z39.88 or do not directly relate >>>> to it. It >>>> also of course has a lot of stuff that is from Z39.88--it makes >>>> sense that >>>> Umlaut's internal citation data model would be based off of Z39.88, >>>> because >>>> of the nature of Umlaut, but it doesn't need to stop there. >>>> >>>> Of course, it is still important that you can tell the difference >>>> between >>>> data that is part of an OpenURL (so OpenURLs can be generated from the >>>> Referent), and data that is just internal Umlaut data that should >>>> not be >>>> included in an OpenURL. Thanks for clearing up that private_data is >>>> in fact >>>> part of Z39.88, so by putting stuff there I'm not really doing what I >>>> thought I was doing. (Still confused that I thought OpenURL >>>> private_data was >>>> a single string, non-repeatable, but you've got the internal >>>> representation >>>> being repeatable with a key? Maybe I misunderstand OpenURL private >>>> data?). >>>> >>>> Anyhow, so the original question is: Why not just put it in a service >>>> response? I could. 
It still seems to me to make sense to distinguish >>>> between Referent and ServiceResponse based on the idea that >>>> Referent is >>>> where we store metadata about the citation (like an asin), and an >>>> individual >>>> ServiceResponse shouldn't just be a random tiny piece of data (like >>>> an asin) >>>> but should generally be a, well, service that is going to be >>>> presented to >>>> the user that the user can click on or whatever. >>>> >>>> For instance, when the Amazon service realizes that it knows the >>>> title of >>>> this citation that was only given as an ISBN in the OpenURL >>>> request---the >>>> service adds it to the Referent. It doesn't make a ServiceResponse >>>> that has >>>> nothing in it by a title. It would be a lot harder to work with if >>>> it did. >>>> To me, it makes sense to put an asin in the Referent too, even >>>> though it's >>>> nto part of the OpenURL. >>>> >>>> But I guess not the way I was thinking of doing it. Hmm. I could >>>> create an >>>> extra column in Referent for asin? >>>> >>>> Jonathan >>>> >>>> Ross Singer wrote: >>>> >>>>> Hey, we're replying to the list now by default. Nice. >>>>> >>>>> Ok, I'm going to lead this off with: I don't really like the idea of >>>>> shoving data into the referent that isn't intended to be part of the >>>>> referent. The expectation was that you could serialize the referent >>>>> from that data without having to know what the actual rows meant. >>>>> >>>>> Private data is part of the OpenURL spec. Every ContextObject entity >>>>> can have private data, which are key-value pairs that are meant for >>>>> the target's eyes only (i.e., don't reserialize). I don't know of >>>>> any >>>>> instances of this actually being used. >>>>> >>>>> If you're using the Amazon ServiceAdapter, why can't you just use the >>>>> data it stored in service_responses? >>>>> >>>>> -Ross. 
>>>>>
>>>>> On Mon, Jun 30, 2008 at 1:27 PM, Jonathan Rochkind wrote:
>>>>>
>>>>>> If I want to store something in Referent/ReferentValue that isn't really part of an OpenURL at all...
>>>>>>
>>>>>> it looks like I can safely do that by setting "metadata" on the referent_value to false... and also set private_data to true? Not sure what private_data is for. Do you remember, Ross?
>>>>>>
>>>>>> I'm using this to store an asin, to let the Amazon service work in waves, so I can make the AWS call in the foreground, but not do the slower HTTP request/scrape until a later wave, using the stored asin.
>>>>>>
>>>>>> referent.enhance_metadata("asin", asin, false, true) => would be metadata=>false, private_data=>true.
>>>>>>
>>>>>> Can you confirm from your memory whether this is a sane thing to do, Ross? Not entirely sure what private_data is intended for. Wonder if just setting metadata to false, but not using private_data, is sufficient to avoid confusing Umlaut into thinking this is actually an OpenURL referent value?
>>>>>>
>>>>>> Jonathan
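The "waves" idea in the oldest message of this thread can be sketched roughly as follows. This is a hedged illustration, not Umlaut code: ReferentSketch is a hypothetical in-memory class, the argument order of enhance_metadata simply mirrors the call quoted in the thread (key, value, metadata, private_data), the default values are assumptions, and the sample title and asin are invented. The asin is stored with metadata=false and private_data=false, which is the variant asked about at the end of that message; whether real Umlaut treats that combination as intended is not confirmed here.

```ruby
# Hypothetical in-memory referent; Umlaut's real Referent is database-backed.
class ReferentSketch
  def initialize
    @rows = []
  end

  # Argument order mirrors the call quoted in the thread; defaults are assumed.
  def enhance_metadata(key, value, metadata = true, private_data = false)
    @rows << { key: key, value: value, metadata: metadata, private_data: private_data }
  end

  # First stored value for a key, or nil if nothing was stored.
  def value_for(key)
    row = @rows.find { |r| r[:key] == key }
    row && row[:value]
  end

  # Keys stored for internal use only (not OpenURL metadata, not private data).
  def internal_keys
    @rows.reject { |r| r[:metadata] || r[:private_data] }.map { |r| r[:key] }
  end
end

referent = ReferentSketch.new

# Wave 1 (foreground): pretend the fast web-service call returned these values.
referent.enhance_metadata("btitle", "Example Title")
referent.enhance_metadata("asin", "B000EXAMPLE", false, false)

# Wave 2 (later): the slower step reuses the stored asin instead of looking it up again.
asin = referent.value_for("asin")
puts "later wave would fetch details for #{asin}"
```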