From rochkind at jhu.edu Mon Feb 11 11:02:16 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 11 Feb 2008 11:02:16 -0500
Subject: [Blacklight-development] blacklight questiosn
Message-ID: <47B07188.9030103@jhu.edu>

Hi all, I've started taking a look at the Blacklight code. Very nice clean code, nice job. [ Although some more comments wouldn't have hurt! :) ]

There are a few things I'm having trouble figuring out, and some questions about features and if you considered them or implemented them or not.

I see you store the entire marc record in a field, and use that stored MARC for display, the index fields are just used for indexing. Which makes sense; I understand that right?

I don't understand how the default_controller.rb comes into play. Nothing I know about Rails says anything about a "default controller" being part of rails--and I don't see any of the other controllers extending or including it---so what is this file used for, if anything? Is it leftover from a previous version, and should be deleted?

I see that I don't think you've handled 'status' (checked out, available, etc.) at all, true?

Do you handle serial holdings at all (ie, showing what the ranges of coverage or individual volumes held are for a serial, right in Blacklight?).

Have you tried to tackle highlighting of _where_ in the record your search matched, at all? Do you know what I mean?

You don't support any kind of fielded search except for limiting by facets right now, right? No way to enter a fielded keyword search?

And in general, I'm having trouble understanding how the Flare code works exactly, what is the heart of both the indexing and the display, yes? Any advice for understanding what's going on there? I admit I haven't done much with SOLR; if I understand SOLR, will the Flare API map pretty obviously to SOLR functions?

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu

From erikhatcher at mac.com Tue Feb 12 05:52:41 2008
From: erikhatcher at mac.com (Erik Hatcher)
Date: Tue, 12 Feb 2008 05:52:41 -0500
Subject: [Blacklight-development] blacklight questiosn
In-Reply-To: <47B07188.9030103@jhu.edu>
References: <47B07188.9030103@jhu.edu>
Message-ID: <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com>

(I noticed reply goes to the sender - I wonder if we can configure it such that replies go to the list? Bess?)

I'll take a stab at replying....

On Feb 11, 2008, at 11:02 AM, Jonathan Rochkind wrote:
> Hi all, I've started taking a look at the Blacklight code. Very nice
> clean code, nice job. [ Although some more comments wouldn't have
> hurt! :) ]

ha!

> I see you store the entire marc record in a field, and use that stored
> MARC for display, the index fields are just used for indexing. Which
> makes sense; I understand that right?

Yup.

> I don't understand how the default_controller.rb comes into play.
> Nothing I know about Rails says anything about a "default controller"
> being part of rails--and I don't see any of the other controllers
> extending or including it---so what is this file used for, if anything?
> Is it leftover from a previous version, and should be deleted?

Looks like a leftover to me also, and can be removed.

> I see that I don't think you've handled 'status' (checked out,
> available, etc.) at all, true?

Status is not currently handled - that is correct.

> Do you handle serial holdings at all
> (ie, showing what the ranges of coverage or individual volumes held are
> for a serial, right in Blacklight?).

No, that is not handled either. If that info is in the MARC record, and this is a display-only sort of need, it should be fairly straightforward to address this. I know nothing about that particular corner of MARC though.

> Have you tried to tackle highlighting of _where_ in the record your
> search matched, at all? Do you know what I mean?

Highlighting is possible, and in fact the underlying query made to Solr is requesting highlighting (search keyword highlighting, I mean here). Any field could be selectively highlighted. Highlighting is not currently being rendered though - it was removed from my initial prototype in its current incarnation. You can see it commented out in the views/catalog/_record.rhtml partial.

> You don't support any kind of fielded search except for limiting by
> facets right now, right? No way to enter a fielded keyword search?

Not true. The full Solr/Lucene QueryParser is at your disposal because Solr's standard request handler is being used (it really should be migrated to using DisMax for better relevancy tuning - a related topic because that then eliminates user-entered fielded searches).

But currently, if you know the field names, you can use them using this syntax:

Example: author_text:jonathan

The field names are not the most intuitive for end users though, so that is an issue.

> And in general, I'm having trouble understanding how the Flare code
> works exactly, what is the heart of both the indexing and the display,
> yes? Any advice for understanding what's going on there? I admit I
> haven't done much with SOLR; if I understand SOLR, will the Flare API
> map pretty obviously to SOLR functions?

The Flare code is not quite clear to me either :) It was taken on and morphed greatly after my departure, so it is its own beast I'm unfamiliar with.

The heart of the indexing code is scripts/virgo_marc_map.rb - the mapping from MARC to Solr. Flare is not involved at all in indexing, if that is what you're asking.
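
To make the fielded-search syntax Erik mentions above a little more concrete, here is a minimal sketch of how such a query could be sent to Solr's select handler with highlighting requested. It is not taken from the Blacklight or Flare code: the endpoint URL and the helper name fielded_search_url are assumptions made for illustration, while q, hl and hl.fl are standard Solr request parameters and author_text is the field name from Erik's example.

```ruby
require "uri"

# Hypothetical Solr endpoint; adjust host, port, and path for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Build a select URL for a fielded query such as author_text:jonathan.
# field and term are joined with Lucene's field:term syntax, and hl/hl.fl
# ask Solr to return highlighting snippets for that same field.
# (Hypothetical helper, not part of Blacklight or Flare.)
def fielded_search_url(field, term, base = SOLR_SELECT)
  params = {
    "q"     => "#{field}:#{term}",
    "hl"    => "true",
    "hl.fl" => field
  }
  "#{base}?#{URI.encode_www_form(params)}"
end

puts fielded_search_url("author_text", "jonathan")
```

The sketch only URL-encodes the query; it does not escape Lucene special characters in the term, so input typed by end users would need extra handling.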

Flare is used purely for searching/displaying results, and keep in mind the "Flare" we speak of here is Blacklight's own forked and highly customized version of what is in the open source codebase of Solr's client/ruby/flare svn tree - no relationship at all at the moment.

Best advice for understanding what is going on in code is, as always, to start with the unit tests :) There appears to be a decent set of tests in scripts/test/mapping_test.rb - though I haven't run it myself.

Maybe we can get Matt, forker of Flare, to chime in on this topic too.

Erik

From rochkind at jhu.edu Tue Feb 12 10:42:34 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Tue, 12 Feb 2008 10:42:34 -0500
Subject: [Blacklight-development] blacklight questiosn
In-Reply-To: <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com>
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com>
Message-ID: <47B1BE6A.7060203@jhu.edu>

Thanks, that's helpful! [And yes, can someone change the list to reply to list, not reply to sender? I didn't get this message previously]

MARC serial holdings handling is... a mess. But yeah, based on what's there, I think I see how one would add it. It would not necessarily be easy--especially because serial holdings often make another layer of object in the object model. It's no longer just 'records' which hold items, it's title-level records that have a one-to-many with 'copy' level records, which in turn have a one-to-many with 'items'. But anyway.

Yeah, I get that flare isn't used for indexing, just search and display. I'll look at the unit tests, good advice. And thanks for warning me that the 'flare' in Blacklight is actually a forked version of an open source project. (What motivated THAT? Why not send your patches back to the real flare?).

If the existing blacklight developers wanted to do one thing to make blacklight easier for others to work with and adopt---document your fork of flare, please! Just do it with rdoc-able comments.
A couple lines about what each class represents, a sentence or two (if necessary) for each method, and some examples.

Jonathan

> _______________________________________________
> Blacklight-development mailing list
> Blacklight-development at rubyforge.org
> http://rubyforge.org/mailman/listinfo/blacklight-development

From rochkind at jhu.edu Thu Feb 14 16:32:39 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Thu, 14 Feb 2008 16:32:39 -0500
Subject: [Blacklight-development] solr schema; facetting vs searching
In-Reply-To: <47B1BE6A.7060203@jhu.edu>
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com> <47B1BE6A.7060203@jhu.edu>
Message-ID: <47B4B377.2040809@jhu.edu>

It seems from reading the solr wiki that if you want a field to be used for faceting, and ALSO individually searchable as a fielded search, you should index it twice. Does the Blacklight indexing and interface code currently do this, indexing some fields (say, "subject" stuff) as facets AND as searchable text fields? Looks like if you want to sort on it too, you might want it in yet a third version.

Can anyone point me to what part of the Blacklight code I might want to look at to see this in practice, both for indexing and on the interface end?
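
As an illustration of the "index it twice, or three times" idea in the question above, here is a minimal sketch of an indexer step that emits the same source values under separate Solr fields for faceting, keyword search, and sorting. It is not Blacklight's indexer: the helper solr_doc and the field names subject_facet, subject_text and title_sort are hypothetical, and a real schema.xml would need matching field types (an untokenized string type for the facet and sort fields, an analyzed text type for the search field).

```ruby
# Build one Solr document (as a plain Hash) from already-extracted values.
# Hypothetical field names: *_facet for exact-value faceting, *_text for
# tokenized keyword search, *_sort for a single sortable value.
def solr_doc(id, title, subjects)
  {
    "id"            => id,
    "subject_facet" => subjects,   # exact strings, one facet value each
    "subject_text"  => subjects,   # same strings, analyzed at index time
    # One value per document: lowercased, with a leading article dropped.
    "title_sort"    => title.downcase.sub(/\A(the|an|a)\s+/, "")
  }
end

doc = solr_doc("u1", "The Art of War", ["Strategy", "Military art and science"])
puts doc.inspect
```

The values sent for the facet and text fields are identical; the difference lies entirely in how the schema analyzes each field, which is why the copy could also be done on the Solr side rather than in the indexer.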

Jonathan

From erikhatcher at mac.com Fri Feb 15 14:56:22 2008
From: erikhatcher at mac.com (Erik Hatcher)
Date: Fri, 15 Feb 2008 14:56:22 -0500
Subject: [Blacklight-development] solr schema; facetting vs searching
In-Reply-To: <47B4B377.2040809@jhu.edu>
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com> <47B1BE6A.7060203@jhu.edu> <47B4B377.2040809@jhu.edu>
Message-ID:

Jonathan - yes, that is correct about facet fields and full-text searchable fields. Making one for each purpose, indexed/tokenized differently, is how to do it. Blacklight does not do this currently though. All field text, however, is tossed into a searchable "text" field, so you can search on practically anything (readable) in the MARC record (the to_s of it) in a full-text fashion.

And yes, depending on how you want to sort, and whether you're dealing with single-valued or multivalued facet fields, you may also need to create a sortable field - a single value per document, in the lexicographical format you want sorted by. And no, Blacklight doesn't do sortable fields yet either, that I'm aware of.

Blacklight is pure and simple - MARC -> Ruby indexer -> Solr <- RoR search interface. Not really any other bells and whistles lurking underneath, other than the goodies extracted by the indexer.

Erik

From rochkind at jhu.edu Fri Feb 15 15:00:35 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Fri, 15 Feb 2008 15:00:35 -0500
Subject: [Blacklight-development] solr schema; facetting vs searching
In-Reply-To:
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com> <47B1BE6A.7060203@jhu.edu> <47B4B377.2040809@jhu.edu>
Message-ID: <47B5EF63.9030102@jhu.edu>

[ Can someone please change the list to reply-to list? ]

Perfect, thanks. Before I was asking if you could do a fielded search in the current blacklight. You said, yes, using lucene syntax, if you knew the name of the field you wanted to search. But it sounds like most of the fields you might want to search actually aren't stored in the index in a suitable way at present. But I also realize it would be a pretty easy change to the indexer to change this. Just trying to make sure I understand what's going on!

If Blacklight doesn't do sortable fields yet.... how does Blacklight sort search results?

Jonathan

From erikhatcher at mac.com Sat Feb 16 04:16:29 2008
From: erikhatcher at mac.com (Erik Hatcher)
Date: Sat, 16 Feb 2008 04:16:29 -0500
Subject: [Blacklight-development] solr schema; facetting vs searching
In-Reply-To: <47B5EF63.9030102@jhu.edu>
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com> <47B1BE6A.7060203@jhu.edu> <47B4B377.2040809@jhu.edu> <47B5EF63.9030102@jhu.edu>
Message-ID:

On Feb 15, 2008, at 3:00 PM, Jonathan Rochkind wrote:
> Perfect, thanks. Before I was asking if you could do a fielded search in
> the current blacklight. You said, yes, using lucene syntax, if you knew
> the name of the field you wanted to search.
> But it sounds like most of
> the fields you might want to search actually aren't stored in the index
> in a suitable way at present.

I wouldn't say that. There are a lot of *_text fields that are surely useful for searching. And take a look at the schema.xml - all fields are being stored currently so they are retrievable/highlightable from the front-end easily.

Have a look at the Lucene index using Luke or Solr's Luke Request Handler - check out the fields indexed and play around. There is more there than meets the eye. Certainly there is much more to MARC than the indexer is extracting currently, though - so lots more can be done.

> If Blacklight doesn't do sortable fields yet.... how does Blacklight
> sort search results?

Depends - if you're purely navigating by facets, the order is in index order (the order the records were indexed).

However, once you add a full-text query, sorting is by.... relevance! Imagine that :)

Erik

From goodieboy at gmail.com Sat Feb 16 14:28:35 2008
From: goodieboy at gmail.com (Matt M.)
Date: Sat, 16 Feb 2008 14:28:35 -0500
Subject: [Blacklight-development] Hello!
Message-ID:

I've joined the list! :)

I want to get in and comment on some of the questions here, seems like Erik has done a great job at that so far. I'll get back asap Monday. Until then, just wanted to let everyone know that a much better Flare is coming. The code is a lot more simple, documented and tested. Please just hold on a couple more days! My intention was never to really fork Flare, I had a limited amount of time to get some things working that the original Flare did not provide. I would *love* to create a patch and get this merged into the real Flare. I'll admit though, I love the title "forker of flare". That's so cool!

Also, I've recently found a great way to get holdings information using Z39.50 and created a nice little helper class. It uses YAZ, and ruby-marc.
You can do things like:

    # Z3950 is the helper class described above (built on YAZ and ruby-marc)
    s = Z3950.connect('virgo.lib.virginia.edu', 2020, 'unicorn')
    record = s.find_by_ckey('2993872')
    # each holding exposes availability and location for one item
    record.holdings.each do |item|
      item.available?
      item.current_location
      item.home_location
    end

The problem is, the solution is very dependent on what our Z39.50 server is spitting out. The field we use for holdings is 926 I think. You can customize your server as needed. It'd be great to get that documented somewhere so everyone could use this functionality.

Cheers!
- m

From rochkind at jhu.edu Mon Feb 18 10:36:40 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 18 Feb 2008 10:36:40 -0500
Subject: [Blacklight-development] solr schema; facetting vs searching
In-Reply-To:
References: <47B07188.9030103@jhu.edu> <8D9C52C8-A282-4FF0-AA18-F6CF828C5314@mac.com> <47B1BE6A.7060203@jhu.edu> <47B4B377.2040809@jhu.edu> <47B5EF63.9030102@jhu.edu>
Message-ID: <47B9A608.8030407@jhu.edu>

Hmm, I guess I'm confused as to how to figure out which SOLR/lucene fields are suitable for which use. I'll look around with it more in the ways you say, get Luke going and such.

Jonathan.

From rochkind at jhu.edu Mon Feb 18 10:42:14 2008
From: rochkind at jhu.edu (Jonathan Rochkind)
Date: Mon, 18 Feb 2008 10:42:14 -0500
Subject: [Blacklight-development] Hello!
In-Reply-To:
References:
Message-ID: <47B9A756.50008@jhu.edu>

Awesome, very pleased to hear that.

I guess I owe those of you who don't know me an introduction. I am an engineer at Johns Hopkins Libraries. We are currently formally 'investigating' several open source 'next generation discovery layer' tools. Blacklight being just one. So I am not moving forward aggressively with Blacklight or anything; rather we in the systems department have been directed to get some knowledge about what is out there, feasibility of adopting it here, strengths and weaknesses of available solutions, etc. I'm currently spending about one day a week on such efforts, currently focused on blacklight.

So the moral of the story is, I'm in no particular rush, and certainly can wait a few more days for something from you. :)

As far as holdings--I don't think there's any way to get around being very instance-dependent. That is, not only does it need to be different from one ILS vendor to another---but often from one location of a given ILS product to another!
This is what I've found in my explorations with that stuff for other purposes. So the best bet is writing software that's easily configurable and extendable--the trick of course is figuring out the right architecture for that, cleverly abstracting out what really will be common to them all to eliminate duplication of logic.

With Horizon that I've got, I actually found I do better avoiding the z39.50 server and using an XML layer that the Horizon OPAC provides.

Jonathan
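
The "easily configurable and extendable" holdings architecture Jonathan describes above could be sketched roughly as follows: a common item interface (reusing the method names from Matt's example, available?, current_location and home_location) with site-specific sources plugged in behind a small registry. Everything in this sketch is hypothetical, including the HoldingsSources registry, the StaticSource class, and the canned data; a real source would do the Z39.50 or OPAC XML lookup for its own ILS.

```ruby
# One item-level holding, using the attribute names from Matt's example.
Holding = Struct.new(:current_location, :home_location, :checked_out) do
  def available?
    !checked_out
  end
end

# Hypothetical registry of holdings sources, keyed by a site-chosen name.
module HoldingsSources
  @sources = {}

  def self.register(name, source)
    @sources[name] = source
  end

  def self.fetch(name)
    @sources.fetch(name) { raise ArgumentError, "no holdings source for #{name}" }
  end
end

# Stand-in source returning canned data; a real one would query the ILS.
class StaticSource
  def initialize(data)
    @data = data
  end

  # Returns an array of Holding objects (empty if the record is unknown).
  def holdings_for(record_id)
    @data.fetch(record_id, [])
  end
end

HoldingsSources.register(:demo, StaticSource.new({
  "2993872" => [Holding.new("STACKS", "STACKS", false),
                Holding.new("CHECKEDOUT", "STACKS", true)]
}))

items = HoldingsSources.fetch(:demo).holdings_for("2993872")
items.each do |item|
  puts "#{item.home_location}: #{item.available? ? 'available' : 'not available'}"
end
```

The display code only ever talks to holdings_for and the Holding methods, so swapping Z39.50 for an XML layer (or one ILS for another) would mean registering a different source rather than changing the views.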