Hi Jens,<br><br>great stuff. I just installed it and ran a short test as described in the README. It works as announced. Thanks for sharing this! The crawler has problems with frames, but that is quite a common problem. I had to point it at the main content frame.
<br><br>You probably know Nutch already, but here is a pointer anyway: <a href="http://lucene.apache.org/nutch/">http://lucene.apache.org/nutch/</a>, in case you're looking for some inspiration. Nutch is a great tool for web crawling. I've used it and it worked great...
<br><br>Best Regards<br>Jan Prill<br><br><div><span class="gmail_quote">On 3/25/06, <b class="gmail_sendername">Jens Kraemer</b> &lt;<a href="mailto:kraemer@webit.de">kraemer@webit.de</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi!<br><br>RDig is a small tool to build a Ferret index for the contents of a<br>website or intranet. It contains a simple HTTP crawler and some support<br>for extracting textual content from the fetched pages.<br><br>I built this to implement a site-wide search for a recent project
<br>that combined a Rails application with lots of static HTML files<br>generated by a CMS.<br><br>Any feedback is very welcome!<br><br>Rubyforge project page: <a href="http://rubyforge.org/projects/rdig">http://rubyforge.org/projects/rdig</a><br>RDocs: <a href="http://rdig.rubyforge.org/">http://rdig.rubyforge.org/</a><br><br>`gem install rdig` should work once the gem has reached the rubyforge<br>mirrors.<br><br><br>Jens<br><br>--<br>webit! Gesellschaft für neue Medien mbH
<a href="http://www.webit.de">www.webit.de</a><br>Dipl.-Wirtschaftsingenieur Jens Krämer <a href="mailto:kraemer@webit.de">kraemer@webit.de</a><br>Schnorrstraße 76 Tel +49 351 46766 0<br>D-01069 Dresden Fax +49 351 46766 66
<br>_______________________________________________<br>Ferret-talk mailing list<br><a href="mailto:Ferret-talk@rubyforge.org">Ferret-talk@rubyforge.org</a><br><a href="http://rubyforge.org/mailman/listinfo/ferret-talk">http://rubyforge.org/mailman/listinfo/ferret-talk</a><br></blockquote></div><br>