Posted By: Tom Link
Date: 2007-10-26 22:31
Summary: websitary 0.3 was released
Project: websitary - Webpage/RSS Monitor
websitary version 0.3 has been released!
* <http://rubyforge.org/projects/websitiary/>
## DESCRIPTION:
websitary (formerly known as websitiary with an extra "i") monitors
webpages, rss feeds, podcasts etc. It reuses other programs (w3m, diff
etc.) to do most of the actual work. By default, it works on an ASCII
basis, i.e. with the output of text-based web browsers like w3m (or lynx,
links etc.), because that output can easily be post-processed. It can also work
with HTML and highlight new items. This script was originally planned as
a ruby-based websec replacement.
By default, this script will use w3m to dump HTML pages and then run
diff over the current page and the previous backup. Some pages are
better viewed with lynx or links. Downloaded documents (HTML or ASCII)
can be post-processed (e.g., filtered through some ruby block that
extracts elements via hpricot and the like). Please see the
configuration options below to find out how to change this globally or
for a single source.
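
The default workflow described above (dump the page with w3m, then diff it against the previous backup) can be sketched roughly as follows. This is only an illustration, not websitary's actual code: the helper names `dump_page`, `diff_texts` and `check_page` are hypothetical, and the sketch assumes that `w3m` and `diff` are installed and on the PATH.

```ruby
require 'open3'
require 'tempfile'

# Hypothetical helper: render a page as plain text via "w3m -dump".
# Assumes w3m is installed and on the PATH.
def dump_page(url)
  out, status = Open3.capture2('w3m', '-dump', url)
  raise "w3m failed for #{url}" unless status.success?
  out
end

# Hypothetical helper: unified diff between two text snapshots.
# Assumes a diff program that understands -u is on the PATH.
# diff exits with 1 when the inputs differ, so the status is ignored here.
def diff_texts(old_text, new_text)
  Tempfile.create('old') do |old_file|
    Tempfile.create('new') do |new_file|
      old_file.write(old_text)
      old_file.flush
      new_file.write(new_text)
      new_file.flush
      out, _status = Open3.capture2('diff', '-u', old_file.path, new_file.path)
      out
    end
  end
end

# Hypothetical driver: dump the page, diff it against the previous
# backup (if any), then store the new dump as the next backup.
def check_page(url, backup_path)
  current  = dump_page(url)
  previous = File.exist?(backup_path) ? File.read(backup_path) : ''
  changes  = diff_texts(previous, current)
  File.write(backup_path, current)
  changes
end
```

An empty string returned by `check_page` would mean that the page has not changed since the last backup.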
This user manual is also available as
PDF[http://websitiary.rubyforge.org/websitary.pdf].
## FEATURES/PROBLEMS:
* Handle webpages, rss feeds (optionally save attachments in podcasts
etc.)
* Compare webpages with previous backups
* Display differences between the current version and the backup
* Provide hooks to post-process the downloaded documents and the diff
* Display a one-page report summarizing all news
* Automatically open the report in your favourite web-browser
* Experimental: Download webpages at defined intervals and generate
incremental diffs.
Changes:
# 0.3
* Renamed the global option :downloadhtml to :download_html.
* The downloader for robots and rss enclosures should now be properly
configurable via the global options :download_robots and
:download_rss_enclosure (default: :openuri).
* Respect rel="nofollow" on hyperlinks.
* Fixed: :wdays and :mdays didn't work.
* Added the --exclude command-line option and the exclude configuration command.
* Check for robots.txt compliance only after testing whether the URL is
appropriate.
* htmldiff.rb can now also highlight differences à la websec's webdiff.
* configuration.rb: Ignore pubDate and certain other non-essential fields (tags
etc.) when constructing rss item IDs.
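
To illustrate the last item: an item ID that ignores volatile fields stays the same when a feed merely republishes an entry with a new date. The following is a minimal sketch of that idea, not the code in configuration.rb. The method name `rss_item_id`, the use of an MD5 digest, and the exact list of ignored fields are assumptions made for this example.

```ruby
require 'digest/md5'

# Assumption: fields treated as non-essential. The release notes name
# only pubDate and "tags etc."; the rest of this list is illustrative.
VOLATILE_FIELDS = %w[pubDate category comments].freeze

# Hypothetical helper: derive a stable ID from an item given as a Hash
# of field name => value, skipping the volatile fields.
def rss_item_id(item)
  stable = item.reject { |key, _value| VOLATILE_FIELDS.include?(key.to_s) }
  # Sort by field name so the ID does not depend on field order.
  canonical = stable.sort_by { |key, _value| key.to_s }
                    .map { |key, value| "#{key}=#{value}" }
                    .join("\n")
  Digest::MD5.hexdigest(canonical)
end

item = {
  'title'   => 'websitary 0.3 was released',
  'link'    => 'http://example.com/news/1',
  'pubDate' => 'Fri, 26 Oct 2007 22:31:00 GMT'
}
republished = item.merge('pubDate' => 'Sat, 27 Oct 2007 08:00:00 GMT')

# Both calls yield the same ID because pubDate is ignored.
puts rss_item_id(item) == rss_item_id(republished)
```

With such an ID, an item whose date changes is not reported as new, while a change to the title or link still produces a different ID.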