The markup and cross-referencing algorithm is broken, which causes a number of subtle typesetting bugs. The algorithm
scans over text and matches the words in the text against a series of regular expressions, in order to determine whether
a word is normal text or some kind of markup. One kind of markup is “special” (user-defined) markup, which includes
code cross-references. The problem is that not all text flagged as a cross-reference is in fact a cross-reference.
The regular expressions used to detect cross-references produce many false positives (the callback associated
with cross-references deals with them). The regular expressions *cannot* be tweaked to avoid this.
One reason for this is that rdoc supports Fortran 95 (f95) files. In f95, a "class" name can consist solely of lowercase
characters. A word like "the" could therefore be a class name as far as f95 is concerned. Thus, in text
like ‘the cat jumped’, “the”, “cat”, and “jumped” are all potential cross-references. Cross-referencing
hinders the typesetting features (turning single quotes into opening or closing single quotes, for instance) because
words get split up (e.g., can't becomes "can" (possible cross-reference), "'" (text), "t"
(possible cross-reference)). The markup engine cannot handle applying typesetting to cross-references, even to false
positive cross-references. This is the reason that rdoc's HTML apostrophe typesetting has been broken for years; all
the typesetting rule ever gets is a solitary ' character (due to the surrounding text being flagged as potential
cross-references), which is not enough to decide whether the character should be converted into an apostrophe. There
are several possible solutions:
** When initially classifying text, expand all "special" markup (including cross-references). If the expansion
does not change the text (the case for false positives), then the text is not really special.
** When trying to apply a typesetting rule in the presence of special markup (e.g., “ConfigToolkit’s”), provide
access to the textual contents of the markup so that any typesetting rules can be applied properly.
** Stop implementing cross-referencing through special markup and instead implement it as a core part of the markup
engine (eliminating the false positives).
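The first option can be sketched roughly as follows. This is a minimal illustration, not rdoc code: KNOWN_NAMES, expand_crossref, and classify are hypothetical names, and the lookup table stands in for whatever rdoc uses to resolve a name.

```ruby
# Hypothetical sketch: none of these names come from rdoc itself.
# KNOWN_NAMES stands in for the real lookup of documented classes.
KNOWN_NAMES = { "ConfigToolkit" => "classes/ConfigToolkit.html" }.freeze

# Expands a candidate cross-reference. The word comes back unchanged
# when it does not name anything known (a false positive).
def expand_crossref(word)
  target = KNOWN_NAMES[word]
  target ? %(<a href="#{target}">#{word}</a>) : word
end

# Treats a word as special markup only if expansion actually changes it.
def classify(word)
  expand_crossref(word) == word ? :text : :special
end

classify("the")           # => :text
classify("ConfigToolkit") # => :special
```

With this approach a word like "the" would stay ordinary text, so typesetting rules would still see it together with its neighbours.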
I fixed some of the most glaring typesetting issues
in http://rubyforge.org/tracker/index.php?func=detail&aid=2709&group_id=627&atid=2472 but there are many
more. “(c)” cannot be converted properly into the copyright symbol, for instance, because “c” is considered
a potential cross-reference.
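The splitting described above can be reproduced with a small sketch. The pattern here is a deliberately simplified assumption (any run of letters is a candidate), not rdoc's actual regular expression, and split_tokens is a hypothetical helper.

```ruby
# Simplified stand-in for cross-reference detection (NOT rdoc's real regex):
# once all-lowercase class names are allowed, any run of letters is a candidate.
def split_tokens(text)
  text.scan(/[A-Za-z]+|[^A-Za-z]+/).map do |piece|
    kind = piece.match?(/\A[A-Za-z]+\z/) ? :candidate : :text
    [kind, piece]
  end
end

split_tokens("can't")
# => [[:candidate, "can"], [:text, "'"], [:candidate, "t"]]
split_tokens("(c)")
# => [[:text, "("], [:candidate, "c"], [:text, ")"]]
```

In both cases a typesetting rule that only sees the :text pieces receives a lone "'" or a lone "(" and ")", which is not enough context to choose an apostrophe or a copyright symbol.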
There are other serious issues with the markup engine. The cross-referencing engine is built so that each potential
cross-reference is matched against the same regular expression twice, once within the markup engine and once with
RDoc::MarkUp::ToHtmlCrossref. This is inefficient and inelegant. Instead, each type of cross-reference should have
its own callback. This would also easily accommodate different processing for each type of cross-reference. For instance,
the current cross-reference callback assumes that words consisting of all lowercase letters cannot be cross-references,
which prevents some Fortran 95 classes from being cross-referenced properly. Instead, the logic should only prevent
method names of all lowercase letters from being made into cross-references.
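One way to picture per-type callbacks is sketched below. Everything in it is an assumption made for illustration: the CrossrefSketch module, the two patterns, the known-name lists, and the link format are invented and do not reflect rdoc's real classes or expressions.

```ruby
# Hypothetical sketch of one pattern and one callback per cross-reference type.
module CrossrefSketch
  KNOWN_CLASSES = %w[ConfigToolkit shapes].freeze  # "shapes": a lowercase, Fortran-style name
  KNOWN_METHODS = %w[run parse_file].freeze

  def self.link(word)
    %(<a href="#{word}.html">#{word}</a>)
  end

  HANDLERS = {
    # Class names may be all lowercase (Fortran 95), so only the lookup decides.
    class_ref:  [/\A[A-Za-z]\w*\z/, ->(w) { link(w) if KNOWN_CLASSES.include?(w) }],
    # Only method names made up solely of lowercase letters are refused here.
    method_ref: [/\A\w+\z/, ->(w) { link(w) if KNOWN_METHODS.include?(w) && w !~ /\A[a-z]+\z/ }]
  }.freeze

  # Each pattern is tested once; the callback for that type decides the result.
  def self.convert(word)
    HANDLERS.each_value do |pattern, callback|
      next unless word =~ pattern
      result = callback.call(word)
      return result if result
    end
    word
  end
end

CrossrefSketch.convert("shapes")      # linked: a known lowercase class
CrossrefSketch.convert("run")         # left alone: all-lowercase method name
CrossrefSketch.convert("parse_file")  # linked: a known method that is not all lowercase letters
```

Keeping the lowercase restriction inside the method callback means the class callback is free to link lowercase Fortran 95 names.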
The code in the markup engine also needs refactoring. The special markup, for instance, is set through instance methods
but stored in class constants, which makes for very confusing usage (you add special markup to one instance without
realizing that it affects all instances; I ran into this while creating unit tests). Also, there is a lot of code
duplication in lib/rdoc/markup (the convert_special method, for instance, is defined identically in to_html.rb, to_latex.rb,
and to_flow.rb).
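The shared-state surprise can be reproduced in a few lines. The classes below are hypothetical stand-ins written for this illustration, not the actual rdoc markup classes; the second class shows one possible alternative, per-instance storage.

```ruby
# Hypothetical class reproducing the pattern: instance method, class-level storage.
class SharedMarkup
  SPECIAL = []  # a single array shared by every instance

  def add_special(pattern, name)
    SPECIAL << [pattern, name]
  end

  def specials
    SPECIAL
  end
end

# Possible alternative: each instance owns its own list.
class IsolatedMarkup
  attr_reader :specials

  def initialize
    @specials = []
  end

  def add_special(pattern, name)
    @specials << [pattern, name]
  end
end

a = SharedMarkup.new
b = SharedMarkup.new
a.add_special(/\bFoo\b/, :CROSSREF)
b.specials.length  # => 1, although nothing was added through b

c = IsolatedMarkup.new
d = IsolatedMarkup.new
c.add_special(/\bFoo\b/, :CROSSREF)
d.specials.length  # => 0
```

With per-instance storage, a unit test can register special markup on one object without changing the behaviour of objects created in other tests.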