[Kramdown-users] More HTML corruption
sunshine at sunshineco.com
Fri Dec 4 03:34:04 EST 2009
Thomas Leitner wrote:
> kramdown seems to corrupt HTML in this case but it is actually only
> following its rules because it can only recognize block HTML tags at
> *the start of a line*!
Does this need to be the case for the closing tag? One would think that
for raw HTML, the closing tag could be recognized at other than start of
line. Even when processing Markdown within a block (due to markdown="1",
for instance), I suspect that, in the most common case, a user would
expect the closing tag to be recognized properly rather than being
escaped via HTML character references.
> So if you change the above to
> kramdown will give you the correct output.
I was aware of this corrective transformation, but the documentation is
ambiguous (to me at least) and the situation somewhat counterintuitive.
By counterintuitive, I mean that if the following works as expected:
where the closing </dd> is recognized at other than start of line, then
one easily assumes that the similar case will work:
(The example is contrived, but in the real world, the <dd>...</dd> was
emitted by ERB and just happened to be long enough that it got wrapped
over several lines.)
> Refer to the syntax
> documentation for this case (i.e. a HTML block is only started when a
> line is started with a non-span HTML tag).
Indeed, the documentation states that an "HTML block is started when
kramdown encounters a line beginning with [a non-span] HTML tag";
however, it does not say that the closing tag need start at the beginning
of line. Rather, it states ambiguously that "the HTML block continues
till the HTML block line with the corresponding closing tag", which I
read as meaning only that the line must contain the closing tag, not
that the closing tag need start the line. I'm not being intentionally
pedantic, but rather relating how I interpreted the documentation.
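
To make the two readings concrete, here is a toy sketch in Python. It is
not kramdown's parser, and the function name closing_line, the flag
every_line, and the sample input are all made up for illustration. It
assumes the strict reading means "only lines that begin with an HTML tag
are examined for the closing tag", and the permissive reading means "any
line containing the closing tag ends the block". Nesting is ignored.

```python
import re


def closing_line(lines, tag, every_line=False):
    """Return the index of the line that closes an HTML block, or None.

    every_line=False: strict reading, only lines that begin with an HTML
                      tag are examined for the closing tag.
    every_line=True:  permissive reading, every line is examined.
    Toy model only; nested tags of the same name are not handled.
    """
    closing = re.compile(r"</" + re.escape(tag) + r"\s*>")
    for index, line in enumerate(lines):
        starts_with_tag = line.lstrip().startswith("<")
        if (every_line or starts_with_tag) and closing.search(line):
            return index
    return None


# Hypothetical input: a <dd> element wrapped over two lines, followed by
# a blank line and a level-2 header.
wrapped = [
    "<dd>a long definition that was",
    "wrapped over several lines</dd>",
    "",
    "## Header",
]

# Hypothetical input: the same element on a single line.
single = ["<dd>short definition</dd>", "", "## Header"]

print(closing_line(wrapped, "dd"))                   # strict: no closing line
print(closing_line(wrapped, "dd", every_line=True))  # permissive: line index 1
print(closing_line(single, "dd"))                    # both readings: index 0
```

Under the strict reading the wrapped element is never closed, so the
scan runs on into the header, which is the kind of surprise described
in this thread.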
> However, when I think about it this behaviour could be changed when
> parsing raw HTML, i.e. when parsing the content of an HTML block as
> span or block level elements this behaviour can't be avoided. However,
> when parsing as raw HTML kramdown could just parse every line till the
> end and not just HTML block lines... With this change your original
> example would work.
> So, would this be useful? It would certainly slow down the HTML
Performance issues aside, the current behavior has a high "surprise
factor". The input was well-formed HTML and appeared to be
well-delimited (that is, visually separated from the following level-2
header), so the malformed result was startling (and would not pass HTML
validation). If the intention of Markdown is to be unobtrusive, yet
permissive enough to allow HTML where needed, then it might be
reasonable somehow to cater to this sort of situation.
Of course, the original Markdown implementation side-stepped these ugly
corner cases by being restrictive and inflexible with regard to HTML
block elements. More feature-laden processors, however, may need to deal
with these issues if they wish to retain the friendliness of the original.