[Kramdown-users] More HTML corruption

Eric Sunshine sunshine at sunshineco.com
Fri Dec 4 03:34:04 EST 2009


Hi Thomas,

Thomas Leitner wrote:
> kramdown seems to corrupt HTML in this case but it is actually only
> following its rules because it can only recognize block HTML tags at
> *the start of a line*!

Does this need to be the case for the closing tag? One would think that 
for raw HTML, the closing tag could be recognized at other than start of 
line. Even when processing Markdown within a block (due to markdown="1", 
for instance), I suspect that, in the most common case, a user would 
expect the closing tag to be recognized properly rather than being 
escaped via HTML character references.

> So if you change the above to
>     <dl>
>     <dt>Moo</dt>
>     <dd>
>     Bar
>     </dd>
>     </dl>
> kramdown will give you the correct output.

I was aware of this corrective transformation, but the documentation is 
ambiguous (to me at least) and the situation somewhat counterintuitive. 
By counterintuitive, I mean that if the following works as expected:

   <dd>Foo Bar</dd>

where the closing </dd> is recognized at other than start of line, then 
one easily assumes that the similar case will work:

   <dd>Foo
   Bar</dd>

(The example is contrived, but in the real world, the <dd>...</dd> was 
emitted by ERB and just happened to be long enough that it got wrapped 
over several lines.)

> Refer to the syntax
> documentation for this case (i.e. a HTML block is only started when a
> line is started with a non-span HTML tag).

Indeed, the documentation states that an "HTML block is started when 
kramdown encounters a line beginning with [a non-span] HTML tag", 
however it does not say that the closing tag need start at the beginning 
of line. Rather, it states ambiguously that "the HTML block continues 
till the HTML block line with the corresponding closing tag", which I 
read as meaning only that the line must contain the closing tag, not 
that the closing tag need start the line. I'm not being intentionally 
pedantic, but rather relating how I interpreted the documentation.
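
To make the two readings concrete, here is a toy Ruby sketch. It is not 
kramdown's parser and does not claim to reproduce its behavior; the 
method name closing_line_index and the anywhere: keyword are 
hypothetical, invented for illustration. It only searches for the line 
that would end a block under each reading, and it ignores nesting and 
the case where the opening and closing tags share one line.

```ruby
# Toy illustration only: this is NOT kramdown's parser.
# Returns the index of the first line that would end an HTML block
# opened by <tag>, or nil if no line qualifies under the chosen reading.
def closing_line_index(lines, tag, anywhere:)
  closer = "</#{tag}>"
  lines.index do |line|
    if anywhere
      # Reading A: the line merely has to contain the closing tag.
      line.include?(closer)
    else
      # Reading B: the closing tag has to start the line.
      line.lstrip.start_with?(closer)
    end
  end
end

wrapped = ["<dd>Foo", "Bar</dd>"]   # closing tag in mid-line
split   = ["<dd>", "Bar", "</dd>"]  # closing tag on its own line

p closing_line_index(wrapped, "dd", anywhere: true)   # reading A finds line 1
p closing_line_index(wrapped, "dd", anywhere: false)  # reading B finds nothing
p closing_line_index(split,   "dd", anywhere: false)  # reading B finds line 2
```

Under the start-of-line reading, the wrapped form never finds its 
closing tag, which is consistent with the closing tag ending up escaped 
in the output.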

> However, when I think about it this behaviour could be changed when
> parsing raw HTML, i.e. when parsing the content of an HTML block as
> span or block level elements this behaviour can't be avoided. However,
> when parsing as raw HTML kramdown could just parse every line till the
> end and not just HTML block lines... With this changes your original
> example would work.
> So, would this be useful? It would certainly slow down the HTML
> parser...

Performance issues aside, the current behavior has a high "surprise 
factor". The input was well-formed HTML and appeared to be 
well-delimited (that is, visually separated from the following level-2 
header), so the malformed result was startling (and would not pass HTML 
validation). If the intention of Markdown is to be unobtrusive, yet 
permissive enough to allow HTML where needed, then it might be 
reasonable somehow to cater to this sort of situation.

Of course, the original Markdown implementation side-stepped these ugly 
corner cases by being restrictive and inflexible with regard to HTML 
block elements. More feature-laden processors, however, may need to deal 
with these issues if they wish to retain the friendliness of the original.

-- ES

