[kramdown-users] possible bug: kramdown wrapping <br> in <p>

Matt Neuburg matt at tidbits.com
Wed Aug 11 11:37:03 EDT 2010

Actually, now that I am a firm and enthusiastic convert to kramdown, I
believe (thanks to Thomas's hint) that I understand kramdown's behavior
here, and can defend it.

So, here's how I understand what kramdown is doing in this example. Yes,
kramdown stays out of a <div>. So if I wrap the *entire* expression in a
<div>, it is untouched:

<br />

[Output is identical to input.]

But in my actual example, kramdown divides the input into three regions: the
two <div> pairs, and the stuff between them. The "stuff between them" is not
inside a div, so it is not protected. Thus it is a target for block-level
processing:

<br /> <-- this is unprotected, it is a "kramdown target"

The "contiguous" issue that you raise also misled me at first, but now that
I understand it, I actually like what kramdown is doing. The fact is that
"contiguous" is irrelevant! kramdown isolates the line "<br />" as a target
for processing, not on the basis of what it is touching or not touching, but
on the basis of the overall *structure* - it is a region unprotected by a
surrounding <div>. This is actually very good, because it means that, unlike
Markdown, I don't have to add extra spacing as I embed the document into its
surrounding templated HTML.

Here is another way to think of it. What should kramdown do here?

require 'kramdown'
s = "test"
puts Kramdown::Document.new(s).to_html

I think you will agree that it should surround "test" with <p> tags. And
that is just what it does. Yet there is no space before or after "test";
there isn't even a newline. Well, the "<br />" line in my example is exactly
equivalent to that. It is a single-line processing "document" on its own.

If I am serious about protecting the "<br />", I have ways to do it. I can
surround the entire document with a <div> and just isolate the sections to
be processed with an inner <div markdown="1">. Or I can write:

<br />

If you're going to argue about an edge case, here is an edge case to argue
about:

{::nomarkdown}<br />{:/}

I think what kramdown does in that case is a bit more surprising (it *does*
surround the <br /> with <p> tags), but I can live with it. It is an edge
case! Edge cases are always difficult. I can envision kramdown applying what
amounts to two conflicting rules here, and I can easily see how the <p> rule
might win. The same applies here:

require 'kramdown'
s = "{::nomarkdown}<br />{:/}"
puts Kramdown::Document.new(s).to_html

Same edge case, same result. But I don't think it is worth trying to "fix",
since the workaround is clear; there is syntax to protect the <br /> if
you want to.

PS. Original Markdown's behavior in this regard is irrelevant, because in
fact this is one of the many areas in which Markdown was somewhat
indeterminate. Markdown did *not* always leave HTML alone. Markdown
sometimes used to wrap <div></div> in <p> tags, wrongly (giving
<p><div></div></p>), and I had to add extra post-processing code to detect
and fix these. But kramdown aims to be consistent and predictable (besides
adding power that original Markdown lacks). The chief difference is in
overall approach: Markdown was just doing find-and-replace, but kramdown is
actually ***parsing the HTML structure of the document***.
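To make the kind of post-processing mentioned above concrete, here is a minimal sketch in plain Ruby. It is not the code RubyFrontier actually used: the method name unwrap_block_divs is hypothetical, and the regular expression only handles the simple case of a <p> wrapped directly around one <div> block.

```ruby
# Hypothetical helper (not RubyFrontier's actual code): remove a <p> wrapper
# that sits directly around a single <div>...</div> block.
def unwrap_block_divs(html)
  # (?:(?!</?p\b).)*? stops the match from running across paragraph tags,
  # so an ordinary paragraph that merely precedes a wrapped <div> is untouched.
  html.gsub(%r{<p>\s*(<div\b[^>]*>(?:(?!</?p\b).)*?</div>)\s*</p>}m, '\1')
end

puts unwrap_block_divs("<p><div></div></p>")  # prints <div></div>
puts unwrap_block_divs("<p>plain text</p>")   # prints <p>plain text</p>
```

A find-and-replace fix like this is itself fragile, which is rather the point: it patches the symptom after the fact instead of parsing the structure.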

This fact originally gave me some trouble in switching to kramdown, you may
recall. I was able to hand the document to Markdown at a time when it did
not in fact consist of just text and legal HTML (it also contained ERB
expressions, for example - you may recall that I complained that kramdown
was altering my ERB so that when the ERB parser actually came along, the ERB
itself was wrong). But I have compensated by changing the order of
processing in RubyFrontier so that this is no longer an issue.

My chief remaining issue with kramdown was that it was not merely treating
its HTML parsing as a pass-through: it was disassembling the HTML and
reassembling it, and it was reassembling it differently. Thus for example,
<div manny="moe" hey="ho"> was turning into <div hey="ho" manny="moe">. And
since RubyFrontier was using these sorts of details to pass secret messages
to itself across the stages of processing, kramdown was disrupting those
messages. But Thomas made the order of attributes stable, and so this
problem went away. kramdown *still* does not merely treat its HTML parsing
as a pass-through (for example, it will alter <div hey='ho'> so that the
quotes are double instead of single), and I still think this is wrong, but I
can live with it.
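To illustrate why disassembling and reassembling a tag is not a pass-through, here is a minimal sketch in plain Ruby. It is an illustration only and says nothing about kramdown's internals: the method name reserialize_tag is hypothetical, and the attribute pattern is deliberately simplistic. Once attributes are stored as key/value pairs, the original quote style is gone, and the output order depends on the container (a Ruby Hash preserves insertion order from Ruby 1.9 on, but did not under 1.8).

```ruby
# Hypothetical illustration: parse a start tag's attributes into a Hash and
# write the tag back out. The quote style of the input is not remembered.
def reserialize_tag(tag)
  name = tag[/\A<\s*([\w:-]+)/, 1]
  attrs = {}
  # Accept either double- or single-quoted values; keep only the value.
  tag.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/) do |key, dq, sq|
    attrs[key] = dq || sq
  end
  # Always emit double quotes, in the Hash's iteration order.
  rendered = attrs.map { |key, value| %( #{key}="#{value}") }.join
  "<#{name}#{rendered}>"
end

puts reserialize_tag(%q{<div manny="moe" hey='ho'>})
# prints <div manny="moe" hey="ho">
```

A true pass-through would have to remember the original source text of each tag and not just its parsed attributes.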

On or about 8/11/10 6:26 AM, thus spake "Shawn Van Ittersum"
<svicalifornia at gmail.com>:

> I thought that there was a requirement for a blank line before all paragraphs?
> This is clearly a contiguous block of HTML, and kramdown should leave HTML
> alone, as Markdown does.
> Matt, what does Markdown do when parsing this input?
> Shawn
> On Tue, 10 Aug 2010 11:31:07 +0200, Thomas Leitner wrote:
>> On 2010-08-09 18:52 -0700 Matt Neuburg wrote:
>>> require 'kramdown'
>>> s = <<END
>>> <div>
>>> </div>
>>> <br />
>>> <div>
>>> </div>
>>> END
>>> puts Kramdown::Document.new(s).to_html
>>> The result is:
>>> <div>
>>> </div>
>>> <p><br /></p>
>>> <div>
>>> </div>

matt neuburg, phd = matt at tidbits.com, http://www.tidbits.com/matt/
pantes anthropoi tou eidenai oregontai phusei
Among the 2007 MacTech Top 25, http://tinyurl.com/2rh4pf
AppleScript: the Definitive Guide, 2nd edition
Take Control of Exploring & Customizing Snow Leopard
RubyFrontier! http://www.apeth.com/RubyFrontierDocs/default.html
TidBITS, Mac news and reviews since 1990, http://www.tidbits.com
