[kramdown-users] possible bug: kramdown wrapping <br> in <p>

Shawn Van Ittersum svicalifornia at gmail.com
Wed Aug 11 12:15:02 EDT 2010


I disagree that "contiguous" is irrelevant.  I also see what kramdown is doing, and it's something that developers can figure out.  However, when you or I use kramdown in some other product, its end users are non-developers, who are not going to figure this out.  They are going to copy in a block of HTML from somewhere, expecting it to be left untouched as it would be in Markdown, and they will be very frustrated to see kramdown messing with it.

Contiguous blocks of HTML should be left alone.  kramdown results should not diverge from Markdown results except to add clear benefit, and I fail to see the benefit of this particular divergence.

With regard to your example input of "test": the rule we discussed before was that the first line and any line that followed a blank line would be treated as the start of a new block.  Following this rule, it is fitting to wrap "test" in p tags.  It is not appropriate to apply p tags in the middle of a block of HTML, regardless of the opening and closing of DOM elements within that HTML block.  kramdown should respect the rules of Markdown first.
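
To make the rule concrete, here is a minimal Ruby sketch of it. This is an illustration only, not kramdown's or Markdown's actual parser: the method names split_blocks and render_blocks are made up for this example, and "starts with <" is a crude stand-in for recognizing a block of HTML.

```ruby
# Hypothetical sketch of the blank-line rule; NOT kramdown's real parser.
# A new block starts only at the first line or after a blank line.
def split_blocks(text)
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
end

# Assumption: a block whose first character is "<" is a block of HTML and
# is passed through untouched; any other block is wrapped in <p> tags.
def render_blocks(text)
  rendered = split_blocks(text).map do |block|
    block.start_with?("<") ? block : "<p>#{block}</p>"
  end
  rendered.join("\n\n")
end

# "test" is the first line of the input, so it starts a block and gets <p>.
puts render_blocks("test")

# No blank lines here, so this is one contiguous block of HTML and the
# <br /> line in the middle is never considered as the start of a paragraph.
puts render_blocks("<div>\n</div>\n<br />\n<div>\n</div>\n")
```

Under this sketch, "test" is wrapped in p tags, while the div/br/div input comes back unchanged because no line in it follows a blank line.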

I still don't understand why kramdown has an HTML parser... HTML tags should simply pass through untouched.

Shawn

On Wed, 11 Aug 2010 08:37:03 -0700, Matt Neuburg wrote:
> Actually, now that I am a firm and enthusiastic convert to kramdown, I
> believe (thanks to Thomas's hint) that I understand kramdown's behavior
> here, and can defend it.
> 
> So, here's how I understand what kramdown is doing in this example. Yes,
> kramdown stays out of a <div>. So if I wrap the *entire* expression in a
> <div>, it is untouched:
> 
> <div>
> <div>
> </div>
> <br />
> <div>
> </div>
> </div>
> 
> [Output is identical to input.]
> 
> But in my actual example, kramdown divides the input into three regions: the
> div pairs, and the stuff between them. The "stuff between them" is not
> inside a div, so it is not protected. Thus it is a target for block-level
> processing.
> 
> <div>
> </div>
> <br /> <-- this is unprotected, it is a "kramdown target"
> <div>
> </div>
> 
> The "contiguous" issue that you raise also misled me at first, but now that
> I understand it, I actually like what kramdown is doing. The fact is that
> "contiguous" is irrelevant! kramdown isolates the line "<br />" as a target
> for processing, not on the basis of what it is touching or not touching, but
> on the basis of the overall *structure* - it is a region unprotected by a
> surrounding <div>. This is actually very good, because it means that, unlike
> Markdown, I don't have to add extra spacing as I embed the document into its
> surrounding templated HTML.
> 
> Here is another way to think of it. What should kramdown do here?
> 
> require 'kramdown'
> s = "test"
> puts Kramdown::Document.new(s).to_html
> 
> I think you will agree that it should surround "test" with <p> tags. And
> that is just what it does. Yet there is no space before or after "test";
> there isn't even a newline. Well, the "<br />" line in my example is exactly
> equivalent to that. It is a single-line processing "document" on its own.
> 
> If I am serious about protecting the "<br />", I have ways to do it. I can
> surround the entire document with a <div> and just isolate the sections to
> be processed with an inner <div markdown="1">. Or I can write:
> 
> <div>
> </div>
> {::nomarkdown}
> <br />
> {:/}
> <div>
> </div>
> 
> If you're going to argue about an edge case, here is an edge case to argue
> about:
> 
> <div>
> </div>
> {::nomarkdown}<br />{:/}
> <div>
> </div>
> 
> I think what kramdown does in that case is a bit more surprising (it *does*
> surround the <br /> with <p> tags), but I can live with it. It is an edge
> case! Edge cases are always difficult. I can envision kramdown applying what
> amounts to two conflicting rules here, and I can easily see how the <p> rule
> might win. The same applies here:
> 
> require 'kramdown'
> s = "{::nomarkdown}<br />{:/}"
> puts Kramdown::Document.new(s).to_html
> 
> Same edge case, same result. But I don't think it is worth trying to "fix",
> since the workaround is clear; there is syntax to protect the <br /> if
> desired.
> 
> m.
> 
> PS. Original Markdown's behavior in this regard is irrelevant, because in
> fact this is one of the many areas in which Markdown was somewhat
> indeterminate. Markdown did *not* always leave HTML alone. Markdown
> sometimes used to wrap <div></div> in <p> tags, wrongly (giving
> <p><div></div></p>), and I had to add extra post-processing code to detect
> and fix these. But kramdown aims to be consistent and predictable (besides
> adding power that original Markdown lacks). The chief difference is in
> overall approach: Markdown was just doing find-and-replace, but kramdown is
> actually ***parsing the HTML structure of the document***.
> 
> This fact originally gave me some trouble in switching to kramdown, you may
> recall. I was able to hand the document to Markdown at a time when it did
> not in fact consist of just text and legal HTML (it also contained ERB
> expressions, for example - you may recall that I complained that kramdown
> was altering my ERB so that when the ERB parser actually came along, the ERB
> itself was wrong). But I have compensated by changing the order of
> processing in RubyFrontier so that this is no longer an issue.
> 
> My chief remaining issue with kramdown was that it was not merely treating
> its HTML parsing as a pass-through: it was disassembling the HTML and
> reassembling it, and it was reassembling it differently. Thus for example,
> <div manny="moe" hey="ho"> was turning into <div hey="ho" manny="moe">. And
> since RubyFrontier was using these sorts of details to pass secret messages
> to itself across the stages of processing, kramdown was disrupting those
> messages. But Thomas made the order of attributes stable, and so this
> problem went away. kramdown *still* does not merely treat its HTML parsing
> as a pass-through (for example, it will alter <div hey='ho'> so that the
> quotes are double instead of single), and I still think this is wrong, but I
> can live with it.
> 
> On or about 8/11/10 6:26 AM, thus spake "Shawn Van Ittersum"
> <svicalifornia at gmail.com>:
> 
>> I thought that there was a requirement for a blank line before all 
>> paragraphs?
>> This is clearly a contiguous block of HTML, and kramdown should leave HTML
>> alone, as Markdown does.
>> 
>> Matt, what does Markdown do when parsing this input?
>> 
>> Shawn
>> 
>> On Tue, 10 Aug 2010 11:31:07 +0200, Thomas Leitner wrote:
>>> On 2010-08-09 18:52 -0700 Matt Neuburg wrote:
>>>> require 'kramdown'
>>>> s = <<END
>>>> <div>
>>>> </div>
>>>> <br />
>>>> <div>
>>>> </div>
>>>> END
>>>> puts Kramdown::Document.new(s).to_html
>>>> 
>>>> The result is:
>>>> 
>>>> <div>
>>>> </div>
>>>> <p><br /></p>
>>>> <div>
>>>> </div>
> 
> -- 
> matt neuburg, phd = matt at tidbits.com, http://www.tidbits.com/matt/
> pantes anthropoi tou eidenai oregontai phusei
> Among the 2007 MacTech Top 25, http://tinyurl.com/2rh4pf
> AppleScript: the Definitive Guide, 2nd edition
> http://www.tidbits.com/matt/default.html#applescriptthings
> Take Control of Exploring & Customizing Snow Leopard
> http://tinyurl.com/kufyy8
> RubyFrontier! http://www.apeth.com/RubyFrontierDocs/default.html
> TidBITS, Mac news and reviews since 1990, http://www.tidbits.com
> 
> 
> 
> _______________________________________________
> kramdown-users mailing list
> kramdown-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/kramdown-users

