[kramdown-users] possible bug: kramdown wrapping <br> in <p>

Thomas Leitner t_leitner at gmx.at
Thu Aug 12 06:20:04 EDT 2010


On 2010-08-11 11:39 -0700 Matt Neuburg wrote:
> On or about 8/11/10 9:15 AM, thus spake "Shawn Van Ittersum"
> <svicalifornia at gmail.com>:
> 
> > I still don't understand why kramdown has an HTML parser... HTML
> > tags should simply pass through untouched.
> 
> I am presuming that kramdown has an HTML parser because it needs to
> be able to cycle down through the HTML looking for a <div> with a
> "markdown" attribute. I agree with you, however, in that I'd prefer
> if this parser were read-only, so that the HTML found would be then
> identical in the output to what it was in the input. The source of
> this issue, I think, is that kramdown is using REXML. As I've hinted
> before, I think that if kramdown were willing to depend on nokogiri it
> could just jump straight to those divs without parsing its way down
> to them. And nokogiri does not rewrite attribute order.

Yes, for anything other than a naive implementation you will need a
good HTML parser.

However, kramdown does not use any library for parsing HTML, neither
REXML nor nokogiri! I once tried to use REXML, but it was very slow and
I needed to hack it in order to get certain functionality. There were
so many complications that I decided to write the HTML parser myself.
This led to increased speed and better support for HTML.

Nokogiri does not have REXML's speed problem, but it is also not
suitable for kramdown. For example, consider the following document:

    This is a para

    <div markdown="1">
        This is a code block with a </div>
    </div>

    This is a para

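How should nokogiri be able to parse the div correctly without the
knowledge of the kramdown parser, which knows that the indented line is
a code block and that the first </div> is therefore literal text?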

This and similar cases led to the creation of the current parser.
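
To make the problem with the example document concrete, here is a
minimal sketch of a tag scanner that knows nothing about kramdown's
block rules. It is not kramdown's parser and not how nokogiri works
internally; the method name naive_div_closes and the constants are
made up for illustration. It only pairs <div> and </div> by nesting
depth, which is roughly what any kramdown-unaware HTML tool would do:

```ruby
# Hypothetical illustration only: pairs <div> and </div> by nesting
# depth and ignores indentation, so it cannot recognise code blocks.
DOC = <<~TEXT
  This is a para

  <div markdown="1">
      This is a code block with a </div>
  </div>

  This is a para
TEXT

# Returns [line_number, depth_after_close] for every </div> found.
def naive_div_closes(text)
  depth = 0
  closes = []
  text.each_line.with_index(1) do |line, lineno|
    line.scan(%r{<(/?)div\b[^>]*>}) do |groups|
      if groups.first.empty?
        depth += 1                 # opening <div ...>
      else
        depth -= 1                 # closing </div>
        closes << [lineno, depth]
      end
    end
  end
  closes
end

CLOSES = naive_div_closes(DOC)
CLOSES.each do |lineno, depth|
  puts "</div> on line #{lineno}, depth now #{depth}"
end
```

The scan reports the div as closed on line 4, inside what kramdown
treats as a code block, and then sees the real closing tag on line 5
as unbalanced (depth -1). Deciding that line 4 is literal text needs
the block-level knowledge that only the kramdown parser has.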

-- Thomas