[kramdown-users] [ANN] kramdown 0.9.0 released

Thomas Leitner t_leitner at gmx.at
Sat Jul 17 05:19:41 EDT 2010


On 2010-06-25 09:26 -0400 Eric Sunshine wrote:
> Hi Thomas,
> 
> On 6/25/2010 2:19 AM, Thomas Leitner wrote:
> >> First, what is the intended behavior when feeding kramdown a
> >> fully-structured HTML document containing <html>, <head>, <body>?
> > It should output it in a hybrid format, i.e. converting everything
> > possible to kramdown and leaving the rest as HTML. I just ran a
> > sample HTML document through html-to-kramdown-to-html and it worked
> > fine for all things except the DOCTYPE - I have put this on my TODO
> > list.
> 
> I'm not sure that I understand. When I feed it the HTML input:
> 
>    <html>
>    <head>
>    <title>Title</title>
>    </head>
>    <body>
>    <h1>Header</h1>
>    Body <strong>text</strong>.
>    </body>
>    </html>
> 
> The emitted kramdown is:
> 
>    <html><head><title>Title</title>
>    </head>
>    <body markdown="1"># Header
> 
>    Body **text**.
>    </body>
>    </html>
> 
> But in the conversion back to HTML, kramdown elements such as
> "# Header" and "**text**" are not converted to their HTML equivalents.
> In fact, the output of kramdown -> HTML is identical to the input
> (minus the markdown="1" attribute):
> 
>    <html><head><title>Title</title>
>    </head>
>    <body># Header
> 
>    Body **text**.
>    </body>
>    </html>

The problem is that the <body> tag was not in the list of tags that may
contain block-level elements. I have fixed this and the use case above
works now!
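As a toy illustration of why a missing entry matters: only tags in such a list get their content parsed for block-level syntax, so "# Header" inside <body> stayed literal text. The set `BLOCK_CONTENT_TAGS`, its contents, and the helpers `convert_blocks` and `render_element` below are hypothetical stand-ins, not kramdown's actual list or code, and the "parser" only understands ATX headers.

~~~~~~~~~~~
require 'set'

# Hypothetical list of tags whose content is parsed for block-level syntax.
# Illustrative only; note that 'body' is missing, as in the bug.
BLOCK_CONTENT_TAGS = Set.new(%w[div blockquote li dd td th])

# Toy block parser: turns "# Title" lines into <h1> and leaves the rest alone.
def convert_blocks(text)
  text.lines.map { |line| line =~ /\A#\s+(.*)\n?\z/ ? "<h1>#{$1}</h1>\n" : line }.join
end

# Parse the content as blocks only if the tag is in the given list.
def render_element(tag, content, block_tags)
  inner = block_tags.include?(tag) ? convert_blocks(content) : content
  "<#{tag}>#{inner}</#{tag}>"
end

puts render_element('body', "# Header\n", BLOCK_CONTENT_TAGS)
puts render_element('body', "# Header\n", BLOCK_CONTENT_TAGS | %w[body])
~~~~~~~~~~~

With 'body' absent the header text passes through unchanged; adding it to the set is enough for the content to be converted.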

> >> C:\>kramdown test.kd > test.html
> >> c:/ruby/lib/ruby/gems/1.9.1/gems/kramdown-0.9.0/lib/kramdown/parser/kramdown.rb:206:in
> >> `check': incompatible encoding regexp match (UTF-8 regexp with
> >> IBM437 string) (Encoding::CompatibilityError)
> > Hmm... I have to look at this, and probably generate some test cases
> > for checking encodings under Ruby 1.9. Could you send me the test.kd
> > document so that I can dig into it and find the offending regexp?
> 
> I narrowed it down to this fragment:
> 
>    <p>Fran&#xE7;ois</p>
> 
> The equivalent <p>Fran&ccedil;ois</p> is converted to kramdown and
> back to HTML without problem.

I have tried to reproduce the problem but wasn't successful. I have
used the following test program (named `tt.rb`):

~~~~~~~~~~~
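require 'kramdown'
text = "Fran&#xE7;ois"
# Tag the input as IBM437, the encoding named in the error message
text.force_encoding('IBM437')
# Show the input encoding and Ruby's default internal/external encodings
p [text.encoding, Encoding.default_internal, Encoding.default_external]
puts Kramdown::Document.new(text).to_html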
~~~~~~~~~~~

    $ ruby tt.rb
    [#<Encoding:IBM437>, nil, #<Encoding:UTF-8>]
    <p>Fran�ois</p>

where the question mark character is the ç (ccedil) in the IBM437 encoding.

Could you provide step-by-step instructions of how to reproduce the
error?
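For reference, the error message itself follows from a general Ruby rule rather than anything kramdown-specific: a regexp whose source contains non-ASCII characters has a fixed encoding, and matching it against a non-ASCII string in a different encoding raises `Encoding::CompatibilityError`. The stand-alone sketch below (no kramdown involved; `CCEDIL_RE` and `match_or_error` are illustrative names) shows that behaviour and one possible workaround, transcoding the input to UTF-8 first. It does not identify which regexp in kramdown.rb is at fault.

~~~~~~~~~~~
# A regexp built from a non-ASCII UTF-8 string has a fixed UTF-8 encoding.
CCEDIL_RE = Regexp.new("\u00E7")

# "Francois" with byte 0x87, which is c-cedilla in code page 437 (IBM437).
ibm437 = "Fran\x87ois".dup.force_encoding(Encoding::IBM437)

# Returns :match / :no_match, or the error message if the encodings clash.
def match_or_error(regexp, str)
  (regexp =~ str) ? :match : :no_match
rescue Encoding::CompatibilityError => e
  e.message
end

# Non-ASCII IBM437 string vs. UTF-8 regexp: raises, message is returned.
puts match_or_error(CCEDIL_RE, ibm437)

# Transcoding the input to UTF-8 first avoids the mismatch.
utf8 = ibm437.encode(Encoding::UTF_8)
puts match_or_error(CCEDIL_RE, utf8)

# An ASCII-only IBM437 string is accepted without transcoding.
puts match_or_error(CCEDIL_RE, "Francois".dup.force_encoding(Encoding::IBM437))
~~~~~~~~~~~

This may also explain why a plain-ASCII input works while one containing a non-ASCII byte fails.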

> >> Third, this is an old HTML document still using <b>bold</b>
> >> elements rather than <strong>...</strong>. The <b>bold</b> elements
> >> were not converted to **bold** Markdown. I think it should be safe
> >> to treat <b> as equivalent to <strong> for conversion purposes.
> > Yeah, I thought about this... but decided against it, can't remember
> > why. But it should probably be okay converting <b> and <i>
> > to <strong> and <em>.
> 
> If the intention is for perfect fidelity in the HTML -> kramdown ->
> HTML chain, then I can understand not touching <b> and <i> since you
> could not reproduce them in the final HTML. Perhaps an option in the
> HTML parser could control whether <b> and <i> are folded to <strong>
> and <em>.

Haven't decided on this one yet.
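The option suggested above could behave roughly like the following toy sketch. The helper `inline_html_to_kramdown` and the keyword `fold_b_i:` are hypothetical and not part of kramdown; the regexp approach only handles simple, non-nested inline tags, unlike a real HTML parser.

~~~~~~~~~~~
# kramdown span markers for the semantic emphasis tags.
EMPHASIS_MARKERS = { 'strong' => '**', 'em' => '*' }
# Presentational tags that may optionally be folded into semantic ones.
FOLD_MAP = { 'b' => 'strong', 'i' => 'em' }

# Hypothetical converter: <strong>/<em> always become kramdown; <b>/<i>
# become kramdown only when fold_b_i is true, otherwise they stay as HTML
# so that a round trip reproduces them exactly.
def inline_html_to_kramdown(html, fold_b_i: false)
  html.gsub(%r{<(b|i|strong|em)>(.*?)</\1>}m) do
    tag, inner = $1, $2
    tag = FOLD_MAP.fetch(tag, tag) if fold_b_i
    marker = EMPHASIS_MARKERS[tag]
    marker ? "#{marker}#{inner}#{marker}" : "<#{tag}>#{inner}</#{tag}>"
  end
end

puts inline_html_to_kramdown("old <b>bold</b> and <strong>new</strong>")
puts inline_html_to_kramdown("old <b>bold</b> and <i>it</i>", fold_b_i: true)
~~~~~~~~~~~

Keeping folding off by default preserves HTML -> kramdown -> HTML fidelity, which was the concern raised above.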

-- Thomas

