[kramdown-users] [ANN] kramdown 0.9.0 released

Eric Sunshine sunshine at sunshineco.com
Fri Jun 25 09:26:40 EDT 2010


Hi Thomas,

On 6/25/2010 2:19 AM, Thomas Leitner wrote:
>> First, what is the intended behavior when feeding kramdown a
>> fully-structured HTML document containing <html>, <head>, <body>?
> It should output it in a hybrid format, i.e. converting everything
> possible to kramdown and leaving the rest as HTML. I just ran a sample
> HTML document through html-to-kramdown-to-html and it worked fine for
> all things except the DOCTYPE - I have put this on my TODO list.

I'm not sure that I understand. When I feed it the HTML input:

   <html>
   <head>
   <title>Title</title>
   </head>
   <body>
   <h1>Header</h1>
   Body <strong>text</strong>.
   </body>
   </html>

The emitted kramdown is:

   <html><head><title>Title</title>
   </head>
   <body markdown="1"># Header

   Body **text**.
   </body>
   </html>

But in the conversion back to HTML, kramdown constructs such as
"# Header" and "**text**" are not converted to their HTML equivalents.
In fact, the output of kramdown -> HTML is identical to the input (minus
the markdown="1" attribute):

   <html><head><title>Title</title>
   </head>
   <body># Header

   Body **text**.
   </body>
   </html>

>> C:\>kramdown test.kd > test.html
>> c:/ruby/lib/ruby/gems/1.9.1/gems/kramdown-0.9.0/lib/kramdown/parser/kramdown.rb:206:in
>> `check': incompatible encoding regexp match (UTF-8 regexp with IBM437
>> string) (Encoding::CompatibilityError)
> Hmm... I have to look at this, and probably generate some test cases
> for checking encodings under Ruby 1.9. Could you send me the test.kd
> document so that I can dig into it and find the offending regexp?

I narrowed it down to this fragment:

   <p>Fran&#xE7;ois</p>

The equivalent <p>Fran&ccedil;ois</p> is converted to kramdown and back 
to HTML without problem.
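
In case it helps when building test cases, here is a minimal sketch of how this class of error can arise under Ruby 1.9 and later. It is not kramdown code, and I am only assuming that a non-ASCII character ends up in a string tagged with the console code page (IBM437); the variable names are made up for illustration.

```ruby
# Hypothetical reproduction of the error class, not taken from kramdown.
# 0x87 is the byte for "ç" in IBM437, the classic Windows console code page.
console_text = "Fran\x87ois".b.force_encoding(Encoding::IBM437)

# A regexp containing a non-ASCII character is fixed to UTF-8.
utf8_pattern = /\u00E7/

begin
  console_text =~ utf8_pattern
  outcome = :no_error
rescue Encoding::CompatibilityError
  # Non-ASCII IBM437 string against a fixed UTF-8 regexp is rejected.
  outcome = :compatibility_error
end

# Transcoding the string to UTF-8 first makes the match well-defined.
utf8_text = console_text.encode(Encoding::UTF_8)
position = utf8_text =~ utf8_pattern
```

An ASCII-only string does not trigger the error regardless of its encoding tag, which may be why the &ccedil; form passes.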

>> Third, this is an old HTML document still using <b>bold</b> elements
>> rather than <strong>...</strong>. The <b>bold</b> elements were not
>> converted to **bold** Markdown. I think it should be safe to treat
>> <b> as equivalent to <strong> for conversion purposes.
> Yeah, I thought about this... but decided against it, can't remember
> why. But it should probably be okay converting <b> and <i> to <strong>
> and <em>.

If the intention is for perfect fidelity in the HTML -> kramdown -> HTML 
chain, then I can understand not touching <b> and <i> since you could 
not reproduce them in the final HTML. Perhaps an option in the HTML 
parser could control whether <b> and <i> are folded to <strong> and <em>.
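
To make the idea concrete, here is a rough sketch of what such an opt-in folding step might look like. The helper name and the fold_presentational option are hypothetical and not part of kramdown, and a real implementation would presumably work on the parsed element tree rather than on raw text with a regexp.

```ruby
# Hypothetical pre-processing helper; the method name and the
# fold_presentational option are illustrative, not kramdown API.
PRESENTATIONAL_TO_SEMANTIC = { "b" => "strong", "i" => "em" }.freeze

def fold_presentational_tags(html, fold_presentational: true)
  # With the option off, the input is returned untouched for full fidelity.
  return html unless fold_presentational

  # Match opening or closing <b>/<i> tags, keeping any attributes intact.
  # Tags such as <body> or <br> do not match because the name must be
  # followed by whitespace or ">".
  html.gsub(%r{<(/?)(b|i)(\s[^>]*)?>}i) do
    "<#{$1}#{PRESENTATIONAL_TO_SEMANTIC[$2.downcase]}#{$3}>"
  end
end
```

With the option enabled, "<b>bold</b>" becomes "<strong>bold</strong>" before conversion; with it disabled, the input passes through unchanged.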

-- ES

