From t_leitner at gmx.at Tue Jun 8 09:55:47 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Tue, 8 Jun 2010 15:55:47 +0200
Subject: [kramdown-users] [ANN] kramdown 0.8.0 released
Message-ID: <20100608155547.1de2e03e@noweto>

## About kramdown

kramdown (sic, not Kramdown or KramDown, just kramdown) is a *free* GPL-licensed [Ruby](http://www.ruby-lang.org) library for parsing a superset of Markdown. It is completely written in Ruby, supports standard Markdown (with some minor modifications) and various extensions that have been made popular by the [PHP Markdown Extra] package and [Maruku].

Homepage for installation instructions and documentation: http://kramdown.rubyforge.org

## kramdown 0.8.0 released

One of the bigger changes in this release is the support for converting HTML tags into native kramdown elements via the new `html_to_native` option. For example, the HTML tag `p` is converted to the native paragraph element instead of a generic HTML tag if this option is set to `true`. This is especially useful for converters that don't handle generic HTML tags (e.g. the LaTeX converter). This conversion is a feature of the new standalone HTML parser, which is used by the kramdown parser for parsing HTML tags.

Also note that support for the old extension syntax and custom extensions has been dropped as of this release! And the `filter_html` option will be removed in the next release because there exist better facilities for performing this kind of task!

## Changes

* Major changes:
  - New parser for parsing HTML documents
  - Added the option `html_to_native` (default: `false`) which tells the kramdown parser whether to convert HTML tags to native kramdown elements or to generic HTML elements.
* Minor changes:
  - Table header cells are now represented by their own element type
  - The element type `:html_text` is not used anymore - it is replaced by the general `:text` element
  - HTML comments are now converted to LaTeX comments when using the LaTeX converter
  - The LaTeX converter can now output the contents of HTML `` and `` tags
* Bug fixes:
  - Attributes that have been assigned to the to-be-replaced TOC list are now added correctly on the generated TOC list in the HTML converter
  - Fixed problem in typographic symbol processing where an entity string instead of an entity element was added
  - Fixed problem with HTML span parsing: some text was not added to the correct element when the end tag was not found
  - HTML `code` and `pre` tags are now parsed as raw HTML tags
  - HTML tags inside raw HTML span tags are now parsed correctly as raw HTML tags
  - The Rakefile can now be used even if the `rdoc` gem is missing (patch by Ben Armston)
  - Fixed generation of footnotes in the LaTeX converter (patch by Ben Armston)
  - Fixed LaTeX converter to support code spans/blocks in footnotes
  - HTML comments and XML processing instructions are now correctly parsed inside raw HTML tags
  - HTML script tags are now correctly parsed
  - Fixed the abbreviation conversion in the LaTeX converter
  - Empty image links are not output anymore by the LaTeX converter
* Deprecation notes:
  - The old extension syntax and support for custom extensions has been removed.
  - The `filter_html` option will be removed in the next release.

From johnmuhl at gmail.com Tue Jun 8 11:31:21 2010
From: johnmuhl at gmail.com (john muhl)
Date: Tue, 8 Jun 2010 10:31:21 -0500
Subject: [kramdown-users] using kramdown in xhtml

is it possible to use kramdown in an xhtml document? or some way to say you want numeric entities instead of named entities; e.g. `&#8220;` instead of `&ldquo;`?
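Both spellings in john's question denote the same character, U+201C LEFT DOUBLE QUOTATION MARK: `&ldquo;` is its named form and `&#8220;` its decimal numeric form. The Ruby sketch below shows one way an output step could choose between the two. `NAMED_ENTITIES` and `encode_entity` are hypothetical names for illustration only, not kramdown's API; the table holds just a handful of entries, and any character without a known name falls back to the numeric form. It assumes Ruby 2.0 or later for the keyword argument.

```ruby
# Illustrative sketch only: NAMED_ENTITIES and encode_entity are hypothetical,
# not part of kramdown.

# A tiny subset of the HTML named entities, keyed by Unicode code point.
NAMED_ENTITIES = {
  8216 => "lsquo",  # left single quotation mark
  8217 => "rsquo",  # right single quotation mark (typographic apostrophe)
  8220 => "ldquo",  # left double quotation mark
  8221 => "rdquo",  # right double quotation mark
  8211 => "ndash",  # en dash
  8212 => "mdash",  # em dash
  8230 => "hellip"  # horizontal ellipsis
}.freeze

# Returns an entity reference for a single character.
#   numeric: true  -> always the decimal form "&#NNNN;"
#   numeric: false -> the named form if the table knows one, else the decimal form
def encode_entity(char, numeric: false)
  code = char.ord
  name = NAMED_ENTITIES[code]
  if numeric || name.nil?
    "&##{code};"
  else
    "&#{name};"
  end
end

puts encode_entity("\u201C")                 # prints &ldquo;
puts encode_entity("\u201C", numeric: true)  # prints &#8220;
puts encode_entity("\u2610")                 # ballot box, no name in the table: prints &#9744;
```

Numeric references are understood by HTML, XHTML and plain XML parsers alike, which is why a switch like this matters mainly when the output is headed for an XML consumer.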
From elliot.winkler at gmail.com Tue Jun 8 15:29:33 2010
From: elliot.winkler at gmail.com (Elliot Winkler)
Date: Tue, 08 Jun 2010 14:29:33 -0500
Subject: [kramdown-users] using kramdown in xhtml
Message-ID: <4C0E9A1D.5070902@gmail.com>

I'd like to be able to have this option too, actually. The specific use case for me is in generating an XML feed for my blog -- since XML only allows certain named entities, I'm basically forced to patch kramdown to use numeric entities instead of named ones. Actually, it might be easier just to always have kramdown use numerics... would you guys be opposed to that?

-- Elliot

On 6/8/10 10:31 AM, john muhl wrote:
> is it possible to use kramdown in an xhtml document? or some way to say you want numeric entities instead of named entities; e.g. `&#8220;` instead of `&ldquo;`?
> _______________________________________________
> kramdown-users mailing list
> kramdown-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/kramdown-users

From matt at tidbits.com Tue Jun 8 16:02:57 2010
From: matt at tidbits.com (Matt Neuburg)
Date: Tue, 08 Jun 2010 13:02:57 -0700
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <4C0E9A1D.5070902@gmail.com>

On or about 6/8/10 12:29 PM, thus spake "Elliot Winkler":

> I'd like to be able to have this option too, actually. The specific use case for me is in generating an XML feed for my blog -- since XML only allows certain named entities, I'm basically forced to patch kramdown to use numeric entities instead of named ones. Actually, it might be easier just to always have kramdown use numerics... would you guys be opposed to that?
>
> -- Elliot
>
> On 6/8/10 10:31 AM, john muhl wrote:
>> is it possible to use kramdown in an xhtml document? or some way to say you want numeric entities instead of named entities; e.g. `&#8220;` instead of `&ldquo;`?

I agree that this option sounds desirable. I generally prefer numeric to named entities.
Just to be clear, though, named entities are legal in XHTML. So this would be purely an XML matter. Also, the usual approach in a feed is to enclose everything that might be (X)HTML in <![CDATA[ ... ]]> delimiters.

m.

--
matt neuburg, phd = matt at tidbits.com, http://www.tidbits.com/matt/
pantes anthropoi tou eidenai oregontai phusei
Among the 2007 MacTech Top 25, http://tinyurl.com/2rh4pf
AppleScript: the Definitive Guide, 2nd edition
http://www.tidbits.com/matt/default.html#applescriptthings
Take Control of Exploring & Customizing Snow Leopard
http://tinyurl.com/kufyy8
RubyFrontier! http://www.apeth.com/RubyFrontierDocs/default.html
TidBITS, Mac news and reviews since 1990, http://www.tidbits.com

From t_leitner at gmx.at Wed Jun 9 01:53:21 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Wed, 9 Jun 2010 07:53:21 +0200
Subject: [kramdown-users] using kramdown in xhtml
References: <4C0E9A1D.5070902@gmail.com>
Message-ID: <20100609075321.203e6740@noweto>

On 2010-06-08 13:02 -0700 Matt Neuburg wrote:

> > On 6/8/10 10:31 AM, john muhl wrote:
> >> is it possible to use kramdown in an xhtml document? or some way to say you want numeric entities instead of named entities; e.g. `&#8220;` instead of `&ldquo;`?
>
> I agree that this option sounds desirable. I generally prefer numeric to named entities.
>
> Just to be clear, though, named entities are legal in XHTML. So this would be purely an XML matter. Also, the usual approach in a feed is to enclose everything that might be (X)HTML in <![CDATA[ ... ]]> delimiters.

Is there any advantage in using named entities besides being easier to understand? I don't have a preference regarding named or numeric entities, so if numeric entities are preferred, I will modify the HTML parser to emit numeric entities.
-- Thomas

From svicalifornia at gmail.com Wed Jun 9 04:52:30 2010
From: svicalifornia at gmail.com (Shawn Van Ittersum)
Date: Wed, 9 Jun 2010 18:52:30 +1000
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <20100609075321.203e6740@noweto>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto>
Message-ID: <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>

I prefer named entities for readability. Could there be an option to choose?

Shawn
[sent from mobile phone]

From johnmuhl at gmail.com Wed Jun 9 15:07:20 2010
From: johnmuhl at gmail.com (john muhl)
Date: Wed, 9 Jun 2010 14:07:20 -0500
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>

On Wed, Jun 9, 2010 at 3:52 AM, Shawn Van Ittersum wrote:
> I prefer named entities for readability. Could there be an option to choose?

i don't think the readability argument is very strong in this case. of course if you're composing html by hand then named entities are preferable, but when it's output by an application from markdown input then it doesn't matter to me what comes out the end.

p.s. markdown/smartypants.pl outputs numeric entities; probably for the same reason.

From matt at tidbits.com Wed Jun 9 15:46:29 2010
From: matt at tidbits.com (Matt Neuburg)
Date: Wed, 9 Jun 2010 12:46:29 -0700
Subject: [kramdown-users] using kramdown in xhtml
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>
Message-ID: <1EFAD572-9275-49BA-8D27-3F6C3697707C@tidbits.com>

On Jun 9, 2010, at 12:07 PM, john muhl wrote:

> i don't think the readability argument is very strong in this case. of course if you're composing html by hand then named entities are preferable, but when it's output by an application from markdown input then it doesn't matter to me what comes out the end.
>
> p.s. markdown/smartypants.pl outputs numeric entities; probably for the same reason.

The main reason numeric entities are preferable is that they are easier to generate and they exist for every character. You just put &#xxx; where xxx is the numeric value of the character. What is the named entity for ☐ (a "ballot box")? There isn't one.

I think what I'd really like is the option for actual *characters* to be output. Only two characters must be entityized: ampersand and less-than. But that's all. I'm making UTF-8 encoded Web pages and I'm starting with UTF-8 encoded Markdown / kramdown, so all the characters I'm using are legal as they stand. I don't need them transformed at all. I type my own em-dashes, ellipses, Ancient Greek, etc. The only things I'm not typing myself are the curly quotes and curly apostrophe. I think Smartypants produces entities for its curly quotes etc. only because it doesn't know what encoding I'll be using. But I do know! :) So I'd be happiest if kramdown's Smartypants function allowed me to specify that I just want characters for any transformations it produces. Then I'd be able to read my own XHTML.

On the other hand if I were producing output for a non-UTF-8 milieu then everything that isn't ASCII must be entityized. In that case, only numeric entities make sense, because there are no named entities for most of the characters I use.

m.

From svicalifornia at gmail.com Wed Jun 9 16:46:18 2010
From: svicalifornia at gmail.com (Shawn Van Ittersum)
Date: Thu, 10 Jun 2010 06:46:18 +1000
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <1EFAD572-9275-49BA-8D27-3F6C3697707C@tidbits.com>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <1EFAD572-9275-49BA-8D27-3F6C3697707C@tidbits.com>
Message-ID: <30D09B8B-A32E-4F48-A6E8-8A1CF696E40B@gmail.com>

Obviously a system using named entities should fall back to numeric entities when the named entity for some character doesn't exist.

Shawn
[sent from mobile phone]

From t_leitner at gmx.at Thu Jun 10 01:21:18 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Thu, 10 Jun 2010 07:21:18 +0200
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com>
Message-ID: <20100610072118.4b332336@noweto>

On 2010-06-09 18:52 +1000 Shawn Van Ittersum wrote:

> I prefer named entities for readability. Could there be an option to choose?

I have now added an option named `numeric_entities` that defaults to `false` and can be used to decide whether entities are output using their name or their numeric value.
-- Thomas

From t_leitner at gmx.at Thu Jun 10 09:15:38 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Thu, 10 Jun 2010 15:15:38 +0200
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <1EFAD572-9275-49BA-8D27-3F6C3697707C@tidbits.com>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <1EFAD572-9275-49BA-8D27-3F6C3697707C@tidbits.com>
Message-ID: <20100610151538.4c106a21@noweto>

> I think what I'd really like is the option for actual *characters* to be output. Only two characters must be entityized: ampersand and less-than. But that's all. I'm making UTF-8 encoded Web pages and I'm starting with UTF-8 encoded Markdown / kramdown, so all the characters I'm using are legal as they stand. I don't need them transformed at all. I type my own em-dashes, ellipses, Ancient Greek, etc. The only things I'm not typing myself are the curly quotes and curly apostrophe. I think Smartypants produces entities for its curly quotes etc. only because it doesn't know what encoding I'll be using. But I do know! :) So I'd be happiest if kramdown's Smartypants function allowed me to specify that I just want characters for any transformations it produces. Then I'd be able to read my own XHTML.

I won't implement this on Ruby 1.8 because of the lack of string encoding support. However, the feature you want is already on my TODO list. It will convert entities, smart quotes and typographic symbols (as handled by the kramdown parser) into their character equivalents on output - but only under Ruby 1.9. One more incentive to switch from 1.8 to 1.9 ;-)

> On the other hand if I were producing output for a non-UTF-8 milieu then everything that isn't ASCII must be entityized. In that case, only numeric entities make sense, because there are no named entities for most of the characters I use. m.

This won't be done in kramdown.
The kramdown parser (as well as the new HTML parser) doesn't convert between encodings or change normal characters to entities. Whatever string you give to kramdown comes out in the same encoding.

-- Thomas

From johnmuhl at gmail.com Thu Jun 10 10:00:10 2010
From: johnmuhl at gmail.com (john muhl)
Date: Thu, 10 Jun 2010 09:00:10 -0500
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <20100610072118.4b332336@noweto>
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <20100610072118.4b332336@noweto>

On Thu, Jun 10, 2010 at 12:21 AM, Thomas Leitner wrote:
> On 2010-06-09 18:52 +1000 Shawn Van Ittersum wrote:
>
>> I prefer named entities for readability. Could there be an option to choose?
>
> I have now added an option named `numeric_entities` that defaults to `false` and can be used to decide whether entities are output using their name or their numeric value.

thanks a lot.

From matt at tidbits.com Thu Jun 10 12:15:12 2010
From: matt at tidbits.com (Matt Neuburg)
Date: Thu, 10 Jun 2010 09:15:12 -0700
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <20100610151538.4c106a21@noweto>

On or about 6/10/10 6:15 AM, thus spake "Thomas Leitner":

> I won't implement this on Ruby 1.8 because of the lack of string encoding support. However, the feature you want is already on my TODO list. It will convert entities, smart quotes and typographic symbols (as handled by the kramdown parser) into their character equivalents on output - but only under Ruby 1.9. One more incentive to switch from 1.8 to 1.9 ;-)

Ouch! :) So I take it that iconv doesn't count as string encoding support? Oh, wait - you mean that you can't count on kramdown *itself* (i.e. literal strings in the kramdown file) being interpreted properly with respect to encoding. TextMate will interpret Ruby files as UTF-8; that's what I use, so I'm used to that, and I rely on the assumption that that's what I *will* use. But someone might run kramdown in some other milieu. I never thought of that. :)))

> This won't be done in kramdown. The kramdown parser (as well as the new HTML parser) doesn't convert between encodings or change normal characters to entities. Whatever string you give to kramdown comes out in the same encoding.

Sounds great.

m.

From svicalifornia at gmail.com Thu Jun 10 15:29:30 2010
From: svicalifornia at gmail.com (Shawn Van Ittersum)
Date: Fri, 11 Jun 2010 05:29:30 +1000
Subject: [kramdown-users] using kramdown in xhtml
Message-ID: <20100611052930199819.11fd2bf8@gmail.com>

On Thu, 10 Jun 2010 09:15:12 -0700, Matt Neuburg wrote:
> Ouch! :) So I take it that iconv doesn't count as string encoding support? Oh, wait - you mean that you can't count on kramdown *itself* (i.e. literal strings in the kramdown file) being interpreted properly with respect to encoding. TextMate will interpret Ruby files as UTF-8; that's what I use, so I'm used to that, and I rely on the assumption that that's what I *will* use. But someone might run kramdown in some other milieu. I never thought of that. :)))

TextMate's handling of text allows you to save UTF data, but Ruby 1.8 is not aware of UTF or any other encodings. Ruby 1.8 can still use the data, but as strings of bytes, not UTF characters. Operations on UTF strings may return strange results because Ruby 1.8 doesn't know how to handle UTF properly. See this for more info:

http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18

Shawn

From matt at tidbits.com Thu Jun 10 16:17:11 2010
From: matt at tidbits.com (Matt Neuburg)
Date: Thu, 10 Jun 2010 13:17:11 -0700
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <20100611052930199819.11fd2bf8@gmail.com>
References: <20100611052930199819.11fd2bf8@gmail.com>
Message-ID: <8A1A03B5-3306-4A27-A003-B4E3BB009E2A@tidbits.com>

On Jun 10, 2010, at 12:29 PM, Shawn Van Ittersum wrote:

> TextMate's handling of text allows you to save UTF data, but Ruby 1.8 is not aware of UTF or any other encodings. Ruby 1.8 can still use the data, but as strings of bytes, not UTF characters. Operations on UTF strings may return strange results because Ruby 1.8 doesn't know how to handle UTF properly. See this for more info:
>
> http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18
>
> Shawn

Sure, but if you already know the encoding there's no problem. Believe me, I've written LOTS of code that manipulates / generates UTF-8 files in Ruby 1.8. I know plenty about this.

m.

From svicalifornia at gmail.com Thu Jun 10 16:20:05 2010
From: svicalifornia at gmail.com (Shawn Van Ittersum)
Date: Fri, 11 Jun 2010 06:20:05 +1000
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <8A1A03B5-3306-4A27-A003-B4E3BB009E2A@tidbits.com>
References: <20100611052930199819.11fd2bf8@gmail.com> <8A1A03B5-3306-4A27-A003-B4E3BB009E2A@tidbits.com>
Message-ID: <20100611062005197039.bda5ea52@gmail.com>

OK. :)

Shawn

From t_leitner at gmx.at Fri Jun 11 01:55:31 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Fri, 11 Jun 2010 07:55:31 +0200
Subject: [kramdown-users] using kramdown in xhtml
References: <20100610151538.4c106a21@noweto>
Message-ID: <20100611075531.107e1b65@noweto>

On Thu, 10 Jun 2010 09:15:12 -0700 Matt Neuburg wrote:

> > I won't implement this on Ruby 1.8 because of the lack of string encoding support. However, the feature you want is already on my TODO list. It will convert entities, smart quotes and typographic symbols (as handled by the kramdown parser) into their character equivalents on output - but only under Ruby 1.9. One more incentive to switch from 1.8 to 1.9 ;-)
>
> Ouch! :) So I take it that iconv doesn't count as string encoding support? Oh, wait - you mean that you can't count on kramdown *itself* (i.e. literal strings in the kramdown file) being interpreted properly with respect to encoding. TextMate will interpret Ruby files as UTF-8; that's what I use, so I'm used to that, and I rely on the assumption that that's what I *will* use. But someone might run kramdown in some other milieu. I never thought of that. :)))

As far as I know iconv can only transcode strings from one encoding to another, given you know the *source* encoding. But this source encoding is not known in Ruby 1.8.

I had several choices regarding encodings:

* UTF-8 only: UTF-8 is supported by Ruby 1.8 and Ruby 1.9, with the difference that in Ruby 1.8 one has to assume that the input is UTF-8 and in Ruby 1.9 one knows whether the input is in UTF-8. However, I felt that this would limit the usefulness of kramdown too much, especially in Asia.
* Supporting meta data and an encoding attribute: I once considered this, but I thought that the last approach was still more useful.
* Just using whatever the user gives to kramdown and outputting it in the same encoding. This is by far the easiest way, in Ruby 1.8 and 1.9. The only difference between 1.8 and 1.9 is that in 1.9 kramdown knows the encoding, whereas in 1.8 kramdown can only assume that the string is ASCII compatible.

So basically the user needs to make sure that the string is in the correct encoding before it is given to kramdown.

Regarding kramdown itself: all source files have the "encoding: utf-8" tag for compatibility with Ruby 1.9, and all strings should be ASCII compatible.

Last but not least: multiple encoding support is currently not really tested... however, nobody filed a bug so I presume that it works correctly :-)

-- Thomas

From t_leitner at gmx.at Fri Jun 11 03:58:13 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Fri, 11 Jun 2010 09:58:13 +0200
Subject: [kramdown-users] using kramdown in xhtml
References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <20100610072118.4b332336@noweto>
Message-ID: <20100611095813.14000e30@noweto>

On Thu, 10 Jun 2010 09:00:10 -0500 john muhl wrote:

> On Thu, Jun 10, 2010 at 12:21 AM, Thomas Leitner wrote:
> > On 2010-06-09 18:52 +1000 Shawn Van Ittersum wrote:
> >
> >> I prefer named entities for readability. Could there be an option to choose?
> >
> > I have now added an option named `numeric_entities` that defaults to `false` and can be used to decide whether entities are output using their name or their numeric value.

I have pushed the latest changes to the github repo for your consumption/consideration :-)

The changes include the new `numeric_entities` option as well as the conversion of entities to characters under Ruby 1.9 (only done in the HTML converter).
-- Thomas

From elliot.winkler at gmail.com Fri Jun 11 09:39:23 2010
From: elliot.winkler at gmail.com (Elliot Winkler)
Date: Fri, 11 Jun 2010 08:39:23 -0500
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <20100611075531.107e1b65@noweto>
References: <20100610151538.4c106a21@noweto> <20100611075531.107e1b65@noweto>
Message-ID: <4C123C8B.1000409@gmail.com>

On 6/11/10 12:55 AM, Thomas Leitner wrote:
> * Just using whatever the user gives to kramdown and outputting it in the same encoding. This is by far the easiest way, in Ruby 1.8 and 1.9. The only difference between 1.8 and 1.9 is that in 1.9 kramdown knows the encoding, whereas in 1.8 kramdown can only assume that the string is ASCII compatible.
>
> So basically the user needs to make sure that the string is in the correct encoding before it is given to kramdown.

If there were a "keep your hands off my special characters" option in addition to the numeric entities option, that's how I'd think it should work. Is there a case in which kramdown needs to convert the source text into another encoding?

-- Elliot

From t_leitner at gmx.at Fri Jun 11 12:22:24 2010
From: t_leitner at gmx.at (Thomas Leitner)
Date: Fri, 11 Jun 2010 18:22:24 +0200
Subject: [kramdown-users] using kramdown in xhtml
In-Reply-To: <4C123C8B.1000409@gmail.com>
References: <20100610151538.4c106a21@noweto> <20100611075531.107e1b65@noweto> <4C123C8B.1000409@gmail.com>
Message-ID: <20100611182224.1739d4a5@noweto>

On Fri, 11 Jun 2010 08:39:23 -0500 Elliot Winkler wrote:

> On 6/11/10 12:55 AM, Thomas Leitner wrote:
> > * Just using whatever the user gives to kramdown and outputting it in the same encoding. This is by far the easiest way, in Ruby 1.8 and 1.9. The only difference between 1.8 and 1.9 is that in 1.9 kramdown knows the encoding, whereas in 1.8 kramdown can only assume that the string is ASCII compatible.
> >
> > So basically the user needs to make sure that the string is in the correct encoding before it is given to kramdown.
>
> If there were a "keep your hands off my special characters" option in addition to the numeric entities option, that's how I'd think it should work. Is there a case in which kramdown needs to convert the source text into another encoding?

As explained in a previous email on this thread, kramdown doesn't transcode the source text - it just takes it as it is. On Ruby 1.8 this means that kramdown assumes that the source text is ASCII compatible, so that the literal strings and regexps kramdown uses internally can be correctly applied. The situation when running kramdown under Ruby 1.9 is the same as with Ruby 1.8, except that kramdown *knows* the source encoding and can thus convert entities to characters if possible.

So kramdown should already be keeping its hands off any special character! If not, please file a bug report with a test case! Or did you mean something else with "keep your hands..."?

-- Thomas

From michaelfranzl at gmx.at Sat Jun 12 15:28:29 2010
From: michaelfranzl at gmx.at (Michael Franzl)
Date: Sat, 12 Jun 2010 21:28:29 +0200
Subject: [kramdown-users] IAL quotes for LaTex?
Message-ID: <4C13DFDD.4060104@gmx.at>

Would this feature make sense?
Source:

{: .smallquote }
> ghi
>
> jkl

LaTeX output:

\begin{smallquote}
ghi

jkl
\end{smallquote}

A user could then define the "smallquote" LaTeX environment with:

\newenvironment{smallquote}{\begin{quote}\small}{\end{quote}}

Michael From johnmuhl at gmail.com Sat Jun 12 15:51:45 2010 From: johnmuhl at gmail.com (john muhl) Date: Sat, 12 Jun 2010 14:51:45 -0500 Subject: [kramdown-users] using kramdown in xhtml In-Reply-To: <20100611095813.14000e30@noweto> References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <20100610072118.4b332336@noweto> <20100611095813.14000e30@noweto> Message-ID: On Fri, Jun 11, 2010 at 2:58 AM, Thomas Leitner wrote: > I have pushed the latest changes to the github repo for your > consumption/consideration :-) > > The changes include the new `numeric_values` option as well as the > conversion of entities to characters under Ruby 1.9 (only done in the > HTML converter). i think `rsquo` (e.g. in `it's`) is still slipping through as a named entity with `numeric_entities` set to true. From t_leitner at gmx.at Sun Jun 13 02:10:25 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Sun, 13 Jun 2010 08:10:25 +0200 Subject: [kramdown-users] using kramdown in xhtml In-Reply-To: References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <20100610072118.4b332336@noweto> <20100611095813.14000e30@noweto> Message-ID: <20100613081025.3fb5b6a2@noweto> On Sat, 12 Jun 2010 14:51:45 -0500 john muhl wrote: > On Fri, Jun 11, 2010 at 2:58 AM, Thomas Leitner > wrote: > > I have pushed the latest changes to the github repo for your > > consumption/consideration :-) > > > > The changes include the new `numeric_values` option as well as the > > conversion of entities to characters under Ruby 1.9 (only done in > > the HTML converter). > > i think `rsquo` (e.g.
in `it's`) is still slipping through as a named > entity with `numeric_entities` set to true. It's working fine on my end (on the latest commit pushed to github):

$ ruby -Ilib bin/kramdown --no-numeric_entities
It's^D
It’s

$ ruby -Ilib bin/kramdown --numeric_entities
It's
It’s

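The `numeric_entities` option switches between two spellings of the same character: the named form (`&rsquo;`) and the numeric form (`&#8217;`). The numeric form matters for XHTML that is processed as XML, because an XML parser only predefines `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`. Below is a minimal Ruby sketch of the two spellings. It assumes a small hypothetical lookup table and is not kramdown's own code:

```ruby
# Hypothetical lookup table (a few characters only) for the named form.
NAMED_ENTITIES = { "\u2019" => "rsquo", "\u201C" => "ldquo", "\u201D" => "rdquo" }.freeze

# Spells one character as an HTML entity: by name when a name is known and
# numeric is false, otherwise as a decimal numeric character reference.
def entity_for(char, numeric: false)
  name = NAMED_ENTITIES[char]
  if numeric || name.nil?
    "&##{char.ord};"
  else
    "&#{name};"
  end
end

puts entity_for("\u2019")                # prints &rsquo;
puts entity_for("\u2019", numeric: true) # prints &#8217;
```

Characters without a known name always fall back to the numeric form, which is valid in both HTML and XML.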
Do you have a special test case where this happens? -- Thomas From t_leitner at gmx.at Sun Jun 13 02:29:18 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Sun, 13 Jun 2010 08:29:18 +0200 Subject: [kramdown-users] IAL quotes for LaTex? In-Reply-To: <4C13DFDD.4060104@gmx.at> References: <4C13DFDD.4060104@gmx.at> Message-ID: <20100613082918.4228e9dd@noweto> On 2010-06-12 21:28 +0200 Michael Franzl wrote: > Would this feature make sense? > > Source: > > {: .smallquote } > > ghi > > > > jkl > > LaTex output: > > \begin{smallquote} > ghi > > jkl > \end{smallquote} > > A user could then define the "smallquote" LaTex environment with: > > \newenvironment{smallquote}{\begin{quote}\small}{\end{quote}} Interesting idea. So basically you want the blockquote environment to be substituted by another environment based on a class attribute. What if two classes are set on the blockquote: {: .quote1 .quote2} > Some quote Which one should be used? Or should both be used, in the order they are defined? We would probably also have to define a list of possible custom latex "blockquote" environment names so that this feature cannot accidentally be invoked... So, how about this: * New option "quote_envs" for LaTeX converter, which is an array of allowed environment names, defaults to `[]` (i.e. empty array) * If a blockquote does not have a class attribute, just handle it like it is currently done. Otherwise extract the class names in order and do the following for each class name: If the class name is contained in the "quote_envs" array, wrap the block quote contents in a thus named environment, otherwise do nothing. -- Thomas From michaelfranzl at gmx.at Sun Jun 13 02:59:31 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Sun, 13 Jun 2010 08:59:31 +0200 Subject: [kramdown-users] IAL quotes for LaTex? 
In-Reply-To: <20100613082918.4228e9dd@noweto> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> Message-ID: <4C1481D3.7010207@gmx.at> Thomas Leitner wrote: > What if > two classes are set on the blockquote: > > {: .quote1 .quote2} > > Some quote > > Which one should be used? Or should both be used, in the order they are > defined? I think that several classes would also work. Example:

{: .small .bold }
> Some Quote

would be output to:

\begin{small}
\begin{bold}
Some Quote
\end{bold}
\end{small}

But if that is too complicated, only the first class attribute could be used. > So, how about this: > > * New option "quote_envs" for LaTeX converter, which is an array of > allowed environment names, defaults to `[]` (i.e. empty array) > > * If a blockquote does not have a class attribute, just handle it like > it is currently done. Otherwise extract the class names in order and > do the following for each class name: If the class name is contained > in the "quote_envs" array, wrap the block quote contents in a thus > named environment, otherwise do nothing. Sounds very reasonable! Michael From johnmuhl at gmail.com Sun Jun 13 10:18:57 2010 From: johnmuhl at gmail.com (john muhl) Date: Sun, 13 Jun 2010 09:18:57 -0500 Subject: [kramdown-users] using kramdown in xhtml In-Reply-To: <20100613081025.3fb5b6a2@noweto> References: <4C0E9A1D.5070902@gmail.com> <20100609075321.203e6740@noweto> <4FB0682E-1B52-4567-A72E-5E14A3E8F4CB@gmail.com> <20100610072118.4b332336@noweto> <20100611095813.14000e30@noweto> <20100613081025.3fb5b6a2@noweto> Message-ID: On Sun, Jun 13, 2010 at 1:10 AM, Thomas Leitner wrote: > It's working fine on my end (on the latest commit pushed to github): right.
it was a typo...apparently i have a hard time typing numeric_entities :) From t_leitner at gmx.at Sun Jun 13 14:11:21 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Sun, 13 Jun 2010 20:11:21 +0200 Subject: [kramdown-users] RFC - using classes to change the blockquote environment In-Reply-To: <4C1481D3.7010207@gmx.at> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> <4C1481D3.7010207@gmx.at> Message-ID: <20100613201121.7e340def@noweto> > > So, how about this: > > > > * New option "quote_envs" for LaTeX converter, which is an array of > > allowed environment names, defaults to `[]` (i.e. empty array) > > > > * If a blockquote does not have a class attribute, just handle it > > like it is currently done. Otherwise extract the class names in > > order and do the following for each class name: If the class name > > is contained in the "quote_envs" array, wrap the block quote > > contents in a thus named environment, otherwise do nothing. > > Sounds very reasonable! Okay, then I will implement this in the next days if no one sees any problems with this approach. One more thing: *if* we do it like described above, it is not easily changeable in a future version without breaking compatibility. So my last question: Could there be any other useful applications of the classes of a blockquote in the LaTeX converter? Note that the blockquote is currently the only block level element that can easily contain any other block level elements... (list items could probably also be used for such a purpose).
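The proposed `quote_envs` rule fits in a few lines of Ruby. This is a hypothetical helper, not the LaTeX converter's code. It makes three assumptions: the listed classes replace the default `quote` environment, the first class ends up outermost (as in Michael's `.small .bold` example), and a blockquote with no listed class keeps `quote`:

```ruby
# Hypothetical sketch of the proposed `quote_envs` rule for blockquotes.
def blockquote_to_latex(body, classes, quote_envs: [])
  # Keep only the classes the user explicitly allowed, in their original order.
  envs = classes.select { |name| quote_envs.include?(name) }
  envs = ["quote"] if envs.empty? # assumption: fall back to the default environment
  # Wrap from the last class outwards so that the first class ends up outermost.
  envs.reverse.reduce(body) do |inner, env|
    "\\begin{#{env}}\n#{inner}\n\\end{#{env}}"
  end
end

puts blockquote_to_latex("Some Quote", %w[small bold], quote_envs: %w[small bold])
```

With an empty `quote_envs` (the proposed default), no class can change the environment. That allowlist is what keeps the feature from being triggered by accident.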
-- Thomas From michaelfranzl at gmx.at Mon Jun 14 04:22:01 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Mon, 14 Jun 2010 10:22:01 +0200 Subject: [kramdown-users] RFC - using classes to change the blockquote environment In-Reply-To: <20100613201121.7e340def@noweto> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> <4C1481D3.7010207@gmx.at> <20100613201121.7e340def@noweto> Message-ID: <4C15E6A9.7070004@gmx.at> Thomas Leitner wrote: > Okay, then I will implement this in the next days if no one sees any > problems with this approach. > > One more thing: *if* we do it like described above, it is not easily > changeable in a future version without breaking compatibility. So my > last question: Could there be any other useful applications of the > classes of a blockquote in the LaTeX converter? Note that the > blockquote is currently the only block level element that can easily > contain any other block level elements... (list items could probably > also be used for such a purpose). The downside of my suggestion is that users *must* define the new LaTeX environment, otherwise the file won't compile out of the box. It would require some knowledge. And if the compatibility issue is too risky, there is another possibility: just add comments to the LaTeX output. The comments could later be changed with optional, custom regexp processing. Example:

{: .smallquote }
> Some Quote

will be converted to

\begin{quote} % .smallquote
Some Quote
\end{quote} % .smallquote

and a custom regexp conversion (optional) would be simple:

\begin{smallquote}
Some Quote
\end{smallquote}

Michael

BTW, is it also possible to convert kramdown {::comments} to LaTeX comments?
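A post-processor for the comment format proposed above could be a single regular-expression substitution. Below is a minimal Ruby sketch. It assumes the class comment appears after both `\begin{quote}` and `\end{quote}` exactly as in the example, and the method name is hypothetical:

```ruby
# Hypothetical post-processor for the proposed comment format, e.g.
#   \begin{quote} % .smallquote  ->  \begin{smallquote}
#   \end{quote} % .smallquote    ->  \end{smallquote}
def rename_quote_envs(latex)
  latex.gsub(/\\(begin|end)\{quote\} % \.([A-Za-z]+)/) do
    "\\#{Regexp.last_match(1)}{#{Regexp.last_match(2)}}"
  end
end

input = "\\begin{quote} % .smallquote\nSome Quote\n\\end{quote} % .smallquote\n"
puts rename_quote_envs(input)
```

If the comment were only emitted after `\begin{quote}`, this line-based substitution could not rename the matching `\end{quote}`, because a regexp alone cannot tell which environment a given `\end` closes.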
From t_leitner at gmx.at Mon Jun 14 14:13:19 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Mon, 14 Jun 2010 20:13:19 +0200 Subject: [kramdown-users] RFC - using classes to change the blockquote environment In-Reply-To: <4C15E6A9.7070004@gmx.at> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> <4C1481D3.7010207@gmx.at> <20100613201121.7e340def@noweto> <4C15E6A9.7070004@gmx.at> Message-ID: <20100614201319.1ca816de@noweto> > And if the compatibility issue is too risky, there is another > possibility. Just add comments into the LaTeX output. The comments > could later be changed with optional, custom regexp processing. > Example: > > {: .smallquote } > > Some Quote > > will be converted to > > \begin{quote} % .smallquote > Some Quote > \end{quote} % .smallquote > > and a custom regexp conversion (optional) would be simple: > > \begin{smallquote} > Some Quote > \end{smallquote} > Hmm... this looks good! We wouldn't need another option and the user could decide what he wants to do with the attributes set on a blockquote. This could probably be generalized to other block level elements. If you don't mind writing a post-processor for your case I would actually like to use this approach since it seems to be even more general! Thanks for the great idea!! > BTW, is it also possible to convert kramdown {::comments} to LaTeX > comments? Naturally, I have just implemented this and pushed it to the github repo - please test it! -- Thomas From t_leitner at gmx.at Tue Jun 15 16:27:00 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Tue, 15 Jun 2010 22:27:00 +0200 Subject: [kramdown-users] New converter: kramdown Message-ID: <20100615222700.41cc2411@noweto> Hi everybody, I have just pushed the latest changes to the github repo. The biggest change is the addition of a kramdown converter! This means that we can now not only do kramdown-to-html but also html-to-kramdown! 
:-) The test suite has been enhanced to automatically produce text-to-kramdown-to-html tests which were very helpful in making the converter: there were about 70 failures after the initial draft implementation, now there are zero. Please test and report any errors! Best regards, Thomas From michaelfranzl at gmx.at Mon Jun 21 02:41:40 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Mon, 21 Jun 2010 08:41:40 +0200 Subject: [kramdown-users] German LaTeX output Message-ID: <4C1F09A4.2040909@gmx.at> Would it be possible to optionally configure kramdown for German LaTeX output? A LaTeX document adds German specific syntax when instructed with

\usepackage[german]{babel}

It is mostly the quotes that need to be fixed:

"Kramdown double quotes"

are currently transformed into

``English double quotes''

but need to be

"`German double quotes"'

The same goes for single quotes:

'Kramdown single quotes'

are currently transformed into

`English single quotes'

but need to be

\glq{}German single quotes\grq{}

Thanks, Michael From michaelfranzl at gmx.at Mon Jun 21 03:05:58 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Mon, 21 Jun 2010 09:05:58 +0200 Subject: [kramdown-users] RFC - using classes to change the blockquote environment In-Reply-To: <20100614201319.1ca816de@noweto> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> <4C1481D3.7010207@gmx.at> <20100613201121.7e340def@noweto> <4C15E6A9.7070004@gmx.at> <20100614201319.1ca816de@noweto> Message-ID: <4C1F0F56.8090906@gmx.at> Thomas Leitner wrote: >> {: .smallquote } >> > Some Quote >> >> will be converted to >> >> \begin{quote} % .smallquote >> Some Quote >> \end{quote} % .smallquote > > If you don't mind writing a post-processor for your case I would > actually like to use this approach since it seems to be even more > general! Thanks for the great idea!! I have cloned from the github repository and ran setup.rb, but this feature seems not to be working.
By the way, how can I uninstall kramdown when installed by setup.rb? >> BTW, is it also possible to convert kramdown {::comments} to LaTeX >> comments? > > Naturally, I have just implemented this and pushed it to the github > repo - please test it! This works. Thank you! Michael From t_leitner at gmx.at Mon Jun 21 05:42:09 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Mon, 21 Jun 2010 11:42:09 +0200 Subject: [kramdown-users] German LaTeX output In-Reply-To: <4C1F09A4.2040909@gmx.at> References: <4C1F09A4.2040909@gmx.at> Message-ID: <20100621114209.6f409b30@noweto> On 2010-06-21 08:41 +0200 Michael Franzl wrote: > Would it be possible to optionally configure kramdown for German > LaTeX output? > > A LaTeX document adds German specific syntax when instructed with > > \usepackage[german]{babel} Just copy the included `document.latex` template from the `data/kramdown` directory of the kramdown distribution to a new file in your local directory and change it to fit your needs, i.e. include the above line! Then you can use the `template` option to use your modified template. > It is mostly the quotes that need to be fixed: > > "Kramdown double quotes" > > are currently transformed into > > ``English double quotes'' > > but need to be > > "`German double quotes"' Hmm... this is currently hard-coded. I assume that there are rules for each language, not only English and German, on how the quotes should be transformed?! If that is the case, the easiest way would probably be to add a new option `latex_quotes` which specifies how opening/closing double and single quotes are transformed - would this fit your needs?
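The `latex_quotes` idea amounts to a per-language table of replacement strings for the four quote types. Below is a hypothetical Ruby sketch. The constant, method and key names are illustrative only and are not a kramdown API. For German it uses the babel shorthands for double quotes (the same strings Michael's sed commands produce) and the \glq{}/\grq{} commands for single quotes:

```ruby
# Hypothetical per-language quote table; not an existing kramdown option.
# Keys are named after the HTML entities for the four smart quotes.
LATEX_QUOTES = {
  english: { ldquo: "``",  rdquo: "''",  lsquo: "`",       rsquo: "'" },
  german:  { ldquo: "\"`", rdquo: "\"'", lsquo: "\\glq{}", rsquo: "\\grq{}" }
}.freeze

# Returns the LaTeX spelling of one quote type for the given language.
def latex_quote(type, language = :english)
  LATEX_QUOTES.fetch(language).fetch(type)
end

# Wraps text in the language's opening and closing double quotes.
def double_quoted(text, language = :english)
  latex_quote(:ldquo, language) + text + latex_quote(:rdquo, language)
end

puts double_quoted("Zitat", :german) # prints "`Zitat"'
puts double_quoted("quote")          # prints ``quote''
```

Adding guillemets for French or Swiss documents would then only mean adding another entry to the table.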
-- Thomas From t_leitner at gmx.at Mon Jun 21 10:35:06 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Mon, 21 Jun 2010 16:35:06 +0200 Subject: [kramdown-users] RFC - using classes to change the blockquote environment In-Reply-To: <4C1F0F56.8090906@gmx.at> References: <4C13DFDD.4060104@gmx.at> <20100613082918.4228e9dd@noweto> <4C1481D3.7010207@gmx.at> <20100613201121.7e340def@noweto> <4C15E6A9.7070004@gmx.at> <20100614201319.1ca816de@noweto> <4C1F0F56.8090906@gmx.at> Message-ID: <20100621163506.30efd9ef@noweto> On 2010-06-21 09:05 +0200 Michael Franzl wrote: > Thomas Leitner wrote: > >> {: .smallquote } > >> > Some Quote > >> > >> will be converted to > >> > >> \begin{quote} % .smallquote > >> Some Quote > >> \end{quote} % .smallquote > > > > If you don't mind writing a post-processor for your case I would > > actually like to use this approach since it seems to be even more > > general! Thanks for the great idea!! > > I have cloned from the github repository and ran setup.rb, but this > feature seems not to be working. This was not implemented until now because I was waiting to see whether someone else would comment on it. However, I have implemented it now and the latest changes are on github. There are two changes regarding the implementation: the attributes are output in HTML key-value form and the behaviour is applied not only to blockquotes but also to lists, tables, certain code blocks and math blocks. -- Thomas From michaelfranzl at gmx.at Mon Jun 21 13:22:21 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Mon, 21 Jun 2010 19:22:21 +0200 Subject: [kramdown-users] German LaTeX output In-Reply-To: <20100621114209.6f409b30@noweto> References: <4C1F09A4.2040909@gmx.at> <20100621114209.6f409b30@noweto> Message-ID: <4C1F9FCD.6020606@gmx.at> Thomas Leitner wrote: > Just copy the included `document.latex` template from the > `data/kramdown` directory of the kramdown distribution to a new file in > your local directory and change it to fit your needs, i.e. include the
include the > above line! Then you can use the `template` option to use your modified > template. Thank you for the hint! > Hmm... this is currently hard-coded. I assume that there are rules for > each language, not only English and German, on how the quotes should > be transformed?! If that is the case, the easiest way would probably be > to add a new option `latex_quotes` which specifies how > opening/closing double and single quotes are transformed - would this > fit your needs? Yes. Only as long as it is generally useful also for others. In France/Switzerland they might need guillements instead of quotation marks. Here are a few infos: http://en.wikibooks.org/wiki/LaTeX/Internationalization Right now I have to do the following to be able to print correct German punctuation: sed -i 's|``|"`|g' file.tex sed -i "s|''|\"'|g" file.tex perl -i -pe "s|([^\"])\`(.*?)\'|\1\\\glq{}\2\\\grq{}|g" file.tex So it can be done also without kramdown, but I try to make it as 'beautiful' and straightforward as possible. Don't know if that is worth implementing though. Michael From t_leitner at gmx.at Wed Jun 23 02:51:18 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Wed, 23 Jun 2010 08:51:18 +0200 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released Message-ID: <20100623085118.34a4b9ac@noweto> ## About kramdown kramdown (sic, not Kramdown or KramDown, just kramdown) is a *free* GPL-licensed [Ruby](http://www.ruby-lang.org) library for parsing a superset of Markdown. It is completely written in Ruby, supports standard Markdown (with some minor modifications) and various extensions that have been made popular by the [PHP Markdown Extra] package and [Maruku]. Homepage for installation instructions and documentation: http://kramdown.rubyforge.org ## kramdown 0.9.0 released The biggest change in this release is the addition of a kramdown converter. This converter together with the HTML parser enables one to convert an HTML document into a kramdown document. 
Apart from that there are many other small changes and bug fixes, a full list of which you can find below.

## Changes

* Major changes:

  - New *kramdown converter* that converts an element tree into a kramdown document

* Minor changes:

  - Added option `numeric_entities` that defines whether entities are output using their names or their numeric values
  - Added option `toc_depth` for specifying how many header levels to include in the table of contents (patch by Alex Marandon)
  - Ruby 1.9 only: The HTML converter now always tries to convert entities to their character equivalents if possible
  - Change in HTML parser: conversion of `pre` and `code` elements to their native counterpart is only done if they contain no entities (under Ruby 1.9 entities are converted to characters before this check if possible)
  - The comment extension now produces comment elements that are used by the converters
  - IALs can now also be assigned to definitions (i.e. `dd` elements)
  - Image links may now be specified without alternative text (requested by Rune Myrland, fixes RF#28292)
  - The HTML parser gained the ability to convert conforming `span` and `div` elements to `math` elements
  - The LaTeX converter now outputs the element attributes as LaTeX comment for some elements (blockquotes, lists and math environments; requested by Michael Franzl)

* Bug fixes:

  - Fixed problem with list item IALs: the IAL was not recognized when the first element was a code block
  - Fixed ri documentation error on gem installation (patch by Alex Marandon)
  - Math content is now correctly escaped when using the HTML converter
  - Fixed html-to-native conversion of tables to only convert conforming tables

* Deprecation notes:

  - The `filter_html` option has been removed.
  - The method `Kramdown::Converter::Html#options_for_element` has been renamed to `html_attributes` -- using the old name is deprecated and the alias will be removed in the next release

From sunshine at sunshineco.com Wed Jun 23 05:55:19 2010 From: sunshine at sunshineco.com (Eric Sunshine) Date: Wed, 23 Jun 2010 05:55:19 -0400 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released In-Reply-To: <20100623085118.34a4b9ac@noweto> References: <20100623085118.34a4b9ac@noweto> Message-ID: <4C21DA07.8000508@sunshineco.com> Hi Thomas, On 6/23/2010 2:51 AM, Thomas Leitner wrote: > The biggest change in this release is the addition of a kramdown > converter. This converter together with the HTML parser enables one to > convert an HTML document into a kramdown document. Very nicely done. I ran some tests on this feature and the results were very favorable, though there were a few issues worth reporting. First, what is the intended behavior when feeding kramdown a fully-structured HTML document containing <html>, <head>, <body>? In my tests, upon converting to kramdown, a 'markdown="1"' attribute was added to <body>, however, this was ignored when converting the document back to HTML. Even when adding 'markdown="1"' manually to the parent node, conversion back to HTML failed (that is, no Markdown processing was performed at all inside the body). Second, once I stripped the <html> and <head> boilerplate from the document, conversion on Windows from HTML to kramdown succeeded, but conversion back to HTML failed with this exception:

C:\>kramdown test.kd > test.html
c:/ruby/lib/ruby/gems/1.9.1/gems/kramdown-0.9.0/lib/kramdown/parser/kramdown.rb:206:in `check': incompatible encoding regexp match (UTF-8 regexp with IBM437 string) (Encoding::CompatibilityError)

Invoking 'set LANG=en_US.UTF-8' at the Windows command prompt resolved this issue. Note that the original HTML contained &#xHH; entity references for copyright, ellipses, etc. Third, this is an old HTML document still using <b> bold elements rather than <strong>...</strong>.
The <b> bold elements were not converted to **bold** Markdown. I think it should be safe to treat <b> as equivalent to <strong> for conversion purposes. Fourth, I ran into a problem with a stand-alone <img> element being consumed by a subsequent paragraph. For instance, given HTML input: foo

bar

Conversion to kramdown produced:

![foo]()
bar

And conversion back to HTML resulted in the <img> becoming a child of the <p>:

foo bar

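For the second issue (the IBM437 error above), there is a way around it when kramdown is used as a library rather than through the command line: read the source with an explicit encoding, so parsing does not depend on the console's default. Below is a minimal Ruby sketch. It assumes the file is UTF-8, the helper name is hypothetical, and the kramdown call is shown only as a comment:

```ruby
require "tempfile"

# Hypothetical helper: reads a kramdown source file as UTF-8 regardless of the
# platform's default external encoding and fails early on invalid bytes.
def read_utf8_source(path)
  text = File.read(path, encoding: "UTF-8")
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text
end

# Usage with a throwaway file; afterwards one would call, e.g.,
# Kramdown::Document.new(source).to_html
Tempfile.create(["test", ".kd"]) do |file|
  file.write("Fran\u00E7ois\n")
  file.flush
  source = read_utf8_source(file.path)
  puts source.encoding # prints UTF-8
end
```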
-- ES From michaelfranzl at gmx.at Thu Jun 24 04:35:22 2010 From: michaelfranzl at gmx.at (Michael Franzl) Date: Thu, 24 Jun 2010 10:35:22 +0200 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released In-Reply-To: <20100623085118.34a4b9ac@noweto> References: <20100623085118.34a4b9ac@noweto> Message-ID: <4C2318CA.8010209@gmx.at> Thomas Leitner wrote: > - The LaTeX converter now outputs the element attributes as LaTeX > comment for some elements (blockquotes, lists and math > environments; requested by Michael Franzl) In LaTeX, the closing tag of an environment needs to have the same name as the opening tag. So it is not enough to add the comment only at the beginning. Right now kramdown produces:

\begin{quote} % class="small"
a quote
\end{quote}

But it should be:

\begin{quote} % class="small"
a quote
\end{quote} % class="small"

Otherwise it is not possible to transform it via regular expressions into:

\begin{small}
a quote
\end{small}

Thanks, Michael From t_leitner at gmx.at Fri Jun 25 02:19:23 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Fri, 25 Jun 2010 08:19:23 +0200 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released In-Reply-To: <4C21DA07.8000508@sunshineco.com> References: <20100623085118.34a4b9ac@noweto> <4C21DA07.8000508@sunshineco.com> Message-ID: <20100625081923.7549f051@noweto> > First, what is the intended behavior when feeding kramdown a > fully-structured HTML document containing <html>, <head>, <body>? In > my tests, upon converting to kramdown, a 'markdown="1"' attribute was > added to <body>, however, this was ignored when converting the > document back to HTML. Even when adding 'markdown="1"' manually to > the parent node, conversion back to HTML failed (that is, no > Markdown processing was performed at all inside the body). It should output it in a hybrid format, i.e. converting everything possible to kramdown and leaving the rest as HTML.
I just ran a sample HTML document through html-to-kramdown-to-html and it worked fine for all things except the DOCTYPE - I have put this on my TODO list. > Second, once I stripped the <html> and <head> boilerplate from the > document, conversion on Windows from HTML to kramdown succeeded, but > conversion back to HTML failed with this exception: > > C:\>kramdown test.kd > test.html > c:/ruby/lib/ruby/gems/1.9.1/gems/kramdown-0.9.0/lib/kramdown/parser/kramdown.rb:206:in > `check': incompatible encoding regexp match (UTF-8 regexp with IBM437 > string) (Encoding::CompatibilityError) Hmm... I have to look at this, and probably generate some test cases for checking encodings under Ruby 1.9. Could you send me the test.kd document so that I can dig into it and find the offending regexp? > Third, this is an old HTML document still using <b> bold elements > rather than <strong>...</strong>. The <b> bold elements were not > converted to **bold** Markdown. I think it should be safe to treat > <b> as equivalent to <strong> for conversion purposes. Yeah, I thought about this... but decided against it, can't remember why. But it should probably be okay converting <b> and <i> to <strong> and <em>. > Fourth, I ran into a problem with a stand-alone <img> element being > consumed by a subsequent paragraph. For instance, given HTML input: > > foo >

bar

> > Conversion to kramdown produced: > > ![foo]() > bar > > And conversion back to HTML resulted in the <img> becoming a child > of the <p>: > >

foo > bar

The reason for this is that kramdown always treats <img> tags as span level elements and never as block elements like in your example above. So an image will always be wrapped in a paragraph! Also note that consecutive text and span tags inside a flow HTML element like <div> will conceptually be wrapped in a paragraph! -- Thomas From t_leitner at gmx.at Fri Jun 25 02:24:28 2010 From: t_leitner at gmx.at (Thomas Leitner) Date: Fri, 25 Jun 2010 08:24:28 +0200 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released In-Reply-To: <4C2318CA.8010209@gmx.at> References: <20100623085118.34a4b9ac@noweto> <4C2318CA.8010209@gmx.at> Message-ID: <20100625082428.4aab8458@noweto> On 2010-06-24 10:35 +0200 Michael Franzl wrote: > Thomas Leitner wrote: > > - The LaTeX converter now outputs the element attributes as LaTeX > > comment for some elements (blockquotes, lists and math > > environments; requested by Michael Franzl) > > In LaTeX, the closing tag of an environment needs to have the same > name as the opening tag. So it is not enough to add the comment only > at the beginning. Right now kramdown produces: > > \begin{quote} % class="small" > a quote > \end{quote} > > But it should be: > > \begin{quote} % class="small" > a quote > \end{quote} % class="small" Will change that! -- Thomas From sunshine at sunshineco.com Fri Jun 25 09:26:40 2010 From: sunshine at sunshineco.com (Eric Sunshine) Date: Fri, 25 Jun 2010 09:26:40 -0400 Subject: [kramdown-users] [ANN] kramdown 0.9.0 released In-Reply-To: <20100625081923.7549f051@noweto> References: <20100623085118.34a4b9ac@noweto> <4C21DA07.8000508@sunshineco.com> <20100625081923.7549f051@noweto> Message-ID: <4C24AE90.3090901@sunshineco.com> Hi Thomas, On 6/25/2010 2:19 AM, Thomas Leitner wrote: >> First, what is the intended behavior when feeding kramdown a >> fully-structured HTML document containing <html>, <head>, <body>? > It should output it in a hybrid format, i.e. converting everything > possible to kramdown and leaving the rest as HTML. I just ran a sample > HTML document through html-to-kramdown-to-html and it worked fine for > all things except the DOCTYPE - I have put this on my TODO list. I'm not sure that I understand. When I feed it the HTML input: Title

Header

Body text. The emitted kramdown is: Title # Header Body **text**. But in the conversion back to HTML, kramdown constructs such as "# Header" and "**text**" are not converted to their HTML equivalents. In fact, the output of kramdown -> HTML is identical to the input (minus the markdown="1" attribute): Title # Header Body **text**. >> C:\>kramdown test.kd > test.html >> c:/ruby/lib/ruby/gems/1.9.1/gems/kramdown-0.9.0/lib/kramdown/parser/kramdown.rb:206:in >> `check': incompatible encoding regexp match (UTF-8 regexp with IBM437 >> string) (Encoding::CompatibilityError) > Hmm... I have to look at this, and probably generate some test cases > for checking encodings under Ruby 1.9. Could you send me the test.kd > document so that I can dig into it and find the offending regexp? I narrowed it down to this fragment:

François

The equivalent

François

is converted to kramdown and back to HTML without problems. >> Third, this is an old HTML document still using <b> bold elements >> rather than <strong>...</strong>. The <b> bold elements were not >> converted to **bold** Markdown. I think it should be safe to treat >> <b> as equivalent to <strong> for conversion purposes. > Yeah, I thought about this... but decided against it, can't remember > why. But it should probably be okay converting <b> and <i> to > <strong> and <em>. If the intention is for perfect fidelity in the HTML -> kramdown -> HTML chain, then I can understand not touching <b> and <i> since you could not reproduce them in the final HTML. Perhaps an option in the HTML parser could control whether <b> and <i> are folded to <strong> and <em>. -- ES