[kramdown-users] RFC: Lazy syntax for paragraphs, blockquotes and lists

Eric Sunshine sunshine at sunshineco.com
Wed Sep 29 01:06:08 EDT 2010


Hi Thomas,

Sorry for the delayed response. I only just found time to read this 
thread in its entirety. I don't have anything to add but wanted to 
thank you for the thorough write-up.

-- ES


On 9/3/2010 5:43 AM, Thomas Leitner wrote:
> Hi everybody,
>
> it was requested that kramdown (a Markdown parser in Ruby, see
> <http://kramdown.rubyforge.org>) support the lazy syntax of Markdown.
> So I sat down, thought about it, skimmed through the Markdown ML on
> issues regarding lazy indentation as done with Markdown and now I have
> some rough idea on how to do this in kramdown.
>
> First: I'd like to say that there is no way to satisfy everyone. Lazy
> indentation has some pros and cons and we have to find a middle ground!
>
> Second: This is a rather long mail but worth the read, especially if
> you want to influence how kramdown implements the lazy syntax!
>
> Third: I have cross-posted this email to the Markdown ML because it
> provides a nice explanation of why the behaviour of the lazy syntax in
> Markdown.pl might be as it is.
>
>
>
> In the beginning there was...
> =============================
>
> Markdown was created by John Gruber because he wanted a nice text
> format that is inspired by how email messages are written. There is a
> requirement that lines in plain text email messages should not be longer
> than 78 characters and therefore many mail (transport) programs
> hard-wrap text to a specific line length.
>
> I think that this is the reason why we have lazy indentation or
> generally long line wrapping in Markdown. If we said that each
> paragraph must be one long line, there would obviously be problems when
> messages get automatically wrapped by (email) programs. Therefore
> Markdown allows paragraphs to continue on the following lines.
>
>
>
> The Markdown syntax
> ===================
>
> This is just a short summary of how and in which elements Markdown
> supports lazy indentation (taken more or less from the [Markdown Syntax
> Page][1]).
>
>
> ## Blockquotes
>
> A blockquote starts with a `>` character. All following lines with a
> `>` character belong to the same blockquote. However, you may be lazy
> and put the `>` character only before the first line of a blockquote:
>
>      >  This is a normal
>      paragraph in a blockquote.
>
>      >  The blockquote is continued here!!!
>
> A blank line between two blockquotes does *not* separate the
> blockquotes, it's just one large blockquote.
>
>
> ## Lists
>
> As with blockquotes, the content of a list item need not be indented
> correctly. For example:
>
>      * This is a normal
>      paragraph in a list item.
>
> This is even allowed for other paragraphs in the list:
>
>      *   This is a paragraph.
>
>          This is a paragraph
>      with a lazy indentation.
>
>
>
> Problems/Ambiguities
> ====================
>
> The lazy indentation syntax provides Markdown users with many chances
> to get some unexpected output... Additionally, since both lists and
> blockquotes support lazy indentation it is sometimes not clear what the
> outcome is when those two elements are combined.
>
> Here are some issues taken from the Markdown ML.
>
> PA1. First example:
>
>      * this is list item
>      >  * this item is in a block quote
>      more block quoting?
>
> PA2. Second example:
>
>        *>  list item with quoting
>      more text here
>
>        *>  list item with quoting
>      more text here
>        * another list item
>
> PA3. Third example:
>
>      >  >  I wrote something
>      >  you replied
>      and now here is my reply to your reply.
>
> PA4. Fourth example:
>
>      >  * foo
>      >  >  bar
>      >  >  baz
>
> The above examples can be interpreted in one way or another. This means
> that we won't find a solution that satisfies all needs. We can only try
> to find a solution that is based on a general rule which feels natural
> to the user and does what most people would expect.
>
> Michel Fortin wrote [this][2] on the Markdown ML regarding the lazy
> syntax:
>
>> Basically, I'd eliminate any "half-lazy" syntax where you can be lazy
>> about list item indentation while not being lazy on blockquote
>> markers. This just creates confusion; syntax markers shouldn't be
>> allowed to be lazy.
>>
>> Removing half-lazy things would also fix a surprising issue with
>> blockquotes:
>>
>>       >  foo
>>       >  >  bar
>>       >  baz
>>
>> This would be seen as a blockquote containing a "foo" paragraph, a
>> nested "bar" blockquote and a "baz" paragraph, instead of the
>> completely counter-intuitive output produced today. To make "baz"
>> part of the nested blockquote, you would either go the explicit route:
>>
>>       >  foo
>>       >  >  bar
>>       >  >  baz
>>
>> or the lazy route:
>>
>>       >  foo
>>       >  >  bar
>>       baz
>>
>> but not something in between.
>
>
>
> kramdown "lazy" syntax
> ======================
>
> I thought about how I would like things to work, considering all of the
> above and I came to the following solution. Note, however, that I do
> *not* recommend using the lazy syntax when writing a document!
>
> Since the problem of the lazy syntax arises from the problem of line
> wrapping, why not just use that to specify how the lazy syntax should
> work?
>
> Before we go into details consider the following: The kramdown syntax
> page lists the following structural block level elements:
>
> * Blank lines
> * Paragraphs
> * Headers
> * Blockquotes
> * Code blocks
> * Lists (incl. footnote definitions)
> * Tables
> * Horizontal rules
> * Math blocks
> * HTML blocks
>
> We can leave out all elements which do not inherently support line
> wrapping, namely blank lines (no text to wrap), code blocks (should be
> output as is), tables, horizontal rules, math blocks (same as with code
> blocks) and HTML blocks.
>
> Headers can also be left out assuming that a header text is not long
> enough to trigger line wrapping (this has also been discussed on the
> Markdown ML and I think that the consensus was that longer header texts
> should be written directly in HTML).
>
> This leaves us with three elements: paragraphs, blockquotes and lists.
> However, blockquotes and lists are just "wrappers" around paragraphs
> and therefore the only element that really contains any text in a
> kramdown (Markdown) document is a paragraph (I also count the compact
> list text that is not wrapped in `<p>` tags as a paragraph because
> conceptually it is one). So when we know how long lines in paragraphs
> are wrapped, the behaviour of long lines in blockquotes and lists is
> easy to derive.
>
>
> ## Requirements
>
> There are two requirements regarding line wrapping and "lazy" syntax:
>
> * Line wrapping may be done like it is done by dumb editors, i.e. a long
>    line is split on whitespace before the maximal line length and the
>    text continues on the next line (i.e. *no* blank line in-between).
>    This means that the additional lines belong to the line (and
>    therefore a certain paragraph) to which line wrapping has been
>    applied!
>
> * It must be possible to blockquote a kramdown document (which
>    possibly contains lazy lines) and preserve the structure of the quoted
>    document.
>
>
> ## Paragraphs
>
> So how to lazy wrap simple paragraphs? This is the easiest one since
> the [Markdown syntax description][1] already tells us how: just
> hard-wrap your lines and separate multiple consecutive paragraphs with
> one or more blank lines.
>
> For example:
>
>      This is one long long long long long long long long long line
>
> gets wrapped to:
>
>      This is one long long
>      long long long long
>      long long long line
>
> So the paragraph rule as stated on the [Markdown syntax page][1] is
> actually needed to support being lazy when writing paragraphs - and to
> support programs that hard-wrap long lines.
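
An illustrative Python sketch of the paragraph rule above. It assumes
that hard-wrapping behaves like the standard library's `textwrap.wrap`;
`hard_wrap` and `join_paragraph` are hypothetical helper names, and this
is not how kramdown or Markdown.pl is implemented.

```python
import textwrap

def hard_wrap(paragraph, width):
    """Split one long line on whitespace, as a plain editor would."""
    return textwrap.wrap(paragraph, width)

def join_paragraph(lines):
    """Treat consecutive non-blank lines as a single paragraph."""
    return " ".join(line.strip() for line in lines)

text = "This is one long long long long long long long long long line"
wrapped = hard_wrap(text, 22)   # width chosen to match the example above
for line in wrapped:
    print(line)
print(join_paragraph(wrapped) == text)
```

With a width of 22 the wrapped lines are the three lines shown in the
example, and joining them again gives back the original paragraph.
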
>
>
> ## Blockquotes
>
> By following the two requirements as stated above, it is clear how the
> lazy syntax for blockquotes has to look.
>
> The following examples modify this document:
>
>      This is one long long long long long long long long long line
>
> BQ1. After blockquoting:
>
>      >  This is one long long long long long long long long long line
>
> BQ2. After line wrapping and blockquoting:
>
>      >  This is one long long
>      >  long long long long
>      >  long long long line
>
> BQ3. After line wrapping, blockquoting and blockquoting:
>
>      >  >  This is one long long
>      >  >  long long long long
>      >  >  long long long line
>
> BQ4. After blockquoting and line wrapping:
>
>      >  This is one long long
>      long long long long
>      long long long line
>
> BQ5. After blockquoting, line wrapping and blockquoting:
>
>      >  >  This is one long long
>      >  long long long long
>      >  long long long line
>
> As can be seen in the last example, the "half-lazy" syntax described by
> Michel Fortin arises naturally when blockquoting and line wrapping are
> combined in a certain way. However, I think it should not make any
> difference whether a document is first line-wrapped and then
> blockquoted or the other way around. Therefore I would allow this
> "half-lazy" syntax.
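
The claim that the order of wrapping and quoting should not matter can
be checked with a small Python sketch. It assumes one reading of the
rule, namely that the nesting depth is taken from the first line of a
paragraph and that later lines may carry fewer markers or none. All
function names are hypothetical, and only a single paragraph is
handled.

```python
import textwrap

def blockquote(lines):
    """Prefix every line with a marker, as a mail client does when quoting."""
    return ["> " + line for line in lines]

def dumb_wrap(lines, width):
    """Wrap each line on whitespace without knowing about '>' markers."""
    wrapped = []
    for line in lines:
        wrapped.extend(textwrap.wrap(line, width))
    return wrapped

def split_markers(line):
    """Return (number of leading '>' markers, remaining text)."""
    depth, rest = 0, line.lstrip()
    while rest.startswith(">"):
        depth, rest = depth + 1, rest[1:].lstrip()
    return depth, rest

def read_lazy_paragraph(lines):
    """Nesting depth comes from the first line; later lines may omit markers."""
    depth, first = split_markers(lines[0])
    parts = [first]
    for line in lines[1:]:
        line_depth, rest = split_markers(line)
        if line_depth > depth:
            raise ValueError("deeper marker opens a nested blockquote (out of scope)")
        parts.append(rest)
    return depth, " ".join(parts)

text = "This is one long long long long long long long long long line"
bq2 = blockquote(dumb_wrap([text], 22))   # wrap, then quote
bq4 = dumb_wrap(blockquote([text]), 22)   # quote, then wrap
bq5 = blockquote(bq4)                     # quote, wrap, quote
for name, lines in (("BQ2", bq2), ("BQ4", bq4), ("BQ5", bq5)):
    print(name, read_lazy_paragraph(lines))
```

Under these assumptions BQ2 and BQ4 both read back as the same
paragraph at depth 1, and BQ5 as the same paragraph at depth 2. The
exact line breaks differ from the examples above because the marker
itself takes up part of the line width.
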
>
> What happens if line wrapping is done several times?
>
> BQ5 with additional line wrapping:
>
>      >  >  This is one
>      long long
>      >  long long
>      long long
>      >  long long
>      long line
>
> This looks a bit scary, I admit, but it is still one paragraph embedded
> in two blockquotes... I don't suggest that anyone writes his documents
> in this way though...
>
> Due to line wrapping we now also have to require the use of blank lines
> between a blockquote and a following paragraph. Otherwise it is
> impossible to know whether example BQ4 contains just a blockquote or a
> blockquote followed by a paragraph.
>
> I don't think that requiring a blank line is a burden on writers. If
> you look through the kramdown or the Markdown ML, you will see that in
> nearly all emails quoted text is separated from the response by at
> least one blank line.
>
> Note that kramdown would generate two separate blockquotes if they are
> separated by a blank line (Markdown.pl merges the blockquotes):
>
>      >  This is one blockquote with
>      a long line.
>
>      >  This is another blockquote
>      with a long line.
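
A rough Python sketch of the blank-line rule, assuming that a blank
line always ends the current block and that a block is a blockquote
when its first line starts with `>`. `split_blocks` and `block_kind`
are hypothetical names, the final reply line is made up for
illustration, and other block types (code blocks, lists, ...) are
ignored.

```python
def split_blocks(lines):
    """Group consecutive non-blank lines; a blank line ends a block."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def block_kind(block):
    """Only the first line decides; later lines may be lazy."""
    return "blockquote" if block[0].lstrip().startswith(">") else "paragraph"

doc = [
    "> This is one blockquote with",
    "a long line.",
    "",
    "> This is another blockquote",
    "with a long line.",
    "",
    "A reply paragraph.",          # illustrative line, not from the example
]
kinds = [block_kind(block) for block in split_blocks(doc)]
print(kinds)

# Without the blank line, the reply becomes a lazy line of the quote.
merged = split_blocks(doc[:5] + doc[6:])
print(len(merged), merged[-1][-1])
```

With the blank lines in place this gives two blockquotes followed by a
paragraph. Dropping the blank line before the reply leaves only two
blocks, with the reply swallowed by the second blockquote.
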
>
> If you run the examples BQ1 to BQ5 through Markdown.pl, you will find
> that it produces the expected output (as defined above). This is no
> coincidence, I think, since Markdown.pl has been designed with email
> messages in mind. However, the requirements as stated above
> haven't been written down anywhere (at least I don't know of it) and
> with those the behaviour of Markdown.pl is easily explained.
>
>
> ## Lists
>
> The content of lists, footnote definitions and all other content
> (except code blocks) that is defined via indentation, also has to
> support the lazy syntax.
>
> We will start with this document:
>
>      *   This is one long long long long long long long long long line
>
>          This is one long long long long long long long long long line
>      *   Another very very very very very very very very long line
>
> LI1. After line wrapping:
>
>      *   This is one long long
>      long long long long long
>      long long line
>
>          This is one long long
>      long long long long long
>      long long line
>      *   Another very very very
>      very very very very very
>      long line
>
> So line wrapping inside lists can also be explained in terms of the
> requirements. And the line wrapping behaviour is identical to that of
> Markdown.pl.
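
A simplified Python sketch of how the wrapped list in LI1 could be read
back. It assumes that a line starting with `* ` opens a new item, that
an indented line after a blank line opens a new paragraph in the
current item, and that every other non-blank line is a lazy
continuation. `parse_lazy_list` is a hypothetical name; nested lists
and text after the list are out of scope.

```python
def parse_lazy_list(lines):
    """Return a list of items; each item is a list of paragraph strings."""
    items = []
    after_blank = False
    for line in lines:
        stripped = line.strip()
        if not stripped:
            after_blank = True
            continue
        if stripped.startswith("* "):
            items.append([stripped[2:].strip()])   # new item, first paragraph
        elif not items:
            raise ValueError("text before the first list item")
        elif after_blank:
            if not line.startswith("    "):
                raise ValueError("unindented line after a blank line ends the list")
            items[-1].append(stripped)             # new paragraph, same item
        else:
            items[-1][-1] += " " + stripped        # lazy continuation line
        after_blank = False
    return items

li1 = [
    "*   This is one long long",
    "long long long long long",
    "long long line",
    "",
    "    This is one long long",
    "long long long long long",
    "long long line",
    "*   Another very very very",
    "very very very very very",
    "long line",
]
items = parse_lazy_list(li1)
print(len(items), [len(item) for item in items])
```

Under these assumptions LI1 reads back as two items, the first with two
paragraphs and the second with one, each paragraph equal to its
unwrapped original. A continuation line that happens to start with `* `
would be misread as a new item, which is the marker problem mentioned
further below.
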
>
>
>
> How to interpret the stated problems/ambiguities
> ================================================
>
> After having specified how the kramdown lazy syntax would work, here is
> how the initially given problems would be interpreted:
>
> PA1. A list with one item, followed by a blockquote containing a list
>       with one item. Markdown.pl interprets it in more or less the same
>       way but using invalid HTML.
>
> PA2. A list with three items: the first and the second item contain a
>       blockquote with a paragraph, the third item contains just text.
>       Again, Markdown.pl shows the same behaviour.
>
> PA3. Two nested blockquotes containing one paragraph with all the text.
>       Markdown.pl shows the same behaviour.
>
> PA4. A blockquote containing a) a list with one item and b) a
>       blockquote with a paragraph containing the text "bar baz".
>       Markdown.pl's behaviour differs - it puts the inner blockquote
>       inside the list item - again we have to disregard the invalid HTML
>       it produces.
>
> There is always the problem with blockquote and list markers: if they
> appear inside a paragraph and line wrapping is applied, they may
> potentially end up at the beginning of a line... I don't think that
> this can be avoided.
>
> Any other problems/ambiguities/edge cases that need to be addressed?
>
>
>
> Conclusion
> ==========
>
> The proposed lazy syntax for kramdown is identical to that of the
> original Markdown implementation - some edge cases are handled
> differently though. However, in contrast to Markdown.pl more reasons are
> given why this lazy syntax is useful and how it arises naturally when
> looking at email messages and how they are processed by MTAs and email
> programs.
>
> I haven't looked at how to implement this in kramdown but it shouldn't
> be too difficult. Before I do that I would like to hear your opinions
> on this matter! :-)
>
>
> Best regards and thanks for staying with me through this long email,
>    Thomas
>
>
> [1]: http://daringfireball.net/projects/markdown/syntax
> [2]: http://osdir.com/ml/text.markdown.general/2007-05/msg00031.html
> _______________________________________________
> kramdown-users mailing list
> kramdown-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/kramdown-users
>