[kramdown-users] RFC: Lazy syntax for paragraphs, blockquotes and lists

Shawn Van Ittersum svicalifornia at gmail.com
Fri Sep 3 11:06:47 EDT 2010


Hi Thomas,

Thank you for this great write-up, and for seeing the need for lazy indentation to support contexts with forced line wrapping (such as email).

I believe I'm in agreement with all of your conclusions, and your proposed handling of the edge cases you cited.  As we discussed on the kramdown mailing list, blank lines are ideal separators between paragraphs and blockquotes, both in email and in Markdown.

Here's another edge case I'd like to confirm with you:

First, the setup.  Someone sends an email that gets wrapped like this:

This is a long line of text that gets wrapped in email at 72 characters,
and here is the second part of that line, which also gets wrapped at the
same length and then stops here.

Someone replies and blockquotes that text, which lengthens each line by two characters.  However, their email client wraps at 78 characters, so no additional wrapping occurs:

> This is a long line of text that gets wrapped in email at 72 characters,
> and here is the second part of that line, which also gets wrapped at the
> same length and then stops here.

The first person replies, lengthening the lines again.  They get wrapped at 72 characters, and this happens:

> > This is a long line of text that gets wrapped in email at 72 
characters,
> > and here is the second part of that line, which also gets wrapped at
the
> > same length and then stops here.

Despite the jagged syntax, this should semantically be a blockquote inside a blockquote.  Based on the "must end blockquote with blank line" rule, would kramdown interpret it that way? 
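
The quote-then-wrap round trip that produces this jagged form can be reproduced with a small Python sketch. The `quote` and `hard_wrap` helpers are hypothetical stand-ins for what a mail client does, not any real client's algorithm: the wrapping is deliberately naive, breaking at the last space that fits and leaving the overflow on a new line with no quote marker.

```python
def quote(lines):
    """Prefix every line with '> ', as a mail client does when replying."""
    return ["> " + line for line in lines]


def hard_wrap(lines, width):
    """Naively wrap each over-long line at the last space that fits.

    The overflow becomes a new line without any quote marker,
    i.e. a "lazy" continuation line.
    """
    out = []
    for line in lines:
        while len(line) > width:
            cut = line.rfind(" ", 0, width + 1)
            if cut <= 0:
                break  # no usable break point, leave the line as it is
            out.append(line[:cut])
            line = line[cut + 1:]
        out.append(line)
    return out


# Toy example with a very short wrap width so the effect is easy to see.
original = ["aaaa bbbb cccc"]
first_reply = hard_wrap(quote(original), 12)
second_reply = hard_wrap(quote(first_reply), 12)
print("\n".join(second_reply))
# > > aaaa
# bbbb
# > cccc
```

With a width of 72 and the three-line message above, the same two helpers give the same kind of mixed marker pattern, although real clients may treat trailing spaces differently.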

The second person replies again:

> > > This is a long line of text that gets wrapped in email at 72 
> characters,
> > > and here is the second part of that line, which also gets wrapped at
> the
> > > same length and then stops here.

Will kramdown treat that as a blockquote inside a blockquote inside a blockquote?

If so, then I think we're in agreement, and I look forward to seeing it in action!
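
For concreteness, the reading asked about above can be sketched in Python. This is not kramdown's implementation (kramdown is written in Ruby, and the proposal quoted below says the implementation has not been looked at yet); the function names are made up for illustration. The sketch assumes that a block ends only at a blank line, that each block is a single paragraph, and that its nesting depth is the number of `>` markers on its first line. It does not handle a later line with more markers, which would open a nested quote.

```python
import re

# Matches one leading blockquote marker: '>' plus an optional space.
MARKER = re.compile(r">\s?")


def split_markers(line):
    """Return (number of leading '>' markers, remaining text)."""
    depth = 0
    while True:
        m = MARKER.match(line)
        if not m:
            return depth, line
        depth += 1
        line = line[m.end():]


def lazy_paragraphs(lines):
    """Group lines into blank-line-separated blocks of (depth, text).

    Assumption: the first line of a block fixes the nesting depth, and
    every following non-blank line is folded into the same paragraph,
    whatever its own marker count.
    """
    result, depth, words = [], None, []
    for line in list(lines) + [""]:  # sentinel blank line flushes the last block
        n, text = split_markers(line)
        if not text.strip():  # blank (possibly quoted) line ends the block
            if words:
                result.append((depth, " ".join(words)))
            depth, words = None, []
            continue
        if depth is None:
            depth = n
        words.append(text.strip())
    return result


reply = ["> > > aaa bbb", "> ccc,", "> > > ddd eee", "> fff"]
print(lazy_paragraphs(reply))
# [(3, 'aaa bbb ccc, ddd eee fff')]
```

Under these assumptions the last example above comes out as one paragraph at depth three, which is the interpretation the question hopes for.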

Thanks again,
Shawn

On Fri, 3 Sep 2010 11:43:49 +0200, Thomas Leitner wrote:
> Hi everybody,
> 
> it was requested that kramdown (a Markdown parser in Ruby, see
> <http://kramdown.rubyforge.org>) support the lazy syntax of Markdown.
> So I sat down, thought about it, skimmed through the Markdown ML on
> issues regarding lazy indentation as done with Markdown and now I have
> some rough idea on how to do this in kramdown.
> 
> First: I'd like to say that there is no way to satisfy everyone. Lazy
> indentation has some pros and cons and we have to find a middle ground!
> 
> Second: This is a rather long mail but worth the read, especially if
> you want to influence how kramdown implements the lazy syntax!
> 
> Third: I have cross-posted this email to the Markdown ML because it
> provides a nice explanation of why the behaviour of the lazy syntax in
> Markdown.pl might be as it is.
> 
> 
> 
> In the beginning there was...
> =============================
> 
> Markdown was created by John Gruber because he wanted a nice text
> format that is inspired by how email messages are written. There is a
> requirement that lines in plain text email messages should not be longer
> than 78 characters and therefore many mail (transport) programs
> hard-wrap text to a specific line length.
> 
> I think that this is the reason why we have lazy indentation or, more
> generally, long-line wrapping in Markdown. If we said that each
> paragraph must be one long line, there would obviously be problems when
> messages get automatically wrapped by (email) programs. Therefore
> Markdown allows paragraphs to continue on the following lines.
> 
> 
> 
> The Markdown syntax
> ===================
> 
> This is just a short summary of how and in which elements Markdown
> supports lazy indentation (taken more or less from the [Markdown Syntax
> Page][1]).
> 
> 
> ## Blockquotes
> 
> A blockquote starts with a `>` character. All following lines with a
> `>` character belong to the same blockquote. However, you may be lazy
> and put the `>` character only before the first line of a blockquote:
> 
>     > This is a normal
>     paragraph in a blockquote.
> 
>     > The blockquote is continued here!!!
> 
> A blank line between two blockquotes does *not* separate the
> blockquotes, it's just one large blockquote.
> 
> 
> ## Lists
> 
> As with blockquotes, the content of a list item need not be indented
> correctly. For example:
> 
>     * This is a normal
>     paragraph in a list item.
> 
> This is even allowed for other paragraphs in the list:
> 
>     *   This is a paragraph.
> 
>         This is a paragraph
>     with a lazy indentation.
> 
> 
> 
> Problems/Ambiguities
> ====================
> 
> The lazy indentation syntax provides Markdown users with many chances
> to get some unexpected output... Additionally, since both lists and
> blockquotes support lazy indentation it is sometimes not clear what the
> outcome is when those two elements are combined.
> 
> Here are some issues taken from the Markdown ML.
> 
> PA1. First example:
> 
>     * this is list item
>     > * this item is in a block quote  
>     more block quoting?
> 
> PA2. Second example:
> 
>       * > list item with quoting
>     more text here
> 
>       * > list item with quoting
>     more text here
>       * another list item
> 
> PA3. Third example:
> 
>     > > I wrote something  
>     > you replied  
>     and now here is my reply to your reply.
> 
> PA4. Fourth example:
> 
>     > * foo
>     > > bar
>     > > baz 
> 
> The above examples can be interpreted in one way or another. This means
> that we won't find a solution that satisfies all needs. We can only try
> to find a solution that is based on a general rule which feels natural
> to the user and does what most people would expect.
> 
> Michel Fortin wrote [this][2] on the Markdown ML regarding the lazy
> syntax:
> 
>> Basically, I'd eliminate any "half-lazy" syntax where you can be lazy  
>> about list item indentation while not being lazy on blockquote  
>> markers. This just creates confusion; syntax markers shouldn't be  
>> allowed to be lazy.
>> 
>> Removing half-lazy things would also fix a surprising issue with  
>> blockquotes:
>> 
>>> foo  
>>>> bar  
>>> baz  
>> 
>> This would be seen as a blockquote containing a "foo" paragraph, a  
>> nested "bar" blockquote and a "baz" paragraph, instead of the  
>> completely counter-intuitive output produced today. To make "baz"
>> part of the nested blockquote, you would either go the explicit route:
>> 
>>> foo  
>>>> bar
>>>> baz  
>> 
>> or the lazy route:
>> 
>>> foo  
>>>> bar  
>>      baz
>> 
>> but not something in between.
> 
> 
> 
> kramdown "lazy" syntax
> ======================
> 
> I thought about how I would like things to work, considering all of the
> above and I came to the following solution. Note, however, that I do
> *not* recommend using the lazy syntax when writing a document!
> 
> Since the problem of the lazy syntax arises from the problem of line
> wrapping, why not just use that to specify how the lazy syntax should
> work?
> 
> Before we go into details consider the following: The kramdown syntax
> page lists the following structural block level elements:
> 
> * Blank lines
> * Paragraphs
> * Headers
> * Blockquotes
> * Code blocks
> * Lists (incl. footnote definitions)
> * Tables
> * Horizontal rules
> * Math blocks
> * HTML blocks
> 
> We can leave out all elements which do not inherently support line
> wrapping, namely blank lines (no text to wrap), code blocks (should be
> output as is), tables, horizontal rules, math blocks (same as with code
> blocks) and HTML blocks.
> 
> Headers can also be left out assuming that a header text is not long
> enough to trigger line wrapping (this has also been discussed on the
> Markdown ML and I think that the consensus was that longer header texts
> should be written directly in HTML).
> 
> This leaves us with three elements: paragraphs, blockquotes and lists.
> However, blockquotes and lists are just "wrappers" around paragraphs
> and therefore the only element that really contains any text in a
> kramdown (Markdown) document is a paragraph (I also count the compact
> list text that is not wrapped in `<p>` tags as a paragraph because
> conceptually it is one). So once we know how long lines in paragraphs
> are wrapped, the behaviour of long lines in blockquotes and lists is
> easy to derive.
> 
> 
> ## Requirements
> 
> There are two requirements regarding line wrapping and "lazy" syntax:
> 
> * Line wrapping may be done like it is done by dumb editors, i.e. a long
>   line is split on whitespace before the maximal line length and the
>   text continues on the next line (i.e. *no* blank line in-between).
>   This means that the additional lines belong to the line (and
>   therefore a certain paragraph) to which line wrapping has been
>   applied!
> 
> * It must be possible to blockquote a kramdown document (which
>   possibly contains lazy lines) and preserve the structure of the quoted
>   document.
> 
> 
> ## Paragraphs
> 
> So how to lazy wrap simple paragraphs? This is the easiest one since
> the [Markdown syntax description][1] already tells us how: just
> hard-wrap your lines and separate multiple consecutive paragraphs with
> one or more blank lines.
> 
> For example:
> 
>     This is one long long long long long long long long long line
> 
> gets wrapped to:
> 
>     This is one long long
>     long long long long
>     long long long line
> 
> So the paragraph rule as stated on the [Markdown syntax page][1] is
> actually needed to support being lazy when writing paragraphs - and to
> support programs that hard-wrap long lines.
> 
> 
> ## Blockquotes
> 
> Following the two requirements stated above, it is clear what the
> lazy syntax for blockquotes has to look like.
> 
> The following examples modify this document:
> 
>     This is one long long long long long long long long long line
> 
> BQ1. After blockquoting:
> 
>     > This is one long long long long long long long long long line
> 
> BQ2. After line wrapping and blockquoting:
> 
>     > This is one long long
>     > long long long long
>     > long long long line
> 
> BQ3. After line wrapping, blockquoting and blockquoting:
> 
>     > > This is one long long
>     > > long long long long
>     > > long long long line
> 
> BQ4. After blockquoting and line wrapping:
> 
>     > This is one long long
>     long long long long
>     long long long line
> 
> BQ5. After blockquoting, line wrapping and blockquoting:
> 
>     > > This is one long long
>     > long long long long
>     > long long long line
> 
> As can be seen in the last example, the "half-lazy" syntax described by
> Michel Fortin arises naturally when blockquoting and line wrapping are
> combined in a certain way. However, I think it should not make any
> difference whether a document is first line-wrapped and then
> blockquoted or the other way around. Therefore I would allow this
> "half-lazy" syntax.
> 
> What happens if line wrapping is done several times?
> 
> BQ5 with additional line wrapping:
> 
>     > > This is one
>     long long
>     > long long
>     long long
>     > long long
>     long line
> 
> This looks a bit scary, I admit, but it is still one paragraph embedded
> in two blockquotes... I don't suggest that anyone writes his documents
> in this way though...
> 
> Due to line wrapping we now also have to require the use of blank lines
> between a blockquote and a following paragraph. Otherwise it is
> impossible to know whether example BQ4 contains just a blockquote or a
> blockquote followed by a paragraph.
> 
> I don't think that requiring a blank line is a burden on writers. If
> you look through the kramdown or the Markdown ML, you will see that in
> nearly all emails quoted text is separated from the response by at
> least one blank line.
> 
> Note that kramdown would generate two separate blockquotes if they are
> separated by a blank line (Markdown.pl merges the blockquotes):
> 
>     > This is one blockquote with
>     a long line.
> 
>     > This is another blockquote
>     with a long line.
> 
> If you run the examples BQ1 to BQ5 through Markdown.pl, you will find
> that it produces the expected output (as defined above). This is no
> coincidence, I think, since Markdown.pl has been designed with email
> messages in mind. However, the requirements as stated above
> haven't been written down anywhere (at least not that I know of) and
> with those the behaviour of Markdown.pl is easily explained.
> 
> 
> ## Lists
> 
> The content of lists, footnote definitions and all other content
> (except code blocks) that is defined via indentation, also has to
> support the lazy syntax.
> 
> We will start with this document:
> 
>     *   This is one long long long long long long long long long line
> 
>         This is one long long long long long long long long long line
>     *   Another very very very very very very very very long line
> 
> LI1. After line wrapping:
> 
>     *   This is one long long
>     long long long long long
>     long long line
> 
>         This is one long long
>     long long long long long
>     long long line
>     *   Another very very very
>     very very very very very
>     long line
> 
> So line wrapping inside lists can also be explained in terms of the
> requirements. And the line wrapping behaviour is identical to that of
> Markdown.pl.
> 
> 
> 
> How to interpret the stated problems/ambiguities
> ================================================
> 
> After having specified how the kramdown lazy syntax would work, here is
> how the initially given problems would be interpreted:
> 
> PA1. A list with one item, followed by a blockquote containing a list
>      with one item. Markdown.pl interprets it in more or less the same
>      way but produces invalid HTML.
> 
> PA2. A list with three items: the first and the second item contain a
>      blockquote with a paragraph, the third item contains just text.
>      Again, Markdown.pl shows the same behaviour.
> 
> PA3. Two nested blockquotes containing one paragraph with all the text.
>      Markdown.pl shows the same behaviour.
> 
> PA4. A blockquote containing a) a list with one item and b) a
>      blockquote with a paragraph containing the text "bar baz".
>      Markdown.pl's behaviour differs - it puts the inner blockquote
>      inside the list item - again we have to disregard the invalid HTML
>      it produces.
> 
> There is always the problem with blockquote and list markers: if they
> appear inside a paragraph and line wrapping is applied, they may
> potentially end up at the beginning of a line... I don't think that
> this can be avoided.
> 
> Any other problems/ambiguities/edge cases that need to be addressed?
> 
> 
> 
> Conclusion
> ==========
> 
> The proposed lazy syntax for kramdown is identical to that of the
> original Markdown implementation - some edge cases are handled
> differently though. However, in contrast to Markdown.pl more reasons are
> given why this lazy syntax is useful and how it arises naturally when
> looking at email messages and how they are processed by MTAs and email
> programs.
> 
> I haven't looked at how to implement this in kramdown but it shouldn't
> be too difficult. Before I do that I would like to hear your opinions
> on this matter! :-)
> 
> 
> Best regards and thanks for staying with me through this long email,
>   Thomas
> 
> 
> [1]: http://daringfireball.net/projects/markdown/syntax
> [2]: http://osdir.com/ml/text.markdown.general/2007-05/msg00031.html

