[kramdown-users] Zcaron/zcaron not valid HTML references

Eric Sunshine sunshine at sunshineco.com
Tue Jul 6 14:51:21 EDT 2010

On 7/6/2010 1:33 PM, Matt Neuburg wrote:
> No. If the actual character appears in input, leave it alone. UTF8
> characters are legal in XHTML / XML.
> This is also in keeping with the promise already given by Thomas Leitner
> (and I quote):
>> The kramdown parser (as well as the new
>> HTML parser) doesn't convert between encodings or change normal
>> characters to entities. Whatever string you give to kramdown comes out
>> in the same encoding.
> Actually, I thought that the example given in this email, namely the string
> Blaž, would have fallen into the field of this promise as well.
> kramdown should not change literal entities or literal characters. m.

Taken literally, Thomas's statement does not apply to this case. An 
entity (&aacute;) is not a literal character (á), and vice-versa. 
kramdown correctly represents the entity internally as just that: an 
entity, not as its external representation. It does not confuse the 
entity with a character in a string. No particular external 
representation of the entity (&aacute;, &#225;, &#xE1;) is more correct 
than any other, and none is incorrect (except the reported bug with 
&zcaron;). Keep in mind too that other external representations also are 
possible, depending upon the output module, such as \acute{a} for LaTeX, 
so it makes sense that kramdown treat the entity internally in this 
abstract fashion rather than as one of its external representations.
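
To make the distinction concrete, here is a rough sketch in Python (not kramdown's actual Ruby code) of an entity held abstractly as a code point and rendered only at output time. The Entity class, its method names, and the one-entry LATEX_FORMS table are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical lookup table, for illustration only; it covers one code point.
LATEX_FORMS = {0xE1: r"\acute{a}"}


@dataclass(frozen=True)
class Entity:
    """An entity stored abstractly by code point, not by how it was written."""
    code_point: int

    def as_decimal(self) -> str:
        # Decimal numeric character reference, e.g. &#225;
        return f"&#{self.code_point};"

    def as_hex(self) -> str:
        # Hexadecimal numeric character reference, e.g. &#xE1;
        return f"&#x{self.code_point:X};"

    def as_char(self) -> str:
        # The literal character itself
        return chr(self.code_point)

    def as_latex(self) -> str:
        # Fall back to the literal character when no LaTeX form is known
        return LATEX_FORMS.get(self.code_point, self.as_char())


aacute = Entity(0xE1)
print(aacute.as_decimal(), aacute.as_hex(), aacute.as_char(), aacute.as_latex())
```

Each method is just one external view of the same internal object, which is why none of them can claim to be "the" correct form.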

I expect that Thomas could augment the internal entity object so that it 
remembers its input representation, but this would sully the presently 
clean abstraction, and it's not clear that there would be significant 
benefit. The present behavior of emitting symbolic references when 
possible (unless explicitly disabled) seems a decent compromise if the 
output is expected to be read by humans.
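
As an illustration of "symbolic when possible", the following Python sketch emits a named reference only when a name is known, and otherwise falls back to a numeric one. It relies on Python's standard HTML 4.01 name table, which is an assumption standing in for whatever table kramdown uses; the function name html_reference and its symbolic flag are hypothetical.

```python
from html.entities import codepoint2name  # HTML 4.01 entity names only


def html_reference(code_point: int, symbolic: bool = True) -> str:
    """Return a named reference if one is known, else a numeric one."""
    name = codepoint2name.get(code_point) if symbolic else None
    if name is not None:
        return f"&{name};"
    # No known name (or symbolic output disabled): use a decimal reference
    return f"&#{code_point};"


print(html_reference(0xE1))                  # named form is available
print(html_reference(0x17E))                 # no HTML 4.01 name for this one
print(html_reference(0xE1, symbolic=False))  # symbolic output disabled
```

With a fallback of this kind, a code point that lacks a name in the chosen table is written numerically rather than with an invented name.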

-- ES
