[kramdown-users] Zcaron/zcaron not valid HTML references
Eric Sunshine
sunshine at sunshineco.com
Tue Jul 6 14:51:21 EDT 2010
On 7/6/2010 1:33 PM, Matt Neuburg wrote:
> No. If the actual character appears in input, leave it alone. UTF8
> characters are legal in XHTML / XML.
> This is also in keeping with the promise already given by Thomas Leitner
> (and I quote):
>> The kramdown parser (as well as the new
>> HTML parser) doesn't convert between encodings or change normal
>> characters to entities. Whatever string you give to kramdown comes out
>> in the same encoding.
> Actually, I thought that the example given in this email, namely the string
> Blaž, would have fallen into the field of this promise as well.
> kramdown should not change literal entities or literal characters. m.
Taken literally, Thomas's statement does not apply to this case. An
entity (&aacute;) is not a literal character (á), and vice versa.
kramdown correctly represents the entity internally as just that: an
entity, not as its external representation. It does not confuse the
entity with a character in a string. No particular external
representation of the entity (&aacute;, &#225;, &#xE1;) is more correct
than any other, and none is incorrect (except the reported bug with
&zcaron;). Keep in mind too that other external representations also are
possible, depending upon the output module, such as \acute{a} for LaTeX,
so it makes sense that kramdown treat the entity internally in this
abstract fashion rather than as one of its external representations.
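
To make the abstraction concrete, here is a minimal sketch in Python. It is not kramdown's actual code; the Entity class, parse_reference, and the one-entry LaTeX table are hypothetical names made up for illustration. It shows three input spellings collapsing to one internal object, which can then be written out in whichever form an output module prefers.

```python
from dataclasses import dataclass
from html import unescape

# Hypothetical one-entry table for a LaTeX writer (illustration only).
LATEX_FORMS = {0xE1: r"\acute{a}"}


@dataclass(frozen=True)
class Entity:
    """Abstract entity: identified by its code point, not by its input spelling."""
    code_point: int

    def as_char(self) -> str:
        return chr(self.code_point)

    def as_decimal(self) -> str:
        return f"&#{self.code_point};"

    def as_hex(self) -> str:
        return f"&#x{self.code_point:X};"

    def as_latex(self) -> str:
        # Fall back to the literal character when no LaTeX form is known.
        return LATEX_FORMS.get(self.code_point, self.as_char())


def parse_reference(ref: str) -> Entity:
    """Resolve a single character reference such as '&aacute;' to an Entity."""
    resolved = unescape(ref)
    if len(resolved) != 1:
        raise ValueError(f"not a single character reference: {ref!r}")
    return Entity(ord(resolved))


# All three spellings yield the same internal object.
a = parse_reference("&aacute;")
assert a == parse_reference("&#225;") == parse_reference("&#xE1;")
print(a.as_decimal(), a.as_hex(), a.as_char(), a.as_latex())
```

Because the object keeps only the code point, the input spelling is gone after parsing, which is exactly why the output form has to be chosen by the writer rather than copied from the input.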
I expect that Thomas could augment the internal entity object so that it
remembers its input representation, but this would sully the presently
clean abstraction, and it's not clear that there would be significant
benefit. The present behavior of emitting symbolic references when
possible (unless explicitly disabled) seems a decent compromise if the
output is expected to be read by humans.
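
As a rough illustration of "symbolic when possible", the sketch below (again Python, not kramdown's implementation) assumes that the set of valid names is the HTML 4 entity table shipped in Python's standard library. The function name render_reference and its symbolic flag are hypothetical. A code point with no HTML 4 name, such as the one for zcaron, falls back to a numeric reference instead of producing an invalid symbolic one.

```python
from html.entities import codepoint2name  # HTML 4 named entities only


def render_reference(code_point: int, symbolic: bool = True) -> str:
    """Emit a symbolic reference if HTML 4 defines one, else a numeric one."""
    name = codepoint2name.get(code_point) if symbolic else None
    if name is not None:
        return f"&{name};"
    return f"&#{code_point};"


print(render_reference(0xE1))                  # &aacute;
print(render_reference(0x17E))                 # &#382; (no HTML 4 name for zcaron)
print(render_reference(0xE1, symbolic=False))  # &#225;
```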
-- ES