[kramdown-users] Zcaron/zcaron not valid HTML references

Thomas Leitner t_leitner at gmx.at
Wed Jul 7 07:57:45 EDT 2010

On 2010-07-06 14:51 -0400 Eric Sunshine wrote:
> On 7/6/2010 1:33 PM, Matt Neuburg wrote:
> > No. If the actual character appears in input, leave it alone. UTF8
> > characters are legal in XHTML / XML.
> > This is also in keeping with the promise already given by Thomas
> > Leitner (and I quote):
> >> The kramdown parser (as well as the new
> >> HTML parser) doesn't convert between encodings or change normal
> >> characters to entities. Whatever string you give to kramdown comes
> >> out in the same encoding.
> > Actually, I thought that the example given in this email, namely
> > the string Blaž, would have fallen into the field of this
> > promise as well. kramdown should not change literal entities or
> > literal characters. m.
> Taken literally, Thomas's statement does not apply to this case. An 
> entity (&aacute;) is not a literal character (á), and vice-versa. 
> kramdown correctly represents the entity internally as just that: an 
> entity, not as its external representation. It does not confuse the 
> entity with a character in a string. No particular external 
> representation of the entity (&aacute;, &#225;, &#xE1;) is more
> correct than any other, and none is incorrect (except the reported
> bug with &zcaron;). Keep in mind too that other external
> representations also are possible, depending upon the output module,
> such as \acute{a} for LaTeX, so it makes sense that kramdown treat
> the entity internally in this abstract fashion rather than as one of
> its external representations.

I agree.
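
To make the abstraction concrete, here is a minimal Python sketch. It is
not kramdown's actual code: the `Entity` class and its method names are
made up for illustration. It stores only the code point and derives each
external form on demand, falling back to a numeric reference when HTML 4
defines no name for the character:

```python
from html.entities import codepoint2name  # HTML 4 named entities only


class Entity:
    """Hypothetical abstract entity: a code point, no remembered spelling."""

    def __init__(self, code_point):
        self.code_point = code_point

    def symbolic(self):
        # Use a name only if HTML 4 defines one; otherwise fall back to numeric.
        name = codepoint2name.get(self.code_point)
        return f"&{name};" if name else self.numeric()

    def numeric(self):
        # Decimal character reference, valid for any code point.
        return f"&#{self.code_point};"

    def char(self):
        # The literal character itself.
        return chr(self.code_point)


aacute = Entity(0xE1)   # a with acute
zcaron = Entity(0x17E)  # z with caron, which has no HTML 4 name

print(aacute.symbolic(), aacute.numeric(), aacute.char())
print(zcaron.symbolic())
```

Under these assumptions the first entity can be written symbolically,
numerically, or as a character, while the second has no HTML 4 name and
so comes out as a numeric reference instead of an invalid symbolic one.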

> I expect that Thomas could augment the internal entity object so that
> it remembers its input representation, but this would sully the
> presently clean abstraction, and it's not clear that there would be
> significant benefit. The present behavior of emitting symbolic
> references when possible (unless explicitly disabled) seems a decent
> compromise if the output is expected to be read by humans.

The internal representation could easily be augmented but I don't know
if this is really useful - is there a use case where `&#382;` is
"better" than `&zcaron;`?

-- Thomas
