[Ironruby-core] Issue with accents (UTF-8) - is it supposed to work ?
thibaut.barrere at gmail.com
Tue Mar 3 15:24:36 EST 2009
Thanks for your two messages and the in-depth explanation. Working
with -19 and #encoding: UTF-8 indeed solves the issue (tested on
> Actually the 1.8 parser is somewhat influenced by the current $KCODE.
> Multi-byte characters could be part of identifiers and also the decision of
> where a string literal ends needs to deal with multi-byte characters.
> However, the resulting literals are just plain byte arrays with no knowledge
> of encoding so String#size method is still broken.
> To achieve a better .NET interop in IronRuby, we will honor KCODE when
> creating MutableStrings. The representation of the string will be byte if
> it contains any non-ascii characters and KCODE is set to a non-ascii
> encoding. We will also attach the KCODE encoding to the MutableString at
> creation time. This doesn’t affect Ruby 1.8 functionality, it only affects
> conversions to CLR string. So if you use KCODE = “U” the CLR strings should
> be correctly encoded (they are not now as you are experiencing). I’ll
> implement this feature as soon as possible.
I think affecting strings only when conversion occurs to CLR is a
pretty neat idea.
I like that a lot more than having to add #encoding and -19 (also
because I'm not sure what the impact would be to use -19 just for
Because I was curious, I had a look at Rails (2.2.2) output for some
of these operations:
Loading development environment (Rails 2.2.2)
=> #<ActiveSupport::Multibyte::Chars:0x2378348 @wrapped_string="hèllo">
So raw access through array indexing is purely byte-based, while .first
takes multi-byte characters into account.
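
To make the byte versus character distinction concrete, here is a minimal
sketch. It assumes Ruby 1.9 or later with a UTF-8 source encoding, so it is
not the Rails 2.2.2 console session above and does not use ActiveSupport;
the variable names are only illustrative.

```ruby
# encoding: UTF-8
# Sketch only: assumes Ruby 1.9+ (encoding-aware strings), not Ruby 1.8.
s = "h\u00E8llo"                  # "hèllo"; the è takes two bytes in UTF-8

byte_count = s.bytesize           # counts raw bytes
char_count = s.length             # counts characters

first_two_bytes = s.bytes.first(2)        # byte-level view, cuts è in half
first_two_chars = s.chars.first(2).join   # character-level view

# Reinterpreting the same bytes as binary approximates a 1.8-style
# "plain byte array" string, where size is a byte count.
binary_length = s.dup.force_encoding("BINARY").length

puts byte_count, char_count, first_two_bytes.inspect, first_two_chars, binary_length
```

The byte-level view splits the two-byte è, which is the same kind of
breakage as the raw access above, while the character-level view keeps it
whole.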
I think the spirit of what you suggest is somewhat close to that.
I like it - and will test it once you have it implemented.
cheers and thanks for your idea,