[Ironruby-core] Issue with accents (UTF-8) - is it supposed to work ?

Thibaut Barrère thibaut.barrere at gmail.com
Tue Mar 3 15:24:36 EST 2009

Hi Tomas,

thanks for your two messages and the in-depth explanation. Working
with -19 and #encoding: UTF-8 indeed solves the issue (tested on

> Actually the 1.8 parser is somewhat influenced by the current $KCODE.
> Multi-byte characters could be part of identifiers and also the decision of
> where a string literal ends needs to deal with multi-byte characters.
> However, the resulting literals are just plain byte arrays with no knowledge
> of encoding so String#size method is still broken.
> To achieve a better .NET interop in IronRuby, we will honor KCODE when
> creating MutableStrings. The representation of the string will be byte[] if
> it contains any non-ascii characters and KCODE is set to a non-ascii
> encoding. We will also attach the KCODE encoding to the MutableString at
> creation time. This doesn’t affect Ruby 1.8 functionality, it only affects
> conversions to CLR string. So if you use KCODE = “U” the CLR strings should
> be correctly encoded (they are not now as you are experiencing). I’ll
> implement this feature as soon as possible.
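To make the proposal concrete for myself, here is a minimal Ruby sketch of the idea as I understand it. The class `KcodeTaggedString`, its methods, the KCODE-to-encoding table and the Latin-1 fallback are all hypothetical illustrations, not IronRuby's actual MutableString code. CLR strings are modeled as UTF-16LE Ruby strings, and it assumes a modern (2.0+) Ruby for the Encoding API and String#b.

```ruby
# Hypothetical sketch, NOT IronRuby's MutableString: keep Ruby 1.8 byte
# semantics, but remember the KCODE encoding for CLR conversion.
class KcodeTaggedString
  # Assumed mapping from the first letter of KCODE; only "U" is handled here.
  KCODE_ENCODINGS = { "U" => Encoding::UTF_8 }.freeze

  attr_reader :encoding

  def initialize(literal, kcode)
    # Stored as a plain byte array (binary string), as in Ruby 1.8.
    @bytes = literal.b
    # Attach the KCODE encoding at creation time (nil = no encoding known).
    @encoding = KCODE_ENCODINGS[kcode.to_s.upcase[0]]
  end

  # 1.8 behaviour is unchanged: size counts bytes, not characters.
  def size
    @bytes.bytesize
  end

  # Only conversion to a CLR (UTF-16) string looks at the attached encoding.
  def to_clr_string
    # Assumption: without a known encoding, each byte becomes one char
    # (Latin-1 style), which is what garbles accented text.
    source = @encoding || Encoding::ISO_8859_1
    @bytes.dup.force_encoding(source).encode(Encoding::UTF_16LE)
  end
end
```

With KCODE "U", `size` still reports 6 bytes for "hèllo", just like 1.8, but the CLR-side string has 5 characters; without an encoding the naive decode yields 6 characters of mojibake.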

I think affecting strings only when they are converted to CLR strings
is a pretty neat idea.

I like that a lot more than having to add #encoding and -19 (also
because I'm not sure what the impact of using -19 just for

Because I was curious, I had a look at Rails (2.2.2) output for some
of these operations:

Loading development environment (Rails 2.2.2)
>> "hèllo".size
=> 6
>> "hèllo".chars
=> #<ActiveSupport::Multibyte::Chars:0x2378348 @wrapped_string="hèllo">
>> "hèllo".chars.size
=> 5
>> '€2.99'[0,1]
=> "\342"
>> '€2.99'.first
=> "€"

So raw access through [] is purely byte-based, while .first
takes multibyte characters into account.
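For comparison, under Ruby 1.9 semantics (IronRuby's -19 mode plus the #encoding: UTF-8 magic comment) core String makes the same split explicit without ActiveSupport. A minimal sketch, assuming a 1.9.3+ interpreter (String#byteslice first appeared in 1.9.3):

```ruby
# encoding: UTF-8
# (Magic comment: only effective on the first line of a file; UTF-8 is
# already the default source encoding in Ruby 2.0+.)
price = "€2.99"
word  = "hèllo"

# Byte-oriented views: what 1.8's String#size and [] operate on.
p word.bytesize          # => 6   ("è" takes two bytes in UTF-8)
p price.bytes.first      # => 226 (octal \342, first byte of "€")
p price.byteslice(0, 1)  # => "\xE2" (not a complete character)

# Character-oriented views: what ActiveSupport's #chars and #first give you.
p word.size              # => 5
p price[0, 1]            # => "€"
p price.chars.first      # => "€"
```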

I think the spirit of what you suggest is fairly close to that.

I like it, and will test it once you have it implemented.

cheers and thanks for your idea,

-- Thibaut
