[Ironruby-core] Issue with accents (UTF-8) - is it supposed to work ?

Tomas Matousek Tomas.Matousek at microsoft.com
Tue Mar 3 13:36:13 EST 2009


If I run this in Ruby 1.8.6:

> ruby -Ku uni.rb

And uni.rb is UTF-8 encoded w/o BOM:

puts $KCODE
puts 'hèllo'.size

I’ll get output:
UTF-8
6

So that clearly doesn’t work as one might expect. String literals in MRI 1.8 are always binary (i.e. the accented character is stored like any other 2 bytes in the string), so String#size counts bytes rather than characters.
AFAIK $KCODE only affects some built-in and library methods, for example String#inspect, regular expressions, conversion libraries, etc.
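
To make the byte-versus-character distinction concrete, here is a small sketch for a Ruby that tracks string encodings (1.9 or later). The helper names `sizes` and `binary_size` are invented for illustration. Under 1.8, String#size reports what `bytesize` reports here.

```ruby
# encoding: UTF-8
# Hypothetical helper (not part of any library): returns the byte count and
# the character count of a string whose bytes are UTF-8.
def sizes(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  { :bytes => utf8.bytesize, :chars => utf8.length }
end

# The same bytes tagged as binary: every byte counts as one "character",
# which is how MRI 1.8 treats a string literal.
def binary_size(str)
  str.dup.force_encoding(Encoding::BINARY).length
end

p sizes('hèllo')        # byte count 6, character count 5
p binary_size('hèllo')  # 6
```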

Although IronRuby stores string literals in UTF-16 .NET strings, to be fully compatible with MRI 1.8 we use a custom BinaryEncoding for these strings. When a string is converted to an array of bytes using this encoding, only the low 8 bits of each character are used (the other bits are required to be 0). This works fine for encodings that use a single byte per character. It’s broken for multi-byte encodings, but that’s a problem with Ruby 1.8 in general.
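
A rough illustration in plain Ruby (1.9+) of what "one byte per character" implies. This is not IronRuby's actual BinaryEncoding, and the method names `widen_bytes` and `narrow_chars` are made up for the sketch. Each byte of the UTF-8 source becomes a character of its own, which is probably why a two-byte "è" shows up as two separate characters once the string reaches a .NET control.

```ruby
# encoding: UTF-8
# Widen each byte to a character of its own (byte value == code point),
# roughly what happens when binary string data lands in a UTF-16 string.
def widen_bytes(str)
  str.bytes.to_a.pack('U*')
end

# Reverse direction: emit one byte per character. Characters above 0xFF
# cannot be represented, so they are rejected.
def narrow_chars(str)
  code_points = str.unpack('U*')
  code_points.each do |cp|
    raise ArgumentError, format('U+%04X does not fit in 8 bits', cp) if cp > 0xFF
  end
  code_points.pack('C*')
end

widened = widen_bytes('Barrère')
puts widened.length  # 8: the two bytes of "è" became two characters
restored = narrow_chars(widened).force_encoding(Encoding::UTF_8)
puts restored == 'Barrère'  # true: the original bytes survive the round trip
```

The round trip shows that no data is lost; the bytes are only misread when the widened string is displayed as if it were text.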

If you want to use Unicode you should not use 1.8 semantics. Use the -19 switch to run your script in 1.9 mode, and either add a UTF-8 BOM preamble or a Ruby encoding magic comment:

#encoding: UTF-8
puts 'hèllo'.size

> ruby19 uni.rb
5

> ir.exe -19 uni.rb
5

In a hosted app you can set 1.9 compat mode when creating the ScriptEngine/Runtime:

var ruby = IronRuby.Ruby.CreateEngine((setup) => {
   setup.Options["Compatibility"] = RubyCompatibility.Ruby19;
});

Tomas

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Tomas Matousek
Sent: Tuesday, March 03, 2009 9:56 AM
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Issue with accents (UTF-8) - is it supposed to work ?

I’ll take a look.

Tomas

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Ivan Porto Carrero
Sent: Tuesday, March 03, 2009 6:58 AM
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Issue with accents (UTF-8) - is it supposed to work ?

No, it's not a Mono-related issue. I get the same results when I run your sample on Windows with MS.NET.
It must be an encoding thing. When I set $KCODE to "UTF-8" it still has the same behavior, which is weird I guess :)
On Tue, Mar 3, 2009 at 3:35 PM, Thibaut Barrère <thibaut.barrere at gmail.com> wrote:
Hi,

> not sure if it's an oddity in my code, a bug or non-implemented feature in
> IronRuby or Mono - so I'm reporting it here. When using accents inside
> strings ("Barrère") that I pass to either buttons or datagridviews, they
> translate into "BarrA¨re". Here's a sample (also available on github):
Bumping this one - do you have some idea of what's happening there ?
Is it a mono related issue ?

-- Thibaut

> Hi,
> not sure if it's an oddity in my code, a bug or non-implemented feature in
> IronRuby or Mono - so I'm reporting it here. When using accents inside
> strings ("Barrère") that I pass to either buttons or datagridviews, they
> translate into "BarrA¨re". Here's a sample (also available on github):
>
> form = Magic.build do
>   form(:text => "DataGridView sample", :width => 800, :height => 600) do
>     # nifty - current Magic.build makes it possible to reuse the control that has been added
>     @grid = data_grid_view :dock => DockStyle.fill
>     @grid.column_count = 2
>     @grid.columns[0].name = "First name"
>     @grid.columns[1].name = "Last name"
>
>     @grid.rows.add("Thibaut","Barrère") # using my name with its nasty accent - utf-8 ?
>   end
> end
>
> After editing the datagridview, I noticed a log on stdout from mono:
> 009-03-01 11:48:36.927 mono[5512:10b] WARNING:
> CFSTR("Barr\37777777703\37777777603\37777777702\37777777650re") has non-7
> bit chars, interpreting using MacOS Roman encoding for now, but this will
> change. Please eliminate usages of non-7 bit chars (including escaped
> characters above \177 octal) in CFSTR().
> So I guess the issue probably boils down to non-MacOS Roman support in Mono.
> What do you think ?
> -- Thibaut
_______________________________________________
Ironruby-core mailing list
Ironruby-core at rubyforge.org
http://rubyforge.org/mailman/listinfo/ironruby-core
