[Ironruby-core] Encoding problem

Tomas Matousek Tomas.Matousek at microsoft.com
Fri Jan 14 12:49:45 EST 2011


Most editors insert a BOM (byte order mark) character at the beginning of the file. This is a byte sequence that allows readers to identify the Unicode encoding, and it is not usually visible in editors.
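To check for a BOM without relying on the editor, one option is to look at the first three bytes of the file. The following is only a sketch: the helper name utf8_bom? is made up for illustration, and it only recognises the UTF-8 BOM (EF BB BF), not the UTF-16 or UTF-32 ones.

```ruby
# Sketch: report whether a file starts with the UTF-8 byte order mark.
# utf8_bom? is a hypothetical helper name, not part of Ruby or IronRuby.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack("C*") # three raw bytes, tagged as binary

def utf8_bom?(path)
  # Read in binary mode so no decoding happens before the comparison.
  head = File.open(path, "rb") { |f| f.read(3) }
  head == UTF8_BOM
end
```

In MRI 1.9.2 and later, opening a file with the mode "r:BOM|UTF-8" skips a leading BOM if one is present.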

Tomas

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Albert-Jan Pieter Nijburg
Sent: Friday, January 14, 2011 12:49 AM
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Encoding problem

I have a clone from the github repos and it’s at the tip.

It appears that SciTE does not save its files as UTF-8 by default; I assumed it did. Saving the file as UTF-8 solves the problem ☺

It even works without the # encoding comment.
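For files that were already saved in a legacy single-byte encoding, the bytes can be checked and converted rather than retyped. This is a rough sketch under an assumption: that the editor wrote Windows-1252, which is a guess and not something confirmed in this thread. The helper name to_utf8 is hypothetical.

```ruby
# Sketch: return UTF-8 text for bytes that may have been saved in a legacy encoding.
# to_utf8 and the Windows-1252 fallback are assumptions made for illustration.
def to_utf8(bytes, fallback = "Windows-1252")
  candidate = bytes.dup.force_encoding("UTF-8")
  # Keep the bytes untouched if they already form valid UTF-8.
  return candidate if candidate.valid_encoding?
  # Otherwise interpret them in the fallback encoding and transcode.
  bytes.dup.force_encoding(fallback).encode("UTF-8")
end

legacy_bytes = [0xEB].pack("C")        # "ë" as a single-byte editor would store it
utf8_bytes   = [0xC3, 0xAB].pack("C*") # "ë" as a UTF-8 editor would store it

puts to_utf8(legacy_bytes) == to_utf8(utf8_bytes)
```

A lone 0xEB byte is not valid UTF-8, which matches the "invalid byte sequence EB on UTF-8" message from the first mail in this thread.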

Thanks

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Tomas Matousek
Sent: Thursday, January 13, 2011 19:13
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Encoding problem

What IronRuby version do you use?

On my machine (github/master):

a.rb (saved as UTF-8 encoded file):
# encoding: UTF-8

a = "ë"
b = "\u{eb}"

puts a.encoding, b.encoding, a, b, a.inspect, b.inspect

C:\Temp>rbx a.rb
UTF-8
UTF-8
ë
ë
"\u{eb}"
"\u{eb}"

Which is also what MRI 1.9.2 does.

Tomas


From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Albert-Jan Pieter Nijburg
Sent: Thursday, January 13, 2011 7:49 AM
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Encoding problem

Hey,

I found out that if I put nothing at the top and I do this:

puts "\x89"

it prints "ë".

If I put # encoding: UTF-8 at the top, this happens: ☺

ëmscorlib:0:in `Throw': Unable to translate bytes [89] at index -1 from specified code page to Unicode. (System::Text::DecoderFallbackException)
        from mscorlib:0:in `Fallback'
        from mscorlib:0:in `InternalFallback'
        from mscorlib:0:in `GetCharCount'
        from mscorlib:0:in `GetCharCount'
        from mscorlib:0:in `GetChars'
        from tabaco.rb:2:in `puts'
        from tabaco.rb:2

It does print it but then it dies.
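One possible reading of this, offered as an assumption rather than a diagnosis: in the OEM code page 437 (and 850) the byte 0x89 stands for "ë", so a console using such a code page would show it correctly, while the same byte on its own is not valid UTF-8. A small sketch of the two interpretations in plain Ruby:

```ruby
# Sketch: the same byte interpreted under two different encodings.
# Treating the console as code page 437 is an assumption, not something verified here.
byte = [0x89].pack("C") # one raw byte, tagged as binary

as_oem  = byte.dup.force_encoding("IBM437").encode("UTF-8") # decode as CP437, then transcode
as_utf8 = byte.dup.force_encoding("UTF-8")                  # relabel only, bytes unchanged

puts as_oem                  # the character that byte stands for in code page 437
puts as_utf8.valid_encoding? # whether the lone byte is well-formed UTF-8
```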

#<Encoding: UTF-8>
puts "\x89".force_encoding("UTF-8") does the same

#<Encoding: UTF-8>
puts "ë".force_encoding("UTF-8") does the same as before.
Also without the #<Encoding thing>

So I thought I had it with the puts "\x89" and I tried this:

class PatGeg < ActiveRecord::Base
      set_table_name "Pati\x89ntGegevens"
end

PatGeg.first.Achternaam

and here’s what I got

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract_adapter.rb:200:in `log': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:217:in `raw_select'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:178:in `select'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:467:in `find_by_sql'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
        from c:6:in `__send__'
        from c:6:in `first'

I’ve tried calling force_encoding("UTF-8") on this string, which results in something very similar:

c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `=~': invalid byte sequence 89 on UTF-8 (Encoding::InvalidByteSequenceError)
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-sqlserver-adapter-3.0.0/lib/active_record/connection_adapters/sqlserver/quoting.rb:31:in `quote_table_name'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/base.rb:597:in `quoted_table_name'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:234:in `build_select'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:159:in `build_arel'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/query_methods.rb:110:in `arel'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation.rb:64:in `to_a'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:333:in `find_first'
        from c:/ir/irtest/External.LCA_RESTRICTED/Languages/Ruby/ruby19/lib/ruby/gems/1.9.1/gems/activerecord-3.0.0/lib/active_record/relation/finder_methods.rb:122:in `first'
        from c:6:in `__send__'
        from c:6:in `first'
        from tabaco.rb:34

I have a feeling that IronRuby and .NET are not in sync on encodings.

Albert-Jan

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Dezso Zoltan
Sent: Thursday, January 13, 2011 16:04
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Encoding problem

Hi,

> warning: variable $KCODE is no longer effective

This means that you are in 1.9 mode :) In that case there are two things you could try:
1) set the encoding at the top of the file with a magic comment:
# encoding: UTF-8

2) if 1) fails in IronRuby, force an encoding on the string(s) in question with the method:
.force_encoding("UTF-8")
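A small sketch of how the two options differ, written for MRI 1.9 or later; IronRuby's behaviour may differ, which is the point of this thread. The variable names are illustrative only. The magic comment sets the encoding of string literals in the file, while force_encoding only relabels the bytes a string already holds and never converts them.

```ruby
# encoding: UTF-8
# Sketch: magic comment versus force_encoding. Assumes this file is saved as UTF-8.

literal = "ë"                   # literal picks up the source encoding
raw = [0xC3, 0xAB].pack("C*")   # the same two bytes, but tagged as binary

relabeled = raw.dup.force_encoding("UTF-8") # same bytes, new encoding label

puts literal.encoding
puts raw.encoding
puts relabeled == literal          # equal once the label matches the bytes
puts relabeled.bytes == raw.bytes  # force_encoding did not touch the bytes
```

If the bytes are not actually UTF-8 (for example a lone 0xEB), force_encoding still succeeds, but later operations on the string can raise.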

Zaki

On Thu, Jan 13, 2011 at 11:20 PM, Albert-Jan Pieter Nijburg <albertjan at curit.com> wrote:
Hey Zaki,

WARNING: YAML.add_builtin_type is not implemented
unknown:0: warning: variable $KCODE is no longer effective
tabaco.rb:11:in `puts': character U+00EB can't be encoded in US-ASCII (Encoding::InvalidByteSequenceError)
        from tabaco.rb:11

Too bad... thanks though. I’ll have a look in the source to see if I can find something.

Annoying Europeans :P

Albert-Jan

From: ironruby-core-bounces at rubyforge.org [mailto:ironruby-core-bounces at rubyforge.org] On Behalf Of Dezso Zoltan
Sent: Thursday, January 13, 2011 14:52
To: ironruby-core at rubyforge.org
Subject: Re: [Ironruby-core] Encoding problem

Hi,

I don't really know the solution to your question, but this might help:
ë is Unicode code point U+00EB, which is encoded as the two bytes 0xC3 0xAB in UTF-8 (so the error message is talking about a Unicode code point rather than UTF-8 bytes, which I assume is because IronRuby uses the immutable .NET strings internally, and those are UTF-16).

The errors are expected if your default encoding is US-ASCII, because US-ASCII does not contain ë (it only covers the single-byte range 0x00-0x7F, so your script chokes on 0xEB): you will need to set your encoding to something compatible, like UTF-8.
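To make the code point versus bytes distinction concrete, here is a short sketch for MRI 1.9 or later (variable names are illustrative). Note that current MRI raises Encoding::UndefinedConversionError for this conversion, whereas the IronRuby message quoted in this thread reports Encoding::InvalidByteSequenceError.

```ruby
# Sketch: one character, its code point, its UTF-8 bytes, and why US-ASCII rejects it.
e_diaeresis = "\u00EB" # \u escapes always produce a UTF-8 string

code_points = e_diaeresis.unpack("U*")              # Unicode code points
utf8_bytes  = e_diaeresis.bytes.map { |b| "%02X" % b } # raw bytes as hex

puts code_points.inspect
puts utf8_bytes.join(" ")

ascii_error = nil
begin
  e_diaeresis.encode("US-ASCII") # US-ASCII has no character for U+00EB
rescue Encoding::UndefinedConversionError => err
  ascii_error = err
  puts "cannot encode: #{err.message}"
end
```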

I don't quite know how to do that properly in IronRuby, but in CRuby 1.9 you could use "magic comments" in your Ruby file, and in 1.8 something like $KCODE = 'u' could work. You might also be able to drop back into .NET and set the encoding there, but I'm not sure how that affects IronRuby assemblies.

I would start with $KCODE = 'u'. Let me know how that works for you.

Zaki

On Thu, Jan 13, 2011 at 6:33 PM, Albert-Jan Pieter Nijburg <albertjan at curit.com> wrote:
Hi Guys,

My boss thought it would be cool to use “ë” in an SQL table name; many of you will want to shoot her now ☺.

But now I have found something weird: I can’t even print “ë”.

It says:

tabaco.rb:16:in `puts': character U+00EB can't be encoded in US-ASCII (Encoding::InvalidByteSequenceError)
        from tabaco.rb:16

or, when I print the string somewhere else, after it comes back from a method :S


System::Text::DecoderFallbackException at /patient/0
Unable to translate bytes [EB] at index 3 from specified code page to Unicode.

Or, when I don’t mess with it:

Encoding::InvalidByteSequenceError at /patient/0
invalid byte sequence EB on UTF-8


All the same problem coming from 3 places.

Is this a fundamental issue or should this be solvable?

If you could point me in the right direction I could try to maybe fix it.


Thanks,

Albert-Jan


_______________________________________________
Ironruby-core mailing list
Ironruby-core at rubyforge.org<mailto:Ironruby-core at rubyforge.org>
http://rubyforge.org/mailman/listinfo/ironruby-core


