[fxruby-users] Unicode support in FXRuby 1.6

Oliver Smith olivers at mondrian-ide.com
Sat Sep 3 12:53:57 EDT 2005


Gonzalo,

Thanks a lot for your thorough notes!  I think you covered everything I was
curious about.

Oliver

> -----Original Message-----
> From: fxruby-users-bounces at rubyforge.org
> [mailto:fxruby-users-bounces at rubyforge.org]On Behalf Of Gonzalo
> Garramuno
> Sent: Saturday, September 03, 2005 12:24 AM
> To: fxruby-users at rubyforge.org
> Subject: Re: [fxruby-users] Unicode support in FXRuby 1.6
>
>
> That's pretty much correct.  Ruby's Unicode support is somewhat weak
> compared to Python's or Perl's.  Only UTF-8 is supported; AFAIK there is
> no support for UTF-16 at all.
>
> Basically... here's everything you wanted to know about Ruby's Unicode
> support but were afraid to ask:
>
> * $KCODE can be set to select an encoding directly, but this is *NOT*
> needed for a script to work with Unicode.  It is just a convenient
> shortcut so that any regex like /./ will do the right thing.
>
> * Without $KCODE, Unicode-aware regexes are still available through the
> /u option, e.g.
>      t =~ /./u
> or
>      Regexp.new(regex, options, 'u')
> (The other encoding options are /n for none, /e for EUC and /s for SJIS.
> A regex with no encoding option follows whatever $KCODE is set to, so
> once $KCODE is set the per-regex option is no longer needed.  Note that
> /m is not a multi-byte option: it is the multiline flag, which lets .
> match newlines.)
>
> * Python-style u"" strings can be emulated, to some extent, very easily.
> See:
> http://redhanded.hobix.com/inspect/closingInOnUnicodeWithJcode.html
> This then allows you to do:
>      c = u'U+00a9'  # same as "\xc2\xa9"
>
> * You can also use
>      [].pack('U*')
>      "".unpack('U*')
>   to pack/unpack UTF-8 strings.  This lets you easily count characters
>   and iterate through them without jcode (which is really only needed to
>   get succ to work).
>
> * jcode.rb is something of a hack, and it is incomplete.  Methods such as
> reverse, capitalize, casecmp, swapcase, all the strip methods, and
> probably others are not redefined by it and will return incorrect
> results, depending on the language.
>
> * Ruby's $KCODE does not add any UTF-8 <-> Latin-1 conversion, unlike
> Python's unicode strings.  So, with the above, you can do:
>
> question = u'U+00bfHabla espaU+00f1ol?'  # ¿Habla español?
> puts question
>
> similar to python's:
> question = u'\u00bfHabla espa\u00f1ol?'  # ¿Habla español?
> print question
>
> You will not get the corresponding Latin-1 string when you print it
> (unlike with Python's unicode strings).
>
> * To do the above properly, and convert Latin-1 <-> UTF-8 for printing,
> you should use Iconv:
>      ruby -riconv -e 'puts Iconv.iconv("UTF-8", "ISO-8859-1", "\xf1")'
>   By default, Iconv does *NOT* get installed by the One-Click Windows
>   installer, even though it is supposed to be a standard part of Ruby.
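>   Adding something like this:
>          require 'iconv'
>          class UString
>            # to_s must return a String (puts would return nil), and
>            # printing to a Latin-1 console means converting FROM UTF-8
>            # TO ISO-8859-1: Iconv.conv(to, from, str)
>            def to_s
>              Iconv.conv("ISO-8859-1", "UTF-8", self)
>            end
>          end
>   will then do the trick for Why's UString class.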
>
> * The Ruby interpreter should have no problem reading a UTF-8 .rb script
> file, but you have to run it as
>      ruby -Ku file.rb
> (or set RUBYOPT to -Ku, so Ruby always runs with that option).
> Note, however, that Windows Notepad, when saving UTF-8 files, adds a
> 3-byte BOM (byte-order mark, the bytes EF BB BF) at the start of the file.
> Ruby 1.8 does not cope with it, and it will also corrupt Unix shebang lines
> on most (all?) Unixes.  The BOM is a valid UTF-8 sequence (it encodes
> U+FEFF) and is allowed by the Unicode standard, but it carries no meaning
> in UTF-8, and neither Ruby nor Unix shebang handling deals with it
> appropriately.
>
> _______________________________________________
> fxruby-users mailing list
> fxruby-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/fxruby-users
>
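
The /u regex option described in the quoted notes can be sketched as follows. This is a minimal example, not code from the thread: the helper name utf8_char_count is hypothetical, and it relies only on String#scan with a /u regex, which works without touching $KCODE.

```ruby
# Hypothetical helper: count characters in a UTF-8 string using the /u
# regex option instead of setting $KCODE.
def utf8_char_count(str)
  # /./u matches one whole UTF-8 character rather than one byte;
  # the m option lets . also match newlines, so they are counted too
  str.scan(/./mu).size
end

word = "espa\303\261ol"             # "español" spelled out as raw UTF-8 bytes
puts utf8_char_count(word)          # => 7

# A /u regex also treats the two-byte "ñ" as the single character . matches
puts(word =~ /\Aespa.ol\z/u ? "match" : "no match")   # => match
```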
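
The u'U+00a9' notation mentioned in the quoted notes comes from Why's jcode article; the sketch below is a hypothetical stand-in, not his implementation. It assumes the escape is always written as "U+" followed by exactly four hex digits, and expands each one into UTF-8 bytes with pack('U').

```ruby
# Hypothetical helper (not the code from the redhanded article): expand
# every "U+XXXX" escape into the UTF-8 bytes for that code point.
def u(str)
  str.gsub(/U\+([0-9A-Fa-f]{4})/) { [$1.hex].pack('U') }
end

copyright = u('U+00a9')                      # same bytes as "\xc2\xa9"
question  = u('U+00bfHabla espaU+00f1ol?')   # "¿Habla español?" as UTF-8
p copyright.unpack('C*')                     # => [194, 169]
```

Because the pattern takes exactly four digits, text that follows an escape (such as the "ol" after U+00f1) is left alone.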
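
The pack/unpack approach from the quoted notes treats a UTF-8 string as an array of integer code points. In this minimal sketch, the sample word is built from its code points so the example does not depend on how the source file itself is encoded.

```ruby
# Build "cañón" from its code points so the example does not depend on
# the encoding this source file is saved in.
str = [0x63, 0x61, 0xf1, 0xf3, 0x6e].pack('U*')

codepoints = str.unpack('U*')   # one Integer per character
puts codepoints.size            # => 5 (the string itself is 7 bytes)

# Walk the string one character at a time, re-packing each code point
codepoints.each do |cp|
  printf("U+%04X %s\n", cp, [cp].pack('U'))
end
```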
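
For the String methods that jcode leaves byte-oriented, such as reverse, one workaround is to go through code points. This is a sketch, and utf8_reverse is a hypothetical name. Reversing the raw bytes would split multi-byte characters, so it reverses the unpacked code points and packs them back into UTF-8.

```ruby
# Hypothetical helper: a UTF-8-aware reverse that keeps multi-byte
# characters intact by reversing code points instead of bytes.
def utf8_reverse(str)
  str.unpack('U*').reverse.pack('U*')
end

s = "espa\303\261ol"     # "español" as raw UTF-8 bytes
puts utf8_reverse(s)     # prints "loñapse"
```

The same unpack/transform/pack pattern could cover other methods on that list, but case mapping (capitalize, swapcase) would need a per-language table, which this sketch does not attempt.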
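
The quoted notes point out that Iconv is not always installed by the One-Click installer. When all you need is Latin-1 <-> UTF-8, one alternative is to do the conversion with pack/unpack. This technique is swapped in here and is not from the thread. It works because ISO-8859-1 byte values coincide with Unicode code points U+0000 to U+00FF. The function names are hypothetical.

```ruby
# Sketch: Latin-1 <-> UTF-8 conversion with pack/unpack instead of Iconv.
# Relies on ISO-8859-1 bytes being identical to code points U+0000..U+00FF.

# ISO-8859-1 bytes -> UTF-8 string
def latin1_to_utf8(bytes)
  bytes.unpack('C*').pack('U*')
end

# UTF-8 string -> ISO-8859-1 bytes; raises if a character has no Latin-1 form
def utf8_to_latin1(str)
  codepoints = str.unpack('U*')
  bad = codepoints.find { |cp| cp > 0xff }
  raise ArgumentError, format("U+%04X is not in ISO-8859-1", bad) if bad
  codepoints.pack('C*')
end

latin1 = "\361"                         # "ñ" as a single ISO-8859-1 byte
p latin1_to_utf8(latin1).unpack('C*')   # => [195, 177]
```

Unlike Iconv, this handles only this one pair of encodings. It rejects characters outside Latin-1 instead of transliterating them.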
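
For the Notepad BOM problem described in the quoted notes, a script can check for the three bytes EF BB BF and drop them before the text goes anywhere else. This is a minimal sketch with hypothetical names. It compares the bytes via unpack('C3') rather than with String#==, so the check does not depend on string encodings.

```ruby
# Sketch: detect and strip a leading UTF-8 byte-order mark (EF BB BF).
UTF8_BOM = [0xef, 0xbb, 0xbf]

def strip_utf8_bom(data)
  if data.unpack('C3') == UTF8_BOM
    data.unpack('a3a*')[1]   # everything after the first three bytes, as raw bytes
  else
    data
  end
end

src = "\357\273\277puts 'hi'"   # BOM followed by a one-line script
p strip_utf8_bom(src)           # => "puts 'hi'"
```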



