[fxruby-users] Unicode support in FXRuby 1.6
olivers at mondrian-ide.com
Sat Sep 3 12:53:57 EDT 2005
Thanks a lot for your thorough notes! I think you covered everything I was
> -----Original Message-----
> From: fxruby-users-bounces at rubyforge.org
> [mailto:fxruby-users-bounces at rubyforge.org]On Behalf Of Gonzalo
> Sent: Saturday, September 03, 2005 12:24 AM
> To: fxruby-users at rubyforge.org
> Subject: Re: [fxruby-users] Unicode support in FXRuby 1.6
> That's pretty much correct. Ruby's Unicode support is somewhat weak
> compared to Python or Perl.
> Only UTF-8 is supported. No support for UTF-16 is available, afaik.
> Basically... here's everything you wanted to know about Ruby's Unicode
> but were afraid to ask....
> * $KCODE can be set to support an encoding directly, but this is *NOT*
> needed to have a script work with Unicode.
> It is just a simple shortcut so that any regex like /./ will do the right
> thing.
> * Without $KCODE, regexps with Unicode support are still available, using
> the /u language option, like
> t =~ //u
> Regexp.new(regex, options, 'u')
> (or, alternatively, //m which is for multi-byte -- meaning ANSI, UTF-8,
> EUC, or SJIS depending on what $KCODE is set to, although I believe this
> is no longer needed, as setting $KCODE will already adjust all regexes).
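
A minimal sketch of the /u option in use, counting characters instead of bytes. The sample string is built with pack('U*') only so that the example does not depend on the encoding of the script file; the variable names are illustrative.

```ruby
# Build "café ©" as UTF-8 from its code points (c, a, f, é, space, ©).
text = [0x63, 0x61, 0x66, 0xe9, 0x20, 0xa9].pack('U*')

# With /u, "." matches one whole UTF-8 character rather than a single byte.
chars = text.scan(/./u)

puts chars.length               # 6 characters
puts text.unpack('C*').length   # 8 bytes (é and © take two bytes each)
```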
> * Supporting u"" like Python can be added to some extent very easily. See:
> This allows you to then do:
> c = u'U+00a9' # same as \xc2\xa9
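
The code the message links to is not reproduced here, so the following is only a guess at the idea: a hypothetical u() helper that expands "U+XXXX" escapes into UTF-8 with Array#pack. The method name and the fixed four-digit escape format are assumptions taken from the usage shown above, not the referenced implementation.

```ruby
# Hypothetical helper (not the linked code): replace every "U+XXXX" escape
# with the UTF-8 encoding of that code point. Only four hex digits are
# consumed, so code points above U+FFFF are not handled.
def u(str)
  str.gsub(/U\+([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

c = u('U+00a9')
p c.unpack('C*')    # [194, 169], i.e. the bytes \xc2\xa9

question = u('U+00bfHabla espaU+00f1ol?')
puts question.unpack('U*').length   # 15 characters
```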
> * You can also use:
> to pack/unpack UTF-8 strings. This allows you to easily count characters
> and iterate through them, without the need for jcode (which really is
> only needed for getting succ to work).
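
The snippet that belongs above is missing from the archive. Assuming it referred to the 'U*' directive of String#unpack and Array#pack, a sketch of counting, iterating and reversing by code point might look like this:

```ruby
# "¿Habla?" built from code points so the example is encoding-independent.
text = [0xbf, 0x48, 0x61, 0x62, 0x6c, 0x61, 0x3f].pack('U*')

# unpack('U*') decodes the UTF-8 bytes into an array of code points.
codepoints = text.unpack('U*')
puts codepoints.length                        # 7 characters

# Iterate over the characters one code point at a time.
codepoints.each { |cp| puts "U+%04X" % cp }

# Reversing the code points (not the bytes) keeps multi-byte characters intact.
reversed = codepoints.reverse.pack('U*')
```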
> * jcode.rb is kind of a Ruby hack and it is incomplete. Methods such as
> reverse, capitalize, casecmp, swapcase, all the strip functions and
> probably others are not defined and will return incorrect results,
> depending on the
> * Ruby's $KCODE does not add a UTF-8 <->Latin1 encoding conversion, unlike
> python's unicode strings. So, albeit with the above, you can do:
> question = u'U+00bfHabla espaU+00f1ol?' # ¿Habla español?
> puts question
> similar to python's:
> question = u'\u00bfHabla espa\u00f1ol?' # ¿Habla español?
> print question
> You will not get the corresponding Latin1 string when you print it (unlike
> python's unicode strings).
> * To properly do the above, and convert Latin1<->UTF8 for printing, you
> should use iconv.
> ruby -riconv -e 'puts Iconv.iconv("UTF-8", "ISO-8859-1", "\xf1")'
> Iconv, by default, does *NOT* get installed by the One-Click Windows
> installer, even though it is supposed to be a standard part of Ruby.
> Adding something then like:
> require 'iconv'
> class UString
>   def to_s
>     # Iconv.iconv returns an array of converted strings, so join it.
>     Iconv.iconv("UTF-8", "ISO-8859-1", self).join
>   end
> end
> will do the trick for Why's UString class.
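
Where Iconv is not installed, the Latin1<->UTF-8 case can be sketched with pack/unpack alone, because Latin-1 byte values equal the first 256 Unicode code points. This is a substitute technique, not what the message proposes, and the method names are made up for the example.

```ruby
# Latin-1 -> UTF-8: read each byte as a code point, re-encode as UTF-8.
def latin1_to_utf8(bytes)
  bytes.unpack('C*').pack('U*')
end

# UTF-8 -> Latin-1: only possible when every code point fits in one byte.
def utf8_to_latin1(text)
  codepoints = text.unpack('U*')
  too_big = codepoints.find { |cp| cp > 0xff }
  raise ArgumentError, "U+%04X has no Latin-1 equivalent" % too_big if too_big
  codepoints.pack('C*')
end

latin1 = [0xf1].pack('C*')         # "ñ" as a single Latin-1 byte
utf8   = latin1_to_utf8(latin1)
p utf8.unpack('C*')                # [195, 177], i.e. \xc3\xb1
p utf8_to_latin1(utf8) == latin1   # true
```

Anything outside Latin-1 raises instead of being silently truncated; for other encodings Iconv is still the tool to use.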
> * The Ruby interpreter should have no problem reading a UTF-8 .rb script
> file, but you have to run it with the -Ku option:
> > ruby -Ku file.rb
> (or set RUBYOPT to -Ku, so ruby always runs with that)
> Note, however, that Windows Notepad, when saving UTF-8 files, adds a valid
> albeit meaningless 3-byte BOM (byte-order mark) at the start, which does
> not work with ruby1.8 (and will also corrupt Unix shebang lines on
> most -all?- Unixes). This sequence is allowed by the standard but is
> not required for UTF-8. Ruby, just like Unix shebangs, does not deal with
> it appropriately.
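
One possible workaround is to strip the BOM before the interpreter sees the file. The sketch below works on raw bytes; strip_bom and BOM_BYTES are illustrative names, and reading or writing the actual file is left out.

```ruby
# The UTF-8 BOM is the three bytes EF BB BF (the encoding of U+FEFF).
BOM_BYTES = [0xef, 0xbb, 0xbf]

# Return the data without a leading BOM; data without one is returned as is.
def strip_bom(data)
  bytes = data.unpack('C*')
  return data unless bytes[0, 3] == BOM_BYTES
  bytes[3..-1].pack('C*')
end

# A script as Notepad would save it: BOM followed by the source text.
source  = (BOM_BYTES + "puts 1\n".unpack('C*')).pack('C*')
cleaned = strip_bom(source)
puts cleaned.unpack('C*').length   # 7 bytes, the BOM is gone
```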