| Message |
 |
Date: 2009-01-23 14:57
Sender: S. Leger
Hi,
If you are interested, after Vladimir commits his changes I
can easily create a patch with the updates for UTF-8 return values
and for the changes in bug 17631. Those changes were submitted
by Larry K., a former coworker of mine, and provide better mapping
of java.lang.Exception types into Ruby. It would be very nice
to get these committed!
http://rubyforge.org/tracker/index.php?func=detail&aid=17631&group_id=1270&atid=4995 |
Date: 2009-01-23 09:46
Sender: Vladimir Dobriakov
> Would it be okay if I add you as a committer Vladimir?
Sure, then I would just run
`patch -p0 -i fix_pack_string_properly.diff`
and increase the version number.
What about releasing the new gem version? Would you still
do this?
Best Regards
Vladimir AKA geekQ
|
Date: 2009-01-23 09:18
Sender: Christer Sandberg
Hi to you both!
Sorry for not applying the patch yet. I'm totally occupied with my
current project and my mind is elsewhere right now. Would it be okay
if I add you as a committer Vladimir? Then you can apply all the
necessary patches and everyone will be happy and at ease ;)
/Christer |
Date: 2009-01-21 21:09
Sender: S. Leger
Update: I was correct, that function was mangling multi-byte
strings. I've changed the function so that it no longer performs
the pack('C*') conversion; that conversion works for ASCII but
mangles other UTF-8 values.
It looks a bit unintuitive because you might expect that s.length
would return the number of characters. However, it returns the
number of bytes (since strings are basically just byte arrays
in Ruby).
# +returns+:: An array, [0] = parsed string,
#             [1] = number of bytes to slice
#             out of the data stream that were
#             occupied by the string
def from_utf8(len = '*')
  s = @data.unpack("U#{len}").pack('U*')
  [ s, s.length ]
end
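To make the difference between the two pack directives concrete, here is a minimal sketch. The helper name decode_utf8 and the sample string are assumptions for illustration and are not part of the hessian library. Note also that on Ruby 1.9 and later String#length counts characters rather than bytes, so the sketch uses String#bytesize for the byte count.

```ruby
# Illustration only: decode_utf8 is a hypothetical stand-alone variant of
# from_utf8 that takes the data as an argument instead of reading @data.
def decode_utf8(data, len = '*')
  s = data.unpack("U#{len}").pack('U*') # code points -> UTF-8 string
  [s, s.bytesize]                       # bytesize counts bytes, not characters
end

sample = "\u65E5\u672C\u8A9E"           # three Japanese characters
codepoints = sample.unpack('U*')        # one integer per character

mangled  = codepoints.pack('C*')        # keeps only the low byte of each code point
restored = codepoints.pack('U*')        # re-encodes each code point as UTF-8

puts mangled.bytesize                   # one byte per character, text is lost
puts restored == sample                 # the round trip preserves the text
puts decode_utf8(sample + 'abc', 3).inspect
```

With pack('C*') every code point above 255 is truncated to a single byte, which is why ASCII survives and everything else does not.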
I'm gonna throw some keywords in this bug so that search engines
may pick up this thread more readily: UTF-8 Ruby Hessian mangled
Japanese Chinese Russian multibyte unicode |
Date: 2009-01-21 19:06
Sender: S. Leger
Is it possible that this type of problem also affects UTF-8 strings
in return values? I am having trouble decoding an array (coming
out of a Hessian call) of UTF-8 text that contains Japanese-language
strings. I suspect that the same type of problem is occurring
in the decoding routine:
def from_utf8(len = '*')
  s = @data.unpack("U#{len}").pack('C*')
  [ s, s.unpack('C*').pack('U*').length ]
end
Also it does not appear that the original patch attached to this
bug has been applied to the SVN repo yet (as of 2009-01-21). |
Date: 2008-12-05 22:55
Sender: Vladimir Dobriakov
BTW, neither the original nor my proposed implementation will
work with long strings (more than 64K). For that you need
to split the string into chunks of less than 64K characters (not
bytes). As long as nobody requests this functionality we
should probably at least raise a meaningful exception like
raise 'chunking is not supported yet' if length > 65535
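A rough sketch of both options, the guard and the splitting, might look like the following. The names MAX_CHUNK_CHARS, ensure_single_chunk! and split_into_chunks are hypothetical and do not exist in the library; the sketch only shows how to count and split by characters rather than bytes, and it does not write any Hessian framing.

```ruby
MAX_CHUNK_CHARS = 65_535 # limit mentioned above, counted in characters

# Hypothetical guard: refuse strings that would need chunking.
def ensure_single_chunk!(str)
  length = str.unpack('U*').length # number of characters, not bytes
  raise ArgumentError, 'chunking is not supported yet' if length > MAX_CHUNK_CHARS
  str
end

# Hypothetical helper: split on character boundaries so that a
# multi-byte UTF-8 sequence is never cut in the middle.
def split_into_chunks(str, max_chars = MAX_CHUNK_CHARS)
  str.unpack('U*').each_slice(max_chars).map { |cps| cps.pack('U*') }
end

text = "\u65E5" * 7                   # seven 3-byte characters
chunks = split_into_chunks(text, 3)   # small limit to keep the demo short
puts chunks.map { |c| c.unpack('U*').length }.inspect
```

Splitting the code point array instead of the raw bytes is what keeps each chunk valid UTF-8.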
Regards,
Vladimir |
Date: 2008-12-04 08:34
Sender: Christer Sandberg
Hi Vladimir!
I will apply the patch and make a new release as soon as I can.
I can't fix this while at work, but I will do it this weekend
for sure.
Thanks,
Christer |
Date: 2008-12-02 19:20
Sender: Vladimir Dobriakov
One additional note: the patch assumes that the input
strings are UTF-8 encoded. We use the hessian library in a
Rails application where strings are UTF-8 encoded by
default. (English text encoded as ASCII works fine because
the bytes are the same as in UTF-8.)
The patch is also tested with Cyrillic and Japanese strings.
Assuming UTF-8 is the right approach, I think. It
is not possible to reliably detect the encoding of a string
automatically. We either have to assume one encoding that
is suitable for everyone or extend the API so that the encoding is
provided by the caller.
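As a sketch of the second option, a caller-supplied encoding could be handled roughly like this. The helper to_utf8 is hypothetical and not part of the hessian API, and it relies on the String encoding methods available in Ruby 1.9 and later.

```ruby
# Hypothetical helper: the caller states the source encoding, UTF-8 is
# only the default. Nothing here tries to guess the encoding.
def to_utf8(str, source_encoding = Encoding::UTF_8)
  tagged = str.dup.force_encoding(source_encoding) # relabel the bytes, no conversion yet
  unless tagged.valid_encoding?
    raise ArgumentError, "input is not valid #{source_encoding}"
  end
  tagged.encode(Encoding::UTF_8)                   # transcode to UTF-8
end

# Six Cyrillic letters as raw Windows-1251 bytes.
raw = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2].pack('C*')
puts to_utf8(raw, 'Windows-1251').bytesize
```

Rejecting invalid input early gives the caller a clear error instead of silently mangled text further down the line.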
|