Forums | Admin

Discussion Forums: help

Start New Thread Start New Thread

 

By: Eszter Horvati
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-11-24 12:31
Dear Kubo!

I'm sorry about the confusion. It took me a while to figure out what really was going wrong here...
It turned out there was a different presentation layer in the project, one coming from CLOBs, that didn't show the correct UTF-8 characters. The ones coming from NVARCHAR columns worked perfectly! :-)
I changed also CLOB type columns to NCLOB, recreated the database setup (and applied a little fix according to this post: http://groups.google.com/group/oracle-enhanced/browse_thread/thread/8a52fbc868c8e632), and that did the trick.

Thank you very much!

Best regards,
Eszter

By: Kubo Takehiro
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-11-22 13:25
Could you post test code to reproduce the problem?
The patch works fine for the script posted by Raimonds.
http://rubyforge.org/forum/message.php?msg_id=95101

Well, I have no time now. I'll check it again at the next weekend.

By: Eszter Horvati
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-11-22 10:23
Hello again!

I got some help to confirm that the compilation was correct, and the changes should have taken effect.
Unfortunately, they don't.

Seems like the solution of binding all string values to NCHAR doesn't fix the problem. Could it be that the database's NLS_LANG setting has already been assumed somewhere before we fetch the actual characters? So that the characters are first converted to the character set of the database (LATIN-1 in my case) somewhere, and then set to UTF-8 through the NCHAR datatype (according to your patch)? And when that happens, the critical characters are already lost?

Eszter

By: Eszter Horvati
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-11-18 21:11
Dear Kubo!
Thanks for the patch (actually, I was the one who raised this issue with Raimonds on the activerecord oracle-enhanced adapter forum). However, I tried to simply change the code locally, according to the change log from the link you posted, then run make again, but I still didn't get the effect I wanted. That is, I still don't get Unicode characters out from the database.
Am I missing something? Do I have to compile some other parts of the ruby-oci8 gem?
I am quite new to the whole Ruby on Rails thing, so forgive me if I am asking silly questions...

Kind regards,
Eszter

By: Kubo Takehiro
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-10-30 09:13
I made a patch to fix the issue. It binds all string values as NCHAR.
This is not general solution, but it works fine in this case.

See http://gist.github.com/655114

By: Kubo Takehiro
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-10-27 13:02
Thanks for your detailed post to reproduce the issue.
I'll check it at the next weekend.


By: Raimonds Simanovskis
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-10-24 13:04
One more clarification - on client side I set environment variable
NLS_LANG=AMERICAN_AMERICA.UTF8

By: Raimonds Simanovskis
RE: Unicode characters lost when reading NCHAR [ reply ]  
2010-10-24 12:55
One additional comment - Unicode characters from NCLOB columns are retrieved correctly. Here is modified test script for that:


# encoding: utf-8

require "rubygems"
gem "ruby-oci8", "2.0.4"
require "oci8"

# This tells OCI client to treat N'...' as Unicode string literals
ENV['ORA_NCHAR_LITERAL_REPLACE']='TRUE'

# Database NLS settings:
# NLS_CHARACTERSET = WE8ISO8859P1
# NLS_NCHAR_CHARACTERSET = AL16UTF16
conn = OCI8.new 'hr', 'hr', 'xe'

conn.exec "DROP TABLE test_nvarchar" rescue nil
conn.exec <<-SQL
CREATE TABLE test_nvarchar (
id NUMBER(38),
nchar_column NCHAR(20),
nvarchar_column NVARCHAR2(20),
nclob_column NCLOB
)
SQL

puts 'Test insert and select of "aceāčē"'

conn.exec "INSERT INTO test_nvarchar VALUES (1, N'aceāčē', N'aceāčē', N'aceāčē')"
conn.commit

# Problem - Unicode characters are shown as ¿¿¿ from NCHAR and NVARCHAR2 columns
# Unicode characters from NCLOB column are shown correctly
conn.exec("select * from test_nvarchar") do |r|
puts r[1],r[2],r[3].read
end

By: Raimonds Simanovskis
Unicode characters lost when reading NCHAR [ reply ]  
2010-10-24 12:44
If database character set is not UTF8 (e.g. NLS_CHARACTERSET = WE8ISO8859P1) and NCHAR and NVARCHAR2 data types are used for storing Unicode strings then ruby-oci8 cannot read Unicode characters from NCHAR and NVARCHAR2 columns.

Here is test script how to repeat the issue:

# encoding: utf-8

require "rubygems"
gem "ruby-oci8", "2.0.4"
require "oci8"

# This tells OCI client to treat N'...' as Unicode string literals
ENV['ORA_NCHAR_LITERAL_REPLACE']='TRUE'

# Database NLS settings:
# NLS_CHARACTERSET = WE8ISO8859P1
# NLS_NCHAR_CHARACTERSET = AL16UTF16
conn = OCI8.new 'hr', 'hr', 'xe'

conn.exec "DROP TABLE test_nvarchar" rescue nil
conn.exec <<-SQL
CREATE TABLE test_nvarchar (
id NUMBER(38),
nchar_column NCHAR(20),
nvarchar_column NVARCHAR2(20)
)
SQL

puts 'Test insert and select of "aceāčē"'

conn.exec "INSERT INTO test_nvarchar VALUES (1, N'aceāčē', N'aceāčē')"
conn.commit

# Problem - Unicode characters are shown as ¿¿¿
conn.exec("select * from test_nvarchar") do |r|
puts r[1],r[2]
end



When I selected inserted values using either SQL Developer or sqlplus then I can see that they are inserted correctly as Unicode characters. So it means that ruby-oci8 is not correctly retrieving Unicode characters from NCHAR and NVARCHAR2 columns. Tried to identify in source code where it happens, but currently it was too hard for me :)

Kubo - do you have any idea what is causing this?

Raimonds