[BioCatalogue-developers] Database Encoding Issues

Eric Nzuobontane ericnzuo at ebi.ac.uk
Thu Aug 19 05:52:48 EDT 2010

As I mentioned earlier, the default character set for MySQL, which we are 
using for most tables, is latin1. I believe this can and should be changed 
to UTF-8, which would also make it consistent with what Rails expects. The 
same goes for the storage engine: unless there is a special reason for 
using MyISAM tables, I would expect all tables to use the InnoDB engine.
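
To make the proposed change concrete, here is a minimal sketch of the SQL it 
would involve, generated from Ruby. The helper name conversion_statements, 
the utf8_general_ci collation and the table names are assumptions for 
illustration only; they are not taken from the BioCatalogue schema.

```ruby
# Hypothetical helper (not part of BioCatalogue): builds the two MySQL
# statements that move one table to the InnoDB engine and the utf8 charset.
def conversion_statements(table, charset: "utf8", collation: "utf8_general_ci", engine: "InnoDB")
  # Quote the identifier, doubling any embedded backticks.
  quoted = "`#{table.gsub('`', '``')}`"
  [
    "ALTER TABLE #{quoted} ENGINE=#{engine};",
    "ALTER TABLE #{quoted} CONVERT TO CHARACTER SET #{charset} COLLATE #{collation};"
  ]
end

# Placeholder table names; substitute the real list of tables.
tables = %w[users services]
tables.each { |t| puts conversion_statements(t) }
```

One caveat: CONVERT TO CHARACTER SET transcodes on the assumption that the 
stored bytes really are latin1. If UTF-8 bytes were written into latin1 
columns, a direct conversion would double-encode them, so this should be 
tried on a backup copy first.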


Jiten Bhagat wrote:
> Note that:
> - the database is storing it "correctly" (i.e. not losing any
> information, just that it's in a different encoding than it should be).
> - but the Rails app expects things in UTF-8 usually, so displaying it in
> the UI (and API) causes it to mess up due to a disjointedness in the
> encodings.
> - interestingly, phpMyAdmin displays it correctly, probably because it
> is not assuming the encoding and is instead working with the database's
> encoding (we could simulate something like that with String#mb_chars in
> Rails, but...)
> So, there is the bigger issue of the database being in an encoding that is
> not modern enough and different to what is expected in Rails; I remember
> we had those issues a while back when people were trying to submit
> content with non-English characters and it kept failing in the database
> due to a "swedish collation/encoding" issue.
> I may be wrong though; this could be a Ruby issue (I know that Ruby has
> recently been criticised a lot for how it handles encodings of strings).
> This is something VERY IMPORTANT to look into, please.
> Jits
> Mannie Tagarira wrote:
>> Hi Eric,
>> There is a concerning issue regarding database encodings in which some characters are stored in the database in an inconsistent way; most European character sets are not being handled properly. Accented characters, for example, will appear as symbols, which is clearly not the desired effect. Take for example, the user with ID 234 (not yet activated), whose name is "José María Fernández"; this may appear in the BioCatalogue as "JosÃ© MarÃ­a FernÃ¡ndez" or something similar depending on the database encoding used.
>> So far, this has been the case on my machine, Jits's machine, as well as the sandbox.  Could you please look into this just to make sure this issue does not affect the test and live sites as well...
>> Regards,
>> Mannie
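
The garbling described in the quoted messages is what you get when UTF-8 
bytes are read as if they were Latin-1. A minimal sketch of that effect 
follows, assuming Ruby 1.9 or later for the String encoding methods; the 
helper names misread_as_latin1 and repair_mojibake are made up for this 
example, and whether this is exactly what happens in our stack is still to 
be confirmed.

```ruby
# encoding: utf-8

# Hypothetical helper: reinterpret the UTF-8 bytes of a string as
# ISO-8859-1 (latin1) and transcode the result back to UTF-8, which is
# what a client does when it assumes the wrong encoding.
def misread_as_latin1(utf8_text)
  utf8_text.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Hypothetical helper: undo the step above by recovering the original
# bytes and labelling them as UTF-8 again.
def repair_mojibake(garbled)
  garbled.encode("ISO-8859-1").force_encoding("UTF-8")
end

name    = "José María Fernández"
garbled = misread_as_latin1(name)

puts garbled                   # each accented letter becomes two characters
puts repair_mojibake(garbled)  # round-trips to the original name
```

The round trip only works when the text was misread exactly once, which is 
consistent with Jits's point that the database is not losing information.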

Eric Nzuobontane
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD
United Kingdom

Tel   : +44 1223 492654
email : ericnzuo at ebi.ac.uk

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: biocat_db_encoding.txt
URL: <http://rubyforge.org/pipermail/biocatalogue-developers/attachments/20100819/da447e37/attachment.txt>
