[BioCatalogue-developers] Database Encoding Issues
ericnzuo at ebi.ac.uk
Thu Aug 19 05:52:48 EDT 2010
As I mentioned earlier, the default MySQL encoding, which we are using
for most tables, is latin1. I believe this can and should be changed to
UTF-8, which would also make it consistent with what Rails expects. The
same applies to the storage engine: unless there is a special reason for
using MyISAM tables, I would expect all tables to use the InnoDB engine.
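
As a rough sketch of what that change could involve, the snippet below
builds the two MySQL statements per table: one to convert the character
set and one to switch the engine. The helper name, the table names, and
the utf8_unicode_ci collation are assumptions for illustration, not taken
from the BioCatalogue schema. Note that CONVERT TO re-encodes existing
column data from its declared character set, so if a latin1 column
already holds UTF-8 bytes this could double-encode them; it would be
worth trying on a sandbox copy first.

```python
# Hypothetical helper: builds the MySQL statements that would move one table
# to UTF-8 and InnoDB. Nothing here connects to or modifies a database.
def conversion_statements(table, charset="utf8",
                          collation="utf8_unicode_ci", engine="InnoDB"):
    # Backtick-quote the identifier, doubling any embedded backticks.
    quoted = "`" + table.replace("`", "``") + "`"
    return [
        f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};",
        f"ALTER TABLE {quoted} ENGINE={engine};",
    ]


if __name__ == "__main__":
    # Placeholder table names, not the real schema.
    for table in ["users", "services"]:
        for statement in conversion_statements(table):
            print(statement)
```

The printed statements can be reviewed by hand before anything is run
against a real database.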
Jiten Bhagat wrote:
> Note that:
> - the database is storing it "correctly" (i.e.: not losing any
> information, just that it's a different encoding than what it should be).
> - but the Rails app expects things in UTF-8 usually, so displaying it in
> the UI (and API) causes it to mess up due to a disjointedness in the
> - interestingly, phpMyAdmin displays it correctly, probably because it
> is not assuming the encoding and is instead working with the database's
> encoding (we could simulate something like this with String#mb_chars in
> Rails, but...)
> So, there is the bigger issue of the database being in an encoding that
> is not modern enough and differs from what Rails expects; I remember
> we had those issues a while back when people were trying to submit
> content with non-English characters and it kept failing in the database
> due to a "swedish collation/encoding" issue.
> I may be wrong though; this could be a Ruby issue (I know that Ruby has
> recently been criticised a lot for how it handles encodings of strings).
> This is something VERY IMPORTANT to look into, please.
> Mannie Tagarira wrote:
>> Hi Eric,
>> There is a concerning issue regarding database encodings in which some characters are stored in the database in an inconsistent way; most European character sets are not being handled properly. Accented characters, for example, will appear as symbols, which is clearly not the desired effect. Take for example, the user with ID 234 (not yet activated), whose name is "José María Fernández"; this may appear in the BioCatalogue as "JosÃ© MarÃ a FernÃ¡ndez" or something similar depending on the database encoding used.
>> So far, this has been the case on my machine, Jits's machine, as well as the sandbox. Could you please look into this just to make sure this issue does not affect the test and live sites as well...
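
The garbled name quoted above is what you would expect to see when UTF-8
bytes are read back as latin1. The short sketch below reproduces that
effect on the same name and shows that it is reversible as long as the
bytes themselves are intact. The function names are made up for
illustration, and this only demonstrates the symptom; it does not confirm
where in the stack the mismatch actually happens.

```python
# Illustration only: mimic UTF-8 text being read as latin1, then undo it.
def misread_as_latin1(text):
    # Encode correctly as UTF-8, then decode with the wrong charset.
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled):
    # Reverse the wrong decode to recover the original bytes and text.
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    name = "José María Fernández"
    garbled = misread_as_latin1(name)
    print(garbled)
    print(undo_misread(garbled))
```

Each accented character is two bytes in UTF-8, so it shows up as two
latin1 characters once misread.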
European Bioinformatics Institute
Wellcome Trust Genome Campus
Tel : +44 1223 492654
email : ericnzuo at ebi.ac.uk