[Rails I18n] UTF8 vs Latin1 String comparison in MySQL?
Sarah Allen
sarah at ultrasaurus.com
Wed May 20 08:55:50 EDT 2009
Wolfram Arnold wrote:
> I'm running into a surprising twist in MySql string compare.
>
> Try this:
>
> select "América" = "America";
> +------------------------+
> | "América" = "America" |
> +------------------------+
> | 1 |
> +------------------------+
I looked into this a bit this morning. It seems that string comparison
is affected by "collation" not charset. The default utf8 collation
yields the unexpected result:
mysql> SHOW VARIABLES LIKE '%collat%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
mysql> select "América" = "America";
+------------------------+
| "América" = "America" |
+------------------------+
| 1 |
+------------------------+
1 row in set (0.00 sec)
But if I change the collation to utf8_bin, the strings compare as
unequal:
mysql> set collation='utf8_bin';
Query OK, 0 rows affected (0.04 sec)
mysql> set collation_server='utf8_bin';
Query OK, 0 rows affected (0.00 sec)
mysql> set collation_database='utf8_bin';
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%collat%';
+----------------------+----------+
| Variable_name | Value |
+----------------------+----------+
| collation_connection | utf8_bin |
| collation_database | utf8_bin |
| collation_server | utf8_bin |
+----------------------+----------+
3 rows in set (0.00 sec)
mysql> select "América" = "America";;
+------------------------+
| "América" = "America" |
+------------------------+
| 0 |
+------------------------+
1 row in set (0.00 sec)
I'm a bit confused myself about how utf8_general_ci could sort
effectively if 'é' equals 'e' since users would expect all the like
characters to group together and why this collation would be useful; nor
do I understand if there are other implications to using utf8_bin
collation.
There is a nicely written report on Unicode collation here:
http://www.unicode.org/unicode/reports/tr10/
Sarah
--
Posted via http://www.ruby-forum.com/.
More information about the Railsi18n-discussion
mailing list