
Discussion Forums: help

Message: 40617
BY: clint troxel (ctroxel)
DATE: 2008-01-25 22:50
SUBJECT: mangling characters in large data set

 

I'm having an odd problem that I hope will ring a bell for someone:

I'm importing a large CSV file with FasterCSV 1.2.3 in Rails 2.0.2.

A number of rows in this CSV file (26,000 rows total) contain accented, non-ASCII characters. Some examples:

Sanz Martín,Javier,fake@fake.com,Svcs/ClntOp-Oth,SPAI
Ugarte Muñoz,Sonia,fake@fake.com,Ent/CIO,SPAI
Iglesias Vidal,María Belén,fake@fake.com,"Ent/St,Rs&CorpD",SPAI
Ivroth,Björn,fake@fake.com,Svcs/PrjBsd-Glb,Nordic
Fernandes,José,fake@fake.com,Svcs/PrjBsd-IO,SPAI

These rows are added as ActiveRecord objects.

The problem: the rows with accented characters don't import well. FasterCSV seems to get confused about which column is which, and many of the accented characters are loaded into the database incorrectly, showing up as question marks (?). This seems to point to a character encoding problem.

However, if I manually pull all the offending records out of the large CSV file and create a new CSV file containing only the problem rows, the import of that new file goes through perfectly and cleanly. Odd. Note: only about 27 of the 26,000 records appear broken after import.
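One way to test the encoding hunch is to check whether every line of the file is valid in a single encoding. The sketch below is only an illustration: it assumes the file is supposed to be UTF-8 and that Ruby 1.9 or later is available (for String#valid_encoding?), and the helper name invalid_utf8_lines is hypothetical. It lists the numbers of any lines whose bytes are not valid UTF-8, which would suggest the file mixes encodings.

```ruby
require "stringio"

# Hypothetical helper: return the 1-based numbers of lines that are not
# valid UTF-8. Assumes Ruby 1.9+ and that UTF-8 is the intended encoding.
def invalid_utf8_lines(io)
  bad = []
  io.each_line.with_index(1) do |line, number|
    # Tag the raw bytes as UTF-8 without transcoding, then validate them.
    candidate = line.dup.force_encoding(Encoding::UTF_8)
    bad << number unless candidate.valid_encoding?
  end
  bad
end

# Two made-up lines: the first is UTF-8, the second holds "Martín" as
# ISO-8859-1 bytes ("\xED" for "í"), which is not a valid UTF-8 sequence.
sample = StringIO.new(
  "Fernandes,Jos\u00E9,fake@fake.com\n".b +
  "Sanz Mart\xEDn,Javier,fake@fake.com\n".b
)

p invalid_utf8_lines(sample)  # prints [2]
```

For a real file, the same helper could be called as File.open(path, "rb") { |f| invalid_utf8_lines(f) }, where path is whatever CSV file is being imported. An empty result would mean the bytes are consistent UTF-8 and the problem lies elsewhere.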

I'd appreciate any help or hunches.
Thanks,
-clint


Thread View

Thread | Author | Date
mangling characters in large data set | clint troxel | 2008-01-25 22:50
RE: mangling characters in large data set | James Gray | 2008-01-25 23:07
