Message: 44596
BY: James Bebbington (fractious)
DATE: 2008-02-21 17:23
SUBJECT: Parsing *large* malformed CSVs

Hi,
I'm trying to use FasterCSV to import a large product feed CSV (250MB+ / 450,000+ records) from a third party. The import was going fine for the first 13,000-odd records, then suddenly froze, with memory usage climbing from a steady 30MB to 200MB+.
Examining the record it froze on:
"Women's Petite Long Sleeve Pima Polo","http://pdt.tradedoubler.com/click?a(1294235)p(18460)prod(82795623)ttid(5)","http://www.landsend.co.uk/INTL/full/77/f1779061_1999.jpg","''It may feel as 'soft as the hair of an angel' to the touch, but our Peruvian pima cotton polo gets tough on the rigours of ev...''","22.00","GBP","82795623","85","Women's clothing",""Womens|Petite|Tops, Tunics & Tees|Polos"","","","","","","","","","","","","","","","","","","","","","Lands End Affiliate Program","http://img.tradedoubler.com/images/uk/merchants/landsend_gb_100.gif","18460","Manufacturer:Lands' End"
It appears that the field ""Womens|Petite|Tops, Tunics & Tees|Polos"" may well be the culprit because of its unescaped double quotes. Having extracted that record and the previous one into a file to test against, I found that FasterCSV does indeed throw a MalformedCSVError. Unfortunately, as mentioned above, in the context of the full CSV no exception is thrown and processing effectively stops.
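One way to narrow this down, as a minimal sketch rather than a fix: parse each physical line on its own and record the ones that raise, so a bad record cannot pull the rest of the file into one field. This assumes no field in the feed legitimately contains an embedded newline. The helper name `malformed_lines` and the three-line sample feed are made up for illustration; only the doubled-quote field is taken from the real record. The sketch uses Ruby's standard `csv` library, which is FasterCSV's successor; with the FasterCSV gem the equivalent calls would be `FasterCSV.parse_line` and `FasterCSV::MalformedCSVError`.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse every line independently and collect
# [line number, error message] pairs for the lines that fail.
# Assumption: one record per physical line (no embedded newlines).
def malformed_lines(io)
  bad = []
  io.each_line.with_index(1) do |line, lineno|
    begin
      CSV.parse_line(line)
    rescue CSV::MalformedCSVError => e
      bad << [lineno, e.message]
    end
  end
  bad
end

# Made-up sample feed; line 2 carries the doubled-quote field.
sample = StringIO.new(<<~FEED)
  "Good product","22.00","GBP","Women's clothing"
  "Bad product","22.00","GBP",""Womens|Petite|Tops, Tunics & Tees|Polos""
  "Another good product","9.50","GBP","Men's clothing"
FEED

malformed_lines(sample).each do |lineno, message|
  puts "line #{lineno}: #{message}"
end
```

Against the real feed the same helper could be given `File.open(path)`, which reads the file line by line rather than loading all 250MB at once.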
Can anyone shed some light on what FasterCSV is doing? Could it be searching through the rest of the file for something before throwing the exception? (I gave up after waiting 15 minutes and killed the process.)
Thanks.