Author mallyvai
Recipients mallyvai
Date 2017-09-26.09:28:19
Message-id <1506418099.9.0.80881309729.issue31590@psf.upfronthosting.co.za>
Content
I'm writing Python `csv`-based parsers as part of a data processing pipeline that includes Redshift and other data stores both upstream and downstream. In all of these data stores (e.g. http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html), it is easy, and expected, to generate CSV-style data with ESCAPE'd newlines, with or without quotes around the columns.
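
To make the format concrete, here is a minimal sketch that produces similar output with the Python 3 `csv` writer. The dialect settings (pipe delimiter, backslash `escapechar`, `QUOTE_NONE`, `\n` line terminator) are my assumption of what approximates an UNLOAD with ESCAPE and without ADDQUOTES; they are not an official mapping, and the sample rows are made up.

```python
import csv
import io

# Assumed dialect approximating unquoted, ESCAPE'd unload output
# (hypothetical settings, not an official Redshift mapping).
buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter="|",
    escapechar="\\",
    quoting=csv.QUOTE_NONE,
    lineterminator="\n",
)
writer.writerow(["1", "hello\nworld"])  # embedded newline is backslash-escaped
writer.writerow(["2", "a|b"])           # embedded delimiter is backslash-escaped

unloaded = buf.getvalue()
print(repr(unloaded))
# '1|hello\\\nworld\n2|a\\|b\n'
```

Note that the escaped newline leaves a physical line break inside the first record, which is exactly what a reader has to stitch back together.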

Challenge: the 2.x `csv` module has a bug where ESCAPE'd newlines in unquoted CSV data are not treated as escaped newlines, but as the start of entirely new records. This is at odds with the behavior of most common data warehouses (see, for example, the Redshift docs linked above) and is a subtle source of bugs in data processing pipelines. After some debugging, we worked around it by changing our Redshift UNLOAD parameters to use ADDQUOTES.
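
For comparison, here is a minimal sketch of the behavior I'd expect, run on a Python 3 interpreter, whose `csv` reader (as far as I can tell, since the issue15927 change) keeps an escaped newline inside the field when `quoting=csv.QUOTE_NONE` and an `escapechar` are set. The sample data and dialect settings are illustrative assumptions; on 2.x, per this report, the first record is instead split at the escaped newline.

```python
import csv
import io

# Unquoted, backslash-escaped sample (made-up data): record 1's second
# field contains an escaped newline, record 2's contains an escaped '|'.
data = "1|hello\\\nworld\n2|a\\|b\n"

reader = csv.reader(
    io.StringIO(data, newline=""),
    delimiter="|",
    escapechar="\\",
    quoting=csv.QUOTE_NONE,
)
rows = list(reader)
print(rows)
# [['1', 'hello\nworld'], ['2', 'a|b']]
```

Two records come back, with the newline preserved inside the second field, rather than an extra record starting at `world`.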

Note: this seems to be a continuation of https://bugs.python.org/issue15927, which was closed as WONTFIX for 2.x. I think this is a legitimate bug and should be fixed in 2.x. If someone is relying on the old, incorrect behavior, that probably means something else is wrong in their code. In my view, the current behavior effectively adds an implicit, undocumented dialect to the `csv` module.