classification
Title: csv.reader() does not support escaped newline when quoting=csv.QUOTE_NONE
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: kalaxy, lukasz.langa, maciej.szulik, mjohnson, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-09-12 04:49 by kalaxy, last changed 2013-03-20 02:44 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
test_csv.py kalaxy, 2012-09-12 04:49 Script exhibiting bug.
test_csv_py3k.py maciej.szulik, 2012-10-09 20:41 Script exhibiting bug for py3k.
csv.patch mjohnson, 2013-03-20 01:43
Messages (7)
msg170352 - (view) Author: Kalon Mills (kalaxy) Date: 2012-09-12 04:49
csv.reader prematurely ends row parsing when a row contains an escaped newline and quoting is turned off.  csv.reader properly handles quoted newlines.  csv.writer properly handles writing escaped unquoted newlines, so only the reader has an issue.

Given a dialect with escapechar='\\', quoting=csv.QUOTE_NONE, lineterminator='\n':

writer.writerow(['one\nelement']) will correctly write 'one\\\nelement\n'

However, pass that back into a reader and it will produce two rows: ['one\n'] ['element']

I would expect the reader to parse it correctly and return the original value of ['one\nelement'].
 
I've attached a test script that exhibits the improper behavior.  It uses a dialect to set an escapechar and disable quoting.
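
A minimal round-trip sketch of the report above, using only the standard csv and io modules. It is not the attached test_csv.py; the variable names are illustrative assumptions, and the two-row result mentioned in the comments is the behavior this report describes for versions without the fix.

```python
import csv
import io

# Dialect settings taken from the report: backslash escape, no quoting.
options = dict(escapechar='\\', quoting=csv.QUOTE_NONE, lineterminator='\n')

original = ['one\nelement']

# Write one row whose only field contains an embedded newline.
buf = io.StringIO()
csv.writer(buf, **options).writerow(original)
written = buf.getvalue()
print(repr(written))

# Read the same text back with the same settings.
buf.seek(0)
rows = list(csv.reader(buf, **options))
print(rows)
# Per this report, affected versions yield two rows here instead of one;
# a reader with the fix returns the original row.
```

Comparing `rows` against `[original]` is the consistency check the report asks for.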
msg172521 - (view) Author: Maciej Szulik (maciej.szulik) * Date: 2012-10-09 20:41
I've confirmed that the bug still exists in the latest repo version. I'm attaching a patch for py3k.
I'll try to have a look at it in the current version; as soon as it is fixed, I'll port it to 2.7.
msg175900 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 18:33
CSV is not a well-defined format. What do you expect to read from csv.reader(['one', 'two'])? If two rows, ['one'] and ['two'], then the reader is within its rights and there is no bug that can be fixed.
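
For reference, csv.reader accepts any iterable of strings and treats each item as one line of input, so the unescaped case in the question can be checked directly. This is a small sketch; the variable name is illustrative.

```python
import csv

# Each string from the iterable is parsed as a separate line,
# so two plain strings give two one-field rows.
plain_rows = list(csv.reader(['one', 'two']))
print(plain_rows)  # [['one'], ['two']]
```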
msg184415 - (view) Author: Kalon Mills (kalaxy) Date: 2013-03-18 02:49
Serhiy, sorry, I'm not sure I understand your question.  But if you take a look at the script that exhibits the problem, I think the bug that I'm reporting becomes clearer.

Namely, using the dialect configuration shown in the script, the round trip conversion from string through writer then through the reader back to string is inconsistent.  The reader should return as output the same input that was given to the corresponding writer and this is not the case.  

So even if CSV is not well defined, I believe the writer and reader should at least be consistent.
msg184720 - (view) Author: Michael Johnson (mjohnson) * Date: 2013-03-20 01:43
On input, the reader sees a line like 

    ['one\\\n','element']

from the file iterator and successfully escapes the newline character, but still interprets the end of the string as the end of a record.  I've attached a patch that modifies this behavior, so that encountering the end of a string immediately after an escaped \r or \n does not begin a new record.
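
The situation described above can be sketched by handing the reader the strings directly instead of a file. This is an illustration only, not the attached patch; the variable names are assumptions, and the second string keeps the trailing line terminator that the writer emits.

```python
import csv

# The two strings a file iterator would yield for the escaped row.
lines = ['one\\\n', 'element\n']

rows_from_lines = list(
    csv.reader(lines, escapechar='\\', quoting=csv.QUOTE_NONE)
)
print(rows_from_lines)
# With the patched behavior, the end of the first string does not start
# a new record, so both strings contribute to a single field.
```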
msg184723 - (view) Author: Roundup Robot (python-dev) Date: 2013-03-20 02:42
New changeset 940748853712 by R David Murray in branch 'default':
#15927: Fix cvs.reader parsing of escaped \r\n with quoting off.
http://hg.python.org/cpython/rev/940748853712
msg184724 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-20 02:44
Although this is clearly a bug fix, it also represents a behavior change that could cause a working program to fail.  I have therefore only applied it to 3.4, but I'm open to arguments that it should be backported.

Thanks for the patch, Michael.
History
Date User Action Args
2013-03-20 02:44:33 r.david.murray set status: open -> closed
assignee: lukasz.langa ->
versions: - Python 2.7, Python 3.2, Python 3.3
nosy: + r.david.murray
messages: + msg184724
resolution: fixed
stage: resolved
2013-03-20 02:42:03 python-dev set nosy: + python-dev
messages: + msg184723
2013-03-20 01:43:57 mjohnson set files: + csv.patch
nosy: + mjohnson
messages: + msg184720
keywords: + patch
2013-03-18 02:49:09 kalaxy set messages: + msg184415
2012-11-18 18:33:13 serhiy.storchaka set messages: + msg175900
2012-10-09 22:36:47 serhiy.storchaka set nosy: + serhiy.storchaka
2012-10-09 20:45:31 lukasz.langa set assignee: lukasz.langa
nosy: + lukasz.langa
versions: + Python 3.2, Python 3.3
2012-10-09 20:41:52 maciej.szulik set files: + test_csv_py3k.py
versions: + Python 3.4
nosy: + maciej.szulik
messages: + msg172521
2012-09-13 03:32:08 chris.jerdonek set components: + Library (Lib), - None
title: cvs.reader does not support escaped newline when quoting=cvs.QUOTE_NONE -> csv.reader() does not support escaped newline when quoting=csv.QUOTE_NONE
2012-09-12 04:49:29 kalaxy create