classification
Title: csv input converts \r\n to \n but csv output does not when a field has internal line breaks
Type: behavior Stage: resolved
Components: IO, Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: skip.montanaro Nosy List: ajaksu2, fenner, gregory.p.smith, pitrou, skip.montanaro
Priority: normal Keywords:

Created on 2007-11-28 05:15 by fenner, last changed 2009-05-12 14:08 by ajaksu2. This issue is now closed.

Files
File name Uploaded Description
issue1511.py gregory.p.smith, 2007-11-28 05:33
issue1511.csv gregory.p.smith, 2007-11-28 05:36
issue1511_py3k.py ajaksu2, 2009-05-12 13:23
Messages (7)
msg57902 - Author: Bill Fenner (fenner) Date: 2007-11-28 05:15
When a field has internal line breaks, e.g.,

foo,"bar
baz
biff",boo

that is actually 3 lines, but one csv-file row.  csv.reader() converts 
this to ['foo', 'bar\nbaz\nbiff', 'boo'].  This is a reasonable 
behavior.
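A minimal Python 3 sketch of the reader behavior described above. Assumptions: io.StringIO stands in for a real file, and newline='' is the Python 3 way of letting the csv module see raw line endings (the report itself concerns Python 2.x).

```python
import csv
import io

# The row from the report: the quoted second field spans three physical lines.
data = 'foo,"bar\nbaz\nbiff",boo\n'

# newline='' leaves line endings untouched so the csv module itself
# decides which line breaks end a record and which belong to a field.
rows = list(csv.reader(io.StringIO(data, newline='')))
print(rows)  # [['foo', 'bar\nbaz\nbiff', 'boo']]
```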

Unfortunately, csv.writer() does not use the dialect's lineterminator 
setting for values with such internal line breaks. This means that the
resulting file will have a mix of line-termination styles:

foo,"bar\n
baz\n
biff",boo\r\n

If the reading csv implementation is strict about its line termination, 
these line breaks will not be read properly.
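A sketch of the writer side, again assuming Python 3 with io.StringIO in place of a file. csv.writer quotes the field but copies its embedded line breaks verbatim, and appends the dialect's lineterminator ('\r\n' for the default excel dialect) only at the end of the row, which is exactly the mix of terminators shown above.

```python
import csv
import io

# The row as the reader returned it, with bare \n inside the second field.
row = ['foo', 'bar\nbaz\nbiff', 'boo']

buf = io.StringIO(newline='')
writer = csv.writer(buf)  # default 'excel' dialect: lineterminator='\r\n'
writer.writerow(row)
out = buf.getvalue()

# Embedded \n is written as-is; only the row end gets \r\n.
print(repr(out))  # 'foo,"bar\nbaz\nbiff",boo\r\n'
```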
msg57903 - Author: Bill Fenner (fenner) Date: 2007-11-28 05:19
I realized that my description was not crystal clear: the file being
read has \r\n line terminators. In the notation I used for the output
above, the input file is

foo,"bar\r\n
baz\r\n
biff",boo\r\n
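Assuming Python 3 and in-memory data, a sketch of how this CRLF input behaves when the reader keeps the embedded \r\n inside the field: the writer then reproduces the input text exactly, with no mixed terminators.

```python
import csv
import io

# Input as clarified in this message: every physical line ends in \r\n.
data = 'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

# Parse without newline translation, so \r\n stays inside the field.
rows = list(csv.reader(io.StringIO(data, newline='')))
print(rows)  # [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]

# Write the rows back out with the default excel dialect.
buf = io.StringIO(newline='')
csv.writer(buf).writerows(rows)
print(buf.getvalue() == data)  # True
```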
msg57904 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-11-28 05:33
release25-maint and trunk (2.6) appear to do the correct thing when
testing on my Ubuntu Gutsy Linux x86 box. Test script and file attached.

The problem is reproducible in a release24-maint build compiled 2007-11-05.
msg57905 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-11-28 05:36
Attaching the test input file. Use od -x or similar to compare the
new.csv output with issue1511.csv to see whether the problem occurred.

It's 2.4, which may be old enough to be considered dead.
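As an alternative to od -x, a small Python helper can tally the line terminators in each file. line_endings is a hypothetical name, and the two sample byte strings below merely stand in for the raw contents of issue1511.csv and a faulty new.csv (in practice you would read each file with open(path, 'rb')).

```python
import re
from collections import Counter


def line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF and bare CR terminators in raw file bytes."""
    # \r\n is tried first, so a CRLF pair is not also counted as \r or \n.
    return Counter(m.group() for m in re.finditer(rb'\r\n|\r|\n', data))


good = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'  # consistent CRLF
bad = b'foo,"bar\nbaz\nbiff",boo\r\n'       # the mixed output from the report

print(line_endings(good))  # Counter({b'\r\n': 3})
print(line_endings(bad))   # Counter({b'\n': 2, b'\r\n': 1})
```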
msg87624 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-05-12 13:23
I get different behavior in py3k compared to trunk:

~/trunk-py$ ./python issue1511_py3k.py
[['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

~/trunk-py$ ../py3k/python issue1511_py3k.py
[['foo', 'bar\nbaz\nbiff', 'boo']]
'foo,"bar\nbaz\nbiff",boo\n'
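A sketch reproducing the py3k difference with Python 3 temporary files (the directory and file name are arbitrary). In default text mode, universal newlines translate \r\n to \n before csv.reader ever sees the data; with newline='' the line endings pass through untouched.

```python
import csv
import os
import tempfile

data = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'issue1511.csv')
    with open(path, 'wb') as f:
        f.write(data)

    # Default text mode (newline=None): every \r\n becomes \n on read.
    with open(path) as f:
        translated = list(csv.reader(f))

    # newline='': the csv module receives the original \r\n pairs.
    with open(path, newline='') as f:
        preserved = list(csv.reader(f))

print(translated)  # [['foo', 'bar\nbaz\nbiff', 'boo']]
print(preserved)   # [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
```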
msg87631 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-05-12 13:59
Daniel> Daniel Diniz <ajaksu@gmail.com> added the comment:

    Daniel> I get different behavior in py3k compared to trunk:

        Daniel> ~/trunk-py$ ./python issue1511_py3k.py
        Daniel> [['foo', 'bar\r\nbaz\r\nbiff', 'boo']]
        Daniel> 'foo,"bar\r\nbaz\r\nbiff",boo\r\n'

        Daniel> ~/trunk-py$ ../py3k/python issue1511_py3k.py
        Daniel> [['foo', 'bar\nbaz\nbiff', 'boo']]
        Daniel> 'foo,"bar\nbaz\nbiff",boo\n'

Try adding newline='' to your open calls.  I believe that will preserve the
CRLF pairs.

Skip
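The advice applies to the output file as well as the input. In this Python 3 sketch, round_trip is a hypothetical helper, and newline='\r\n' is used only to imitate the translation that default text mode performs on Windows. Opening the output with newline='' yields a byte-identical copy, while a translating text stream turns each \r\n written by csv.writer into \r\r\n.

```python
import csv
import os
import tempfile

data = b'foo,"bar\r\nbaz\r\nbiff",boo\r\n'


def round_trip(out_newline):
    """Copy data through csv.reader/csv.writer and return the output bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'issue1511.csv')
        dst = os.path.join(tmp, 'new.csv')
        with open(src, 'wb') as f:
            f.write(data)
        # Input side always uses newline='' so embedded \r\n survives parsing.
        with open(src, newline='') as fin, \
                open(dst, 'w', newline=out_newline) as fout:
            csv.writer(fout).writerows(csv.reader(fin))
        with open(dst, 'rb') as f:
            return f.read()


print(round_trip('') == data)  # True
# newline='\r\n' translates every '\n' written into '\r\n'.
print(round_trip('\r\n'))  # b'foo,"bar\r\r\nbaz\r\r\nbiff",boo\r\r\n'
```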
msg87632 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-05-12 14:08
You're right, sorry about the noise. Closing as out of date.
History
Date                 User              Action  Args
2009-05-12 14:08:00  ajaksu2           set     status: open -> closed
                                               resolution: out of date
                                               messages: + msg87632
                                               stage: test needed -> resolved
2009-05-12 13:59:33  skip.montanaro    set     messages: + msg87631
2009-05-12 13:23:17  ajaksu2           set     files: + issue1511_py3k.py
                                               components: + IO
                                               versions: + Python 2.6, - Python 2.4
                                               nosy: + ajaksu2, pitrou
                                               messages: + msg87624
                                               stage: test needed
2008-01-20 19:54:41  christian.heimes  set     priority: normal
2007-11-28 12:45:53  skip.montanaro    set     assignee: skip.montanaro
                                               nosy: + skip.montanaro
2007-11-28 05:36:15  gregory.p.smith   set     files: + issue1511.csv
                                               messages: + msg57905
2007-11-28 05:33:03  gregory.p.smith   set     files: + issue1511.py
                                               nosy: + gregory.p.smith
                                               messages: + msg57904
2007-11-28 05:19:39  fenner            set     messages: + msg57903
2007-11-28 05:18:53  fenner            set     components: + Library (Lib)
2007-11-28 05:15:17  fenner            create