classification
Title: Extraneous newlines with csv.writer on Windows
Type: behavior
Stage: resolved
Components: Documentation
Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: baloan, eric.araujo, python-dev, r.david.murray, sjmachin, skip.montanaro, zacktu
Priority: normal
Keywords:

Created on 2009-10-24 22:07 by zacktu, last changed 2011-03-20 15:40 by python-dev. This issue is now closed.

Messages (26)
msg94437 - (view) Author: Bob Cannon (zacktu) Date: 2009-10-24 22:08
I used csv.writer to open a file for writing with comma as separator and
dialect='excel'.  I used writerow to write each row into the file.  When
I execute under linux, each line is terminated by '\r\n'.  When I
execute under windows, each line is terminated by '\r\r\n'.  Thus, under
MS Windows, when I read the csv file, there is a blank line between each
significant line.  I have dropped csv.writer and now build each line
manually and terminate it with '\n'.  When the line is written in
windows, it is terminated by '\r\n'.  That's what should happen.  

As I see it, writerow with dialect='excel' should only terminate a line
with '\n'.  Windows will automatically place a '\r' in front of the '\n'.
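
A minimal Python 3 sketch of the doubling described above, runnable on any platform. It assumes that wrapping an in-memory buffer in `io.TextIOWrapper` with `newline='\r\n'` is a fair stand-in for a Windows text-mode file; the original report most likely ran under Python 2, where the C runtime did the translation. The names `raw` and `text` are illustrative only.

```python
import csv
import io

# Simulate a Windows text-mode file: with newline='\r\n', the text layer
# translates every '\n' that is written into '\r\n'.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")

# The default 'excel' dialect already terminates each row with '\r\n'.
writer = csv.writer(text)
writer.writerow([1, 2, 3])
writer.writerow(["a", "b", "c"])
text.flush()

data = raw.getvalue()
print(data)  # b'1,2,3\r\r\na,b,c\r\r\n'
```

The writer emits `\r\n` itself, and the text layer then expands the `\n` a second time, which is where the extra `\r` comes from.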
msg94438 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-10-24 22:52
Your output file should be opened in binary mode.  Sounds like you
opened it in text mode.
msg94441 - (view) Author: Bob Cannon (zacktu) Date: 2009-10-24 23:10
Probably so.  I'm sorry to report this as a bug if it's not.  I asked
about this on a Python group on IRC and got no suggestions.  Thanks for
taking a look.
msg111694 - (view) Author: Andreas Balogh (baloan) Date: 2010-07-27 11:12
I encountered the same problem. It is not obvious that opening the file in binary mode solves it. I suggest adding a hint to the documentation.
msg111773 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-07-28 08:07
Can you provide me with a concrete example which fails for you?
I don't have ready access to a Windows machine with Python on
it, but I should be able to arrange something at work.  However,
before spending admin time to install Python, I would like to
look at code which fails for you first.
msg111787 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-28 11:03
Bob, can you give us some code to reproduce the problem, in the form of a unit test or even just a regular function? It will help confirm the bug and fix it.
msg111792 - (view) Author: Bob Cannon (zacktu) Date: 2010-07-28 11:46
Eric,

This issue was resolved for me by Skip Montanaro's response less than an 
hour after I posted it.  I didn't understand why a text file had to be 
binary, but I no longer had a problem with extraneous newlines.  In looking back 
at my message 94441, I think that it was ambiguous and that I should 
have made it clear that I no longer had a problem.  Perhaps in my 
ignorance as a newbie I didn't close the issue properly. 

I don't know that I can reproduce the problem any more.  I think that I 
was writing snippets of code to try to isolate the problem and when I 
used Skip's solution I changed the program and deleted the test code.

Please let me know what I can do to help you now.

Bob

msg111793 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-28 12:14
If the documentation is not clear enough about requiring binary, it is a doc bug.

(P.S. Please strip unneeded quotes. Thanks)
msg111832 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-07-28 17:19
I got access to Python 2.6.5 on Windows and ran this simple
example:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.

    ****************************************************************
    Personal firewall software may warn about the connection IDLE
    makes to its subprocess using this computer's internal loopback
    interface.  This connection is not visible on any external
    interface and no data is sent to or received from the Internet.
    ****************************************************************
    
IDLE 2.6.5      
>>> f = open("H:sample.csv", "wb")
>>> import csv
>>> writer = csv.writer(f)
>>> writer.writerow([1,2,3])
>>> writer.writerow(['a', 'b', 'c'])
>>> del writer
>>> f.close()
>>> 

I then looked at the CSV file which it generated.
Looked fine to me.  Each of the two rows was terminated
by a single CRLF pair.

Then I repeated the "test", opening the file in text
mode:

>>> f = open("H:sample2.csv", "w")
>>> writer = csv.writer(f)
>>> writer.writerow([1,2,3])
>>> writer.writerow(['a', 'b', 'c'])
>>> del writer
>>> f.close()
>>> 

That output does indeed terminate each line with
CRCRLF and when viewed in a spreadsheet program
such as OpenOffice Calc (probably Excel as well),
displays a blank line between the 123 row and the
abc row.

I've removed the "unit test needed" attribute from the
ticket as there is a test_writerows test case in the
Python test suite.  Also closing again and marking
invalid.  If you still believe there is actually a
problem, feel free to reopen this issue, but also
please send me (skip@pobox.com) a short example and
the erroneous output it produces for you (attach your
two files - don't just embed them in your mail msg).
msg111910 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-07-29 11:10
> If the documentation is not clear enough about requiring binary, it is
> a doc bug.

The documentation for both csv.reader and csv.writer states (this is from the
Python 2.7 version):

    If *csvfile* is a file object, it must be opened with the 'b' flag on
    platforms where that makes a difference.

I suppose we could be explicit and mention Windows here, but the wording is
quite clear.  There is really no harm in always opening the file in binary
mode, and I do that myself even though I only program on Unix or Mac
platforms where it's safe to open the file in text mode.

This all changed in Python 3.  There, the choice of line ending is up to the
programmer, so file objects for use by the csv module are opened with
newline='' and when writing CSV data the writer object takes complete
control of proper line termination according to the programmer's stated
choice of lineterminator.
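
A small sketch of that Python 3 behaviour, assuming an in-memory `io.TextIOWrapper` opened with `newline=''` behaves like a real file opened the same way. The helper name `written_bytes` is made up for illustration.

```python
import csv
import io


def written_bytes(rows, lineterminator="\r\n"):
    """Return the exact bytes csv.writer produces when newline='' is used."""
    raw = io.BytesIO()
    # newline='' switches off newline translation in the text layer.
    text = io.TextIOWrapper(raw, encoding="ascii", newline="")
    csv.writer(text, lineterminator=lineterminator).writerows(rows)
    text.flush()
    return raw.getvalue()


rows = [[1, 2, 3], ["a", "b", "c"]]
crlf = written_bytes(rows)                      # default terminator
lf = written_bytes(rows, lineterminator="\n")   # programmer's own choice
print(crlf)  # b'1,2,3\r\na,b,c\r\n'
print(lf)    # b'1,2,3\na,b,c\n'
```

With translation disabled, whatever `lineterminator` is chosen reaches the file unchanged.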

Skip
msg124580 - (view) Author: John Machin (sjmachin) Date: 2010-12-24 00:52
Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects at the file. The module's idea of "taking control" in the default case appears to be to write \r\n which is then processed by the Windows runtime and becomes \r\r\n.

Python 3.1.3 (r313:86834, Nov 27 2010, 18:30:53) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open('terminator31.csv', 'w')
>>> row = ['foo', None, 3.14159]
>>> writer = csv.writer(f)
>>> writer.writerow(row)
14
>>> writer.writerow(row)
14
>>> f.close()
>>> open('terminator31.csv', 'rb').read()
b'foo,,3.14159\r\r\nfoo,,3.14159\r\r\n'
>>>

And it's not just a row terminator problem; newlines embedded in fields are likewise expanded to \r\n by the Windows runtime.
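
The embedded-newline case can be sketched the same way on any platform. This assumes `newline='\r\n'` on an `io.TextIOWrapper` approximates the Windows text-mode default; the helper name `simulate` is hypothetical.

```python
import csv
import io


def simulate(newline):
    """Write one row containing an embedded '\n' and return the raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    csv.writer(text).writerow(["line one\nline two", "x"])
    text.flush()
    return raw.getvalue()


translated = simulate("\r\n")  # stand-in for Windows text mode
untouched = simulate("")       # newline='' leaves the data alone
print(translated)  # b'"line one\r\nline two",x\r\r\n'
print(untouched)   # b'"line one\nline two",x\r\n'
```

In the translated case both the field content and the row terminator are altered, so the file no longer holds the data that was passed to `writerow`.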
msg124598 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-12-24 16:38
John,

The API for the open() builtin function has changed.  You should open
the output file with newline="" instead of using the default.  Take a
look at the documentation for open() and csv.reader:

    http://docs.python.org/py3k/library/functions.html?highlight=open#open
    http://docs.python.org/py3k/library/csv.html?highlight=csv.reader#csv.reader

Note the form of the open() call in the csv.reader example.  This one
snuck by me as well.  Python 3 underwent a lot of change in the I/O
subsystem.  This was one of them.  If changing the form of the open()
call doesn't fix the problem, let me know.

Skip
msg124678 - (view) Author: John Machin (sjmachin) Date: 2010-12-26 20:52
Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix.

Other problems with the examples: (1) They encourage a bad habit (open inside the call to reader/writer); good practice is to retain the reference to the file handle (preferably with a "with" statement) so that it can be closed properly. (2) delimiter=' ' is very unrealistic.

The documentation for both 2.x and 3.x should be much more explicit about what is needed in open() for csv to work properly and portably:

2.x read: use mode='rb' -- otherwise fail on Windows
2.x write: use mode='wb' -- otherwise fail on Windows
3.x read: use newline='' -- otherwise fail unconditionally(?)
3.x write: use newline='' -- otherwise fail on Windows

The 2.7 documentation says """If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference""" ... in my experience, people are left asking "what platforms? what difference?"; Windows should be mentioned explicitly.
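
For the 3.x read case in the list above, a hedged sketch of what goes wrong without `newline=''`: the sample bytes and the helper `read_rows` are invented for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# One header row, then a row whose quoted field contains an embedded CRLF.
raw_bytes = b'id,note\r\n1,"first\r\nsecond"\r\n'


def read_rows(newline):
    """Parse raw_bytes through a text layer opened with the given newline."""
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="ascii",
                            newline=newline)
    return list(csv.reader(text))


with_translation = read_rows(None)   # default: '\r\n' is turned into '\n'
without_translation = read_rows("")  # line endings reach csv unchanged
print(with_translation[1])     # ['1', 'first\nsecond']
print(without_translation[1])  # ['1', 'first\r\nsecond']
```

The default mode does not raise an error here, but it silently changes the content of the quoted field on every platform, which may be what the "unconditionally(?)" above refers to.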
msg124682 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-26 22:52
OK, I'm reopening this as a doc issue, since currently the Python3 writer docs do not mention newline='', and it is indeed required on Windows.  John, would you care to suggest a doc patch?

I agree with Skip that "where it makes a difference" is more precise than specifically mentioning Windows, even if less useful in this context.  That is how the 'b' mode is documented in the open documentation.  To fix the problem with the CSV docs, the recommendation to use 'b' can simply be made unconditional, as it is for newline='' in python3.
msg126593 - (view) Author: John Machin (sjmachin) Date: 2011-01-20 06:43
"docpatch" for 3.x csv docs:

In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method."

In the closely-following example, change the open call from "open('eggs.csv', 'w')" to "open('eggs.csv', 'w', newline='')".

In section 13.1.5 Examples, there are 2 reader cases and 1 writer case that likewise need inserting ", newline=''" in the open call.
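
A sketch of the pattern this patch asks the examples to show, using a `with` statement and `newline=''` for both writing and reading. The file name follows the docs example, while the temporary directory and the sample rows are assumptions added so the snippet cleans up after itself.

```python
import csv
import os
import tempfile

rows = [["spam", "eggs"], ["foo", "", "3.14159"]]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "eggs.csv")

    # Writing: newline='' lets csv.writer control the line endings.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Inspect the raw bytes to confirm each row ends in a single CRLF.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading: newline='' hands the line endings to csv.reader untouched.
    with open(path, newline="") as f:
        round_trip = list(csv.reader(f))

print(raw)         # b'spam,eggs\r\nfoo,,3.14159\r\n'
print(round_trip)  # [['spam', 'eggs'], ['foo', '', '3.14159']]
```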
msg131400 - (view) Author: John Machin (sjmachin) Date: 2011-03-19 07:55
Can somebody please review my "doc patch" submitted 2 months ago?
msg131417 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2011-03-19 13:59
    John> Can somebody please review my "doc patch" submitted 2 months ago?

My apologies.  I have it in my sandbox, but a combination of the switch to
Mercurial and lack of round tuits has conspired to keep me from checking it
in.

Skip
msg131418 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2011-03-19 14:06
Actually, I was thinking of another doc patch for the csv module.
Your changes (or something very like them) made it into the 3.2
release, as you can see here:

    http://docs.python.org/py3k/library/csv.html

S
msg131443 - (view) Author: John Machin (sjmachin) Date: 2011-03-19 21:27
Skip, The changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The "writer" paragraph does NOT mention that newline='' is required when writing. The "writer" examples do NOT include newline=''. The examples have NOT been enhanced by using a "with" statement and not using space as an example delimiter.

PLEASE RE-OPEN THIS ISSUE.
msg131468 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-03-20 02:36
New changeset ab27f16f707a by R David Murray in branch 'default':
#7198: add newlines='' to csv.writer docs.
http://hg.python.org/cpython/rev/ab27f16f707a

New changeset 959f666470cc by R David Murray in branch 'default':
Merge #7198 doc fix.
http://hg.python.org/cpython/rev/959f666470cc

New changeset 9d1b1a95bc8f by R David Murray in branch 'default':
Merge #7198 doc fix.
http://hg.python.org/cpython/rev/9d1b1a95bc8f
msg131469 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-20 02:38
Fixed now.  Thanks, and sorry for the delay, and the confusion.
msg131475 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-20 04:02
Gah, I messed up the push.  Now I have to backport the doc fix :(
msg131495 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-03-20 14:30
New changeset 9201455f950b by R David Murray in branch '3.1':
#7198: really add newline='' to csv.writer docs.
http://hg.python.org/cpython/rev/9201455f950b

New changeset fa0563f3b7f7 by R David Murray in branch '3.2':
Really merge #7198
http://hg.python.org/cpython/rev/fa0563f3b7f7

New changeset ed0d1e07ce79 by R David Murray in branch 'default':
Dummy merge #7198
http://hg.python.org/cpython/rev/ed0d1e07ce79
msg131496 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-20 14:31
OK, now it's really done (I hope!).
msg131498 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2011-03-20 14:55
    John> Skip, The changes that I suggested have NOT been made. Please
    John> re-read the doc page you pointed to. The "writer" paragraph does
    John> NOT mention that newline='' is required when writing. The "writer"
    John> examples do NOT include newline=''. The examples have NOT been
    John> enhanced by using a "with" statement and not using space as an
    John> example delimiter.

I copied the statement about using newline= from the reader() doc to the
writer() doc.  All the examples I see (I'm looking at the cpython repo -
that is, what will be 3.3) use the with statement and open files using
newline=''.  I don't think more changes are necessary.  I will consult with
other Python developers about merging these changes to other active
branches.  I simply don't understand the new Mercurial workflow well enough
to do it properly.

Skip
msg131501 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-03-20 15:40
New changeset 88876a264ebe by R David Murray in branch '3.1':
Markup fixes for #7198 patch.
http://hg.python.org/cpython/rev/88876a264ebe

New changeset d0d1235cb66e by R David Murray in branch '3.2':
Merge markup fixes for #7198 patch.
http://hg.python.org/cpython/rev/d0d1235cb66e

New changeset 2a8580f4897c by R David Murray in branch 'default':
Markup fixes for #7198 patch.
http://hg.python.org/cpython/rev/2a8580f4897c
History
Date User Action Args
2011-03-20 15:40:06 python-dev set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131501
2011-03-20 14:55:27 skip.montanaro set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131498
2011-03-20 14:31:23 r.david.murray set status: open -> closed
    nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131496
2011-03-20 14:30:19 python-dev set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131495
2011-03-20 04:02:40 r.david.murray set status: closed -> open
    assignee: skip.montanaro -> r.david.murray
    nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
2011-03-20 04:02:18 r.david.murray set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131475
2011-03-20 02:38:59 r.david.murray set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu, python-dev
    messages: + msg131469
    resolution: accepted -> fixed
    stage: needs patch -> resolved
2011-03-20 02:36:47 python-dev set nosy: + python-dev
    messages: + msg131468
2011-03-19 21:27:04 sjmachin set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu
    messages: + msg131443
2011-03-19 14:06:04 skip.montanaro set status: open -> closed
    messages: + msg131418
    resolution: accepted
    nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu
2011-03-19 13:59:18 skip.montanaro set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu
    messages: + msg131417
2011-03-19 07:55:31 sjmachin set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu
    messages: + msg131400
2011-01-20 06:43:34 sjmachin set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, r.david.murray, zacktu
    messages: + msg126593
2010-12-26 22:52:14 r.david.murray set status: closed -> open
    components: + Documentation
    versions: - Python 3.3
    nosy: + r.david.murray
    messages: + msg124682
    resolution: not a bug -> (no value)
    stage: needs patch
2010-12-26 20:52:49 sjmachin set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, zacktu
    messages: + msg124678
    versions: + Python 2.7, Python 3.2, Python 3.3
2010-12-24 16:38:56 skip.montanaro set nosy: skip.montanaro, sjmachin, baloan, eric.araujo, zacktu
    messages: + msg124598
2010-12-24 00:52:41 sjmachin set nosy: + sjmachin
    messages: + msg124580
    versions: + Python 3.1, - Python 2.6
2010-07-29 11:10:30 skip.montanaro set messages: + msg111910
2010-07-28 17:19:36 skip.montanaro set status: open -> closed
    resolution: not a bug
    messages: + msg111832
    stage: test needed -> (no value)
2010-07-28 12:14:33 eric.araujo set messages: + msg111793
2010-07-28 11:46:53 zacktu set messages: + msg111792
2010-07-28 11:03:56 eric.araujo set nosy: + eric.araujo
    title: csv.writer -> Extraneous newlines with csv.writer on Windows
    messages: + msg111787
    stage: test needed
2010-07-28 08:07:06 skip.montanaro set status: closed -> open
    assignee: skip.montanaro
    messages: + msg111773
2010-07-27 11:12:12 baloan set nosy: + baloan
    messages: + msg111694
2009-10-24 23:11:18 zacktu set status: open -> closed
2009-10-24 23:10:41 zacktu set messages: + msg94441
2009-10-24 22:52:30 skip.montanaro set nosy: + skip.montanaro
    messages: + msg94438
2009-10-24 22:08:39 zacktu set messages: + msg94437
2009-10-24 22:07:03 zacktu create