classification
Title: use universal newline mode in csv module examples
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, georg.brandl, jesstess, pitrou, r.david.murray, sfinnie, terry.reedy
Priority: normal Keywords:

Created on 2010-04-13 21:11 by sfinnie, last changed 2019-03-15 23:54 by BreamoreBoy.

Files
File name Uploaded Description
test_csv.py jesstess, 2014-04-20 01:42
test.csv jesstess, 2014-04-20 01:42
Messages (10)
msg103086 - (view) Author: (sfinnie) Date: 2010-04-13 21:11
Running the examples in the csv module docs (http://docs.python.org/library/csv.html) causes problems reading files on a Mac.  This is highlighted in issue 1072404 (http://bugs.python.org/issue1072404).

Commentary on that bug indicates it will not be fixed, meaning many people using a Mac will get an error if they use the sample code in the docs.

A simpler solution would be to use universal newline mode in the doc examples.  This is actually mentioned in commentary on the bug, and appears to work.

Proposal
--------
In all example code blocks, use mode 'rU' when opening the file.  The first code block, for example, would become:

spamReader = csv.reader(open('eggs.csv', 'rU'), delimiter=' ', quotechar='|')

That should solve the problem on Mac without affecting compatibility on other operating systems.  Note: I haven't been able to verify this on other platforms.
msg113190 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-07 18:51
In the current 2.7 docs, files are opened with 'rb' or 'wb'.
In msg106210 of #1072404, RDM says "The doc has been fixed;".
I am not sure if this refers to a change in the open() call or just the removal of the reference to the non-working delimiter option.
David?
Any opinion on this request?
msg113221 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-08 01:00
"The doc has been fixed" refers to the fact that the lineterminator dialect option is now documented as applying only to writing, not to reading.

The docs could certainly be improved to discuss using universal newline mode.  I'm not clear on whether or not there are disadvantages to using universal newline mode with the py2 version of the csv module, but I wouldn't be surprised if there are.  Perhaps Skip can comment on whether changing the examples to use rU would be a good idea or not.

Note that the situation for the py3k csv module is different, and it would be helpful if someone could test this issue there.  Though in truth we have no resources to support non-OSX macs any longer, so if it doesn't work it may be just tough luck.
msg216887 - (view) Author: Jessica McKellar (jesstess) * (Python triager) Date: 2014-04-20 01:42
I ran some experiments to see what the state of the world is. I generated a test.csv by exporting a CSV file from Numbers on OSX. This generated a file with Windows-style \r\n-terminated lines. The attached test_csv.py tries to open this CSV file in binary and universal newlines modes. Here's what happens on various platforms:

Python 3:
* Linux: both binary and universal work
* OSX: binary errors out, universal works
* Windows: binary errors out, universal works

In both cases, the error was:

$ python3 test_csv.py
Traceback (most recent call last):
  File "test_csv.py", line 5, in <module>
    for row in spamreader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Python 2:
* Linux: both binary and universal work
* OSX: both binary and universal work
* Windows: wasn't readily able to test

If I manually create a CSV file using TextEdit in plaintext mode on OSX, that produces a file with Mac-style \r-terminated lines. test_csv.py has the same results on this file on OSX (errors out in binary mode in Python 3).
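
The binary-mode failure under Python 3 can be reproduced with a minimal sketch like the one below. It does not use the attached test.csv or test_csv.py; the temporary file and its two \r-terminated rows are assumptions made up for illustration, and text mode with newline='' stands in for the universal-newlines case.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the attached test.csv: two rows with
# Mac-style \r line endings (not the real attachment).
data = b"spam eggs\rham toast\r"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Binary mode: iterating the file yields bytes, which the
    # Python 3 csv reader rejects with csv.Error.
    binary_error = None
    try:
        with open(path, "rb") as f:
            list(csv.reader(f, delimiter=" "))
    except csv.Error as exc:
        binary_error = exc

    # Text mode with newline='': lines are split on \r, \n or \r\n
    # and handed to the reader as str.
    with open(path, "r", newline="") as f:
        text_rows = list(csv.reader(f, delimiter=" "))
finally:
    os.remove(path)

print(type(binary_error).__name__)
print(text_rows)
```

Run under Python 3, the binary-mode read should fail with the same kind of error as in the traceback above, while the text-mode read returns both rows.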
msg216888 - (view) Author: Jessica McKellar (jesstess) * (Python triager) Date: 2014-04-20 02:07
All of the examples from https://docs.python.org/3/library/csv.html run without issue on OSX, though.

In summary, the Python 2 examples error out on OSX and switching them to use 'U' instead of 'b' would fix this. I don't think any action needs to be taken for Python 3.

My one remaining question is about binary files on Windows. The Python 2 csv docs say "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference." I don't readily have a Windows machine to play with this -- do "binary" CSV files exist, or can we eliminate the 'b' language entirely and just talk about 'U'?
msg216892 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-04-20 02:23
I think that it's complete nonsense to talk about binary csv files on Windows.  They are just plain text files that can be manipulated with any old editor or a spreadsheet.
msg216894 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-04-20 02:27
The magic of newline='' in python3 is that it *preserves* the line end characters, which is the same thing binary mode does on windows.  The place that matters, as I remember it, is when there is a newline embedded inside a quoted string.  I don't remember *why* that matters, though :(.  But it had something to do with how the csv module processes the data internally.
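
A small Python 3 sketch of the embedded-newline case, assuming a made-up two-line file (the field names and contents are hypothetical). It only shows the observable difference between newline='' and the default text mode; it does not explain the internal reason mentioned above.

```python
import csv
import os
import tempfile

# Hypothetical data: the second record has a \r\n inside a quoted field.
raw = b'id,comment\r\n1,"first line\r\nsecond line"\r\n'


def read_rows(path, **open_kwargs):
    """Read every row of the CSV file at `path` in text mode."""
    with open(path, "r", **open_kwargs) as f:
        return list(csv.reader(f))


with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline='' hands the line endings to the csv module untranslated.
    preserved = read_rows(path, newline="")
    # Default text mode translates \r\n to \n before csv sees the data.
    translated = read_rows(path)
finally:
    os.remove(path)

print(repr(preserved[1][1]))
print(repr(translated[1][1]))
```

With newline='' the quoted field keeps its original \r\n; with the default mode the same field comes back with a bare \n, so the data no longer round-trips exactly.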
msg216904 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-20 15:08
Note that 'U' is a no-op under Python 3, it's just there for compatibility reasons; i.e. 'rU' is the same as 'r'.

Also, from a quick glance, the CSV parser in _csv.c looks newline-agnostic.
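
One way to probe that observation from Python 3, as a rough sketch: feed the reader lines that were already split but end in different terminators. The sample lines are made up, and this says nothing about how a file object splits lines in the first place.

```python
import csv

# csv.reader accepts any iterable of strings, so each element here
# plays the role of one line with a different terminator.
lines = ["a,b\n", "c,d\r", "e,f\r\n"]

rows = list(csv.reader(lines))
print(rows)
```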

@sfinnie: can you explain which problems you encountered running the examples? Please also post the resulting exception tracebacks, if any.
msg216909 - (view) Author: Jessica McKellar (jesstess) * (Python triager) Date: 2014-04-20 16:22
I realized that I typo'd 2 instead of 3 in http://bugs.python.org/issue8387#msg216888 which makes that message confusing. Here's a restatement of my findings:

* All of the Python 3 csv examples work in Python 3 on all platforms.
* The Python 2 binary-mode csv examples work in Python 2.7 on all platforms.
* The Python 2 binary-mode csv examples error out on Windows and OSX when run under Python 3. We could do nothing to address this, or, if we determine that there's no negative impact to removing the 'b', update the examples to accommodate readers who are running Python 2 examples using Python 3 for whatever reason.

Which does bring me to the same question as @pitrou, which is what data and code cause an error for @sfinnie on Python 2. :)
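
For readers who want the same reading code to run under both Python 2 and Python 3, one possible sketch is to pick the open() mode by interpreter version. The helper names open_csv_for_reading and read_rows are hypothetical and not part of the csv module or the docs; the modes follow the 'b' flag from the Python 2 docs and the newline='' discussed above for Python 3.

```python
import csv
import sys


def open_csv_for_reading(path):
    """Hypothetical helper: open `path` the way each major version expects."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode, with line endings passed through untouched.
        return open(path, "r", newline="")
    # Python 2: binary mode, as in the 2.7 documentation examples.
    return open(path, "rb")


def read_rows(path, **fmtparams):
    """Hypothetical helper: return all rows of the CSV file as a list."""
    f = open_csv_for_reading(path)
    try:
        return list(csv.reader(f, **fmtparams))
    finally:
        f.close()
```

A call such as read_rows('eggs.csv', delimiter=' ', quotechar='|') would then mirror the first docs example on either version.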
msg221627 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-26 17:42
@sfinnie can we please have a response to the question first asked by Antoine and repeated by Jessica, thanks.
History
Date User Action Args
2019-03-15 23:54:44 BreamoreBoy set nosy: - BreamoreBoy
2014-06-26 17:42:43 BreamoreBoy set messages: + msg221627
2014-04-26 00:09:43 terry.reedy set versions: - Python 3.1, Python 3.2
2014-04-20 16:22:59 jesstess set messages: + msg216909
2014-04-20 15:08:57 pitrou set nosy: + pitrou; messages: + msg216904
2014-04-20 02:27:56 r.david.murray set messages: + msg216894
2014-04-20 02:23:14 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg216892
2014-04-20 02:07:22 jesstess set messages: + msg216888
2014-04-20 01:42:14 jesstess set files: + test.csv
2014-04-20 01:42:02 jesstess set files: + test_csv.py; nosy: + jesstess; messages: + msg216887
2011-03-19 19:11:45 skip.montanaro set nosy: - skip.montanaro
2010-08-08 01:00:09 r.david.murray set nosy: + skip.montanaro; messages: + msg113221
2010-08-07 18:51:30 terry.reedy set versions: - Python 2.6; nosy: + terry.reedy, r.david.murray, docs@python; messages: + msg113190; assignee: georg.brandl -> docs@python; stage: needs patch
2010-07-11 02:00:59 terry.reedy set versions: + Python 3.1, Python 3.2
2010-04-13 21:11:23 sfinnie create