Message 83350 - Python tracker

Author	skip.montanaro
Recipients	skip.montanaro
Date	2009-03-09.02:48:36
Message-id	<1236566919.88.0.376307932068.issue5455@psf.upfronthosting.co.za>
In-reply-to

Content
I just discovered that the csv module's reader class in 3.x doesn't work
as expected when used as documented.  The requirement has always been
that the CSV file is opened in binary mode so that embedded newlines in
fields are not screwed up.  Alas, in 3.x files opened in binary mode return
their contents as bytes, not unicode strings, and bytes are apparently not
accepted by the reader when it is advanced with the next() builtin:

% python3.1
Python 3.1a0 (py3k:70084M, Feb 28 2009, 20:46:48) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv    
>>> next(csv.reader(open("f.csv", "rb")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
>>> next(csv.reader(open("f.csv", "r")))
['col1', 'col2', 'color']

At the very least the documentation for the csv.reader class is no
longer correct.  However, I can't see how you can open a CSV file in
text mode and not screw up embedded newlines.  I think binary mode
*has* to stay and some other way of dealing with bytes has to be found.
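
For illustration, here is a minimal sketch of one way to get strings out of
a binary stream without touching embedded newlines: decode the stream in
text mode and pass newline="" so that no line-ending translation takes
place.  The sample bytes and the variable names are made up for this
example, and the utf-8 encoding is an assumption; this is not necessarily
the fix the module should adopt.

```python
import csv
import io

# Hypothetical CSV payload as raw bytes; the quoted field holds an embedded "\r\n".
raw = b'col1,col2,color\r\na,"line one\r\nline two",red\r\n'

# Handing the binary stream straight to csv.reader fails as in the session above.
try:
    next(csv.reader(io.BytesIO(raw)))
    bytes_error = None
except csv.Error as exc:
    bytes_error = exc

# Decode the bytes (utf-8 is assumed here), but pass newline="" so the
# text layer does not translate line endings before the reader sees them.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text))

print(bytes_error)
print(rows)
```

For a file on disk, open("f.csv", "r", newline="") should behave the same
way as the wrapper used here.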

History
Date	User	Action	Args
2009-03-09 02:48:40	skip.montanaro	set	recipients: + skip.montanaro
2009-03-09 02:48:39	skip.montanaro	set	messageid: <1236566919.88.0.376307932068.issue5455@psf.upfronthosting.co.za>
2009-03-09 02:48:38	skip.montanaro	link	issue5455 messages
2009-03-09 02:48:36	skip.montanaro	create