Issue 5455: csv module no longer works as expected when file opened in binary mode

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49705

classification

Title:	csv module no longer works as expected when file opened in binary mode
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.0, Python 3.1

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	csv fails when file is opened in binary mode View: 4847
Assigned To:		Nosy List:	georg.brandl, jdwhitley, sjmachin, skip.montanaro
Priority:	normal	Keywords:

Created on 2009-03-09 02:48 by skip.montanaro, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg83350 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-03-09 02:48
I just discovered that the csv module's reader class in 3.x doesn't work as expected when used as documented. The requirement has always been that the CSV file is opened in binary mode so that embedded newlines in fields aren't screwed up. Alas, in 3.x files opened in binary mode return their contents as bytes, not unicode strings, and bytes are apparently not allowed when the reader is advanced with the next() builtin:

% python3.1
Python 3.1a0 (py3k:70084M, Feb 28 2009, 20:46:48)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> next(csv.reader(open("f.csv", "rb")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
>>> next(csv.reader(open("f.csv", "r")))
['col1', 'col2', 'color']

At the very least the documentation for the csv.reader class is no longer correct. However, I can't see how you can open a CSV file in text mode and not screw up embedded newlines. I think binary mode has to stay and some other way of dealing with bytes has to be found.
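For readers arriving from later Python 3 releases: the csv documentation there recommends opening files in text mode with newline='' so that line endings reach the csv module untranslated. The following is a minimal sketch of that approach, not something proposed in this thread; the helper name, the sample rows, and the temporary file are made up for illustration.

```python
import csv
import os
import tempfile


def roundtrip_embedded_newline(rows):
    """Write rows to a temporary CSV file and read them back in text mode.

    Hypothetical helper: newline="" disables newline translation, so a CRLF
    embedded in a quoted field is passed through to the csv module unchanged.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        with open(path, "r", newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# Illustrative data: the second row has a field containing an embedded CRLF.
sample = [["col1", "col2", "color"], ["a", "line one\r\nline two", "red"]]
result = roundtrip_embedded_newline(sample)
```

If the embedded newline survives, result compares equal to sample.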
msg83351 - (view)	Author: Jervis Whitley (jdwhitley)	Date: 2009-03-09 03:01
In _csv.c, the check is done here:

lineobj = PyIter_Next(self->input_iter);
if (lineobj == NULL) {
    /* End of input OR exception */
    if (!PyErr_Occurred() && self->field_len != 0)
        PyErr_Format(error_obj, "newline inside string");
    return NULL;
}
if (!PyUnicode_Check(lineobj)) {
    PyErr_Format(error_obj,
                 "iterator should return strings, "
                 "not %.200s "
                 "(did you open the file in text mode?)",
                 lineobj->ob_type->tp_name);
    Py_DECREF(lineobj);
    return NULL;
}

So the returned lineobj is a bytes type and then the PyUnicode_Check throws the error.
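The same type check can be observed from Python without a file, because csv.reader accepts any iterable of lines. This is a small sketch: try_read is a hypothetical helper, and the exact wording of the error message may differ between Python versions.

```python
import csv


def try_read(lines):
    """Return the first parsed row, or the csv.Error raised while reading.

    Hypothetical helper used only to show which line types the reader accepts.
    """
    try:
        return next(csv.reader(lines))
    except csv.Error as exc:
        return exc


# A bytes line fails the reader's string check; a str line is parsed.
bytes_result = try_read([b"col1,col2,color\r\n"])
text_result = try_read(["col1,col2,color\r\n"])
```

Here bytes_result holds the csv.Error instance and text_result holds the parsed row.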
msg83353 - (view)	Author: Jervis Whitley (jdwhitley)	Date: 2009-03-09 03:09
Hi Skip,

Currently, once we are sure the lineobj is a unicode obj, we then get its internal buffer using:

line = PyUnicode_AsUnicode(lineobj);

for the purpose of iterating through the line. Is there an opportunity to use:

line = PyBytes_AsString(lineobj);

(or a similar approach if I have quoted an incorrect function) for the case that we have a bytes object (not Unicode)?
msg83355 - (view)	Author: John Machin (sjmachin)	Date: 2009-03-09 04:43
This is in effect a duplicate of issue 4847. Summary: The docs are CORRECT. The 3.X implementation is WRONG. The 2.X implementation is CORRECT. See examples in my comment on issue 4847.
msg83376 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-03-09 13:15
Jervis> So the returned lineobj is a bytes type and then the
Jervis> PyUnicode_Check throws the error.

Right, but given that fact how do you get a Unicode string out of the bytes without an encoding? You can't open a file in binary mode and give the encoding arg.
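One way to attach an encoding to a stream that is already open in binary mode is to wrap it in io.TextIOWrapper. The sketch below is an illustration rather than a fix discussed in this thread; read_csv_from_binary, the default UTF-8 encoding, and the in-memory sample data are all assumptions.

```python
import csv
import io


def read_csv_from_binary(binary_stream, encoding="utf-8"):
    """Decode a binary stream and parse it as CSV.

    Hypothetical helper: the caller must supply the encoding, since the bytes
    themselves do not carry one. newline="" keeps embedded line endings intact.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# Illustrative input: a quoted field containing an embedded CRLF.
raw = io.BytesIO(b'col1,col2,color\r\n1,"two\r\nlines",red\r\n')
rows = read_csv_from_binary(raw)
```

The same wrapper can be applied to a file opened with mode "rb", assuming the encoding is known in advance.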
msg83380 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2009-03-09 14:04
John> The docs are CORRECT.
John> The 3.X implementation is WRONG.
John> The 2.X implementation is CORRECT.

I agree. I posted a note to python-dev referencing both tickets. Hopefully one of the bytes/unicode experts there can shed some light on a possible solution.

Skip
msg85524 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2009-04-05 16:28
Setting #4847 as superseder.

History
Date	User	Action	Args
2022-04-11 14:56:46	admin	set	github: 49705
2009-04-05 16:28:33	georg.brandl	set	status: open -> closed nosy: + georg.brandl messages: + msg85524 superseder: csv fails when file is opened in binary mode resolution: duplicate
2009-03-09 14:04:16	skip.montanaro	set	messages: + msg83380
2009-03-09 13:15:43	skip.montanaro	set	messages: + msg83376
2009-03-09 04:43:51	sjmachin	set	nosy: + sjmachin messages: + msg83355
2009-03-09 03:09:33	jdwhitley	set	messages: + msg83353
2009-03-09 03:01:03	jdwhitley	set	messages: + msg83351
2009-03-09 02:56:12	jdwhitley	set	nosy: + jdwhitley
2009-03-09 02:48:38	skip.montanaro	create