Message 82661 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	sjmachin
Recipients	georg.brandl, jaywalker, pitrou, sjmachin, vstinner
Date	2009-02-24.07:25:40
SpamBayes Score	1.933348e-11
Marked as misclassified	No
Message-id	<1235460343.9.0.647749679832.issue4847@psf.upfronthosting.co.za>
In-reply-to

Content

Sorry, folks, we've got an understanding problem here. CSV files are
typically NOT created by text editors. They are created e.g. by "save as
csv" from a spreadsheet program, or as an output option by some database
query program. They can have just about any character in a field,
including \r and \n. Fields containing those characters should be quoted
(just like a comma) by the csv file producer. A csv reader should be
capable of reproducing the original field division. Here for example is
a dump of a little file I just created using Excel 2003:

C:\devel\csv>\python26\python -c "print repr(open('book1.csv','rb').read())"
'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

Inserting \n into a text field in Excel (using Alt-Enter) is a
well-known user trick.
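On the producer side, Python's own csv.writer with its default 'excel' dialect follows the same quoting rule. The following is a minimal Python 3 sketch, not part of the original report. It assumes the two rows of the Excel example and writes them out. The field with embedded newlines gets quoted, and the result matches the bytes dumped above.

```python
import csv
import io

# Rows matching the Excel 2003 example; the middle field contains
# newlines entered with Alt-Enter.
rows = [
    ["Field1", "Field 2 has a\nvery long\nheading", "Field3"],
    ["1.11", "2.22", "3.33"],
]

# newline='' stops StringIO from translating any line endings.
buf = io.StringIO(newline="")
# The default 'excel' dialect uses ',' as delimiter, '"' as quotechar,
# QUOTE_MINIMAL quoting and '\r\n' as the row terminator, so only the
# field containing '\n' is quoted.
writer = csv.writer(buf)
writer.writerows(rows)

data = buf.getvalue().encode("ascii")
print(repr(data))
```

The embedded \n stays a bare \n inside the quotes, while each row ends in \r\n. A reader therefore has to tell the two apart, and it can only do that if nobody translates line endings first.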

Here's what we get from Python 2.6.1:
C:\devel\csv>\python26\python -c "import csv; print repr(list(csv.reader(open('book1.csv','rb'))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]
The result is the same, by design, all the way back to Python 2.3's csv
module and its ancestor, the ObjectCraft csv module.

However with Python 3.0.1 we get:
C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rb')))))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
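For readers on later Python 3 releases, the csv module documentation says a file object passed to csv.reader should be opened with newline='' rather than 'rb'. The following minimal sketch is written under that assumption. It shows the same idea twice: once for a path, and once for a stream that is already open in binary mode, using io.TextIOWrapper. The cp1252 encoding is an assumption about what Excel 2003 wrote. The sample is plain ASCII either way. The helper names read_csv_path and read_csv_binary_stream are hypothetical.

```python
import csv
import io
import os
import tempfile

# The bytes from the Excel 2003 dump of book1.csv quoted above.
DATA = b'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'


def read_csv_path(path, encoding="cp1252"):
    # Hypothetical helper. Text mode, so csv.reader receives str, not bytes.
    # newline='' disables universal-newline translation, so the reader sees
    # the raw \r\n row endings and the bare \n inside the quoted field.
    # The cp1252 default is an assumption about the producing program.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


def read_csv_binary_stream(stream, encoding="cp1252"):
    # Hypothetical helper for a stream already opened in binary mode:
    # wrap it in a text layer with the same newline='' setting.
    text = io.TextIOWrapper(stream, encoding=encoding, newline="")
    return list(csv.reader(text))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book1.csv")
    with open(path, "wb") as f:
        f.write(DATA)
    from_path = read_csv_path(path)

from_stream = read_csv_binary_stream(io.BytesIO(DATA))
print(from_path)
print(from_path == from_stream)
```

Both helpers should give the same two rows as the Python 2.6.1 session above, with the embedded newlines intact.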

This sentence in the documentation is NOT an error: """If csvfile is a
file object, it must be opened with the ‘b’ flag on platforms where that
makes a difference."""

The problem *IS* a "biggie".

This paragraph in the documentation (evidently introduced in 2.5) is
rather confusing: """The parser is quite strict with respect to
multi-line quoted fields. Previously, if a line ended within a quoted
field without a terminating newline character, a newline would be
inserted into the returned field. This behavior caused problems when
reading files which contained carriage return characters within fields.
The behavior was changed to return the field without inserting newlines.
As a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters.""" Some examples of what it is talking about would be a very
good idea.
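Here is one possible example of what that paragraph describes, sketched in Python 3. The 2.5/2.6 reader may differ in details. If a multi-line quoted field reaches csv.reader as lines whose newline characters have been stripped, the reader does not put them back. If the line endings are kept, the field survives intact. Here str.splitlines stands in for whatever line-splitting step a caller might apply before handing lines to the reader.

```python
import csv

# The decoded contents of book1.csv from the example above.
TEXT = 'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

# Splitting without keeping line endings: the reader is handed
# 'Field1,"Field 2 has a', 'very long', 'heading",Field3', ...
# and it does not re-insert the newlines it never saw.
stripped = list(csv.reader(TEXT.splitlines()))

# Splitting with keepends=True preserves the newline characters,
# so the quoted field comes back as the producer wrote it.
kept = list(csv.reader(TEXT.splitlines(keepends=True)))

print(repr(stripped[0][1]))  # the embedded newlines are gone
print(repr(kept[0][1]))
```

str.splitlines also breaks on characters such as \x0c and \u2028, which file iteration does not treat as line boundaries. For real files, iterating over a file opened with newline='' is the safer way to keep the line endings.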

History
Date	User	Action	Args
2009-02-24 07:25:44	sjmachin	set	recipients: + sjmachin, georg.brandl, pitrou, vstinner, jaywalker
2009-02-24 07:25:43	sjmachin	set	messageid: <1235460343.9.0.647749679832.issue4847@psf.upfronthosting.co.za>
2009-02-24 07:25:42	sjmachin	link	issue4847 messages
2009-02-24 07:25:40	sjmachin	create