
Author sjmachin
Recipients georg.brandl, jaywalker, pitrou, sjmachin, vstinner
Date 2009-02-24.07:25:40
Message-id <1235460343.9.0.647749679832.issue4847@psf.upfronthosting.co.za>
In-reply-to
Content
Sorry, folks, there is a misunderstanding here. CSV files are
typically NOT created by text editors. They are created, for example, by
"Save as CSV" in a spreadsheet program, or as an output option of a
database query program. A field can contain just about any character,
including \r and \n. The csv file producer should quote fields
containing those characters, just as it quotes a field containing a
comma. A csv reader should be able to reproduce the original division
into fields. Here, for example, is a dump of a little file I just
created using Excel 2003:

C:\devel\csv>\python26\python -c "print repr(open('book1.csv','rb').read())"
'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

Inserting \n into a text field in Excel (using Alt-Enter) is a
well-known user trick.
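For comparison, here is a minimal Python 3 sketch of the producer side. It is not part of the original report. `csv.writer` with its default `excel` dialect quotes a field that contains `\n` and ends each record with `\r\n`. As a result it produces the same text as the Excel dump above. The row values are copied from that dump. Writing to an in-memory `io.StringIO` instead of a file is only for illustration.

```python
import csv
import io

# Rows copied from the Excel 2003 dump; field 2 contains embedded \n.
rows = [
    ["Field1", "Field 2 has a\nvery long\nheading", "Field3"],
    ["1.11", "2.22", "3.33"],
]

# newline='' so the StringIO performs no newline translation of its own.
buf = io.StringIO(newline="")
writer = csv.writer(buf)  # default 'excel' dialect: QUOTE_MINIMAL, '\r\n' terminator
writer.writerows(rows)

text = buf.getvalue()
print(repr(text))
# 'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'
```

Only the field containing newlines is quoted. The other fields contain no delimiter, quote, or line-terminator characters, so they are written bare.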

Here's what we get from Python 2.6.1:
C:\devel\csv>\python26\python -c "import csv; print repr(list(csv.reader(open('book1.csv','rb'))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]
That behavior is by design, and it has been the same all the way back
to Python 2.3's csv module and its ancestor, the ObjectCraft csv module.

However, with Python 3.0.1 we get:
C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rb')))))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

This sentence in the documentation is NOT an error: """If csvfile is a
file object, it must be opened with the ‘b’ flag on platforms where that
makes a difference."""
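In Python 3, the job that the 'b' flag did in Python 2 is stopping newline translation. That job is done by opening the file in text mode with `newline=''`, which is what the current Python 3 `csv` documentation asks for. The sketch below is not from the original message. It does three things:

- writes the Excel bytes to a temporary file;
- shows that a binary-mode file object is still rejected by `csv.reader` (the exact error text differs between releases);
- reads the file correctly with `newline=''`.

`read_rows` is a hypothetical helper name. Its `cp1252` default encoding is an assumption about what Excel 2003 wrote. This sample is plain ASCII, so any ASCII-compatible codec would work.

```python
import csv
import os
import tempfile

# The exact bytes from the Excel 2003 dump above.
RAW = b'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'


def read_rows(path, encoding="cp1252"):
    """Hypothetical helper; the cp1252 default is an assumed encoding."""
    # newline='' turns off newline translation, so the reader sees every
    # \r and \n exactly as stored and keeps the ones inside quotes.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


fd, path = tempfile.mkstemp(suffix=".csv")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(RAW)

    # Binary mode, as in the 3.0.1 transcript: the reader gets bytes and raises.
    try:
        with open(path, "rb") as f:
            list(csv.reader(f))
        binary_ok = True
    except csv.Error:
        binary_ok = False

    rows = read_rows(path)
finally:
    os.remove(path)

print(binary_ok)  # False
print(rows)
# [['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]
```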

The problem *IS* a "biggie".

This paragraph in the documentation (evidently introduced in 2.5) is
rather confusing: """The parser is quite strict with respect to
multi-line quoted fields. Previously, if a line ended within a quoted
field without a terminating newline character, a newline would be
inserted into the returned field. This behavior caused problems when
reading files which contained carriage return characters within fields.
The behavior was changed to return the field without inserting newlines.
As a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters.""" Some examples of what it is talking about would be a very
good idea.
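Here is one possible example of what that paragraph means. It is a sketch against the current Python 3 `csv` module, not text from the documentation.

- If the reader is fed lines from `str.splitlines()`, which drops line endings, it joins the pieces of the quoted field with nothing in between.
- If it is fed lines from `splitlines(keepends=True)`, the embedded newlines are preserved.

The input string is the Excel dump from earlier in this message.

```python
import csv

# The Excel 2003 data from the dump earlier in this message, as a str.
TEXT = 'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

# Line endings discarded: the reader continues the quoted field onto the
# next line but does not insert a newline, so the pieces run together.
lossy = list(csv.reader(TEXT.splitlines()))

# Line endings kept: the \n characters inside the quotes reach the reader.
faithful = list(csv.reader(TEXT.splitlines(keepends=True)))

print(repr(lossy[0][1]))     # 'Field 2 has avery longheading'
print(repr(faithful[0][1]))  # 'Field 2 has a\nvery long\nheading'
```

Both splits yield the same number of records. Only the content of the multi-line field differs.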
History
Date                 User      Action  Args
2009-02-24 07:25:44  sjmachin  set     recipients: + sjmachin, georg.brandl, pitrou, vstinner, jaywalker
2009-02-24 07:25:43  sjmachin  set     messageid: <1235460343.9.0.647749679832.issue4847@psf.upfronthosting.co.za>
2009-02-24 07:25:42  sjmachin  link    issue4847 messages
2009-02-24 07:25:40  sjmachin  create