classification
Title: Bugs in _csv module - lineterminator
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: andrewmcnamara
Nosy List: ajaksu2, andrewmcnamara, fresh, r.david.murray
Priority: low
Keywords:

Created on 2004-11-24 12:00 by fresh, last changed 2010-05-21 01:33 by r.david.murray. This issue is now closed.

Messages (7)
msg23295 - (view) Author: Chris Withers (fresh) Date: 2004-11-24 12:00
On trying to parse a '\r' terminated csv generated on a
Mac, I get a "newline inside string" error from the csv
module.

Two things sprang to mind having read:
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_csv.c?rev=1.15&view=markup
...for a bit.

1. The Dialect's lineterminator doesn't appear to be
used when parsing a CSV. This feels like a bug to me,
'cos I could specify the terminator if
Reader_iternext(ReaderObj *self) used it :-S

2. The processing in Reader_iternext(ReaderObj *self)
assumes that a '\r' will be followed by '\0' for Macs,
'\n' for windows, and anything else is an error.

but:

>>> c = open('var\\data\\metadata.csv').read()
>>> c[:100]
'BENEFIT,,Subjects relating to all benefits,AB
\rBENEFIT,PARTNERDIED,Bereavement

Should I be expecting to see a '\0' there?

Anyway, the real bug seems to be the reader's ignorance
of the lineterminator. However, even if my analysis is
off the mark, the problem still exists :-S

cheers,

Chris
msg23296 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2004-11-25 04:23
This is a known problem.  See the April archives of the csv
mailing list:

http://manatee.mojam.com/pipermail/csv/2004-April/thread.html

Solutions are welcome.  I suspect any solution will involve either
discarding PyIter_Next altogether or further subdividing what it
returns.

A couple things to note in the way of workarounds:

1. Reader_iternext() defers to PyIter_Next() to grab the next line,
so there's really no opportunity to interject the lineterminator into
the operation with the current code.  This means reading from
StringIO objects that use \r lineterminators will always fail.

2. If you have a real file as input and open it in universal newline
mode you will get the correct behavior.
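
A minimal sketch of the second workaround, assuming Python 3, where universal newline handling is selected with the `newline` argument to `open()` instead of the old `'U'` mode flag. The temporary file and its contents are made up for illustration; `newline=""` is the setting the csv documentation recommends, and with it the file iterator still splits lines on a bare `\r`.

```python
import csv
import os
import tempfile

# Create a small '\r'-terminated file, as an old Mac tool might have written it.
# The file contents are illustrative, not taken from the report.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"a,b\rc,d\r")

# newline="" keeps universal newline recognition for line splitting,
# so the file iterator hands the reader one record at a time.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

os.remove(path)
print(rows)
```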
msg23297 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2005-01-13 04:14
The reader expects to be supplied an iterator that returns lines - in this 
case, the file iterator has not recognised \r as end-of-line and has read the 
whole file in and yielded that as a "line". If you use universal-newline mode 
on your source file, you should have more luck.
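
Because the reader only needs an iterable that yields one line per item, data that is already in memory can be split by hand before it reaches the reader. The sketch below assumes Python 3; `split_records` is a hypothetical helper, not part of the csv module, and it is deliberately naive: it would break a quoted field that contains the terminator.

```python
import csv

def split_records(text, terminator="\r"):
    """Yield one chunk per terminator (hypothetical helper, not a csv API).

    Empty chunks, including the one after a trailing terminator, are skipped.
    """
    for chunk in text.split(terminator):
        if chunk:
            yield chunk

# Sample data modelled on the snippet in the original report.
data = (
    "BENEFIT,,Subjects relating to all benefits,AB\r"
    "BENEFIT,PARTNERDIED,Bereavement\r"
)

# csv.reader accepts any iterable of strings, not just file objects.
rows = list(csv.reader(split_records(data)))
print(rows)
```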
msg23298 - (view) Author: Chris Withers (fresh) Date: 2005-01-18 11:25
I don't think it's fair to close this as a rejection.
The documentation implies that you can control what line
terminator this module uses, which currently isn't the case.

I'm not saying this is a high priority issue, just that it
shouldn't be rejected in case some day someone (maybe even
me ;-) wants to have a go at fixing it...
msg23299 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2005-01-18 12:11
This cannot be fixed with the current interface - the line splitting is being 
done by the file iterator, and it only supports \r and \n. As I said, you'll get 
better results with universal newline mode.

The parser in Python 2.5 (the CVS HEAD) has been improved somewhat, 
but it's still not possible to use anything other than \r and \n for end-of-line. 
The documentation has been updated to reflect this fact.
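
The asymmetry described here can be seen in a short sketch, assuming Python 3: the writer honours `lineterminator`, while the reader accepts the argument but ignores it. The `|` terminator and the sample rows are arbitrary choices for illustration.

```python
import csv
import io

# The writer uses the requested terminator between records.
buf = io.StringIO()
csv.writer(buf, lineterminator="|").writerows([["a", "b"], ["c", "d"]])
written = buf.getvalue()

# The reader ignores lineterminator: '|' is treated as ordinary data,
# so the whole string comes back as a single record.
rows = list(csv.reader([written], lineterminator="|"))

print(repr(written))
print(rows)
```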
msg82123 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-02-14 21:57
Needs confirmation, probably a won't fix either way.
msg106210 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-21 01:33
The doc has been fixed; using lineterminator in reader has not been and is not likely to be implemented (unless someone wants to come forward with a patch). Processing files that use \r line endings does work: as indicated, open the input file in universal newline mode.  In Py3k you can wrap a BytesIO object in a TextIOWrapper to get universal newline parsing.

So, I'm closing this as wont fix, as suggested.  If someone does want to implement lineterminator for reader, they can open a new feature request issue.
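
A minimal sketch of the TextIOWrapper approach mentioned above, assuming Python 3 and ASCII sample data invented for illustration. With `newline=""` the wrapper recognises `\r`, `\n` and `\r\n` as line ends when iterating, which is all the reader needs.

```python
import csv
import io

# In-memory bytes with bare '\r' record terminators (illustrative data).
raw = io.BytesIO(b"a,b\rc,d\r")

# Wrap the byte stream so that iteration yields one text line per record.
text = io.TextIOWrapper(raw, encoding="ascii", newline="")

rows = list(csv.reader(text))
print(rows)
```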
History
Date                 User            Action  Args
2010-05-21 01:33:24  r.david.murray  set     status: open -> closed; nosy: + r.david.murray; messages: + msg106210; resolution: wont fix; stage: test needed -> resolved
2010-05-20 20:27:19  skip.montanaro  set     nosy: - skip.montanaro
2009-02-14 21:57:38  ajaksu2         set     nosy: + ajaksu2; stage: test needed; type: behavior; messages: + msg82123; versions: + Python 2.6
2004-11-24 12:00:16  fresh           create