Issue1465014
Created on 2006-04-05 15:14 by goodger, last changed 2006-08-13 02:20 by sf-robot.
| File name | Uploaded | Description |
| csv_test.py | goodger, 2006-05-02 20:58 | demonstrates the bug: run with 2.5 and 2.4 to see the difference |
msg28095 - (view) |
Author: David Goodger (goodger) |
Date: 2006-04-05 15:14 |
|
Running the attached csv_test.py under Python 2.4.2
(Windows XP SP1) produces:
>c:\apps\python24\python.exe ./csv_test.py
['one', '2', 'three (line 1)\n(line 2)']
Note that the third item in the row contains a newline
between "(line 1)" and "(line 2)".
With Python 2.5a1, I get:
>c:\apps\python25\python.exe ./csv_test.py
['one', '2', 'three (line 1)(line 2)']
Notice the missing newline, which is significant. The
CSV module under 2.5a1 seems to lose data.
|
|
msg28096 - (view) |
Author: David Goodger (goodger) |
Date: 2006-04-05 15:44 |
|
This bug seems to be a side effect of revision 38290 on
Modules/_csv.c, which was prompted by bug 967934
(http://www.python.org/sf/967934).
|
|
msg28097 - (view) |
Author: David Goodger (goodger) |
Date: 2006-05-02 20:58 |
|
Further investigation has revealed that the regression only
affects iterator I/O, not file I/O. The attached
csv_test.py demonstrates. Run with Python 2.5 to get:
results from file I/O:
[['one', '2', 'three (line 1)\n(line 2)']]
results from iterator I/O:
[['one', '2', 'three (line 1)(line 2)']]
|
|
msg28098 - (view) |
Author: David Goodger (goodger) |
Date: 2006-05-02 21:04 |
|
Assigned to Andrew McNamara, since his change appears to
have caused this regression (revision 38290 on
Modules/_csv.c).
|
|
msg28099 - (view) |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2006-06-20 23:17 |
|
I think your problem is with str.splitlines(), rather than
the csv.reader: splitlines ate the newline. If you pass it
True as an argument, it will retain the end-of-line
character in the resulting strings.
|
|
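A minimal sketch of the str.splitlines() point above, written for a current Python 3 interpreter rather than the 2.4/2.5 builds discussed in this report. The input string is an assumption modelled on the output quoted earlier; the attached csv_test.py is not reproduced here.

```python
import csv

# Assumed sample input: one record whose third field is quoted and
# spans two physical lines.
data = 'one,2,"three (line 1)\n(line 2)"\n'

# splitlines() drops the line endings, so the reader never sees the
# newline that sat inside the quoted field.
stripped = list(csv.reader(data.splitlines()))

# splitlines(True) keeps the line endings, so the newline inside the
# quoted field is passed through to the reader.
kept = list(csv.reader(data.splitlines(True)))

print(stripped)
print(kept)
```

With the line endings kept, the first string handed to the reader still ends in a newline while the quoted field is open, so the reader treats that newline as field data.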
msg28100 - (view) |
Author: David Goodger (goodger) |
Date: 2006-06-22 18:17 |
|
I see what you're saying, but I disagree. In Python 2.4,
csv.reader did not require newlines, but in Python 2.5 it
does. That's a significant behavioral change. In the
stdlib csv "Module Contents" docs for csv.reader, it says:
"csvfile can be any object which supports the iterator
protocol and returns a string each time its next method is
called." It doesn't mention newline-terminated strings.
In any case, the behavior is inconsistent: newlines are not
required to terminate row-ending strings, but only strings
which end inside cells split across rows. Why the discrepancy?
|
|
msg28101 - (view) |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2006-06-23 00:27 |
|
The previous behaviour caused considerable problems,
particularly on platforms that did not use the unix
line-ending conventions, or with files that originated on those
platforms - users were finding mysterious newlines where
they didn't expect them.
Quoted fields exist to allow characters that would otherwise
be considered part of the syntax to appear within the field.
So yes, quoted fields are a special case, and necessarily
so.
The current behaviour puts the control back in the hands of
the user of the module: if literal newlines are important
within a field, they need to read their file in a way that
preserves the newlines. The old behaviour would introduce
spurious characters into quoted fields, with no way for the
user to control that behaviour.
I'm sorry that the change causes you problems. With a format
that's as loosely defined as CSV, it's an unfortunate fact
of life that there are going to be conflicting requirements.
|
|
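One possible way to read a file "in a way that preserves the newlines", sketched for Python 3, where open() accepts a newline argument (that argument postdates the 2.4/2.5 releases discussed here). The sample bytes and the temporary-file handling are illustrative assumptions, not taken from the report.

```python
import csv
import os
import tempfile

# Assumed sample: a Windows-style file with a CRLF inside a quoted field.
raw = b'one,2,"three (line 1)\r\n(line 2)"\r\n'

fd, path = tempfile.mkstemp(suffix=".csv")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(raw)

    # newline="" disables newline translation, so the reader sees the
    # original "\r\n" inside the quoted field.
    with open(path, newline="") as f:
        preserved = list(csv.reader(f))

    # Default text mode translates "\r\n" to "\n" before the reader
    # sees the data.
    with open(path) as f:
        translated = list(csv.reader(f))
finally:
    os.remove(path)

print(preserved)
print(translated)
```

In both cases the caller, not the csv module, decides which line-ending characters end up inside the field.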
msg28102 - (view) |
Author: David Goodger (goodger) |
Date: 2006-06-23 03:13 |
|
I didn't realize that the previous behavior was buggy; I
thought that the current behavior was a side-effect. The
2.5 behavior did cause a small problem in Docutils, but it's
already been fixed. I just wanted to ensure that no
regression was creeping into 2.5.
Thanks for the explanation! Perhaps it could be added to
the docs in some form?
Marking the bug report closed.
|
|
msg28103 - (view) |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2006-06-23 03:34 |
|
Yep, your point about adding a comment to the documentation
is fair. Skip, do you want to take my words and massage
them into a form suitable for the docs?
|
|
msg28104 - (view) |
Author: A.M. Kuchling (akuchling) |
Date: 2006-07-29 17:24 |
|
I looked at this bug report, but I have no idea of exactly
what behaviour has changed or what needs to be described.
|
|
msg28105 - (view) |
Author: Skip Montanaro (skip.montanaro) |
Date: 2006-07-29 20:07 |
|
I checked in a change to libcsv.tex (revision 50953). It adds a versionchanged
bit to the reader doc that explains why the behavior changed in 2.5. Andrew &
Andrew, please check my work. Sorry for the delay taking care of this.
Skip
|
|
msg28106 - (view) |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2006-07-31 02:41 |
|
I've changed the comment again in changeset 50993 -
hopefully this attempt describes the difference more fully.
Let me know what you think.
|
|
msg28107 - (view) |
Author: Skip Montanaro (skip.montanaro) |
Date: 2006-07-31 03:13 |
|
I'll see your 50993 and raise you a 50998. Just minor tweaks. Hopefully we can
close this puppy, though a small example to make the idea concrete might be
worthwhile.
|
|
msg28108 - (view) |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2006-07-31 04:14 |
|
Yep, your changes are reasonable. I considered adding an
example, but couldn't think of anything that illustrated
the point without confusing the reader further.
|
|
msg28109 - (view) |
Author: SourceForge Robot (sf-robot) |
Date: 2006-08-13 02:20 |
|
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
|
|
| Date | User | Action | Args |
| 2009-02-14 13:56:28 | ajaksu2 | link | issue967934 dependencies |
| 2006-04-05 15:14:13 | goodger | create | |
|