Issue967934
Created on 2004-06-07 04:46 by gnbond, last changed 2009-02-14 13:56 by ajaksu2.
| File name | Uploaded | Description |
| tcsv.py | gnbond, 2004-06-07 04:47 | |
msg21057 |
Author: Gregory Bond (gnbond) |
Date: 2004-06-07 04:46 |
|
The csv module cannot handle an embedded \r (i.e. carriage
return) in a field.
As far as I can see, this is hard-coded in the _csv.c
file and cannot be fixed with Dialect changes.
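For anyone who wants to check the reported case on a current interpreter, here is a minimal sketch. It assumes Python 3, whose behavior may differ from the 2004 releases discussed in this issue, and the sample string is made up for illustration (it is not the attached tcsv.py). The input is opened with newline='' so that line endings reach the parser untranslated.

```python
import csv
import io

# Hypothetical sample data: the third field is quoted and contains a
# bare carriage return between "fld3 " and "more".
data = 'fld1,fld2,"fld3 \rmore",fld4\r\n'

# newline='' leaves \r and \r\n untranslated, so the csv parser sees
# the embedded carriage return exactly as it appears in the data.
rows = list(csv.reader(io.StringIO(data, newline=""), dialect="excel"))

for row in rows:
    print(row)
```

If the embedded \r is handled, the data parses as a single four-field record rather than being split at the carriage return.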
|
|
msg21058 |
Author: Raymond Hettinger (rhettinger) |
Date: 2004-06-07 05:02 |
|
Skip, does this coincide with your planned switchover to
universal newlines?
|
|
msg21059 |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2004-06-07 05:32 |
|
I suspect this restriction (CR appearing within a quoted
field) is a historical accident and can be safely removed.
|
|
msg21060 |
Author: Skip Montanaro (skip.montanaro) |
Date: 2004-06-07 11:25 |
|
It certainly intersects with it somehow. ;-) If nothing else, it
will serve as a useful test case.
|
|
msg21061 |
Author: Andrew McNamara (andrewmcnamara) |
Date: 2005-01-13 11:34 |
|
If you're interested, I've just checked in a change to the CVS head for
Python 2.5 that may, at least partially, fix this problem (if you try it, let me
know how it goes).
|
|
msg21062 |
Author: David Goodger (goodger) |
Date: 2006-04-05 15:35 |
|
I just filed a bug (http://www.python.org/sf/1465014) that
seems to be related to this. Revision 38290 on
Modules/_csv.c includes the addition of this code:
    else if (c == '\n' || c == '\r') {
        self->state = EAT_CRNL;
        break;
    }
(and similar). This seems to be eating (deleting) control
chars, but newlines used to be significant.
Embedded line breaks are allowed, according to RFC 4180
(http://www.ietf.org/rfc/rfc4180.txt). And according to the
Wikipedia entry
(http://en.wikipedia.org/wiki/Comma-separated_values), "a
line break within an element must be preserved."
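The RFC 4180 expectation quoted here can be sketched as a write-then-read round trip. This assumes a current Python 3 interpreter, not the 2.5 alpha this message refers to, and the row contents are hypothetical.

```python
import csv
import io

# Hypothetical row: the middle field contains an embedded CRLF line break.
original = ["a", "line1\r\nline2", "c"]

# Write the row; newline='' prevents any newline translation in the buffer.
buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel").writerow(original)
encoded = buf.getvalue()

# Read it back the same way and take the first (only) record.
reader = csv.reader(io.StringIO(encoded, newline=""), dialect="excel")
round_tripped = next(reader)

print(repr(encoded))
print(round_tripped)
```

The writer quotes the field that contains the line break, and the reader is expected to return that field with the line break intact.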
|
|
msg82052 |
Author: Daniel Diniz (ajaksu2) |
Date: 2009-02-14 13:56 |
|
IIUC, I get the correct behavior:
trunk-py$ ./python ~/Desktop/tcsv.py
['fld1', 'fld2', 'fld3 ', 'fld4']
['fld1', 'fld2', 'fld3 \r', 'fld4']
trunk-py$ cat ~/Desktop/tcsv.py
#! /usr/local/bin/python
import csv
d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'
r = csv.reader([d, d2, d3], dialect="excel")
for f in r:
    print f
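The script above uses the Python 2 print statement. A rough Python 3 port is sketched below; the read_rows helper is a name introduced here for illustration and is not part of the attached file.

```python
import csv


def read_rows(chunks):
    """Parse an iterable of line strings with the excel dialect.

    Hypothetical helper, added only to make the result easy to inspect.
    """
    return list(csv.reader(chunks, dialect="excel"))


# Same three input strings as tcsv.py: d is a complete record, while
# d2 and d3 together form a record whose quoted field ends in a bare \r.
d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'

for row in read_rows([d, d2, d3]):
    print(row)
```

On an interpreter that handles the embedded carriage return, this should yield the same two records shown in the output above.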
|
|
| Date | User | Action | Args |
| 2009-02-14 13:56:28 | ajaksu2 | set | versions: + Python 2.6; nosy: + ajaksu2; messages: + msg82052; dependencies: + CSV regression in 2.5a1: multi-line cells; components: + Extension Modules; type: behavior; stage: test needed |
| 2004-06-07 04:46:56 | gnbond | create | |