
classification
Title: csv: unexpected result
Type: behavior Stage: resolved
Components: Versions: Python 3.4
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: muss, r.david.murray
Priority: normal Keywords:

Created on 2015-12-14 01:56 by muss, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (10)
msg256355 - (view) Author: Ioan Fintescu (muss) Date: 2015-12-14 01:56
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> s = 'x = "a", y = "b, c"'
>>> s
'x = "a", y = "b, c"'
>>> for row in csv.reader([s]): print(row)
...
['x = "a"', ' y = "b', ' c"']
>>> len(row)
3
>>> for row1 in csv.reader([s], skipinitialspace=True):print(row1)
...
['x = "a"', 'y = "b', 'c"']
>>> len(row1)
3
>>> s2 = 'x = "a",y="b,c"'
>>> s2
'x = "a",y="b,c"'
>>> for row2 in csv.reader([s]): print(row2)
...
['x = "a"', ' y = "b', ' c"']
>>> len(row2)
3
>>> for row3 in csv.reader([s], skipinitialspace=True): print(row3)
...
['x = "a"', 'y = "b', 'c"']
>>> len(row3)
3
msg256356 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-14 02:08
>>> import csv, io
>>> b = io.StringIO()
>>> w = csv.writer(b)
>>> w.writerow(['x = "a"', 'y = "b, c"'])
28
>>> b.getvalue()
'"x = ""a""","y = ""b, c"""\r\n'


In other words, your input was not validly quoted csv.
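As a hedged illustration of the point above, the sketch below writes the two intended fields with `csv.writer` and reads the result back with `csv.reader`. It uses only the standard `csv` and `io` modules; the variable names are illustrative.

```python
import csv
import io

# Write two fields that contain double quotes (and, in one case, a comma).
buf = io.StringIO()
csv.writer(buf).writerow(['x = "a"', 'y = "b, c"'])
encoded = buf.getvalue()
print(repr(encoded))

# Reading the writer's own output back recovers the original two fields,
# because each field is wrapped in quotes and embedded quotes are doubled.
decoded = next(csv.reader([encoded]))
print(decoded)
```

The round trip works because the whole field is quoted; a quote that appears in the middle of an unquoted field is treated as an ordinary character, so the comma after it still splits the field.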
msg256358 - (view) Author: Ioan Fintescu (muss) Date: 2015-12-14 02:51
You wrote ['x = "a"', 'y = "b, c"']
I wrote ['x = "a", y = "b, c"']

...muss

msg256363 - (view) Author: Ioan Fintescu (muss) Date: 2015-12-14 03:05
You may be right.  I just saved it from LibreOffice Calc and I got
[x=”a”,"y=”b, c”"].  I thought the original was saved from a spreadsheet
program.

...muss

msg256385 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-14 16:05
Well, since there's no real standard for csv, it might have been.  Since it is inherently ambiguous according to "normal" csv rules, though, I'd say that if that is the case the originating spreadsheet is the one with the bug.  If you can prove there is a spreadsheet that uses this format, perhaps an enhancement request for csv would be in order.  I don't *think* there's any way to parse that with the existing dialect support, though I could be wrong.
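If the data really does arrive as `name = "value"` pairs, one possible workaround outside the csv module is a regular expression. This is only a sketch under assumptions: `PAIR` and `parse_pairs` are hypothetical names, and the pattern assumes simple word-like names and values that contain no embedded double quotes.

```python
import re

# Hypothetical pattern (not part of the csv module): a bare name, '=',
# then a double-quoted value that contains no double quotes itself.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_pairs(line):
    """Return (name, value) tuples found in a line of name = "value" pairs."""
    return PAIR.findall(line)


pairs = parse_pairs('x = "a", y = "b, c"')
print(pairs)
```

Commas inside the quoted values are kept because the pattern matches each quoted value as a unit instead of splitting on the delimiter first.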
msg256389 - (view) Author: Ioan Fintescu (muss) Date: 2015-12-14 16:21
There seems to be a CSV specification, namely IETF RFC 4180, and, as far as
I can tell, it indicates you are correct, and I am wrong.  See especially
points 5, 6, and 7 on page 3.  Here is a quote from point 5 (RFC 4180, page 2
<https://tools.ietf.org/html/rfc4180#page-2>):

> If fields are not enclosed with double quotes, then double quotes may not
> appear inside the fields.

It gets more specific in points 6 and 7.

So the csv module is doing the right thing.  An error message may help,
though.

...muss

msg256391 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-14 16:33
There's no place to generate an error message, the csv module parsed the line according to the rules.

I'm glad that there's an RFC now.  There wasn't when the module was written (which was well before my time on this project...)
msg256392 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-14 16:34
Oh, maybe there is.  We could add an RFC-strict dialect that would raise an error, if I'm understanding your quotes from it correctly.  That would be a new feature, though.
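A rough sketch of what such a strict check could look like, written as a standalone function and not as a csv dialect. The name `check_rfc4180_quotes` is hypothetical, and the code covers only the quoting rules discussed here (quotes allowed only around a whole field, doubled inside it), not the full RFC 4180 grammar.

```python
def check_rfc4180_quotes(line, delimiter=',', quotechar='"'):
    """Raise ValueError if a quote appears where the quoting rules forbid it.

    Hypothetical helper: checks a single record, ignoring its line ending.
    """
    line = line.rstrip('\r\n')
    field_start = True   # True when the next character begins a field
    in_quotes = False    # True while inside a quoted field
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if in_quotes:
            if c == quotechar:
                if i + 1 < n and line[i + 1] == quotechar:
                    i += 1  # doubled quote: skip the first of the pair
                else:
                    in_quotes = False  # closing quote
                    if i + 1 < n and line[i + 1] != delimiter:
                        raise ValueError(
                            'text after closing quote at column %d' % (i + 1))
        elif c == delimiter:
            field_start = True
            i += 1
            continue
        elif c == quotechar:
            if not field_start:
                raise ValueError(
                    'quote inside unquoted field at column %d' % i)
            in_quotes = True
        field_start = False
        i += 1
    if in_quotes:
        raise ValueError('unterminated quoted field')


# The reporter's line is rejected; the writer's output is accepted.
try:
    check_rfc4180_quotes('x = "a", y = "b, c"')
except ValueError as exc:
    print('rejected:', exc)
check_rfc4180_quotes('"x = ""a""","y = ""b, c"""\r\n')
```

A real implementation would also have to handle quoted fields that span several lines, which this single-line sketch does not attempt.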
msg256395 - (view) Author: Ioan Fintescu (muss) Date: 2015-12-14 16:41
If you consider that, you should take a look at the RFC; it also contains a
BNF-like grammar plus some pointers to another RFC (2234).  It may require
more effort than it is worth, absent some demand for it.

...muss

msg256397 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-14 16:43
Yeah, we'll leave it alone until someone actually submits an enhancement request...and the only way it'll get done is if they do it, I suspect :)
History
Date                 User            Action  Args
2022-04-11 14:58:24  admin           set     github: 70044
2015-12-14 16:43:47  r.david.murray  set     messages: + msg256397
2015-12-14 16:41:14  muss            set     messages: + msg256395
2015-12-14 16:34:30  r.david.murray  set     messages: + msg256392
2015-12-14 16:33:22  r.david.murray  set     messages: + msg256391
2015-12-14 16:21:40  muss            set     messages: + msg256389
2015-12-14 16:05:42  r.david.murray  set     messages: + msg256385
2015-12-14 03:05:03  muss            set     messages: + msg256363
2015-12-14 02:51:31  muss            set     messages: + msg256358
2015-12-14 02:08:55  r.david.murray  set     status: open -> closed; nosy: + r.david.murray; messages: + msg256356; resolution: not a bug; stage: resolved
2015-12-14 01:56:20  muss            create