msg196126 - (view) |
Author: Thibault Kruse (Thibault.Kruse) |
Date: 2013-08-25 12:56 |
To reproduce
$ python --version
Python 2.7.3
$ python -c 'from __future__ import unicode_literals; import csv; reader = csv.reader("foo", delimiter=",")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: "delimiter" must be an 1-character string
This is confusing because "," is like a 1-character (unicode) string, while csv requires a 1-character (byte) string. In python3, it's the inverse problem, using bytestring b','
$ python3 --version
Python 3.2.3
$ python3 -c 'import csv; reader = csv.reader("foo", delimiter=b",")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: "delimiter" must be an 1-character string
So I believe csv should state more clearly what went wrong here in each case, or what kind of string csv is willing to accept in the respective python version.
Whether the python3 refusal to use a bytestring is even reasonable I don't know.
This might or might not be related to the issue of multichar delimiters: http://bugs.python.org/issue15158 (just sayin)
|
msg196131 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-08-25 13:49 |
I agree that the 2.7 message is somewhat confusing, but I'm not sure it is worth changing at this point in 2.7's life cycle.
For Python3, the message is correct and unambiguous: a bytes object is not a string.
However, in 3.3, we have a regression:
rdmurray@hey:~/python/p34>python3.3 -c 'import csv; reader = csv.reader("foo", delimiter=b",")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: delimiter must be set
So that needs to be tracked down and fixed.
|
msg196132 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-08-25 13:51 |
I forgot to address the comment about accepting bytes in python3: the delimiter really is a unicode character. In python3, non-ASCII delimiters are handled correctly. So no, it isn't a byte anymore, it really is a string. Terry's comment about 'char' in the other issue does not apply to Python3.
|
msg196141 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-08-25 15:11 |
This is the preliminary patch to fix the error message.
|
msg196145 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-08-25 15:18 |
This is the second alternative of the patch to fix the error message using different approach. I am not sure which one is better.
|
msg196171 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-08-26 02:06 |
I realized there was a test that was similar with my test. I merged them into one test.
I think the alternative is better than the original patch so I updated the alternative patch only.
|
msg196212 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-08-26 15:26 |
Apparently, other attributes of the csv dialect beside delimiter, such as escapechar and quotechar share the same problem.
>>> import _csv
>>> _csv.reader('foo', quotechar=b'"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled
>>> _csv.reader('foo', escapechar=b'+')
<_csv.reader object at 0x7fa85d7847d0>Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bad argument type for built-in operation
My patch already fixed the problem, only lacked the unit test. But since this ticket is about delimiter, I'll create the unit test for quotechar and escapechar in separate ticket after this ticket has been closed.
|
msg196944 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2013-09-04 18:59 |
IMHO the error should say """TypeError: "delimiter" must be an 1-character string, not bytes""".
We could also have two separate errors for "wrong type" and "right type, wrong length" (this would be a ValueError though).
|
msg196986 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-05 09:22 |
Attached the patch to accomodate Ezio Melotti's request. The patch now have two separate errors for "wrong type" and "right type, wrong length".
Don't we break compatibility by changing the type of exception from TypeError to ValueError?
|
msg196997 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-09-05 13:55 |
A comment in csv.py says about compatibility with 2.3! After adding ValueError to the list of catched exceptions we can keep this compatibility for future generations. Here is a patch. I also have resorted the tests a little.
|
msg197002 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-05 14:45 |
Same as fix_error_message_reader_csv_alternative_1_v4.patch (by Serhiy), but I removed unused variables in the test.
|
msg197007 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-09-05 15:34 |
I rather prefer adding new tests.
|
msg197147 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-07 11:04 |
I noticed in CPython source code, when we print the type of the object, we use "%.200s" not "%s". For example, in Modules/posixmodule.c:
"uid should be integer, not %.200s",
So the newest path was created to conform with this tradition.
Another thing I noticed, there are two ways to get the type of object:
1. src->ob_type->tp_name
2. Py_TYPE(src)->tp_name
Both are widely used. So I am not sure which is the best way. I leave it alone this issue.
|
msg197172 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-09-07 16:38 |
When specify an empty string error message is confusing too:
$ ./python -c 'import csv; reader = csv.reader("foo", delimiter="")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: delimiter must be set
|
msg197249 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-08 11:10 |
Well, what about None?
$ python3 -c 'import csv; reader = csv.reader("foo", delimiter=None)'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: delimiter must be set
English grammatically speaking, we should get this kind of error:
ValueError: "delimiter" must be string, not None
But computer science-ly speaking, the exception message "delimiter must be set" is correct because setting null or None value to a variable can be considered as same as unsetting that variable, hence error message "must be set".
And I would argue the empty string can be considered as one of a kind with None in this case.
But we'll see other people's opinions. I am also not sure about this case.
|
msg197261 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-09-08 14:28 |
"delimiter must be a 1 character string" would cover it.
|
msg197265 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-08 15:10 |
David R. Murray said, '"delimiter must be a 1 character string" would cover it.'
You mean
$ ./python -c 'import csv; reader = csv.reader("foo", delimiter="")'
should give this error '"delimiter" must be a 1 character string'?
Attached the patch to accommodate your request.
I found out that:
$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled
This is not consistent with the cure. But since this ticket is about delimiter, we keep the status-quo about quotechar for now. After the patch is committed, we can open the ticket about the quotechar inconsistency.
The fix for this quotechar is not straightforward, though, because of the quoting condition.
|
msg197266 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-08 15:12 |
Sorry for typing your name wrongly.
s/David R. Murray/R. David Murray/
|
msg197271 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-09-08 15:39 |
After contemplating for a while, I am not sure that we should treat empty string for delimiter with error message: '"delimiter" must be an 1-character string'.
The reason is to keep consistency with quotechar keyword.
[sky@localhost cpython]$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled
[sky@localhost cpython]$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="", quoting=csv.QUOTE_NONE)'
# No error, so implicitly quotechar is not set, but it is okay since we use "quote none" quoting.
# In other word, quotechar is not set by giving empty string to quotechar.
We could not change the behaviour of csv module, which is unsetting quotechar by giving empty string to quotechar and let it be if we use "quote none" quoting, to preserve backward compatibility.
So the error message should be, "The delimiter must be set" for empty string for delimiter, I suppose.
|
msg197273 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-09-08 15:57 |
Parsing a csv file with no delimiter would seem to be meaningless, unless I'm misunderstanding what 'delimeter' controls. So the error messages for delimiter and quotechar are necessarily different.
|
msg206613 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-12-19 14:30 |
New changeset 5ed75e36be8e by Serhiy Storchaka in branch '2.7':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/5ed75e36be8e
New changeset 52d03fbdf67a by Serhiy Storchaka in branch '3.3':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/52d03fbdf67a
New changeset 6b17803bfddd by Serhiy Storchaka in branch 'default':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/6b17803bfddd
|
msg206614 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-12-19 14:35 |
Thank you Vajrasky for your patch. I have simplified and fixed (escapechar can be empty) it. Reverted ValueError back to TypeError because ord() raises TypeError for non-1-character strings.
|
msg207158 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-01-02 11:54 |
New changeset 0daf7f02c97f by Victor Stinner in branch '3.3':
Issue #18829: Add tests for the csv module for invalid characters (delimiter,
http://hg.python.org/cpython/rev/0daf7f02c97f
New changeset ccb52323039f by Victor Stinner in branch 'default':
(Merge 3.3) Issue #18829: Add tests for the csv module for invalid characters
http://hg.python.org/cpython/rev/ccb52323039f
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:50 | admin | set | github: 63029 |
2014-01-02 11:54:03 | python-dev | set | messages:
+ msg207158 |
2013-12-19 14:36:18 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2013-12-19 14:35:44 | serhiy.storchaka | set | messages:
+ msg206614 |
2013-12-19 14:30:25 | python-dev | set | nosy:
+ python-dev messages:
+ msg206613
|
2013-12-19 11:01:00 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2013-12-19 10:58:44 | serhiy.storchaka | link | issue20023 superseder |
2013-09-08 15:57:21 | r.david.murray | set | messages:
+ msg197273 |
2013-09-08 15:39:49 | vajrasky | set | messages:
+ msg197271 |
2013-09-08 15:12:27 | vajrasky | set | messages:
+ msg197266 |
2013-09-08 15:10:16 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1_v8.patch
messages:
+ msg197265 |
2013-09-08 14:28:18 | r.david.murray | set | messages:
+ msg197261 |
2013-09-08 11:10:45 | vajrasky | set | messages:
+ msg197249 |
2013-09-07 16:38:41 | serhiy.storchaka | set | assignee: serhiy.storchaka -> (no value) messages:
+ msg197172 |
2013-09-07 11:04:09 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1_v7.patch
messages:
+ msg197147 |
2013-09-05 15:34:48 | serhiy.storchaka | set | files:
+ fix_error_message_reader_csv_alternative_1_v6.patch
messages:
+ msg197007 |
2013-09-05 14:45:15 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1_v5.patch
messages:
+ msg197002 |
2013-09-05 13:55:01 | serhiy.storchaka | set | files:
+ fix_error_message_reader_csv_alternative_1_v4.patch
messages:
+ msg196997 |
2013-09-05 09:22:24 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1_v3.patch
messages:
+ msg196986 |
2013-09-04 18:59:00 | ezio.melotti | set | nosy:
+ ezio.melotti messages:
+ msg196944
|
2013-08-26 15:26:48 | vajrasky | set | messages:
+ msg196212 |
2013-08-26 02:06:22 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1_v2.patch
messages:
+ msg196171 |
2013-08-25 15:33:23 | serhiy.storchaka | set | assignee: serhiy.storchaka
nosy:
+ serhiy.storchaka stage: needs patch -> patch review |
2013-08-25 15:18:45 | vajrasky | set | files:
+ fix_error_message_reader_csv_alternative_1.patch
messages:
+ msg196145 |
2013-08-25 15:11:00 | vajrasky | set | files:
+ fix_error_message_reader_csv.patch
nosy:
+ vajrasky messages:
+ msg196141
keywords:
+ patch |
2013-08-25 13:51:43 | r.david.murray | set | messages:
+ msg196132 |
2013-08-25 13:49:21 | r.david.murray | set | title: csv produces confusing error message for unexpected string encoding -> csv produces confusing error message when passed a non-string delimiter
keywords:
+ easy, 3.3regression nosy:
+ r.david.murray versions:
+ Python 3.3, Python 3.4, - Python 3.2 messages:
+ msg196131 stage: needs patch |
2013-08-25 12:56:23 | Thibault.Kruse | create | |