classification
Title: csv produces confusing error message when passed a non-string delimiter
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Thibault.Kruse, ezio.melotti, python-dev, r.david.murray, serhiy.storchaka, vajrasky
Priority: normal Keywords: 3.3regression, easy, patch

Created on 2013-08-25 12:56 by Thibault.Kruse, last changed 2014-01-02 11:54 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
fix_error_message_reader_csv.patch vajrasky, 2013-08-25 15:11 review
fix_error_message_reader_csv_alternative_1.patch vajrasky, 2013-08-25 15:18 review
fix_error_message_reader_csv_alternative_1_v2.patch vajrasky, 2013-08-26 02:06 review
fix_error_message_reader_csv_alternative_1_v3.patch vajrasky, 2013-09-05 09:22 review
fix_error_message_reader_csv_alternative_1_v4.patch serhiy.storchaka, 2013-09-05 13:55 review
fix_error_message_reader_csv_alternative_1_v5.patch vajrasky, 2013-09-05 14:45 review
fix_error_message_reader_csv_alternative_1_v6.patch serhiy.storchaka, 2013-09-05 15:34 review
fix_error_message_reader_csv_alternative_1_v7.patch vajrasky, 2013-09-07 11:04 review
fix_error_message_reader_csv_alternative_1_v8.patch vajrasky, 2013-09-08 15:10 review
Messages (23)
msg196126 - (view) Author: Thibault Kruse (Thibault.Kruse) Date: 2013-08-25 12:56
To reproduce

$ python --version
Python 2.7.3
$ python -c 'from __future__ import unicode_literals; import csv; reader = csv.reader("foo", delimiter=",")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: "delimiter" must be an 1-character string

This is confusing because "," is like a 1-character (unicode) string, while csv requires a 1-character (byte) string. In python3, it's the inverse problem, using bytestring b','

$ python3 --version
Python 3.2.3
$ python3 -c 'import csv; reader = csv.reader("foo", delimiter=b",")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: "delimiter" must be an 1-character string


So I believe csv should state more clearly what went wrong here in each case, or what kind of string csv is willing to accept in the respective python version.

Whether the python3 refusal to use a bytestring is even reasonable I don't know.

This might or might not be related to the issue of multichar delimiters: http://bugs.python.org/issue15158  (just sayin)
msg196131 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-25 13:49
I agree that the 2.7 message is somewhat confusing, but I'm not sure it is worth changing at this point in 2.7's life cycle.  

For Python3, the message is correct and unambiguous: a bytes object is not a string.

However, in 3.3, we have a regression:

rdmurray@hey:~/python/p34>python3.3 -c 'import csv; reader = csv.reader("foo", delimiter=b",")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: delimiter must be set

So that needs to be tracked down and fixed.
msg196132 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-25 13:51
I forgot to address the comment about accepting bytes in python3: the delimiter really is a unicode character.  In python3, non-ASCII delimiters are handled correctly.  So no, it isn't a byte anymore, it really is a string.  Terry's comment about 'char' in the other issue does not apply to Python3.
msg196141 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-25 15:11
This is the preliminary patch to fix the error message.
msg196145 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-25 15:18
This is the second alternative of the patch to fix the error message using different approach. I am not sure which one is better.
msg196171 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-26 02:06
I realized there was a test that was similar with my test. I merged them into one test.

I think the alternative is better than the original patch so I updated the alternative patch only.
msg196212 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-26 15:26
Apparently, other attributes of the csv dialect beside delimiter, such as escapechar and quotechar share the same problem.

>>> import _csv
>>> _csv.reader('foo', quotechar=b'"')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled
>>> _csv.reader('foo', escapechar=b'+')
<_csv.reader object at 0x7fa85d7847d0>Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad argument type for built-in operation

My patch already fixed the problem, only lacked the unit test. But since this ticket is about delimiter, I'll create the unit test for quotechar and escapechar in separate ticket after this ticket has been closed.
msg196944 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-09-04 18:59
IMHO the error should say """TypeError: "delimiter" must be an 1-character string, not bytes""".
We could also have two separate errors for "wrong type" and "right type, wrong length" (this would be a ValueError though).
msg196986 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-05 09:22
Attached the patch to accomodate Ezio Melotti's request. The patch now have two separate errors for "wrong type" and "right type, wrong length".

Don't we break compatibility by changing the type of exception from TypeError to ValueError?
msg196997 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-05 13:55
A comment in csv.py says about compatibility with 2.3! After adding ValueError to the list of catched exceptions we can keep this compatibility for future generations. Here is a patch. I also have resorted the tests a little.
msg197002 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-05 14:45
Same as fix_error_message_reader_csv_alternative_1_v4.patch (by Serhiy), but I removed unused variables in the test.
msg197007 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-05 15:34
I rather prefer adding new tests.
msg197147 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-07 11:04
I noticed in CPython source code, when we print the type of the object, we use "%.200s" not "%s". For example, in Modules/posixmodule.c:

"uid should be integer, not %.200s",

So the newest path was created to conform with this tradition.

Another thing I noticed, there are two ways to get the type of object:

1. src->ob_type->tp_name
2. Py_TYPE(src)->tp_name

Both are widely used. So I am not sure which is the best way. I leave it alone this issue.
msg197172 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-07 16:38
When specify an empty string error message is confusing too:

$ ./python -c 'import csv; reader = csv.reader("foo", delimiter="")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: delimiter must be set
msg197249 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-08 11:10
Well, what about None?

$ python3 -c 'import csv; reader = csv.reader("foo", delimiter=None)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: delimiter must be set

English grammatically speaking, we should get this kind of error:
ValueError: "delimiter" must be string, not None

But computer science-ly speaking, the exception message "delimiter must be set" is correct because setting null or None value to a variable can be considered as same as unsetting that variable, hence error message "must be set".

And I would argue the empty string can be considered as one of a kind with None in this case.

But we'll see other people's opinions. I am also not sure about this case.
msg197261 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-08 14:28
"delimiter must be a 1 character string" would cover it.
msg197265 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-08 15:10
David R. Murray said, '"delimiter must be a 1 character string" would cover it.'

You mean

$ ./python -c 'import csv; reader = csv.reader("foo", delimiter="")' 

should give this error '"delimiter" must be a 1 character string'?

Attached the patch to accommodate your request.

I found out that:

$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled

This is not consistent with the cure. But since this ticket is about delimiter, we keep the status-quo about quotechar for now. After the patch is committed, we can open the ticket about the quotechar inconsistency.

The fix for this quotechar is not straightforward, though, because of the quoting condition.
msg197266 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-08 15:12
Sorry for typing your name wrongly.

s/David R. Murray/R. David Murray/
msg197271 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-08 15:39
After contemplating for a while, I am not sure that we should treat empty string for delimiter with error message: '"delimiter" must be an 1-character string'.

The reason is to keep consistency with quotechar keyword.

[sky@localhost cpython]$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: quotechar must be set if quoting enabled

[sky@localhost cpython]$ ./python -c 'import csv; reader = csv.reader("foo", quotechar="", quoting=csv.QUOTE_NONE)'
# No error, so implicitly quotechar is not set, but it is okay since we use "quote none" quoting.
# In other word, quotechar is not set by giving empty string to quotechar.

We could not change the behaviour of csv module, which is unsetting quotechar by giving empty string to quotechar and let it be if we use "quote none" quoting, to preserve backward compatibility.

So the error message should be, "The delimiter must be set" for empty string for delimiter, I suppose.
msg197273 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-08 15:57
Parsing a csv file with no delimiter would seem to be meaningless, unless I'm misunderstanding what 'delimeter' controls.  So the error messages for delimiter and quotechar are necessarily different.
msg206613 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-19 14:30
New changeset 5ed75e36be8e by Serhiy Storchaka in branch '2.7':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/5ed75e36be8e

New changeset 52d03fbdf67a by Serhiy Storchaka in branch '3.3':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/52d03fbdf67a

New changeset 6b17803bfddd by Serhiy Storchaka in branch 'default':
Issue #18829: csv.Dialect() now checks type for delimiter, escapechar and
http://hg.python.org/cpython/rev/6b17803bfddd
msg206614 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-19 14:35
Thank you Vajrasky for your patch. I have simplified and fixed (escapechar can be empty) it. Reverted ValueError back to TypeError because ord() raises TypeError for non-1-character strings.
msg207158 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-02 11:54
New changeset 0daf7f02c97f by Victor Stinner in branch '3.3':
Issue #18829: Add tests for the csv module for invalid characters (delimiter,
http://hg.python.org/cpython/rev/0daf7f02c97f

New changeset ccb52323039f by Victor Stinner in branch 'default':
(Merge 3.3) Issue #18829: Add tests for the csv module for invalid characters
http://hg.python.org/cpython/rev/ccb52323039f
History
Date User Action Args
2014-01-02 11:54:03python-devsetmessages: + msg207158
2013-12-19 14:36:18serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-12-19 14:35:44serhiy.storchakasetmessages: + msg206614
2013-12-19 14:30:25python-devsetnosy: + python-dev
messages: + msg206613
2013-12-19 11:01:00serhiy.storchakasetassignee: serhiy.storchaka
2013-12-19 10:58:44serhiy.storchakalinkissue20023 superseder
2013-09-08 15:57:21r.david.murraysetmessages: + msg197273
2013-09-08 15:39:49vajraskysetmessages: + msg197271
2013-09-08 15:12:27vajraskysetmessages: + msg197266
2013-09-08 15:10:16vajraskysetfiles: + fix_error_message_reader_csv_alternative_1_v8.patch

messages: + msg197265
2013-09-08 14:28:18r.david.murraysetmessages: + msg197261
2013-09-08 11:10:45vajraskysetmessages: + msg197249
2013-09-07 16:38:41serhiy.storchakasetassignee: serhiy.storchaka -> (no value)
messages: + msg197172
2013-09-07 11:04:09vajraskysetfiles: + fix_error_message_reader_csv_alternative_1_v7.patch

messages: + msg197147
2013-09-05 15:34:48serhiy.storchakasetfiles: + fix_error_message_reader_csv_alternative_1_v6.patch

messages: + msg197007
2013-09-05 14:45:15vajraskysetfiles: + fix_error_message_reader_csv_alternative_1_v5.patch

messages: + msg197002
2013-09-05 13:55:01serhiy.storchakasetfiles: + fix_error_message_reader_csv_alternative_1_v4.patch

messages: + msg196997
2013-09-05 09:22:24vajraskysetfiles: + fix_error_message_reader_csv_alternative_1_v3.patch

messages: + msg196986
2013-09-04 18:59:00ezio.melottisetnosy: + ezio.melotti
messages: + msg196944
2013-08-26 15:26:48vajraskysetmessages: + msg196212
2013-08-26 02:06:22vajraskysetfiles: + fix_error_message_reader_csv_alternative_1_v2.patch

messages: + msg196171
2013-08-25 15:33:23serhiy.storchakasetassignee: serhiy.storchaka

nosy: + serhiy.storchaka
stage: needs patch -> patch review
2013-08-25 15:18:45vajraskysetfiles: + fix_error_message_reader_csv_alternative_1.patch

messages: + msg196145
2013-08-25 15:11:00vajraskysetfiles: + fix_error_message_reader_csv.patch

nosy: + vajrasky
messages: + msg196141

keywords: + patch
2013-08-25 13:51:43r.david.murraysetmessages: + msg196132
2013-08-25 13:49:21r.david.murraysettitle: csv produces confusing error message for unexpected string encoding -> csv produces confusing error message when passed a non-string delimiter

keywords: + easy, 3.3regression
nosy: + r.david.murray
versions: + Python 3.3, Python 3.4, - Python 3.2
messages: + msg196131
stage: needs patch
2013-08-25 12:56:23Thibault.Krusecreate