classification
Title: Dialect class defaults are not documented.
Type: behavior Stage: needs patch
Components: Documentation Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: MiK, berker.peksag, docs@python, iritkatriel, jbmilam, r.david.murray, skip.montanaro
Priority: normal Keywords: easy

Created on 2015-05-09 01:05 by MiK, last changed 2021-03-22 16:00 by iritkatriel.

Files
File name Uploaded Description
sans_headers.csv MiK, 2015-05-09 16:29
quotebug.py skip.montanaro, 2015-05-17 12:54
csv_dialect_doc_clarify.patch jbmilam, 2015-05-29 19:28 Document clarification
csv.html jbmilam, 2015-05-29 19:29 html file holding the changes
Messages (13)
msg242787 - Author: Mik (MiK) Date: 2015-05-09 01:05
Python 2.7.3 (default, Mar 13 2014, 11:03:55) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class Mon(csv.Dialect):
...     delimiter = ','
...     quotechar = '"'
...     quoting = 0
...     lineterminator = '\n'
... 
>>> f = open('sans_headers.csv','r')
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), dialect=Mon)
>>> for l in reader:
...     print l
... 
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop"";newline'}
{'nom': None, 'code': 'I\'m not a cat"', 'texte': None}
>>> f.seek(0)
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), delimiter=',', quotechar='"', quoting=0, lineterminator='\n')
>>> for l in reader:
...     print l
... 
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop";newline\nI\'m not a cat'}
>>> 

If I use a subclass of csv.Dialect with the same attributes that I pass as keyword arguments to csv.DictReader, I don't get the same behaviour.
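A Python 3 sketch of the same discrepancy, for readers without the attached file. The one-row sample is an assumed input, not the contents of sans_headers.csv; `Mon` mirrors the reporter's class, with `csv.QUOTE_MINIMAL` standing in for the literal `0`.

```python
import csv
import io

# Mirrors the reporter's subclass: doublequote is NOT set here, so it is
# inherited from csv.Dialect, where it is the placeholder None.
class Mon(csv.Dialect):
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL  # same value as the reporter's 0
    lineterminator = '\n'

# Assumed sample row: the second field contains doubled (escaped) quotes.
sample = 'a,"say ""hi"""\n'

with_class = next(csv.reader(io.StringIO(sample), dialect=Mon))
with_kwargs = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"',
                              quoting=csv.QUOTE_MINIMAL, lineterminator='\n'))

print(with_class)   # the doubled quotes are not collapsed
print(with_kwargs)  # ['a', 'say "hi"']
```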
msg242807 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2015-05-09 12:00
Can you attach your cab file so we don't need to reconstruct it (and possibly make a mistake) by reading your program's output?
msg242811 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2015-05-09 13:19
Sorry, failed to override my phone's spell correction. "cab" should be "csv".
msg242818 - Author: Mik (MiK) Date: 2015-05-09 16:29
Hi,

This is the file used for my test.

Thank you,

Regards,

Mik
msg243396 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2015-05-17 12:54
In your Mon class, you've left the doublequote parameter unset, so it keeps csv.Dialect's placeholder value of None. A Dialect subclass completely replaces the default dialect rather than modifying it. When no dialect is given, the default is csv.excel, which sets doublequote to True. In your second example, the reader starts with csv.excel and then selectively overrides the named attributes.
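The explanation above can be checked directly. This sketch (assuming Python 3, and reusing the reporter's `Mon` attributes) prints the class-level values on `csv.Dialect` and `csv.excel`, then inspects the read-only `dialect` attribute of each reader to see which `doublequote` value was actually used.

```python
import csv
import io

ATTRS = ('delimiter', 'quotechar', 'escapechar', 'doublequote',
         'skipinitialspace', 'lineterminator', 'quoting')

# Class-level values: placeholders on the base class versus the excel dialect.
for name in ATTRS:
    print(f'{name:17} Dialect={getattr(csv.Dialect, name)!r:8} '
          f'excel={getattr(csv.excel, name)!r}')

class Mon(csv.Dialect):  # same attributes as the reporter's class
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'

# Each reader exposes the dialect it actually built.
from_class = csv.reader(io.StringIO(''), dialect=Mon).dialect
from_kwargs = csv.reader(io.StringIO(''), delimiter=',', quotechar='"',
                         lineterminator='\n').dialect
print(from_class.doublequote)   # False: the None placeholder became a boolean
print(from_kwargs.doublequote)  # True: the excel default is kept
```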
msg243397 - Author: Mik (MiK) Date: 2015-05-17 13:06
OK, thanks.

But perhaps the documentation of csv.Dialect should be updated with the default parameters. If all attributes must be specified, the documentation should say so.
msg243399 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-17 13:18
Yes, I think the documentation should be improved.
msg243402 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2015-05-17 14:50
The defaults for the Dialect class are documented:

https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters

I think the problem is mostly that csv.Dialect must be subclassed. You can't use it directly, and if you subclass it as MiK did, you have to supply all the missing parameters. The default dialect is actually csv.excel, which does provide a suitable set of values for all attributes.
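A minimal sketch of the two points above, assuming Python 3: instantiating `csv.Dialect` directly fails validation, while subclassing `csv.excel` inherits a complete attribute set, so only the differing values need to be overridden. `Mon` and the sample row are illustrative.

```python
import csv
import io

# csv.Dialect itself cannot be instantiated: its attributes are None
# placeholders, so validation fails and csv.Error is raised.
try:
    csv.Dialect()
    base_rejected = False
except csv.Error as exc:
    base_rejected = True
    print('csv.Dialect() rejected:', exc)

# Subclassing csv.excel inherits a complete, valid set of attributes;
# only the values that differ need to be overridden.
class Mon(csv.excel):
    lineterminator = '\n'

Mon()  # validates without error
row = next(csv.reader(io.StringIO('a,"say ""hi"""\n'), dialect=Mon))
print(row)  # ['a', 'say "hi"']
```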

There actually might be a bug lurking in the code as well. The value of csv.Dialect.doublequote is None, which will evaluate to False in a boolean context. The module docstring has this to say about that attribute:

        * doublequote - controls the handling of quotes inside fields.  When
            True, two consecutive quotes are interpreted as one during read,
            and when writing, each quote character embedded in the data is
            written as two quotes

Since the valid values of that attribute are actually only True and False, using None as a default value is an invitation to problems. It appears in this case that's what happened.

csv.Dialect.__init__ doesn't seem to check that the subclass properly sets all the required parameters. It checks whether the class is Dialect itself; if not, and if the validate() call passes, all is assumed to be well. But digging a bit under the surface, the validate step appears to drop into C, where the doublequote attribute of Dialect_Type ends up as 0.

I'm not sure the bug should be fixed in 2.7, but it's worth taking a look at the 3.5 code to see if that validation step can be improved.
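One way a stricter check could look, as a user-level sketch only: `check_boolean_attrs` is a hypothetical helper, not part of the csv module, and it does not reflect what the C validation actually does. It rejects a dialect whose Boolean attributes are not real `bool` values, which would have caught the `None` inherited by `Mon`.

```python
import csv

# Hypothetical helper (not part of the csv module): reject dialects whose
# Boolean attributes are not real bools, e.g. a doublequote left at the
# None placeholder inherited from csv.Dialect.
BOOL_ATTRS = ('doublequote', 'skipinitialspace')

def check_boolean_attrs(dialect):
    for name in BOOL_ATTRS:
        value = getattr(dialect, name, None)
        if not isinstance(value, bool):
            raise TypeError(f'{name!r} must be True or False, got {value!r}')
    return dialect

class Mon(csv.Dialect):  # doublequote and skipinitialspace left unset
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'

Mon()  # the built-in validation accepts this silently
try:
    check_boolean_attrs(Mon)
except TypeError as exc:
    print(exc)  # 'doublequote' must be True or False, got None

check_boolean_attrs(csv.excel)  # passes: excel sets both explicitly
```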
msg244344 - Author: Brandon Milam (jbmilam) * Date: 2015-05-28 20:29
Hi all,

I've been looking at this bug and am ready to start putting some work into it, but I have some questions about what needs to be done. From what I can tell, these are the possible tasks for this issue.

- Add to the dialect section of the docs a comparison of the excel attributes with the Dialect class attributes, and explain that the excel dialect is the default and that this is the behavior you'd be changing by creating a new dialect.

- Add code to make sure that a certain set of attributes is set before the dialect can be accessed. (Though this might be C code; I'm not really a C programmer, nor do I know where _Dialect is in the repository.)

- Change the defaults in the Dialect class, because the documentation for doublequote and skipinitialspace currently gives a Boolean default while in the code it is None. Also, I did not find the strict attribute in the module at all (maybe it's part of the C code that I don't know how to find).

- Add an example of subclassing Dialect to the documentation, under the example on registering a new dialect.

If someone could clarify which of these is the desired direction for this issue it would be much appreciated.
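An example of the kind proposed in the last bullet might pair a fully specified `csv.Dialect` subclass with `csv.register_dialect`. This is a sketch, not proposed documentation text; `Mon` and the registered name `'mon'` are placeholders, and the attribute values copy `csv.excel` except for the line terminator.

```python
import csv
import io

# A direct csv.Dialect subclass must spell out every attribute, because the
# base class only provides None placeholders. These values copy csv.excel,
# except for the line terminator.
class Mon(csv.Dialect):
    delimiter = ','
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = False
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect('mon', Mon)  # 'mon' is an arbitrary example name

buf = io.StringIO()
csv.writer(buf, dialect='mon').writerow(['a', 'say "hi"'])
print(repr(buf.getvalue()))  # 'a,"say ""hi"""\n'

row = next(csv.reader(io.StringIO(buf.getvalue()), dialect='mon'))
print(row)  # ['a', 'say "hi"']
```

Because every attribute is set explicitly, the dialect no longer depends on the placeholder values of `csv.Dialect`.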
msg244347 - Author: Mik (MiK) Date: 2015-05-28 21:24
Hi,

I have just read the documentation once again.

The problem is that it states that the default value for Dialect.doublequote is True:
<quote>Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.</quote>

So it is easy to assume that the csv.Dialect class implements this default value. However, the default dialect when calling csv.reader is "excel", so the attributes described in the above paragraph are, implicitly, those of the default class csv.excel.

It would be great in this case to describe the attributes of the base class Dialect, or to specify that all attributes must be set when subclassing it.

Optionally, it would be good to change the code of csv.Dialect to use real Boolean values. But clarifying the documentation is more important, I think.
msg244400 - Author: Brandon Milam (jbmilam) * Date: 2015-05-29 19:28
Here I added to the Dialects and Formatting Parameters paragraph, explaining that the listed defaults are those of the excel dialect and that all the attributes need to be specified if the user wants to create a custom dialect by subclassing. I will also include the HTML file this produces for those who do not want to look at the .rst file.

If desired, I can also change the defaults of the Dialect class for the parameters that expect Boolean values, but I would open a separate issue for that.

Let me know if there are any errors or desired changes in the documentation change.
msg244407 - Author: Mik (MiK) Date: 2015-05-29 20:03
I think it's clearer that way. Thank you.
msg389327 - Author: Irit Katriel (iritkatriel) * (Python triager) Date: 2021-03-22 16:00
Brandon's patch has not been applied; it needs to be converted into a GitHub pull request.
History
Date User Action Args
2021-03-22 16:00:51 iritkatriel set versions: + Python 3.10, - Python 3.4, Python 3.5
  nosy: + iritkatriel
  messages: + msg389327
  keywords: + easy, - patch
2016-04-27 02:36:58 berker.peksag set nosy: + berker.peksag
2015-05-29 20:03:06 MiK set messages: + msg244407
2015-05-29 19:29:23 jbmilam set files: + csv.html
2015-05-29 19:28:39 jbmilam set files: + csv_dialect_doc_clarify.patch
  keywords: + patch
  messages: + msg244400
2015-05-28 21:24:51 MiK set messages: + msg244347
2015-05-28 20:29:03 jbmilam set nosy: + jbmilam
  messages: + msg244344
2015-05-17 14:50:32 skip.montanaro set messages: + msg243402
2015-05-17 13:18:29 r.david.murray set status: closed -> open
  assignee: docs@python
  stage: needs patch
  title: doublequote are not well recognized with Dialect class -> Dialect class defaults are not documented.
  nosy: + r.david.murray, docs@python
  versions: + Python 3.4, Python 3.5, - Python 2.7
  messages: + msg243399
  components: + Documentation, - Library (Lib)
  resolution: not a bug ->
2015-05-17 13:06:13 MiK set messages: + msg243397
2015-05-17 12:54:17 skip.montanaro set status: open -> closed
  files: + quotebug.py
  resolution: not a bug
  messages: + msg243396
2015-05-09 16:29:21 MiK set files: + sans_headers.csv
  messages: + msg242818
2015-05-09 13:19:10 skip.montanaro set messages: + msg242811
2015-05-09 12:00:09 skip.montanaro set nosy: + skip.montanaro
  messages: + msg242807
2015-05-09 01:05:19 MiK create