Issue 24147: Dialect class defaults are not documented.


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68335

classification

Title:	Dialect class defaults are not documented.
Type:	behavior	Stage:	patch review
Components:	Documentation	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	MiK, berker.peksag, docs@python, iritkatriel, jbmilam, r.david.murray, skip.montanaro, uniocto
Priority:	normal	Keywords:	easy, patch

Created on 2015-05-09 01:05 by MiK, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description
sans_headers.csv	MiK, 2015-05-09 16:29
quotebug.py	skip.montanaro, 2015-05-17 12:54
csv_dialect_doc_clarify.patch	jbmilam, 2015-05-29 19:28	Document clarification	review
csv.html	jbmilam, 2015-05-29 19:29	html file holding the changes

Pull Requests
URL	Status	Linked
PR 25989	open	uniocto, 2021-05-08 12:23

Messages (14)
msg242787 - (view)	Author: Mik (MiK)	Date: 2015-05-09 01:05
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class Mon(csv.Dialect):
...     delimiter = ','
...     quotechar = '"'
...     quoting = 0
...     lineterminator = '\n'
...
>>> f = open('sans_headers.csv','r')
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), dialect=Mon)
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop"";newline'}
{'nom': None, 'code': 'I\'m not a cat"', 'texte': None}
>>> f.seek(0)
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), delimiter=',', quotechar='"', quoting=0, lineterminator='\n')
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop";newline\nI\'m not a cat'}
>>>
If I use a subclass of csv.Dialect with the same attributes that I would otherwise pass as keyword arguments to csv.DictReader, I don't get the same behaviour.
msg242807 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2015-05-09 12:00
Can you attach your cab file so we don't need to reconstruct it (and possibly make a mistake) by reading your program's output?
msg242811 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2015-05-09 13:19
Sorry, failed to override my phone's spell correction. "cab" should be "csv".
msg242818 - (view)	Author: Mik (MiK)	Date: 2015-05-09 16:29
Hi, this is the file used for my test. Thank you, regards, Mik
msg243396 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2015-05-17 12:54
In your Mon class, you've left the doublequote parameter unset (it defaults to None). Because you pass Mon as the dialect, it completely overrides the default dialect. When no Dialect class is given, the default is csv.excel. Note that doublequote is set to True in csv.excel. In your second example, the reader starts with csv.excel, then selectively overrides the named attributes.
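The difference described above can be reproduced with a short Python 3 sketch. The sample row is made up for illustration and is not the content of sans_headers.csv; the helper name read_rows is likewise an assumption. The class leaves doublequote unset, while the keyword form starts from csv.excel and overrides only the named attributes.

```python
import csv
import io

# Made-up sample row (not taken from sans_headers.csv): the third field
# contains a quote character written as two consecutive quotes.
SAMPLE = '5,line_2,"a quote ""iop"" here"\n'


class Mon(csv.Dialect):
    # doublequote is deliberately left unset, as in the report, so it
    # keeps the placeholder value inherited from csv.Dialect (None).
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL  # same value as quoting = 0
    lineterminator = '\n'


def read_rows(text, **kwargs):
    """Hypothetical helper: parse text with csv.reader, return all rows."""
    return list(csv.reader(io.StringIO(text, newline=''), **kwargs))


# The class replaces the default dialect entirely.
rows_with_class = read_rows(SAMPLE, dialect=Mon)

# The keywords override individual attributes of the default csv.excel.
rows_with_keywords = read_rows(
    SAMPLE,
    delimiter=',',
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator='\n',
)

print(rows_with_class)
print(rows_with_keywords)
```

Only the keyword form collapses the doubled quotes into a single quote character; the two results differ even though the same four attributes were named in both cases.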
msg243397 - (view)	Author: Mik (MiK)	Date: 2015-05-17 13:06
OK, thanks. But perhaps the documentation of csv.Dialect could be updated with the default parameters. If all attributes must be specified, this should be indicated in the docs.
msg243399 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-17 13:18
Yes, I think the documentation should be improved.
msg243402 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2015-05-17 14:50
The defaults for the Dialect class are documented: https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters

I think the problem is mostly that csv.Dialect must be subclassed. You can't use it directly, and if you subclass it as MiK did, you have to supply all the missing parameters. The default dialect is actually csv.excel, which does provide a suitable set of values for all attributes.

There actually might be a bug lurking in the code as well. The value of csv.Dialect.doublequote is None, which will evaluate to False in a boolean context. The module docstring has this to say about that attribute:

* doublequote - controls the handling of quotes inside fields. When True, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes

Since the valid values of that attribute are actually only True and False, using None as a default value is an invitation to problems. It appears in this case that's what happened.

csv.Dialect.__init__ doesn't seem to check that the overriding class properly sets all the required parameters. It checks to see if the class is Dialect. If not, and if the validate() call passes, all is assumed to be well. But digging a bit under the surface, it appears the validate step drops into C, where the doublequote attribute of Dialect_Type is 0.

I'm not sure the bug should be fixed in 2.7, but it's worth taking a look at the 3.5 code to see if that validation step can be improved.
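The placeholder values and the validation gap can be inspected directly. This is a minimal sketch that assumes a Python 3 interpreter in which csv.Dialect still uses None placeholders; the class names Empty and Partial are made up for illustration.

```python
import csv

# Placeholder on the base class versus the real value on csv.excel.
print(csv.Dialect.doublequote)  # None
print(csv.excel.doublequote)    # True


class Empty(csv.Dialect):
    """Hypothetical subclass that sets nothing at all."""


# Instantiating a dialect class runs its validation step.
try:
    Empty()
    empty_is_rejected = False
except csv.Error:
    empty_is_rejected = True
print(empty_is_rejected)


class Partial(csv.Dialect):
    """Hypothetical subclass that sets some attributes; doublequote stays None."""
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


# Validation raises no error here, although doublequote was never set.
Partial()

# The reader exposes the dialect it actually uses.
effective = csv.reader([], dialect=Partial).dialect.doublequote
print(bool(effective))
```

A subclass that omits everything is rejected, but one that omits only the boolean attributes passes validation and ends up with a false doublequote rather than the True value that csv.excel provides.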
msg244344 - (view)	Author: Brandon Milam (jbmilam) *	Date: 2015-05-28 20:29
Hi all, I've been looking at this bug and am ready to start putting in some work on it, but I have some questions about what needs to be done. From what I can tell, these are the possible tasks for this issue.
- Add to the docs, under the dialect section, the excel attributes vs. the dialect class attributes, and explain that the excel dialect is the default and that this is the functionality you'd be changing by creating a new dialect.
- Add code to make sure that a certain number of attributes are set before the dialect can be accessed. (Though this might be C code, and I'm not really a C programmer, nor do I know where _Dialect is in the repository.)
- Change the defaults in the Dialect class, because currently the documentation for "double quote" and "skip initial space" says that the default is False when in the code it is None. Also, I did not find the "strict" dialect in the module at all. (Maybe it's part of that C code that I don't know how to find.)
- Add an example to the documentation on sub-classing Dialect, under the example on registering a new dialect.
If someone could clarify which of these is the desired direction for this issue, it would be much appreciated.
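The second task in the list above could also be approximated in pure Python, without touching the C code. The sketch below is one possible shape for such a check, not what the csv module does: the helper unset_dialect_attributes and the tuple CHECKED_ATTRIBUTES are hypothetical names, and quotechar and escapechar are left out on the assumption that None can be a legitimate value for them.

```python
import csv

# Hypothetical list of attributes that should not be left at the
# csv.Dialect placeholder value; this is an assumption, not csv API.
CHECKED_ATTRIBUTES = (
    'delimiter',
    'lineterminator',
    'quoting',
    'doublequote',
    'skipinitialspace',
)


def unset_dialect_attributes(dialect_class):
    """Return the checked attribute names that are still None."""
    return [
        name
        for name in CHECKED_ATTRIBUTES
        if getattr(dialect_class, name, None) is None
    ]


class Mon(csv.Dialect):
    # Same attributes as in the original report.
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


print(unset_dialect_attributes(Mon))
print(unset_dialect_attributes(csv.excel))
```

For the class from the report, the helper flags the two boolean attributes; for csv.excel it returns an empty list.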
msg244347 - (view)	Author: Mik (MiK)	Date: 2015-05-28 21:24
Hi, I have just read the documentation once again. The problem is that it specifies that the default value for Dialect.doublequote is True: <quote>Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.</quote> So it is easy to assume that the class csv.Dialect implements this default value. However, the default dialect when calling csv.reader is "excel", so it is implicitly csv.excel, not the base class, whose attributes are described in the above paragraph. It would be great in this case to describe the attributes of the base class Dialect, or to specify that all attributes must be set when we subclass it. Optionally, it would be good for the code of csv.Dialect to be changed to use real Boolean values. But the clarification of the documentation is more important, I think.
msg244400 - (view)	Author: Brandon Milam (jbmilam) *	Date: 2015-05-29 19:28
Here I added on to the Dialects and Formatting Parameters paragraph, explaining that the defaults listed are for the excel dialect and that all the attributes need to be specified if the user wants to create custom dialects through sub-classing. I will also include the html file this produces for those who do not want to look at the .rst file. Also, I can go in and change the defaults of the Dialect class for the parameters that expect Boolean values if desired, but I would open a separate issue for it. Let me know if there are any errors or desired changes in the documentation change.
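Two ways of writing a complete custom dialect, in line with the clarification described above, might look as follows. This is a sketch only: the class names, the registered dialect names, and the sample line are all made up for illustration.

```python
import csv
import io


class FullySpecified(csv.Dialect):
    """Sets every formatting attribute explicitly."""
    delimiter = ','
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = False
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL


class ExcelBased(csv.excel):
    """Inherits the csv.excel values and overrides only what differs."""
    lineterminator = '\n'


# Hypothetical registry names, chosen for this example.
csv.register_dialect('fully_specified', FullySpecified)
csv.register_dialect('excel_based', ExcelBased)

# Made-up sample line with a doubled quote inside a quoted field.
line = '5,line_2,"a quote ""iop"" here"\n'

parsed = {
    name: next(csv.reader(io.StringIO(line, newline=''), dialect=name))
    for name in ('fully_specified', 'excel_based')
}
print(parsed)

# Clean up the registry so the example leaves no global state behind.
csv.unregister_dialect('fully_specified')
csv.unregister_dialect('excel_based')
```

Subclassing csv.excel keeps the class short and avoids the unset-attribute trap, while the fully specified form makes every value visible at the cost of some repetition.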
msg244407 - (view)	Author: Mik (MiK)	Date: 2015-05-29 20:03
I think it's clearer that way. Thank you.
msg389327 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-03-22 16:00
Brandon's patch has not been applied; it needs to be converted into a git PR.
msg393254 - (view)	Author: So Ukiyama (uniocto) *	Date: 2021-05-08 12:32
I created a PR which applies Brandon Milam's patch. If this comes across as rude, I hope you will forgive me, and I can take it down.

History
Date	User	Action	Args
2022-04-11 14:58:16	admin	set	github: 68335
2021-05-08 12:32:59	uniocto	set	messages: + msg393254
2021-05-08 12:23:54	uniocto	set	keywords: + patch nosy: + uniocto pull_requests: + pull_request24641 stage: needs patch -> patch review
2021-03-22 16:00:51	iritkatriel	set	versions: + Python 3.10, - Python 3.4, Python 3.5 nosy: + iritkatriel messages: + msg389327 keywords: + easy, - patch
2016-04-27 02:36:58	berker.peksag	set	nosy: + berker.peksag
2015-05-29 20:03:06	MiK	set	messages: + msg244407
2015-05-29 19:29:23	jbmilam	set	files: + csv.html
2015-05-29 19:28:39	jbmilam	set	files: + csv_dialect_doc_clarify.patch keywords: + patch messages: + msg244400
2015-05-28 21:24:51	MiK	set	messages: + msg244347
2015-05-28 20:29:03	jbmilam	set	nosy: + jbmilam messages: + msg244344
2015-05-17 14:50:32	skip.montanaro	set	messages: + msg243402
2015-05-17 13:18:29	r.david.murray	set	status: closed -> open assignee: docs@python stage: needs patch title: doublequote are not well recognized with Dialect class -> Dialect class defaults are not documented. nosy: + r.david.murray, docs@python versions: + Python 3.4, Python 3.5, - Python 2.7 messages: + msg243399 components: + Documentation, - Library (Lib) resolution: not a bug ->
2015-05-17 13:06:13	MiK	set	messages: + msg243397
2015-05-17 12:54:17	skip.montanaro	set	status: open -> closed files: + quotebug.py resolution: not a bug messages: + msg243396
2015-05-09 16:29:21	MiK	set	files: + sans_headers.csv messages: + msg242818
2015-05-09 13:19:10	skip.montanaro	set	messages: + msg242811
2015-05-09 12:00:09	skip.montanaro	set	nosy: + skip.montanaro messages: + msg242807
2015-05-09 01:05:19	MiK	create