Issue 32919: csv.reader() to support QUOTE_ALL


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77100

classification

Title:	csv.reader() to support QUOTE_ALL
Type:	enhancement	Stage:
Components:	Extension Modules	Versions:	Python 3.8, Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Pavel Shpilev, r.david.murray
Priority:	normal	Keywords:

Created on 2018-02-23 04:55 by Pavel Shpilev, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg312617 - (view)	Author: Pavel Shpilev (Pavel Shpilev)	Date: 2018-02-23 04:55
It appears that in the current implementation csv.QUOTE_ALL has no effect on csv.reader(); it only affects csv.writer(). I know that CSV is a poorly defined format, but I think it would be useful to be able to distinguish None and '' values for sources that use such quoting. Example:

"1","Noneval",,"9"
"2","Emptystr","","10"
"3","somethingelse","","8"

The reader converts all values in the third column to empty strings. The suggestion is to adjust the reader's behaviour so that quoting=csv.QUOTE_ALL instructs it to convert unquoted empty values (like the one in the first row) to None instead.
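The behaviour described in this message can be reproduced with a short script. This is only a sketch built from the three example rows quoted above; the variable names are made up for illustration.

```python
import csv
import io

# The three example rows from the report, one record per line.
data = (
    '"1","Noneval",,"9"\n'
    '"2","Emptystr","","10"\n'
    '"3","somethingelse","","8"\n'
)

# Parse once with the default quoting and once with QUOTE_ALL.
rows_default = list(csv.reader(io.StringIO(data)))
rows_quote_all = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_ALL))

# Third column of every row: the unquoted empty field and the quoted
# empty string both come back as ''.
third_column_default = [row[2] for row in rows_default]
third_column_quote_all = [row[2] for row in rows_quote_all]

print(third_column_default)    # ['', '', '']
print(third_column_quote_all)  # ['', '', '']
```

Both runs give the same rows, so passing QUOTE_ALL to the reader changes nothing for this input.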
msg313194 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-03-03 20:51
QUOTE_ALL only makes sense as an output control parameter, IMO. It is an output discipline but doesn't say anything about semantics. In csv format, an empty field and a field containing the empty quoted string are completely equivalent. I would be -1 on adding an option that differentiated them.
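As an illustration of QUOTE_ALL acting purely as an output discipline, the following sketch writes one row containing both None and an empty string. The row contents are invented for the example.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)

# None is written as an empty string, and QUOTE_ALL then quotes it like
# every other field, so None and '' look identical in the output.
writer.writerow(["1", None, ""])

line = buffer.getvalue()
print(repr(line))  # '"1","",""\r\n'
```

The writer itself does not preserve the None/'' distinction under QUOTE_ALL, which is consistent with the view that the flag carries no semantics about field values.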
msg313301 - (view)	Author: Pavel Shpilev (Pavel Shpilev)	Date: 2018-03-06 00:42
I know that the CSV specification says an empty field and an empty string are the same; however, I still believe there is a practical use for unconventional processing of such fields. In our specific case we parse CSVs produced by Amazon Athena (based on Presto), in which NULL and empty string values are represented as above. Following the CSV spec dogmatically, there's no way to distinguish between the two, but pragmatically you can tell them apart by simply looking at the values. A brief search shows we aren't the only ones facing the issue. After giving it some more thought, I'd agree that csv.QUOTE_ALL doesn't make much sense here, but maybe an extra argument to csv.reader() would do the trick? Something like csv.reader(detect_none_values=False/True), with False being the default, and emphasis in the documentation that True goes against the CSV specification.
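Until something like the proposed option exists, the distinction can be recovered outside the csv module. The sketch below is one possible workaround, not part of the csv API: split_record and read_with_nulls are hypothetical helper names, and the code assumes comma-delimited input with double-quote quoting, doubled quotes as the escape, and no newlines embedded inside quoted fields.

```python
from typing import Iterable, Iterator, List, Optional


def split_record(line: str, delimiter: str = ",",
                 quotechar: str = '"') -> List[Optional[str]]:
    """Split one record; unquoted empty fields become None (hypothetical helper)."""
    fields: List[Optional[str]] = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quotechar:
            # Quoted field: collect characters up to the closing quote,
            # treating a doubled quote as one literal quote character.
            i += 1
            buf = []
            closed = False
            while i < n:
                ch = line[i]
                if ch == quotechar:
                    if i + 1 < n and line[i + 1] == quotechar:
                        buf.append(quotechar)
                        i += 2
                        continue
                    i += 1
                    closed = True
                    break
                buf.append(ch)
                i += 1
            if not closed or (i < n and line[i] != delimiter):
                raise ValueError(f"malformed quoted field in {line!r}")
            fields.append("".join(buf))  # a quoted "" stays ''
        else:
            # Unquoted field: read up to the next delimiter.
            start = i
            while i < n and line[i] != delimiter:
                i += 1
            raw = line[start:i]
            fields.append(raw if raw else None)  # a bare empty field is None
        if i < n:
            i += 1  # skip the delimiter and parse the next field
            continue
        return fields


def read_with_nulls(lines: Iterable[str]) -> Iterator[List[Optional[str]]]:
    """Yield parsed records from an iterable of lines (hypothetical helper)."""
    for line in lines:
        yield split_record(line.rstrip("\r\n"))


sample = [
    '"1","Noneval",,"9"\n',
    '"2","Emptystr","","10"\n',
    '"3","somethingelse","","8"\n',
]
parsed = list(read_with_nulls(sample))
print([row[2] for row in parsed])  # [None, '', '']
```

Unlike csv.reader(), this sketch does not handle multi-line fields, alternative dialects, or escape characters, so it is only suitable for input that is known to follow the assumptions listed above.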

History
Date	User	Action	Args
2022-04-11 14:58:58	admin	set	github: 77100
2018-03-06 00:42:52	Pavel Shpilev	set	messages: + msg313301
2018-03-03 20:51:59	r.david.murray	set	nosy: + r.david.murray messages: + msg313194
2018-02-23 04:55:23	Pavel Shpilev	create