This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: csv.reader() to support QUOTE_ALL
Type: enhancement
Components: Extension Modules
Versions: Python 3.8, Python 3.7
process
Status: open
Nosy List: Pavel Shpilev, r.david.murray
Priority: normal

Created on 2018-02-23 04:55 by Pavel Shpilev, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg312617 - Author: Pavel Shpilev (Pavel Shpilev) Date: 2018-02-23 04:55
It appears that in the current implementation csv.QUOTE_ALL has no effect on csv.reader(); it only affects csv.writer(). I know that CSV is a poorly defined format and all, but I think this might be useful for distinguishing None and '' values in sources that use such quoting.

Example:

"1","Noneval",,"9"
"2","Emptystr","","10"
"3","somethingelse","","8"

The reader converts all values in the third column to empty strings. The suggestion is to adjust the reader's behaviour so that quoting=csv.QUOTE_ALL instructs the reader to convert empty unquoted values (like the one in the first row) to None instead.
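A minimal reproduction of the behaviour described above, assuming the three sample rows are read from an in-memory string via io.StringIO (read_sample is a hypothetical helper for the demo). Passing quoting=csv.QUOTE_ALL to csv.reader() gives the same result as the default QUOTE_MINIMAL: both the unquoted empty field and the quoted "" come back as ''.

```python
import csv
import io

# The three sample rows from the report, held in memory for the demo.
DATA = '"1","Noneval",,"9"\n"2","Emptystr","","10"\n"3","somethingelse","","8"\n'


def read_sample(quoting):
    """Parse DATA with csv.reader using the given quoting constant."""
    return list(csv.reader(io.StringIO(DATA), quoting=quoting))


if __name__ == "__main__":
    for row in read_sample(csv.QUOTE_ALL):
        print(row)
    # ['1', 'Noneval', '', '9']
    # ['2', 'Emptystr', '', '10']
    # ['3', 'somethingelse', '', '8']
```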
msg313194 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-03-03 20:51
QUOTE_ALL only makes sense as an output control parameter, IMO.  It is an output discipline but doesn't say anything about semantics.  In csv format, an empty field and a field containing the empty quoted string are completely equivalent.  I would be -1 on adding an option that differentiated them.
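The "output discipline" point can be seen from the writer side. csv.writer() writes None as the empty string, so under QUOTE_ALL both None and '' are emitted as "" and read back identically. A small sketch using io.StringIO; write_row and round_trip are hypothetical helper names for this demo.

```python
import csv
import io


def write_row(row, quoting):
    """Serialise one row with csv.writer and return the text it produced."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()


def round_trip(row, quoting):
    """Write a row, then parse it back with csv.reader."""
    return next(csv.reader(io.StringIO(write_row(row, quoting))))


if __name__ == "__main__":
    print(repr(write_row([None, ""], csv.QUOTE_ALL)))  # '"",""\n'
    print(round_trip([None, ""], csv.QUOTE_ALL))       # ['', '']
```

Once written, the None/'' distinction is gone, which is why the format itself cannot carry it.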
msg313301 - Author: Pavel Shpilev (Pavel Shpilev) Date: 2018-03-06 00:42
I know that the CSV specification says an empty field and an empty string are the same; however, I still believe there is practical use for unconventional processing of such fields.

In our specific case, we parse CSVs produced by Amazon Athena (based on Presto), in which NULL and empty-string values are represented as above. Following the CSV spec dogmatically, there's no way to distinguish between the two, but pragmatically you can tell them apart simply by looking at the raw values: an unquoted empty field is NULL, while "" is an empty string.

A brief search shows we aren't the only ones facing this issue. After giving it some more thought, I agree that csv.QUOTE_ALL doesn't make much sense here, but maybe an extra argument to csv.reader() would do the trick? Something like csv.reader(detect_none_values=False/True), with False as the default and an emphasis in the documentation that True goes against the CSV specification.
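The proposed detect_none_values argument is only a suggestion in this thread, not an existing csv parameter. The behaviour it describes can be approximated outside the csv module. The sketch below uses hypothetical helpers, split_record_with_nulls and read_rows_with_nulls, and assumes comma-delimited records, double-quote quoting with doubled quotes as the escape, and no newlines inside quoted fields. It maps an unquoted empty field to None and a quoted "" to ''. As the proposal itself says, this deliberately departs from the CSV convention that the two are equivalent.

```python
from typing import List, Optional


def _finish(chars: List[str], quoted: bool) -> Optional[str]:
    # An unquoted field with no characters is treated as NULL (None);
    # anything else, including a quoted "", becomes a string.
    if not quoted and not chars:
        return None
    return "".join(chars)


def split_record_with_nulls(line: str, delimiter: str = ",",
                            quotechar: str = '"') -> List[Optional[str]]:
    """Hypothetical helper: split one CSV record, returning None for
    unquoted empty fields. Assumes no newlines inside quoted fields."""
    fields: List[Optional[str]] = []
    chars: List[str] = []
    quoted = False     # the current field opened with a quote
    in_quotes = False  # we are currently between a pair of quotes
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quotechar:
                if i + 1 < len(line) and line[i + 1] == quotechar:
                    chars.append(quotechar)  # "" inside quotes is a literal quote
                    i += 1
                else:
                    in_quotes = False        # closing quote
            else:
                chars.append(ch)
        elif ch == quotechar and not chars and not quoted:
            quoted = in_quotes = True        # opening quote at field start
        elif ch == delimiter:
            fields.append(_finish(chars, quoted))
            chars, quoted = [], False
        else:
            chars.append(ch)
        i += 1
    fields.append(_finish(chars, quoted))
    return fields


def read_rows_with_nulls(text: str) -> List[List[Optional[str]]]:
    """Hypothetical helper: apply split_record_with_nulls to each non-blank line."""
    return [split_record_with_nulls(line) for line in text.splitlines() if line]


if __name__ == "__main__":
    sample = '"1","Noneval",,"9"\n"2","Emptystr","","10"\n'
    print(read_rows_with_nulls(sample))
    # [['1', 'Noneval', None, '9'], ['2', 'Emptystr', '', '10']]
```

Because it splits on line boundaries first, this sketch is only suitable when quoted values never contain newlines; a general solution would need to track quote state across lines, as csv.reader does.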
History
Date                 User            Action  Args
2022-04-11 14:58:58  admin           set     github: 77100
2018-03-06 00:42:52  Pavel Shpilev   set     messages: + msg313301
2018-03-03 20:51:59  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg313194
2018-02-23 04:55:23  Pavel Shpilev   create