classification
Title: csv: Inconsistency re QUOTE_NONNUMERIC
Type: behavior Stage:
Components: Library (Lib) Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: corona10, serhiy.storchaka, tlotze, xiang.zhang
Priority: normal Keywords:

Created on 2017-04-11 20:41 by tlotze, last changed 2017-04-19 07:11 by corona10.

Messages (5)
msg291516 - Author: Thomas Lotze (tlotze) Date: 2017-04-11 20:41
A csv.writer with quoting=csv.QUOTE_NONNUMERIC does not quote boolean values, which makes a csv.reader with the same quoting behaviour fail on that value:

-------- csvbug.py ----------

import csv
import io


f = io.StringIO()

writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(['asdf', 1, True])

f.seek(0)
reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
for row in reader:
    print(row)

----------------------

$ python3 csvbug.py 
Traceback (most recent call last):
  File "csvbug.py", line 12, in <module>
    for row in reader:
ValueError: could not convert string to float: 'True'

----------------------

I'd consider this inconsistency a bug, but in any case something that needs documenting.
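
To make the asymmetry concrete, here is a minimal sketch using the same row as the script above. It inspects the raw text the writer produces and then reads it back. Only standard csv and io calls are used; the variable names are illustrative.

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(['asdf', 1, True])

# The string is quoted; the int and the bool are written unquoted via str().
written = buf.getvalue()
print(repr(written))

buf.seek(0)
reader = csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC)
row = None
error = None
try:
    row = next(reader)
except ValueError as exc:
    # The reader tries to convert the unquoted field 'True' to float and fails.
    error = str(exc)
    print(error)
```

The written line contains a bare True, which the reader cannot interpret as a number.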
msg291590 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-13 07:27
Booleans are not quoted because bool is a subclass of int in Python, so True and False count as numeric. The same happens with any object that defines __int__ or __float__ but whose string representation is not a number.
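
A small sketch of that second case, assuming CPython's current behaviour of treating any object with __float__ as numeric when writing. The Celsius class is hypothetical and exists only for illustration.

```python
import csv
import io


class Celsius:
    """Hypothetical numeric-like type: defines __float__, but str() is not a number."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)

    def __str__(self):
        return f"{self.degrees}C"


# bool is a subclass of int, so the writer treats True/False as numeric.
bool_is_int = issubclass(bool, int)

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow([Celsius(21.5), False])
written = buf.getvalue()
print(repr(written))  # neither field is quoted
```

Both fields end up unquoted, so a reader using QUOTE_NONNUMERIC would fail on either of them.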

Since QUOTE_NONNUMERIC converts unquoted data to float when reading, I think we could force the same conversion when writing, so the inconsistency would disappear. Alternatively, we could document this limitation.
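
As a rough user-side approximation of the "convert when writing" idea (not a patch to the csv module), the conversion can be done before the row reaches the writer. The helper name floats_for_numbers is hypothetical, and it only handles int, float and bool.

```python
import csv
import io


def floats_for_numbers(row):
    """Hypothetical helper: convert int/float/bool fields to float before writing."""
    return [float(x) if isinstance(x, (int, float)) else x for x in row]


buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(floats_for_numbers(['asdf', 1, True]))
written = buf.getvalue()

buf.seek(0)
rows = list(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))
print(rows)  # [['asdf', 1.0, 1.0]]
```

With this, the written data round-trips, although the original int and bool types are not preserved.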
msg291862 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-19 05:46
This issue is not easy; it needs a thoughtful design before any coding starts. I agree with Xiang's analysis and proposed solutions. But if we just convert numbers to float, we can get an overflow for large integers or lose precision in the case of Decimal. The consumer of the CSV file may not be Python, and it may support higher precision than a Python float.
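
Both failure modes are easy to reproduce outside the csv module. The values below are arbitrary examples chosen to trigger them.

```python
from decimal import Decimal

# A sufficiently large int cannot be converted to float at all.
try:
    float(10 ** 400)
    overflowed = False
except OverflowError:
    overflowed = True

# Smaller ints convert, but not always exactly.
big = 2 ** 53 + 1
int_round_trips = int(float(big)) == big

# A Decimal with more digits than a float can hold loses precision.
d = Decimal('0.1234567890123456789012345678')
decimal_round_trips = Decimal(repr(float(d))) == d

print(overflowed, int_round_trips, decimal_round_trips)  # True False False
```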

It is also not clear what the better solution for enums would be. Should they be serialized by name or by value?
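
The two options can be compared by serializing a member explicitly, as in the sketch below. Color is a hypothetical enum; this only illustrates what each choice produces and is not a proposal for what the writer should do by default.

```python
import csv
import enum
import io


class Color(enum.Enum):  # hypothetical enum for illustration
    RED = 1
    GREEN = 2


buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow([Color.RED.name])   # serialized by name: a quoted string
writer.writerow([Color.RED.value])  # serialized by value: an unquoted number
written = buf.getvalue()

buf.seek(0)
rows = list(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))
print(rows)  # [['RED'], [1.0]]
```

By name, the reader gets back a string it can look up with Color['RED']; by value, it gets a float that must be converted back to int before calling Color(...).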
msg291864 - Author: Dong-hee Na (corona10) * Date: 2017-04-19 07:09
I would like to solve this issue.
Is there any way other than casting specifically for bool objects?
(e.g. a more general approach?)
msg291865 - Author: Dong-hee Na (corona10) * Date: 2017-04-19 07:11
Oh, I have only just now read Serhiy Storchaka's comment.
History
Date                 User              Action  Args
2017-04-19 07:11:22  corona10          set     messages: + msg291865
2017-04-19 07:09:26  corona10          set     nosy: + corona10; messages: + msg291864
2017-04-19 07:08:52  corona10          set     pull_requests: - pull_request1305
2017-04-19 07:08:42  corona10          set     pull_requests: - pull_request1303
2017-04-19 06:38:04  corona10          set     pull_requests: + pull_request1305
2017-04-19 05:46:33  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg291862
2017-04-19 05:07:38  corona10          set     pull_requests: + pull_request1303
2017-04-13 07:27:02  xiang.zhang       set     nosy: + xiang.zhang; messages: + msg291590
2017-04-11 20:41:55  tlotze            create