
Author terry.reedy
Recipients eric.smith, remi.lapeyre, serhiy.storchaka, sidhant, skip.montanaro, steven.daprano, terry.reedy
Date 2020-05-30.01:56:53
Message-id <1590803813.55.0.979467061674.issue40762@roundup.psfhosted.org>
Content
I count 5 core developers who agree that csv should definitely *not* assume that bytes given to it represent encoded text, which would revert to the confusion of Python 1 and 2.  (And even if it did, it should not assume that the encoding of the bytes given to it and the encoding of the file are the same, or even that all bytes given to it have the same encoding!)

If a user does not like the current default wonky mixed ascii-hex string representation of bytes, the user should explicitly convert bytes to the representation they want.  Here are just 3 examples, 2 with possible variations.

>>> b'\xc2a9'.hex()  # One might want to add prefix '0x' or r'\x'.
'c26139'             # Or add a separator.
>>> str(list(b'\xc2a9'))  # One might want to change or strip brackets, 
'[194, 97, 57]'           # change separator, or strip spaces.
>>> b'\xc2a9'.decode('latin-1')
'Âa9'
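
To show how such an explicit conversion might sit in front of csv.writer, here is a minimal sketch. The names bytes_to_text and write_rows are hypothetical helpers made up for illustration, not part of the csv module, and the three styles simply mirror the three examples above.

```python
import csv
import io


def bytes_to_text(value, style="hex"):
    """Hypothetical helper: turn bytes into an explicitly chosen str form.

    Non-bytes values are returned unchanged so csv can handle them as usual.
    """
    if not isinstance(value, (bytes, bytearray)):
        return value
    data = bytes(value)
    if style == "hex":
        return data.hex()              # e.g. 'c26139'
    if style == "ints":
        return str(list(data))         # e.g. '[194, 97, 57]'
    if style == "latin-1":
        return data.decode("latin-1")  # one character per byte, never fails
    raise ValueError(f"unknown style: {style!r}")


def write_rows(rows, style="hex"):
    """Hypothetical helper: write rows to an in-memory CSV, converting bytes first."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([bytes_to_text(cell, style) for cell in row])
    return buf.getvalue()


if __name__ == "__main__":
    row = [b"\xc2a9", 1]
    for style in ("hex", "ints", "latin-1"):
        print(repr(write_rows([row], style)))
```

The caller, not csv, picks the representation, so nothing is ever guessed about an encoding.  Note that the 'ints' form contains commas, so csv quotes that field.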

What is best depends on the expected reader of the output.  Pandas users who don't like Pandas' csv output should talk to its authors.  They are welcome to ask advice on python-list.

Eric: I agree that adding 'strict=False' might be a good idea (in a new issue).