Message 370348 - Python tracker


Author	terry.reedy
Recipients	eric.smith, remi.lapeyre, serhiy.storchaka, sidhant, skip.montanaro, steven.daprano, terry.reedy
Date	2020-05-30.01:56:53
Message-id	<1590803813.55.0.979467061674.issue40762@roundup.psfhosted.org>

Content

I make 5 core developers who agree that csv should definitely *not* assume that bytes given to it represent encoded text, reverting to the confusion of Python 1 and 2.  (And even if it did, it should not assume that the encoding of the bytes given to it and the encoding of the file are the same, or even that all bytes given to it have the same encoding!)
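
To illustrate why guessing an encoding is unsafe, here is a small sketch (the byte values are picked only for illustration): the same bytes are valid text under more than one encoding, and arbitrary bytes may not be valid text at all in a given one.

```python
# Illustration only: the same two bytes read as different text under
# different encodings, so a consumer cannot guess which one was meant.
data = b'\xc2\xa9'

as_utf8 = data.decode('utf-8')      # one character: '©'
as_latin1 = data.decode('latin-1')  # two characters: 'Â©'

# Arbitrary bytes need not be valid text in a given encoding at all.
try:
    b'\xc2a9'.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(as_utf8, as_latin1, decodes_as_utf8)
```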

If a user does not like the current default wonky mixed ascii-hex string representation of bytes, the user should explicitly convert bytes to the representation they want.  Here are just 3 examples, 2 with possible variations.

>>> b'\xc2a9'.hex()  # One might want to add prefix '0x' or r'\x'.
'c26139'             # Or add a separator.
>>> str(list(b'\xc2a9'))  # One might want to change or strip brackets, 
'[194, 97, 57]'           # change separator, or strip spaces.
>>> b'\xc2a9'.decode('latin-1')
'Âa9'
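
As a rough sketch of how such an explicit conversion could be applied before a row reaches csv.writer: the helper names bytes_to_text and row_as_csv and the style argument below are hypothetical, made up for this illustration, and are not part of the csv module.

```python
import csv
import io


def bytes_to_text(value, style="hex"):
    """Hypothetical helper: explicitly convert bytes to a str for a csv field."""
    if not isinstance(value, (bytes, bytearray)):
        return value  # leave non-bytes values for csv to handle as usual
    if style == "hex":
        return value.hex()
    if style == "list":
        return str(list(value))
    if style == "latin-1":
        return value.decode("latin-1")
    raise ValueError(f"unknown style: {style!r}")


def row_as_csv(row, style=None):
    """Write one row to an in-memory file and return the resulting text."""
    buf = io.StringIO(newline="")
    if style is not None:
        row = [bytes_to_text(v, style) for v in row]
    csv.writer(buf).writerow(row)
    return buf.getvalue()


row = ["id", b"\xc2a9"]
print(repr(row_as_csv(row)))             # default: str() of the bytes object
print(repr(row_as_csv(row, "hex")))
print(repr(row_as_csv(row, "list")))     # embedded commas, so csv quotes the field
print(repr(row_as_csv(row, "latin-1")))
```

Which style to pick is the caller's decision, which is the point: the conversion is visible in the calling code rather than assumed by csv.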

What is best depends on the expected reader of the output.  Pandas users who don't like Pandas' csv output should talk to its authors.  They are welcome to ask for advice on python-list.

Eric: I agree that adding 'strict=False' might be a good idea (in a new issue).
