Author steven.daprano
Recipients remi.lapeyre, serhiy.storchaka, sidhant, skip.montanaro, steven.daprano
Date 2020-05-27.03:01:34
Message-id <20200527025544.GS11884@ando.pearwood.info>
In-reply-to <1590545387.05.0.387979829628.issue40762@roundup.psfhosted.org>
Content
On further thought, no, I don't think it would be a reasonable feature.

The user opens the CSV file, probably using the default encoding (UTF-8?)
but potentially in any encoding at all.

They collect some data as bytes. Those bytes could be in any encoding,
and the csv module has no way of knowing which. If writing those bytes to
the CSV file decoded them implicitly, then at best the user would get an
explicit but confusing exception that the decoding failed, and at worst
they would get silent data corruption (mojibake).

    # Latin-1 to UTF-8 fails
    py> b = 'ßæ'.encode('latin-1')
    py> b.decode('utf-8')
    # raises UnicodeDecodeError: 'utf-8' codec can't decode 
    # byte 0xdf in position 0: invalid continuation byte

    # UTF-8 to Latin-1 loses data
    py> b = 'ßæ'.encode('UTF-8')
    py> b.decode('latin-1')
    # returns mojibake 'Ã\x9fÃ¦'

Short of outright banning the use of bytes (raise a TypeError), I think 
the current behaviour is least-worst.
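
For reference, a minimal sketch of what the current behaviour looks like, next to the caller decoding explicitly. It relies on the documented rule that csv.writer stringifies non-string data with str(); the write_row helper and the sample values are made up for illustration and are not part of the csv module or of the original report.

```python
import csv
import io


def write_row(row):
    """Hypothetical helper: write one row with csv.writer, return the text."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(row)
    return buf.getvalue()


# Bytes whose encoding the csv module has no way of knowing.
raw = "ßæ".encode("latin-1")

# Current behaviour: the bytes object goes through str(), so its repr
# (including the b prefix) ends up in the file. Nothing is decoded.
as_written = write_row(["name", raw])

# Caller-side alternative: decode with the encoding you know to be right
# before handing the value to the writer.
decoded = write_row(["name", raw.decode("latin-1")])

print(repr(as_written))  # "name,b'\\xdf\\xe6'\r\n"
print(repr(decoded))     # 'name,ßæ\r\n'
```

The repr in the output is ugly, but it is visible and reversible, and the decision about which encoding to use stays with the person who actually knows where the bytes came from.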
History
Date                 User            Action  Args
2020-05-27 03:01:35  steven.daprano  set     recipients: + steven.daprano, skip.montanaro, serhiy.storchaka, remi.lapeyre, sidhant
2020-05-27 03:01:35  steven.daprano  link    issue40762 messages
2020-05-27 03:01:34  steven.daprano  create