Message370043
On further thought, no, I don't think it would be a reasonable feature.
User opens the CSV file, probably using the default encoding (UTF-8?)
but potentially in anything.
They collect some data as bytes. Those bytes could be from any unknown
encoding. When they try writing those bytes to the CSV file, at best
they get an explicit but confusing exception that the decoding failed,
at worst they get data loss (mojibake).
# Latin-1 to UTF-8 fails
py> b = 'ßæ'.encode('latin-1')
py> b.decode('utf-8')
# raises UnicodeDecodeError: 'utf-8' codec can't decode
# byte 0xdf in position 0: invalid continuation byte
# UTF-8 to Latin-1 loses data
py> b = 'ßæ'.encode('UTF-8')
py> b.decode('latin-1')
# returns mojibake 'Ã\x9fÃ¦'
Short of outright banning the use of bytes (raise a TypeError), I think
the current behaviour is least-worst.
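For reference, a minimal sketch of the current behaviour, and of the alternative where the caller decodes explicitly with an encoding they actually know. The helper name `write_row` and its `encoding` parameter are hypothetical, made up for illustration; they are not part of the csv module.

```python
import csv
import io


def write_row(row, encoding="utf-8"):
    """Hypothetical helper: decode bytes fields explicitly, then write one CSV row."""
    buf = io.StringIO(newline="")
    decoded = [f.decode(encoding) if isinstance(f, bytes) else f for f in row]
    csv.writer(buf).writerow(decoded)
    return buf.getvalue()


# Current behaviour: non-string fields go through str(), so bytes are
# written as their repr, including the b prefix and quotes.
buf = io.StringIO(newline="")
csv.writer(buf).writerow([b"abc", "def"])
current = buf.getvalue()  # "b'abc',def\r\n"

# Explicit decoding by the caller, who knows the bytes are UTF-8.
explicit = write_row(["ßæ".encode("utf-8"), "def"])  # 'ßæ,def\r\n'
```

The point of the sketch is that only the caller knows which encoding the bytes came from; if the wrong one is passed, `write_row` fails or produces mojibake in exactly the ways shown above.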
Date | User | Action | Args
2020-05-27 03:01:35 | steven.daprano | set | recipients: + steven.daprano, skip.montanaro, serhiy.storchaka, remi.lapeyre, sidhant
2020-05-27 03:01:35 | steven.daprano | link | issue40762 messages
2020-05-27 03:01:34 | steven.daprano | create |