Message386547
On most platforms, unless UTF-8 mode is enabled, locale.getpreferredencoding(False) returns the LC_CTYPE encoding of the current locale. For example, in Linux:
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.getpreferredencoding(False)
'UTF-8'
>>> locale.setlocale(locale.LC_CTYPE, 'en_US.iso-88591')
'en_US.iso-88591'
>>> locale.getpreferredencoding(False)
'ISO-8859-1'
If the designers of the io module had wanted the preferred encoding to always be the default encoding from setlocale(LC_CTYPE, ""), they would have used and documented locale.getpreferredencoding(True).
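To see the two variants side by side, here is a minimal sketch. The helper name preferred_encodings is hypothetical; the only library call is locale.getpreferredencoding(), whose do_setlocale argument decides whether the function is allowed to consult the user's default locale. On a POSIX system where the program has changed LC_CTYPE, the two values may differ. In Windows, or with UTF-8 mode enabled, they are expected to be the same.

```python
import locale


def preferred_encodings():
    """Return the pair (without_setlocale, with_setlocale).

    Hypothetical helper for illustration only.
    """
    # do_setlocale=False: query without modifying the locale.
    without_setlocale = locale.getpreferredencoding(False)
    # do_setlocale=True: the function may temporarily switch LC_CTYPE to
    # the user's default locale before querying, so it is not thread-safe.
    with_setlocale = locale.getpreferredencoding(True)
    return without_setlocale, with_setlocale


if __name__ == "__main__":
    current, user_default = preferred_encodings()
    print("getpreferredencoding(False):", current)
    print("getpreferredencoding(True): ", user_default)
```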
---
In Windows, locale.getpreferredencoding(False) always returns the default encoding from locale.getdefaultlocale(), which is the process's active (ANSI) code page. Changing it to track the LC_CTYPE locale would be convenient for applications and scripts running in Windows 10, where the CRT's POSIX locale implementation has supported UTF-8 since the spring of 2018.
The base behavior can't be changed at this point, but a -X option and/or environment variable could enable locale.getpreferredencoding(False) -- i.e. locale._get_locale_encoding() -- to return the current LC_CTYPE encoding in Windows, as it does in POSIX.
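Until something like that exists, an application can approximate the behavior itself. The following is only a rough sketch, not what such an option would do internally: both function names are hypothetical, and the parsing assumes the locale string has the usual "language_territory.codeset@modifier" shape, with a bare number after the dot meaning a Windows code page.

```python
import locale


def codeset_from_locale_name(name):
    """Extract a codec name from a locale string, or None if it has none.

    Hypothetical helper; assumes "lang_TERRITORY.codeset@modifier" layout.
    """
    # Drop an optional modifier such as "@euro".
    name = name.partition("@")[0]
    _, dot, codeset = name.rpartition(".")
    if not dot or not codeset:
        return None  # e.g. "C" or "en_US" carry no explicit codeset
    # Windows CRT locale names use a bare code page number, e.g. ".1252".
    if codeset.isdigit():
        return "cp" + codeset
    return codeset


def current_ctype_encoding():
    """Hypothetical helper: codeset named by the current LC_CTYPE locale."""
    # Calling setlocale() without a second argument only queries the setting.
    return codeset_from_locale_name(locale.setlocale(locale.LC_CTYPE))
```

A caller would fall back to locale.getpreferredencoding(False) when current_ctype_encoding() returns None, for instance while the process is still in the "C" locale.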
Date | User | Action | Args
2021-02-06 06:39:42 | eryksun | set | recipients: + eryksun, docs@python, smallbigcake
2021-02-06 06:39:41 | eryksun | set | messageid: <1612593581.97.0.493609874304.issue43140@roundup.psfhosted.org>
2021-02-06 06:39:41 | eryksun | link | issue43140 messages
2021-02-06 06:39:41 | eryksun | create |