Message388124
> lets not claim that bytes cannot represent everything on a filesystem
> with an encoding.
Gregory, before the filesystem encoding was changed to UTF-8 in Python 3.6, the [A]NSI file API (e.g. CreateFileA) was used for bytes paths and the [W]ide-character file API (e.g. CreateFileW) was used for str paths. The ANSI API is a set of wrapper functions that automatically translate strings between the ANSI code page of the current process and the system's native UTF-16 encoding, before and after calling the wide-character function (or a common internal function). Starting with Windows 10, the ANSI and OEM code pages of a process are finally allowed to be UTF-8 (code page 65001), but it's still considered beta and barely used. Usually the ANSI code page is a legacy single-byte or double-byte code page such as 1252 (Western Europe) or 932 (Japanese).
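As a rough illustration of that translation step, here is a minimal sketch that uses Python's own "cp1252" and "cp932" codecs as stand-ins for the Windows code pages. The helper names ansi_to_wide and representable are hypothetical, and Python's codec tables are only an approximation of what the real ANSI wrappers do (Windows also applies best-fit mappings, which this ignores).

```python
def ansi_to_wide(path_bytes, code_page):
    """Approximate an ANSI wrapper: decode with the process code page,
    then re-encode as the UTF-16 that the wide-character function takes."""
    text = path_bytes.decode(code_page)
    return text.encode("utf-16-le")


def representable(name, code_page):
    """Return True if a str filename can be expressed in the code page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


raw = b"\x83\x65"  # the same two bytes passed as a bytes path

# The resulting filename depends on the ANSI code page of the process.
as_japanese = ansi_to_wide(raw, "cp932").decode("utf-16-le")
as_western = ansi_to_wide(raw, "cp1252").decode("utf-16-le")

print(repr(as_japanese), repr(as_western))
print(representable(as_japanese, "cp932"), representable(as_japanese, "cp1252"))
```

The point of the sketch is that a bytes path only means something relative to a code page, and a name that exists on disk may have no bytes form at all under a legacy code page.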
Natively, Windows is UTF-16, and native Windows filesystems store filenames on disk using 16-bit characters. The system doesn't check for valid Unicode, so lone surrogate codes are allowed. This is sometimes called a "Wobbly" format. In Python, encoding or decoding such names requires the "surrogatepass" error handler.
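A small sketch of the lone-surrogate case, runnable on any platform. The filename is made up for illustration, and encodes_strictly is a hypothetical helper; only the "utf-16-le" codec and the "surrogatepass" error handler come from Python itself.

```python
# A made-up filename containing a lone low surrogate (U+DC80).
name = "file_\udc80.txt"


def encodes_strictly(s):
    """Return True if s is valid Unicode for the strict UTF-16 codec."""
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


# The strict codec rejects the lone surrogate ...
print(encodes_strictly(name))

# ... but "surrogatepass" writes it out as a plain 16-bit code unit,
# which is how such a name would sit on disk.
raw = name.encode("utf-16-le", "surrogatepass")
roundtrip = raw.decode("utf-16-le", "surrogatepass")
print(raw, roundtrip == name)
```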
Date | User | Action | Args
2021-03-04 19:26:58 | eryksun | set | recipients: + eryksun, gregory.p.smith, docs@python, steve.dower
2021-03-04 19:26:58 | eryksun | set | messageid: <1614886018.33.0.991838036762.issue43403@roundup.psfhosted.org>
2021-03-04 19:26:58 | eryksun | link | issue43403 messages
2021-03-04 19:26:58 | eryksun | create |