
Author eryksun
Recipients docs@python, eryksun, gregory.p.smith, steve.dower
Date 2021-03-04.19:26:58
Message-id <1614886018.33.0.991838036762.issue43403@roundup.psfhosted.org>
Content
> lets not claim that bytes cannot represent everything on a filesystem 
> with an encoding.

Gregory, before the filesystem encoding was changed to UTF-8 in Python 3.6, the [A]NSI file API (e.g. CreateFileA) was used for bytes paths and the [W]ide-character file API (e.g. CreateFileW) was used for str paths. The ANSI API is a set of wrapper functions that automatically translate strings between the ANSI code page of the current process and the system's native UTF-16 encoding, before and after calling the wide-character function (or a common internal function). Starting with Windows 10, the ANSI and OEM code pages of a process are finally allowed to be UTF-8 (code page 65001), but it's still considered beta and barely used. Usually the ANSI code page is a legacy single-byte or double-byte code page such as 1252 (Western Europe) or 932 (Japanese).
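To illustrate why a legacy ANSI code page cannot represent every filename, here is a minimal sketch using Python's own cp1252 and cp932 codecs as stand-ins for the process code page. The helper name `representable` is hypothetical, and Python's strict codecs raise an error where the Windows ANSI API may instead substitute a best-fit character or "?", so this only approximates the system's behavior.

```python
def representable(name: str, code_page: str) -> bool:
    """Return True if every character of name can be encoded in code_page.

    Hypothetical helper for illustration only. It uses Python's strict
    codecs, not the Windows ANSI API itself.
    """
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


japanese_name = "日本語.txt"
western_name = "café.txt"

# A Japanese filename fits in code page 932 but not in code page 1252.
print(representable(japanese_name, "cp932"))
print(representable(japanese_name, "cp1252"))

# A Western European filename fits in code page 1252.
print(representable(western_name, "cp1252"))
```

Under this assumption, a bytes path passed through the ANSI API can only name files whose characters exist in the active code page, which is the limitation the str (wide-character) path avoided.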

Natively, Windows uses UTF-16, and native Windows filesystems store filenames on disk using 16-bit characters. The system doesn't check for valid Unicode, so lone surrogate codes are allowed. This is sometimes called a "Wobbly" format. In Python, encoding or decoding such names requires the "surrogatepass" error handler.
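As a rough sketch of what the "surrogatepass" error handler does, the following encodes a name containing a lone surrogate. The filename is made up for illustration, and the snippet uses the codecs directly rather than any filesystem call.

```python
# A made-up filename containing a lone low surrogate (U+DC80),
# which is not valid Unicode text but can exist in a Windows filename.
name = "\udc80.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With "surrogatepass" the surrogate is encoded as if it were an
# ordinary code point, and the result round-trips losslessly.
raw_utf8 = name.encode("utf-8", "surrogatepass")
round_trip = raw_utf8.decode("utf-8", "surrogatepass")

# The same handler works for UTF-16, the system's native encoding.
raw_utf16 = name.encode("utf-16-le", "surrogatepass")

print(strict_ok)           # False
print(raw_utf8)            # b'\xed\xb2\x80.txt'
print(round_trip == name)  # True
```

The bytes produced this way are not valid UTF-8 in the strict sense, which is why a strict decoder rejects them.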