Author eryksun
Recipients docs@python, ericzolf, eryksun, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-03-04.14:27:26
Message-id <1614868048.12.0.322783874664.issue43395@roundup.psfhosted.org>
Content
> Vice versa, using bytes objects cannot represent all file names 
> on Windows (in the standard mbcs encoding), hence Windows 
> applications should use string objects to access all files.

This is outdated advice that should be removed, or at least reworded to emphasize that the 'mbcs' encoding is only used in legacy mode, with a link to the documentation of sys._enablelegacywindowsfsencoding [1].

Starting in Python 3.6, the default filesystem encoding in Windows is UTF-8. Internally, a UTF-8 byte string is translated to UTF-16 (2 or 4 bytes per character), the native Unicode encoding of the Windows API.
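To illustrate the "2 or 4 bytes per character" point, here is a minimal sketch of the same translation done with Python's codecs rather than the Windows API. The helper name utf16_size is hypothetical, and it only handles well-formed UTF-8 input.

```python
def utf16_size(name_bytes: bytes) -> int:
    """Return how many bytes a UTF-8 encoded name occupies as UTF-16.

    Hypothetical helper for illustration only; it mimics the
    UTF-8 -> UTF-16 translation with the standard codecs.
    """
    text = name_bytes.decode("utf-8")
    # "utf-16-le" writes no byte order mark, so the length counts
    # only the 16-bit code units of the name itself.
    return len(text.encode("utf-16-le"))


# A character in the Basic Multilingual Plane takes one 16-bit unit.
bmp_size = utf16_size("\u00e9".encode("utf-8"))
# A character outside the BMP takes a surrogate pair, i.e. two units.
astral_size = utf16_size("\U0001F600".encode("utf-8"))
```

Here bmp_size is 2 and astral_size is 4, even though the UTF-8 forms are 2 and 4 bytes long respectively.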

A caveat is that Windows filesystems use 16-bit characters that are not restricted to valid Unicode. In particular, ordinals U+D800-U+DFFF are not reserved for use in surrogate pairs. This is "Wobbly" Unicode, and the filesystem encoding thus needs to be "Wobbly Transformation Format, 8-bit" (WTF-8). This is implemented in Python by setting the error handler of the filesystem encoding to "surrogatepass", in contrast to using "surrogateescape" in POSIX. For example, os.fsencode('\ud800') succeeds in Windows but fails in POSIX, while os.fsdecode(b'\x80') fails in Windows but succeeds in POSIX. The latter case is not a practical problem since filesystem functions will never return an invalid WTF-8 byte string.
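The difference between the two error handlers can be reproduced on any platform by calling the UTF-8 codec directly instead of os.fsencode and os.fsdecode. This is only a sketch: the function names below are hypothetical, and it assumes the POSIX filesystem encoding is UTF-8.

```python
def encodes(name: str, errors: str) -> bool:
    """Report whether name can be encoded as UTF-8 with the given handler."""
    try:
        name.encode("utf-8", errors)
        return True
    except UnicodeEncodeError:
        return False


def decodes(raw: bytes, errors: str) -> bool:
    """Report whether raw can be decoded as UTF-8 with the given handler."""
    try:
        raw.decode("utf-8", errors)
        return True
    except UnicodeDecodeError:
        return False


# Windows-style handling: a lone surrogate is passed through as WTF-8.
windows_encode = encodes("\ud800", "surrogatepass")    # True
windows_decode = decodes(b"\x80", "surrogatepass")     # False

# POSIX-style handling: arbitrary bytes are escaped as U+DC80-U+DCFF.
posix_encode = encodes("\ud800", "surrogateescape")    # False
posix_decode = decodes(b"\x80", "surrogateescape")     # True
```

With "surrogatepass", '\ud800' encodes to the three bytes b'\xed\xa0\x80'. With "surrogateescape", b'\x80' decodes to '\udc80', which round-trips back to the original byte.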

---
[1] https://docs.python.org/3/library/sys.html#sys._enablelegacywindowsfsencoding