Author vstinner
Recipients JGoutin, lemburg, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2017-01-13.12:46:23
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1484311584.38.0.0800548635489.issue29241@psf.upfronthosting.co.za>
In-reply-to
Content
My experience with changing the Python "filesystem encoding" (sys.getfilesystemencoding()) at runtime: it doesn't work.

The filesystem encoding must be set as soon as possible and must never change later. As soon as possible: before the first call to os.fsdecode(), which is implemented in C as Py_DecodeLocale(). For example, the encoding must be set before Python imports the first module.

The filesystem encoding must be set before Python decodes *any* operating system data: command line arguments, any filename or path, environment variables, etc.

Hopefully, Windows provides most operating system data as Unicode directly: command line arguments and environment variables are exposed as Unicode for example.

os.fsdecode() and os.fsencode() have an important property:
assert os.fsencode(os.fsdecode(data)) == data

On Windows, the other property is maybe more imporant:
assert os.fsdecode(os.fsencode(data)) == data

If the property becomes false, for example if the filesystem encoding is changed at runtime, you get mojibake. Example:

* os.fsdecode() decodes the filename b'h\xc3\xa9llo' from UTF-8 => 'h\xe9llo'
* sys._enablelegacywindowsfsencoding()
* os.fsencode() encodes the filename to cp1252 => you get 'h\xc3\xa9llo'
 instead of 'h\xe9llo', say hello to mojibake

--

Sorry, I didn't play with sys._enablelegacywindowsfsencoding() on Windows. I don't know if it would "work" if sys._enablelegacywindowsfsencoding() is the first instruction of an application. I expect that Python almost decodes nothing at startup on Windows, so it may work.

sys._enablelegacywindowsfsencoding() is a private method, so it shouldn't be used.

Maybe we could add a "fuse" (flag only sets to 1, cannot be reset to 0) to raise an exception if sys._enablelegacywindowsfsencoding() is called "too late", after the first call to os.fsdecode() / Py_DecodeLocale()?
History
Date User Action Args
2017-01-13 12:46:24vstinnersetrecipients: + vstinner, lemburg, paul.moore, tim.golden, zach.ware, steve.dower, JGoutin
2017-01-13 12:46:24vstinnersetmessageid: <1484311584.38.0.0800548635489.issue29241@psf.upfronthosting.co.za>
2017-01-13 12:46:24vstinnerlinkissue29241 messages
2017-01-13 12:46:23vstinnercreate