
classification
Title: surrogateescape'd paths not readable on Windows XP.
Type:
Stage:
Components: IO
Versions: Python 3.1
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ideasman42, loewis, vstinner
Priority: normal
Keywords:

Created on 2010-12-01 23:24 by ideasman42, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: utf8_surrogateescape.py
Uploaded: ideasman42, 2010-12-01 23:24
Description: Test file for a surrogateescape'd path not being writable.
Messages (5)
msg123022 - Author: Campbell Barton (ideasman42) Date: 2010-12-01 23:24
Attached is a script which works on Linux but not on 32-bit Windows XP with Python 3.1.3.

The problem is that the path can be written to when it is specified as bytes, but fails when it is specified as a surrogateescape'd str.
msg123023 - Author: Campbell Barton (ideasman42) Date: 2010-12-01 23:27
Note: this bug was reported to me by a user running 64-bit Windows 7.
msg123035 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-12-02 02:03
Using the surrogateescape error handler to decode a Windows path is not a good idea. On Windows, the problem is not decoding a path (ANSI => wide char), but encoding a path (wide char => ANSI) in order to call a function that expects a bytes path encoded in the ANSI code page. surrogateescape is only useful for the *decode* operation, to store undecodable bytes as special characters.
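
As an illustration of what "storing undecodable bytes as special characters" means, here is a minimal sketch. The file name is hypothetical, and UTF-8 is used only because it is the codec discussed in this issue.

```python
# Hypothetical file name: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# On decode, surrogateescape maps the undecodable byte 0xE9 to the
# lone surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9.txt"

# Encoding with the same codec and the same handler restores the bytes.
assert text.encode("utf-8", "surrogateescape") == raw

# Without the handler, the lone surrogate cannot be encoded at all.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False
```

The resulting str is only a carrier for the original bytes; it is not the name a wide-character file API would expect.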

Why do you decode a Windows path using UTF-8? UTF-8 is not used, by default, as an ANSI code page. But first, why do you manipulate bytes paths on Windows?

If you would like a portable program supporting UNIX/BSD (bytes) and Windows (unicode) paths with a single type, you should use str instead of bytes, because Unicode (with surrogateescape) is a superset of bytes.

Python 3.2 has os.fsencode() and os.fsdecode() functions to do that easily (to decode/encode UNIX/BSD paths).
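
A minimal sketch of that round trip, assuming Python 3.2 or later. The file names are hypothetical, and the undecodable-byte case is guarded to POSIX, since that is where the filesystem encoding is combined with surrogateescape.

```python
import os

# Hypothetical file name; a plain ASCII name round-trips on every platform.
name = "scene.blend"
assert os.fsdecode(os.fsencode(name)) == name

# Both functions accept str or bytes, so callers can normalize to one type.
assert os.fsdecode(name) == name
assert os.fsencode(os.fsencode(name)) == os.fsencode(name)

if os.name == "posix":
    # A byte string that may not be valid in the filesystem encoding.
    raw = b"caf\xe9.blend"
    as_str = os.fsdecode(raw)          # str, possibly with lone surrogates
    assert isinstance(as_str, str)
    assert os.fsencode(as_str) == raw  # the original bytes come back
```

Because the codec is always the filesystem encoding, the str form can be passed to functions such as open() and still refer to the same file.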
msg123056 - Author: Campbell Barton (ideasman42) Date: 2010-12-02 05:22
This bug is with Blender, where paths are stored internally in C as simple char arrays, i.e. bytes.

We could also expose all path names as bytes through our C/Python API; this would at least be a 1:1 mapping. However, I'd prefer using strings if possible.

Since Blender projects need to be portable (entire projects are compressed and run on different systems), we can't ensure that the native filesystem encoding is used.

So surrogateescape seems to work very well, except for this one case I've run into, which is Windows only.
msg123057 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2010-12-02 06:14
This is not a bug. You can't use an arbitrary codec (such as UTF-8) with the surrogateescape error handler and expect that opening the file will produce the correct filename. This won't work on Unix, in the general case, either. The surrogateescape error handler will work correctly in this setup only when used with the filesystem encoding.
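
The mismatch can be sketched with explicit codecs standing in for the filesystem encoding. This is an illustration only: the file names are hypothetical, and Latin-1 and cp1252 are assumed examples of a non-UTF-8 encoding, not what any particular system uses.

```python
# Case 1: the application stores valid UTF-8 bytes for "café.txt".
stored = b"caf\xc3\xa9.txt"
text = stored.decode("utf-8", "surrogateescape")
assert text == "caf\u00e9.txt"

# If the filesystem encoding is also UTF-8, the same bytes reach the OS.
assert text.encode("utf-8", "surrogateescape") == stored

# If the filesystem encoding were Latin-1, different bytes reach the OS,
# so a different file name is opened.
on_latin1 = text.encode("latin-1", "surrogateescape")
assert on_latin1 == b"caf\xe9.txt"
assert on_latin1 != stored

# Case 2: the bytes were written under cp1252, where 0xE9 means "é".
legacy = b"caf\xe9.txt"
real_name = legacy.decode("cp1252")
guessed = legacy.decode("utf-8", "surrogateescape")
assert real_name == "caf\u00e9.txt"
assert guessed == "caf\udce9.txt"

# The str produced by the wrong codec does not name the real file.
assert guessed != real_name
```

In both cases the round trip only holds when the codec used with surrogateescape matches the encoding the filesystem layer uses.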
History
Date User Action Args
2022-04-11 14:57:09 admin set github: 54809
2010-12-02 06:14:04 loewis set status: open -> closed; nosy: + loewis; messages: + msg123057; resolution: not a bug
2010-12-02 05:22:30 ideasman42 set messages: + msg123056
2010-12-02 02:03:06 vstinner set nosy: + vstinner; messages: + msg123035
2010-12-01 23:27:44 ideasman42 set messages: + msg123023
2010-12-01 23:24:55 ideasman42 create