classification
Title: Create PyUnicode_FSDecoder() function
Type:
Stage:
Components: Interpreter Core, Unicode
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: haypo, lemburg, loewis
Priority: normal
Keywords: patch

Created on 2010-08-08 23:56 by haypo, last changed 2010-08-14 00:00 by haypo. This issue is now closed.

Files
File name Uploaded Description
PyUnicode_FSDecoder.patch haypo, 2010-08-08 23:56
Messages (3)
msg113352 - Author: STINNER Victor (haypo) (Python committer) Date: 2010-08-08 23:56
For my work on #9425 (Rewrite import machinery to work with unicode paths), I need a PyArg_Parse converter that accepts bytes or str and returns str. PyUnicode_FSConverter() does the opposite: it encodes str to bytes.

To handle (input) filenames in a function, we have 3 choices:

 1/ use bytes: that's the current choice for most Python functions. It gives full Unicode support on POSIX OSes (where the FS uses a bytes API), but it is not enough on Windows (Windows uses the mbcs encoding, which covers only a very small subset of Unicode)
 2/ use str with PEP 383 (surrogateescape): it started to be used in Python 3.1, and more seriously in Python 3.2. It offers full Unicode support on all OSes (POSIX and Windows)
 3/ use the native type of each OS (bytes on POSIX, str on Windows): I dislike this solution because it implies code duplication

PyUnicode_FSConverter() is the converter for solution (1). PyUnicode_FSDecoder() will be the converter for solution (2).
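
To illustrate the difference between solutions (1) and (2), here is a rough pure-Python sketch of the two directions. It is not the C converters themselves: the helper names to_bytes() and to_str() are hypothetical, and the filesystem encoding is hard-coded to UTF-8 (an assumption) so that the result does not depend on the platform.

```python
# Assumption: a fixed UTF-8 filesystem encoding, for a deterministic example.
FS_ENCODING = "utf-8"

def to_bytes(filename):
    """Hypothetical analogue of solution (1): accept str or bytes, return bytes."""
    if isinstance(filename, bytes):
        return filename
    return filename.encode(FS_ENCODING, "surrogateescape")

def to_str(filename):
    """Hypothetical analogue of solution (2): accept str or bytes, return str."""
    if isinstance(filename, str):
        return filename
    return filename.decode(FS_ENCODING, "surrogateescape")

raw = b"caf\xe9.txt"       # 0xE9 is not valid UTF-8 in this position
text = to_str(raw)         # the undecodable byte becomes the lone surrogate U+DCE9
round_trip = to_bytes(text)
print(ascii(text), round_trip == raw)
```

With surrogateescape, the undecodable byte survives the bytes -> str -> bytes round trip, which is what makes solution (2) usable for arbitrary POSIX filenames.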
msg113740 - Author: STINNER Victor (haypo) (Python committer) Date: 2010-08-13 01:47
Lib/os.py may also be patched to add a Python implementation, e.g.:

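import sys

def fsdecode(value):
    # str is returned unchanged
    if isinstance(value, str):
        return value
    elif isinstance(value, bytes):
        encoding = sys.getfilesystemencoding()
        if encoding == 'mbcs':
            # mbcs (Windows): decode with the default (strict) error handler
            return value.decode(encoding)
        else:
            # PEP 383: undecodable bytes are mapped to lone surrogates
            return value.decode(encoding, 'surrogateescape')
    else:
        raise TypeError("expected bytes or str, not %s" % type(value).__name__)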
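
For comparison, the standard library has provided os.fsdecode() and os.fsencode() since Python 3.2. The short sketch below only uses ASCII names (an assumption made so the result is the same for any ASCII-compatible filesystem encoding); it shows the expected behaviour for str and bytes input and is not meant as a description of how this patch was finally implemented.

```python
import os

# bytes input is decoded with the filesystem encoding
name = os.fsdecode(b"example.txt")

# str input is returned unchanged
same = os.fsdecode("example.txt")

# os.fsencode() goes in the opposite direction (str -> bytes)
encoded = os.fsencode(name)

print(name, same, encoded)
```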

--

Note: Solution (1) (use the bytes API) is not deprecated by this issue. PyUnicode_FSConverter is still useful if the underlying library has a bytes API (e.g. OpenSSL only supports char*).

Solution (2) is preferred if we have access to a character API, e.g. the Windows wide character API.
msg113854 - Author: STINNER Victor (haypo) (Python committer) Date: 2010-08-14 00:00
Committed to 3.2 as r83990.
History
Date User Action Args
2010-08-14 00:00:29 haypo set status: open -> closed; resolution: fixed; messages: + msg113854
2010-08-13 01:47:35 haypo set messages: + msg113740
2010-08-08 23:56:47 haypo create