Create PyUnicode_FSDecoder() function #53751

Closed
vstinner opened this issue Aug 8, 2010 · 3 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode


vstinner commented Aug 8, 2010

BPO 9542
Nosy @malemburg, @loewis, @vstinner
Files
  • PyUnicode_FSDecoder.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-08-14.00:00:29.469>
    created_at = <Date 2010-08-08.23:56:47.611>
    labels = ['interpreter-core', 'expert-unicode']
    title = 'Create PyUnicode_FSDecoder() function'
    updated_at = <Date 2010-08-14.00:00:29.468>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2010-08-14.00:00:29.468>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-08-14.00:00:29.469>
    closer = 'vstinner'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2010-08-08.23:56:47.611>
    creator = 'vstinner'
    dependencies = []
    files = ['18447']
    hgrepos = []
    issue_num = 9542
    keywords = ['patch']
    message_count = 3.0
    messages = ['113352', '113740', '113854']
    nosy_count = 3.0
    nosy_names = ['lemburg', 'loewis', 'vstinner']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue9542'
    versions = ['Python 3.2']


    vstinner commented Aug 8, 2010

    For my work on bpo-9425 (Rewrite import machinery to work with unicode paths), I need a PyArg_Parse converter that accepts bytes or str and returns str. PyUnicode_FSConverter() does the opposite: it encodes str to bytes.

    To handle (input) filenames in a function, we have 3 choices:

    1/ use bytes: that's the current choice for most Python functions. It gives full Unicode support on POSIX OSes (whose filesystem API is bytes-based), but it is not enough on Windows (Windows uses the mbcs encoding, which covers only a very small subset of Unicode).
    2/ use str with PEP 383 (surrogateescape): this approach started to be used in Python 3.1, and is used more widely in Python 3.2. It offers full Unicode support on all OSes (POSIX and Windows).
    3/ use the native type for each OS (bytes on POSIX, str on Windows): I dislike this solution because it implies code duplication.

    PyUnicode_FSConverter() is the converter for solution (1). PyUnicode_FSDecoder() will be the converter for solution (2).
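To make solution (2) concrete, here is a minimal Python sketch of the PEP 383 round trip that a decoder of this kind relies on. The sample filename and the choice of UTF-8 as the filesystem encoding are assumptions made for illustration; they do not come from the patch.

```python
# Assumption: the filesystem encoding is UTF-8 and the on-disk name
# contains a byte (0xE9) that is not valid UTF-8 in this position.
raw = b'caf\xe9.txt'

# Decoding with surrogateescape never fails: the undecodable byte is
# mapped to the lone surrogate U+DCE9 instead of raising an error.
name = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
restored = name.encode('utf-8', 'surrogateescape')

# A strict encode rejects the lone surrogate, which is why the error
# handler has to be used in both directions.
try:
    name.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The point of the sketch is that a str obtained this way can carry any POSIX filename losslessly, which is what lets a str-based API cover both POSIX and Windows.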

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels Aug 8, 2010
    @vstinner

    Lib/os.py could also be patched to add a Python implementation, e.g.:

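    import sys

    def fsdecode(value):
        # str is already decoded: return it unchanged
        if isinstance(value, str):
            return value
        elif isinstance(value, bytes):
            encoding = sys.getfilesystemencoding()
            if encoding == 'mbcs':
                # mbcs (Windows): decode without the surrogateescape handler
                return value.decode(encoding)
            else:
                # PEP 383: undecodable bytes become lone surrogates
                return value.decode(encoding, 'surrogateescape')
        else:
            raise TypeError("expected bytes or str, not %s" % type(value).__name__)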
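For comparison, the opposite direction (what PyUnicode_FSConverter() does at the C level) could be sketched in Python the same way. The name `fsencode_sketch` and its optional `encoding` parameter are hypothetical, introduced only so the example can be run with a fixed encoding; this is not the committed implementation.

```python
import sys

def fsencode_sketch(value, encoding=None):
    """Hypothetical mirror of fsdecode(): accept str or bytes, return bytes."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(value, bytes):
        # bytes are already encoded: return them unchanged
        return value
    elif isinstance(value, str):
        if encoding == 'mbcs':
            # mbcs (Windows): encode without the surrogateescape handler
            return value.encode(encoding)
        else:
            # PEP 383: lone surrogates are turned back into raw bytes
            return value.encode(encoding, 'surrogateescape')
    else:
        raise TypeError("expected bytes or str, not %s" % type(value).__name__)

# Round trip with an explicit encoding (assumed to be UTF-8 here).
raw = b'data-\xff.bin'
name = raw.decode('utf-8', 'surrogateescape')
round_tripped = fsencode_sketch(name, 'utf-8')
```

With both helpers, a function can accept either type from the caller and convert once to whichever form the underlying library expects.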

    --

    Note: Solution (1) (use the bytes API) is not deprecated by this issue. PyUnicode_FSConverter() is still useful if the underlying library has a bytes API (e.g. OpenSSL only supports char*).

    Solution (2) is preferred if we have access to a character API, e.g. the Windows wide character API.

    @vstinner

    Committed to 3.2 as r83990.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022