Python3: use ASCII for the file system encoding on initfsencoding() failure #52971

Closed
vstinner opened this issue May 15, 2010 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode

Comments

@vstinner
Member

BPO 8725
Nosy @malemburg, @loewis, @pitrou, @vstinner
Dependencies
  • bpo-8611: Python3 doesn't support locale different than utf8 and an non-ASCII path (POSIX)
  • bpo-8715: Create PyUnicode_EncodeFSDefault() function
Files
  • fsencoding_ascii-2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-19.23:55:25.879>
    created_at = <Date 2010-05-15.12:39:05.650>
    labels = ['interpreter-core', 'invalid', 'expert-unicode']
    title = 'Python3: use ASCII for the file system encoding on initfsencoding() failure'
    updated_at = <Date 2010-10-19.23:55:25.878>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2010-10-19.23:55:25.878>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-19.23:55:25.879>
    closer = 'vstinner'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2010-05-15.12:39:05.650>
    creator = 'vstinner'
    dependencies = ['8611', '8715']
    files = ['17357']
    hgrepos = []
    issue_num = 8725
    keywords = ['patch']
    message_count = 5.0
    messages = ['105804', '105820', '105842', '111758', '119180']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'loewis', 'pitrou', 'vstinner', 'Arfrever']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue8725'
    versions = ['Python 3.2']

    @vstinner
    Member Author

    I introduced initfsencoding() in bpo-8610 to ensure that Py_FileSystemEncoding is never NULL. In the discussion, Marc Lemburg noticed that falling back to UTF-8 when nl_langinfo(CODESET) fails is a bad idea: ASCII is better (I agree).

    We cannot fall back to ASCII yet because there are two other problems that have to be fixed before that:

    • Python3 doesn't support surrogates in module filenames: see bpo-8611
    • If Py_FileSystemEncoding is NULL, encoding functions fall back to UTF-8 (PyUnicode_GetDefaultEncoding()). bpo-8715 proposes a new PyUnicode_EncodeFSDefault() function to fix this problem.

    Attached patch is a partial fix for this issue.
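
To make the proposed fallback concrete, here is a minimal Python sketch of the idea: ask the locale for its codeset and fall back to ASCII if that fails. It is not the C code from the patch. The function name `guess_fs_encoding` and its `fallback` parameter are hypothetical, and the sketch assumes the codeset is only read, never changed with `setlocale()`.

```python
import codecs
import locale


def guess_fs_encoding(fallback="ascii"):
    """Return the locale codeset, or `fallback` if it cannot be determined.

    Hypothetical helper for illustration; not part of CPython.
    """
    try:
        # nl_langinfo() and CODESET only exist on POSIX builds of Python.
        codeset = locale.nl_langinfo(locale.CODESET)
    except (AttributeError, locale.Error):
        return fallback
    if not codeset:
        return fallback
    try:
        # Normalize the name and check that Python has a codec for it.
        return codecs.lookup(codeset).name
    except LookupError:
        return fallback


if __name__ == "__main__":
    print(guess_fs_encoding())
```

The design point under discussion is only the value of `fallback`: ASCII rejects every non-ASCII byte instead of silently guessing UTF-8.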

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels May 15, 2010
    @vstinner
    Member Author

    PyUnicode_AsEncodedString() contains a special path for the file system encoding. I don't think that it is still needed, but I don't know how to check that. => read msg105810

    @vstinner
    Member Author

    Version 2:

    • bpo-8715 has been committed: patch PyUnicode_EncodeFSDefault()
    • fix the documentation according to the changes

    @vstinner
    Member Author

    I tried the patch on my import_unicode branch and it doesn't work if the locale encoding is not ASCII (as the current code doesn't work if the locale encoding is not UTF-8, bpo-8611).

    If Py_FileSystemUnicodeEncoding is NULL: PyUnicode_EncodeFSDefault() should use wcstombs() and PyUnicode_DecodeFSDefault() should use mbstowcs(). They may reuse _Py_wchar2char() and _Py_char2wchar().

    "ascii" should be used in initfsencoding().
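
For readers unfamiliar with why surrogates come up in this thread, the following sketch shows how a strict ASCII file system encoding can still round-trip non-ASCII file names. It assumes the `surrogateescape` error handler from PEP 383 is the mechanism in play. The helper names `decode_fs` and `encode_fs` are hypothetical and stand in for the C-level decode/encode pair discussed above.

```python
def decode_fs(raw, encoding="ascii"):
    """Decode a file name; undecodable bytes become lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_fs(name, encoding="ascii"):
    """Encode a file name; lone surrogates turn back into the original bytes."""
    return name.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9.txt"            # Latin-1 encoded name, not valid ASCII
    name = decode_fs(raw)
    print(ascii(name))              # the 0xE9 byte is carried as U+DCE9
    assert encode_fs(name) == raw   # the round trip is lossless
```

The decoded string is not displayable as the user intended, but the original bytes survive, which is what matters when the name is passed back to the operating system.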

    @vstinner
    Member Author

    initfsencoding() now raises a fatal error when get_codeset() fails. Using an encoding different from the locale encoding when get_codeset() fails only leads to mojibake and encoding issues; it's not a good idea. Closing this issue as invalid.
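
As an illustration of the mojibake mentioned here, this small sketch encodes a file name with one encoding and decodes it with another. The helper `mojibake` and the choice of UTF-8 versus Latin-1 are assumptions made for the example only; the thread does not name specific encodings.

```python
def mojibake(text, written_as, read_as):
    """Encode `text` with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # A name written by a UTF-8 program and read back as Latin-1.
    print(mojibake("café.txt", "utf-8", "latin-1"))
```

Nothing raises an error in this case, which is why the mismatch is easy to miss: the bytes are valid in both encodings but mean different characters.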

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022