Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake #80383

Closed
vstinner opened this issue Mar 6, 2019 · 7 comments
Closed
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

vstinner commented Mar 6, 2019

BPO 36202
Nosy @ncoghlan, @vstinner, @methane
PRs
  • bpo-36443: Disable C locale coercion and UTF-8 Mode by default #12589
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-27.17:29:58.167>
    created_at = <Date 2019-03-06.00:53:52.873>
    labels = ['interpreter-core', '3.8']
    title = 'Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake'
    updated_at = <Date 2019-03-27.17:29:58.166>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-03-27.17:29:58.166>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-27.17:29:58.167>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-03-06.00:53:52.873>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36202
    keywords = ['patch']
    message_count = 7.0
    messages = ['337252', '337260', '337261', '337281', '337915', '338977', '338979']
    nosy_count = 3.0
    nosy_names = ['ncoghlan', 'vstinner', 'methane']
    pr_nums = ['12589']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36202'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    Calling Py_DecodeLocale() before Py_Initialize() or _Py_InitializeCore() is broken since Python 3.7 if the C locale coercion (PEP-538) or UTF-8 mode (PEP-540) changes the encoding in the middle of _Py_InitializeCore().

    I added a new phase to the Python initialization in bpo-36142, a new _PyPreConfig structure, which can be used to fix this mojibake issue.

    The code for embedding Python should look like:
    ---
    _Py_PreInitialize();

    _PyCoreConfig config;
    config.home = Py_DecodeLocale("/path/to/home");

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _PyCoreConfig_Clear(&config);
        _Py_ExitInitError(err);
    }

    /* use Python here */

    Py_Finalize();
    _PyCoreConfig_Clear(&config);

    Except that there is no _Py_PreInitialize() function yet :-)

    @vstinner vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 6, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    The "vim" editor embeds Python. It sets the Python home by calling Py_SetPythonHome() with the following code:
    ---

    	    size_t len = mbstowcs(NULL, (char *)p_py3home, 0) + 1;
    
    	    /* The string must not change later, make a copy in static memory. */
    	    py_home_buf = (wchar_t *)alloc(len * sizeof(wchar_t));
    	    if (py_home_buf != NULL && mbstowcs(
    			    py_home_buf, (char *)p_py3home, len) != (size_t)-1)
    		Py_SetPythonHome(py_home_buf);

    ref: https://github.com/vim/vim/blob/14816ad6e58336773443f5ee2e4aa9e384af65d2/src/if_python3.c#L874-L887

    mbstowcs() uses the current LC_CTYPE locale. Python can select a different filesystem encoding than the LC_CTYPE encoding depending on PEP-538 and PEP-540. So encoding back the Python home to bytes to access to files on the filesystem can fail because of mojibake.

    The code should by written like (pseudo-code):
    ---
    _Py_PreInitialize();

    _PyCoreConfig config;
    config.home = Py_DecodeLocale(p_py3home);
    if (config.home == NULL) { /* ERROR */ }
    
    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _PyCoreConfig_Clear(&config);
        _Py_ExitInitError(err);
    }

    The vim case has been discussed at:
    https://discuss.python.org/t/adding-char-based-apis-for-unix/916/8

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    By the way, according to Nick Coghlan (author of the PEP-538), Py_Initialize() and Py_Main() called from an application embedding Python should not coerce the C locale.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 6, 2019

    They weren't *intended* to change it, and didn't in the original implementation of the PEP, but they do in the as-shipped Python 3.7 implementation, and I abandoned my attempts to revert to the as-designed behaviour as impractical given the other changes made for PEP-540. So that's a behavior we're stuck with now: they both have global side effects on the locale of the calling process.

    @ncoghlan
    Copy link
    Contributor

    Victor and I were discussing the appropriate behaviour for the "What do we do if _Py_PreInitialize() hasn't been called?" case, and Victor pointed out that the potential for mojibake provides a solid rationale for going back to the Python 3.6 behaviour: disable both C locale coercion *and* UTF-8 mode, and instead leave locale management to the embedding application.

    That would also get us back to the originally intended behaviour of PEP-538, where it only happens by default in the CLI app, not the runtime support library. Back when I decided that doing that would be too complicated, the _Py_PreInitialize API didn't exist yet, and it didn't occur to me that the mojibake problem would affect UTF-8 mode as well.

    @vstinner
    Copy link
    Member Author

    New changeset d929f18 by Victor Stinner in branch 'master':
    bpo-36443: Disable C locale coercion and UTF-8 Mode by default (GH-12589)
    d929f18

    @vstinner
    Copy link
    Member Author

    My commit disabled C locale coercion and UTF-8 Mode by default when Python is embedded which fix this issue in Python 3.8. I close the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants