Encoding issues with the locale encoding #76736

Closed
vstinner opened this issue Jan 15, 2018 · 6 comments

@vstinner
Member

BPO 32555
Nosy @vstinner, @ezio-melotti
PRs
  • [3.6] bpo-32555: Fix locale encodings #5193
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-21.10:58:44.139>
    created_at = <Date 2018-01-15.11:20:42.719>
    labels = ['3.7', 'expert-unicode']
    title = 'Encoding issues with the locale encoding'
    updated_at = <Date 2018-06-21.10:58:44.137>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-06-21.10:58:44.137>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-21.10:58:44.139>
    closer = 'vstinner'
    components = ['Unicode']
    creation = <Date 2018-01-15.11:20:42.719>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32555
    keywords = ['patch']
    message_count = 6.0
    messages = ['309963', '309964', '309965', '309994', '310021', '320156']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'ezio.melotti']
    pr_nums = ['5193']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue32555'
    versions = ['Python 3.6', 'Python 3.7']

    @vstinner
    Member Author

    Python 3.6 doesn't use the right encoding to decode the results of os.strerror(), time.strftime(), locale.localeconv(), time.tzname, etc. on macOS, FreeBSD and other platforms.

    See my locale encoding fix in bpo-29240 (commit 7ed7aea) and the test_all_locales.py script attached to bpo-29240.

    See also bpo-31900 for the locale.localeconv() encoding issue that occurs when the LC_NUMERIC encoding differs from the LC_CTYPE encoding.
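A rough way to spot this class of bug from Python is to collect the strings returned by the functions listed above and look for lone surrogates in the U+DC80..U+DCFF range, which is what the surrogateescape error handler produces for bytes it could not decode. This is only a sketch and is not the test_all_locales.py script from bpo-29240; the helper names `has_escaped_byte` and `collect_locale_strings` are made up for illustration.

```python
import locale
import os
import time


def has_escaped_byte(text):
    """Return True if text contains a lone surrogate left by surrogateescape."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


def collect_locale_strings():
    """Gather strings that Python decodes from locale-dependent C APIs."""
    strings = {
        "os.strerror(2)": os.strerror(2),
        "time.strftime('%B')": time.strftime("%B"),
    }
    for index, name in enumerate(time.tzname):
        strings["time.tzname[%d]" % index] = name
    for key, value in locale.localeconv().items():
        # localeconv() also returns ints and lists; only check the strings.
        if isinstance(value, str):
            strings["localeconv()[%r]" % key] = value
    return strings


try:
    # Use the locale configured in the environment (LC_ALL, LANG, ...).
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The configured locale is not installed; keep the current one.
    pass

suspicious = {
    name: ascii(value)
    for name, value in collect_locale_strings().items()
    if has_escaped_byte(value)
}
print(suspicious or "no escaped bytes found")
```

An empty result does not prove the encoding is right (a wrong 8-bit encoding can decode without errors), so this only catches the surrogateescape symptom.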

    @vstinner
    Member Author

    I'm not sure that locale.bindtextdomain() uses the right encoding either. I propose the following fix:

    diff --git a/Modules/_localemodule.c b/Modules/_localemodule.c
    index 324b694b83..1de17d3620 100644
    --- a/Modules/_localemodule.c
    +++ b/Modules/_localemodule.c
    @@ -555,7 +555,7 @@ PyIntl_bindtextdomain(PyObject* self,PyObject*args)
             PyErr_SetFromErrno(PyExc_OSError);
             return NULL;
         }
    -    result = PyUnicode_DecodeLocale(current_dirname, NULL);
    +    result = PyUnicode_DecodeFSDefault(current_dirname);
         Py_XDECREF(dirname_bytes);
         return result;
     }
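For context, the patch swaps the current locale encoding (PyUnicode_DecodeLocale with a NULL error handler, which means strict) for the filesystem encoding and its error handler (PyUnicode_DecodeFSDefault). The Python-level counterparts can be inspected as in the sketch below. It assumes a POSIX system, and the directory name `raw_dirname` is an invented example value, not something from the issue.

```python
import locale
import os
import sys

# Encoding and error handler behind PyUnicode_DecodeFSDefault() / os.fsdecode().
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()

# Encoding of the current LC_CTYPE locale (approximates PyUnicode_DecodeLocale()).
locale_encoding = locale.getpreferredencoding(False)

print("filesystem:", fs_encoding, fs_errors)
print("locale:    ", locale_encoding)

# Hypothetical directory name containing a byte that is not valid UTF-8 or ASCII.
raw_dirname = b"/tmp/r\xe9pertoire"

# os.fsdecode() applies the filesystem encoding and error handler, so on POSIX
# an undecodable byte is escaped rather than raising an error.
decoded = os.fsdecode(raw_dirname)
print(ascii(decoded))
```

On POSIX the decoded name can be passed back through os.fsencode() to recover the original bytes, which is the property that makes the filesystem codec a natural fit for directory names.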

    @vstinner
    Member Author

    Another issue: _Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug.
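To make the two representations in this comment concrete, the sketch below contrasts a UTF-16 surrogate pair (what a 16-bit wchar_t needs for a character outside the BMP) with surrogate-escaped bytes (one lone surrogate per raw input byte). It only illustrates the terminology; it does not reproduce _Py_DecodeUTF8Ex(), and the sample character is an arbitrary choice.

```python
import struct

# An arbitrary character outside the Basic Multilingual Plane.
ch = "\U0001F600"
utf8_bytes = ch.encode("utf-8")

# Surrogate pair: the two 16-bit code units UTF-16 uses for this character.
surrogate_pair = struct.unpack("<2H", ch.encode("utf-16-le"))

# Escaped bytes: surrogateescape maps each undecodable byte b to U+DC00 + b.
# ASCII is used here only to force every byte to be treated as undecodable.
escaped = [ord(c) for c in utf8_bytes.decode("ascii", "surrogateescape")]

print([hex(unit) for unit in surrogate_pair])
print([hex(code) for code in escaped])
```

Both forms use code points from the surrogate range, but they are not interchangeable: the pair encodes one character, while the escaped form preserves the original bytes so they can be restored on encoding.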

    @vstinner
    Member Author

    Example of the bug on FreeBSD 11:

    haypo@freebsd$ LC_ALL=C ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, "fr_FR.ISO8859-1"); print(ascii(os.strerror(2)))'

    'Fichier ou r\udce9pertoire inexistant'

    Expected result:

    haypo@freebsd$ LC_ALL=fr_FR.ISO8859-1 ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, ""); print(ascii(os.strerror(2)))'

    'Fichier ou r\xe9pertoire inexistant'
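The difference between the two outputs can be reproduced without a FreeBSD machine. In the sketch below, the message bytes are assumed to be the ISO 8859-1 form of the string shown above, and ASCII with surrogateescape stands in for the encoding Python picked up from the C locale at startup.

```python
# Assumed raw bytes returned by the C strerror() under fr_FR.ISO8859-1.
raw_message = b"Fichier ou r\xe9pertoire inexistant"

# Buggy path: decode with the startup (C locale) encoding; the byte 0xE9 is
# not ASCII, so surrogateescape turns it into the lone surrogate U+DCE9.
wrong = raw_message.decode("ascii", "surrogateescape")

# Expected path: decode with the encoding of the locale that is now active.
right = raw_message.decode("iso8859-1")

print(ascii(wrong))  # 'Fichier ou r\udce9pertoire inexistant'
print(ascii(right))  # 'Fichier ou r\xe9pertoire inexistant'
```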

    @vstinner
    Member Author

    New changeset b92c159 by Victor Stinner in branch '3.6':
    [3.6] bpo-32555: Fix locale encodings (#5193)
    b92c159

    @vstinner
    Member Author

    freebsd$ LC_ALL=fr_FR.ISO8859-1 ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, ""); print(ascii(os.strerror(2)))'
    'Fichier ou r\xe9pertoire inexistant'

    I ran this test manually on FreeBSD: it passes on Python 3.6, 3.7 and master. I'm closing the issue.

    > Another issue: _Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug.

    I created bpo-33928 for that one.

    @ezio-melotti transferred this issue from another repository Apr 10, 2022