Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames #60620

Closed
vstinner opened this issue Nov 5, 2012 · 13 comments

@vstinner
Member

vstinner commented Nov 5, 2012

BPO 16416
Nosy @ronaldoussoren, @pitrou, @vstinner, @asvetlov, @serhiy-storchaka
Files
  • macosx.patch
  • macosx-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2012-12-03.13:14:59.728>
    created_at = <Date 2012-11-05.22:18:23.613>
    labels = ['OS-mac']
    title = "Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames"
    updated_at = <Date 2012-12-03.13:14:59.726>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-12-03.13:14:59.726>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2012-12-03.13:14:59.728>
    closer = 'vstinner'
    components = ['macOS']
    creation = <Date 2012-11-05.22:18:23.613>
    creator = 'vstinner'
    dependencies = []
    files = ['27903', '27969']
    hgrepos = []
    issue_num = 16416
    keywords = ['patch']
    message_count = 13.0
    messages = ['174943', '175441', '175477', '175478', '175479', '175480', '175481', '175482', '176490', '176797', '176836', '176840', '176841']
    nosy_count = 6.0
    nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'asvetlov', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue16416'
    versions = ['Python 3.2', 'Python 3.3', 'Python 3.4']

    @vstinner
    Member Author

    vstinner commented Nov 5, 2012

    Since changeset 45079ad1e260 (issue bpo-4388), command line arguments on Mac OS X are decoded from UTF-8 instead of the locale encoding, but the functions of Python/fileutils.c still use the locale encoding.

    This mismatch does not work: see issue bpo-16218. On Mac OS X, for the command line "python script.py", the filename "script.py" is decoded from UTF-8 (by _Py_DecodeUTF8_surrogateescape) but is then passed to _Py_fopen(), which encodes the filename to the locale encoding (e.g. ISO-8859-1 if the $LANG, $LC_CTYPE and $LC_ALL environment variables are not set). The result is mojibake, and Python fails to open the script.
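The mismatch can be reproduced from Python itself. This is only a minimal sketch, not the C code path: the filename is a made-up example, and ISO-8859-1 stands in for the locale encoding that is used when no locale variable is set.

```python
# Illustrative only: mimic in pure Python what the C startup code does.
# The filename below is a hypothetical example.
on_disk = "\u00e9.py".encode("utf-8")   # bytes as seen by the OS

# Step 1: the command line argument is decoded from UTF-8.
decoded = on_disk.decode("utf-8", "surrogateescape")

# Step 2: the filename is re-encoded with the locale encoding
# (ISO-8859-1 in this sketch) before it would be handed to fopen().
reencoded = decoded.encode("iso-8859-1", "surrogateescape")

print(on_disk)               # b'\xc3\xa9.py'
print(reencoded)             # b'\xe9.py'
print(reencoded == on_disk)  # False: fopen() would look for another name

# Using UTF-8 on both sides gives back the original bytes.
print(decoded.encode("utf-8", "surrogateescape") == on_disk)  # True
```

The two byte strings differ, so a lookup with the re-encoded name would not find the file that was named on the command line.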

    The attached patch modifies the functions of Python/fileutils.c to use UTF-8, instead of the locale encoding, to encode and decode filenames on Mac OS X.

    I don't know yet whether Modules/getpath.c should also decode paths from UTF-8 instead of the locale encoding on Mac OS X. We may expose _Py_decode_filename().

    @vstinner
    Member Author

    macosx-2.patch patches the _Py_wchar2char() and _Py_char2wchar() functions to
    use UTF-8/surrogateescape, so the change applies to any function using the
    locale encoding, not only the file-related functions of fileutils.h. The patch
    also simplifies the code: python.c no longer needs a specific #ifdef __APPLE__:

    -#ifdef __APPLE__
    -        argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
    -#else
             argv_copy[i] = _Py_char2wchar(argv[i], NULL);
    -#endif
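A short Python sketch can show why UTF-8 combined with the surrogateescape error handler suits OS data: bytes that are not valid UTF-8 still survive a decode/encode round trip. The helper names `decode_os_data` and `encode_os_data` are hypothetical; they only mirror the roles of _Py_char2wchar() and _Py_wchar2char() and are not the C implementation.

```python
# Hypothetical helpers, named for illustration only.
def decode_os_data(raw: bytes) -> str:
    """Decode OS bytes as UTF-8; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def encode_os_data(text: str) -> bytes:
    """Encode back to UTF-8; lone surrogates become the original bytes."""
    return text.encode("utf-8", "surrogateescape")


# 0xE9 alone is not valid UTF-8 (it is "é" in ISO-8859-1).
raw = b"caf\xe9.txt"
text = decode_os_data(raw)

print(ascii(text))                  # 'caf\udce9.txt'
print(encode_os_data(text) == raw)  # True: nothing is lost
```

The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9 on decoding and mapped back to 0xE9 on encoding, so the original bytes are preserved.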

    2012/11/7 Andrew Svetlov <report@bugs.python.org>

    Changes by Andrew Svetlov <andrew.svetlov@gmail.com>:

    ----------
    nosy: +asvetlov


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue16416>


    @python-dev
    Mannequin

    python-dev mannequin commented Nov 12, 2012

    New changeset 48fbdaf3a849 by Victor Stinner in branch 'default':
    Issue bpo-16416: OS data are now always encoded/decoded to/from
    http://hg.python.org/cpython/rev/48fbdaf3a849

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 12, 2012

    New changeset f3e512b5ffb3 by Victor Stinner in branch 'default':
    Issue bpo-16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions
    http://hg.python.org/cpython/rev/f3e512b5ffb3

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 12, 2012

    New changeset 1b97cc71a05e by Victor Stinner in branch 'default':
    Issue bpo-16416: Fix Misc/NEWS entry, mention Mac OS X
    http://hg.python.org/cpython/rev/1b97cc71a05e

    @vstinner
    Member Author

    @serhiy: Thanks for your review, I missed it before my first commit.

    @serhiy-storchaka
    Member

    Victor, are you going to backport this to 3.3?

    @vstinner
    Member Author

    > Victor, are you going to backport this to 3.3?

    I'm waiting for the buildbot results, and maybe also for the fix for issue bpo-16455 (which affects the tests on undecodable bytes).

    @pitrou
    Member

    pitrou commented Nov 27, 2012

    Victor, could you please backport to 3.3?

    @pitrou pitrou assigned vstinner and unassigned ronaldoussoren Nov 27, 2012
    @pitrou
    Member

    pitrou commented Dec 2, 2012

    Ping.

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 3, 2012

    New changeset c838c9b117f1 by Victor Stinner in branch '3.2':
    Issue bpo-16416: On Mac OS X, operating system data are now always
    http://hg.python.org/cpython/rev/c838c9b117f1

    New changeset 26c4748351cb by Victor Stinner in branch '3.3':
    (Merge 3.2) Issue bpo-16416: On Mac OS X, operating system data are now always
    http://hg.python.org/cpython/rev/26c4748351cb

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 3, 2012

    New changeset af6fd3ca6de9 by Victor Stinner in branch '3.2':
    Issue bpo-16416: Fix compilation error
    http://hg.python.org/cpython/rev/af6fd3ca6de9

    @vstinner
    Member Author

    vstinner commented Dec 3, 2012

    The issue should now be fixed in Python 3.2, 3.3 and 3.4.

    @vstinner vstinner closed this as completed Dec 3, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022