
Cannot install package with unicode module names on Windows #78602

Closed
julienmalard mannequin opened this issue Aug 17, 2018 · 14 comments
Labels: 3.7 (EOL) (end of life); 3.8 (only security fixes); stdlib (Python modules in the Lib dir); type-bug (An unexpected behavior, bug, or error)

Comments

@julienmalard

julienmalard mannequin commented Aug 17, 2018

BPO 34421
Nosy @jkloth, @merwok, @serhiy-storchaka, @dstufft, @miss-islington, @julienmalard
PRs
  • bpo-34421 Fix unicode error on Windows #8799
  • [3.6] bpo-34421 avoid unicode error in distutils logging (GH-8799) #9117
  • [3.7] bpo-34421 avoid unicode error in distutils logging (GH-8799) #9118
  • bpo-34421: Improve distutils logging for non-ASCII strings. #9126
  • [3.7] bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) #9503
  • [3.6] bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) #9504
  • [3.7] bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) #9506
  • [3.6] bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) (GH-9506) #9510
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/merwok'
    closed_at = <Date 2018-09-23.11:11:42.033>
    created_at = <Date 2018-08-17.17:06:14.932>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'Cannot install package with unicode module names on Windows'
    updated_at = <Date 2018-09-23.11:11:42.032>
    user = 'https://github.com/julienmalard'

    bugs.python.org fields:

    activity = <Date 2018-09-23.11:11:42.032>
    actor = 'serhiy.storchaka'
    assignee = 'eric.araujo'
    closed = True
    closed_date = <Date 2018-09-23.11:11:42.033>
    closer = 'serhiy.storchaka'
    components = ['Distutils']
    creation = <Date 2018-08-17.17:06:14.932>
    creator = 'julien.malard'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34421
    keywords = ['patch']
    message_count = 14.0
    messages = ['323699', '323714', '324861', '324862', '324863', '324874', '324878', '324888', '324889', '324907', '324921', '326137', '326143', '326145']
    nosy_count = 6.0
    nosy_names = ['jkloth', 'eric.araujo', 'serhiy.storchaka', 'dstufft', 'miss-islington', 'julien.malard']
    pr_nums = ['8799', '9117', '9118', '9126', '9503', '9504', '9506', '9510']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34421'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @julienmalard julienmalard mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Aug 17, 2018
    @serhiy-storchaka

    Please provide more details. How can the issue be reproduced? What did you get, and what did you expect to get?

    It seems the code just before the lines modified by your PR is meant to solve this issue. Why doesn't it work?

    @julienmalard

    julienmalard mannequin commented Aug 18, 2018

    Hello,

    Yes, it does seem odd that that code does not work. On my Windows machine (Windows 7, 64-bit, running 32-bit Python) I checked, and it seems that the code in the if block immediately preceding my PR does not run at all, hence the error.

    For a reproducible example, my Taqdir package, whose packages and modules mostly have Unicode names, runs into this issue (and installs successfully with my proposed fix here combined with a separate PR in pip). Perhaps the most easily accessible example is the AppVeyor build (https://ci.appveyor.com/project/julienmalard/Tinamit) for my Tinamit project, which has Taqdir as a dependency.

    Thanks!

    -Julien Malard


    From: Serhiy Storchaka <report@bugs.python.org>
    Sent: 18 August 2018 06:32
    To: Julien Malard
    Subject: [bpo-34421] Cannot install package with unicode module names on Windows

    New submission from Serhiy Storchaka <storchaka+cpython@gmail.com>:

    Please provide more details. How can the issue be reproduced? What did you get, and what did you expect to get?

    It seems the code just before the lines modified by your PR is meant to solve this issue. Why doesn't it work?

    ----------
    nosy: +serhiy.storchaka


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue34421>


    @merwok

    merwok commented Sep 8, 2018

    New changeset 0afada1 by Éric Araujo (Julien Malard) in branch 'master':
    bpo-34421 avoid unicode error in distutils logging (GH-8799)
    0afada1

    @merwok merwok added 3.7 (EOL) end of life 3.8 only security fixes labels Sep 8, 2018
    @merwok merwok self-assigned this Sep 8, 2018
    @merwok merwok added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Sep 8, 2018
    @miss-islington

    New changeset 3b36642 by Miss Islington (bot) in branch '3.6':
    bpo-34421 avoid unicode error in distutils logging (GH-8799)
    3b36642

    @miss-islington

    New changeset 77b92b1 by Miss Islington (bot) in branch '3.7':
    bpo-34421 avoid unicode error in distutils logging (GH-8799)
    77b92b1

    @serhiy-storchaka

    I would prefer to use the backslashreplace error handler rather than the unicode-escape codec, just as a few lines above, but with ASCII encoding:

        msg = msg.encode('ascii', 'backslashreplace').decode('ascii')

    It is still not clear to me why the current code intended to handle this problem doesn't work in this case. We need to find the cause and fix the existing solution.
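    For illustration, here is the one-liner above wrapped in a helper and applied to the Taqdir filename mentioned later in this thread. The name `ascii_safe` is hypothetical; only the encode/decode call is taken from the comment above.

```python
def ascii_safe(msg: str) -> str:
    """Replace every non-ASCII character with a backslash escape.

    The result is pure ASCII, so it can be written to a stream using any
    ASCII-compatible encoding (cp1252, utf-8, ...) without raising.
    """
    return msg.encode('ascii', 'backslashreplace').decode('ascii')


name = 'taqdir\\\u0634\u0645\u0627\u0631.py'  # the filename taqdir\شمار.py
print(ascii_safe(name))  # prints: taqdir\\u0634\u0645\u0627\u0631.py
```

    Note that the path separator backslash is kept as-is and each Arabic letter becomes a `\uXXXX` escape, so the logged line stays readable and unambiguous.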

    @jkloth

    jkloth commented Sep 9, 2018

    The existing re-code solution is not being triggered, because in this case the stream's errors handler is 'surrogateescape' and its encoding is 'cp1252'.

    Here, pip uses subprocess.Popen() to have Python run setup.py. During execution, distutils logs a filename, 'taqdir\\\u0634\u0645\u0627\u0631.py', that contains characters not encodable in cp1252.
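    The failure can be reproduced without pip by using an in-memory stream configured like the child's stdout. This is a sketch under an assumption: `old_style_write` is a hypothetical helper that mimics a pre-fix guard which re-codes only when `stream.errors == 'strict'` (consistent with the "errors check in distutils.log" mentioned in item 2 below, but the exact code is not quoted in this thread).

```python
import io

MSG = 'copying taqdir\\\u0634\u0645\u0627\u0631.py'  # copying taqdir\شمار.py


def make_stream(errors):
    """In-memory stand-in for a cp1252 stdout pipe with the given error handler."""
    return io.TextIOWrapper(io.BytesIO(), encoding='cp1252',
                            errors=errors, newline='\n')


def old_style_write(stream, msg):
    """Hypothetical mimic of the assumed pre-fix guard in distutils.log.

    Returns True if the write succeeded. (The real logger let the
    UnicodeEncodeError propagate; it is caught here only to report it.)
    """
    if stream.errors == 'strict':  # assumed guard: only 'strict' is re-coded
        encoding = stream.encoding
        msg = msg.encode(encoding, 'backslashreplace').decode(encoding)
    try:
        stream.write(msg + '\n')
    except UnicodeEncodeError:
        # 'surrogateescape' only handles lone surrogates U+DC80..U+DCFF,
        # so Arabic letters still cannot be encoded to cp1252.
        return False
    return True


print(old_style_write(make_stream('strict'), MSG))           # True
print(old_style_write(make_stream('surrogateescape'), MSG))  # False
```

    The second call fails because the guard is skipped, which matches the observation that the existing if block never runs in this configuration.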

    I think that here, Python is not configuring its stdin/stdout/stderr streams correctly when run as a subprocess connected to pipes. Or, at least, subprocess.Popen() isn't passing the right (or enough) information to Python for it to configure itself.

    There should ultimately be a way to have Python (in a subprocess, on Windows) pass Unicode through to its calling process untouched. I suppose that would mean setting the PYTHONIOENCODING environment variable when using subprocess.

    After all that, it seems that:

    1. pip needs to be changed so that the Python subprocesses it calls can transmit Unicode losslessly;
    2. the errors check in distutils.log should perhaps be changed to include 'surrogateescape' (the heart of this issue).

    @serhiy-storchaka

    PR 9126 makes distutils.log use "backslashreplace" instead of "unicode-escape" and simplifies the code (it is also more efficient now, although logging performance is not critical).

    "unicode-escape" escapes all non-ASCII characters, even encodable ones. It also escapes control characters like \t, \b, \r or \x1b (which starts control sequences for ANSI-compatible terminals), which may not be desirable.
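    The difference between the two strategies, and one way to write a try-first fallback, can be checked directly. This is a sketch: `write_log_line` is a hypothetical name, and the code actually merged in PR 9126 may differ in detail.

```python
import io

# A tab, a cp1252-encodable 'é', and Arabic letters that cp1252 cannot encode.
MSG = 'col1\tcaf\u00e9 \u0634\u0645\u0627\u0631'

# unicode-escape escapes everything outside printable ASCII,
# including the tab and the encodable 'é'.
via_unicode_escape = MSG.encode('unicode-escape').decode('ascii')

# backslashreplace with the stream's encoding escapes only what cp1252 lacks.
via_backslashreplace = MSG.encode('cp1252', 'backslashreplace').decode('cp1252')


def write_log_line(stream, msg):
    """Try a plain write; on UnicodeEncodeError, re-code with backslashreplace."""
    try:
        stream.write(msg + '\n')
    except UnicodeEncodeError:
        encoding = stream.encoding
        msg = msg.encode(encoding, 'backslashreplace').decode(encoding)
        stream.write(msg + '\n')
    stream.flush()


buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='cp1252', errors='surrogateescape',
                       newline='\n')
write_log_line(out, MSG)
print(buf.getvalue())  # b'col1\tcaf\xe9 \\u0634\\u0645\\u0627\\u0631\n'
```

    Catching the exception instead of inspecting `stream.errors` means the fallback works for any error handler, including 'surrogateescape'.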

    @julienmalard

    julienmalard mannequin commented Sep 9, 2018

    Hello,

    Thanks for the insights and better fixes. Regarding (1), do you have any pointers on how or where to fix pip? I have an in-progress pull request there (pypa/pip#5712) to fix a related Unicode error during installation and could perhaps combine both solutions.

    Thanks!

    -Julien

    @jkloth

    jkloth commented Sep 10, 2018

    For pip, in call_subprocess() (given here in rough pseudo-code):

    is_python = (cmd[0] == sys.executable)
    kwds = {}
    if is_python:
        # Have the child Python write UTF-8 and decode its pipe as UTF-8
        # (env is the environment dict that call_subprocess() passes to Popen)
        env['PYTHONIOENCODING'] = 'utf8'
        kwds['encoding'] = 'utf8'
    proc = Popen(..., **kwds)
    # ... (rest of the setup omitted) ...
    if stdout is not None:
        while True:
            line = proc.stdout.readline()
            # When running Python, the output is already Unicode
            if not is_python:
                line = console_to_str(line)
            if not line:
                break

    Hopefully, there is enough context to figure out the exact placement.
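    Outside of pip, the same idea can be sketched as a small runnable helper. This is an illustration under assumptions, not pip's actual call_subprocess(): `run_python_capture` is a hypothetical name, it always runs sys.executable (so the is_python check disappears), and it needs Python 3.6+ for Popen's `encoding` argument.

```python
import os
import subprocess
import sys


def run_python_capture(args):
    """Run sys.executable with *args* and return its output lines as str.

    Hypothetical helper: the child is told to write UTF-8 via
    PYTHONIOENCODING, and the parent decodes the pipe as UTF-8, so
    non-ASCII text (e.g. Arabic module names) survives unchanged.
    """
    env = os.environ.copy()
    env['PYTHONIOENCODING'] = 'utf-8'
    proc = subprocess.Popen(
        [sys.executable, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=env,
        encoding='utf-8',  # text mode: readline() returns str, not bytes
    )
    lines = []
    with proc.stdout:
        while True:
            line = proc.stdout.readline()
            if not line:  # '' signals EOF in text mode
                break
            lines.append(line.rstrip('\r\n'))
    proc.wait()
    return lines
```

    Both settings matter: PYTHONIOENCODING only changes what the child writes, while `encoding=` controls how the parent decodes it. With either one missing, the two ends of the pipe can disagree again.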

    @julienmalard

    julienmalard mannequin commented Sep 10, 2018

    Thanks! Will give it a try and reference this conversation here as background.

    @serhiy-storchaka

    New changeset 4b860fd by Serhiy Storchaka in branch 'master':
    bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126)
    4b860fd

    @serhiy-storchaka

    New changeset c73df53 by Serhiy Storchaka in branch '3.7':
    bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) (GH-9506)
    c73df53

    @miss-islington

    New changeset 0b67995 by Miss Islington (bot) in branch '3.6':
    bpo-34421: Improve distutils logging for non-ASCII strings. (GH-9126) (GH-9506)
    0b67995

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022