Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_envlist(): os.execve(), os.spawnve(), etc. crash in Python 3.6.0 when env contains byte strings #72301

Closed
cgohlke mannequin opened this issue Sep 13, 2016 · 19 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@cgohlke
Copy link
Mannequin

cgohlke mannequin commented Sep 13, 2016

BPO 28114
Nosy @pfmoore, @vstinner, @tjguk, @ezio-melotti, @berkerpeksag, @zware, @serhiy-storchaka, @eryksun, @zooba
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • callstack.txt
  • issue28114_unix.diff
  • issue_28114_01.patch
  • issue_28114_02.patch
  • issue_28114_03.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-15.17:48:30.918>
    created_at = <Date 2016-09-13.06:55:35.034>
    labels = ['interpreter-core', 'expert-unicode', '3.7', 'OS-windows', 'type-crash']
    title = 'parse_envlist(): os.execve(), os.spawnve(), etc. crash in Python 3.6.0 when env contains byte strings'
    updated_at = <Date 2017-03-31.16:36:10.658>
    user = 'https://bugs.python.org/cgohlke'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:10.658>
    actor = 'dstufft'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-15.17:48:30.918>
    closer = 'berker.peksag'
    components = ['Interpreter Core', 'Unicode', 'Windows']
    creation = <Date 2016-09-13.06:55:35.034>
    creator = 'cgohlke'
    dependencies = []
    files = ['44623', '44651', '44653', '44661', '44663']
    hgrepos = []
    issue_num = 28114
    keywords = ['patch']
    message_count = 19.0
    messages = ['276189', '276194', '276215', '276234', '276235', '276241', '276304', '276388', '276391', '276400', '276429', '276430', '276457', '276465', '276578', '276581', '276593', '276599', '276600']
    nosy_count = 11.0
    nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'ezio.melotti', 'cgohlke', 'python-dev', 'berker.peksag', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue28114'
    versions = ['Python 3.6', 'Python 3.7']

    @cgohlke
    Copy link
    Mannequin Author

    cgohlke mannequin commented Sep 13, 2016

    Trying to build numpy-1.11.2rc1 wheels for Python 3.6.0b1 on Windows 10 with python.exe setup.py bdist_wheel, python36.dll (32 and 64 bit) segfaults in the find_maxchar_surrogates function. Some other packages built OK, e.g. Cython, pyzmq and Pillow.

    The crash is at <https://hg.python.org/cpython/file/v3.6.0b1/Objects/unicodeobject.c#l1607\>. The call stack is attached.

    Locals:
    + begin 0x00454c49464f5250 <Error reading characters of string.> const wchar_t *
    ch 1509860640 unsigned int
    + end 0x004550fc90512200 <Error reading characters of string.> const wchar_t *
    + iter 0x00000259a500e7d8 L"\x2" const wchar_t *
    + maxchar 0x0000005a59fea520 {0} unsigned int *
    + num_surrogates 0x0000005a59fea528 {0} __int64 *

    @cgohlke cgohlke mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode OS-windows type-crash A hard crash of the interpreter, possibly with a core dump labels Sep 13, 2016
    @serhiy-storchaka
    Copy link
    Member

    end-begin = 0x004550fc90512200-0x00454c49464f5250 = 5168087289776

    Seems find_maxchar_surrogates() is called with wrong arguments.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Sep 13, 2016
    @vstinner
    Copy link
    Member

    Trying to build numpy-1.11.2rc1 wheels for Python 3.6.0b1 on Windows 10 with python.exe setup.py bdist_wheel, python36.dll (32 and 64 bit) segfaults in the find_maxchar_surrogates function.

    I ran "python setup.py bdist_wheel" on numpy-1.11.2rc1 on Linux: no issue.

    I compiled Python 3.6 (the default branch in fact, so the future 3.7) in debug mode/64 bit with SSL, but the setup.py command failed with:

    Don't know how to compile Fortran code on platform 'nt'

    I don't want to spend too much time to try to reproduce the issue.

    Can you please try to get the Python traceback? Try:

    python -X faulthandler setup.py bdist_wheel

    It seems like the crash occurs in parse_envlist(), function to parse environment variables in os.spawnve(). Can you try to dump the env parameter of os.spawnve()?

    For example, you can put the following lines at the beginning of setup.py:

    ---

    import os
    orig_spawnve = os.spawnve
    
    def log_spawnve(*args):
        print("spawnve: %r" % (args,))
        return orig_spawnve(*args)
    
    os.spawnve = log_spawnve

    My test:
    ---

    >>> import os, sys; args=[sys.executable, "-c", "pass"]; os.spawnve(os.P_WAIT, args[0], args, None)
    spawnve: (0, '/home/haypo/prog/python/default/python', ['/home/haypo/prog/python/default/python', '-c', 'pass'], None)
    127

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 13, 2016

    parse_envlist is calling PyUnicode_FromFormat with the format "%U=%U", but passing it bytes:

    |Debug Error!
    
    Program: C:\Program Files\Python36\python_d.exe
    
    abort() has been called
    
    (Press Retry to debug the application)
    (1e8.374): Break instruction exception - code 80000003 (first chance)
    ucrtbased!issue_debug_notification+0x45:
    00007ffc`aa1f1985 cc              int     3
    1:002> k 0n12
    Child-SP          RetAddr           Call Site
    00000030`a5fe9bb0 00007ffc`aa1f1b23 ucrtbased!issue_debug_notification+0x45
    00000030`a5fe9c00 00007ffc`aa212d7d ucrtbased!__acrt_report_runtime_error+0x13
    00000030`a5fe9c60 00007ffc`aa2184cf ucrtbased!abort+0x1d
    00000030`a5fe9ca0 00007ffc`aa2164c8 ucrtbased!common_assert_to_stderr<wchar_t>+0xbf
    00000030`a5fe9d00 00007ffc`aa218e7f ucrtbased!common_assert<wchar_t>+0x68
    00000030`a5fe9d40 00000000`60e59045 ucrtbased!_wassert+0x2f
    00000030`a5fe9d70 00000000`60eb9f71 python36_d!_PyUnicode_CheckConsistency+0x45
    00000030`a5fe9e00 00000000`60e4cd07 python36_d!unicode_fromformat_arg+0xc81
    00000030`a5fe9fa0 00000000`60e4cc31 python36_d!PyUnicode_FromFormatV+0xa7
    00000030`a5fea040 00000000`60c69853 python36_d!PyUnicode_FromFormat+0x31
    00000030`a5fea080 00000000`60c68286 python36_d!parse_envlist+0x1c3
    00000030`a5fea180 00000000`60c5d4ee python36_d!os_spawnve_impl+0x1d6
    
    1:002> .frame 6
    06 00000030`a5fe9d70 00000000`60eb9f71 python36_d!_PyUnicode_CheckConsistency+0x45
    1:002> ?? op->ob_type->tp_name
    char * 0x00000000`60fe1248
     "bytes"
    

    parse_envlist used to call PyUnicode_FSConverter on key and val; get the underlying buffers via PyBytes_AsString; allocate a new buffer and copy key and val into "%s=%s". Now it instead calls PyUnicode_FromFormat for "%U=%U" and then fsconvert_strdup, which has been modified to call PyUnicode_AsWideCharString on Windows and otherwise PyUnicode_FSConverter.

    For Windows it could use PyUnicode_FSDecoder on key and val in combination with PyUnicode_FromFormat and the format string "%U=%U". For Unix it could use PyUnicode_FSConverter for key and val in combination with PyBytes_FromFormat and the format string "%s=%s".

    @vstinner
    Copy link
    Member

    parse_envlist is calling PyUnicode_FromFormat with the format "%U=%U", but passing it bytes:

    Ah, it's a regression introduced by the issue bpo-27781 with the change e20c7d8a8187: parse_envlist() was modified to first call PyUnicode_FromFormat("%U=%U") and then convert.

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 13, 2016

    Here's a snippet to reproduce this bug:

        import os, sys
        environb = {os.fsencode(k):os.fsencode(v) for k,v in os.environ.items()}
        os.spawnve(os.P_WAIT, sys.executable, ('python', '--version'), environb)

    (Now that Windows Python provisionally supports bytes via utf-8:surrogatepass, maybe it should also have an os.environb mapping.)

    @zooba
    Copy link
    Member

    zooba commented Sep 13, 2016

    Whoops, yeah that's my fault. I'll get to it before b2, but I need to catch up on my day job the next week or two, so if someone else wants to apply a fix (and add a test - test_crashers, presumably?) then go for it!

    @berkerpeksag
    Copy link
    Member

    Here's a patch for Unix. I will add Windows support when I get my Windows VM.

    [...] (and add a test - test_crashers, presumably?) [...]

    Unfortunately, test_crashers doesn't run since 2011 (skipped in 481ad9129a0f.) parse_envlist() is only used by os.execve() and os.spawnve() so I'm not sure what's the best way to test the patch (other than adapting Eryk's snippet in msg276241.)

    @vstinner vstinner changed the title Crash in unicodeobject.c find_maxchar_surrogates on python-3.6.0b1 for Windows parse_envlist(): os.execve(), os.spawnve(), etc. crash in Python 3.6.0 when env contains byte strings Sep 14, 2016
    @vstinner
    Copy link
    Member

    The issue is not specific to Windows, the following example also crash on Linux:

    import os, sys
    args = [sys.executable, "-c", "pass"]
    os.execve(args[0], args, os.environb)

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 14, 2016

    Berker, this is basically what I had in my initial patch on the Unix side. I also addressed the Windows issues in parse_envlist and fsconvert_strdup. I'm uploading that patch for reference. It needs a test. I also need to verify that there are no additional problems with the high-level os._execvpe function regarding bytes support on Windows.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 14, 2016

    New changeset 0ca42273c714 by Victor Stinner in branch '3.6':
    Issue bpo-28114: Add unit tests on os.spawn*()
    https://hg.python.org/cpython/rev/0ca42273c714

    @vstinner
    Copy link
    Member

    Berker and/or Eryksun: Please write unit tests for your patch. I just added a new SpawnTests to test_os which tests all os.spawn*() functions.

    Please add at least one unit test with env contains a bytes key/value entry. Maybe add a use_bytes=False parameter to create_args() to use bytes for the variable added by the test?

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 14, 2016

    Thanks for the spawn test framework, Victor. I've added a use_bytes argument to encode the args and env using os.fsencode. It's encoding args as well because parse_arglist calls fsconvert_strdup, which was assuming Unicode strings on Windows instead of first calling PyUnicode_FSDecoder. test_spawnve_bytes passes for me on Linux and Windows.

    @berkerpeksag
    Copy link
    Member

    Eryk's patch looks good to me, thanks! I will wait for others to review the patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 15, 2016

    New changeset 8d80a4291ddc by Berker Peksag in branch '3.6':
    Issue bpo-28114: Fix a crash in parse_envlist() when env contains byte strings
    https://hg.python.org/cpython/rev/8d80a4291ddc

    New changeset 8ad99fdc84a3 by Berker Peksag in branch 'default':
    Issue bpo-28114: Merge from 3.6
    https://hg.python.org/cpython/rev/8ad99fdc84a3

    @berkerpeksag
    Copy link
    Member

    Thanks for the patch, Eryk!

    @vstinner
    Copy link
    Member

    #if defined(HAVE_WEXECV) || defined(HAVE_WSPAWNV)

    Hum, can't we use MS_WINDOWS here? Or maybe pass a parameter in
    parse_envlist() for Windows?

    @zooba
    Copy link
    Member

    zooba commented Sep 15, 2016

    We already use HAVE_WEXECV and HAVE_WSPAWNV elsewhere to choose the wide-char versions of these functions. You've got a few changes to make if you want to maintain consistency.

    Personally I prefer feature flags to platform flags, then let pyconfig.h control them, especially when we're not directly using platform-specific APIs.

    @vstinner
    Copy link
    Member

    Ok fine ;-) Let's keep #if defined(HAVE_WEXECV) || defined(HAVE_WSPAWNV).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants