classification
Title: parse_envlist(): os.execve(), os.spawnve(), etc. crash in Python 3.6.0 when env contains byte strings
Type: crash Stage: resolved
Components: Interpreter Core, Unicode, Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, cgohlke, eryksun, ezio.melotti, paul.moore, python-dev, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2016-09-13 06:55 by cgohlke, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
callstack.txt cgohlke, 2016-09-13 06:55
issue28114_unix.diff berker.peksag, 2016-09-14 07:28 review
issue_28114_01.patch eryksun, 2016-09-14 08:48 review
issue_28114_02.patch eryksun, 2016-09-14 14:19 review
issue_28114_03.patch eryksun, 2016-09-14 15:12 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (19)
msg276189 - (view) Author: Christoph Gohlke (cgohlke) Date: 2016-09-13 06:55
Trying to build numpy-1.11.2rc1 wheels for Python 3.6.0b1 on Windows 10 with `python.exe setup.py bdist_wheel`, python36.dll (32 and 64 bit) segfaults in the `find_maxchar_surrogates` function. Some other packages built OK, e.g. Cython, pyzmq and Pillow.

The crash is at <https://hg.python.org/cpython/file/v3.6.0b1/Objects/unicodeobject.c#l1607>. The call stack is attached.

Locals:
+		begin	0x00454c49464f5250 <Error reading characters of string.>	const wchar_t *
		ch	1509860640	unsigned int
+		end	0x004550fc90512200 <Error reading characters of string.>	const wchar_t *
+		iter	0x00000259a500e7d8 L"\x2"	const wchar_t *
+		maxchar	0x0000005a59fea520 {0}	unsigned int *
+		num_surrogates	0x0000005a59fea528 {0}	__int64 *
msg276194 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-13 07:18
end-begin = 0x004550fc90512200-0x00454c49464f5250 = 5168087289776

Seems find_maxchar_surrogates() is called with wrong arguments.
msg276215 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-13 08:57
> Trying to build numpy-1.11.2rc1 wheels for Python 3.6.0b1 on Windows 10 with `python.exe setup.py bdist_wheel`, python36.dll (32 and 64 bit) segfaults in the `find_maxchar_surrogates` function.

I ran "python setup.py bdist_wheel" on numpy-1.11.2rc1 on Linux: no issue.

I compiled Python 3.6 (the default branch in fact, so the future 3.7) in debug mode/64 bit with SSL, but the setup.py command failed with:

   Don't know how to compile Fortran code on platform 'nt'

I don't want to spend too much time to try to reproduce the issue.

Can you please try to get the Python traceback? Try:

   python -X faulthandler setup.py bdist_wheel

It seems like the crash occurs in parse_envlist(), function to parse environment variables in os.spawnve(). Can you try to dump the env parameter of os.spawnve()?

For example, you can put the following lines at the beginning of setup.py:

---
import os
orig_spawnve = os.spawnve

def log_spawnve(*args):
    print("spawnve: %r" % (args,))
    return orig_spawnve(*args)

os.spawnve = log_spawnve
---

My test:
---
>>> import os, sys; args=[sys.executable, "-c", "pass"]; os.spawnve(os.P_WAIT, args[0], args, None)
spawnve: (0, '/home/haypo/prog/python/default/python', ['/home/haypo/prog/python/default/python', '-c', 'pass'], None)
127
---
msg276234 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-13 09:57
parse_envlist is calling PyUnicode_FromFormat with the format "%U=%U", but passing it bytes:

    |Debug Error!

    Program: C:\Program Files\Python36\python_d.exe

    abort() has been called

    (Press Retry to debug the application)
    (1e8.374): Break instruction exception - code 80000003 (first chance)
    ucrtbased!issue_debug_notification+0x45:
    00007ffc`aa1f1985 cc              int     3
    1:002> k 0n12
    Child-SP          RetAddr           Call Site
    00000030`a5fe9bb0 00007ffc`aa1f1b23 ucrtbased!issue_debug_notification+0x45
    00000030`a5fe9c00 00007ffc`aa212d7d ucrtbased!__acrt_report_runtime_error+0x13
    00000030`a5fe9c60 00007ffc`aa2184cf ucrtbased!abort+0x1d
    00000030`a5fe9ca0 00007ffc`aa2164c8 ucrtbased!common_assert_to_stderr<wchar_t>+0xbf
    00000030`a5fe9d00 00007ffc`aa218e7f ucrtbased!common_assert<wchar_t>+0x68
    00000030`a5fe9d40 00000000`60e59045 ucrtbased!_wassert+0x2f
    00000030`a5fe9d70 00000000`60eb9f71 python36_d!_PyUnicode_CheckConsistency+0x45
    00000030`a5fe9e00 00000000`60e4cd07 python36_d!unicode_fromformat_arg+0xc81
    00000030`a5fe9fa0 00000000`60e4cc31 python36_d!PyUnicode_FromFormatV+0xa7
    00000030`a5fea040 00000000`60c69853 python36_d!PyUnicode_FromFormat+0x31
    00000030`a5fea080 00000000`60c68286 python36_d!parse_envlist+0x1c3
    00000030`a5fea180 00000000`60c5d4ee python36_d!os_spawnve_impl+0x1d6

    1:002> .frame 6
    06 00000030`a5fe9d70 00000000`60eb9f71 python36_d!_PyUnicode_CheckConsistency+0x45
    1:002> ?? op->ob_type->tp_name
    char * 0x00000000`60fe1248
     "bytes"

parse_envlist used to call PyUnicode_FSConverter on key and val; get the underlying buffers via PyBytes_AsString; allocate a new buffer and copy key and val into "%s=%s". Now it instead calls PyUnicode_FromFormat for "%U=%U" and then fsconvert_strdup, which has been modified to call PyUnicode_AsWideCharString on Windows and otherwise PyUnicode_FSConverter. 

For Windows it could use PyUnicode_FSDecoder on key and val in combination with PyUnicode_FromFormat and the format string "%U=%U". For Unix it could use PyUnicode_FSConverter for key and val in combination with PyBytes_FromFormat and the format string "%s=%s".
msg276235 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-13 10:04
> parse_envlist is calling PyUnicode_FromFormat with the format "%U=%U", but passing it bytes:

Ah, it's a regression introduced by the issue #27781 with the change e20c7d8a8187: parse_envlist() was modified to first call PyUnicode_FromFormat("%U=%U") and then convert.
msg276241 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-13 10:43
Here's a snippet to reproduce this bug:

    import os, sys
    environb = {os.fsencode(k):os.fsencode(v) for k,v in os.environ.items()}
    os.spawnve(os.P_WAIT, sys.executable, ('python', '--version'), environb)

(Now that Windows Python provisionally supports bytes via utf-8:surrogatepass, maybe it should also have an os.environb mapping.)
msg276304 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-13 17:06
Whoops, yeah that's my fault. I'll get to it before b2, but I need to catch up on my day job the next week or two, so if someone else wants to apply a fix (and add a test - test_crashers, presumably?) then go for it!
msg276388 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-14 07:28
Here's a patch for Unix. I will add Windows support when I get my Windows VM.

> [...] (and add a test - test_crashers, presumably?) [...]

Unfortunately, test_crashers doesn't run since 2011 (skipped in 481ad9129a0f.) parse_envlist() is only used by os.execve() and os.spawnve() so I'm not sure what's the best way to test the patch (other than adapting Eryk's snippet in msg276241.)
msg276391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-14 07:55
The issue is not specific to Windows, the following example also crash on Linux:

import os, sys
args = [sys.executable, "-c", "pass"]
os.execve(args[0], args, os.environb)
msg276400 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-14 08:48
Berker, this is basically what I had in my initial patch on the Unix side. I also addressed the Windows issues in parse_envlist and fsconvert_strdup. I'm uploading that patch for reference. It needs a test. I also need to verify that there are no additional problems with the high-level os._execvpe function regarding bytes support on Windows.
msg276429 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-14 12:57
New changeset 0ca42273c714 by Victor Stinner in branch '3.6':
Issue #28114: Add unit tests on os.spawn*()
https://hg.python.org/cpython/rev/0ca42273c714
msg276430 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-14 12:59
Berker and/or Eryksun: Please write unit tests for your patch. I just added a new SpawnTests to test_os which tests all os.spawn*() functions.

Please add at least one unit test with env contains a bytes key/value entry. Maybe add a use_bytes=False parameter to create_args() to use bytes for the variable added by the test?
msg276457 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-14 14:19
Thanks for the spawn test framework, Victor. I've added a use_bytes argument to encode the args and env using os.fsencode. It's encoding args as well because parse_arglist calls fsconvert_strdup, which was assuming Unicode strings on Windows instead of first calling PyUnicode_FSDecoder. test_spawnve_bytes passes for me on Linux and Windows.
msg276465 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-14 15:23
Eryk's patch looks good to me, thanks! I will wait for others to review the patch.
msg276578 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-15 17:19
New changeset 8d80a4291ddc by Berker Peksag in branch '3.6':
Issue #28114: Fix a crash in parse_envlist() when env contains byte strings
https://hg.python.org/cpython/rev/8d80a4291ddc

New changeset 8ad99fdc84a3 by Berker Peksag in branch 'default':
Issue #28114: Merge from 3.6
https://hg.python.org/cpython/rev/8ad99fdc84a3
msg276581 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-15 17:48
Thanks for the patch, Eryk!
msg276593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-15 19:06
> #if defined(HAVE_WEXECV) || defined(HAVE_WSPAWNV)

Hum, can't we use MS_WINDOWS here? Or maybe pass a parameter in
parse_envlist() for Windows?
msg276599 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-15 19:17
We already use HAVE_WEXECV and HAVE_WSPAWNV elsewhere to choose the wide-char versions of these functions. You've got a few changes to make if you want to maintain consistency.

Personally I prefer feature flags to platform flags, then let pyconfig.h control them, especially when we're not directly using platform-specific APIs.
msg276600 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-15 19:17
Ok fine ;-) Let's keep #if defined(HAVE_WEXECV) || defined(HAVE_WSPAWNV).
History
Date User Action Args
2017-03-31 16:36:10dstufftsetpull_requests: + pull_request861
2016-09-15 19:17:47vstinnersetmessages: + msg276600
2016-09-15 19:17:06steve.dowersetmessages: + msg276599
2016-09-15 19:06:12vstinnersetmessages: + msg276593
2016-09-15 17:48:30berker.peksagsetstatus: open -> closed
resolution: fixed
messages: + msg276581

stage: patch review -> resolved
2016-09-15 17:19:44python-devsetmessages: + msg276578
2016-09-14 15:23:02berker.peksagsetmessages: + msg276465
2016-09-14 15:12:45eryksunsetfiles: + issue_28114_03.patch
2016-09-14 14:19:27eryksunsetfiles: + issue_28114_02.patch

messages: + msg276457
2016-09-14 12:59:57vstinnersetmessages: + msg276430
2016-09-14 12:57:56python-devsetnosy: + python-dev
messages: + msg276429
2016-09-14 08:48:10eryksunsetfiles: + issue_28114_01.patch

messages: + msg276400
2016-09-14 07:55:03vstinnersetmessages: + msg276391
2016-09-14 07:53:14vstinnersettitle: Crash in unicodeobject.c find_maxchar_surrogates on python-3.6.0b1 for Windows -> parse_envlist(): os.execve(), os.spawnve(), etc. crash in Python 3.6.0 when env contains byte strings
2016-09-14 07:28:11berker.peksagsetfiles: + issue28114_unix.diff

nosy: + berker.peksag
messages: + msg276388

keywords: + patch
stage: patch review
2016-09-13 17:06:04steve.dowersetmessages: + msg276304
2016-09-13 10:43:16eryksunsetmessages: + msg276241
2016-09-13 10:04:15vstinnersetmessages: + msg276235
2016-09-13 09:57:58eryksunsetnosy: + eryksun
messages: + msg276234
2016-09-13 08:57:24vstinnersetmessages: + msg276215
2016-09-13 07:18:07serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg276194
versions: + Python 3.7
2016-09-13 06:55:35cgohlkecreate