Title: Segmentation fault in operator.attrgetter
Type: Stage:
Components: Build, Interpreter Core, Windows Versions: Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: larry Nosy List: Ray Donnelly, asturm, larry, paul.moore, steve.dower, tim.golden, xtreak, zach.ware
Priority: normal Keywords:

Created on 2018-04-05 14:49 by asturm, last changed 2018-09-22 18:26 by xtreak.

File name Uploaded Description Edit asturm, 2018-04-05 14:49 Reproduction example
annotated_dissasembly.txt asturm, 2018-04-05 14:50 Annotated disassembly of the loop in attrgetter_new
Messages (3)
msg314987 - (view) Author: Alexander Sturm (asturm) Date: 2018-04-05 14:49
This issue can be reproduced using Python 3.5.2 - 3.5.5 compiled with VS2015 (tested both the official Python builds as well as local builds using VS2015 Version 14.0.25431.01 Update 3) on 64-bit Windows.

To reproduce, run the attached file (or python -c "import operator; operator.attrgetter('x'*1000000)").

The segfault itself occurs in attrgetter_new [1], while scanning for dots inside a unicode string. From my reading of the code, the code itself is actually correct - instead, it triggers a compiler bug in VS2015, which causes it to miscompile the code by applying an incorrect optimization.

The optimization VS2015 applies is twofold: It uses SIMD instructions to avoid looking at every byte of the input string individually, and it tries to eliminate the condition in PyUnicode_READ [2] by executing all three branches (UCS1, UCS2 and UCS4) at once and then in a separate step discarding the results for the two branches that would not be taken.

When combining these optimizations, the generated code incorrectly only checks the precondition of the UCS1 branch to decide whether it can use the optimized SIMD version. Since the UCS1 branch looks at 8 bytes of the buffer in each iteration, it checks whether the buffer contains at least 8 bytes. However, each iteration of the UCS2/UCS4 branches look at 16/32 bytes, respectively.

As a result, when passing a UCS1 string to attrgetter(), we end up reading 24 bytes past the end of the buffer for every 8 bytes the string contains. In the reproduction example, we thus read about 24 MB past the buffer, almost guaranteeing a crash.

The issue does not occur when compiling Python using VS2017 (i.e. PlatformToolset v141, MSVC version 14.13.26128). Presumably this means the compiler bug has been fixed upstream.

msg314988 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2018-04-05 15:02
Python 3.5 is in security-fix-only mode now, which means we won't be releasing any more binaries for it, and thus can't really do much about this other than possibly recommending that Windows users build with VS2017 instead of VS2015 in future 3.5 release notes.  I believe Python 3.6+ are already built with VS2017, but can you reproduce this in Python 3.6.5?
msg314989 - (view) Author: Alexander Sturm (asturm) Date: 2018-04-05 15:10
As mentioned, the code is compiled correctly by VS2017 - both in 3.5.5 and 3.6.5. So I cannot reproduce the issue with Python 3.6.5.

Would it be possible to switch the default platform toolset used to compile Python 3.5 in the next source release to v141? According to Microsoft, it should be ABI compatible with v140:

If that's out of the question, I agree that there's probably not much that can be done here - other then maybe locally disabling optimizations for that loop.
Date User Action Args
2018-09-22 18:26:13xtreaksetnosy: + xtreak
2018-04-05 23:08:46rhettingersetassignee: larry
2018-04-05 21:45:23Ray Donnellysetnosy: + Ray Donnelly
2018-04-05 15:10:58asturmsetmessages: + msg314989
2018-04-05 15:02:24zach.waresetnosy: + larry, paul.moore, tim.golden, zach.ware, steve.dower
messages: + msg314988
components: + Build, Windows
2018-04-05 14:50:12asturmsetfiles: + annotated_dissasembly.txt
2018-04-05 14:49:04asturmcreate