classification
Title: Windows: 3.3.0-rc2.msi: test_buffer fails
Type: compile error Stage: resolved
Components: Interpreter Core, Windows Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, brian.curtin, haypo, larry, loewis, nadeem.vawda, steve.dower, tim.golden, zach.ware
Priority: critical Keywords: patch

Created on 2012-09-20 20:31 by skrah, last changed 2015-07-05 02:07 by steve.dower. This issue is now closed.

Files
File name Uploaded Description Edit
profile.bat loewis, 2012-09-21 21:07
profiletests.txt loewis, 2012-09-21 21:07
issue15993.diff skrah, 2012-09-21 22:54 review
ull_vctest.diff skrah, 2014-06-18 11:20 Quick-and-dirty alternative version of PyLong_AsUnsignedLongLong(). Just for testing. review
Messages (32)
msg170844 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-20 20:31
I've installed 3.3.0-rc2 on Windows-7 64-bit using the msi installer.
I'm getting these failures in test_buffer, but I can *not* reproduce
them when I build Win-32/pgo/python.exe from source:

======================================================================
FAIL: test_memoryview_assign (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2863,
    self.assertEqual(m[i], 8)
AssertionError: 34359738368 != 8

======================================================================
FAIL: test_memoryview_struct_module (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2476,
    self.assertEqual(m[0], nd[0])
AssertionError: 15080797365275624638 != 6838299039298601293
msg170858 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-20 22:05
Both lzma and memoryview use PyLong_AsUnsignedLongLong() in the
affected code paths. I get this with the msi installed python.exe:

>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
17179869184
>>>


And the correct result with the self-compiled (PGO) python.exe:

>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
4
msg170873 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-21 09:27
The high and low words of the 64-bit value are switched:

>>> a = array.array('Q', [1])
>>> m = memoryview(a)
>>> m[0]= 2**32+5
>>> m[0]
21474836481
>>> struct.unpack_from('8s', m, 0)
(b'\x01\x00\x00\x00\x05\x00\x00\x00',)


Can anyone reproduce this in a source build? I think this should
be a blocker.
msg170916 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-09-21 20:54
Issue #15995 was marked as a duplicate of this issue. Copy of its initial message (msg170853):

This is similar to #15993: With the installed Python from the rc2-msi
test_lzma fails. I cannot reproduce the failure with python.exe (PGO)
compiled from source:


======================================================================                                   
ERROR: test__decode_filter_properties (test.test_lzma.MiscellaneousTestCase)                             
----------------------------------------------------------------------                                   
Traceback (most recent call last):                                                                       
  File "C:\Python33\lib\test\test_lzma.py", line 1105, in test__decode_filter_properties                 
    lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00")                                                             
ValueError: Invalid filter ID: 4611686018427387905                                                       
                                                                                                         
======================================================================                                   
ERROR: test_filter_properties_roundtrip (test.test_lzma.MiscellaneousTestCase)                           
----------------------------------------------------------------------                                   
Traceback (most recent call last):                                                                       
  File "C:\Python33\lib\test\test_lzma.py", line 1114, in test_filter_properties_roundtrip               
    lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00")                                                             
ValueError: Invalid filter ID: 4611686018427387905                                                       
                                                                                                         
----------------------------------------------------------------------
msg170917 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-09-21 21:02
I fail to reproduce the issue on Windows 7 (Version 6.0.1, number 7601, Service Pack 1):
---
Microsoft Windows [version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Tous droits réservés.

C:\Users\haypo>cd \python33

C:\Python33>python.exe
Python 3.3.0rc2 (v3.3.0rc2:88a0792e8ba3, Sep  9 2012, 10:13:38) [MSC v.1600 64 b
it (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()

C:\Python33>python.exe -m test test_buffer test_lzma
[1/2] test_buffer
[2/2] test_lzma
All 2 tests OK.
---
I'm running Windows 7 in KVM, my host CPU is a Intel i7-2600.

> I'm getting these failures in test_buffer, 
> but I can *not* reproduce them when I build
> Win-32/pgo/python.exe from source:

The issue looks to be specific to 64 bits binaries. You need the professional version of Visual Studio 10. The express version doesn't support 64 bits. I only have the Express version.
msg170918 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-21 21:04
Declaring this a release blocker is technically difficult. If it is a release blocker, further releases cannot be done until it is resolved. Since it is an issue with the binary only, the only possible way to resolve this is with a release. So declaring this a release blocker essentially deadlocks the release.
msg170919 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-21 21:07
For the record, the released binary is not just a PGO build, but has been trained with the attached training script.
msg170920 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-09-21 21:14
> You need the professional version of Visual Studio 10.
> The express version doesn't support 64 bits. I only have the
> Express version.

Ah yes, I now remember my issue with VS10 Express: when I set the project to 64 bits, I get such error popup:
http://www.haypocalc.com/tmp/visual_studio_64bits.png

Which version should I try? Ultimate? Premium? Professional?
msg170923 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-21 21:29
STINNER Victor <report@bugs.python.org> wrote:
> Which version should I try? Ultimate? Premium? Professional?

Try Ultimate, it's AFAIK the only version these days that supports PGO.
msg170924 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-21 21:30
Martin v. Löwis <report@bugs.python.org> wrote:
> For the record, the released binary is not just a PGO build, but has
> been trained with the attached training script.

Thanks. Now I can reproduce the issue with a source build.
msg170929 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-21 21:40
I'm using Ultimate, but I think Professional should provide you with all required tools.
msg170935 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-21 22:54
It's a bit late here already, but unless I'm missing something I think
this is an optimizer bug. I'm attaching a workaround for memoryview.c
and _lzmamodule.c.
msg170971 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-22 08:32
System: Windows 7 64-bit
Build (unpatched): PCBuild\Win32-pgo\python.exe, trained with profile.bat


In the unpatched version, I stepped through this test case in the debugger:

import array
a = array.array('Q', [1])
m = memoryview(a)
m[0] = 1


At Objects/memoryobject.c:1572:  llu == 1

At Objects/memoryobject.c:1782:  pylong_as_llu returned value 4294967296



So I think it's pretty safe to say that this is indeed an optimizer bug.
msg170973 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-22 08:38
Sorry, the line numbers are messed up. They should be:

Objects/memoryobject.c:1569
Objects/memoryobject.c:1776
msg170981 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-22 10:00
Unfortunately (?) I now cannot produce this myself, anymore, either.

Given all the problems, I'll stop using PGO on all branches and targets, as the compiler apparently generates bad code.

Anybody curious to investigate the issue further who is able to reproduce it might want to look at the generated machine code, to find out where exactly the incorrect processing happens.
msg170985 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-22 11:09
> Given all the problems, I'll stop using PGO on all branches and
> as the compiler apparently generates bad code.

That is probably the best solution.



The problem in memoryview.c:pack_single() is that Visual Studio optimizes the
memcpy() to mov instructions, but exchanges the low and high dwords:

unsigned __int64 llu = 1;

do { unsigned __int64 x; 
     x = (unsigned __int64)llu; 
     // At this point x=llu is in edx=1 and ecx=0.
     // The memcpy() is optimized to:
     //    mov dword ptr [esi], ecx
     //    mov dword ptr [esi+4], edx
     memcpy(ptr, (char *)&x, sizeof x); 
} while (0);
msg170988 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-22 11:48
It's interesting that the compiler uses ecx:edx to represent a __int64 quantity when 64-bit registers are available...
msg170989 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-22 12:00
Hmm, the bug is in the 32-bit build. The 64-bit build is fine. Or do you
mean MMX registers?
msg220180 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-06-10 17:34
"I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and
+saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used
+to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that
+only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's
+interested I can give it a shot?)"


Steve, could you try if this is still a problem?  The (presumed) Visual Studio bug only surfaced when training the instrumented build with "profiletests.txt" and then doing the PGupdate build.


I'm opening this issue again and leave it at release blocker for 3.5.
msg220183 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-10 18:17
test_memoryview_assign seems to be okay, but the two test_lzma tests still fail with the same message. Both pass without PGO.

I'll get in touch with the PGO team and try and get it fixed. I haven't checked, but it looks consistent with Stefan's analysis of the disassembly. My guess it that it's a broken space optimisation rather than a speed one - our release builds normally optimise everything for speed but PGO will often decide that space is better.
msg220531 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-14 07:12
It's actually bad code generation for the switch statement in build_filter_spec() in _lzmamodule.c.

I've filed a bug, so hopefully it will be fixed, but if not then it should be easy enough to exclude that function (or even the whole module - _lzmamodule.c doesn't have any of the speed critical parts in it, by the look of it).
msg220545 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-06-14 10:08
Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()?
msg220551 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-06-14 12:49
Please don't. If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all. How many other places will there be which also cause bad code being generated that just happens not to be uncovered by the test suite?
msg220557 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-14 14:55
> Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()?

The two issues were unrelated - the 'invalid filter ID' (4611686018427387905 == 0x40000000_00000001) is the correct value but the wrong branch in the switch was taken, leading to the error message.

> If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all.

By that logic, we should be using a debug build on every platform... I've encountered various codegen bugs in gcc and MSVC, though they've all been fixed (apart from this one). All developers are human, including most compiler writers.

That said, I'll wait on the response from the PGO team. If they don't give me enough confidence, then I'll happily forget about the whole idea of using it for 3.5.
msg220894 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-06-17 21:14
I'd be fine to reconsider if a previously-demonstrated bug is now demonstrated-fixed. However, if the actual bug persists, optimization should be disabled for all code, not just for the code that allows to demonstrate the bug. This principle should indeed been followed for all platforms (and it has, e.g. on hpux).
msg220932 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-06-18 11:35
> The two issues were unrelated - the 'invalid filter ID'
> (4611686018427387905 == 0x40000000_00000001) is the correct
> value but the wrong branch in the switch was taken, leading
> to the error message.

Unfortunately I don't have a Visual Studio setup right now.

It seems to me that at the time the wrong branch is taken, f->id
could be in the registers in the wrong order (same as in msg170985),
but when the error message is printed, the value is read from
memory.  This is just a guess of course.


As Martin, I'm uncomfortable that the memoryview issue no longer
appears, but this one still does.


I've attached an alternative version of PyLong_AsUnsignedLongLong()
that is just intended for testing the compiler.


If the optimizer does whole progam optimization, it might choke
on _PyLong_AsByteArray().
msg220992 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-19 15:14
> I'd be fine to reconsider if a previously-demonstrated bug is now 
> demonstrated-fixed. However, if the actual bug persists, optimization 
> should be disabled for all code, not just for the code that allows to 
> demonstrate the bug. 

I'm okay with that. I thought you meant never enable optimizations with that compiler ever again, which is obviously ridiculous and I should have dismissed the idea on that basis rather than posting a snarky response. Sorry.

> It seems to me that at the time the wrong branch is taken, f->id
> could be in the registers in the wrong order (same as in msg170985),
> but when the error message is printed, the value is read from
> memory.  This is just a guess of course.

I checked that and the registers are fine. Here's the snippet of disassembly I posted with the bug I filed:

 mov     edx,dword ptr [edi+4]  ; == 0x40000000
 mov     ecx,dword ptr [edi]    ; == 0x00000001
 test    edx,edx           ; should be cmp edx,40000000h or equiv.
 ja      lbl1  ; 'default:'
 jb      lbl2              ; should be je after change above
 cmp     ecx,21h
 jbe     lbl2              ; should probably be lbl3
lbl1:
 ; default:
...
lbl2:
 cmp     ecx,1
 jne     lbl3
 ; case 0x4000000000000001
...


It's clearly an incorrect `test` opcode, and I'd expect switch statements where the first case is a 64-bit integer larger than 2**32 to be rare - I've certainly never encountered one before - which is why such a bug could go undiscovered.

When I looked at the disassembly for memoryview it was fine. I actually spent far longer than I should have trying to find the bug that was no longer there...

Also bear in mind that I'm working with VC14 and not VC10, so the difference is due to the compiler and not simply time or magic :)
msg221380 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-06-23 21:47
This has been confirmed as a bug in VC14 (and earlier) and there'll be a fix going in soon.

For those interested, here's a brief rundown of the root cause:

* the switch in build_filter_spec() switches on a 64-bit value
* one case is 0x4000000000000001 and the rest are <=0x21
* PGO detects that 0x4000000000000001 is the hot case
(bug starts here)
* PGO detects that the cold cases are 32-bits or less and so enables an optimisation to skip comparing the high DWORD
* PGO adds check for the hot case, but using the 32-bit optimisation
 - it checks for "0x1" rather than the full value
(bug ends here)
* PGO adds checks for cold cases

The fix will be to check both hot and cold cases to see whether the 32-bit optimisation can be used. A "workaround" (that I wouldn't dream of using, but it illustrates the issue) would be to add a dead case that requires 64-bits. This would show up in the list of cold cases and prevent the 32-bit optimisation from being used.

No indication of when the fix will go in, but it should be in the next public release, and I'll certainly be able to test it in advance of that.
msg221381 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-06-23 21:59
Thanks a lot for this investigation; I'm glad you are working on this.
msg242548 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-05-04 09:57
Is this now fixed in VS?  I don't believe I can test myself as I've only got express/community editions.
msg246282 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-07-05 01:05
So, the purpose in marking this as a "release blocker" is so that we can hold up the release while we wait for Microsoft to release a new compiler?  If our approach to fixing this is to get the compiler fixed, I can live with marking this as "critical", but not "release blocker".
msg246287 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-07-05 02:07
Eh, why bother. I don't remember if the fix is in for 3.5.0b3, but I'll vouch that the compiler build with the fix does exist and will be used for 3.5, so this should just be closed (again).
History
Date User Action Args
2015-07-05 02:07:22steve.dowersetstatus: open -> closed

messages: + msg246287
stage: resolved
2015-07-05 01:05:33larrysetpriority: release blocker -> critical
nosy: + larry
messages: + msg246282

2015-05-04 09:57:14BreamoreBoysetnosy: + BreamoreBoy
messages: + msg242548
components: + Windows
2014-10-14 18:03:01skrahsetnosy: - skrah
2014-06-23 22:24:14hayposetnosy: + haypo
2014-06-23 22:21:53hayposetnosy: + tim.golden, zach.ware, - haypo
2014-06-23 21:59:45loewissetmessages: + msg221381
2014-06-23 21:47:48steve.dowersetmessages: + msg221380
2014-06-19 15:14:59steve.dowersetmessages: + msg220992
2014-06-18 11:35:49skrahsetmessages: + msg220932
2014-06-18 11:20:22skrahsetfiles: + ull_vctest.diff
2014-06-17 21:14:12loewissetmessages: + msg220894
2014-06-14 14:55:51steve.dowersetmessages: + msg220557
2014-06-14 12:49:18loewissetmessages: + msg220551
2014-06-14 10:08:49skrahsetmessages: + msg220545
2014-06-14 07:12:53steve.dowersetmessages: + msg220531
2014-06-10 18:17:38steve.dowersetmessages: + msg220183
2014-06-10 17:34:57skrahsetstatus: closed -> open
versions: + Python 3.5, - Python 3.3
nosy: + steve.dower

messages: + msg220180

type: compile error
2012-09-22 12:00:06skrahsetmessages: + msg170989
2012-09-22 11:48:53loewissetmessages: + msg170988
2012-09-22 11:09:28skrahsetmessages: + msg170985
2012-09-22 10:01:07loewissetstatus: open -> closed
resolution: fixed
2012-09-22 10:00:59loewissetmessages: + msg170981
2012-09-22 08:38:31skrahsetmessages: + msg170973
2012-09-22 08:32:17skrahsetmessages: + msg170971
2012-09-21 22:54:14skrahsetfiles: + issue15993.diff
keywords: + patch
messages: + msg170935
2012-09-21 21:40:14loewissetmessages: + msg170929
2012-09-21 21:30:59skrahsetmessages: + msg170924
2012-09-21 21:29:38skrahsetmessages: + msg170923
2012-09-21 21:14:39hayposetmessages: + msg170920
2012-09-21 21:07:21loewissetfiles: + profiletests.txt
2012-09-21 21:07:12loewissetfiles: + profile.bat

messages: + msg170919
2012-09-21 21:04:08loewissetmessages: + msg170918
2012-09-21 21:02:24hayposetmessages: + msg170917
2012-09-21 20:54:19hayposetmessages: + msg170916
2012-09-21 09:30:53skrahsetnosy: + nadeem.vawda
2012-09-21 09:30:09skrahlinkissue15995 superseder
2012-09-21 09:27:29skrahsetpriority: critical -> release blocker

messages: + msg170873
2012-09-20 22:05:45skrahsetmessages: + msg170858
2012-09-20 21:53:37hayposetnosy: + haypo
2012-09-20 20:31:49skrahcreate