msg170844 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-20 20:31 |
I've installed 3.3.0-rc2 on Windows-7 64-bit using the msi installer.
I'm getting these failures in test_buffer, but I can *not* reproduce
them when I build Win-32/pgo/python.exe from source:
======================================================================
FAIL: test_memoryview_assign (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2863,
self.assertEqual(m[i], 8)
AssertionError: 34359738368 != 8
======================================================================
FAIL: test_memoryview_struct_module (test.test_buffer.TestBufferProtocol)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2476,
self.assertEqual(m[0], nd[0])
AssertionError: 15080797365275624638 != 6838299039298601293
|
msg170858 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-20 22:05 |
Both lzma and memoryview use PyLong_AsUnsignedLongLong() in the
affected code paths. I get this with the msi installed python.exe:
>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
17179869184
>>>
And the correct result with the self-compiled (PGO) python.exe:
>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
4
|
msg170873 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-21 09:27 |
The high and low words of the 64-bit value are switched:
>>> a = array.array('Q', [1])
>>> m = memoryview(a)
>>> m[0]= 2**32+5
>>> m[0]
21474836481
>>> struct.unpack_from('8s', m, 0)
(b'\x01\x00\x00\x00\x05\x00\x00\x00',)
Can anyone reproduce this in a source build? I think this should
be a blocker.
|
msg170916 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-09-21 20:54 |
Issue #15995 was marked as a duplicate of this issue. Copy of its initial message (msg170853):
This is similar to #15993: With the installed Python from the rc2-msi
test_lzma fails. I cannot reproduce the failure with python.exe (PGO)
compiled from source:
======================================================================
ERROR: test__decode_filter_properties (test.test_lzma.MiscellaneousTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Python33\lib\test\test_lzma.py", line 1105, in test__decode_filter_properties
lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00")
ValueError: Invalid filter ID: 4611686018427387905
======================================================================
ERROR: test_filter_properties_roundtrip (test.test_lzma.MiscellaneousTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Python33\lib\test\test_lzma.py", line 1114, in test_filter_properties_roundtrip
lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00")
ValueError: Invalid filter ID: 4611686018427387905
----------------------------------------------------------------------
|
msg170917 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-09-21 21:02 |
I fail to reproduce the issue on Windows 7 (Version 6.0.1, number 7601, Service Pack 1):
---
Microsoft Windows [version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Tous droits réservés.
C:\Users\haypo>cd \python33
C:\Python33>python.exe
Python 3.3.0rc2 (v3.3.0rc2:88a0792e8ba3, Sep 9 2012, 10:13:38) [MSC v.1600 64 b
it (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
C:\Python33>python.exe -m test test_buffer test_lzma
[1/2] test_buffer
[2/2] test_lzma
All 2 tests OK.
---
I'm running Windows 7 in KVM, my host CPU is a Intel i7-2600.
> I'm getting these failures in test_buffer,
> but I can *not* reproduce them when I build
> Win-32/pgo/python.exe from source:
The issue looks to be specific to 64 bits binaries. You need the professional version of Visual Studio 10. The express version doesn't support 64 bits. I only have the Express version.
|
msg170918 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-21 21:04 |
Declaring this a release blocker is technically difficult. If it is a release blocker, further releases cannot be done until it is resolved. Since it is an issue with the binary only, the only possible way to resolve this is with a release. So declaring this a release blocker essentially deadlocks the release.
|
msg170919 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-21 21:07 |
For the record, the released binary is not just a PGO build, but has been trained with the attached training script.
|
msg170920 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-09-21 21:14 |
> You need the professional version of Visual Studio 10.
> The express version doesn't support 64 bits. I only have the
> Express version.
Ah yes, I now remember my issue with VS10 Express: when I set the project to 64 bits, I get such error popup:
http://www.haypocalc.com/tmp/visual_studio_64bits.png
Which version should I try? Ultimate? Premium? Professional?
|
msg170923 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-21 21:29 |
STINNER Victor <report@bugs.python.org> wrote:
> Which version should I try? Ultimate? Premium? Professional?
Try Ultimate, it's AFAIK the only version these days that supports PGO.
|
msg170924 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-21 21:30 |
Martin v. Löwis <report@bugs.python.org> wrote:
> For the record, the released binary is not just a PGO build, but has
> been trained with the attached training script.
Thanks. Now I can reproduce the issue with a source build.
|
msg170929 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-21 21:40 |
I'm using Ultimate, but I think Professional should provide you with all required tools.
|
msg170935 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-21 22:54 |
It's a bit late here already, but unless I'm missing something I think
this is an optimizer bug. I'm attaching a workaround for memoryview.c
and _lzmamodule.c.
|
msg170971 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-22 08:32 |
System: Windows 7 64-bit
Build (unpatched): PCBuild\Win32-pgo\python.exe, trained with profile.bat
In the unpatched version, I stepped through this test case in the debugger:
import array
a = array.array('Q', [1])
m = memoryview(a)
m[0] = 1
At Objects/memoryobject.c:1572: llu == 1
At Objects/memoryobject.c:1782: pylong_as_llu returned value 4294967296
So I think it's pretty safe to say that this is indeed an optimizer bug.
|
msg170973 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-22 08:38 |
Sorry, the line numbers are messed up. They should be:
Objects/memoryobject.c:1569
Objects/memoryobject.c:1776
|
msg170981 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-22 10:00 |
Unfortunately (?) I now cannot produce this myself, anymore, either.
Given all the problems, I'll stop using PGO on all branches and targets, as the compiler apparently generates bad code.
Anybody curious to investigate the issue further who is able to reproduce it might want to look at the generated machine code, to find out where exactly the incorrect processing happens.
|
msg170985 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-22 11:09 |
> Given all the problems, I'll stop using PGO on all branches and
> as the compiler apparently generates bad code.
That is probably the best solution.
The problem in memoryview.c:pack_single() is that Visual Studio optimizes the
memcpy() to mov instructions, but exchanges the low and high dwords:
unsigned __int64 llu = 1;
do { unsigned __int64 x;
x = (unsigned __int64)llu;
// At this point x=llu is in edx=1 and ecx=0.
// The memcpy() is optimized to:
// mov dword ptr [esi], ecx
// mov dword ptr [esi+4], edx
memcpy(ptr, (char *)&x, sizeof x);
} while (0);
|
msg170988 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-09-22 11:48 |
It's interesting that the compiler uses ecx:edx to represent a __int64 quantity when 64-bit registers are available...
|
msg170989 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-09-22 12:00 |
Hmm, the bug is in the 32-bit build. The 64-bit build is fine. Or do you
mean MMX registers?
|
msg220180 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2014-06-10 17:34 |
"I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and
+saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used
+to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that
+only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's
+interested I can give it a shot?)"
Steve, could you try if this is still a problem? The (presumed) Visual Studio bug only surfaced when training the instrumented build with "profiletests.txt" and then doing the PGupdate build.
I'm opening this issue again and leave it at release blocker for 3.5.
|
msg220183 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2014-06-10 18:17 |
test_memoryview_assign seems to be okay, but the two test_lzma tests still fail with the same message. Both pass without PGO.
I'll get in touch with the PGO team and try and get it fixed. I haven't checked, but it looks consistent with Stefan's analysis of the disassembly. My guess it that it's a broken space optimisation rather than a speed one - our release builds normally optimise everything for speed but PGO will often decide that space is better.
|
msg220531 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2014-06-14 07:12 |
It's actually bad code generation for the switch statement in build_filter_spec() in _lzmamodule.c.
I've filed a bug, so hopefully it will be fixed, but if not then it should be easy enough to exclude that function (or even the whole module - _lzmamodule.c doesn't have any of the speed critical parts in it, by the look of it).
|
msg220545 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2014-06-14 10:08 |
Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()?
|
msg220551 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2014-06-14 12:49 |
Please don't. If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all. How many other places will there be which also cause bad code being generated that just happens not to be uncovered by the test suite?
|
msg220557 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2014-06-14 14:55 |
> Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()?
The two issues were unrelated - the 'invalid filter ID' (4611686018427387905 == 0x40000000_00000001) is the correct value but the wrong branch in the switch was taken, leading to the error message.
> If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all.
By that logic, we should be using a debug build on every platform... I've encountered various codegen bugs in gcc and MSVC, though they've all been fixed (apart from this one). All developers are human, including most compiler writers.
That said, I'll wait on the response from the PGO team. If they don't give me enough confidence, then I'll happily forget about the whole idea of using it for 3.5.
|
msg220894 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2014-06-17 21:14 |
I'd be fine to reconsider if a previously-demonstrated bug is now demonstrated-fixed. However, if the actual bug persists, optimization should be disabled for all code, not just for the code that allows to demonstrate the bug. This principle should indeed been followed for all platforms (and it has, e.g. on hpux).
|
msg220932 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2014-06-18 11:35 |
> The two issues were unrelated - the 'invalid filter ID'
> (4611686018427387905 == 0x40000000_00000001) is the correct
> value but the wrong branch in the switch was taken, leading
> to the error message.
Unfortunately I don't have a Visual Studio setup right now.
It seems to me that at the time the wrong branch is taken, f->id
could be in the registers in the wrong order (same as in msg170985),
but when the error message is printed, the value is read from
memory. This is just a guess of course.
As Martin, I'm uncomfortable that the memoryview issue no longer
appears, but this one still does.
I've attached an alternative version of PyLong_AsUnsignedLongLong()
that is just intended for testing the compiler.
If the optimizer does whole progam optimization, it might choke
on _PyLong_AsByteArray().
|
msg220992 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2014-06-19 15:14 |
> I'd be fine to reconsider if a previously-demonstrated bug is now
> demonstrated-fixed. However, if the actual bug persists, optimization
> should be disabled for all code, not just for the code that allows to
> demonstrate the bug.
I'm okay with that. I thought you meant never enable optimizations with that compiler ever again, which is obviously ridiculous and I should have dismissed the idea on that basis rather than posting a snarky response. Sorry.
> It seems to me that at the time the wrong branch is taken, f->id
> could be in the registers in the wrong order (same as in msg170985),
> but when the error message is printed, the value is read from
> memory. This is just a guess of course.
I checked that and the registers are fine. Here's the snippet of disassembly I posted with the bug I filed:
mov edx,dword ptr [edi+4] ; == 0x40000000
mov ecx,dword ptr [edi] ; == 0x00000001
test edx,edx ; should be cmp edx,40000000h or equiv.
ja lbl1 ; 'default:'
jb lbl2 ; should be je after change above
cmp ecx,21h
jbe lbl2 ; should probably be lbl3
lbl1:
; default:
...
lbl2:
cmp ecx,1
jne lbl3
; case 0x4000000000000001
...
It's clearly an incorrect `test` opcode, and I'd expect switch statements where the first case is a 64-bit integer larger than 2**32 to be rare - I've certainly never encountered one before - which is why such a bug could go undiscovered.
When I looked at the disassembly for memoryview it was fine. I actually spent far longer than I should have trying to find the bug that was no longer there...
Also bear in mind that I'm working with VC14 and not VC10, so the difference is due to the compiler and not simply time or magic :)
|
msg221380 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2014-06-23 21:47 |
This has been confirmed as a bug in VC14 (and earlier) and there'll be a fix going in soon.
For those interested, here's a brief rundown of the root cause:
* the switch in build_filter_spec() switches on a 64-bit value
* one case is 0x4000000000000001 and the rest are <=0x21
* PGO detects that 0x4000000000000001 is the hot case
(bug starts here)
* PGO detects that the cold cases are 32-bits or less and so enables an optimisation to skip comparing the high DWORD
* PGO adds check for the hot case, but using the 32-bit optimisation
- it checks for "0x1" rather than the full value
(bug ends here)
* PGO adds checks for cold cases
The fix will be to check both hot and cold cases to see whether the 32-bit optimisation can be used. A "workaround" (that I wouldn't dream of using, but it illustrates the issue) would be to add a dead case that requires 64-bits. This would show up in the list of cold cases and prevent the 32-bit optimisation from being used.
No indication of when the fix will go in, but it should be in the next public release, and I'll certainly be able to test it in advance of that.
|
msg221381 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2014-06-23 21:59 |
Thanks a lot for this investigation; I'm glad you are working on this.
|
msg242548 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-05-04 09:57 |
Is this now fixed in VS? I don't believe I can test myself as I've only got express/community editions.
|
msg246282 - (view) |
Author: Larry Hastings (larry) * |
Date: 2015-07-05 01:05 |
So, the purpose in marking this as a "release blocker" is so that we can hold up the release while we wait for Microsoft to release a new compiler? If our approach to fixing this is to get the compiler fixed, I can live with marking this as "critical", but not "release blocker".
|
msg246287 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2015-07-05 02:07 |
Eh, why bother. I don't remember if the fix is in for 3.5.0b3, but I'll vouch that the compiler build with the fix does exist and will be used for 3.5, so this should just be closed (again).
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:36 | admin | set | github: 60197 |
2015-07-05 02:07:22 | steve.dower | set | status: open -> closed
messages:
+ msg246287 stage: resolved |
2015-07-05 01:05:33 | larry | set | priority: release blocker -> critical nosy:
+ larry messages:
+ msg246282
|
2015-05-04 09:57:14 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages:
+ msg242548 components:
+ Windows
|
2014-10-14 18:03:01 | skrah | set | nosy:
- skrah
|
2014-06-23 22:24:14 | vstinner | set | nosy:
+ vstinner
|
2014-06-23 22:21:53 | vstinner | set | nosy:
+ tim.golden, zach.ware, - vstinner
|
2014-06-23 21:59:45 | loewis | set | messages:
+ msg221381 |
2014-06-23 21:47:48 | steve.dower | set | messages:
+ msg221380 |
2014-06-19 15:14:59 | steve.dower | set | messages:
+ msg220992 |
2014-06-18 11:35:49 | skrah | set | messages:
+ msg220932 |
2014-06-18 11:20:22 | skrah | set | files:
+ ull_vctest.diff |
2014-06-17 21:14:12 | loewis | set | messages:
+ msg220894 |
2014-06-14 14:55:51 | steve.dower | set | messages:
+ msg220557 |
2014-06-14 12:49:18 | loewis | set | messages:
+ msg220551 |
2014-06-14 10:08:49 | skrah | set | messages:
+ msg220545 |
2014-06-14 07:12:53 | steve.dower | set | messages:
+ msg220531 |
2014-06-10 18:17:38 | steve.dower | set | messages:
+ msg220183 |
2014-06-10 17:34:57 | skrah | set | status: closed -> open versions:
+ Python 3.5, - Python 3.3 nosy:
+ steve.dower
messages:
+ msg220180
type: compile error |
2012-09-22 12:00:06 | skrah | set | messages:
+ msg170989 |
2012-09-22 11:48:53 | loewis | set | messages:
+ msg170988 |
2012-09-22 11:09:28 | skrah | set | messages:
+ msg170985 |
2012-09-22 10:01:07 | loewis | set | status: open -> closed resolution: fixed |
2012-09-22 10:00:59 | loewis | set | messages:
+ msg170981 |
2012-09-22 08:38:31 | skrah | set | messages:
+ msg170973 |
2012-09-22 08:32:17 | skrah | set | messages:
+ msg170971 |
2012-09-21 22:54:14 | skrah | set | files:
+ issue15993.diff keywords:
+ patch messages:
+ msg170935
|
2012-09-21 21:40:14 | loewis | set | messages:
+ msg170929 |
2012-09-21 21:30:59 | skrah | set | messages:
+ msg170924 |
2012-09-21 21:29:38 | skrah | set | messages:
+ msg170923 |
2012-09-21 21:14:39 | vstinner | set | messages:
+ msg170920 |
2012-09-21 21:07:21 | loewis | set | files:
+ profiletests.txt |
2012-09-21 21:07:12 | loewis | set | files:
+ profile.bat
messages:
+ msg170919 |
2012-09-21 21:04:08 | loewis | set | messages:
+ msg170918 |
2012-09-21 21:02:24 | vstinner | set | messages:
+ msg170917 |
2012-09-21 20:54:19 | vstinner | set | messages:
+ msg170916 |
2012-09-21 09:30:53 | skrah | set | nosy:
+ nadeem.vawda
|
2012-09-21 09:30:09 | skrah | link | issue15995 superseder |
2012-09-21 09:27:29 | skrah | set | priority: critical -> release blocker
messages:
+ msg170873 |
2012-09-20 22:05:45 | skrah | set | messages:
+ msg170858 |
2012-09-20 21:53:37 | vstinner | set | nosy:
+ vstinner
|
2012-09-20 20:31:49 | skrah | create | |