msg91097 - (view) |
Author: Artur Frysiak (wiget) |
Date: 2009-07-30 14:30 |
Building Python 3.1 (or Python from mercural py3k branch) fail if
configured --with-computed-gotos.
Traceback (most recent call last):
File "./setup.py", line 13, in <module>
from distutils.core import Extension, setup
File "/home/users/wiget/mirror/python/cpython/Lib/distutils/core.py",
line 18, in <module>
from distutils.dist import Distribution
File "/home/users/wiget/mirror/python/cpython/Lib/distutils/dist.py",
line 12, in <module>
import warnings
File "/home/users/wiget/mirror/python/cpython/Lib/warnings.py", line
6, in <module>
import linecache
File "/home/users/wiget/mirror/python/cpython/Lib/linecache.py", line
10, in <module>
import tokenize
File "/home/users/wiget/mirror/python/cpython/Lib/tokenize.py", line
94, in <module>
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
File "/home/users/wiget/mirror/python/cpython/Lib/tokenize.py", line
87, in any
def any(*choices): return group(*choices) + '*'
TypeError: group() argument after ** must be a mapping, not tuple
make: *** [sharedmods] Error 1
Compiled on PLD Linux using GCC 4.4.1
|
msg91098 - (view) |
Author: Artur Frysiak (wiget) |
Date: 2009-07-30 14:33 |
Exact options used to configure:
./configure \
--enable-ipv6 \
--with-wide-unicode \
--enable-shared \
--with-computed-gotos \
--with-dbmliborder=gdbm:bdb \
--with-signal-module \
--with-tsc \
--with-threads \
--with-doc-strings \
--with-fpectl \
--with-system-ffi
|
msg91306 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-08-05 10:34 |
I can't reproduce here (using gcc 4.3.2). Have you tried `make
distclean`? Is it a fresh checkout?
|
msg91325 - (view) |
Author: Artur Frysiak (wiget) |
Date: 2009-08-05 17:14 |
reproduced on fresh checkout from
http://svn.python.org/projects/python/branches/py3k (rev. 74321)
$ gcc --version|head -n 1
gcc (PLD-Linux) 4.4.1 20090724 (release)
|
msg91387 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-08-06 20:52 |
I'm afraid you'll have to investigate a bit by yourself, or wait for a
core developer to get the same problem :-/
|
msg91447 - (view) |
Author: Artur Frysiak (wiget) |
Date: 2009-08-10 14:18 |
Adding -fno-caller-saves to OPT in Makefile fixes problem.
BTW All compilations done on x86_64 architecture.
|
msg91454 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-08-10 16:31 |
It sounds like a compiler bug... It would be nice if people could give
results with other gcc versions and builds.
(my test was done on x86-64 too, by the way)
|
msg93896 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-12 13:54 |
I'm able to reproduce this on Debian Lenny/x86_64 with a home-built gcc
4.4.0 and the --with-computed-gotos and --with-tsc configure options.
I'm compiling with:
CC=gcc-4.4 ./configure --with-tsc --with-computed-gotos && make
The Python executable builds successfully, and then the module build
stage fails exactly as Artur described. Here's a minimal failing example:
Python 3.2a0 (py3k:75376, Oct 12 2009, 14:32:36)
[GCC 4.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def f(*args): return None
...
>>> def g(*args): return f(*args)
...
>>> g()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in g
TypeError: f() argument after ** must be a mapping, not tuple
With gcc 4.3.2 I get no failure. With either of the configure options
removed I get no failure.
After some investigation, I believe the problem is in Python/ceval.c:
what's going on is that the inline x86 assembly for the READ_TIMESTAMP
macro is clobbering the rdx register. This clobbering changes the value
of the 'flags' variable in _call_function_var_kw, leading the
ext_do_call to think that it's supposed to be doing a
CALL_FUNCTION_VAR_KW instead of a CALL_FUNCTION_VAR.
I believe this *is* a Python bug: the inline assembly should be fixed
to use the eax and edx registers explicitly instead of using the "=A"
constraint; for x86_64, "=A" apparently means the 64-bit rax register,
rather than the edx/eax pair of 32-bit registers (though it's difficult
to find any mention of this in the gcc documentation).
I don't know why the --with-computed-gotos configure option affects this.
|
msg93898 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-12 18:14 |
Here's a patch.
Martin, would you be able to take a look at this?
N.B. I also tried './configure --with-tsc && make && make test' for a
build of the trunk on OS X 10.6; the configure and make steps succeeded,
but 'make test' immediately segfaults; gdb shows the segfault coming from
ceval.c. This patch fixes that segfault.
|
msg93899 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-12 18:16 |
N.B. The patch assumes that unsigned int has 32 bits. This is almost
certainly true on the platforms of interest, but it might be better to use
uint64_t and uint32_t throughout the tsc code. For Python 2.7 and 3.x,
uint32_t and uint64_t are already detected by the configure script; for
2.6, the detection might need to be added.
|
msg93927 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2009-10-13 16:07 |
I'm skeptical that the patch is right (or, rather, that it is not a
compiler bug). According to
http://gcc.gnu.org/viewcvs/trunk/gcc/config/i386/i386.h?view=markup
in macro REG_CLASS_CONTENTS, the class AD_REGS includes indeed EDX and
EAX. The exact place where this gets used has varied recently quite a
lot; it currently lives in predicates.md:
http://gcc.gnu.org/viewcvs/trunk/gcc/config/i386/constraints.md?
revision=149373&view=markup
where "A" becomes the letter denoting the AD_REGS constraint. So IIUC,
this means that EDX is the output of the insn, and your change should
have no effect. I suggest talking to gcc people what the problem might
be.
|
msg93928 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 16:28 |
Here's a relevant gcc bug report:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21249
As I understand it, the rules are that in 32-bit mode "A" means edx/eax
for a 64-bit quantity, and in 64-bit mode "A" means rdx/rax for a 128-bit
quantity and "rdx or rax" for a 64-bit quantity.
|
msg93929 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 16:41 |
I just found a more recent one:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41133
Whether this is a gcc misfeature or not, it sounds as though it's not
going to change.
Maybe there should be two separate READ_TIMESTAMP cases? One for x86 (or
x86_64 in 32-bit mode), and one for x86_64 in 64-bit mode.
|
msg93930 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2009-10-13 16:42 |
Looking at this further, it seems that the rdtsc code got miscompiled on
x64 for some time already. Consider this code
typedef unsigned long long uint64;
uint64 f(uint64 b)
{
uint64 a;
__asm__ __volatile__("rdtsc" : "=A" (a));
return a+b;
}
My Apple gcc 4.0.1 compiles that into
_f:
pushq %rbp
movq %rsp, %rbp
rdtsc
addq %rdi, %rax
leave
ret
Here, %rdi is the incoming parameter; %rdx is not considered at all.
This seems to come from DImode (double integer) processing: gcc just
"knows" that a DImode variable lives in a single register on AMD64.
So even if your code is right in principle, I still think there is a gcc
bug here.
As for the specific code: I'm not sure whether it's guaranteed that you
can truncate output registers in an asm. If you can't, you should make
the output registers 64-bit integers on AMD64. If you can, I think you
can "simplify" the code by directly outputting to (int*)v and
((int*)v)[1]; this would be worthwhile only if the generated code
actually gets better by omitting the shift operation.
FWIW, I don't consider a bug that only occurs --with-tsc and only on
AMD64 critical.
|
msg93931 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 16:53 |
> As for the specific code: I'm not sure whether it's guaranteed that
> you can truncate output registers in an asm. If you can't, you
> should make the output registers 64-bit integers on AMD64.
Specifying "eax" and "edx" directly, rather than using "a" and "d",
would presumably also work here?
> If you can, I think you
> can "simplify" the code by directly outputting to (int*)v and
>((int*)v)[1];
I'll take a look at that. Thanks.
> FWIW, I don't consider a bug that only occurs --with-tsc and only on
> AMD64 critical.
Sorry. I got overexcited. Changed priority to 'normal'.
|
msg93933 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 17:17 |
FWIW, the Linux source splits things up into 32-bit and 64-bit, and uses
"A" for 32-bit and "a" and "d" for 64-bit; their 64-bit code (after
unwinding the macros) is essentially identical to the code in the patch.
See
http://lxr.linux.no/linux+v2.6.31/arch/x86/include/asm/msr.h
|
msg93937 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 20:07 |
> "simplify" the code by directly outputting to (int*)v and
> ((int*)v)[1];
This does indeed seem to produce better compiler output, at least under
Apple gcc-4.0.1, Apple gcc-4.2.1, and non-Apple gcc-4.4.1.
Here's a revised patch, which also separates the i386 and x86_64 cases.
Assigning to Martin for review.
|
msg93938 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 20:08 |
Whoops. Wrong patch. Here's the right one.
|
msg93939 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-13 20:31 |
Aargh. I somehow deleted a backslash between testing and submitting that
patch. Sorry.
Third time lucky.
|
msg93945 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2009-10-13 22:29 |
The patch is fine, please apply (notice that the 2.6 branch is closed for
anything but 2.6.3 regressions right now, so you may want to defer this
after 2.6.4).
|
msg94750 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-10-31 10:23 |
Applied to 2.7, 2.6, 3.2, 3.1 in r75982 through r75985.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:51 | admin | set | github: 50852 |
2009-10-31 10:23:02 | mark.dickinson | set | status: open -> closed
messages:
+ msg94750 stage: patch review -> resolved |
2009-10-13 22:29:32 | loewis | set | assignee: loewis -> mark.dickinson resolution: accepted messages:
+ msg93945 |
2009-10-13 20:31:12 | mark.dickinson | set | files:
+ tsc3.patch
messages:
+ msg93939 |
2009-10-13 20:09:00 | mark.dickinson | set | files:
- tsc2.patch |
2009-10-13 20:08:54 | mark.dickinson | set | files:
+ tsc2.patch
messages:
+ msg93938 |
2009-10-13 20:07:35 | mark.dickinson | set | files:
+ tsc2.patch assignee: mark.dickinson -> loewis messages:
+ msg93937
|
2009-10-13 17:17:10 | mark.dickinson | set | messages:
+ msg93933 |
2009-10-13 16:53:35 | mark.dickinson | set | priority: critical -> normal
messages:
+ msg93931 |
2009-10-13 16:42:48 | loewis | set | messages:
+ msg93930 |
2009-10-13 16:41:23 | mark.dickinson | set | messages:
+ msg93929 |
2009-10-13 16:28:21 | mark.dickinson | set | messages:
+ msg93928 |
2009-10-13 16:07:02 | loewis | set | messages:
+ msg93927 |
2009-10-12 18:16:56 | mark.dickinson | set | messages:
+ msg93899 |
2009-10-12 18:14:02 | mark.dickinson | set | files:
+ tsc_x86_64.patch
nosy:
+ loewis messages:
+ msg93898
keywords:
+ patch stage: needs patch -> patch review |
2009-10-12 13:54:36 | mark.dickinson | set | priority: critical
assignee: mark.dickinson versions:
+ Python 2.6, Python 2.7 keywords:
+ easy nosy:
+ mark.dickinson
messages:
+ msg93896 stage: needs patch |
2009-08-10 16:31:04 | pitrou | set | messages:
+ msg91454 |
2009-08-10 14:18:28 | wiget | set | messages:
+ msg91447 versions:
+ Python 3.2 |
2009-08-06 20:52:24 | pitrou | set | messages:
+ msg91387 |
2009-08-05 17:14:39 | wiget | set | messages:
+ msg91325 |
2009-08-05 10:34:21 | pitrou | set | nosy:
+ pitrou messages:
+ msg91306
|
2009-07-30 14:33:24 | wiget | set | messages:
+ msg91098 |
2009-07-30 14:30:34 | wiget | create | |