Issue943898
Created on 2004-04-28 18:33 by arigo, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | | |
---|---|---|
File name | Uploaded | Description |
oparg-opt.diff | arigo, 2004-04-28 18:33 | eval_frame opcode/oparg optimizations |
asm-locals.diff | arigo, 2004-05-10 10:48 | Put the two main locals into registers |
sp0.diff | arigo, 2004-06-17 07:35 | Re-generated patch, without 386-specific optimizations |
Messages (10) | |||
---|---|---|---|
msg45864 | Author: Armin Rigo (arigo) |
Date: 2004-04-28 18:33 | |
The result of a few experiments looking at the assembler produced by gcc for eval_frame(): * On PCs, reading the argument as an unsigned short instead of two bytes is a good win. * oparg is more "local" with this patch: its value doesn't need to be saved across an iteration of the main loop, allowing it to live in a register only. * Added an explicit "case STOP_CODE:" so that the switch starts at 0 instead of 1 -- that's one instruction less with gcc. * It does not seem to pay off to move reading the argument to the start of each case of an operation that expects one, even though doing so removes the unpredictable branch "if (HAS_ARG(op))". This patch should be timed on other platforms to make sure that it doesn't slow things down. If it does, then only the part that reads the arg as an unsigned short could be checked in -- it is compiled conditionally on shorts being 2 bytes in little-endian order. By the way, does anyone know why 'stack_pointer' isn't a 'register' local? I bet it would make a difference on PowerPC, for example, with compilers that care about this keyword. |
|||
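To illustrate the first point of the message above: in the bytecode of that era, an opcode that takes an argument is followed by two argument bytes, low byte first. The sketch below contrasts the byte-wise read with a single 16-bit load. It is not the code from oparg-opt.diff; the function names are made up for illustration, and it uses memcpy rather than a pointer cast so that it stays valid on machines where unaligned loads are forbidden.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable read: two byte loads, a shift and an OR. */
static unsigned int read_arg_bytes(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Returns 1 when the host stores the low byte of a 16-bit value first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* One 16-bit load. Only gives the right answer on little-endian hosts,
 * which is why the patch made this path conditional at compile time. */
static unsigned int read_arg_short(const unsigned char *p)
{
    uint16_t v;

    assert(host_is_little_endian());
    memcpy(&v, p, sizeof v);   /* safe even when p is at an odd address */
    return v;
}
```

On common little-endian targets an optimizing compiler will typically reduce the memcpy to a plain 16-bit load, so the second function keeps the intended saving without relying on alignment.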
msg45865 | Author: Armin Rigo (arigo) |
Date: 2004-04-28 21:02 | |
stack_pointer isn't a register because its address is taken in two places. This is a really bad idea for optimization. Instead of &stack_pointer, we should do: PyObject **sp = stack_pointer; ... use &sp ... stack_pointer = sp; I'm pretty sure this simple change, along with a 'register' declaration of stack_pointer, gives a good speed-up on all architectures with plenty of registers. For PCs I've experimented with forcing one or two locals into specific registers, with the gcc syntax asm("esi"), asm("ebx"), etc. Forcing stack_pointer and next_instr gives another 3-4% of improvement. The next step is to see if this can be done with #if's for common compilers besides gcc. |
|||
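The copy-in/copy-out pattern described in the message above can be sketched in isolation. This is a toy model, not ceval.c: `item` stands in for `PyObject *`, and `pop_and_sum` is a hypothetical callee that, like the helpers in the real interpreter, needs a pointer to the caller's stack pointer. Because only the short-lived copy `sp` has its address taken, `stack_pointer` itself can legally be declared `register`.

```c
#include <assert.h>
#include <stddef.h>

typedef int item;   /* stand-in for PyObject* in this sketch */

/* Hypothetical callee: pops n items through *pp and returns their sum. */
static item pop_and_sum(item **pp, size_t n)
{
    item total = 0;

    while (n-- > 0) {
        total += *--(*pp);
    }
    return total;
}

static item eval_sketch(const item *values, size_t count)
{
    item stack[16];
    register item *stack_pointer = stack;   /* address is never taken */
    item result;
    size_t i;

    assert(count <= 16);
    for (i = 0; i < count; i++) {
        *stack_pointer++ = values[i];        /* push */
    }

    {
        item *sp = stack_pointer;            /* copy in */
        result = pop_and_sum(&sp, count);    /* only the copy escapes */
        stack_pointer = sp;                  /* copy out */
    }

    assert(stack_pointer == stack);          /* everything was popped */
    return result;
}
```

Writing `&stack_pointer` in the call instead would be rejected by a C compiler for a `register` variable, and without the keyword it would force the variable to live in memory for the whole function.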
msg45866 | Author: Raymond Hettinger (rhettinger) |
Date: 2004-04-28 23:45 | |
With MSVC++ 6.0 under WinME on a Pentium III, there is no change in timing (measurements accurate within 0.25%). I wonder if the speedup from retrieving the unsigned short is offset by alignment penalties when the starting address is odd. |
|||
msg45867 | Author: Armin Rigo (arigo) |
Date: 2004-05-10 10:48 | |
The short trick might be a bit fragile. For example, the current patch would incorrectly use it on machines where unaligned accesses are forbidden. I isolated the other issue I talked about (making stack_pointer a register variable) in a separate patch. This patch alone is clearly safe. It should give a bit of speed-up on any machine but it definitely gives 5% on PCs with gcc by forcing the two most important local variables into specific registers. (If someone knows the corresponding syntax for other compilers, it can be added in the #if.) |
|||
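A guarded declaration of the kind mentioned above might look like the following. This is only a sketch, not the contents of asm-locals.diff: the `PINNED` macro, the function, and the choice of `esi`/`edi` are assumptions made for illustration. On anything other than gcc targeting 32-bit x86 the macro expands to nothing and the variables are ordinary `register` locals.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && defined(__i386__)
/* gcc on 32-bit x86: ask for a specific hardware register. */
#  define PINNED(reg) __asm__(reg)
#else
/* Other compilers or targets: fall back to a plain register hint. */
#  define PINNED(reg)
#endif

/* Toy loop: counts the bytes that precede the first zero byte. */
static size_t count_until_zero(const unsigned char *code)
{
    register const unsigned char *next_instr PINNED("esi") = code;
    register size_t n PINNED("edi") = 0;

    assert(code != NULL);
    while (*next_instr++ != 0) {
        n++;
    }
    return n;
}
```

Current GCC documentation only supports local register variables as operands to inline asm, so this is best read as an illustration of the 2004 experiment rather than a recommended technique.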
msg45868 | Author: Armin Rigo (arigo) |
Date: 2004-05-10 14:26 | |
Tested on a MacOSX box, the patch also gives a 5% speed-up there. Allowing stack_pointer to be in a register is a very good idea. (All tests with Pystone.) |
|||
msg45869 | Author: Raymond Hettinger (rhettinger) |
Date: 2004-05-12 15:27 | |
Tim, I remember you having some opinions about this sort of optimization. Will you take a brief look at Armin's latest patch? |
|||
msg45870 | Author: Armin Rigo (arigo) |
Date: 2004-05-21 15:44 | |
The bit with gcc-specific keywords is useful but arguably scary, but the other part of the patch -- stack_pointer not being assignable to a register -- solves a definite performance bug in my opinion. I'd even suggest back-porting this one to 2.3. Apple is more likely to ship its next MacOSX release with the latest 2.3 than with 2.4, as far as I can tell. |
|||
msg45871 | Author: Armin Rigo (arigo) |
Date: 2004-06-17 07:35 | |
The patch no longer applies cleanly, so here it is again. This one is minimalistic and does not contain the 386-specific register tweaks. It only allows two variables (stack_pointer and oparg) to be stored in registers instead of the stack on machines that have enough of them. I still regard this as a small performance bugfix and suggest back-porting. |
|||
msg45872 | Author: Raymond Hettinger (rhettinger) |
Date: 2004-06-17 08:50 | |
Very nice. Code passes review, passes regression tests, and the timings were confirmed (Pentium III running Win ME with MSVC++ 6.0). Please apply. Though this patch is very clean, we do not backport performance tweaks. The only exception would be to repair devastatingly bad performance. Let this be some incentive to step up to Py2.4. |
|||
msg45873 | Author: Armin Rigo (arigo) |
Date: 2004-06-17 10:23 | |
Checked in as ceval.c 2.400. Let's forget about the GCC-specific extension and close the patch. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:03 | admin | set | github: 40192 |
2004-04-28 18:33:16 | arigo | create |