msg75694 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 13:12 |
It's hard to read Objects/longobject.c because the code depends too
much on the implementation. I wrote macros to simplify the code:
- PyLong_SIGN(x)
- PyLong_EQUALS_ZERO(x)
- PyLong_FITS_INT(x)
- PyLong_GET_INT(x)
- PyLong_NDIGITS(x)
The macros are declared in Include/longintrepr.h so that they can be used
outside longobject.c; for example, I use them in Modules/mathmodule.c.
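To make this concrete, here is a rough sketch of definitions these macros
could have on top of the current representation (sign carried in ob_size,
15-bit digits). This is only an illustration, not necessarily the exact
code of the patch:

    /* Sketch only: plausible definitions, assuming the sign lives in
       ob_size and values of at most one digit fit in an int. */
    #define PyLong_SIGN(x)        (Py_SIZE(x) < 0 ? -1 : (Py_SIZE(x) > 0 ? 1 : 0))
    #define PyLong_EQUALS_ZERO(x) (Py_SIZE(x) == 0)
    #define PyLong_NDIGITS(x)     (Py_SIZE(x) < 0 ? -Py_SIZE(x) : Py_SIZE(x))
    #define PyLong_FITS_INT(x)    (-1 <= Py_SIZE(x) && Py_SIZE(x) <= 1)
    #define PyLong_GET_INT(x) \
        (Py_SIZE(x) == 0 ? 0 \
         : Py_SIZE(x) < 0 ? -(sdigit)((PyLongObject *)(x))->ob_digit[0] \
                          :  (sdigit)((PyLongObject *)(x))->ob_digit[0])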
The patch also contains extra changes:
- create an sdigit type (used by PyLong_GET_INT(x))
- use memcpy() in _PyLong_Copy() (see the sketch below)
- use stwodigits in long_mul()
- replace many Py_SIZE() hacks with my macros (e.g. see long_hash)
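For the _PyLong_Copy() item, the idea is roughly the following; this is
only a sketch, assuming digits are stored contiguously in ob_digit (error
handling kept minimal, names not taken from the patch):

    /* Copy all digits at once instead of a digit-by-digit loop.
       memcpy() comes from <string.h>. */
    static PyObject *
    copy_long_sketch(PyLongObject *src)
    {
        Py_ssize_t size = Py_SIZE(src);
        Py_ssize_t ndigits = (size < 0) ? -size : size;
        PyLongObject *result = _PyLong_New(ndigits);
        if (result == NULL)
            return NULL;
        Py_SIZE(result) = size;   /* keep the sign */
        memcpy(result->ob_digit, src->ob_digit, ndigits * sizeof(digit));
        return (PyObject *)result;
    }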
With this patch it will be easier to change the long integer
implementation, for example:
- Use 30-bit digits instead of 15-bit digits (issue #4258)
- Use GMP for PyLong (issue #1814)
Example of changes:

    i = Py_SIZE(v);
    x = 0;
    if (i < 0) {
        PyErr_SetString(PyExc_OverflowError,
                        "can't convert negative value to unsigned int");
        return (unsigned long) -1;
    }
    switch (i) {
    case 0: return 0;
    case 1: return v->ob_digit[0];
    }
    while (--i >= 0) {
        ...

is replaced by:

    if (PyLong_SIGN(v) < 0) {
        PyErr_SetString(PyExc_OverflowError,
                        "can't convert negative value to unsigned int");
        return (unsigned long) -1;
    }
    if (PyLong_FITS_INT(v))
        return (unsigned long) PyLong_GET_INT(v);
    x = 0;
    i = PyLong_NDIGITS(v);
    while (--i >= 0) {
        ...
|
msg75696 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2008-11-10 13:37 |
Interesting.
Incidentally, I'm already using sdigit for the signed version of digit---
this seems to fit with the current digit/twodigits/stwodigits typedefs.
What's the purpose of your sdigit? Do you really want it to be type
'int'?
|
msg75697 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2008-11-10 13:48 |
Another thought: macros that are going to be used elsewhere in Python
(like the way you're using PyLong_SIGN in mathmodule.c) would probably
be better off in longobject.h. The fewer places there are that have to
include longintrepr.h, the easier it is to mess with the internal
representation.
It's quite tempting to 'fix' _PyLong_AsScaledDouble to return e as the
number of bits, rather than the number of digits; then mathmodule.c
wouldn't have to include longintrepr.h at all.
(And if marshal.c were also changed, to read and write integers as byte
strings, we wouldn't need longintrepr.h anywhere any more!)
|
msg75698 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 13:50 |
Second patch to optimize some PyLong operations:
- Write special code for small operands (a, b) in long_true_div(),
long_bitwise(), and l_divmod() (which is used by long_div(), long_mod(),
long_divmod() and long_pow())
- PyLong_FromLong(): don't enter the while() loop for small *negative*
integers
TODO: write special code for small integers in long_rshift() and
long_lshift()
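To illustrate the kind of special case meant here, a hypothetical fast
path for l_divmod()-style code (operand and output names are illustrative,
reference cleanup is simplified, and this is not the actual patch):

    /* If both operands fit in a single digit, divide with plain C
       arithmetic instead of the general multi-digit algorithm.
       Py_SIZE(b) == 1 implies b != 0, since zero has zero digits. */
    if (Py_SIZE(a) == 1 && Py_SIZE(b) == 1) {
        digit da = a->ob_digit[0];
        digit db = b->ob_digit[0];
        *pdiv = (PyLongObject *)PyLong_FromLong((long)(da / db));
        *pmod = (PyLongObject *)PyLong_FromLong((long)(da % db));
        if (*pdiv == NULL || *pmod == NULL)
            return -1;   /* cleanup of the other result omitted here */
        return 0;
    }
    /* otherwise fall through to the general long_divrem() path */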
|
msg75699 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 14:00 |
> What's the purpose of your sdigit?
The sdigit type should be able to store a signed digit, i.e. a number in
[-(2^15-1), 2^15-1]. A short would be enough, but I think the CPU is
faster with an int because it doesn't have to truncate to 16 bits
(e.g. 32-bit (u)int => 16-bit (u)short). I also used "int" because
Python was already using "int" (right?)
> would probably be better off in longobject.h
Right.
> It's quite tempting to 'fix' _PyLong_AsScaledDouble to
> return e as the number of bits, rather than the number of digits
Good idea. But instead of writing a patch of 100,000 lines, I prefer
to start with a small patch and then add new patches to improve it. That
way it is possible to apply only some of the patches. Is
_PyLong_AsScaledDouble() used by any module other than mathmodule.c?
> marshal.c
Same: it's better to write a separate patch.
--
I opened a new issue because the GMP and 30-bit patches are too big
and not easy to review/commit. The idea is to make small changes now
to allow bigger changes later.
|
msg75701 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2008-11-10 14:34 |
> Same: it's better to write a separated patch.
Agreed.
Unfortunately, I think all of this is going to have to wait for Python
3.0 to be released before we can consider committing anything. The
first step is to commit the pure bugfixes (missing casts, etc.). I
suppose we *could* do this for 2.6.1/2.7 now, but it seems safer to make
all the changes in the 2.x and 3.x branches simultaneously.
By the way, I think we can also get rid of wdigit---it can be safely
replaced by digit everywhere, as far as I can tell.
|
msg75702 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 15:47 |
@mark.dickinson: As you proposed, here is a patch for
_PyLong_AsScaledDouble(): it stores the number of bits directly in an
unsigned int instead of the number of digits.
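For context, the relation between the two conventions (a sketch; "ndigits"
and "exponent" are illustrative names, not necessarily those of the patch):

    /* Before: callers rebuild the value as  x * 2**(e * PyLong_SHIFT),
       where e counts 15-bit digits.  After the patch, e is already a
       bit count, so the value is simply  x * 2**e.  Inside
       _PyLong_AsScaledDouble() that roughly means: */
    *exponent = (unsigned int)ndigits * PyLong_SHIFT;   /* bits, not digits */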
|
msg75703 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-11-10 16:31 |
I like the idea, Victor
|
msg75704 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 18:55 |
Another optimization patch, this time for long_compare(), long_lshift()
and long_rshift().
Note: with all my patches, Python is a little bit faster (and not
slower, which is already a good thing ;-)).
|
msg75710 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 22:35 |
More benchmarks with the py3k trunk on a Quad Core @ 2.5 GHz (64-bit):
./python Lib/test/pystone.py 250000 (maximum value on 5 runs):
- original: 58685.4 pystones/second
- patched: 61274.5 pystones/second
Don't trust pystones, the results are variable :-/
|
msg75713 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2008-11-10 23:40 |
There are so many things going on here it's getting difficult to keep
track of them all! Not only Victor's stuff, but also Mario's patches
for faster multiplication, and for subquadratic division. And it might
be interesting to experiment with subquadratic int <-> str conversion as
well.
I'm wondering whether it would be worth starting a new
"py3k_long_optimization" branch, so that we can get some of the obvious
stuff in and start experimenting properly with the rest.
Christian, you're the expert on this: any thoughts? Is it possible to
set up a new svn branch so that it's easy to merge from the py3k branch,
or is svn merging only really possible from the trunk? I'm happy to set
things up and take care of doing regular merges, if you can give me some
pointers.
(If not, maybe it's time for me to learn how to use git/hg/bzr. I've
used darcs quite a bit before, so the concepts aren't totally alien to
me, but I haven't found time to learn other systems yet.)
|
msg75714 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-11-11 00:41 |
It's probably easier to wait a couple of days until 3.0.0 is out. I
don't know what the new branches will look like.
By the way your work fits nicely in my plans to work on optimizations
for 3.1. Go, Victor, go!
|
msg75715 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-11 01:38 |
I'm unable to use pystone or pybench to compare all the integer patches,
so I wrote my own tool: bench_int.py. Don't use it to compare different
computers or Python versions; it's only useful to test whether a patch is
faster or slower.
Example (still the 64-bit CPU @ 2.5 GHz):
-------------------------------
original : 896.5 ms
+ macros : 891.0 ms (+0.6%)
++ optimize : 884.3 ms (+1.4%)
+++ shift : 880.8 ms (+1.7%)
GMP : 700.9 ms (+22%)
30 bits : 659.9 ms (+26%)
-------------------------------
Result: my optimizations are useless, whereas Mark's patch (#4258) is
26% faster! My GMP patch is only 22% faster (and so slower than the
30-bit patch). The GMP hack would only be useful for huge values,
whereas my benchmark tool uses mostly small values (the biggest is near
2**200).
I use it with "sync && ./python -OO bench_int.py", run the command 2
or 3 times, and keep the smallest value.
|
msg75719 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-11 02:16 |
More numbers!
------ 64 bits CPU -------
original: 885.0 ms
fast long: 746.8 ms (+16%) -- issue #3944
--------------------------
+16% just from an optimized multiplication, great job Pernici :-)
------ 32 bits CPU -------
original: 1564.3 ms
30 bits: 1497.3 ms (+4%)
fast long: 1591.7 ms (-2%)
GMP: 1564.4 ms (=)
--------------------------
It looks like the 32 bits x 32 bits operation is slower on a 32-bit
CPU: the gain is small (only +4%), fast long is a little bit slower,
and the GMP patch doesn't change anything.
|
msg75752 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-11-11 17:37 |
I agree with Christian that we should wait until 3.0 is out. Using a
DVCS should make this much easier, if anyone needs an excuse to learn bzr.
I think we should aim to commit the 30-bit change (a pretty clear win by
the looks of things) after 3.0 is released and the py3k branch is
unfrozen, and continue work from there to determine what to do with the
others.
|
msg77644 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-12-11 23:29 |
We now have a benchmark tool (attached file "bench_int.py"), many
patches to try, and 3.0 is released. Anyone interested in working on it?
|
msg83787 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-18 22:44 |
Updated version of my macros patch for PyLong type:
- patch for Python trunk
- define 3 macros: PyLong_SIGN(x), PyLong_EQUALS_ZERO(x),
PyLong_NDIGITS(x)
- just replaces code with the equivalent macros
The goal is to make the code easier to read. It would also help if we
change the PyLong implementation (e.g. the 2^30 base patch).
My previous patch was for py3k; the new one is for Python trunk and is
shorter. I only kept the simplest macros.
|
msg83967 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2009-03-22 10:42 |
A few comments:
I think PyLong_SIGN and PyLong_EQUALS_ZERO should go in
Include/longobject.h, not Include/longintrepr.h: these 2 macros have an
unambiguous meaning for *any* representation of integers. And
longintrepr.h is really supposed to be private, for the use of
longobject.c only. PyLong_NDIGITS should stay in longintrepr.h, though,
since it's dependent on the representation.
Perhaps rename PyLong_EQUALS_ZERO to PyLong_IS_ZERO?
Almost all your uses of PyLong_SIGN take the form PyLong_SIGN(X) < 0.
Maybe it would be better to have a PyLong_IS_NEGATIVE macro instead?
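Concretely, the split being suggested could look like this (an assumed
layout, just to make the suggestion explicit; not code taken from any
attached patch):

    /* Include/longobject.h -- meaning independent of the representation */
    #define PyLong_IS_ZERO(x)     (Py_SIZE(x) == 0)
    #define PyLong_IS_NEGATIVE(x) (Py_SIZE(x) < 0)

    /* Include/longintrepr.h -- tied to the current digit representation */
    #define PyLong_NDIGITS(x)     (Py_SIZE(x) < 0 ? -Py_SIZE(x) : Py_SIZE(x))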
|
msg84234 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-26 23:57 |
> I think PyLong_SIGN and PyLong_EQUALS_ZERO should go in
> Include/longobject.h, not Include/longintrepr.h
Yeah, it's better for 3rd party modules.
> PyLong_NDIGITS should stay in longintrepr.h, though,
> since it's dependent on the representation.
I don't understand why. PyLong_SIGN() and PyLong_EQUALS_ZERO() also
depend on the PyLong implementation. E.g. if we chose to store zero as
a PyLong with 1 digit (digits={0}), PyLong_SIGN() would also have to be
changed. I think PyLong_SIGN() could also be moved to longobject.h
(not done in my patch v3).
> Perhaps rename PyLong_EQUALS_ZERO to PyLong_IS_ZERO?
Done.
> Almost all your uses of PyLong_SIGN take the form
> PyLong_SIGN(X) < 0. Maybe it would be better to have a
> PyLong_IS_NEGATIVE macro instead?
Not "almost all", just "all" :-) So I also changed the macro name.
|
msg84235 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-27 00:01 |
I removed my "optimization" patches: they were useless (no or low
speedup). I also removed bench_int.py; I moved it to my own public
SVN:
http://haypo.hachoir.org/trac/browser/misc/bench_int.py
|
msg84237 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-27 00:24 |
I created a dedicated issue for my last unrelated patch:
#5576: Don't use PyLong_SHIFT with _PyLong_AsScaledDouble()
|
msg84312 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2009-03-28 16:36 |
[Mark]
> PyLong_NDIGITS should stay in longintrepr.h, though,
> since it's dependent on the representation.
[Victor]
>I don't understand why. [...]
I expressed myself badly. I guess my point was that PyLong_SIGN
and PyLong_EQUALS_ZERO (and PyLong_IS_NEGATIVE) have an obvious
meaning for integers themselves, regardless of the particular
implementation chosen. But there could be representations of
integers for which PyLong_NDIGITS doesn't even make sense, or
is ambiguous. Furthermore, if the implementation of long integers
changes then the meanings of PyLong_SIGN and PyLong_EQUALS_ZERO won't
change, but the meaning of PyLong_NDIGITS might well do.
So it seems to me that PyLong_NDIGITS is only really a useful
macro for *this particular* implementation of integers---it's
something that should be internal to Objects/longobject.c and
Include/longintrepr.h, and not exposed to the rest of Python.
|
msg97159 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2010-01-03 12:34 |
Closing this. I like the PyLong_IS_NEGATIVE and PyLong_IS_ZERO macros, but I can't find anywhere outside longobject.c where they'd actually be useful (with the possible exception of marshal.c, but that currently depends on knowing lots about the long implementation anyway). And I don't really see much benefit from doing a mass replacement of uses of 'Py_SIZE(a) < 0', etc. in Objects/longobject.c; I think the code is readable enough as it stands.
|
Date | User | Action | Args
2022-04-11 14:56:41 | admin | set | github: 48544
2010-01-09 18:57:37 | mark.dickinson | set | status: pending -> closed
2010-01-03 12:34:37 | mark.dickinson | set | status: open -> pending; resolution: rejected; messages: + msg97159
2009-03-28 16:36:21 | mark.dickinson | set | messages: + msg84312
2009-03-27 00:24:47 | vstinner | set | files: - pylong_asscaleddouble.patch
2009-03-27 00:24:41 | vstinner | set | messages: + msg84237
2009-03-27 00:01:55 | vstinner | set | messages: + msg84235
2009-03-27 00:00:58 | vstinner | set | files: - bench_int.py
2009-03-27 00:00:43 | vstinner | set | files: - pylong_shift.patch
2009-03-27 00:00:26 | vstinner | set | files: - pylong_optimize.patch
2009-03-26 23:58:53 | vstinner | set | files: - pylong_macros-2.patch
2009-03-26 23:58:11 | vstinner | set | files: - pylong_macros.patch
2009-03-26 23:57:57 | vstinner | set | files: + pylong_macros-3.patch; messages: + msg84234
2009-03-22 10:44:16 | mark.dickinson | set | priority: normal; assignee: mark.dickinson; type: enhancement; components: + Interpreter Core
2009-03-22 10:42:02 | mark.dickinson | set | messages: + msg83967
2009-03-18 22:44:39 | vstinner | set | files: + pylong_macros-2.patch; messages: + msg83787
2008-12-11 23:29:13 | vstinner | set | messages: + msg77644
2008-11-11 17:37:15 | gregory.p.smith | set | messages: + msg75752
2008-11-11 02:16:57 | vstinner | set | nosy: + pernici; messages: + msg75719
2008-11-11 01:38:32 | vstinner | set | files: + bench_int.py; messages: + msg75715
2008-11-11 00:41:04 | christian.heimes | set | messages: + msg75714
2008-11-10 23:40:03 | mark.dickinson | set | messages: + msg75713
2008-11-10 22:36:00 | vstinner | set | messages: + msg75710
2008-11-10 18:55:26 | vstinner | set | files: + pylong_shift.patch; messages: + msg75704
2008-11-10 16:54:42 | loewis | set | versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0
2008-11-10 16:31:40 | christian.heimes | set | nosy: + christian.heimes; messages: + msg75703
2008-11-10 15:47:17 | vstinner | set | files: + pylong_asscaleddouble.patch; messages: + msg75702
2008-11-10 14:34:04 | mark.dickinson | set | messages: + msg75701
2008-11-10 14:00:36 | vstinner | set | messages: + msg75699
2008-11-10 13:50:58 | vstinner | set | files: + pylong_optimize.patch; messages: + msg75698
2008-11-10 13:48:47 | mark.dickinson | set | messages: + msg75697
2008-11-10 13:37:32 | mark.dickinson | set | messages: + msg75696
2008-11-10 13:13:06 | vstinner | set | nosy: + gregory.p.smith, mark.dickinson
2008-11-10 13:12:32 | vstinner | create