Issue5512
Created on 2009-03-18 22:11 by mark.dickinson, last changed 2009-03-24 09:11 by haypo.
|
msg83781 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-18 22:11 |
|
Here's a patch that streamlines the x_divrem function in
Objects/longobject.c. More benchmarks are needed, but the initial
results are promising: for py3k, on a 32-bit machine with 15-bit
digits, Victor's pidigits benchmark runs almost twice as fast with this
patch (numbers below).
Patch details
=============
- Normalize inputs by shifting instead of multiplying and dividing.
This halves the number of divisions used in the algorithm.
- Streamline innermost loop.
- Save an iteration of outer loop around half the time.
- Save an object allocation: only 3 allocations per x_divrem call
instead of 4.
- Remove special case where initial quotient estimate is >= PyLong_BASE.
There's no need for this, since the 'digit' type holds values up
to 2*PyLong_BASE - 1.
- Make q type digit instead of type twodigits: this halves the size
of the multiplication in the innermost loop.
Benchmark results
=================
Using the pidigits_bestof.py script that's posted in the issue 4258
discussion, on a non-debug build of py3k (r70465), on OS X 10.5.6/Core 2
Duo:
Unpatched
---------
Macintosh-3:py3k dickinsm$ ./python.exe ../pidigits_bestof.py 2000
performing a warm up run...
running
sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)
Time; 2234.6 ms
Time; 2232.2 ms
Time; 2227.9 ms
Time; 2225.7 ms
Time; 2229.8 ms
Best Time; 2225.7 ms
Patched
-------
dickinsm$ ./python.exe ../pidigits_bestof.py 2000
performing a warm up run...
running
sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)
Time; 1175.6 ms
Time; 1176.5 ms
Time; 1177.3 ms
Time; 1179.5 ms
Time; 1168.5 ms
Best Time; 1168.5 ms
So the patch gives a speedup of around 90%. This particular benchmark
is heavy on the divisions though, so I'd expect lesser speedups for
other benchmarks.
|
|
msg83782 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-18 22:17 |
|
Would it be possible to include integer benchmark tools to upstream? I
mean the "pidigit" tool (I didn't wrote this test! I just ported it to
Python3) and maybe my "bench_int" tool. It could useful to reproduce
the benchmark on different Python versions and different computers. We
may open a new issue for each tool?
|
|
msg83785 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-18 22:31 |
|
Sounds good to me! An integer benchmark script in the Tools/ directory
would be handy.
|
|
msg83786 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2009-03-18 22:43 |
|
Well, the original code in pidigits is by me :-) (although it's not
stated in the source file).
I haven't tried the patch but the performance work you're doing is
really impressive, Mark!
|
|
msg83788 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2009-03-18 22:47 |
|
Results here (Athlon X2 3600+ with gcc in 64-bit mode with 30-bit
digits), "pidigits_bestof.py 2000":
- unpatched: 1644.9 ms
- patched: 694.8 ms
!
|
|
msg83789 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-18 22:48 |
|
Useful informations for an integer benchmark:
- CPU type: 32 or 64 bits, endian
- Memory size
- Python version
- Informations about how Python was compiled
- Integer implementation: use int_info when it's available, otherwise
base=2^15
Who has the last version of pidigit tool? Can you attach it here? :-)
|
|
msg83790 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-18 22:52 |
|
[Antoine]
> Well, the original code in pidigits is by me :-)
Apologies for the misattribution!
|
|
msg83791 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-18 22:54 |
|
Here's the version of pidigits that I was using.
|
|
msg83793 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-18 23:22 |
|
New version of pidigit:
- works on python 2.6, trunk and py3k
- display Python version, CPU info (bits, endian), PyLong info (base,
digit size)
- display usage if no argument is given
|
|
msg83794 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-18 23:22 |
|
Oh, there is no version number of the benchmark tool itself!
|
|
msg83795 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-18 23:27 |
|
My numbers:
# No patch
$ ./python /home/haypo/pidigits-2.py 3000
Python 3.1a1+ (py3k:70466, Mar 18 2009, 23:56:06)
[GCC 4.3.2]
CPU: 64 bits, little endian
PyLong: base=2^30, sizeof(digit)=32 bits
(...)
Best Time; 2300.2 ms
# With faster_integer_division.patch
(...)
Best Time; 1138.1 ms
Ok, it's two times faster on 64 bits CPU!!!
Other notes about pidigits:
- missing header (no author name, no description)
- pidigit should also write the number of digits in its output to
avoid mistakes in comparaisons
|
|
msg83953 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-21 22:19 |
|
Updated patch. Lots of cleanup, but only one significant change: the
inner loop now uses signed arithmetic instead of unsigned arithmetic. This saves a negation and fixes a subtle bug: the previous inner loop code
was incorrect when using 15-bit digits on machines with sizeof(short) ==
sizeof(long). Not that I know of any such machines: the Cray T3E
famously has no 16-bit integer type, but there sizeof(short) is 4 and
sizeof(long) is 8.
A few more timings, this time from doing a single huge integer division:
10**1000000//10**500000 (so this effectively times just the inner loop,
since all else will be insignificant). All timings are best-of-5, from
non-debug builds of py3k, on the same system: OS X 10.5.6/2.4 GHz Core 2
Duo. Times in brackets are the approximate per-inner-loop times
(remembering that there are 4 times as many iterations of the inner loop
for 15-bit digits).
32-bit build, 15-bit digits, unpatched: 92382.2 ms (~7.5 ns)
32-bit build, 15-bit digits, patched: 36473.3 ms (~3.0 ns)
64-bit build, 30-bit digits, unpatched: 14581.4 ms (~4.8 ns)
64-bit build, 30-bit digits, patched: 7385.1 ms (~2.4 ns)
... and just for fun, the other combinations:
64-bit build, 15-bit digits, unpatched: 61927.5 ms (~5.1 ns)
64-bit build, 15-bit digits, patched: 43632.9 ms (~3.6 ns)
32-bit build, 30-bit digits, unpatched: 62374.1 ms (~20.3 ns)
32-bit build, 30-bit digits, patched: 26928.3 ms (~8.8 ns)
Thanks for the updated pidigits script, Victor! Maybe this is too small
right now to be worth including in the Tools directory, but I hope we can
fatten it up with some other benchmarks. What do you think?
|
|
msg84027 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-23 20:04 |
|
Applied, r70542 and r70547.
|
|
msg84066 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-24 09:00 |
|
Would it be possible to port the patch to python trunk?
|
|
msg84067 - (view) |
Author: Mark Dickinson (mark.dickinson) |
Date: 2009-03-24 09:10 |
|
Hi Victor! I already applied the x_divrem patch to the trunk and to py3k.
Did you mean the release branch?
I'd prefer not to backport this to 2.6 or 3.0: it's not a new feature,
but it's not a bugfix either and there's always some risk of breakage...
|
|
msg84068 - (view) |
Author: STINNER Victor (haypo) |
Date: 2009-03-24 09:11 |
|
> I already applied the x_divrem patch to the trunk and to py3k
Oops! I taught that you applied it to py3k and release30-maint :-/
It's ok, trunk and py3k is enough ;-)
|
|
| Date |
User |
Action |
Args |
| 2009-03-24 09:11:59 | haypo | set | messages:
+ msg84068 |
| 2009-03-24 09:10:01 | mark.dickinson | set | messages:
+ msg84067 |
| 2009-03-24 09:00:07 | haypo | set | messages:
+ msg84066 |
| 2009-03-23 20:04:49 | mark.dickinson | set | status: open -> closed resolution: accepted messages:
+ msg84027
|
| 2009-03-21 22:19:31 | mark.dickinson | set | files:
+ faster_integer_division2.patch
messages:
+ msg83953 |
| 2009-03-18 23:27:21 | haypo | set | messages:
+ msg83795 |
| 2009-03-18 23:22:35 | haypo | set | messages:
+ msg83794 |
| 2009-03-18 23:22:07 | haypo | set | files:
+ pidigits-2.py
messages:
+ msg83793 |
| 2009-03-18 22:54:54 | mark.dickinson | set | files:
+ pidigits_bestof.py
messages:
+ msg83791 |
| 2009-03-18 22:52:05 | mark.dickinson | set | messages:
+ msg83790 |
| 2009-03-18 22:48:53 | haypo | set | messages:
+ msg83789 |
| 2009-03-18 22:47:45 | pitrou | set | messages:
+ msg83788 |
| 2009-03-18 22:43:56 | pitrou | set | nosy:
+ pitrou messages:
+ msg83786
|
| 2009-03-18 22:31:29 | mark.dickinson | set | messages:
+ msg83785 |
| 2009-03-18 22:17:09 | haypo | set | nosy:
+ haypo messages:
+ msg83782
|
| 2009-03-18 22:11:26 | mark.dickinson | create | |
|