Message 117097 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	stutzbach
Recipients	collinwinter, rhettinger, stutzbach, tim.peters
Date	2010-09-21.20:03:26
SpamBayes Score	2.7755576e-16
Marked as misclassified	No
Message-id	<1285099412.4.0.33526475148.issue9915@psf.upfronthosting.co.za>
In-reply-to

Content
(I've made an educated guess about who to add to the Nosy list) The attached patch substantially speeds up sorting using the "key" parameter. It is purely a performance patch; the language and libraries are not changed in any other way from the users point of view. I measured a reduction in execution time of at least 15% in many cases and more than 40% for large n. I performed measurements on an Intel 64-bit Linux system using gcc 4.3.2 and on an Intel 32-bit Windows XP system using Visual C Express Edition 2009. Previously, "key" was implemented by a creating a sortwrapperobject, which is a PyObject storing a key and a value and using only the key for comparison. With the patch, sortwrapperobject is removed entirely. Instead, the sort uses two arrays: one for keys and one for values. Comparisons use the keys array. Whenever a swap is performed, the swap is performed on both arrays. If the "keys" parameter is not provided to the sort, the second swap is skipped (since the keys are also the values). Compared to the sortwrapperobject approach, speed is improved by: - Requiring only 1 memory allocation for an array of PyObject *'s, instead of n memory allocations for n sortwrapperobjects - Removes the need to dereference through sortwrapperobjects, which were scattered about in memory (i.e., had poor cache locality) - Substantially smaller memory overhead, further improving cache performance When the "key" parameter is not used, the code still needs to check to see if it should be performing a second swap. However, the additional instructions are cache and branch-prediction friendly and do not have a noticeable impact on performance. I conducted enough experiments to establish a 95% confidence interval that excluded a slowdown of more than 1% (but did not exclude a slowdown of 0% - which is good). A handful of results: # No key, same speed otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()' 1000000 loops, best of 3: 0.276 usec per loop otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()' 1000000 loops, best of 3: 0.276 usec per loop # With a key, patched version is faster otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)' 1000000 loops, best of 3: 1.76 usec per loop otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)' 1000000 loops, best of 3: 1.5 usec per loop # Results are more dramatic with large n otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)' 10 loops, best of 3: 35.2 msec per loop otto:~$ py3k-git/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)' 10 loops, best of 3: 22.4 msec per loop I have been using a script for running a large battery of experiments with different values of n and different conditions (random data, sorted data, reverse-sorted data, key, no key, etc.). The script works, but it's clunky to use. I'm working on cleaning that up and hope to attach it to this issue within the next few days.

(I've made an educated guess about who to add to the Nosy list)

The attached patch substantially speeds up sorting using the "key" parameter.  It is purely a performance patch; the language and libraries are not changed in any other way from the users point of view.  I measured a reduction in execution time of at least 15% in many cases and more than 40% for large n.

I performed measurements on an Intel 64-bit Linux system using gcc 4.3.2 and on an Intel 32-bit Windows XP system using Visual C Express Edition 2009.

Previously, "key" was implemented by a creating a sortwrapperobject, which is a PyObject storing a key and a value and using only the key for comparison.  

With the patch, sortwrapperobject is removed entirely.  Instead, the sort uses two arrays: one for keys and one for values.  Comparisons use the keys array.  Whenever a swap is performed, the swap is performed on both arrays.  If the "keys" parameter is not provided to the sort, the second swap is skipped (since the keys are also the values).

Compared to the sortwrapperobject approach, speed is improved by:
- Requiring only 1 memory allocation for an array of PyObject *'s, instead of n memory allocations for n sortwrapperobjects
- Removes the need to dereference through sortwrapperobjects, which were scattered about in memory (i.e., had poor cache locality)
- Substantially smaller memory overhead, further improving cache performance
 
When the "key" parameter is not used, the code still needs to check to see if it should be performing a second swap.  However, the additional instructions are cache and branch-prediction friendly and do not have a noticeable impact on performance.  I conducted enough experiments to establish a 95% confidence interval that excluded a slowdown of more than 1% (but did not exclude a slowdown of 0% - which is good).

A handful of results:

# No key, same speed
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()'
1000000 loops, best of 3: 0.276 usec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()' 
1000000 loops, best of 3: 0.276 usec per loop

# With a key, patched version is faster
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)'
1000000 loops, best of 3: 1.76 usec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)'
1000000 loops, best of 3: 1.5 usec per loop

# Results are more dramatic with large n
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)'
10 loops, best of 3: 35.2 msec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)'
10 loops, best of 3: 22.4 msec per loop

I have been using a script for running a large battery of experiments with different values of n and different conditions (random data, sorted data, reverse-sorted data, key, no key, etc.).  The script works, but it's clunky to use.  I'm working on cleaning that up and hope to attach it to this issue within the next few days.

History
Date	User	Action	Args
2010-09-21 20:03:32	stutzbach	set	recipients: + stutzbach, tim.peters, collinwinter, rhettinger
2010-09-21 20:03:32	stutzbach	set	messageid: <1285099412.4.0.33526475148.issue9915@psf.upfronthosting.co.za>
2010-09-21 20:03:31	stutzbach	link	issue9915 messages
2010-09-21 20:03:30	stutzbach	create