This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Bytes and bytesarrays can be sorted with a much faster count sort.
Type: enhancement Stage: resolved
Components: C API Versions: Python 3.11
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, eric.smith, gregory.p.smith, lemburg, mark.dickinson, rhettinger, rhpvorderman, serhiy.storchaka, tim.peters, vstinner, yselivanov
Priority: normal Keywords:

Created on 2021-11-26 09:36 by rhpvorderman, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (15)
msg407032 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2021-11-26 09:36
Python now uses the excellent timsort for most (all?) of its sorting. But this is not the fastest sort available for one particular use case.

If the number of possible values in the array is limited, it is possible to perform a counting sort: https://en.wikipedia.org/wiki/Counting_sort. In a counting sort each value maps to an integer corresponding to its relative value. The values are then counted by using key = map_to_key(value); count_array[key]. Then from this count_array a new array of sorted values can be constructed. (See the wikipedia article for a more detailed explanation).

For the python bytes and bytesarray types this is extremely simple to implement. All 256 possible values are can be directly used as keys. 

Rough implementation:
- Use buffer protocol to get pointer to bytes/bytesarray internal c-string
- Declare a count_array: Py_ssize_t[256] count_array . (use memset to set it to 0)
- Iterate over the c-string and add each value to the countarray. count_array[buffer[i]] += 1
- Allocate a new bytes(array) object, or in the case of bytesarray the sorting can be performed inplace when bytesarray.sort() is used. 
- Iterate over the count_array. Get the number of values for each key and use memset to insert the sequences of keys into position.


The most obvious way to implement this algorithm will be as bytesarray.sort() where it is sorted inplace and as bytes.sort() which returns a new sorted bytes object. This is much much faster than using bytes(sorted(bytes)).

I made a quick cython implementation for speed testing here: https://github.com/rhpvorderman/bytes_sort/blob/main/bytes_sort.pyx

Currently to get a sorted bytestring one has to do bytes(sorted(my_bytes)). 

Test results:

# First make sure there is no regression when sorting an empty string
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes(sorted(b''))"
500000 loops, best of 5: 560 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')"
500000 loops, best of 5: 565 nsec per loop

# Test result for very small strings
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes(sorted(b'abc'))"
500000 loops, best of 5: 628 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'abc')"
500000 loops, best of 5: 578 nsec per loop

# Even on a very small already sorted string, a counting sort is faster.

# Test with a proper string
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes(sorted(b'Let\'s test a proper string now. One that has some value to be sorted.'))"
100000 loops, best of 5: 2.32 usec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'Let\'s test a proper string now. One that has some value to be sorted.')"
500000 loops, best of 5: 674 nsec per loop

More than three times faster!
msg407033 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2021-11-26 09:40
Also I didn't know if this should be in Component C-API or Interpreter Core. But I guess this will be implemented as C-API calls PyBytes_Sort and PyByteArray_SortInplace so I figured C-API is the correct component here.
msg407034 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2021-11-26 09:56
I changed the cython script a bit to use a more naive implementation without memset.
Now it is always significantly faster than bytes(sorted(my_bytes)).

$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')"
500000 loops, best of 5: 495 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'abc')"
500000 loops, best of 5: 519 nsec per loop
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'Let\'s test a proper string now. One that has some value to be sorted.')"
500000 loops, best of 5: 594 nsec per loop
msg407040 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2021-11-26 10:51
Sorry for the spam. I see I made a typo in the timeit script. Next time I will be more dilligent when making these kinds of reports and triple checking it before hand, and sending it once. I used -c instead of -s and now all the setup time is also included. This confounds the results.

The proper test commands should be:

python -m timeit -s "from bytes_sort import bytes_sort, bytearray_sort_inplace" "bytes_sort(b'My string here')"

python -m timeit "bytes(sorted(b'My string here'))"

Using just sorted, to purely compare the sorting algorithms without the overhead of creating a new bytes object.
python -m timeit "sorted(b'My string here')"

Correct comparison results
# String = b''
using bytes(sorted: 188 nsec
using sorted:       108 nsec
using byte_sort:    125 nsec  # Some overhead here, setting up the countarray
# String = b'abc'
using bytes(sorted: 252 nsec
using sorted:       145 nsec
using byte_sort:    136 nsec  # Overhead compared to sorted already negated when sorting 3 items(!)
# String = b'Let\'s test a proper string now. One that has some value to be sorted.'
using bytes(sorted: 1830 nsec (reported as 1.83 usec)
using sorted:       1550 nsec (reported as 1.55 usec)
using byte_sort:     220 nsec
msg407041 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-26 11:03
Can you give example use-cases for sorting a bytes or bytearray object? I see value in the intermediate object - the frequency table, but the reconstructed sorted bytes object just seems like an inefficient representation of the frequency table, and I'm not sure how it would be useful.

As the wikipedia page for counting sort says, the real value is in sorting items by keys that are small integers, and the special case where the item is identical to the key isn't all that useful:

> In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself, but this simplification does not accommodate many applications of counting sort.
msg407042 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-26 11:54
(Changing the issue type: as I understand it, this is a proposal for a new feature, namely new methods bytes.sort and bytearray.sort, rather than a performance improvement to an existing feature.)
msg407043 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2021-11-26 12:05
I used it for the median calculation of FASTQ quality scores (https://en.wikipedia.org/wiki/FASTQ_format). But in the end I used the frequency table to calculate the median more quickly. So as you say, the frequency table turned out to be more useful.

Having said that the usefulness depends on how many times 8-bit data is passed into sorted. (bytes,bytearrays, most python strings are 8-bit I believe). I raised this issue not because I want a .sort() method on bytes or 
bytearrays, but mostly because I think python's sorted function can be improved with regards to 8-bit data. I think it is an interesting thing to consider, depending on how often this occurs.

For example:
sorted(b'Let\'s test a proper string now. One that has some value to be sorted.')
and 
list(bytes_sort(b'Let\'s test a proper string now. One that has some value to be sorted.'))

This returns the same result (a list of integers). But the byte_sort implementation is 2.5 times faster. So sorted is not optimally implemented here.

Since sorted is now basically throwing everything into list.sort an alternative codepath using bytes.sort can be considered. (If there are enough use cases for it).
msg407059 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-26 15:50
> If there are enough use cases for it.

Well, that's the question. :-) *I* can't think of any good use cases, but maybe others can. But if we can't come up with some use cases, then this feels like a solution looking for a problem, and that makes it hard to justify both the short-term effort and the longer-term maintenance costs of adding the complexity.

FWIW, given a need to efficiently compute frequency tables for reasonably long byte data, I'd probably reach first for NumPy (and numpy.bincount in particular):

Python 3.10.0 (default, Nov 12 2021, 12:32:57) [Clang 12.0.5 (clang-1205.0.22.11)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import collections, numpy as np

In [2]: t = b'MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFSTATSLSPFYLRPPSFLRAPSWIDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFISREFHRKYRI
   ...: PADVDPLAITSSLSSDGVLTVNGPRKQVSGPERTIPITREEKPAVAAAPKK';  t *= 100

In [3]: %timeit np.bincount(np.frombuffer(t, np.uint8))
32.7 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %timeit collections.Counter(t)
702 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit sorted(t)
896 µs ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
msg407073 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-11-26 16:53
Given that the normal sort() machinery wouldn't use this code, I don't think there's any advantage to adding .sort() methods to bytes and bytesarray. The downside to adding these methods is the increased complexity in the stdlib.

I think the better approach is to put bytes_sort on PyPI.
msg407075 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-26 17:20
There are other byte-like objects like memoryview or array.array (for some array types). I would suggest writing a function rather than a method.
msg407150 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-11-27 15:04
I’m -1 on this. Given that use cases are rare, there is no need to burden the code base with an optimization of something we can already do in other ways.

Also, I don’t like that the APIs for list.sort(), bytes.sort(), and bytearray.sort() wouldn’t match.  IMO that would do more harm than good.
msg407170 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-27 20:55
I concur with Raymond. It is difficult to find any use case for sorting bytes objects (I cannot find any).

As for using radix sort in list.sort() in special case of small integer keys, it is difficult to implement, because we should preserve the initial order of items with the same key. I am not sure that it is possible to implement stable radix sort with linear complexity. In any case the overhead will be more significant.
msg407181 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-11-28 02:10
General consensus: There isn't a common need for this.
msg407244 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-11-29 03:36
I agree with closing this - I don't know of real use cases either.

Serhiy, essentially all LSD radix sorts are stable, and rely on that for their own correctness. MSD radix sorts vary.
msg407275 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-11-29 14:05
On 26.11.2021 10:56, Ruben Vorderman wrote:
> 
> $ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')"
> 500000 loops, best of 5: 495 nsec per loop

Shouldn't this read:

$ python -m timeit -s "from bytes_sort import bytes_sort" "bytes_sort(b'')"

(-s instead of -c) ?
History
Date User Action Args
2022-04-11 14:59:52adminsetgithub: 90060
2021-11-29 14:05:45lemburgsetnosy: + lemburg
messages: + msg407275
2021-11-29 03:36:18tim.peterssetmessages: + msg407244
2021-11-28 02:10:28gregory.p.smithsetstatus: open -> closed

nosy: + gregory.p.smith
messages: + msg407181

resolution: rejected
stage: resolved
2021-11-27 20:55:49serhiy.storchakasetmessages: + msg407170
2021-11-27 15:04:18rhettingersetmessages: + msg407150
2021-11-26 17:29:34brett.cannonsetnosy: - brett.cannon
2021-11-26 17:20:35vstinnersetmessages: + msg407075
2021-11-26 16:53:52eric.smithsetnosy: + eric.smith
messages: + msg407073
2021-11-26 15:50:23mark.dickinsonsetmessages: + msg407059
2021-11-26 12:05:10rhpvordermansetmessages: + msg407043
2021-11-26 11:54:44mark.dickinsonsettype: performance -> enhancement
messages: + msg407042
2021-11-26 11:03:10mark.dickinsonsetnosy: + mark.dickinson
messages: + msg407041
2021-11-26 10:51:31rhpvordermansetmessages: + msg407040
2021-11-26 09:56:58rhpvordermansetmessages: + msg407034
2021-11-26 09:51:51AlexWaygoodsetnosy: + tim.peters, brett.cannon, rhettinger, vstinner, Mark.Shannon, serhiy.storchaka, yselivanov
2021-11-26 09:40:38rhpvordermansetmessages: + msg407033
2021-11-26 09:36:30rhpvordermancreate