classification
Title: Optimize bytes.join(sequence)
Type: performance Stage:
Components: Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: haypo, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-09-07 18:15 by haypo, last changed 2016-11-22 09:26 by haypo. This issue is now closed.

Files
File name: bytes_join.patch — uploaded by haypo, 2016-09-07 18:15
Messages (3)
msg274855 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-07 18:15
The article https://atleastfornow.net/blog/not-all-bytes/ says that bytes.join(sequence) is slower on Python 3 compared to Python 2.

I compared the Python 2 and Python 3 code: the main difference seems to be that Python 3 uses the Py_buffer API to support types other than just bytes, and that it allocates a buffer (on the C stack or on the heap) of Py_buffer objects.
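To illustrate what the Py_buffer path buys (a sketch of mine, not code from the patch): on Python 3, bytes.join() accepts any item that exports the buffer protocol, not only bytes.

```python
# Python 3's bytes.join() accepts any object exporting the buffer
# protocol -- this is the generality the Py_buffer code path provides.
parts = [b"hello", bytearray(b" "), memoryview(b"world")]
result = b"".join(parts)
print(result)  # b'hello world'
```

Python 2's str.join() only had to handle plain str items, so it could skip the buffer machinery entirely.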

Attached patch makes bytes.join(sequence) up to 29% faster. The patch avoids the Py_buffer API and the allocation of the temporary array of Py_buffer if all items are bytes or bytearray.


I'm not 100% sure that it's worth it to optimize bytes.join().

On Python 2, bytes += bytes uses a hack in Python/ceval.c to optimize this instruction. On Python 3, the optimization is only applied to str += str (unicode); bytes += bytes remains inefficient. To get the best performance on both Python 2 and Python 3, bytes.join(sequence) is better than bytes += bytes.
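A minimal sketch of the recommendation above (the variable names are mine): repeated bytes += bytes on Python 3 copies the whole accumulated buffer on every step, while a single join does one pass.

```python
# Repeated bytes += bytes is quadratic on Python 3: each += copies
# everything accumulated so far, since bytes are immutable and there is
# no in-place resize hack for them in ceval.c.
chunks = [b"chunk%d" % i for i in range(100)]

acc = b""
for chunk in chunks:
    acc += chunk            # copies the whole buffer each iteration

joined = b"".join(chunks)   # linear: one pass, one allocation

assert acc == joined
```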


Microbenchmark commands:

$ ./python -m perf timeit -s "sep=b''; seq=(b'hello', b'world')" 'sep.join(seq)'
$ ./python -m perf timeit -s "sep=b''; seq=(b'hello', b'world', b'. ') * 100" 'sep.join(seq)'
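For readers without the perf module installed, roughly the same measurement can be sketched with the stdlib timeit module (perf adds calibration, process spawning, and statistics on top of this, so its numbers are more stable):

```python
import timeit

# Stdlib approximation of the second perf command above.
setup = "sep = b''; seq = (b'hello', b'world', b'. ') * 100"
number = 100_000
total = timeit.timeit("sep.join(seq)", setup=setup, number=number)
print("%.2f us per call" % (total / number * 1e6))
```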

Python 3.6 => patched Python 3.6:

* 2 items: 92.1 ns +- 1.8 ns => 90.3 ns +- 3.1 ns (-2%)
* 300 items: 3.11 us +- 0.07 us => 2.22 us +- 0.11 us (-29%)


--


I'm not sure that Python 3 is really slower than Python 2 :-/ Python 3.5 is 10 ns slower than Python 2.7 for a sequence of 2 items, but it's 6% faster for 300 items.

So the question is whether it's worth it to optimize bytes.join().

Python 2:

* 2 items: 87.7 ns +- 3.7 ns
* 300 items: 3.25 us +- 0.11 us

Python 3.5:

* 2 items: 97.4 ns +- 9.0 ns
* 300 items: 3.06 us +- 0.20 us
msg275608 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-10 09:39
The tests in the article don't look reliable. For example, bytes_percent() and bytes_plus() measure nothing, because b"%s %s" % (b"hi", b"there") and b"hi" + b" " + b"there" are evaluated at compile time.
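The constant-folding point is easy to verify for the concatenation case: the compiler folds the literal-only expression into a single constant, so the "benchmark" only measures a LOAD_CONST.

```python
# The concatenation of bytes literals is folded at compile time, so the
# compiled code object contains the already-joined constant.
code = compile("b'hi' + b' ' + b'there'", "<test>", "eval")
print(code.co_consts)  # contains the folded constant b'hi there'
```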

Yes, bytes.join(sequence) is a little slower on Python 3 for short sequences. But for long sequences Python 3 is faster.

The code for bytes.join() is already too complex, and the proposed optimization makes it more complicated. And the optimization decreases performance on my netbook:

$ ./python -m timeit -s "sep=b' '; seq=(b'hello', b'world')" -- 'sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq); sep.join(seq)'

Python 2.7: 100000 loops, best of 3: 7.24 usec per loop
Python 3.6 unpatched: 100000 loops, best of 3: 8.62 usec per loop
Python 3.6 patched: 100000 loops, best of 3: 9.11 usec per loop
msg281452 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-11-22 09:26
Ok, I agree that it's not worth it to optimize bytes.join(list of byte strings). The code is already fast enough.
History
Date                 User              Action  Args
2016-11-22 09:26:08  haypo             set     status: open -> closed
                                               resolution: rejected
                                               messages: + msg281452
2016-09-10 09:39:50  serhiy.storchaka  set     messages: + msg275608
2016-09-07 19:03:34  serhiy.storchaka  set     nosy: + pitrou
2016-09-07 18:15:06  haypo             create