This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Concatenating bytes is much slower than concatenating strings
Type: performance Stage:
Components: Benchmarks Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Sworddragon, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2013-11-26 16:57 by Sworddragon, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name  Uploaded                       Description
test.py    Sworddragon, 2013-11-26 16:57  Testcase
test.py    Sworddragon, 2013-12-03 02:46  Benchmark 2
Messages (6)
msg204503 - Author: (Sworddragon) Date: 2013-11-26 16:57
The attached test case concatenates a string 100000 times and then a bytes object 100000 times. Here are my results:

sworddragon@ubuntu:~/tmp$ ./test.py
String: 0.03165316581726074
Bytes : 0.5805566310882568
msg204507 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-11-26 17:21
It is definitely not a good idea to rely on that optimization of += for strings.  Obviously bytes doesn't have the same optimization.  (Strings didn't either for a while in Python 3, and there was some controversy around adding it back exactly because one should not rely on it.)
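A minimal sketch of why the shortcut matters, assuming CPython 3. `bytes` objects are immutable, so `b += chunk` has to build a new object containing everything accumulated so far; appending n chunks one at a time therefore copies roughly n²/2 chunks in total. A `bytearray` is mutable and is extended in place. (CPython's `str` shortcut sidesteps immutability by resizing the string in place when no other reference to it exists; that is an implementation detail, which is why relying on it is discouraged.) The function names below are made up for illustration.

```python
# Sketch: copy vs. in-place semantics of += (assumes Python 3).

def bytes_iadd_makes_copy():
    """True if b += ... binds b to a new object and leaves the old one unchanged."""
    data = b"abc"
    alias = data        # a second reference to the same bytes object
    data += b"def"      # bytes is immutable: a new object holding all 6 bytes is built
    return alias == b"abc" and data == b"abcdef" and alias is not data


def bytearray_iadd_in_place():
    """True if ba += ... extends the existing bytearray object."""
    buf = bytearray(b"abc")
    alias = buf         # a second reference to the same bytearray
    buf += b"def"       # bytearray is mutable: the buffer grows in place
    return alias == b"abcdef" and alias is buf


print(bytes_iadd_makes_copy())     # True
print(bytearray_iadd_in_place())   # True
```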
msg204508 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-26 17:23
Indeed. If you want to concatenate a lot of bytes objects efficiently, there are three solutions:
- concatenate to a bytearray
- write to a io.BytesIO object
- use b''.join to concatenate all objects at once
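The three approaches listed above can be sketched as follows, assuming Python 3. The sample `chunks` list and the helper names are hypothetical; all three helpers return the same `bytes` value.

```python
import io

# Hypothetical sample input: 3000 small bytes objects.
chunks = [b"spam", b"eggs", b"ham"] * 1000


def concat_bytearray(parts):
    buf = bytearray()
    for part in parts:
        buf += part           # extends the mutable buffer in place
    return bytes(buf)         # bytes() copies the buffer into an immutable bytes object


def concat_bytesio(parts):
    stream = io.BytesIO()
    for part in parts:
        stream.write(part)    # appends to the in-memory binary stream
    return stream.getvalue()  # the whole buffer as a bytes object


def concat_join(parts):
    # In CPython, join sizes the result once and copies each part into it once.
    return b"".join(parts)


assert concat_bytearray(chunks) == concat_bytesio(chunks) == concat_join(chunks)
print(len(concat_join(chunks)))  # 11000
```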
msg205070 - Author: (Sworddragon) Date: 2013-12-03 02:46
I have extended the benchmark a little and here are my new results:

concatenate_string()           : 0.037489
concatenate_bytes()            : 2.920202
concatenate_bytearray()        : 0.157311
concatenate_string_io()        : 0.035397
concatenate_bytes_io()         : 0.032835
concatenate_string_join()      : 0.170623
concatenate_string_and_encode(): 0.037280
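The attached benchmark (test.py) is not reproduced on this page, so the following is only a hypothetical reconstruction using the function names from the results above; the one-character chunks and the function bodies are assumptions. It uses the standard `timeit` module and reports the best of three runs. Absolute timings will differ by machine and Python version.

```python
import io
import timeit

N = 100000          # number of appends, as in the original test case
STR_CHUNK = "a"     # assumed chunk; the original test data is not shown
BYTES_CHUNK = b"a"


def concatenate_string(n=N):
    s = ""
    for _ in range(n):
        s += STR_CHUNK              # may hit CPython's in-place str shortcut
    return s


def concatenate_bytes(n=N):
    b = b""
    for _ in range(n):
        b += BYTES_CHUNK            # copies the accumulated bytes every time
    return b


def concatenate_bytearray(n=N):
    ba = bytearray()
    for _ in range(n):
        ba += BYTES_CHUNK           # grows the buffer in place
    return ba


def concatenate_string_io(n=N):
    sio = io.StringIO()
    for _ in range(n):
        sio.write(STR_CHUNK)
    return sio.getvalue()


def concatenate_bytes_io(n=N):
    bio = io.BytesIO()
    for _ in range(n):
        bio.write(BYTES_CHUNK)
    return bio.getvalue()


def concatenate_string_join(n=N):
    return "".join(STR_CHUNK for _ in range(n))


def concatenate_bytes_join(n=N):
    return b"".join(BYTES_CHUNK for _ in range(n))


def concatenate_string_and_encode(n=N):
    s = ""
    for _ in range(n):
        s += STR_CHUNK
    return s.encode("ascii")        # one encode of the finished string


BENCHMARKS = [
    concatenate_string,
    concatenate_bytes,
    concatenate_bytearray,
    concatenate_string_io,
    concatenate_bytes_io,
    concatenate_string_join,
    concatenate_bytes_join,
    concatenate_string_and_encode,
]

if __name__ == "__main__":
    for func in BENCHMARKS:
        # Best of three single runs, to reduce noise from other processes.
        seconds = min(timeit.repeat(func, number=1, repeat=3))
        print("{:<31}: {:.6f}".format(func.__name__ + "()", seconds))
```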

- As we already know, concatenating bytes is much slower than concatenating strings.
- concatenate_bytearray() shows that doing this with bytearrays is about 4 times slower than concatenating strings. It also returns a bytearray, and in the short time I had I couldn't figure out how to simply convert it to a bytes object.
- Interestingly, concatenate_string_io() shows that writing to a StringIO object is faster than concatenating strings directly.
- Even more interesting, concatenate_bytes_io() shows that a BytesIO object is the fastest solution of all.
- concatenate_string_join() shows that using .join is slow too.
- Oddly, I couldn't test concatenate_bytes_join() because it raises an exception, and when searching the documentation I couldn't find a join method for bytes objects to check what was wrong.
- In concatenate_string_and_encode() I also tested concatenating strings and then encoding the result. The overhead compared to concatenating strings directly was too small for the test to measure.
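Two of the points above, sketched under the assumption of Python 3: a `bytearray` converts to an immutable `bytes` object with `bytes(...)`, and `bytes.join` does exist (it is documented with the other bytes and bytearray operations) but accepts only bytes-like items, raising `TypeError` if any item is a `str`. Whether that was the cause of the exception in the benchmark is only a guess, since its code is not shown here.

```python
parts = [b"spam", b"eggs"]

# bytearray -> bytes: bytes() makes an immutable copy of the buffer.
buf = bytearray()
for part in parts:
    buf += part
result = bytes(buf)
print(type(result).__name__, result)  # bytes b'spameggs'

# bytes.join works when every item is bytes-like...
print(b"".join(parts))  # b'spameggs'

# ...but raises TypeError if any item is a str.
try:
    b"".join(["spam", "eggs"])
except TypeError as exc:
    print("TypeError:", exc)
```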


Summary: BytesIO is the fastest solution but requires importing an extra module. Concatenating strings and then encoding the result seems to be the most practical solution if io is not already imported.

But I'm wondering why Python can't simply have the string optimization on bytes too.
msg205072 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-03 04:23
Please take these observations and questions to python-list.  They aren't really appropriate for the bug tracker.  We aren't going to add the optimization shortcut for bytes unless someone does a bunch of convincing on python-ideas, which seems unlikely (but not impossible).
msg205073 - Author: (Sworddragon) Date: 2013-12-03 04:32
> We aren't going to add the optimization shortcut for bytes

There is still the question: Why isn't this going to be optimized?
History
Date                 User               Action  Args
2022-04-11 14:57:54  admin              set     github: 64000
2013-12-03 04:32:11  Sworddragon        set     messages: + msg205073
2013-12-03 04:23:09  r.david.murray     set     messages: + msg205072
2013-12-03 02:46:32  Sworddragon        set     files: + test.py
                                                messages: + msg205070
2013-11-26 17:23:52  pitrou             set     nosy: + pitrou
                                                messages: + msg204508
2013-11-26 17:23:14  benjamin.peterson  set     status: open -> closed
                                                resolution: wont fix
2013-11-26 17:21:50  r.david.murray     set     type: behavior -> performance
2013-11-26 17:21:34  r.david.murray     set     nosy: + r.david.murray
                                                messages: + msg204507
2013-11-26 16:57:10  Sworddragon        create