This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: concatenating string in dict unexpected performance
Type: performance Stage:
Components: Benchmarks Versions: Python 3.2
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: collinwinter Nosy List: alex, collinwinter, hevi, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2012-07-30 14:20 by hevi, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
string_cat.py hevi, 2012-07-30 14:20
Messages (4)
msg166901 - (view) Author: Petri Heinilä (hevi) Date: 2012-07-30 14:20
In concatenating string in dict somehow radically depends the added string size.

code 

----
import timeit

s1 = """
text = ""
for i in range(1,50000):
  text += "12345678901234567890"
"""
print("str cat 20: {0}s".format(timeit.timeit(s1,number=1)))

s2 = """
d = dict()
d["text"] = ""
for i in range(1,50000):
  d["text"] += "12345"
"""
print("dict cat 5: {0}s".format(timeit.timeit(s2,number=1)))

s3 = """
d = dict()
d["text"] = ""
for i in range(1,50000):
  d["text"] += "1234567890"
"""
print("dict cat 10: {0}s".format(timeit.timeit(s3,number=1)))


s4 = """
d = dict()
d["text"] = ""
for i in range(1,50000):
  d["text"] += "12345678901234567890"
"""
print("dict cat 20: {0}s".format(timeit.timeit(s4,number=1)))

----

gives:

str cat 20: 0.011670112609863281s
dict cat 5: 2.994051933288574s
dict cat 10: 6.61974310874939s
dict cat 20: 34.230541944503784s
msg166903 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2012-07-30 14:51
Actually, I would argue that it's concatentation of a local variable which has unexpected performance. Logically it should be O(n**2), however due to hacks in CPython it isn't.
msg166904 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-30 14:57
Yes, the total time of repeated string concatenation is O(N**2). s3 is twice larger s2, therefore s3 time about twice large s2 time.

In the first case Python use a special optimization which allows O(N) in some cases. You can deactivate it:

s5 = """
text = ""
for i in range(1,50000):
  text2 = text
  text += "12345678901234567890"
"""
print("str cat 20: {0}s".format(timeit.timeit(s5,number=1)))

It's not a bug, it's feature.
msg166905 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-30 15:14
It's a FAQ entry:
http://docs.python.org/dev/faq/programming.html#what-is-the-most-efficient-way-to-concatenate-many-strings-together
History
Date User Action Args
2022-04-11 14:57:33adminsetgithub: 59708
2012-07-30 15:14:19pitrousetstatus: open -> closed

nosy: + pitrou
messages: + msg166905

resolution: wont fix
2012-07-30 14:57:55serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg166904
2012-07-30 14:51:16alexsetnosy: + alex
messages: + msg166903
2012-07-30 14:20:00hevicreate