This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: textwrap is very slow on long words without spaces
Type: performance
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: duplicate
Superseder: horrible performance of textwrap.wrap() with a long word (issue 22687)
Nosy List: bmwiedemann, brett.cannon, ezio.melotti, martin.panter, mrabarnett, pitrou, r.david.murray
Priority: normal

Created on 2015-12-15 15:54 by bmwiedemann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg256461 - Author: Bernhard M. Wiedemann (bmwiedemann) - Date: 2015-12-15 15:54
Many Python scripts use textwrap to break base64-encoded strings from OpenSSL into lines, e.g. https://bugs.launchpad.net/python-keystoneclient/+bug/1404402
and https://github.com/diafygi/acme-tiny/blob/master/acme_tiny.py#L166

Steps To Reproduce:
time python -c "import textwrap; textwrap.wrap('x'*9000, 64)"

This has a complexity of O(n^2): wrapping 18000 characters takes about 4 times as long as wrapping 9000.
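To check the scaling claim on a particular interpreter, one can time the reproduction at two input sizes and compare. This is a minimal sketch using only the standard `timeit` module; the helper name `time_wrap` is hypothetical, and single-run timings are noisy, so the ratio is only a rough indicator.

```python
import textwrap
import timeit


def time_wrap(n: int, width: int = 64, repeat: int = 3) -> float:
    """Best-of-`repeat` time in seconds to wrap one n-character word (hypothetical helper)."""
    text = "x" * n
    # timeit.repeat accepts a callable; number=1 runs the wrap once per trial.
    return min(timeit.repeat(lambda: textwrap.wrap(text, width), number=1, repeat=repeat))


if __name__ == "__main__":
    t_short = time_wrap(9000)
    t_long = time_wrap(18000)
    # A ratio near 4 is consistent with quadratic scaling; near 2 with linear scaling.
    print(f"9000: {t_short:.4f}s  18000: {t_long:.4f}s  ratio: {t_long / t_short:.1f}")
```

Taking the minimum of several trials reduces the influence of unrelated system load on the comparison.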

One known workaround is to use
textwrap.wrap('x'*9000, 64, break_on_hyphens=False)

This also has O(n^2) complexity, but is around 2000 times faster.
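For base64 input the workaround should not change the result, because `break_on_hyphens` only matters where the text contains hyphens. A sketch under that assumption, using a synthetic blob (the name `blob` is hypothetical) in place of real OpenSSL output:

```python
import base64
import textwrap

# Hypothetical stand-in for an OpenSSL-style base64 blob: 6912 bytes -> 9216 chars.
blob = base64.b64encode(bytes(range(256)) * 27).decode("ascii")

default_lines = textwrap.wrap(blob, 64)
no_hyphen_lines = textwrap.wrap(blob, 64, break_on_hyphens=False)

# Standard base64 uses A-Z, a-z, 0-9, '+', '/' and '=', never '-' or whitespace,
# so disabling hyphen breaking does not change where the lines are cut.
assert default_lines == no_hyphen_lines
print(len(no_hyphen_lines), {len(line) for line in no_hyphen_lines})  # prints: 144 {64}
```

URL-safe base64 replaces '+' and '/' with '-' and '_', so for that alphabet the two settings are not guaranteed to agree.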
msg256464 - Author: R. David Murray (r.david.murray) (Python committer) - Date: 2015-12-15 16:19
This has already been fixed in issue 22687.  It was deemed a performance improvement for an edge case and was not backported.

I don't see the advantage of using textwrap to split up base64-encoded strings, by the way. The module isn't designed for line splitting; it's designed for text wrapping where blanks matter. For your application I would just do:

   lines = [x[n*64:(n+1)*64] for n in range((len(x)//64)+1)]
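Running this comprehension on two sample strings exposes an edge case: when `len(x)` is an exact multiple of 64, the range yields one index too many and the last slice is empty. A minimal sketch; the function name `chunk_original` is hypothetical and only wraps the one-liner so it can be called with different inputs.

```python
def chunk_original(x: str, width: int = 64) -> list[str]:
    # The comprehension from the message above, with 64 generalised to `width`.
    return [x[n * width:(n + 1) * width] for n in range((len(x) // width) + 1)]


full = "A" * 128     # length is an exact multiple of 64
partial = "A" * 130  # length is not

print([len(s) for s in chunk_original(full)])     # prints: [64, 64, 0]
print([len(s) for s in chunk_original(partial)])  # prints: [64, 64, 2]
```

The same off-by-one also turns an empty input into a single empty line, `[""]`.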
msg256466 - Author: Bernhard M. Wiedemann (bmwiedemann) - Date: 2015-12-15 16:41
should probably be

   lines = [x[n*64:(n+1)*64] for n in range(((len(x)-1)//64)+1)]

to avoid adding an empty line when the last line is full.
This once again shows why people prefer to use standard libraries
for this kind of work.
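The corrected bound `((len(x) - 1) // 64) + 1` is the ceiling of `len(x) / 64` for non-empty input and 0 for the empty string, because Python's floor division gives `(-1) // 64 == -1`. A quick property check over many lengths, as a sketch; the function name `chunk_corrected` is hypothetical.

```python
def chunk_corrected(x: str, width: int = 64) -> list[str]:
    # Corrected comprehension from the message above: (len(x) - 1) // width + 1
    # equals ceil(len(x) / width) for non-empty x, and 0 for the empty string.
    return [x[n * width:(n + 1) * width] for n in range(((len(x) - 1) // width) + 1)]


# Check a range of lengths, including 0 and exact multiples of the width.
for length in range(300):
    text = "A" * length
    lines = chunk_corrected(text)
    assert "".join(lines) == text             # nothing lost or duplicated
    assert all(lines)                         # no empty strings, even at multiples of 64
    assert all(len(s) <= 64 for s in lines)   # no line longer than the width
```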
msg256493 - Author: Martin Panter (martin.panter) (Python committer) - Date: 2015-12-15 23:54
There is a standard library function for that ;) Use the step argument to range():

   lines = (result[n:n + 64] for n in range(0, len(result), 64))
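Wrapped in a small helper, the `range()`-step version needs no ceiling arithmetic: the range yields the start offset of each line, and slicing past the end of a string is safe, so the last chunk is simply shorter. A sketch that builds a PEM-style body from a synthetic payload; the helper name `chunks` and the sample data are hypothetical.

```python
import base64
import textwrap
from typing import Iterator


def chunks(s: str, width: int = 64) -> Iterator[str]:
    # range() with a step yields the start offset of each line: 0, 64, 128, ...
    # Slicing past the end is safe, so the final chunk may be shorter than width.
    return (s[n:n + width] for n in range(0, len(s), width))


# Hypothetical PEM-style body built from a synthetic base64 string.
blob = base64.b64encode(b"example payload " * 20).decode("ascii")
body = "\n".join(chunks(blob))

# For hyphen-free, whitespace-free input this matches textwrap's output.
assert list(chunks(blob)) == textwrap.wrap(blob, 64, break_on_hyphens=False)
print(body)
```

Like the one-liner in the message, the helper returns a generator expression, which can be consumed only once; call `list()` on it if the lines are needed more than once.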
msg256501 - Author: R. David Murray (r.david.murray) (Python committer) - Date: 2015-12-16 01:38
Oh, good call.  I forgot about step.
History
Date                 User            Action  Args
2022-04-11 14:58:25  admin           set     github: 70058
2015-12-16 01:38:47  r.david.murray  set     messages: + msg256501
2015-12-15 23:54:14  martin.panter   set     nosy: + martin.panter
                                             messages: + msg256493
2015-12-15 16:41:54  bmwiedemann     set     messages: + msg256466
2015-12-15 16:19:45  r.david.murray  set     status: open -> closed
                                             superseder: horrible performance of textwrap.wrap() with a long word
                                             components: - Extension Modules, Regular Expressions, Benchmarks
                                             nosy: + r.david.murray
                                             messages: + msg256464
                                             resolution: duplicate
2015-12-15 15:54:57  bmwiedemann     create