Issue 25870: textwrap is very slow on long words without spaces

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70058

classification

Title:	textwrap is very slow on long words without spaces
Type:	performance	Stage:
Components:	Library (Lib)	Versions:	Python 3.2, Python 3.3, Python 3.4, Python 2.7

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	horrible performance of textwrap.wrap() with a long word View: 22687
Assigned To:		Nosy List:	bmwiedemann, brett.cannon, ezio.melotti, martin.panter, mrabarnett, pitrou, r.david.murray
Priority:	normal	Keywords:

Created on 2015-12-15 15:54 by bmwiedemann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg256461 - (view)	Author: Bernhard M. Wiedemann (bmwiedemann) *	Date: 2015-12-15 15:54
Many Python scripts use textwrap to break base64-encoded strings from openssl into lines, e.g. https://bugs.launchpad.net/python-keystoneclient/+bug/1404402 and https://github.com/diafygi/acme-tiny/blob/master/acme_tiny.py#L166
Steps to reproduce:
time python -c "import textwrap; textwrap.wrap('x'*9000, 64)"
This has a complexity of O(n^2), meaning wrapping 18000 chars takes 4 times as long as 9000.
One known workaround is to use textwrap.wrap('x'*9000, 64, break_on_hyphens=False). This also has O(n^2) complexity, but is around 2000 times faster.
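The reproduction above can also be run as a small script. This is only a sketch: `time_wrap` is a hypothetical helper name, and the timings depend entirely on the interpreter. On a version that includes the fix from issue 22687, both variants should be fast and the quadratic growth should not be visible.

```python
import textwrap
import timeit


def time_wrap(n, **kwargs):
    """Return the seconds taken to wrap a string of n 'x' characters to width 64.

    time_wrap is a hypothetical helper for this sketch, not part of textwrap.
    """
    text = "x" * n
    return timeit.timeit(lambda: textwrap.wrap(text, 64, **kwargs), number=1)


# Doubling the input size should roughly quadruple the time on affected versions.
for n in (9000, 18000):
    default = time_wrap(n)
    no_hyphens = time_wrap(n, break_on_hyphens=False)
    print("%6d chars: default %.4fs, break_on_hyphens=False %.4fs"
          % (n, default, no_hyphens))
```

For input with no whitespace or hyphens, both calls return the same list of lines, so the workaround changes only the running time.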
msg256464 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-12-15 16:19
This has already been fixed in issue 22687. It was deemed a performance improvement for an edge case and was not backported. I don't see the advantage of using textwrap to split up base64-encoded strings, by the way. The module isn't designed for doing line splitting; it is designed for doing text wrapping where blanks matter. For your application I would just do:
lines = [x[n*64:(n+1)*64] for n in range((len(x)//64)+1)]
msg256466 - (view)	Author: Bernhard M. Wiedemann (bmwiedemann) *	Date: 2015-12-15 16:41
It should probably be
lines = [x[n*64:(n+1)*64] for n in range(((len(x)-1)//64)+1)]
to avoid adding an empty line when the last line is full, which once again shows why people prefer to use standard libraries for this kind of work.
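The off-by-one being discussed can be seen by writing both comprehensions out with explicit multiplication and running them on an input whose length is an exact multiple of 64. The function names `chunks_floor` and `chunks_corrected` are hypothetical, chosen only for this sketch.

```python
def chunks_floor(x, width=64):
    # First suggestion: always takes len(x) // width + 1 slices.
    return [x[n * width:(n + 1) * width] for n in range((len(x) // width) + 1)]


def chunks_corrected(x, width=64):
    # Corrected suggestion: uses (len(x) - 1) // width + 1 slices.
    return [x[n * width:(n + 1) * width] for n in range(((len(x) - 1) // width) + 1)]


full = "A" * 128      # length is an exact multiple of 64
partial = "A" * 130   # last line is only partly filled

# The first version appends an empty string when the last line is full.
print(len(chunks_floor(full)), repr(chunks_floor(full)[-1]))
print(len(chunks_corrected(full)), len(chunks_corrected(full)[-1]))

# Both versions agree when the last line is not full.
print(chunks_floor(partial) == chunks_corrected(partial))
```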
msg256493 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-12-15 23:54
There is a standard library function for that ;) The step argument to range():
lines = (result[n:n + 64] for n in range(0, len(result), 64))
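A minimal sketch of the step-based approach applied to a base64 string follows. The helper name `split_fixed` and the sample payload are assumptions made for illustration; the sketch returns a list rather than a generator so the result can be inspected directly.

```python
import base64


def split_fixed(data, width=64):
    """Split data into consecutive pieces of at most width characters.

    split_fixed is a hypothetical helper; it uses the step argument of
    range() so that no empty trailing piece is produced.
    """
    return [data[n:n + width] for n in range(0, len(data), width)]


# Sample payload: 256 bytes encode to 344 base64 characters.
encoded = base64.b64encode(bytearray(range(256))).decode("ascii")
lines = split_fixed(encoded)

print(len(encoded), len(lines), len(lines[-1]))
print("\n".join(lines))
```

Because each start index comes from range(0, len(data), width), an input whose length is an exact multiple of the width yields no empty final line, and an empty input yields an empty list.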
msg256501 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-12-16 01:38
Oh, good call. I forgot about step.

History
Date	User	Action	Args
2022-04-11 14:58:25	admin	set	github: 70058
2015-12-16 01:38:47	r.david.murray	set	messages: + msg256501
2015-12-15 23:54:14	martin.panter	set	nosy: + martin.panter messages: + msg256493
2015-12-15 16:41:54	bmwiedemann	set	messages: + msg256466
2015-12-15 16:19:45	r.david.murray	set	status: open -> closed superseder: horrible performance of textwrap.wrap() with a long word components: - Extension Modules, Regular Expressions, Benchmarks nosy: + r.david.murray messages: + msg256464 resolution: duplicate
2015-12-15 15:54:57	bmwiedemann	create