Issue 5803: email/quoprimime: encode and decode are very slow on large messages

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50053

classification

Title:	email/quoprimime: encode and decode are very slow on large messages
Type:	performance	Stage:	resolved
Components:	email, Library (Lib)	Versions:	Python 3.3, Python 3.4

process

Status:	closed	Resolution:
Dependencies:		Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	barry, cainmatt, dmbaggett, pitrou, python-dev, r.david.murray, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2009-04-20 19:01 by dmbaggett, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
quoprimime.py.diff	cainmatt, 2011-01-31 19:49	Lib/email/quoprimime.py patch	review
email_quoprimime_fast_encode.patch	serhiy.storchaka, 2013-05-13 13:06		review

Messages (11)
msg86203 - (view)	Author: Dave Baggett (dmbaggett)	Date: 2009-04-20 19:01
The implementation of encode and decode are slow, and scale nonlinearly in the size of the message to be encoded/decoded. A simple fix is to use an array.array('c') and append to it, rather than using string concatenation. This change makes the code more than an order of magnitude faster.
msg86235 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-04-21 15:21
Do you have a patch? Also, detailed timing examples are welcome (you can e.g. use the "timeit" module from the command line).
msg86236 - (view)	Author: Dave Baggett (dmbaggett)	Date: 2009-04-21 15:23
I can certainly generate a patch for you. What form would you like it in, and against what source tree? Also, do you have a preference between the use of array.array vs. standard arrays? (I don't know whether it's good or bad to depend on "import array" in quoprimime.)
msg86238 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-04-21 15:30
The patch must be against the SVN trunk, in standard unified diff ("svn diff" does the trick). What do you call standard arrays? Do you mean the builtin list type? Storing one separate character per list element is a poor choice because it will waste a lot of memory. The array.array type is more efficient for this. However, there are two other, more idiomatic ways of doing this: - accumulate the chunks of encoded/decoded data into a list (but not only 1-character strings...) and join() them at the end. - or, use a StringIO object (from the cStringIO module) Bonus points if you time all three possibilities and submit the fastest :-)
msg86239 - (view)	Author: Dave Baggett (dmbaggett)	Date: 2009-04-21 15:34
Yes, sorry, I meant "built-in list type" not "array". Your point about using lists this way is valid, and is why I used array.array('c'). I will do as you suggest and try all three methods. I did time the array.array approach vs. the one currently in the code and it was about 30 times faster.
msg87769 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-14 21:41
Any news?
msg127654 - (view)	Author: Matt Cain (cainmatt)	Date: 2011-01-31 19:49
I re-wrote encode() to be simpler and faster. My version runs about 10 times faster on a 30KB message. I have tested it somewhat but not rigorously see attached patch
msg173557 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-10-22 18:45
3.2+ already use StringIO, there is no nonlinearity. On 2.7 I couldn't find the nonlinearity with the texts of up to 100 MB. Therefore the path outdated as bugfix. However it shows 20-30x speedup on 2.7. It would be worth port it to 3.4. This will be a significant enhancement (if the patch can preserve the current semantics).
msg189126 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-13 13:06
Here is a patch for 3.4 based on Matt's patch with additional optimizations. It speeds up body_encode() and header_encode(). $ ./python -m timeit -s "from email.quoprimime import body_encode as encode; x = open('Lib/decimal.py').read()[:100000]" "encode(x)" Before patch: 1.12 sec per loop After patch: 26.3 msec per loop $ ./python -m timeit -s "from email.quoprimime import header_encode as encode; x = b'A'*100" "encode(x)" Before patch: 97.9 usec per loop After patch: 23.7 usec per loop For non-ascii data difference is even larger.
msg208030 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-01-13 18:31
New changeset 4c5b1932354b by R David Murray in branch '3.3': #20206, #5803: more efficient algorithm that doesn't truncate output. http://hg.python.org/cpython/rev/4c5b1932354b New changeset b6c3fc21286f by R David Murray in branch 'default': Merge #20206, #5803: more efficient algorithm that doesn't truncate output. http://hg.python.org/cpython/rev/b6c3fc21286f
msg208033 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-01-13 18:33
I've reviewed this and applied it to both 3.3 and 3.4 in order to fix issue 20206. Thanks, Serhiy.

History
Date	User	Action	Args
2022-04-11 14:56:48	admin	set	github: 50053
2014-01-13 18:33:56	r.david.murray	set	status: open -> closed stage: patch review -> resolved messages: + msg208033 versions: + Python 3.3
2014-01-13 18:31:37	python-dev	set	nosy: + python-dev messages: + msg208030
2014-01-09 16:16:51	r.david.murray	link	issue20206 dependencies
2013-05-13 13:06:41	serhiy.storchaka	set	files: + email_quoprimime_fast_encode.patch messages: + msg189126 stage: needs patch -> patch review
2013-01-07 17:47:19	serhiy.storchaka	set	assignee: serhiy.storchaka
2012-10-22 18:45:04	serhiy.storchaka	set	versions: + Python 3.4, - Python 2.7, Python 3.2, Python 3.3 nosy: + serhiy.storchaka messages: + msg173557 stage: patch review -> needs patch
2012-05-16 02:00:44	r.david.murray	set	assignee: r.david.murray -> (no value) components: + email versions: - Python 3.1
2011-03-14 03:54:18	r.david.murray	set	nosy: barry, pitrou, dmbaggett, r.david.murray, cainmatt versions: + Python 3.2
2011-01-31 20:06:56	r.david.murray	set	assignee: r.david.murray stage: needs patch -> patch review nosy: + r.david.murray versions: + Python 3.3
2011-01-31 19:49:39	cainmatt	set	files: + quoprimime.py.diff nosy: + cainmatt messages: + msg127654 keywords: + patch
2009-05-14 21:41:13	pitrou	set	nosy: + barry messages: + msg87769
2009-04-21 15:34:46	dmbaggett	set	messages: + msg86239
2009-04-21 15:30:50	pitrou	set	messages: + msg86238
2009-04-21 15:23:43	dmbaggett	set	messages: + msg86236
2009-04-21 15:21:43	pitrou	set	priority: normal stage: needs patch
2009-04-21 15:21:31	pitrou	set	nosy: + pitrou messages: + msg86235 versions: + Python 3.1, Python 2.7, - Python 2.6
2009-04-20 19:01:43	dmbaggett	set	type: performance
2009-04-20 19:01:29	dmbaggett	create