classification
Title: Big speedup in email message parsing
Type: performance Stage: needs patch
Components: email, Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, barry, holdenweb, lpd, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2005-07-23 22:07 by lpd, last changed 2014-01-23 21:32 by serhiy.storchaka.

Files
File name     Uploaded                 Description
t.dif         lpd, 2005-07-23 22:07    Patches for email/FeedParser.py
1243730.diff  barry, 2006-05-28 01:12
Messages (5)
msg48615 - Author: L. Peter Deutsch (lpd) Date: 2005-07-23 22:07
Python 2.4.1, Red Hat Linux 7.3.

Speeds up message parsing on files with large
attachments by approximately 4x, mostly by replacing
REs by direct string processing.
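
To illustrate the kind of change described above (this is not the content of t.dif, only a minimal sketch), the following compares a regex-based line splitter with one that uses plain string methods. The pattern name `_NLCRE_CRACK` and both function names are assumptions chosen for the example. Note that in Python 3 `str.splitlines` also treats a few extra characters (such as `\x0b` and `\u2028`) as line boundaries, so the two functions only agree on data that uses `\r`, `\n` and `\r\n`.

```python
import re

# Hypothetical pattern for this sketch: match any of the three common
# line endings and keep them in the split result via the capture group.
_NLCRE_CRACK = re.compile(r'(\r\n|\r|\n)')


def split_lines_regex(data):
    """Split data into lines with a regex, keeping the line endings."""
    parts = _NLCRE_CRACK.split(data)
    # parts alternates: text, separator, text, separator, ..., text
    lines = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    if parts[-1]:
        # Trailing text with no line ending.
        lines.append(parts[-1])
    return lines


def split_lines_direct(data):
    """Split data into lines with a string method, keeping the line endings."""
    # Caveat: in Python 3 this also splits on characters such as
    # '\x0b', '\x0c' and '\u2028', which the regex above does not.
    return data.splitlines(True)
```

Timing both functions with `timeit` on a message with a large attachment would be one way to check whether the direct version is actually faster on a given interpreter.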
msg48616 - Author: Steve Holden (holdenweb) * (Python committer) Date: 2006-05-25 22:55
A first examination reveals no particular speedup on an
email with approximately 30 MB of attachments. Can the OP
perhaps provide some code and test data I could time to
verify the assertions of speedup? Otherwise I can't see much
point in applying the patch.
msg48617 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-05-28 01:12
Here's a slightly better version, cleaned up for style and
applicable to Python 2.5 (which is the only place I'd feel
comfortable applying it).  I've verified that this provides
about a 3x speedup, at least for some messages with really
big attachments.
msg124717 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-27 17:17
Since this is a performance hack and is considerably invasive of the feedparser code (and needs updating), I'm deferring it to 3.3.
msg184191 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-14 20:46
Test fails with stack overflow:

======================================================================
ERROR: test_pushCR_LF (email.test.test_email.TestIterators)
FeedParser BufferedSubFile.push() assumed it received complete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython2.7/Lib/email/test/test_email.py", line 2585, in test_pushCR_LF
    bsf.push(il)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 140, in push
    parts = _splitlines(data)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
...
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
RuntimeError: maximum recursion depth exceeded
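
The traceback shows `_splitlines` calling itself once per part, which is what exhausts the recursion limit. As a hedged sketch of a non-recursive alternative (not the patch's code and not the real `BufferedSubFile`), the hypothetical `LineBuffer` class below splits each chunk in a single pass and holds back a trailing piece that does not end in `\n`, so that a CR arriving in one chunk can be joined with an LF arriving in the next.

```python
class LineBuffer:
    """Hypothetical helper for this sketch; not email.feedparser's class."""

    def __init__(self):
        self._partial = ''   # text held back until its line ending is known
        self.lines = []      # complete lines, line endings preserved

    def push(self, data):
        """Add a chunk of text, splitting it into lines without recursion."""
        data = self._partial + data
        self._partial = ''
        parts = data.splitlines(True)
        # Hold back the last piece if it is unterminated or ends in a
        # bare CR, since the next chunk may start with the matching LF.
        if parts and not parts[-1].endswith('\n'):
            self._partial = parts.pop()
        self.lines.extend(parts)

    def close(self):
        """Flush whatever is still held back as a final line."""
        if self._partial:
            self.lines.append(self._partial)
            self._partial = ''
```

Because each `push` does a fixed amount of work per chunk and never calls itself, feeding thousands of chunks that split CR and LF apart cannot hit the recursion limit in this sketch.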
History
Date                 User              Action  Args
2014-01-23 21:32:49  serhiy.storchaka  set     versions: + Python 3.5, - Python 3.4
2013-04-13 17:08:13  serhiy.storchaka  set     stage: patch review -> needs patch
2013-03-14 20:46:00  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg184191
2013-03-14 08:00:07  ezio.melotti      set     versions: + Python 3.4, - Python 3.3
2012-05-16 01:23:15  r.david.murray    set     keywords: - easy; assignee: r.david.murray ->; components: + email
2010-12-27 17:17:26  r.david.murray    set     nosy: barry, holdenweb, lpd, ajaksu2, r.david.murray; versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2; messages: + msg124717; stage: test needed -> patch review
2010-07-20 03:18:04  BreamoreBoy       set     versions: + Python 3.2
2010-05-05 13:45:09  barry             set     assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-04-22 14:36:47  ajaksu2           set     keywords: + easy
2009-03-20 22:12:45  ajaksu2           set     versions: + Python 3.1, Python 2.7, - Python 2.5; nosy: + ajaksu2; components: + Library (Lib), - Interpreter Core; type: performance; stage: test needed
2005-07-23 22:07:37  lpd               create