Issue 795081: email.Message param parsing problem II

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/39128

classification

Title:	email.Message param parsing problem II
Type:	enhancement	Stage:	test needed
Components:	email, Library (Lib)	Versions:	Python 3.3

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	ajaksu2, barry, bpoaugust, collinwinter, customdesigned, r.david.murray, tony_nelson, zvyn
Priority:	normal	Keywords:	easy, patch

Created on 2003-08-26 03:37 by customdesigned, last changed 2022-04-10 16:10 by admin.

Files
File name	Uploaded	Description
virus5	customdesigned, 2003-08-26 03:37	Example (deactivated) virus with weird headers
issue795081.patch	zvyn, 2015-06-13 06:14	

Messages (9)
msg53986	Author: Stuart D. Gathman (customdesigned)	Date: 2003-08-26 03:37
The enclosed real-life (inactivated) virus message causes email.Message to fail to find the multipart attachments. This is because the headers following Content-Type are indented, causing email.Message to properly append them to Content-Type. The trick is that the boundary is quoted, and Outhouse^H^H^H^H^Hlook apparently gets a value of 'bound' for boundary, whereas email.Message gets the value '"bound"\n\tX-Priority...'. email.Utils.unquote apparently gives up and doesn't remove any quotes.

I believe that unquote should return just what is between the quotes, so that '"abc" def' would be unquoted to 'abc'. In fact, my email filtering software (http://bmsi.com/python/milter.html) works correctly on all kinds of screwy mail using my version of unquote with this heuristic. I believe that the header used by the virus is invalid, so a STRICT parser should reject it, but a tolerant parser (such as a virus scanner would use) should use the heuristic.

Here is a brief script to show the problem (attached file in test/virus5):

----------t.py----------
import email
msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary', '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal\n\tX-Mailer: Microsoft Outlook Express 5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1300')]
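The unquoting step described above can be checked in isolation, without the virus5 attachment. The following is a minimal Python 3 sketch, assuming the standard library's `email.utils.unquote` still strips quotes only when the value both starts and ends with a double quote; the `raw` value is shortened from the output in the report.

```python
from email.utils import unquote

# Parameter value as the parser hands it over: a quoted boundary followed
# by the folded header lines (shortened from the report above).
raw = '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal'

# The value does not end with a double quote, so unquote() returns it
# unchanged, quotes included.
print(repr(unquote(raw)))

# A cleanly quoted value is unquoted as expected.
print(repr(unquote('"bound"')))
```

The first call is the case the report complains about: the quotes and the trailing header text both survive, so the boundary never matches the delimiters in the body.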
msg53987	Author: Stuart D. Gathman (customdesigned)	Date: 2003-08-26 03:57
Here is a proposed fix for email.Utils.unquote (except it should test for a 'strict' mode flag, which is currently only in Parser):

def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
            if str.endswith('"'):
                str = str[1:-1]
            else:
                # remove garbage after trailing quote
                try:
                    str = str[1:str[1:].index('"')+1]
                except ValueError:
                    # no closing quote at all: leave the value alone
                    return str
            return str.replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str

Actually, I replaced only email.Message._unquotevalue for my application to minimize the impact. That would also be a good place to check for a STRICT flag stored with the message object. Perhaps the Parser should set the Message _strict flag from its own _strict flag.
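For illustration, the proposed heuristic combined with the strict check mentioned above might look like the Python 3 sketch below. The function name `tolerant_unquote` and its `strict` parameter are hypothetical; nothing like them exists in the email package, and the sketch is not the attached patch.

```python
def tolerant_unquote(value, strict=False):
    """Return the text between the leading pair of double quotes.

    `strict` is a hypothetical flag: when true, text after the closing
    quote is not discarded and the value is returned unchanged.
    """
    if len(value) > 1:
        if value.startswith('"'):
            if value.endswith('"'):
                inner = value[1:-1]
            elif strict:
                return value
            else:
                # Tolerant mode: cut at the next double quote and drop
                # whatever follows it.
                end = value.find('"', 1)
                if end == -1:
                    return value  # no closing quote at all
                inner = value[1:end]
            return inner.replace('\\\\', '\\').replace('\\"', '"')
        if value.startswith('<') and value.endswith('>'):
            return value[1:-1]
    return value


print(tolerant_unquote('"abc" def'))                        # abc
print(tolerant_unquote('"bound"\n\tX-Priority: 3'))         # bound
print(repr(tolerant_unquote('"abc" def', strict=True)))     # '"abc" def'
```

Like the proposal it follows, the tolerant branch stops at the first double quote it finds, so an escaped quote inside a malformed value would cut the result short.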
msg53988	Author: Barry A. Warsaw (barry) *	Date: 2003-11-21 20:45
Moving this to feature requests for Python 2.4. If appropriate, the email-sig should address this in the intended new lax parser for email 3.0 / Python 2.4. We can't add this to the Python 2.3 (or earlier) maintenance releases.
msg53989	Author: Collin Winter (collinwinter) *	Date: 2007-03-30 14:58
I'm still seeing this behaviour as of Python 2.6a0. Barry: I take it email-sig didn't get around to discussing this?
msg63201	Author: Tony Nelson (tony_nelson)	Date: 2008-03-03 03:47
If I understand RFC2822 3.2.2. Quoted characters (heh), unquoting must be done in one pass, so the current replace().replace() is wrong. It will change '\\"' to '"', but it should become '\"' when unquoted. This seems to work: re.sub(r'\\(.)',r'\1',s) I haven't encountered a problem with this; I just came across it while looking at the file Utils.py (Python 2.4, but unchanged in trunk). I will submit a new bug if desired.
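The difference between the chained replace and the single-pass substitution can be seen with a short Python 3 sketch. The helper names `chained_unescape` and `single_pass_unescape` are made up for this comparison; both take the text between the quotes.

```python
import re


def chained_unescape(s):
    # Two independent passes, as in the current unquote(): the backslash
    # produced by the first pass can be consumed again by the second.
    return s.replace('\\\\', '\\').replace('\\"', '"')


def single_pass_unescape(s):
    # One left-to-right pass: each backslash escapes exactly the character
    # that follows it, which is the substitution suggested above.
    return re.sub(r'\\(.)', r'\1', s)


sample = r'\\"'   # three characters: backslash, backslash, double quote

print(chained_unescape(sample))      # "
print(single_pass_unescape(sample))  # \"
```

Note that `.` does not match a newline by default, so an escaped newline would be left alone by this pattern.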
msg82025	Author: Daniel Diniz (ajaksu2) *	Date: 2009-02-14 12:11
Good candidate for the email sprint. Fix suggested inline.
msg245292	Author: Milan Oberkirch (zvyn) *	Date: 2015-06-13 06:14
Is this still relevant? I just made a patch based on the suggestions discussed and it does not change the behavior of the original bug report (but fixed the bug regarding '\\\\' mentioned by tony_nelson). Maybe I'm missing something?
msg245317	Author: R. David Murray (r.david.murray) *	Date: 2015-06-13 14:51
I think the thing to do is to turn it into a test case for both the old and the new parser, and then decide what we want the behavior to be.
msg282986	Author: (bpoaugust)	Date: 2016-12-12 12:01
Rather than change unquote to deal with such malformed input, why not just enhance get/set boundary? That would reduce the impact of any changes. Also, it should be easier to detect trailing rubbish in the value if you know it is a boundary value.
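One way to read this suggestion is a boundary-specific cleanup step, sketched below in Python 3. The helper `boundary_from_param` is hypothetical and not part of email.message; it relies on the boundary grammar from RFC 2046 section 5.1.1 (up to 70 characters from a fixed set, not ending in a space) to decide where the quoted boundary ends.

```python
import re

# Boundary characters per RFC 2046 section 5.1.1: 1 to 70 characters,
# and the last one may not be a space.
_QUOTED_BOUNDARY = re.compile(
    r'''"([0-9A-Za-z'()+_,\-./:=? ]{0,69}[0-9A-Za-z'()+_,\-./:=?])"'''
)


def boundary_from_param(raw):
    """Return the quoted boundary at the start of raw, ignoring what follows.

    Values that do not start with a well-formed quoted boundary are
    returned unchanged, so callers can still fall back to other handling.
    """
    match = _QUOTED_BOUNDARY.match(raw.lstrip())
    return match.group(1) if match else raw


print(boundary_from_param('"bound"\n\tX-Priority: 3'))  # bound
print(boundary_from_param('bound'))                     # bound
```

Because the pattern only accepts legal boundary characters, trailing rubbish after the closing quote is dropped without touching how other parameters are unquoted.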

History
Date	User	Action	Args
2022-04-10 16:10:50	admin	set	github: 39128
2016-12-12 12:01:53	bpoaugust	set	nosy: + bpoaugust messages: + msg282986
2015-06-13 14:51:48	r.david.murray	set	messages: + msg245317
2015-06-13 06:14:33	zvyn	set	files: + issue795081.patch nosy: + zvyn messages: + msg245292 keywords: + patch
2012-05-16 01:37:00	r.david.murray	set	assignee: r.david.murray -> components: + email
2010-12-27 18:24:09	r.david.murray	set	nosy: barry, collinwinter, customdesigned, tony_nelson, ajaksu2, r.david.murray versions: + Python 3.3, - Python 2.7
2010-05-05 13:50:32	barry	set	assignee: barry -> r.david.murray nosy: + r.david.murray
2009-02-14 12:11:39	ajaksu2	set	keywords: + easy nosy: + ajaksu2 stage: test needed messages: + msg82025 versions: + Python 2.7
2008-03-03 03:47:13	tony_nelson	set	nosy: + tony_nelson messages: + msg63201
2003-08-26 03:37:10	customdesigned	create