classification
Title: email.Message param parsing problem II
Type: enhancement Stage: test needed
Components: email, Library (Lib) Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, barry, bpoaugust, collinwinter, customdesigned, r.david.murray, tony_nelson, zvyn
Priority: normal Keywords: easy, patch

Created on 2003-08-26 03:37 by customdesigned, last changed 2016-12-12 12:01 by bpoaugust.

Files
File name          Uploaded                           Description
virus5             customdesigned, 2003-08-26 03:37   Example (deactivated) virus with weird headers
issue795081.patch  zvyn, 2015-06-13 06:14
Messages (9)
msg53986 - Author: Stuart D. Gathman (customdesigned) Date: 2003-08-26 03:37
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments.  This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type.  The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'.  email.Utils.unquote
gives up, because the value does not end with a quote,
and doesn't remove any quotes.

I believe that unquote should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'.  In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote using this heuristic.  I believe that the header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should use the heuristic.

Here is a brief script to show the problem (attached
file in test/virus5): 
----------t.py----------
import email

msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary', '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal\n\tX-Mailer: Microsoft Outlook Express 5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1300')]
msg53987 - Author: Stuart D. Gathman (customdesigned) Date: 2003-08-26 03:57
Logged In: YES 
user_id=142072

Here is a proposed fix for email.Utils.unquote (except it
should test for a 'strict' mode flag, which is currently only
in Parser):

def unquote(s):
    """Remove quotes from a string (tolerant of trailing garbage)."""
    if len(s) > 1:
        if s.startswith('"'):
            if s.endswith('"'):
                s = s[1:-1]
            else:
                # Remove garbage after the closing quote.
                end = s.find('"', 1)
                if end == -1:
                    return s  # no closing quote: leave the value unchanged
                s = s[1:end]
            # Undo backslash escapes (two-step, as in the stdlib unquote).
            return s.replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s

Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact.  That would also be a
good place to check for a STRICT flag stored with the
message object.  Perhaps the Parser should set the Message
_strict flag from its own _strict flag. 
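A minimal sketch of the strict/tolerant split suggested above, assuming a hypothetical `strict` keyword argument (neither `email.Utils.unquote` nor `Message._unquotevalue` actually takes one). In tolerant mode anything after the closing quote is dropped; in strict mode it raises `ValueError`. Backslash escapes are undone in a single pass with `re.sub` rather than with chained `replace()` calls.

```python
import re

# One quoted-pair per match: a backslash followed by any character.
_QUOTED_PAIR = re.compile(r'\\(.)', re.DOTALL)


def unquote(s, strict=False):
    """Remove surrounding quotes or angle brackets from a header value.

    `strict` is a hypothetical flag: when True, non-blank text after the
    closing quote raises ValueError instead of being silently dropped.
    """
    if len(s) > 1 and s.startswith('"'):
        # Find the closing quote, skipping backslash-escaped characters.
        i = 1
        while i < len(s):
            if s[i] == '\\':
                i += 2
                continue
            if s[i] == '"':
                break
            i += 1
        else:
            return s  # no closing quote: leave the value untouched
        trailing = s[i + 1:]
        if strict and trailing.strip():
            raise ValueError('garbage after quoted string: %r' % trailing)
        return _QUOTED_PAIR.sub(r'\1', s[1:i])
    if len(s) > 1 and s.startswith('<') and s.endswith('>'):
        return s[1:-1]
    return s
```

Scanning for the closing quote character by character (rather than with `index('"')`) keeps an escaped quote inside the value from being mistaken for the end of the string.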
msg53988 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-11-21 20:45
Logged In: YES 
user_id=12800

Moving this to feature requests for Python 2.4.  If
appropriate, the email-sig should address this in the
intended new lax parser for email 3.0 / Python 2.4.  We
can't add this to the Python 2.3 (or earlier) maintenance
releases.
msg53989 - Author: Collin Winter (collinwinter) * (Python committer) Date: 2007-03-30 14:58
I'm still seeing this behaviour as of Python 2.6a0.

Barry: I take it email-sig didn't get around to discussing this?
msg63201 - Author: Tony Nelson (tony_nelson) Date: 2008-03-03 03:47
If I understand RFC 2822 section 3.2.2 ("Quoted characters") (heh), unquoting
must be done in one pass, so the current replace().replace() is wrong.  It
will change '\\"' (backslash, backslash, quote) to '"', but it should become
'\"' when unquoted.

This seems to work:

    re.sub(r'\\(.)',r'\1',s)

I haven't encountered a problem with this; I just came across it while
looking at the file Utils.py (Python 2.4, but unchanged in trunk).  I
will submit a new bug if desired.
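The difference described above can be checked directly. This sketch compares the two-step `replace()` chain with the one-pass `re.sub` from the message (with `re.DOTALL` added so an escaped newline is also decoded) on a value containing an escaped backslash followed by a quote. The function names are illustrative only.

```python
import re


def unescape_two_pass(s):
    # Two sequential replaces: the backslash produced by the first
    # replace can be consumed again by the second one.
    return s.replace('\\\\', '\\').replace('\\"', '"')


def unescape_one_pass(s):
    # Each quoted-pair (backslash + next character) is decoded exactly once.
    return re.sub(r'\\(.)', r'\1', s, flags=re.DOTALL)


raw = r'\\"'  # three characters: backslash, backslash, double quote
print(unescape_two_pass(raw))  # "
print(unescape_one_pass(raw))  # \"
```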
msg82025 - Author: Daniel Diniz (ajaksu2) Date: 2009-02-14 12:11
Good candidate for the email sprint. Fix suggested inline.
msg245292 - Author: Milan Oberkirch (zvyn) * Date: 2015-06-13 06:14
Is this still relevant? I just made a patch based on the suggestions discussed, and it does not change the behavior described in the original bug report (though it does fix the '\\\\' bug mentioned by tony_nelson). Maybe I'm missing something?
msg245317 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-13 14:51
I think the thing to do is to turn it into a test case for both the old and the new parser, and then decide what we want the behavior to be.
msg282986 - Author: (bpoaugust) Date: 2016-12-12 12:01
Rather than changing unquote to deal with such malformed input, why not just enhance get_boundary/set_boundary? That would reduce the impact of any changes.

Also it should be easier to detect trailing rubbish in the value if you know it is a boundary value.
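A minimal sketch of the boundary-only approach, assuming a hypothetical helper `clean_boundary()` that a tolerant `get_boundary()` could apply to the raw (still-quoted) parameter value. It keeps the quoted string if there is one, otherwise the text up to the first whitespace, and then checks the result against the RFC 2046 boundary syntax (1 to 70 characters from a restricted set, the last one not a space).

```python
import re

# RFC 2046, section 5.1.1: 1-70 "bchars", the last one not a space.
_BCHARS = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(r"[%s ]{0,69}[%s]" % (_BCHARS, _BCHARS))


def clean_boundary(raw):
    """Extract a usable boundary from a possibly malformed parameter value.

    Hypothetical helper: returns None if nothing valid can be recovered.
    """
    raw = raw.strip()
    if raw.startswith('"'):
        end = raw.find('"', 1)
        # Keep only what is between the quotes; drop any trailing rubbish.
        candidate = raw[1:end] if end != -1 else raw[1:]
    else:
        # An unquoted boundary cannot contain whitespace, so stop at the first.
        parts = raw.split(None, 1)
        candidate = parts[0] if parts else ''
    return candidate if _BOUNDARY_RE.fullmatch(candidate) else None
```

Validating against the boundary grammar is only possible because the helper knows it is looking at a boundary; a general-purpose `unquote` has no such syntax to check against, and other parameters keep their existing behavior.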
History
Date                 User            Action  Args
2016-12-12 12:01:53  bpoaugust       set     nosy: + bpoaugust
                                             messages: + msg282986
2015-06-13 14:51:48  r.david.murray  set     messages: + msg245317
2015-06-13 06:14:33  zvyn            set     files: + issue795081.patch
                                             nosy: + zvyn
                                             messages: + msg245292
                                             keywords: + patch
2012-05-16 01:37:00  r.david.murray  set     assignee: r.david.murray ->
                                             components: + email
2010-12-27 18:24:09  r.david.murray  set     nosy: barry, collinwinter, customdesigned, tony_nelson, ajaksu2, r.david.murray
                                             versions: + Python 3.3, - Python 2.7
2010-05-05 13:50:32  barry           set     assignee: barry -> r.david.murray
                                             nosy: + r.david.murray
2009-02-14 12:11:39  ajaksu2         set     keywords: + easy
                                             nosy: + ajaksu2
                                             stage: test needed
                                             messages: + msg82025
                                             versions: + Python 2.7
2008-03-03 03:47:13  tony_nelson     set     nosy: + tony_nelson
                                             messages: + msg63201
2003-08-26 03:37:10  customdesigned  create