classification
Title: email.Message param parsing problem II
Type: enhancement Stage: test needed
Components: email, Library (Lib) Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, barry, collinwinter, customdesigned, r.david.murray, tony_nelson
Priority: normal Keywords: easy

Created on 2003-08-26 03:37 by customdesigned, last changed 2012-05-16 01:37 by r.david.murray.

Files
File name Uploaded Description
virus5 customdesigned, 2003-08-26 03:37 Example (deactivated) virus with weird headers
Messages (6)
msg53986 - Author: Stuart D. Gathman (customdesigned) Date: 2003-08-26 03:37
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments.  This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type.  The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'.  email.Utils.unquote
apparently gives up and doesn't remove any quotes.

I believe that unquote should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'.  In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote with this heuristic.  I believe the header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should apply the heuristic.

Here is a brief script to show the problem (attached
file in test/virus5): 
----------t.py----------
import email

msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary',
'"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority:
Normal\n\tX-Mailer: Microsoft Outlook Express
5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft
MimeOLE V5.50.4522.1300')]
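
The failure can be seen from the reported value alone, without the attachment. The following is a minimal sketch, assuming the unquoting step strips quotes only when the value both starts and ends with one; the variable names and the abbreviated header text are for illustration only.

```python
# Parameter value as reported in the get_params() output above (abbreviated).
value = '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal'

starts_quoted = value.startswith('"')
ends_quoted = value.endswith('"')
print(starts_quoted, ends_quoted)

# The closing quote is present, just not at the end of the value.
closing = value.index('"', 1)
between_quotes = value[1:closing]
print(repr(between_quotes))
```

Because the value does not end with a quote, a check that requires quotes at both ends leaves it untouched, even though the text between the first pair of quotes is the boundary the sender intended.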
msg53987 - Author: Stuart D. Gathman (customdesigned) Date: 2003-08-26 03:57
Here is a proposed fix for email.Utils.unquote (except it
should test for a 'strict' mode flag, which is currently only
in Parser):

def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
            if str.endswith('"'):
                str = str[1:-1]
            else:
                # Remove garbage after the closing quote.
                try:
                    str = str[1:str[1:].index('"') + 1]
                except ValueError:
                    # No closing quote at all: leave the value alone.
                    return str
            return str.replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str

Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact.  That would also be a
good place to check for a STRICT flag stored with the
message object.  Perhaps the Parser should set the Message
_strict flag from its own _strict flag. 
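
One way the strict/tolerant split might look is sketched below. The function name `unquote_param` and its `strict` parameter are hypothetical and not part of the email package, and raising ValueError in strict mode is only one possible way to "reject" the header. The search for the closing quote is naive and would stop at an escaped quote inside the value.

```python
def unquote_param(value, strict=False):
    """Remove quotes from a parameter value.

    Hypothetical helper: `strict` is an assumed flag, not an existing
    email.utils.unquote argument.
    """
    if len(value) > 1:
        if value.startswith('"'):
            if value.endswith('"'):
                inner = value[1:-1]
            elif strict:
                # Assumed strict behaviour: refuse trailing garbage.
                raise ValueError('text after closing quote: %r' % value)
            else:
                # Tolerant mode: keep only what is between the first
                # pair of quotes (naive: ignores escaped quotes).
                end = value.find('"', 1)
                if end == -1:
                    return value
                inner = value[1:end]
            return inner.replace('\\\\', '\\').replace('\\"', '"')
        if value.startswith('<') and value.endswith('>'):
            return value[1:-1]
    return value


print(repr(unquote_param('"abc" def')))
print(repr(unquote_param('"bound"\n\tX-Priority: 3')))
```

A caller such as a virus scanner would use the default tolerant mode, while a validating parser would pass strict=True and handle the exception.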
msg53988 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-11-21 20:45
Moving this to feature requests for Python 2.4.  If
appropriate, the email-sig should address this in the
intended new lax parser for email 3.0 / Python 2.4.  We
can't add this to the Python 2.3 (or earlier) maintenance
releases.
msg53989 - Author: Collin Winter (collinwinter) * (Python committer) Date: 2007-03-30 14:58
I'm still seeing this behaviour as of Python 2.6a0.

Barry: I take it email-sig didn't get around to discussing this?
msg63201 - Author: Tony Nelson (tony_nelson) Date: 2008-03-03 03:47
If I understand RFC2822 3.2.2. Quoted characters (heh), unquoting must
be done in one pass, so the current replace().replace() is wrong.  It
will change '\\"' to '"', but it should become '\"' when unquoted.

This seems to work:

    re.sub(r'\\(.)',r'\1',s)

I haven't encountered a problem with this; I just came across it while
looking at the file Utils.py (Python 2.4, but unchanged in trunk).  I
will submit a new bug if desired.
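
The difference between the two approaches can be checked on the three-character input backslash, backslash, quote. This is a small sketch; the function names are made up for the comparison and do not exist in the email package.

```python
import re


def unescape_two_pass(s):
    # The chained replace() calls described above.
    return s.replace('\\\\', '\\').replace('\\"', '"')


def unescape_one_pass(s):
    # Single left-to-right pass: each backslash consumes exactly one
    # following character, so an escaped backslash is never re-read.
    return re.sub(r'\\(.)', r'\1', s)


sample = '\\\\"'  # three characters: backslash, backslash, quote
print(repr(unescape_two_pass(sample)))
print(repr(unescape_one_pass(sample)))
```

In the two-pass version the backslash produced by the first replace() is combined with the following quote by the second, so the escaped backslash is lost. The single pass leaves a literal backslash followed by the quote.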
msg82025 - Author: Daniel Diniz (ajaksu2) Date: 2009-02-14 12:11
Good candidate for the email sprint. Fix suggested inline.
History
Date User Action Args
2012-05-16 01:37:00 r.david.murray set assignee: r.david.murray ->
components: + email
2010-12-27 18:24:09 r.david.murray set nosy: barry, collinwinter, customdesigned, tony_nelson, ajaksu2, r.david.murray
versions: + Python 3.3, - Python 2.7
2010-05-05 13:50:32 barry set assignee: barry -> r.david.murray
nosy: + r.david.murray
2009-02-14 12:11:39 ajaksu2 set keywords: + easy
nosy: + ajaksu2
stage: test needed
messages: + msg82025
versions: + Python 2.7
2008-03-03 03:47:13 tony_nelson set nosy: + tony_nelson
messages: + msg63201
2003-08-26 03:37:10 customdesigned create