classification
Title: header parsing could apply postel's law to encoded words inside quotes
Type: behavior Stage: resolved
Components: email Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, python-dev, r.david.murray
Priority: normal Keywords:

Created on 2013-01-16 21:13 by r.david.murray, last changed 2014-02-08 18:20 by r.david.murray. This issue is now closed.

Messages (3)
msg180108 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2013-01-16 21:13
It has come to my attention that at least some mail agents apply Postel's law to addresses like the following:

   From: "=?utf-8?Q?not_really_valid?=" <foo@example.com>

Since it is very unlikely that something looks like an encoded word but is not one, we could go ahead and decode such strings, resulting in

   "not really valid" <foo@example.com>

A defect would be registered, and some sort of 'strict' policy mode could refuse to do the decode (and could likewise reject several other non-compliant patterns, such as encoded words not separated by whitespace).  I think the decoding should be the default, though.

This applies also to other headers where words or phrases can be quoted, such as in filenames.  I have also encountered the quoted-encoded-word-as-filename in the wild.
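
A minimal sketch of the behavior described above, assuming a Python version that already includes the fix and the `email.policy.default` policy (which selects the new header parser). The header value is the example from this message; the message body and the variable names are placeholders chosen for illustration.

```python
from email import message_from_string
from email.policy import default

# Header value taken from the report; the body is a placeholder.
raw = 'From: "=?utf-8?Q?not_really_valid?=" <foo@example.com>\n\nbody\n'

msg = message_from_string(raw, policy=default)
from_header = msg['From']
address = from_header.addresses[0]

# With lenient decoding, the display name is the decoded text
# rather than the raw encoded word.
print(address.display_name)
print(address.addr_spec)

# The input is still non-compliant, so inspect the defects recorded
# on the header object instead of assuming it parsed cleanly.
for defect in from_header.defects:
    print(type(defect).__name__, defect)
```

Reading `defects` lets a caller that wants stricter handling reject the header itself, without changing the lenient default.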
msg210671 - Author: Roundup Robot (python-dev) Date: 2014-02-08 18:13
New changeset 1dcb9d0d53a6 by R David Murray in branch '3.3':
#16983: Apply postel's law to encoded words inside quoted strings.
http://hg.python.org/cpython/rev/1dcb9d0d53a6

New changeset 5f7e626730df by R David Murray in branch 'default':
Merge: #16983: Apply postel's law to encoded words inside quoted strings.
http://hg.python.org/cpython/rev/5f7e626730df
msg210673 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-02-08 18:20
The old header parsing code already decodes these, although it gets the spacing wrong if you do the standard str(make_header(decode_header(x))) dance.  The fix for the new header parsing code handles only the specific case of a double-quoted string that contains nothing but encoded words.  That's the only variation I've seen in the wild so far, so I think that may be enough.  Extending it to handle a mix of regular text and encoded words would require rewriting the qcontent and ptext functions.  Possible, but not worth it unless a real use case turns up.  (Although I think there might be a bug in quoted text parsing that could make that rewrite worthwhile later; it only shows up if you are actually walking the parse tree, so it is not a functional bug.)

Oh, and I decided to treat this as a bug fix, not an enhancement, because the old parser code already did this decoding.
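
For comparison, the legacy "dance" mentioned above can be sketched as follows. This uses only `email.header.decode_header` and `email.header.make_header`; the header value is the example from the original report, and the variable names are illustrative.

```python
from email.header import decode_header, make_header

value = '"=?utf-8?Q?not_really_valid?=" <foo@example.com>'

# decode_header splits the value into (content, charset) chunks;
# make_header reassembles those chunks into a Header object.
chunks = decode_header(value)
decoded = str(make_header(chunks))

# repr() makes any whitespace inserted around the quotes visible.
print(chunks)
print(repr(decoded))
```

Printing the `repr()` of the result is a simple way to check for the spacing problem described above on a given Python version.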
History
Date User Action Args
2014-02-08 18:20:07 r.david.murray set status: open -> closed
versions: + Python 3.3
type: enhancement -> behavior
messages: + msg210673
resolution: fixed
stage: needs patch -> resolved
2014-02-08 18:13:50 python-dev set nosy: + python-dev
messages: + msg210671
2013-01-16 21:13:44 r.david.murray create