classification
Title: RFC822-comments in email header fields can fool, e.g., get_filename()
Type: behavior Stage: resolved
Components: email Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: ale2017, barry, r.david.murray
Priority: normal Keywords:

Created on 2017-02-06 11:28 by ale2017, last changed 2017-02-07 08:29 by ale2017. This issue is now closed.

Files
File name       Uploaded                   Description
attachments.py  ale2017, 2017-02-06 19:26  a de_comment() function and its intended use.
Messages (6)
msg287119 - Author: Alessandro Vesely (ale2017) * Date: 2017-02-06 11:28
Comments are allowed almost everywhere in an email message, and should be eliminated before attributing any meaning to a field.  In the words of RFC5322, any CRLF that appears in FWS is semantically "invisible".
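A minimal sketch of the unfolding step that comes before any de-commenting, assuming the header value is already a Python 3 `str`. The helper name `unfold` is hypothetical, not an `email` package API. Accepting a bare LF as well as CRLF is an assumption, made because Python text often has line endings normalized to `\n`.

```python
import re

# RFC 5322 unfolding: a line break immediately followed by a space or tab is
# folding whitespace, so deleting only the line break restores the logical line.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(value: str) -> str:
    """Hypothetical helper: remove folding line breaks, keep the whitespace."""
    return _FOLD.sub("", value)


print(unfold('attachment;\r\n filename="a.txt"'))  # prints attachment; filename="a.txt"
```

A line break that is not followed by whitespace is left in place, since it is not folding whitespace.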

In particular, some have noted that comments can be used to deceive an email filter, for example like so:

Content-Disposition: attachment;
 filename*0*="''attached%2E";
 filename*1*="%62";
 filename*2=(fool filters)at

(I don't know which, if any, email clients would execute that batch...)
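To see why the example matters, here is a simplified sketch of RFC 2231 continuation reassembly in Python 3. `join_continuations` is a hypothetical helper, not part of the `email` package. It takes already-split (name, value) pairs, strips optional double quotes (a lenient simplification that mirrors the example), percent-decodes segments whose name ends in `*`, and drops the `charset'language'` prefix of the first segment. It does no charset decoding and does not touch comments, so a reader that keeps the comment and one that removes it reach different filenames.

```python
import re
from urllib.parse import unquote

# Matches continuation names such as "filename*0*" or "filename*2".
_SEGMENT = re.compile(r"^(?P<name>[\w-]+)\*(?P<num>\d+)(?P<ext>\*?)$")


def join_continuations(pairs, wanted):
    """Hypothetical helper: reassemble the continuation segments of `wanted`.

    Simplified sketch: no charset decoding, lenient about quotes, and no
    comment handling (the caller decides whether comments are stripped).
    """
    pieces = {}
    for name, value in pairs:
        m = _SEGMENT.match(name)
        if m is None or m.group("name").lower() != wanted.lower():
            continue
        num = int(m.group("num"))
        value = value.strip('"')  # lenient: accept quoted segments
        if m.group("ext"):  # extended segment: value is percent-encoded
            if num == 0:
                # Drop the charset'language' prefix of the first segment.
                parts = value.split("'", 2)
                if len(parts) == 3:
                    value = parts[2]
            value = unquote(value)
        pieces[num] = value
    return "".join(pieces[n] for n in sorted(pieces))


raw = [("filename*0*", "''attached%2E"),
       ("filename*1*", '"%62"'),
       ("filename*2", "(fool filters)at")]
print(join_continuations(raw, "filename"))  # prints attached.b(fool filters)at

decommented = raw[:2] + [("filename*2", "at")]  # comment removed by hand
print(join_continuations(decommented, "filename"))  # prints attached.bat
```

Only the de-commented reading ends in `.bat`, which is the discrepancy a filter that ignores comments could miss.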

Anyway, removing comments is needed for any structured header field.  One is usually interested in the unfolded, de-commented value.  It is difficult to do correctly, because of nesting and quoting possibilities.
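A minimal sketch of comment removal that copes with the nesting and quoting issues just described. `strip_comments` is a hypothetical name; it is neither the attached de_comment() nor an `email` package function. Assumptions: the value is already unfolded, backslash quoted-pairs are honored inside comments and inside quoted strings, parentheses inside a quoted string are left alone, an unterminated comment swallows the rest of the value, and each removed comment becomes a single space because a comment is part of CFWS. It has no notion of which fields permit comments at all.

```python
def strip_comments(value: str) -> str:
    """Hypothetical helper: remove (possibly nested) RFC 5322 comments."""
    out = []
    depth = 0         # comment nesting level; 0 means "not in a comment"
    in_quote = False  # inside a quoted-string (only tracked when depth == 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: the next character is literal, never a delimiter
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth > 0:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")  # the whole comment counts as whitespace
        elif in_quote:
            out.append(ch)
            if ch == '"':
                in_quote = False
        elif ch == '"':
            in_quote = True
            out.append(ch)
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(strip_comments("a(b(c)d)e"))          # prints a e
print(strip_comments('"(not a comment)"'))  # quoted parentheses are kept
```

Replacing a comment with a space rather than deleting it keeps `foo(c)bar` as two tokens; callers that want a bare value can `.strip()` the result.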

This issue seems to be ignored, except for address lists (there is a getcomment() member in AddrlistClass).  Why?
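For comparison, address lists are where the `email` package already interprets comments. A small check with the public `email.utils.parseaddr` function, which uses the `AddrlistClass` machinery from `email._parseaddr` under the hood; the address is illustrative only.

```python
from email.utils import parseaddr

# A comment following a bare addr-spec is reported as the display name.
name, addr = parseaddr("alice@example.com (Alice)")
print(name)  # prints Alice
print(addr)  # prints alice@example.com
```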
msg287134 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-02-06 14:10
My reading of rfc 2231 is that CFWS is not allowed in that position.  Can you explain your interpretation with specific cites to the RFC?

Also please provide an example of specific behavior of the email package that you think is incorrect.  An email processor should always be treating a filename as a dirty string, so I'm not clear on what you are worried about here.
msg287135 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-02-06 14:17
Oh, and the answer to your "why" is that the email package only deals with content semantically in address lists.  Everywhere else it is up to the program using the library to interpret the structured headers.  In 2.7 the email package provides you the tools to process emails, but does not do very much hand-holding.  The python3 email package tries to do a much better job; but, frankly, I skimped on handling comments and have done almost no testing of the code that theoretically handles them, since they are so rarely encountered in the wild.  Specifically, they are supposed to be parsed correctly, but there is no way to access comment content and, as I said, there are few to zero tests that include comments to validate that syntactic handling.

I would be interested in patches to complete the comment support in _header_value_parser in python3.
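A minimal Python 3 sketch of the structured header access described above: with `policy.default`, headers are parsed through the header registry (built on `_header_value_parser`) and returned as objects whose parameters are already decoded. Only a comment-free header is shown, since comment handling there is, as noted, lightly tested; the message text is illustrative.

```python
import email
from email import policy

raw = (
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="report.txt"\n'
    "\n"
    "payload\n"
)

# policy.default selects the Python 3 header registry, so header values come
# back as structured objects rather than plain strings.
msg = email.message_from_string(raw, policy=policy.default)

cd = msg["Content-Disposition"]
print(cd.content_disposition)  # the disposition token
print(dict(cd.params))         # parameters, unquoted and decoded
print(msg.get_filename())
```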
msg287166 - Author: Alessandro Vesely (ale2017) * Date: 2017-02-06 19:26
I didn't find CFWS in rfc2231 either.  In addition, rfc 2045 (Introduction) says that Content-Disposition, where filename is defined, cannot include comments.  However, Content-Type can include RFC 822 comments, so the filename should be de-commented in case it is inferred from the name parameter there.
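The fallback from filename to name is built into the `email` package: `Message.get_filename()` reads the `filename` parameter of Content-Disposition and, if it is missing, the `name` parameter of Content-Type. A small sketch using the default compat32 policy, the legacy API closest to Python 2.7's email package; the message is illustrative and deliberately contains no comments, since whether they are stripped is the open question here.

```python
import email

# No Content-Disposition header: only Content-Type carries a name parameter.
raw = (
    'Content-Type: application/octet-stream; name="data.bin"\n'
    "\n"
    "payload\n"
)

msg = email.message_from_string(raw)  # default compat32 policy
print(msg.get_param("name"))  # Content-Type parameter read directly
print(msg.get_filename())     # falls back to Content-Type's name
```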

I'm rather new to Python, and I'm sticking to version 2 because of the packages I work with.  I see that Python 3's email package has a much more robust design.  Does this mean Python 2 cannot get fixed?

I attach a de_comment() function, copied from the one I mentioned this morning.  The rest of the file shows its intended use.  (Oops, it removes comments even from where they are not supposed to be allowed ;-)
Having that kind of functionality in email.utils would make it easier to read Message objects, no?
msg287170 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-02-06 20:40
Your thought is correct: python2 no longer gets enhancements.  So improved comment handling can only be added to python3, assuming anyone is interested in doing it :)
msg287208 - Author: Alessandro Vesely (ale2017) * Date: 2017-02-07 08:29
We can close this, then.  Let's hope migration to Python3 isn't going to last forever...

Thank you for your cooperation.
History
Date                 User            Action  Args
2017-02-07 08:29:16  ale2017         set     status: open -> closed
                                             resolution: wont fix
                                             messages: + msg287208
                                             stage: resolved
2017-02-06 20:40:15  r.david.murray  set     messages: + msg287170
2017-02-06 19:26:07  ale2017         set     files: + attachments.py
                                             messages: + msg287166
2017-02-06 14:17:59  r.david.murray  set     messages: + msg287135
2017-02-06 14:10:09  r.david.murray  set     messages: + msg287134
2017-02-06 11:28:50  ale2017         create