classification
Title: EmailMessage.is_attachment == False if filename is present
Type: Stage: resolved
Components: email Versions: Python 3.5, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, brandon-rhodes, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-03-27 17:12 by brandon-rhodes, last changed 2014-09-21 17:33 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
is_attachment.patch r.david.murray, 2014-03-27 22:56 review
Messages (15)
msg214969 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 17:12
Most attachments (in my inbox, at least) specify a filename, and thus
have a Content-Disposition header that looks like:

Content-Disposition: attachment; filename="attachment.gz"

In fact, this sample header was generated by the new add_attachment()
method in Python itself. Unfortunately, the is_attachment property
currently does this test:

c_d.lower() == 'attachment'

Which means that it returns False for almost all attachments in my
email archive. I believe that the test should instead be:

c_d.split(';', 1)[0].lower() == 'attachment'
msg214970 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 17:21
Oh - this also, happily, explains why iter_attachments() is ignoring
all of the attachments on my email: because it internally relies upon
is_attachment to make the decision. So this fix will also make
iter_attachments() usable!
msg214972 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 17:30
Okay, having looked at the source a bit more it would probably make
more sense to use _splitparam() instead of doing the split manually.
msg214974 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 17:47
Given that methods like get_param() already exist for pulling data out of
the right-hand-side of the ';' in a parameterized email header, would it
be amiss for EmailMessage to also have a method that either returns
everything to the left of the semicolon, or returns something like:

('attachment', [('filename', 'example.txt')])

thus doing all the parsing in one place that everything else can then
steadily rely upon, including users that want to pull the parsed values
for their own inspection?
msg214978 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-27 18:49
That facility already mostly exists.  The bug is that the code in question doesn't use it.

>>> m['Content-Disposition'].content_disposition
'attachment'
>>> m['Content-Disposition'].params
{'filename': 'attachment.gz'}

On the other hand, looking at that it is obvious there should be a generic 'value' attribute on all MIME headers so that that could be written:

m['Content-Disposition'].value == 'attachment'

where value would be the 'canonical form' of the value for that header when there is one, including normalizing the case.  Some headers still want specialized attributes (Content-Type's maintype and subtype, for example), but there is always the value/params split, and that ought to be accessible generically and currently isn't.

This is why this stuff is still provisional :)
msg214990 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-27 22:56
Here's patch.
msg214995 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-27 23:13
I was going to open new issue for adding 'value', but looking at the parsing code I see why I didn't add one.  The way the new parser works it really wants to know the actual structure of the value, it doesn't have a good way to treat it as generic.  We could still add 'value' and just set it to the appropriate value, but there isn't currently a way to parse an unknown MIME header.

I think maybe I'll postpone this 'value' idea until we have a bit more experience with the API.
msg214997 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 23:18
Understood. I wonder where in the documentation the ability to get the content disposition should wind up? I am almost tempted to suggest a get_content_disposition() method that parallels get_content_type(), mostly to avoid having to document the asymmetry between how users should go about getting these two pieces of information. :)
msg215002 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-27 23:27
Well, that's why is_attachment exists.  I wouldn't be averse to adding get_content_disposition if nobody objects, though.

The attributes are on the headers because the data really is attributes of the parsed headers, but the more useful user API is the methods on the message object those headers are part of.  Originally I thought the header attributes could replace the message object methods, but the more I actually used that interface the less I liked it :).  So now I consider them more of an implementation or lower-level-of-the-model detail.
msg215004 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-27 23:37
I agree that is_attachment supports the most common use-case of people who need to inspect the content disposition!

But people implementing heavyweight tools and email clients might additionally need to distinguish between a MIME part whose disposition is explicitly "inline" and a MIME part whose disposition is simply unspecified — since the RFC seems to allow clients quite a bit of freedom in the case where it is entirely unspecified:

"Content-Disposition is an optional header field. In its absence, the MUA may use whatever presentation method it deems suitable." — RFC 2183

And a three-possibility 'inline'|'attachment'|None return value from get_content_disposition() would perfectly reflect the three possibilities envisioned in the standard.
msg215014 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-28 02:18
OK.  If you would be willing to open a feature request for that, that would be great.
msg215037 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-28 11:14
Thanks — done! http://bugs.python.org/issue21083
msg227181 - (view) Author: Roundup Robot (python-dev) Date: 2014-09-20 21:49
New changeset 0044ed0af96f by R David Murray in branch '3.4':
#21079: is_attachment now looks only at the value, ignoring parameters.
https://hg.python.org/cpython/rev/0044ed0af96f
msg227182 - (view) Author: Roundup Robot (python-dev) Date: 2014-09-20 21:50
New changeset 54392c4a8880 by R David Murray in branch 'default':
Merge: #21079: is_attachment now looks only at the value, ignoring parameters.
https://hg.python.org/cpython/rev/54392c4a8880
msg227221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-21 17:33
Following expression looks more clear to me:

    c_d is not None and c_d.content_disposition == 'attachment'
History
Date User Action Args
2014-09-21 17:33:43serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg227221
2014-09-20 21:50:28r.david.murraysetstatus: open -> closed
stage: resolved
resolution: fixed
versions: + Python 3.5
2014-09-20 21:50:01python-devsetmessages: + msg227182
2014-09-20 21:49:08python-devsetnosy: + python-dev
messages: + msg227181
2014-03-28 11:14:26brandon-rhodessetmessages: + msg215037
2014-03-28 02:18:36r.david.murraysetmessages: + msg215014
2014-03-27 23:37:32brandon-rhodessetmessages: + msg215004
2014-03-27 23:27:34r.david.murraysetmessages: + msg215002
2014-03-27 23:18:27brandon-rhodessetmessages: + msg214997
2014-03-27 23:13:09r.david.murraysetmessages: + msg214995
2014-03-27 22:56:51r.david.murraysetfiles: + is_attachment.patch
keywords: + patch
messages: + msg214990
2014-03-27 18:49:14r.david.murraysetmessages: + msg214978
2014-03-27 17:47:06brandon-rhodessetmessages: + msg214974
2014-03-27 17:30:31brandon-rhodessetmessages: + msg214972
2014-03-27 17:21:14brandon-rhodessetmessages: + msg214970
2014-03-27 17:12:10brandon-rhodescreate