classification
Title: mailcap does not respect precedence in the presence of wildcards
Type: behavior Stage: resolved
Components: email Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Fabio Alessandro Locati, barry, manu-beffara, michael-lazar, petri.lehtinen, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2012-06-01 09:28 by manu-beffara, last changed 2016-09-10 00:10 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
mailcap.patch michael-lazar, 2016-07-11 04:28 review
mailcap_v2.patch michael-lazar, 2016-07-12 04:35
mailcap_v3.patch michael-lazar, 2016-07-13 15:57 review
mailcap_v4.patch michael-lazar, 2016-07-14 06:29 review
Messages (19)
msg162064 - (view) Author: Emmanuel Beffara (manu-beffara) Date: 2012-06-01 09:28
According to RFC 1542, the first matching entry in mailcap files should be used for handling a given type. The mailcap module does not respect this rule when wildcards are used in some rules, because the "lookup" function always checks entries for a specific type (say image/jpeg) before checking the generic ones (image/*). As a consequence, if a user overrides system defaults using a generic type, this choice is not satisfied, and the behaviour is inconsistent with other tools using mailcap (including the standard "run-mailcap").
msg162070 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-01 12:24
Hmm.  It seems to me this is an edge case as far as backward compatibility goes.  On the one hand it does seem like an RFC violation (ie: bug), on the other hand there could be user programs depending on the current behavior.  I'm inclined toward treating it as a bug, but I can see it being argued the other way.  The fact that the file is designed for interoperability and other tools do take the wildcard first argues in favor of treating it as a bug, I think.
msg168438 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-08-17 10:21
Sounds like a bug to me.

It's not too straightforward to fix, though. The order of MIME types is lost because they are stored as keys of a dict. AFAICS, it wouldn't help to use OrderedDict and checking for the wildcard type first if its index is smaller. Example:

image/*; foo %s; test=/bin/false
image/jpeg; bar %s; test=/bin/true
image/*; baz %s; test=/bin/true

In this case, the image/jpeg entry should be returned, but both image/* entries get grouped together, so they would be evaluated first and the "baz %s" entry would be returned.

The API expects the result of getcaps() (a dict) to be passed to findmatch(), which makes it very hard to change the caps representation. The only way I can think of would be to change the representation to a dict subclass with extra semantics. Plain dict would still have to be supported for backwards compatibility, though.
msg270143 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-11 00:32
Hello. In my opinion this is a pretty major deficiency. I was trying to add definitions to my ~/.mailcap file (which should take priority over system mailcap files) but they weren't getting applied because of the wildcard bug. This was prohibiting me from using mailcap, so I wrote a patch and submitted it to PyPI. I've never contributed before; what would be the first step towards submitting this to python?

https://github.com/michael-lazar/mailcap_fix
https://pypi.python.org/pypi/mailcap-fix/0.1.0
msg270145 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-11 00:47
Submit it as a proposed patch here for review, please.  git or hg diff against the default branch, or a pointer to an hg clone with the patch applied.
msg270158 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-11 04:29
Alright thanks, I've submitted a patch
msg270226 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-12 04:35
Submitting an updated patch; simplified the implementation and updated test_mailcap.py
msg270243 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-12 13:20
I think we need to preserve backward compatibility in the readmailcapfile function even though it isn't technically a public API (make lineno a keyword argument and the behavior dependent on its presence).  I haven't experimented with the patch yet, but it looks reasonable.
msg270283 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-13 01:22
I can certainly do that. Although in addition to adding a keyword argument, we would also have to change the return signature to switch between modes like this:

    if lineno is None:
        return caps
    else:
        return caps, lineno

Overall I'm not a fan of this technique and would like to avoid it if possible. The problem is that we have to keep track of the current line between successive calls to readmailcapfile(). An alternative would be to go back to using lineno as a generator. This is close to what I had in the initial patch.

    lineno = itertools.count()
    caps = readmailcapfile(fp, lineno=lineno)
    caps = readmailcapfile(fp2, lineno=lineno)
    caps = readmailcapfile(fp3, lineno=lineno)
    ...etc

Happy to hear any insights you have on this.
msg270284 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-13 01:30
How about this: rename the existing readmailcapfile as an internal _readmailcapfile with the new signature.  Then add a backward compatible readmailcapfile with the existing signature/return value that uses a dummy value for lineno and throws away the lineno on output.  The point here is just to avoid breaking programs that are using the existing api, even though it is an internal one.  Yes, it is cruft, but backward compatibility sometimes requires cruft.  We could deprecate the old api with a message that says it is an internal method and should not be used, and see if anyone complains about it being deprecated.
msg270299 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-13 06:54
That works for me, patch updated to match your suggestion. So to recap, the proposed fix implements the following changes to the api:

getcaps()
   The returned dict now has an additional `lineno` field. There's a *slim* chance that this could break behavior for somebody who relies on the output for something other than passing it to findmatch().

readmailcapfiles()
    Marked as deprecated, but the behavior will remain the same.

lookup(caps, ...)
    If the caps dict contains `lineno`, the metadata will be used to sort the matching entries. If the caps dict does not contain `lineno`, for example if it was constructed manually, the behavior will remain the same.

findmatch(caps, ...)
    Same as lookup()
msg270306 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-13 12:14
Looks like you didn't attach the new patch.
msg270320 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-13 15:55
Whoops
msg270322 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-13 16:15
There's actually programmatic way to generate a deprecation warning when the function is called (and then we'd want a test to make sure it gets generated).  if you don't feel like working through that hopefully someone else will pick it up.  If you want to work on it, you can find examples by grepping for DeprecationWarning.
msg270373 - (view) Author: Michael Lazar (michael-lazar) * Date: 2016-07-14 06:29
Got it, I found some examples and it didn't look too complicated so I took a shot at it. Is there anything else that needs to be added? Does the documentation need to be updated at all?

btw, thanks for the rapid iteration. I appreciate you taking the time to help guide me though this :)
msg270423 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-14 16:45
I think a doc update is probably worthwhile, but I haven't looked at the docs so I'm not sure.

I'll try to give this a complete review this weekend, though I'm a bit busy so it may not happen.
msg272543 - (view) Author: Fabio Alessandro Locati (Fabio Alessandro Locati) Date: 2016-08-12 15:13
Any news on this?
msg275506 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-10 00:10
New changeset f1bf0abcca0c by R David Murray in branch '3.5':
#14977: Make mailcap respect the order of the lines in the mailcap file.
https://hg.python.org/cpython/rev/f1bf0abcca0c

New changeset efd692c86429 by R David Murray in branch 'default':
Merge: #14977: Make mailcap respect the order of the lines in the mailcap file.
https://hg.python.org/cpython/rev/efd692c86429
msg275507 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-10 00:10
Thanks, Michael.  I didn't see anything that needed updating in the docs.
History
Date User Action Args
2016-09-10 00:10:59r.david.murraysetstatus: open -> closed
resolution: fixed
messages: + msg275507

stage: resolved
2016-09-10 00:10:00python-devsetnosy: + python-dev
messages: + msg275506
2016-08-12 15:13:29Fabio Alessandro Locatisetnosy: + Fabio Alessandro Locati
messages: + msg272543
2016-07-14 16:45:59r.david.murraysetmessages: + msg270423
2016-07-14 06:29:05michael-lazarsetfiles: + mailcap_v4.patch

messages: + msg270373
2016-07-13 16:15:29r.david.murraysetmessages: + msg270322
2016-07-13 15:57:16michael-lazarsetfiles: + mailcap_v3.patch
2016-07-13 15:55:29michael-lazarsetmessages: + msg270320
2016-07-13 12:14:58r.david.murraysetmessages: + msg270306
2016-07-13 06:54:31michael-lazarsetmessages: + msg270299
2016-07-13 01:30:42r.david.murraysetmessages: + msg270284
2016-07-13 01:22:14michael-lazarsetmessages: + msg270283
2016-07-12 13:20:02r.david.murraysetmessages: + msg270243
2016-07-12 04:35:55michael-lazarsetfiles: + mailcap_v2.patch

messages: + msg270226
2016-07-11 04:29:01michael-lazarsetmessages: + msg270158
2016-07-11 04:28:02michael-lazarsetfiles: + mailcap.patch
keywords: + patch
2016-07-11 00:47:38r.david.murraysetmessages: + msg270145
versions: + Python 3.5, Python 3.6, - Python 2.7, Python 3.2, Python 3.3
2016-07-11 00:32:06michael-lazarsetnosy: + michael-lazar
messages: + msg270143
2012-08-17 10:22:33petri.lehtinensetkeywords: - easy
2012-08-17 10:21:04petri.lehtinensetnosy: + petri.lehtinen
messages: + msg168438
2012-06-01 12:24:36r.david.murraysetversions: + Python 2.7, Python 3.2, Python 3.3
nosy: + barry, r.david.murray

messages: + msg162070

components: + email, - Library (Lib)
keywords: + easy
2012-06-01 09:28:52manu-beffaracreate