This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: validate mime types loaded from system files. Document that system files take precedence.
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: cheryl.sabella, iritkatriel, r.david.murray
Priority: normal Keywords: patch

Created on 2017-12-31 11:07 by cheryl.sabella, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL      Status  Linked
PR 5063  closed  cheryl.sabella, 2017-12-31 11:27
Messages (4)
msg309275 - Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2017-12-31 11:07
On a Windows 7 system, entering the following:

    >>> import mimetypes
    >>> mime, encoding = mimetypes.guess_type('Untitled.sql')
    >>> mime
    'text\\plain'

That is, the return value is 'text\\plain' instead of 'text/plain'. I tracked this down to the .sql entry being loaded from the Windows registry, which uses the wrong slash.
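The same result can be reproduced on any platform without touching the registry. A minimal sketch, assuming only that the bad registry value behaves like any other type registered through `add_type()`; it uses a private `mimetypes.MimeTypes()` instance, which is seeded from Python's built-in tables, so the host's own system files do not affect the outcome:

```python
import mimetypes

# A private database seeded only from Python's built-in tables.
db = mimetypes.MimeTypes()

# Simulate the malformed Windows registry value for .sql
# (a backslash instead of a slash between type and subtype).
db.add_type("text\\plain", ".sql")

mime, encoding = db.guess_type("Untitled.sql")
print(repr(mime))  # 'text\\plain'
```

Nothing in the lookup path inspects the shape of the stored string; whatever was registered is returned as-is.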

The mimetypes.guess_type() documentation states:
> The return value is a tuple (type, encoding) where type is None if
> the type can’t be guessed (missing or unknown suffix) or a string of
> the form 'type/subtype', usable for a MIME content-type header.

I don't know whether guess_type() (or add_type()) should check for valid types, whether .sql should be added to the valid types (it's on the IANA page), or whether the documentation should be fixed so it doesn't read like a guarantee. Or all three.  :-)
msg309291 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-31 16:38
You can get the same "bad" behavior on a POSIX system by having a mimetypes file with an incorrect entry in it. That would be a system misconfiguration, as is your Windows registry case, and is outside of Python's control. I suppose we could make it clearer (i.e., in the intro paragraph) that the system files are read by default (that is, the built-in tables are only *defaults* unless you specify otherwise).
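To illustrate the POSIX case, here is a sketch that feeds a hypothetical misconfigured mime.types file to `MimeTypes.readfp()`; an in-memory `io.StringIO` stands in for a real file such as /etc/mime.types. Each line has the form `type ext1 ext2 ...`, and entries read this way replace the built-in defaults for the same suffix:

```python
import io
import mimetypes

db = mimetypes.MimeTypes()
print(db.guess_type("notes.txt"))  # ('text/plain', None), from the built-in table

# Hypothetical file contents: one entry overrides a built-in default,
# the other carries the malformed backslash type.
bad_mime_types = io.StringIO(
    "application/x-misconfigured txt\n"
    "text\\plain sql\n"
)
db.readfp(bad_mime_types)

print(db.guess_type("notes.txt"))     # ('application/x-misconfigured', None)
print(db.guess_type("Untitled.sql"))  # ('text\\plain', None)
```

The first lookup after reading shows that file entries take precedence over the built-in table, which is the behavior the documentation note would describe.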

It is unfortunately true that the MIME types in the Windows registry are less reliable than those on Unix systems. This has nothing to do with the mimetypes module itself, though ;)  I wonder whether we should have made the Windows registry load as non-strict by default, but that ship has sailed, I think.
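What "loading the registry as non-strict" would mean can be sketched with the existing `strict` flag of `add_type()` and `guess_type()`. This only illustrates the idea; it is not how mimetypes actually treats registry entries, and `.notarealext` is a made-up suffix chosen so the built-in tables cannot interfere:

```python
import mimetypes

db = mimetypes.MimeTypes()

# Hypothetical: register a registry-style value as non-strict only.
db.add_type("text\\plain", ".notarealext", strict=False)

# A strict lookup (the default) ignores non-strict entries...
print(db.guess_type("a.notarealext"))                # (None, None)
# ...and only callers that opt in with strict=False would see the value.
print(db.guess_type("a.notarealext", strict=False))  # ('text\\plain', None)
```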

Checking for minimal validity (xxx/yyy) would make things a little better on Windows, so I wouldn't object to adding that.
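The minimal `xxx/yyy` check could be as small as one regular expression. A sketch, assuming the RFC 6838 `restricted-name` grammar as the yardstick; `is_valid_mime_type` is a hypothetical helper, not part of mimetypes:

```python
import re

# RFC 6838 restricted-name: an alphanumeric first character followed by
# up to 126 characters from a limited set (no slashes, backslashes, spaces).
_RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MIME_RE = re.compile(rf"{_RESTRICTED_NAME}/{_RESTRICTED_NAME}")

def is_valid_mime_type(value: str) -> bool:
    """Hypothetical helper: True if value looks like 'type/subtype'."""
    return _MIME_RE.fullmatch(value) is not None

print(is_valid_mime_type("text/plain"))     # True
print(is_valid_mime_type("image/svg+xml"))  # True
print(is_valid_mime_type("text\\plain"))    # False
```

The check is deliberately shallow: it rejects the backslash case from the original report without consulting the IANA registry.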

To summarize, my suggestion would be to add a note to the intro paragraph that system files/registry are read by default and override the built-in tables, and add a minimal sanity check on the mime type values read.  Adding .sql to the strict list is a separate issue, and would not change the behavior here (unless I'm missing something, which is possible).

There are issues around adding even a minimal validity check, though: do we backport that?  Do we silently ignore strings in the wrong format?  Do we "fix" a backslash to be a slash?  Do we issue a warning for any problems we find?  These questions should be discussed if we decide to go this route.
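The open questions (ignore, fix the slash, or warn) can be made concrete as policies. A sketch only; `sanitize_mime_type` and its `policy` values are hypothetical names for illustration, and a caller reading system files or the registry would skip any entry for which it returns None:

```python
import re
import warnings

_RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MIME_RE = re.compile(rf"{_RESTRICTED_NAME}/{_RESTRICTED_NAME}")

def sanitize_mime_type(value, policy="ignore"):
    """Hypothetical helper: return a usable MIME type, or None to skip it."""
    if _MIME_RE.fullmatch(value):
        return value  # already well formed
    if policy == "fix":
        # Repair the Windows-registry case by swapping backslashes for slashes.
        fixed = value.replace("\\", "/")
        return fixed if _MIME_RE.fullmatch(fixed) else None
    if policy == "warn":
        warnings.warn(f"ignoring malformed MIME type {value!r}",
                      RuntimeWarning, stacklevel=2)
        return None
    if policy == "ignore":
        return None  # drop the entry silently
    raise ValueError(f"unknown policy {policy!r}")

print(sanitize_mime_type("text\\plain", "ignore"))  # None
print(sanitize_mime_type("text\\plain", "fix"))     # text/plain
```

Only the "fix" policy changes visible results for the reported .sql case; the other two make the bad entry fall back to whatever the built-in tables provide.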
msg410864 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-18 13:37
I've closed Issue46035 as a duplicate of this.
msg410865 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-18 13:39
I've closed Issue43975 as a duplicate of this.
History
Date                 User            Action  Args
2022-04-11 14:58:56  admin           set     github: 76643
2022-01-18 13:39:10  iritkatriel     set     messages: + msg410865
2022-01-18 13:39:07  iritkatriel     link    issue43975 superseder
2022-01-18 13:37:53  iritkatriel     set     nosy: + iritkatriel
                                             messages: + msg410864
2022-01-18 13:35:14  iritkatriel     set     title: mimetypes.guess_type() returns incorrectly formatted type -> validate mime types loaded from system files. Document that system files take precedence.
                                             type: behavior -> enhancement
                                             versions: + Python 3.11, - Python 3.7
2022-01-18 13:34:51  iritkatriel     link    issue46035 superseder
2017-12-31 16:38:10  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg309291
2017-12-31 11:27:36  cheryl.sabella  set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request4939
2017-12-31 11:08:35  cheryl.sabella  set     title: mimetypes.guess_type() might be return None or a tuple with (type/subtype, encoding) -> mimetypes.guess_type() returns incorrectly formatted type
2017-12-31 11:07:49  cheryl.sabella  create