classification
Title: show Python mimetypes module some love
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.araujo, jab, jrus, ncoghlan, r.david.murray
Priority: normal Keywords: patch

Created on 2009-08-02 19:19 by jrus, last changed 2014-01-25 03:41 by ncoghlan.

Files
File name Uploaded Description
mimetypes3.diff jrus, 2009-08-02 20:23 patch "version 3"
mimetypes2.diff jrus, 2009-08-02 20:26 patch "version 2"
apache_mimetypes.py jrus, 2009-08-02 21:52 a list of tuples containing apache's default extension -> type mappings
mimetypes4.diff jrus, 2009-08-12 04:41 patch "version 4"
mimetypes5.diff jrus, 2009-08-15 02:20 patch "version 5"
Messages (13)
msg91196 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 19:19
See the discussion that started at the end of July at
http://mail.python.org/pipermail/python-dev/2009-July/090928.html

And continued at
http://mail.python.org/pipermail/python-dev/2009-August/thread.html

Basically, the mimetypes module is fragile and very confusing code, 
built up over years of feature creep without refactoring or careful 
overall design. I'd like to cut it down to a more manageable code size, 
fix some bugs, update the included list of mime types, and use some nice 
Python features of versions 2.2+. Ideally someone reading the module 
once through would be able to understand what it does.

Patches to be attached shortly.
msg91200 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 20:08
This diff should leave the semantics of the module essentially unchanged 
(including lazy-loading of default files), and also leave the particular 
MIME types used unchanged, even though these are out of date and should be 
updated; a subsequent suggested version will address that, perhaps after 
some discussion.
msg91203 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 20:23
Here is a version of the patch which does away with the lazy loading: 
these are a small handful of easy-to-parse ~40k files; if the import takes 
an extra eye-blink, it shouldn't be too big a deal.
msg91204 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 20:26
A fixed version of the patch from msg91200, 2009-08-02 20:08
msg91205 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 20:43
This version (#4) expresses the default types as a list of tuples 
instead of a dict. That lets us include duplicate rows, so that 
"reverse" type -> extension lookups will behave properly once we start 
changing the actual content of the defaults.

The types_map and common_types dictionaries (aliases to the singleton 
MimeTypes object's types_map property) have been left behaving as before 
for backwards compatibility.

The tests still pass.
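
A minimal sketch of why a list of tuples helps with reverse lookups. This is not code from the patch: the helper name `build_maps` and the sample rows are hypothetical, and it assumes that later rows override earlier ones in the extension -> type direction.

```python
from collections import defaultdict

# Hypothetical sample rows: (extension, MIME type). The same extension may
# appear more than once, which a dict keyed by extension cannot express.
DEFAULT_ROWS = [
    (".jpg", "image/jpg"),    # alternate spelling, listed first
    (".jpg", "image/jpeg"),   # preferred type, listed later
    (".jpeg", "image/jpeg"),
]


def build_maps(rows):
    """Return (ext -> type, type -> [exts]) built from ordered rows."""
    forward = {}
    reverse = defaultdict(list)
    for ext, mime_type in rows:
        forward[ext] = mime_type            # later rows override earlier ones
        if ext not in reverse[mime_type]:
            reverse[mime_type].append(ext)  # every row stays reachable in reverse
    return forward, dict(reverse)


forward, reverse = build_maps(DEFAULT_ROWS)
print(forward[".jpg"])
print(reverse["image/jpg"])
```

With these rows the alternate type never wins an extension -> type lookup, but it still maps back to an extension.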
msg91208 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-02 21:52
Attached (apache_mimetypes.py) is a list I generated of all the current 
Apache mime.types.

I would just as soon include this in the Python standard library, either 
as the Apache file as is, or as these Python object literals (maybe in a 
file outside of mimetypes.py), and then *not* import from Apache files 
by default, to cut down on external dependencies. Several alternate MIME 
types should also be added to this list, in earlier positions so that 
they are used only in the type -> extension map.
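
For illustration only, here is a rough sketch of how a mime.types-style file could be turned into such a list of tuples. The function name `parse_mime_types` and the sample text are hypothetical; this is neither the script used to generate the attachment nor the parser in mimetypes.py.

```python
def parse_mime_types(text):
    """Parse mime.types-style text into a list of (extension, type) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue                          # skip blank and comment-only lines
        mime_type, *extensions = line.split()
        for ext in extensions:                # types without extensions add no rows
            rows.append(("." + ext, mime_type))
    return rows


# Hypothetical sample in the "type ext1 ext2 ..." layout.
SAMPLE = """\
# MIME type            Extensions
text/html              html htm
image/jpeg             jpeg jpg jpe
application/x-no-ext
"""

rows = parse_mime_types(SAMPLE)
for row in rows:
    print(row)
```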

The only issue is that some users may have added to their Apache 
mime.types files for the sake of getting mailman or other python 
programs to do what they want. So I'm not entirely sure to what extent 
we should be 100% backwards compatible in such edge cases. 

My personal opinion is that the 'strict' option is unnecessary and 
should be set to do nothing, because users are more likely to want the 
predictable behavior where an unorthodox type gives back the proper 
extension, than the behavior where their code fails unless they pass a 
flag in: I don't see any reason for a user to want a 'type doesn't 
exist' message back for non-registered types. This isn't a "test for 
IANA registration" module.
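
To make the 'strict' behavior concrete, here is a small sketch using the existing `MimeTypes` API. The type `application/x-example` and the extension `.exmpl` are made up for the example, and it assumes no system mime.types file happens to define them.

```python
import mimetypes

db = mimetypes.MimeTypes()
# Hypothetical unregistered type, added to the non-standard table.
db.add_type("application/x-example", ".exmpl", strict=False)

# With the default strict=True the non-standard entry is ignored.
strict_ext = db.guess_extension("application/x-example")
strict_type = db.guess_type("report.exmpl")

# Callers have to pass strict=False to get the mapping back.
lenient_ext = db.guess_extension("application/x-example", strict=False)
lenient_type = db.guess_type("report.exmpl", strict=False)

print(strict_ext, strict_type)
print(lenient_ext, lenient_type)
```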
msg91489 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-11 22:58
Plone uses this thing, which has *much* more complexity than necessary for 
the standard library, but it might be nice to pick up the code for pulling 
types out of the windows registry, for instance.

http://svn.plone.org/svn/archetypes/Products.MimetypesRegistry/trunk/Products/MimetypesRegistry/MimeTypesRegistry.py
msg91583 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-15 02:20
Okay, here's a version of this patch which (a) adds deprecation warnings, 
and (b) doesn't bother with lazy init. It should still be nearly 
completely backwards compatible with the previous mimetypes module.
msg91585 - (view) Author: Jacob Rus (jrus) * Date: 2009-08-15 02:30
And at Rietveld, patch version 5:
http://codereview.appspot.com/107042
msg91884 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-23 02:17
See also issue 6763.
msg93829 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-10-10 14:38
Putting this here for the record rather than leaving it in Rietveld:

I appreciate the desire for a cleaner API for handling mimetypes, but
this isn't the way to get it. Finding projects that have their own
mimetypes implementations, asking them why they created their own rather
than using the standard one, seeing what features are common to those
APIs, etc, are all things that need to be done before making major
changes to the standard library API.

What you see as a critical bug (custom MimeTypes instances inheriting
their initial settings from the mimetypes._db instance), you can bet
some developers are relying on as a feature. If code is in the standard
library, someone, somewhere, is relying on it working just the way it is
now. Even bug fixes can sometimes break code that was designed to work
around the presence of the bug.

The concept of having a master copy that new instances are cloned from
isn't even particularly objectionable, so long as people clearly
understand that is what is going on (e.g. this happens with
decimal.DefaultContext being used as the basis for new decimal.Context
instances).

With code this old, 'softly, softly' is the way to go, and the fewer
user visible changes in semantics the better.
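
The decimal comparison above can be sketched as follows. This only illustrates the "master copy" pattern in the decimal module. It makes no claim about how mimetypes._db behaves.

```python
import decimal

saved_prec = decimal.DefaultContext.prec
decimal.DefaultContext.prec = 12            # change the master copy
try:
    fresh = decimal.Context()               # unspecified fields come from DefaultContext
    explicit = decimal.Context(prec=5)      # explicit arguments still win
finally:
    decimal.DefaultContext.prec = saved_prec  # restore the shared default

print(fresh.prec, explicit.prec)
```

New contexts pick up whatever the template holds at construction time, which is the kind of inheritance a developer might come to rely on.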
msg128251 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-02-09 23:18
Thanks for working on cleaning up that module.  I have to agree with Nick though (see also minor comments on Rietveld): code in the stdlib just can’t move as freely as outside of it.

I’m updating the version to 3.3, given that this patch adds new features and refactors things (stable branches only get bug fixes).
msg209140 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-25 03:41
Note that I still believe there are substantial improvements that could be made without a wholesale rewrite of the module that poses significant backwards compatibility risks (just improving the documentation regarding how the list of types is populated could likely help some users, as would updating the default list we use if we can't retrieve one from the environment).
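
As a starting point for that kind of documentation, a short sketch that inspects which of the candidate system files exist on the current machine. `mimetypes.knownfiles` is the module's list of locations to try; the variable name `existing` is just for this example.

```python
import mimetypes
import os

# knownfiles lists the mime.types locations the module will try to read
# when init() is called without arguments.
existing = [path for path in mimetypes.knownfiles if os.path.isfile(path)]

print("candidate files:", len(mimetypes.knownfiles))
print("present on this system:", existing)
```

The output varies by platform, which is part of why the populated list can surprise users.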

Alternatively, even if we can't get anyone interested in such a refactoring task, it may be feasible to introduce an improved mimetypes handling interface that is easier to maintain and keep up to date, again without risking backwards compatibility issues for users of the current module.

Some potentially relevant links for anyone wanting to investigate improving the standard library's MIME type support:

The discussions with Jacob in Rietveld regarding his original approach: https://codereview.appspot.com/107042

PyPI libraries:

https://pypi.python.org/pypi/mimeparse/
https://pypi.python.org/pypi/mime
https://pypi.python.org/pypi/zope.mimetype
https://pypi.python.org/pypi/Products.MimetypesRegistry (Jacob pointed this one out above)

The various PyPI wrappers around libmagic and the *nix "file" utility are also of potential interest for research purposes (but aren't especially useful on Windows, where those tools are significantly less likely to be available).
History
Date User Action Args
2014-01-25 03:41:52 ncoghlan set messages: + msg209140; versions: + Python 3.5, - Python 3.3
2011-02-09 23:18:09 eric.araujo set nosy: ncoghlan, eric.araujo, r.david.murray, jab, jrus; messages: + msg128251; versions: + Python 3.3, - Python 2.7, Python 3.2
2011-02-09 22:42:27 sandro.tosi set nosy: ncoghlan, eric.araujo, r.david.murray, jab, jrus; stage: patch review
2010-04-13 20:14:45 eric.araujo set nosy: + eric.araujo
2009-10-10 14:38:51 ncoghlan set messages: + msg93829
2009-08-23 02:17:36 r.david.murray set nosy: + r.david.murray; messages: + msg91884
2009-08-22 13:40:28 jab set nosy: + jab
2009-08-15 02:30:37 jrus set messages: + msg91585
2009-08-15 02:21:04 jrus set files: + mimetypes5.diff; messages: + msg91583
2009-08-12 04:41:31 jrus set files: + mimetypes4.diff
2009-08-12 04:40:30 jrus set files: - mimetypes4.diff
2009-08-11 22:58:56 jrus set messages: + msg91489
2009-08-07 16:19:52 ncoghlan set nosy: + ncoghlan
2009-08-02 21:52:43 jrus set files: + apache_mimetypes.py; messages: + msg91208
2009-08-02 20:43:46 jrus set files: + mimetypes4.diff; messages: + msg91205
2009-08-02 20:26:59 jrus set files: + mimetypes2.diff; messages: + msg91204
2009-08-02 20:24:29 jrus set files: - mimetypes-2.diff
2009-08-02 20:23:48 jrus set files: + mimetypes3.diff; messages: + msg91203
2009-08-02 20:08:56 jrus set files: + mimetypes-2.diff; keywords: + patch; messages: + msg91200
2009-08-02 19:19:22 jrus create