Title: Mimetype module duplicates
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 2.7
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Dominik Czarnota, Jeffrey.Kintscher
Priority: normal
Keywords:

Created on 2019-07-09 13:59 by Dominik Czarnota, last changed 2019-07-16 20:10 by Jeffrey.Kintscher.

Messages (3)
msg347562 - Author: disconnect3d (Dominik Czarnota) Date: 2019-07-09 13:59
The `mimetypes` standard library module lets users guess the file extension for a given MIME type through the `mimetypes.guess_extension` function.

Default MIME types are stored in the `types_map` and `_types_map_default` dictionaries, which map extensions to MIME types. Those dictionaries are created by the `_default_mime_types` function in `cpython/Lib/`.

If a given extension has more than one MIME type, this information is lost: a dictionary holds one value per key, so only one of the types is kept.
This currently happens for the ".bmp" extension in CPython's codebase.
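
The loss follows from how Python evaluates a dict literal with a repeated key: the later value replaces the earlier one, and no warning is raised. A minimal sketch with a hypothetical three-line table (`sample_types_map` is an illustrative name, not the table in the standard library):

```python
# Hypothetical miniature of an extension -> MIME type table.
# The '.bmp' key appears twice, as in the report.
sample_types_map = {
    '.bmp': 'image/bmp',
    '.gif': 'image/gif',
    '.bmp': 'image/x-ms-bmp',  # repeated key: replaces the value above
}

# Only one '.bmp' entry survives, holding the value written last.
print(len(sample_types_map))     # 2
print(sample_types_map['.bmp'])  # image/x-ms-bmp

# Inverting the table therefore cannot recover the first MIME type.
inverse = {mime: ext for ext, mime in sample_types_map.items()}
print('image/bmp' in inverse)    # False
```

What `mimetypes` itself reports on a given machine may differ from this sketch, since the module can also load system MIME files when it initialises.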

This can be seen in the linked code below:

Here is an example in an interactive IPython session:
In [1]: import mimetypes

In [2]: mimetypes.guess_extension('image/bmp')
Out[2]: '.bmp'

In [3]: mimetypes.guess_extension('image/x-ms-bmp')

In [4]:

The issue has been found by using Semmle's LGTM:

PS / off-topic / thinking out loud: Maybe there should be a debug build of CPython that detects such key overwrites during dict initialisation and warns about them?
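
This is not the debug-build idea itself, but a static check in the same spirit can be sketched with the standard `ast` module. The helper name `find_duplicate_dict_keys` and the sample source are made up for illustration, and the sketch assumes Python 3.8 or later, where literal keys are parsed as `ast.Constant` nodes:

```python
import ast


def find_duplicate_dict_keys(source):
    """Return (key, line_number) for each repeated constant key in dict literals."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**mapping` unpacking; non-constant keys
            # (names, calls, ...) cannot be compared statically, so skip them.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                duplicates.append((key.value, key.lineno))
            else:
                seen.add(key.value)
    return duplicates


# Hypothetical input that mirrors the repeated '.bmp' key.
sample = """
types_map = {
    '.bmp': 'image/bmp',
    '.gif': 'image/gif',
    '.bmp': 'image/x-ms-bmp',
}
"""
print(find_duplicate_dict_keys(sample))  # [('.bmp', 5)]
```

The check only sees keys written as literals, so duplicates produced at run time would still go unnoticed.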
msg347563 - Author: disconnect3d (Dominik Czarnota) Date: 2019-07-09 14:03
To be more specific, and to keep this information for the record: the '.bmp' extension is registered with two MIME types, 'image/bmp' and 'image/x-ms-bmp'.

Below is part of the relevant code.
    types_map = _types_map_default = {
        # (...)
        '.bmp'    : 'image/bmp',
        '.gif'    : 'image/gif',
        '.ief'    : 'image/ief',
        '.jpg'    : 'image/jpeg',
        '.jpe'    : 'image/jpeg',
        '.jpeg'   : 'image/jpeg',
        '.png'    : 'image/png',
        '.svg'    : 'image/svg+xml',
        '.tiff'   : 'image/tiff',
        '.tif'    : 'image/tiff',
        '.ico'    : 'image/',
        '.ras'    : 'image/x-cmu-raster',
        '.bmp'    : 'image/x-ms-bmp',  # second '.bmp' key
        # (...)
    }
msg348039 - Author: Jeffrey Kintscher (Jeffrey.Kintscher) Date: 2019-07-16 20:10
This appears to have been fixed by issue #4963 and backported to the 3.7 and 3.8 branches:

Python 3.7.4+ (heads/3.7-dirty:e7bec26937, Jul 16 2019, 12:53:26) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> mimetypes.guess_extension('image/bmp')
>>> mimetypes.guess_extension('image/x-ms-bmp')
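
For comparison, the documented `MimeTypes.add_type` method also records each registration in an inverse table (`types_map_inv`), so two types registered for one extension through it both remain visible to reverse lookup. A minimal sketch on a private `MimeTypes` instance, using made-up type names and a made-up `.exdup` extension so that no real entry is involved. It does not show how the fix referenced above was implemented:

```python
import mimetypes

# A private database, so the module-level tables are left untouched.
db = mimetypes.MimeTypes()

# Hypothetical MIME types and extension, chosen to avoid real entries.
db.add_type('image/x-example-one', '.exdup')
db.add_type('image/x-example-two', '.exdup')

# Forward lookup keeps only the type registered last for the extension.
forward = db.guess_type('picture.exdup')[0]
print(forward)  # image/x-example-two

# Reverse lookup still knows the extension for both types.
first = db.guess_extension('image/x-example-one')
second = db.guess_extension('image/x-example-two')
print(first, second)  # .exdup .exdup
```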
Date                 User               Action  Args
2019-07-16 20:10:32  Jeffrey.Kintscher  set     messages: + msg348039
2019-07-11 06:48:14  Jeffrey.Kintscher  set     nosy: + Jeffrey.Kintscher
2019-07-09 14:03:11  Dominik Czarnota   set     messages: + msg347563
2019-07-09 13:59:21  Dominik Czarnota   create