This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: mimetypes - "strict" on Windows
Type: behavior Stage: patch review
Components: Library (Lib) Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Norman Lorrain, brandonschabell, markdtw, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2021-05-17 19:54 by Norman Lorrain, last changed 2022-04-11 14:59 by admin.

Pull Requests
PR 27088 (open), linked by brandonschabell on 2021-07-12 00:10
Messages (3)
msg393826 - Author: Norman Lorrain (Norman Lorrain) Date: 2021-05-17 19:54
On a Windows 10 machine, the unit tests fail with this error:

    0:05:10 load avg: 3.24 [221/427/1] test_mimetypes failed
    test test_mimetypes failed -- Traceback (most recent call last):
    File "D:\github\cpython\lib\test\test_mimetypes.py", line 289, in test_guess_type
        eq(type_info, "I don't know anything about type foo.pic")
    AssertionError: 'type: image/pict encoding: None' != "I don't know anything about type foo.pic"
    - type: image/pict encoding: None
    + I don't know anything about type foo.pic

The test verifies that the code reports `image/pict` as a *non-standard* MIME type:

    def test_guess_type(self):
        eq = self.assertEqual

        type_info = self.mimetypes_cmd("-l", "foo.pic")
        eq(type_info, "type: image/pict encoding: None")

        type_info = self.mimetypes_cmd("foo.pic")
        eq(type_info, "I don't know anything about type foo.pic")

Looking in my registry, I see this entry for `.pic`:

    [HKEY_CLASSES_ROOT\.pic]
    @="QuickTime.pic"
    "Content Type"="image/pict"
    ...etc

The module seems to treat everything it finds in the registry as "strict" (i.e., as a standard type), which is why `foo.pic` is recognized even without `-l`.
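A minimal sketch of the distinction, assuming the built-in tables of recent CPython releases: a fresh `MimeTypes()` instance does not read the registry or any system mime.types files, and `.pic` sits only in its non-standard table. The `add_type(..., strict=True)` call stands in for the registry entry quoted above; treating it as equivalent to `read_windows_registry()` with its default `strict=True` is an assumption, not a copy of that method's code.

```python
import mimetypes

# A fresh instance is built from the module's built-in tables only; it does
# not read the Windows registry or the system's mime.types files.
db = mimetypes.MimeTypes()

# ".pic" is in the built-in non-standard table, so only a lenient lookup finds it.
before_strict = db.guess_type("foo.pic")
before_loose = db.guess_type("foo.pic", strict=False)

# Stand-in for the registry entry being loaded as a *standard* type.
# Assumption: this models read_windows_registry()'s default strict=True.
db.add_type("image/pict", ".pic", strict=True)
after_strict = db.guess_type("foo.pic")

print(before_strict, before_loose, after_strict)
# (None, None) ('image/pict', None) ('image/pict', None)
```

Once the entry lands in the strict table, a plain `guess_type("foo.pic")` succeeds, which is exactly the assertion that fails in the traceback.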
msg393827 - Author: Norman Lorrain (Norman Lorrain) Date: 2021-05-17 20:07
A possible solution is to read the Windows Registry entries with `strict=False`, so that they are added as non-standard types. The unit tests pass with this change:

diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
index 018793c4f0..dd2bddf064 100644
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -350,7 +350,7 @@ def init(files=None):
     if files is None or _db is None:
         db = MimeTypes()
         if _winreg:
-            db.read_windows_registry()
+            db.read_windows_registry(strict=False)
 
         if files is None:
             files = knownfiles
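To see why this change satisfies the test, here is a small simulation. The `REGISTRY_ENTRIES` dict and both helper functions are hypothetical; loading the registry is modeled as `MimeTypes.add_type()` with the same `strict` flag passed to `read_windows_registry()`, which is an assumption rather than the real registry reader. The check mirrors the two expectations in `test_guess_type`.

```python
import mimetypes

# Hypothetical stand-in for the registry data: the single entry quoted above.
REGISTRY_ENTRIES = {".pic": "image/pict"}


def build_db(registry_strict):
    """Return a MimeTypes db with the fake registry entries added.

    Assumption: loading the registry is modeled as add_type() with the same
    strict flag passed to read_windows_registry(); this is a simulation, not
    the real registry reader.
    """
    db = mimetypes.MimeTypes()  # built-in tables only
    for ext, mime in REGISTRY_ENTRIES.items():
        db.add_type(mime, ext, strict=registry_strict)
    return db


def matches_test_expectations(db):
    """Check the two assertions made by test_guess_type for foo.pic."""
    lenient_ok = db.guess_type("foo.pic", strict=False) == ("image/pict", None)
    strict_ok = db.guess_type("foo.pic") == (None, None)
    return lenient_ok and strict_ok


current = matches_test_expectations(build_db(registry_strict=True))
patched = matches_test_expectations(build_db(registry_strict=False))
print(current, patched)  # False True
```

With registry entries added as non-standard, the lenient lookup still finds `image/pict` while the strict lookup falls back to the built-in standard table.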
msg416461 - Author: Mark Dong (markdtw) Date: 2022-03-31 21:55
Hi, 

I want to follow up on this:

On Linux (Ubuntu 20.04.4 LTS), the module likewise loads everything it finds in the registries (i.e., the files listed in the "knownfiles" variable) in "strict" mode, even though some of those types aren't registered with IANA. (I'm assuming that "registered with IANA" means only the types listed here: https://www.iana.org/assignments/media-types/media-types.xhtml)

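For example, ".com" is recognized as having the MIME type "application/x-msdos-program". This becomes a problem when an unparsed URL such as "http://abc.efg/hij.html#http://abc.com" is passed to guess_type: the extension is taken from the end of the string, so the ".com" inside the fragment is treated as the file extension.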
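A caller-side sketch of the problem and one mitigation, using only the standard library: splitting the extension off the raw string picks up `.com` from the fragment, while parsing the URL with `urllib.parse.urlsplit()` first leaves only the path. This does not change `guess_type()` itself; how its own URL handling treats fragments may differ across Python versions, so the sketch does not rely on it.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit

url = "http://abc.efg/hij.html#http://abc.com"

# Taking the extension from the end of the raw string picks up the fragment.
naive_ext = posixpath.splitext(url)[1]

# Parsing first separates scheme, host, path, query and fragment.
path = urlsplit(url).path
parsed_ext = posixpath.splitext(path)[1]

# A fresh MimeTypes() uses only the built-in tables, so the result does not
# depend on the host's mime.types files.
guess = mimetypes.MimeTypes().guess_type(path)

print(naive_ext, parsed_ext, guess)  # .com .html ('text/html', None)
```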

I'm wondering whether we should make the documentation clearer and state that "strict=True" means using the IANA-registered types along with the types found on the machine (judging by the comments in "def _default_mime_types()", this seems to be the intended behavior), or whether we should actually move everything other than the IANA-registered types out of strict mode.
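One way to see which extensions a machine's files promote into the strict table, as a sketch: compare a built-in-only `MimeTypes()` with one that has also read every existing file in `mimetypes.knownfiles`, the default file list that `init()` reads with `read()`'s default `strict=True`. The result depends on the host; where none of those files exist (e.g. on Windows), the list is simply empty. Extensions that already exist in the built-in table but are remapped by a file are not counted.

```python
import mimetypes
import os

builtin = mimetypes.MimeTypes()     # built-in tables only
system = mimetypes.MimeTypes()
for path in mimetypes.knownfiles:   # default file list used by init()
    if os.path.isfile(path):
        system.read(path)           # read() defaults to strict=True

# types_map is a (non-strict, strict) pair; index 1 is the strict table.
strict_builtin = builtin.types_map[1]
strict_system = system.types_map[1]

# New extensions that the host's files added as "standard" types.
promoted = sorted(set(strict_system) - set(strict_builtin))
print(len(promoted), promoted[:10])
```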

Best regards,
Mark
History
Date                 User             Action  Args
2022-04-11 14:59:45  admin            set     github: 88325
2022-03-31 21:55:41  markdtw          set     versions: - Python 3.11
                                              nosy: + markdtw
                                              messages: + msg416461
                                              components: - Windows
2021-07-12 00:10:05  brandonschabell  set     keywords: + patch
                                              nosy: + brandonschabell
                                              pull_requests: + pull_request25636
                                              stage: patch review
2021-05-18 02:12:01  ned.deily        set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower
                                              components: + Windows
2021-05-17 20:07:31  Norman Lorrain   set     messages: + msg393827
2021-05-17 19:54:58  Norman Lorrain   create