This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Title: mimetypes module racy
Type: behavior Stage: resolved
Components: Library (Lib) Versions:
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: dhess, steve.dower, terry.reedy, ukl
Priority: normal Keywords:

Created on 2020-04-01 18:01 by ukl, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg365500 - Author: Uwe Kleine-König (ukl) * Date: 2020-04-01 18:01

In a project using aiohttp with Python 3.5 as provided by Debian Stretch (3.5.3), I sometimes see a wrong mimetype assigned to .css files. While trying to create a minimal reproduction recipe, a colleague and I came up with:

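    import asyncio
    import sys
    from mimetypes import guess_type
    async def f():
        t = guess_type('foo.css')
        return t == ('text/css', None)
    async def main():
        # The exact task list was lost from this report; as an assumption,
        # run several f() calls concurrently. ensure_future() wraps each
        # coroutine in a task (required on 3.11+, where asyncio.wait()
        # no longer accepts bare coroutines).
        done, pending = await asyncio.wait([
            asyncio.ensure_future(f()) for _ in range(10)
        ])
        return all(d.result() for d in done)
    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        if not loop.run_until_complete(main()):
            # Assumed failure branch: at least one call got a wrong mimetype.
            sys.exit(1)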
We didn't see this exact code fail, but we did see something very similar fail, and only once. So far we have only tested on Python 3.5, as this is what is used in production.

By code inspection I found a race. The module's guess_type() function contains:

    if _db is None:
        init()
    return ...

Here init() can be entered twice: a first caller enters init() but is preempted before it sets _db, so a second caller also sees _db is None and enters init() as well.

However, I fail to see how this could result in guess_type() returning None (which is what we occasionally see in our production code).

Also, the module's code is rather convoluted, with two different guards that decide whether init() needs to be called (_db is None and not inited), init() updating various global variables and instantiating a MimeTypes object that depends on these variables, ... The code has changed in master a few times, but as I haven't spotted the actual problem yet and the issue hardly ever reproduces, I cannot tell whether the problem still exists in newer versions of Python.

There are also some bug reports that seem related; I found reading them interesting.

Best regards
msg365741 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-04-04 03:29
3.5.3 is not the most recent 3.5. Anyway, 3.5 and 3.6, and soon 3.7, only get security patches. So it needs to be determined whether there are failures with 3.8 and 3.9.
msg365759 - Author: Uwe Kleine-König (ukl) * Date: 2020-04-04 11:15
I agree that 3.5 is ancient and the focus should be to fix the newer versions of Python.

But given that the problem seems to be hard to reproduce (I have had the reproducer script from the original report running under the tracer for over a week now without a hit), it seems beneficial to me to understand the issue on 3.5 first and then check whether 3.8+ is also affected.

Also note that Lib/ didn't change between 3.5.3 and 3.5.9.
The differences between 3.5.x and 3.6.x in this file don't affect the code flow. In fact, between 3.5.0 and current master, the first commit that actually changes the behavior I saw is bpo-4963, which only went into 3.9.x.

I added dhess and steve.dower, who were involved in bpo-4963, to the nosy list; maybe one of them remembers or is able to quickly spot the problem.
msg365761 - Author: David K. Hess (dhess) * Date: 2020-04-04 12:10
I’m not sure I can shed any light on this particular bug, but I would say that based on my dealings with this module, it is definitely not thread-safe. That means that if you are going to have multiple threads accessing it simultaneously, you really should have a mutex around that access ensuring only one thread is running through the code in this module at a time. 
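Following that suggestion, here is a minimal sketch of serializing access with one process-wide lock. `locked_guess_type` and `_mimetypes_lock` are hypothetical names, not part of the mimetypes API. Calling `mimetypes.init()` once at startup, before any threads are started, is an additional assumption that keeps the lazy-initialization path out of the concurrent section entirely.

```python
import mimetypes
import threading

# Hypothetical helper: one lock shared by every caller in the process.
_mimetypes_lock = threading.Lock()

def locked_guess_type(url, strict=True):
    """Call mimetypes.guess_type() while holding a process-wide lock."""
    with _mimetypes_lock:
        return mimetypes.guess_type(url, strict)

# Initialize the module's global database eagerly, before any threads start,
# so that no thread takes the lazy init() path inside guess_type().
mimetypes.init()

if __name__ == '__main__':
    results = []

    def worker():
        results.append(locked_guess_type('foo.css'))

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(set(results))   # a single distinct result is expected
```

The lock is deliberately coarse: every lookup is serialized, which is simple to reason about and cheap relative to the small amount of work guess_type() does.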

Now in reality, asyncio and other cooperatively scheduled multitasking packages like gevent are not going to unpredictably yield control to another task the way true threads will. So, in this particular case, since the init code doesn’t use async or await, I don’t think there is a chance of an initialization race bug there.
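The point about cooperative scheduling can be checked with a small asyncio sketch. The shared dict and the helpers `sync_init`, `safe_task`, `racy_task` and `run` are hypothetical stand-ins for the lazy init in guess_type(). Because asyncio switches tasks only at `await` points, a synchronous check-then-set is never interleaved; the `racy_task` variant inserts an `await` between the check and the set to show the interleaving that would otherwise be ruled out.

```python
import asyncio

def make_state():
    """Fresh shared state: the lazily built table and an init counter."""
    return {'db': None, 'init_calls': 0}

def sync_init(state):
    """Check-then-act with no await in between (like guess_type()'s lazy init)."""
    if state['db'] is None:
        state['init_calls'] += 1
        state['db'] = {'.css': 'text/css'}

async def safe_task(state):
    await asyncio.sleep(0)       # task switches can only happen at awaits ...
    sync_init(state)             # ... so this synchronous call is never interleaved

async def racy_task(state):
    if state['db'] is None:
        await asyncio.sleep(0)   # a task switch between check and set reopens the race
        state['init_calls'] += 1
        state['db'] = {'.css': 'text/css'}

async def run(task_fn, n=10):
    """Run n copies of task_fn concurrently and report how often init ran."""
    state = make_state()
    await asyncio.gather(*(task_fn(state) for _ in range(n)))
    return state['init_calls']

if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(run(safe_task)))   # 1
        print(loop.run_until_complete(run(racy_task)))   # 10
    finally:
        loop.close()
```

In the racy variant every task performs its check before any task resumes from `sleep(0)`, so all ten see `db is None` and each one initializes.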

As to the bug witnessed, the only thing I can suggest is to add a considerable amount of debugging that logs the argument to guess_type and prints out the mimetypes module’s internal state if and when this happens again. My best guess, based on the amount of work that method does to inspect the passed-in URL, is that it has something to do with the URL itself.
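A minimal sketch of the suggested debugging, assuming the application can route its calls through a wrapper: `logged_guess_type` and its `expected` argument are hypothetical. Whenever the result differs from what the caller expected, it logs the argument together with some module state: `mimetypes.inited`, whether `mimetypes._db` is set, and the `.css` entry of `mimetypes.types_map`. `_db` is private and may differ between versions, so it is only read with `getattr`.

```python
import logging
import mimetypes

log = logging.getLogger(__name__)

def logged_guess_type(url, strict=True, expected=None):
    """Hypothetical wrapper: log module state if the guess is not `expected`."""
    result = mimetypes.guess_type(url, strict)
    if expected is not None and result != expected:
        db = getattr(mimetypes, '_db', None)   # private; may change between versions
        log.warning(
            'guess_type(%r, strict=%r) returned %r, expected %r; '
            'inited=%r, _db is None: %r, types_map[".css"]=%r',
            url, strict, result, expected,
            mimetypes.inited, db is None, mimetypes.types_map.get('.css'))
    return result

if __name__ == '__main__':
    logging.basicConfig(level=logging.WARNING)
    logged_guess_type('foo.css', expected=('text/css', None))
```

Logging only on a mismatch keeps the overhead negligible for the normal case while still capturing the state at the moment a wrong mimetype is returned.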
msg405924 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-11-08 03:22
Without assurance that the problem exists in current Python, let alone reproducing code, there is nothing we can do. If this or a related problem occurs again, this can be reopened, or a new issue started.
Date                 User         Action  Args
2022-04-11 14:59:28  admin        set     github: 84320
2021-11-08 03:22:15  terry.reedy  set     status: open -> closed
                                          resolution: out of date
                                          messages: + msg405924
                                          stage: resolved
2020-04-04 12:10:23  dhess        set     messages: + msg365761
2020-04-04 11:15:17  ukl          set     nosy: + steve.dower, dhess
                                          messages: + msg365759
2020-04-04 03:29:28  terry.reedy  set     nosy: + terry.reedy
                                          messages: + msg365741
                                          versions: - Python 3.5
2020-04-01 18:01:20  ukl          create