
Author aronacher
Recipients aronacher
Date 2009-03-01.23:47:58
Message-id <1235951282.21.0.0432661498186.issue5401@psf.upfronthosting.co.za>
Content
Sorry for the harsh words, but when I found that code I nearly freaked
out.  For years I have been using "from mimetypes import guess_type",
and only today did I find out that this has horrendous performance
problems, because the MIME type database is re-parsed on each call.

The reason for this is that mimetypes.guess_type is implemented like this:

def guess_type(url, strict=True):
    global guess_type
    init()                        # re-parses the MIME type database
    guess_type = new_guess_type   # rebinds only the module-level name
    return guess_type(url, strict)

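Obviously, if the function is imported from the module
("from mimetypes import guess_type") instead of being looked up as an
attribute before each call ("mimetypes.guess_type(...)"), init() is
called over and over again: the rebinding only replaces the
module-level name, while the importer's own name still points at the
original stub.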
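A small self-contained reproduction of that name-binding effect may help. It is a sketch built on a hypothetical module (fakemod) whose source only mimics the lazy-init pattern above; it is not the real mimetypes code, and init() merely counts calls instead of parsing anything.

```python
import types

# Hypothetical module source mimicking the lazy-init pattern described
# above; it is NOT the real mimetypes implementation.
SOURCE = '''
init_calls = 0

def init():
    global init_calls
    init_calls += 1  # stands in for re-parsing the MIME type database

def _real_guess_type(url, strict=True):
    return ("text/plain", None)

def guess_type(url, strict=True):
    global guess_type
    init()
    guess_type = _real_guess_type  # rebinds the module-level name only
    return guess_type(url, strict)
'''

fakemod = types.ModuleType("fakemod")
exec(SOURCE, fakemod.__dict__)

# Equivalent of "from fakemod import guess_type": the binding is copied once.
guess_type = fakemod.guess_type
for _ in range(3):
    guess_type("a.txt")  # always the stub, so init() runs every time
print(fakemod.init_calls)  # 3

# Attribute lookup finds the rebound function, so init() is not called again.
for _ in range(3):
    fakemod.guess_type("a.txt")
print(fakemod.init_calls)  # 3
```

The imported name keeps calling the stub forever, while attribute access pays for init() at most once.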

What's the performance impact?  In a small WSGI middleware that serves
static files, the *total* slowdown (including HTTP header parsing, file
serving, etc.) was 1000%, caused solely by calling guess_type() instead
of mimetypes.guess_type(), once per request.

I attached a workaround for that problem that skips init() once the
module has been initialized.
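One way to express that idea is a guarded lazy initialization that never rebinds the public function, so a copied reference stays cheap. This is a hypothetical sketch, not the attached patch: the names _db and init_calls are illustrative, and building a mimetypes.MimeTypes instance merely stands in for the expensive parse.

```python
import mimetypes

# Hypothetical guarded lazy initialization; not the patch attached here.
_db = None
init_calls = 0

def init():
    global _db, init_calls
    init_calls += 1
    # Stands in for the expensive parse: build a MimeTypes table once.
    _db = mimetypes.MimeTypes()

def guess_type(url, strict=True):
    # Cheap check on every call; init() only runs the first time.
    if _db is None:
        init()
    return _db.guess_type(url, strict)

# A copied reference (as "from ... import guess_type" creates) is now safe.
gt = guess_type
for name in ("index.html", "style.css", "logo.png"):
    gt(name)
print(init_calls)  # 1
```

The cost per call is a single `is None` check, regardless of how the function was imported.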

If this is intended behaviour, it should be documented, but I doubt it
is a good idea, since people don't read the documentation if stuff
seems to work.

And Google tells me I'm not the first one to call guess_type that
way: http://google.com/codesearch?q="from+mimetypes+import+guess_type"
History
Date                 User       Action  Args
2009-03-01 23:48:02  aronacher  set     recipients: + aronacher
2009-03-01 23:48:02  aronacher  set     messageid: <1235951282.21.0.0432661498186.issue5401@psf.upfronthosting.co.za>
2009-03-01 23:48:00  aronacher  link    issue5401 messages
2009-03-01 23:47:59  aronacher  create