Message 82992 - Python tracker

Author	aronacher
Recipients	aronacher
Date	2009-03-01.23:47:58
Message-id	<1235951282.21.0.0432661498186.issue5401@psf.upfronthosting.co.za>
In-reply-to

Content
Sorry for the harsh words, but when I found that code I nearly freaked
out.  For years I had been using "from mimetypes import guess_type", and
only today did I find out that this has horrendous performance problems
because the mimetype database is re-parsed on each call.

The reason for this is that mimetypes.guess_type is implemented like this:

def guess_type(...):
    global guess_type
    init()                        # parses the mimetype database
    guess_type = new_guess_type   # rebinds only the module-level name
    return guess_type(...)

Obviously, if the function is imported from the module directly instead
of being looked up via normal attribute access before each call (as in
mimetypes.guess_type(...)), the imported name stays bound to the original
function and init() is called over and over again.
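
To make the rebinding problem concrete, here is a small sketch that mimics the pattern with a stand-in namespace. Everything in it (fake_mimetypes, _stub_guess_type, _real_guess_type, the init_calls counter) is made up for illustration and is not the real mimetypes code:

```python
import types

# Hypothetical stand-in for the mimetypes module; not the real internals.
fake_mimetypes = types.SimpleNamespace(init_calls=0)

def _init():
    # Stands in for the expensive parse of the mimetype database.
    fake_mimetypes.init_calls += 1

def _real_guess_type(url):
    # Stands in for the fast, already-initialized lookup.
    return "text/plain"

def _stub_guess_type(url):
    # Self-replacing stub: initialize, then rebind the module attribute.
    _init()
    fake_mimetypes.guess_type = _real_guess_type
    return fake_mimetypes.guess_type(url)

fake_mimetypes.guess_type = _stub_guess_type

# Equivalent of "from mimetypes import guess_type": a second reference to
# the stub, taken before the first call. Rebinding the attribute later
# does not affect this name.
guess_type = fake_mimetypes.guess_type

# Attribute lookup on every call: only the first call hits the stub.
for _ in range(3):
    fake_mimetypes.guess_type("a.txt")
calls_via_module_lookup = fake_mimetypes.init_calls

# Imported name: every call goes through the stub and runs _init() again.
fake_mimetypes.init_calls = 0
for _ in range(3):
    guess_type("a.txt")
calls_via_imported_name = fake_mimetypes.init_calls
```

With three calls each, the attribute-lookup style runs the stand-in init once, while the imported name runs it on every call.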

What's the performance impact?  In a small WSGI middleware that serves
static files the *total* performance impact (including HTTP header
parsing, file serving etc.) was 1000%.  That difference comes solely from
calling guess_type() instead of mimetypes.guess_type(), and the function
was called just once per request.

I attached a workaround for that problem that tries to avoid init()
calls after the thing was initialized.

If this is intended behaviour it should be documented, but I doubt that
this is a good idea, as people don't read documentation if stuff seems to
work.

And Google tells me I'm not the first one who invoked guess_type that
way: http://google.com/codesearch?q="from+mimetypes+import+guess_type"

History
Date	User	Action	Args
2009-03-01 23:48:02	aronacher	set	recipients: + aronacher
2009-03-01 23:48:02	aronacher	set	messageid: <1235951282.21.0.0432661498186.issue5401@psf.upfronthosting.co.za>
2009-03-01 23:48:00	aronacher	link	issue5401 messages
2009-03-01 23:47:59	aronacher	create