
classification
Title: mimetypes.MAGIC_FUNCTION performance problems
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.0, Python 2.4, Python 3.1, Python 2.7, Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aronacher, benjamin.peterson, georg.brandl
Priority: critical Keywords: easy, needs review, patch

Created on 2009-03-01 23:48 by aronacher, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
mimetypes-speedup.diff aronacher, 2009-03-01 23:47 A patch that works around the problem by not calling init() after prior initialization
Messages (3)
msg82992 - (view) Author: Armin Ronacher (aronacher) * (Python committer) Date: 2009-03-01 23:47
Sorry for the harsh words, but when I found that code I nearly freaked
out.  For all those years I was using "from mimetypes import guess_type"
until today I found out that this has horrendous performance problems
due to the fact that the mimetype database is re-parsed on each call.

The reason for this is that mimetypes.guess_type is implemented like this:

def guess_type(url, strict=True):
    global guess_type
    init()                        # re-parses the mimetype database
    guess_type = new_guess_type   # rebinds only the module-level name
    return guess_type(url, strict)

Obviously, if the function was imported from the module (from mimetypes
import guess_type) rather than looked up as a module attribute before
each call (mimetypes.guess_type(...)), the imported name still refers to
the original wrapper, so init() is called over and over again.
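
The effect described above can be reproduced with a small stand-in module. This is only a sketch: fake_mimetypes, _new_guess_type and init_calls are hypothetical names made up for the demonstration, not the real mimetypes internals.

```python
import types

# Hypothetical stand-in for the old module layout; not the real mimetypes code.
source = '''
init_calls = 0

def init():
    global init_calls
    init_calls += 1               # stands in for re-parsing the database

def _new_guess_type(url):
    return "text/plain"

def guess_type(url):
    global guess_type
    init()
    guess_type = _new_guess_type  # rebinds the module-level name only
    return guess_type(url)
'''
fake = types.ModuleType("fake_mimetypes")
exec(source, fake.__dict__)

# Equivalent to "from fake_mimetypes import guess_type" done before any call:
stale = fake.guess_type
for _ in range(3):
    stale("a.txt")                # the old wrapper runs init() every time
calls_via_stale = fake.init_calls

# Attribute lookup picks up the rebound function, so init() is skipped.
for _ in range(3):
    fake.guess_type("a.txt")
calls_via_attr = fake.init_calls - calls_via_stale

print(calls_via_stale, calls_via_attr)
```

The name bound by the import keeps pointing at the wrapper, while the module attribute has already been replaced.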

What's the performance impact?  In a small WSGI middleware that serves
static files the *total* performance impact (including HTTP header
parsing, file serving etc.) was 1000%.  That was just from calling
guess_type() instead of mimetypes.guess_type(), and the function was
called only once per request.

I attached a workaround for that problem that tries to avoid init()
calls after the thing was initialized.
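
A minimal sketch of that kind of guard, assuming a module-level cache that is filled on first use. The names _db, _parse_database and parse_count are hypothetical, and this is not necessarily how the attached patch or the eventual fix is written.

```python
import os.path

_db = None        # hypothetical cache, filled on first use
parse_count = 0   # counts how often the "database" is parsed


def _parse_database():
    """Stand-in for the expensive parsing of the mimetype files."""
    global parse_count
    parse_count += 1
    return {".txt": "text/plain", ".html": "text/html"}


def init():
    global _db
    _db = _parse_database()


def guess_type(url):
    # Initialize only once; the public name itself is never rebound,
    # so "from module import guess_type" stays cheap as well.
    if _db is None:
        init()
    return _db.get(os.path.splitext(url)[1])


for _ in range(5):
    result = guess_type("index.html")

print(result, parse_count)
```

Because the function checks state instead of replacing itself, it does not matter how callers imported it.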

If this is intended behaviour it should be documented, but I doubt that
this is a good idea, as people don't read documentation if stuff seems to
work.

And Google tells me I'm not the first one who invoked guess_type that
way: http://google.com/codesearch?q="from+mimetypes+import+guess_type"
msg82993 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-03-01 23:53
Wah, that's really a horrible way to implement this caching.
msg82997 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-03-02 03:35
Well, that was embarrassing! Fixed in r70086.
History
Date User Action Args
2022-04-11 14:56:46 admin set github: 49651
2009-03-02 03:35:37 benjamin.peterson set status: open -> closed
    keywords: patch, patch, easy, needs review
    messages: + msg82997
    resolution: fixed
    nosy: + benjamin.peterson
2009-03-01 23:53:11 georg.brandl set keywords: patch, patch, easy, needs review
    nosy: + georg.brandl
    messages: + msg82993
2009-03-01 23:51:14 aronacher set keywords: patch, patch, easy, needs review
    title: mimetypes.MAGIC_FUNCTION implementation clusterfuck -> mimetypes.MAGIC_FUNCTION performance problems
2009-03-01 23:48:00 aronacher create