classification
Title: Make SimpleHTTPRequestHandler load mimetypes lazily
Type: performance Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: danishprakash, paul.moore, pmpp, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: easy, patch

Created on 2018-11-21 17:41 by steve.dower, last changed 2018-12-08 23:55 by steve.dower.

Pull Requests
URL Status Linked Edit
PR 11036 open onyb, 2018-12-08 11:36
Messages (12)
msg330212 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-21 17:41
Importing http.server triggers mimetypes.init(), which can be fairly expensive on Windows (and potentially other platforms?) due to having to enumerate a lot of registry keys.

Instead, SimpleHTTPRequestHandler.extensions_map can be a dict with just its default values in lib/http/server.py and the first call to guess_type() can initialize mimetypes if necessary.

We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change. My gut feeling is we will be okay to make this in the next release but probably shouldn't backport it.
msg330292 - (view) Author: Danish Prakash (danishprakash) * Date: 2018-11-23 02:52
I would like to work on this if you have not already started, Steve. 

> We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change.

Are you implying we make the change based on whether someone use SimpleHTTPRequestHandler.extensions_map before calling guess_type()? Could you please elaborate that a bit, thanks.
msg330322 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-23 13:21
You're welcome to it - I deliberately left it for someone else to work on, though I'm happy to review and merge.

The visible change of making this lazy is that if someone reads from the dict, it'll be missing system-specified content types (until we lazily fill it in guess_type() ). Nobody is meant to read from it - it's public for people to *add* custom overrides - so this should be okay, but a little bit of GitHub research to find out if anyone does wouldn't hurt. Then we can at least warn them that it's changing before they run into their own bug.
msg331381 - (view) Author: pmpp (pmpp) * Date: 2018-12-08 12:05
> and potentially other platforms?
strangely, it does not.


but adding a blocking read on first requests may ruin profiling data on any platform.

isn't http.server reserved for instrumentation and testing ?
msg331386 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-12-08 15:17
Instrumentation is not the same thing as profiling, and this one is certainly not intended for performance. That said, simply importing it shouldn't impact anyone else's performance either, and since six imports this, fixing it will have a pretty broad impact.

Thanks for the PR - I'll look when I get a chance, but most of my time is going towards the next 3.7 release right now. Feel free to ping me on here in a week if nobody has had a look.
msg331392 - (view) Author: pmpp (pmpp) * Date: 2018-12-08 21:20
??? i never compared Instrumentation and profiling

and what getting delta timing about the script behind the first http request has to do with performance ? <= that's actually another question 

thefirst was : isn't http.server reserved for instrumentation and testing (because lazy loading seems a production feature to me ) ?
msg331394 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-12-08 22:13
If you weren't conflating profiling and instrumentation then your original comment makes no sense.
msg331395 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-12-08 22:17
(Sorry, page submitted while I was still typing for some reason.)

... Instrumentation for everything other than time is unaffected, so if you didn't mean to conflate them, then no there is no impact on the documented use.

Yes, moving mimetypes.init to the first request will make the first request slower. However, it makes *import* faster, and there should not be a significant penalty just for importing this module.

You can trivially front-load the work in your app by calling mimetypes.init yourself, which will reduce the overhead on first request to a dict copy (I assume, haven't looked at the PR yet). Or just don't measure the first request - it's typical to do some warm-up loops when profiling.
msg331396 - (view) Author: pmpp (pmpp) * Date: 2018-12-08 22:21
the PR as it is actually add a blocking read (and maybe exceptions or whatever mimetypes modules decide to do) in processing of the first request to server, which has nothing to do with the code serving that request.

I agree that makes no sense at all.
msg331398 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-12-08 23:31
> I agree that makes no sense at all.

I have to admit I have no idea what you're trying to argue or suggesting as an alternative. Could you try clearly explaining the change you'd rather see to fix the issue?
msg331399 - (view) Author: pmpp (pmpp) * Date: 2018-12-08 23:45
sorry i was on my free time when enumerating profiling/instrumentation and testing. given now you pay attention i'll try to explain more.

instrumentation : what does that server actually support ? eg writing a test for a specific mimitype support eg https://bugs.python.org/issue35403 

profiling: i just want to time the first request to a service, ie without learning all the art of warming up a *one liner* http server with bare hands.

testing: i want to test the servicing code, not receive error because mimetypes module may have failed late and server should not have started *at all* since it is a critical component of it.




how to fix the issue ? as i said earlier, strangely "other platforms" don't have that issue so maybe the problem is in mimetypes module. 

So my position is "keep it that way" and investigate the slowdown at import on concerned platform.
msg331400 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-12-08 23:55
So, you're suggesting doing a lazy mimetypes.init() on Windows and not on the others? There's no reason to have such a semantic difference between platforms, and since the whole point of mimetypes.init() is to avoid initializing on import, it makes the most sense to avoid initializing on import.

In fact, it probably makes the *most* sense for http.server to have its own overrides, but otherwise fall back to the mimetypes function when needed. Bypassing the public API doesn't really add anything other than fragility.
History
Date User Action Args
2018-12-08 23:55:35steve.dowersetmessages: + msg331400
2018-12-08 23:45:16pmppsetmessages: + msg331399
2018-12-08 23:31:52steve.dowersetmessages: + msg331398
2018-12-08 22:21:19pmppsetmessages: + msg331396
2018-12-08 22:17:16steve.dowersetmessages: + msg331395
2018-12-08 22:13:48steve.dowersetmessages: + msg331394
2018-12-08 21:20:39pmppsetmessages: + msg331392
2018-12-08 15:17:50steve.dowersetmessages: + msg331386
2018-12-08 12:05:57pmppsetnosy: + pmpp
messages: + msg331381
2018-12-08 11:36:16onybsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request10273
2018-11-23 13:21:11steve.dowersetmessages: + msg330322
2018-11-23 02:52:38danishprakashsetnosy: + danishprakash
messages: + msg330292
2018-11-21 17:41:40steve.dowercreate