Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SimpleHTTPRequestHandler load mimetypes lazily #79473

Closed
zooba opened this issue Nov 21, 2018 · 15 comments
Closed

Make SimpleHTTPRequestHandler load mimetypes lazily #79473

zooba opened this issue Nov 21, 2018 · 15 comments
Labels
3.9 only security fixes easy OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@zooba
Copy link
Member

zooba commented Nov 21, 2018

BPO 35292
Nosy @pfmoore, @tjguk, @pmp-p, @zware, @zooba, @demianbrecht, @danishprakash
PRs
  • bpo-35292: Lazily inject system mime.types into SimpleHTTPRequestHandler.extensions_map #11036
  • bpo-35292: Avoid call mimetypes.init when http.server is imported #17822
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-08.18:31:52.576>
    created_at = <Date 2018-11-21.17:41:40.354>
    labels = ['easy', 'library', '3.9', 'OS-windows', 'performance']
    title = 'Make SimpleHTTPRequestHandler load mimetypes lazily'
    updated_at = <Date 2020-01-08.18:31:52.575>
    user = 'https://github.com/zooba'

    bugs.python.org fields:

    activity = <Date 2020-01-08.18:31:52.575>
    actor = 'steve.dower'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-08.18:31:52.576>
    closer = 'steve.dower'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2018-11-21.17:41:40.354>
    creator = 'steve.dower'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35292
    keywords = ['patch', 'easy']
    message_count = 15.0
    messages = ['330212', '330292', '330322', '331381', '331386', '331392', '331394', '331395', '331396', '331398', '331399', '331400', '331739', '359621', '359622']
    nosy_count = 7.0
    nosy_names = ['paul.moore', 'tim.golden', 'pmpp', 'zach.ware', 'steve.dower', 'demian.brecht', 'danishprakash']
    pr_nums = ['11036', '17822']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue35292'
    versions = ['Python 3.9']

    @zooba
    Copy link
    Member Author

    zooba commented Nov 21, 2018

    Importing http.server triggers mimetypes.init(), which can be fairly expensive on Windows (and potentially other platforms?) due to having to enumerate a lot of registry keys.

    Instead, SimpleHTTPRequestHandler.extensions_map can be a dict with just its default values in lib/http/server.py and the first call to guess_type() can initialize mimetypes if necessary.

    We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change. My gut feeling is we will be okay to make this in the next release but probably shouldn't backport it.

    @zooba zooba added 3.8 only security fixes stdlib Python modules in the Lib dir OS-windows easy performance Performance or resource usage labels Nov 21, 2018
    @danishprakash
    Copy link
    Mannequin

    danishprakash mannequin commented Nov 23, 2018

    I would like to work on this if you have not already started, Steve.

    We should check whether people read from SimpleHTTPRequestHandler.extensions_map directly before calling guess_type(), and decide how quickly we can make the change.

    Are you implying we make the change based on whether someone use SimpleHTTPRequestHandler.extensions_map before calling guess_type()? Could you please elaborate that a bit, thanks.

    @zooba
    Copy link
    Member Author

    zooba commented Nov 23, 2018

    You're welcome to it - I deliberately left it for someone else to work on, though I'm happy to review and merge.

    The visible change of making this lazy is that if someone reads from the dict, it'll be missing system-specified content types (until we lazily fill it in guess_type() ). Nobody is meant to read from it - it's public for people to *add* custom overrides - so this should be okay, but a little bit of GitHub research to find out if anyone does wouldn't hurt. Then we can at least warn them that it's changing before they run into their own bug.

    @pmp-p
    Copy link
    Mannequin

    pmp-p mannequin commented Dec 8, 2018

    and potentially other platforms?
    strangely, it does not.

    but adding a blocking read on first requests may ruin profiling data on any platform.

    isn't http.server reserved for instrumentation and testing ?

    @zooba
    Copy link
    Member Author

    zooba commented Dec 8, 2018

    Instrumentation is not the same thing as profiling, and this one is certainly not intended for performance. That said, simply importing it shouldn't impact anyone else's performance either, and since six imports this, fixing it will have a pretty broad impact.

    Thanks for the PR - I'll look when I get a chance, but most of my time is going towards the next 3.7 release right now. Feel free to ping me on here in a week if nobody has had a look.

    @pmp-p
    Copy link
    Mannequin

    pmp-p mannequin commented Dec 8, 2018

    ??? i never compared Instrumentation and profiling

    and what getting delta timing about the script behind the first http request has to do with performance ? <= that's actually another question

    thefirst was : isn't http.server reserved for instrumentation and testing (because lazy loading seems a production feature to me ) ?

    @zooba
    Copy link
    Member Author

    zooba commented Dec 8, 2018

    If you weren't conflating profiling and instrumentation then your original comment makes no sense.

    @zooba
    Copy link
    Member Author

    zooba commented Dec 8, 2018

    (Sorry, page submitted while I was still typing for some reason.)

    ... Instrumentation for everything other than time is unaffected, so if you didn't mean to conflate them, then no there is no impact on the documented use.

    Yes, moving mimetypes.init to the first request will make the first request slower. However, it makes *import* faster, and there should not be a significant penalty just for importing this module.

    You can trivially front-load the work in your app by calling mimetypes.init yourself, which will reduce the overhead on first request to a dict copy (I assume, haven't looked at the PR yet). Or just don't measure the first request - it's typical to do some warm-up loops when profiling.

    @pmp-p
    Copy link
    Mannequin

    pmp-p mannequin commented Dec 8, 2018

    the PR as it is actually add a blocking read (and maybe exceptions or whatever mimetypes modules decide to do) in processing of the first request to server, which has nothing to do with the code serving that request.

    I agree that makes no sense at all.

    @zooba
    Copy link
    Member Author

    zooba commented Dec 8, 2018

    I agree that makes no sense at all.

    I have to admit I have no idea what you're trying to argue or suggesting as an alternative. Could you try clearly explaining the change you'd rather see to fix the issue?

    @pmp-p
    Copy link
    Mannequin

    pmp-p mannequin commented Dec 8, 2018

    sorry i was on my free time when enumerating profiling/instrumentation and testing. given now you pay attention i'll try to explain more.

    instrumentation : what does that server actually support ? eg writing a test for a specific mimitype support eg https://bugs.python.org/issue35403

    profiling: i just want to time the first request to a service, ie without learning all the art of warming up a *one liner* http server with bare hands.

    testing: i want to test the servicing code, not receive error because mimetypes module may have failed late and server should not have started *at all* since it is a critical component of it.

    how to fix the issue ? as i said earlier, strangely "other platforms" don't have that issue so maybe the problem is in mimetypes module.

    So my position is "keep it that way" and investigate the slowdown at import on concerned platform.

    @zooba
    Copy link
    Member Author

    zooba commented Dec 8, 2018

    So, you're suggesting doing a lazy mimetypes.init() on Windows and not on the others? There's no reason to have such a semantic difference between platforms, and since the whole point of mimetypes.init() is to avoid initializing on import, it makes the most sense to avoid initializing on import.

    In fact, it probably makes the *most* sense for http.server to have its own overrides, but otherwise fall back to the mimetypes function when needed. Bypassing the public API doesn't really add anything other than fragility.

    @zooba
    Copy link
    Member Author

    zooba commented Dec 13, 2018

    I have another idea - what if we checked self.extensions_map for overrides but otherwise just called mimetypes.guess_type(path) in the guess_type method? That would be the same behaviour as initializing it lazily, but we don't have to worry about initializing at all, since guess_type() will do it.

    I also did a bit of a GitHub search (which I generally criticize, but it's a pretty good way to evaluate whether something is in _common_ use). Mostly it looks like code will be unaffected by this change - people set values in extensions_map, but don't expect to be able to read global values out of it.

    @zooba
    Copy link
    Member Author

    zooba commented Jan 8, 2020

    New changeset 5907e61 by Steve Dower (An Long) in branch 'master':
    bpo-35292: Avoid calling mimetypes.init when http.server is imported (GH-17822)
    5907e61

    @zooba
    Copy link
    Member Author

    zooba commented Jan 8, 2020

    Thanks for the patch!

    @zooba zooba added 3.9 only security fixes and removed 3.8 only security fixes labels Jan 8, 2020
    @zooba zooba closed this as completed Jan 8, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    facebook-github-bot pushed a commit to facebook/sapling that referenced this issue Jul 19, 2022
    Summary: Python 3.9 has a fix (python/cpython#79473) for a slow mimetype import. This fix saves 300ms on Python startup for Mercurial.
    
    Reviewed By: jordanwebster
    
    Differential Revision: D37890474
    
    fbshipit-source-id: 04da3d3963ba53514b2fa71dd7f1c255740dc9c1
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes easy OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant