
asyncio uses too many threads by default #79460

Closed

VojtchBoek mannequin opened this issue Nov 19, 2018 · 9 comments
Labels
3.7 (EOL) end of life performance Performance or resource usage topic-asyncio

Comments

@VojtchBoek
Mannequin

VojtchBoek mannequin commented Nov 19, 2018

BPO 35279
Nosy @asvetlov, @methane, @1st1, @EpicWink
PRs
  • bpo-35279: reduce default max_workers of ThreadPoolExecutor #13618
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-28.12:03:20.311>
    created_at = <Date 2018-11-19.14:26:55.537>
    labels = ['expert-asyncio', '3.7', 'performance']
    title = 'asyncio uses too many threads by default'
    updated_at = <Date 2019-05-28.12:03:20.311>
    user = 'https://bugs.python.org/VojtchBoek'

    bugs.python.org fields:

    activity = <Date 2019-05-28.12:03:20.311>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-28.12:03:20.311>
    closer = 'methane'
    components = ['asyncio']
    creation = <Date 2018-11-19.14:26:55.537>
    creator = 'Vojtěch Boček'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35279
    keywords = ['patch']
    message_count = 9.0
    messages = ['330101', '340013', '340015', '343688', '343733', '343737', '343740', '343743', '343766']
    nosy_count = 5.0
    nosy_names = ['asvetlov', 'methane', 'yselivanov', 'Vojtěch Boček', 'Epic_Wink']
    pr_nums = ['13618']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue35279'
    versions = ['Python 3.7']

    @VojtchBoek
    Mannequin Author

    VojtchBoek mannequin commented Nov 19, 2018

    By default, asyncio spawns as many as os.cpu_count() * 5 threads to run I/O on. On beefy machines (e.g. Kubernetes servers) with, say, 56 cores, this results in very high memory usage.

    This is amplified by the fact that the concurrent.futures.ThreadPoolExecutor threads are never killed, and are not re-used until max_workers threads are spawned.

    Workaround:

        loop.set_default_executor(concurrent.futures.ThreadPoolExecutor(max_workers=8))

    This is still not ideal, as the program might not need max_workers threads, but they get spawned anyway.

    I hit this issue when running an asyncio program in Kubernetes. It created 260 idle threads and then ran out of memory.

    I think the default max_workers should be capped at some maximum value, and ThreadPoolExecutor should not spawn new threads unless necessary.
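    For context, here is a minimal sketch of the workaround applied end to end; the main coroutine and the cap of 8 workers are illustrative, not part of the report:

        import asyncio
        import concurrent.futures

        async def main():
            loop = asyncio.get_running_loop()
            # Replace the default executor (cpu_count() * 5 workers on 3.7)
            # with a bounded pool.
            loop.set_default_executor(
                concurrent.futures.ThreadPoolExecutor(max_workers=8))
            # Blocking calls offloaded via run_in_executor(None, ...) now
            # use at most 8 threads.
            result = await loop.run_in_executor(None, sum, range(1000))
            print(result)

        asyncio.run(main())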

    @VojtchBoek VojtchBoek mannequin added 3.7 (EOL) end of life topic-asyncio performance Performance or resource usage labels Nov 19, 2018
    @EpicWink
    Mannequin

    EpicWink mannequin commented Apr 12, 2019

    What about making it dependent on memory as well as logical processor count:

    n_workers = min(RAM_GB / some_number, N_CORES * 5)
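    A rough sketch of that heuristic; RAM is read via os.sysconf (POSIX-only), and GB_PER_WORKER is a made-up tuning constant:

        import os

        GB_PER_WORKER = 0.5  # hypothetical: budget half a GiB per worker thread

        def suggested_workers():
            # Total physical RAM in GiB (these sysconf names exist on Linux;
            # other platforms may not provide them).
            ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
            ram_gib = ram_bytes / (1024 ** 3)
            n_cores = os.cpu_count() or 1
            return max(1, min(int(ram_gib / GB_PER_WORKER), n_cores * 5))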

    @methane
    Member

    methane commented Apr 12, 2019

    node.js's default threadpool size is 4 regardless of the number of cores.
    https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size

    Since we have the GIL, I think a fixed-size pool is a better idea.

    @asvetlov
    Contributor

    asyncio uses a bare concurrent.futures.ThreadPoolExecutor.
    The question is: should asyncio reduce the number of threads in the pool, or should concurrent.futures change the default value?

    @methane
    Member

    methane commented May 28, 2019

    The current default value was decided here:
    https://bugs.python.org/review/21527/#ps11902

    It seems there was no strong reason for the current cpu_count * 5.
    I think cpu_count + 4 is a better default value, because:

    • When people use the thread pool for CPU-heavy jobs that release the GIL, workers >= cpu_count is good.
    • When people use the thread pool for multiplexing I/O, the best number of workers varies with the workload, but I think 4–16 is good for typical cases.

    This is amplified by the fact that the concurrent.futures.ThreadPoolExecutor threads are never killed, and are not re-used until max_workers threads are spawned.

    Note that this is fixed by bpo-24882.
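    A quick way to observe the bpo-24882 behaviour (a sketch; counting threads via threading.active_count() is just for illustration):

        import concurrent.futures
        import threading
        import time

        executor = concurrent.futures.ThreadPoolExecutor(max_workers=32)

        # Submit tasks one at a time, waiting for each to finish. With the
        # bpo-24882 fix (Python 3.8+), the idle worker is reused instead of a
        # new thread being spawned on every submission, so the count stays flat.
        for _ in range(5):
            executor.submit(time.sleep, 0.01).result()
            print("threads alive:", threading.active_count())

        executor.shutdown()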

    @asvetlov
    Contributor

    I'm ok with changing the default limit on the number of threads.
    Not sure about the numbers.
    If you want to limit it to 16-20 that may be ok, but cpu_count + 4 doesn't work in that case. On cloud servers I see 128 or even more cores very often; 160 + 4 is surely not what you want to propose.

    I insist on changing the default calculation scheme in concurrent.futures, not in asyncio. There is no reason for asyncio to be exceptional.

    @methane
    Member

    methane commented May 28, 2019

    If you want to limit it to 16-20 that may be ok, but cpu_count + 4 doesn't work in that case. On cloud servers I see 128 or even more cores very often; 160 + 4 is surely not what you want to propose.

    I proposed cpu_count + 4 because bpo-24882 almost fixed the problem of a large max_workers.
    If you don't like it, how about min(32, cpu_count + 4)?

    I insist on changing the default calculation scheme in concurrent.futures, not in asyncio. There is no reason for asyncio to be exceptional.

    Makes sense.

    @asvetlov
    Contributor

    how about min(32, cpu_count + 4)?

    I think it produces reasonably good numbers for any CPU count.

    Do you have time for preparing a pull request?

    @methane
    Member

    methane commented May 28, 2019

    New changeset 9a7e5b1 by Inada Naoki in branch 'master':
    bpo-35279: reduce default max_workers of ThreadPoolExecutor (GH-13618)
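    For reference, the default introduced by this changeset amounts to roughly the following (a sketch; the real logic lives in ThreadPoolExecutor.__init__ in Lib/concurrent/futures/thread.py):

        import os

        # ThreadPoolExecutor default after GH-13618 (Python 3.8+):
        # at least 5 workers on small machines, capped at 32 on many-core hosts.
        default_max_workers = min(32, (os.cpu_count() or 1) + 4)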

    @methane methane closed this as completed May 28, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022