
micro-optimization of PyLong_FromSize_t() #81983

Closed
sir-sigurd mannequin opened this issue Aug 9, 2019 · 4 comments
Labels
3.9 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

sir-sigurd mannequin commented Aug 9, 2019

BPO 37802
Nosy @gpshead, @vstinner, @gnprice, @sir-sigurd
PRs
  • bpo-37802: Slightly improve perfomance of PyLong_FromSize_t() #15192
  • bpo-37802: Fix a compiler warning in longobject.c #16517
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-09-12.14:42:39.846>
    created_at = <Date 2019-08-09.13:27:40.684>
    labels = ['interpreter-core', '3.9', 'performance']
    title = 'micro-optimization of PyLong_FromSize_t()'
    updated_at = <Date 2019-10-01.11:29:56.281>
    user = 'https://github.com/sir-sigurd'

    bugs.python.org fields:

    activity = <Date 2019-10-01.11:29:56.281>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-09-12.14:42:39.846>
    closer = 'gregory.p.smith'
    components = ['Interpreter Core']
    creation = <Date 2019-08-09.13:27:40.684>
    creator = 'sir-sigurd'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37802
    keywords = ['patch']
    message_count = 4.0
    messages = ['349285', '350446', '352184', '353675']
    nosy_count = 4.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'Greg Price', 'sir-sigurd']
    pr_nums = ['15192', '16517']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue37802'
    versions = ['Python 3.9']

    sir-sigurd mannequin commented Aug 9, 2019

    Currently, PyLong_FromSize_t() uses PyLong_FromLong() for values < PyLong_BASE. This is suboptimal because PyLong_FromLong() has to handle the sign, which a size_t value never has. Removing the PyLong_FromLong() call and handling small ints directly in PyLong_FromSize_t() makes it faster:

    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 18.7 ns +- 0.3 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 16.7 ns +- 0.1 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 18.7 ns +- 0.3 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 16.7 ns +- 0.1 ns: 1.12x faster (-10%)
    
    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**10).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 26.2 ns +- 0.0 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.0 ns +- 0.7 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 26.2 ns +- 0.0 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.0 ns +- 0.7 ns: 1.05x faster (-5%)
    
    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**30).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 25.6 ns +- 0.1 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.6 ns +- 0.0 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 25.6 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.6 ns +- 0.0 ns: 1.00x faster (-0%)
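A minimal C sketch of the idea, using a hypothetical toy representation (base 2^30 digits, least significant first, with the sign stored in the digit count, loosely modeled on CPython's int layout). The names toy_long, toy_split_digits(), toy_from_long(), toy_from_size_t_old() and toy_from_size_t() are illustrative assumptions, not CPython functions, and the sketch omits CPython's small-int cache and object allocation. It contrasts routing small unsigned values through the signed constructor with storing the single digit directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a CPython-style int: base 2**30 digits,
   least significant first; `size` is the digit count, negated for
   negative values, and 0 for zero. This is not CPython's actual code. */
#define TOY_SHIFT 30
#define TOY_BASE ((uint64_t)1 << TOY_SHIFT)
#define TOY_MASK (TOY_BASE - 1)
#define TOY_MAX_DIGITS 3 /* 3 * 30 bits covers any 64-bit value */

typedef struct {
    int size;
    uint32_t digit[TOY_MAX_DIGITS];
} toy_long;

/* Shared slow path: split a non-negative value into base-2**30 digits. */
static toy_long toy_split_digits(uint64_t v)
{
    toy_long r = {0, {0}};
    int n = 0;
    while (v != 0) {
        assert(n < TOY_MAX_DIGITS);
        r.digit[n++] = (uint32_t)(v & TOY_MASK);
        v >>= TOY_SHIFT;
    }
    r.size = n;
    return r;
}

/* Signed constructor: has to compute |v| and apply the sign afterwards. */
static toy_long toy_from_long(long v)
{
    int negative = v < 0;
    /* Negate in unsigned arithmetic so LONG_MIN does not overflow. */
    unsigned long abs_v = negative ? 0UL - (unsigned long)v : (unsigned long)v;
    toy_long r;
    if (abs_v < TOY_BASE) {
        /* Single-digit fast path, but the sign must still be applied. */
        r = (toy_long){abs_v != 0, {(uint32_t)abs_v}};
    } else {
        r = toy_split_digits(abs_v);
    }
    if (negative) {
        r.size = -r.size;
    }
    return r;
}

/* Before: small unsigned values are routed through the signed constructor. */
static toy_long toy_from_size_t_old(size_t v)
{
    if (v < TOY_BASE) {
        return toy_from_long((long)v); /* safe: v < 2**30 fits in a long */
    }
    return toy_split_digits(v);
}

/* After: small values are stored directly; there is no sign to handle. */
static toy_long toy_from_size_t(size_t v)
{
    if (v < TOY_BASE) {
        toy_long r = {v != 0, {(uint32_t)v}};
        return r;
    }
    return toy_split_digits(v);
}
```

Both unsigned versions produce the same digits; in the single-digit case the only difference is that the new one never computes an absolute value or a sign.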

    With this change, PyLong_FromSize_t() is consistently faster than PyLong_FromSsize_t(), so it might make sense to use PyLong_FromSize_t() instead of PyLong_FromSsize_t() in __length_hint__() implementations and other cases where the value is known to be non-negative. For example:

    $ python -m perf timeit -s "_len = iter(bytes(2)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 19.4 ns +- 0.3 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 17.3 ns +- 0.1 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 19.4 ns +- 0.3 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 17.3 ns +- 0.1 ns: 1.12x faster (-11%)
    
    $ python -m perf timeit -s "_len = iter(bytes(2**10)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 26.3 ns +- 0.1 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.3 ns +- 0.2 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 26.3 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.3 ns +- 0.2 ns: 1.04x faster (-4%)
    
    $ python -m perf timeit -s "_len = iter(bytes(2**30)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 27.6 ns +- 0.1 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 26.0 ns +- 0.1 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 27.6 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 26.0 ns +- 0.1 ns: 1.06x faster (-6%)
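The suggested swap is only valid where the value can never be negative. A hypothetical sketch of that precondition for a __length_hint__()-style helper: toy_bytes_iter and toy_length_hint() are illustrative names, not CPython's bytes iterator, and ptrdiff_t stands in for Py_ssize_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator state, loosely modeled on a bytes iterator:
   `len` is the length of the underlying buffer and `index` the next
   position. Both are signed (like Py_ssize_t), but index <= len holds. */
typedef struct {
    ptrdiff_t len;
    ptrdiff_t index;
} toy_bytes_iter;

/* Remaining items as an unsigned count. Because the invariant guarantees
   a non-negative difference, converting to size_t loses nothing, which is
   what would make an unsigned constructor such as PyLong_FromSize_t()
   applicable to the result. */
static size_t toy_length_hint(const toy_bytes_iter *it)
{
    ptrdiff_t remaining = it->len - it->index;
    assert(remaining >= 0); /* the precondition the swap relies on */
    return (size_t)remaining;
}
```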

    sir-sigurd added the 3.9, interpreter-core, and performance labels Aug 9, 2019

    sir-sigurd mannequin commented Aug 25, 2019

    The previous benchmark results were obtained with a non-LTO build.

    Here are the results for an LTO build:

    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 0).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 14.9 ns +- 0.2 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 13.1 ns +- 0.5 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 14.9 ns +- 0.2 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 13.1 ns +- 0.5 ns: 1.13x faster (-12%)
    
    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**10).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 22.1 ns +- 0.1 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 20.9 ns +- 0.4 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 22.1 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 20.9 ns +- 0.4 ns: 1.05x faster (-5%)
    
    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**30).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 23.3 ns +- 0.0 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 21.6 ns +- 0.1 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 23.3 ns +- 0.0 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 21.6 ns +- 0.1 ns: 1.08x faster (-8%)
    
    $ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**60).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
    /home/sergey/tmp/cpython-master/venv/bin/python: ..................... 24.4 ns +- 0.1 ns
    /home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 22.7 ns +- 0.1 ns
    Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 24.4 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 22.7 ns +- 0.1 ns: 1.08x faster (-7%)

    gpshead commented Sep 12, 2019

    New changeset c6734ee by Gregory P. Smith (Sergey Fedoseev) in branch 'master':
    bpo-37802: Slightly improve perfomance of PyLong_FromUnsigned*() (GH-15192)

    gpshead closed this as completed Sep 12, 2019

    vstinner commented Oct 1, 2019

    New changeset 6314abc by Victor Stinner in branch 'master':
    bpo-37802: Fix a compiler warning in longobject.c (GH-16517)

    ezio-melotti transferred this issue from another repository Apr 10, 2022