Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline integer division #49762

Closed
mdickinson opened this issue Mar 18, 2009 · 16 comments
Closed

Streamline integer division #49762

mdickinson opened this issue Mar 18, 2009 · 16 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@mdickinson
Copy link
Member

BPO 5512
Nosy @mdickinson, @pitrou, @vstinner
Files
  • faster_integer_division.patch
  • pidigits_bestof.py: Integer benchmark
  • pidigits-2.py
  • faster_integer_division2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2009-03-23.20:04:49.137>
    created_at = <Date 2009-03-18.22:11:26.021>
    labels = ['interpreter-core', 'performance']
    title = 'Streamline integer division'
    updated_at = <Date 2009-03-24.09:11:59.959>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2009-03-24.09:11:59.959>
    actor = 'vstinner'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2009-03-23.20:04:49.137>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2009-03-18.22:11:26.021>
    creator = 'mark.dickinson'
    dependencies = []
    files = ['13368', '13370', '13371', '13391']
    hgrepos = []
    issue_num = 5512
    keywords = ['patch']
    message_count = 16.0
    messages = ['83781', '83782', '83785', '83786', '83788', '83789', '83790', '83791', '83793', '83794', '83795', '83953', '84027', '84066', '84067', '84068']
    nosy_count = 3.0
    nosy_names = ['mark.dickinson', 'pitrou', 'vstinner']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue5512'
    versions = ['Python 3.1', 'Python 2.7']

    @mdickinson
    Copy link
    Member Author

    Here's a patch that streamlines the x_divrem function in
    Objects/longobject.c. More benchmarks are needed, but the initial
    results are promising: for py3k, on a 32-bit machine with 15-bit
    digits, Victor's pidigits benchmark runs almost twice as fast with this
    patch (numbers below).

    Patch details
    =============

    • Normalize inputs by shifting instead of multiplying and dividing.
      This halves the number of divisions used in the algorithm.

    • Streamline innermost loop.

    • Save an iteration of outer loop around half the time.

    • Save an object allocation: only 3 allocations per x_divrem call
      instead of 4.

    • Remove special case where initial quotient estimate is >= PyLong_BASE.
      There's no need for this, since the 'digit' type holds values up
      to 2*PyLong_BASE - 1.

    • Make q type digit instead of type twodigits: this halves the size
      of the multiplication in the innermost loop.

    Benchmark results
    =================

    Using the pidigits_bestof.py script that's posted in the bpo-4258
    discussion, on a non-debug build of py3k (r70465), on OS X 10.5.6/Core 2
    Duo:

    Unpatched
    ---------
    Macintosh-3:py3k dickinsm$ ./python.exe ../pidigits_bestof.py 2000
    performing a warm up run...
    running
    sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)
    Time; 2234.6 ms
    Time; 2232.2 ms
    Time; 2227.9 ms
    Time; 2225.7 ms
    Time; 2229.8 ms
    Best Time; 2225.7 ms

    Patched
    -------
    dickinsm$ ./python.exe ../pidigits_bestof.py 2000
    performing a warm up run...
    running
    sys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)
    Time; 1175.6 ms
    Time; 1176.5 ms
    Time; 1177.3 ms
    Time; 1179.5 ms
    Time; 1168.5 ms
    Best Time; 1168.5 ms

    So the patch gives a speedup of around 90%. This particular benchmark
    is heavy on the divisions though, so I'd expect lesser speedups for
    other benchmarks.

    @mdickinson mdickinson self-assigned this Mar 18, 2009
    @mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Mar 18, 2009
    @vstinner
    Copy link
    Member

    Would it be possible to include integer benchmark tools to upstream? I
    mean the "pidigit" tool (I didn't wrote this test! I just ported it to
    Python3) and maybe my "bench_int" tool. It could useful to reproduce
    the benchmark on different Python versions and different computers. We
    may open a new issue for each tool?

    @mdickinson
    Copy link
    Member Author

    Sounds good to me! An integer benchmark script in the Tools/ directory
    would be handy.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 18, 2009

    Well, the original code in pidigits is by me :-) (although it's not
    stated in the source file).
    I haven't tried the patch but the performance work you're doing is
    really impressive, Mark!

    @pitrou
    Copy link
    Member

    pitrou commented Mar 18, 2009

    Results here (Athlon X2 3600+ with gcc in 64-bit mode with 30-bit
    digits), "pidigits_bestof.py 2000":

    • unpatched: 1644.9 ms
    • patched: 694.8 ms
      !

    @vstinner
    Copy link
    Member

    Useful informations for an integer benchmark:

    • CPU type: 32 or 64 bits, endian
    • Memory size
    • Python version
    • Informations about how Python was compiled
    • Integer implementation: use int_info when it's available, otherwise
      base=2^15

    Who has the last version of pidigit tool? Can you attach it here? :-)

    @mdickinson
    Copy link
    Member Author

    [Antoine]

    Well, the original code in pidigits is by me :-)

    Apologies for the misattribution!

    @mdickinson
    Copy link
    Member Author

    Here's the version of pidigits that I was using.

    @vstinner
    Copy link
    Member

    New version of pidigit:

    • works on python 2.6, trunk and py3k
    • display Python version, CPU info (bits, endian), PyLong info (base,
      digit size)
    • display usage if no argument is given

    @vstinner
    Copy link
    Member

    Oh, there is no version number of the benchmark tool itself!

    @vstinner
    Copy link
    Member

    My numbers:

    # No patch
    $ ./python /home/haypo/pidigits-2.py 3000
    Python 3.1a1+ (py3k:70466, Mar 18 2009, 23:56:06)
    [GCC 4.3.2]
    CPU: 64 bits, little endian
    PyLong: base=2^30, sizeof(digit)=32 bits
    (...)
    Best Time; 2300.2 ms

    # With faster_integer_division.patch
    (...)
    Best Time; 1138.1 ms

    Ok, it's two times faster on 64 bits CPU!!!

    Other notes about pidigits:

    • missing header (no author name, no description)
    • pidigit should also write the number of digits in its output to
      avoid mistakes in comparaisons

    @mdickinson
    Copy link
    Member Author

    Updated patch. Lots of cleanup, but only one significant change: the
    inner loop now uses signed arithmetic instead of unsigned arithmetic. This saves a negation and fixes a subtle bug: the previous inner loop code
    was incorrect when using 15-bit digits on machines with sizeof(short) ==
    sizeof(long). Not that I know of any such machines: the Cray T3E
    famously has no 16-bit integer type, but there sizeof(short) is 4 and
    sizeof(long) is 8.

    A few more timings, this time from doing a single huge integer division:
    10**1000000//10**500000 (so this effectively times just the inner loop,
    since all else will be insignificant). All timings are best-of-5, from
    non-debug builds of py3k, on the same system: OS X 10.5.6/2.4 GHz Core 2
    Duo. Times in brackets are the approximate per-inner-loop times
    (remembering that there are 4 times as many iterations of the inner loop
    for 15-bit digits).

    32-bit build, 15-bit digits, unpatched: 92382.2 ms (~7.5 ns)
    32-bit build, 15-bit digits, patched: 36473.3 ms (~3.0 ns)
    64-bit build, 30-bit digits, unpatched: 14581.4 ms (~4.8 ns)
    64-bit build, 30-bit digits, patched: 7385.1 ms (~2.4 ns)

    ... and just for fun, the other combinations:

    64-bit build, 15-bit digits, unpatched: 61927.5 ms (~5.1 ns)
    64-bit build, 15-bit digits, patched: 43632.9 ms (~3.6 ns)
    32-bit build, 30-bit digits, unpatched: 62374.1 ms (~20.3 ns)
    32-bit build, 30-bit digits, patched: 26928.3 ms (~8.8 ns)

    Thanks for the updated pidigits script, Victor! Maybe this is too small
    right now to be worth including in the Tools directory, but I hope we can
    fatten it up with some other benchmarks. What do you think?

    @mdickinson
    Copy link
    Member Author

    Applied, r70542 and r70547.

    @vstinner
    Copy link
    Member

    Would it be possible to port the patch to python trunk?

    @mdickinson
    Copy link
    Member Author

    Hi Victor! I already applied the x_divrem patch to the trunk and to py3k.
    Did you mean the release branch?

    I'd prefer not to backport this to 2.6 or 3.0: it's not a new feature,
    but it's not a bugfix either and there's always some risk of breakage...

    @vstinner
    Copy link
    Member

    I already applied the x_divrem patch to the trunk and to py3k

    Oops! I taught that you applied it to py3k and release30-maint :-/

    It's ok, trunk and py3k is enough ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants