Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dict: perturb shift should be done when first conflict #72388

Closed
methane opened this issue Sep 19, 2016 · 14 comments
Closed

dict: perturb shift should be done when first conflict #72388

methane opened this issue Sep 19, 2016 · 14 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane
Copy link
Member

methane commented Sep 19, 2016

BPO 28201
Nosy @tim-one, @rhettinger, @methane, @serhiy-storchaka, @MojoVampire
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • dict-perturb-shift.patch
  • dict-perturb-shift.patch: smaller diff
  • dict-perturb-shift2.patch
  • dict-perturb-shift3.patch
  • dict-perturb-shift4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/methane'
    closed_at = <Date 2016-10-06.06:38:04.210>
    created_at = <Date 2016-09-19.04:25:13.086>
    labels = ['interpreter-core', 'performance']
    title = 'dict: perturb shift should be done when first conflict'
    updated_at = <Date 2017-03-31.16:36:39.107>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:39.107>
    actor = 'dstufft'
    assignee = 'methane'
    closed = True
    closed_date = <Date 2016-10-06.06:38:04.210>
    closer = 'methane'
    components = ['Interpreter Core']
    creation = <Date 2016-09-19.04:25:13.086>
    creator = 'methane'
    dependencies = []
    files = ['44759', '44774', '44966', '44980', '44981']
    hgrepos = []
    issue_num = 28201
    keywords = ['patch']
    message_count = 14.0
    messages = ['276936', '276937', '276943', '277088', '277089', '277105', '277708', '278101', '278170', '278171', '278172', '278175', '278176', '278177']
    nosy_count = 6.0
    nosy_names = ['tim.peters', 'rhettinger', 'methane', 'python-dev', 'serhiy.storchaka', 'josh.r']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28201'
    versions = ['Python 3.6']

    @methane
    Copy link
    Member Author

    methane commented Sep 19, 2016

    Current perturb shift code is like following:

        for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
            i = mask & ((i << 2) + i + perturb + 1);

    This loop is start after first conflict. It means perturb == hash for first conflict.

    The purpose of perturb shift is avoid long conflict chain when keys more
    than two have hashes their lower-bit is same. So first perturb should be hash >> PERTURB_SHIFT.

    example: Consider about ma_keys == 16 and keys are [1, 17, 33, 49, 65].
    Current perturb

    1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
    2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 17 + 1) == 5; use ix==5 slot.
    3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 33 + 1) == 5; ix==5 conflicts; ...

    When first perturb = hash >> PERTURB_SHIFT:

    1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
    2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (17>>5) + 1) == 4; use ix==4 slot.
    3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (33>>5) + 1) == 5; use ix==5 slot.

    While it's difficult to see performance difference from benchmark, this should be decrease possibility of 2nd conflict.

    @methane methane added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 19, 2016
    @tim-one
    Copy link
    Member

    tim-one commented Sep 19, 2016

    Good catch! I agree - and I wrote this code to begin with, so my opinion should count ;-)

    @rhettinger
    Copy link
    Contributor

    I agree and my opinion counts even more because I long ago made this change for setobjects ;-)

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Sep 21, 2016

    General comment on the patch: I believe per PEP-7, we're still sticking to ANSI C (aka C89), and specifically, "all declarations must be at the top of a block (not necessarily at the top of function".

    The patch assumes lax standards compliance (or C99+ compliance), declaring variables in the for loop initializer section and midway through blocks. This should be changed to declare all variables at the top of blocks, and not in for loop initializer sections.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Sep 21, 2016

    Removing those unrelated changes looks like it would dramatically reduce the churn too, making review easier.

    @methane
    Copy link
    Member Author

    methane commented Sep 21, 2016

    josh.r:

    I believe per PEP-7, we're still sticking to ANSI C (aka C89), and specifically, "all declarations must be at the top of a block (not necessarily at the top of function".

    Python 3.6 branch allows some C99 features.
    https://www.python.org/dev/peps/pep-0007/#c-dialect

    Removing those unrelated changes looks like it would dramatically reduce the churn too, making review easier.

    While I think refactoring around changes is practical [1], I agree that I made too much change. I'll reduce diff size.

    [1] Changing without refactoring is hard sometimes. Whole file refactoring
    without actual change may cause many conflicts against other patches.

    @methane
    Copy link
    Member Author

    methane commented Sep 29, 2016

    Could anyone review the patch?
    I'm starting to bpo-28199. But it must conflict with this patch.

    @methane methane self-assigned this Sep 29, 2016
    @methane methane added the performance Performance or resource usage label Oct 1, 2016
    @methane
    Copy link
    Member Author

    methane commented Oct 5, 2016

    Fixed conflict with current 3.6 branch, and added NEWS entry.

    @methane
    Copy link
    Member Author

    methane commented Oct 6, 2016

    While I think this patch is safe, I need LGTM from another committer
    because I'm new committer.

    Could Tim or Raymond review the patch before 3.6beta2?

    @tim-one
    Copy link
    Member

    tim-one commented Oct 6, 2016

    LGTM! Ship it :-)

    @serhiy-storchaka
    Copy link
    Member

    Added comments on Rietveld.

    @serhiy-storchaka
    Copy link
    Member

    LGTM. But why you moved the declaration of perturb?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 6, 2016

    New changeset cf2778fd7acb by INADA Naoki in branch '3.6':
    Issue bpo-28201: Dict reduces possibility of 2nd conflict in hash table.
    https://hg.python.org/cpython/rev/cf2778fd7acb

    New changeset 80b01cd94a63 by INADA Naoki in branch 'default':
    Issue bpo-28201: Dict reduces possibility of 2nd conflict in hash table.
    https://hg.python.org/cpython/rev/80b01cd94a63

    @methane
    Copy link
    Member Author

    methane commented Oct 6, 2016

    Thank you, Tim and Serhiy. My first commit has been pushed now!

    Serhiy:
    Since I prefer putting variable declaration near it's usage, and
    PEP-7 permits it since Python 3.6.

    @methane methane closed this as completed Oct 6, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants