Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save OrderedDict import in re #76519

Closed
serhiy-storchaka opened this issue Dec 15, 2017 · 10 comments
Closed

Save OrderedDict import in re #76519

serhiy-storchaka opened this issue Dec 15, 2017 · 10 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 32338
Nosy @rhettinger, @vstinner, @ezio-melotti, @methane, @serhiy-storchaka, @miss-islington
PRs
  • bpo-32338: OrderedDict import is no longer needed in re. #4891
  • bpo-32360: Remove OrderedDict from re module #5013
  • [3.7] bpo-32338: OrderedDict import is no longer needed in re. (GH-4891) #6073
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-03-11.07:06:10.801>
    created_at = <Date 2017-12-15.18:21:39.079>
    labels = ['expert-regex', 'type-feature', 'library', '3.7']
    title = 'Save OrderedDict import in re'
    updated_at = <Date 2018-03-11.07:06:10.799>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2018-03-11.07:06:10.799>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-03-11.07:06:10.801>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2017-12-15.18:21:39.079>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32338
    keywords = ['patch']
    message_count = 10.0
    messages = ['308418', '308516', '308524', '308525', '308526', '308527', '308529', '308534', '313583', '313584']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'vstinner', 'ezio.melotti', 'mrabarnett', 'methane', 'serhiy.storchaka', 'miss-islington']
    pr_nums = ['4891', '5013', '6073']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue32338'
    versions = ['Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    Since regular dicts are now ordered by default, the OrderedDict import is no longer necessary.

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement labels Dec 15, 2017
    @methane
    Copy link
    Member

    methane commented Dec 18, 2017

    Please don't do this.
    del d[next(iter(d))] is not O(1) on current dict implementation.
    OrderedDict is designed for such use cases. Please keep using it.

    @serhiy-storchaka
    Copy link
    Member Author

    This is surprising. But OrderedDict also can be O(N) here.

    Do you have benchmarking results Inada?

    @methane
    Copy link
    Member

    methane commented Dec 18, 2017

    This is surprising. But OrderedDict also can be O(N) here.

    Current dict implementation doesn't skip empty entries.
    So next(iter(D)) takes time.
    On the other hand, OrderedDict uses doubly-linked list to find first entry.

    (I fixed it in compact ODict branch, but it added one word to dict)

    Do you have benchmarking results Inada?

    $ ./python -m perf timeit -s 'cache={}' -- '
    for i in range(10000):
        if len(cache) > 512:
            del cache[next(iter(cache))]
        cache[i]=i
    '
    .....................
    Mean +- std dev: 6.81 ms +- 0.08 ms
    
    $ ./python -m perf timeit -s 'from collections import OrderedDict; cache=OrderedDict()' -- '
    for i in range(10000):
        if len(cache) > 512:
            cache.popitem(last=False)
        cache[i]=i
    '
    .....................
    Mean +- std dev: 3.75 ms +- 0.07 ms

    Performance difference is measurable even when N is only 512.

    Maybe, we can use hack similar to Python 3.5 had for O(1) popitem().
    When entries[0].key==NULL, (Py_ssize_t)entries[0].value can be index
    to first known non-empty entry.

    @methane
    Copy link
    Member

    methane commented Dec 18, 2017

    Hmm, 0.3 μs for each lookup may be negligible compared to re.compile() speed?

    @vstinner
    Copy link
    Member

    del d[next(iter(d))] is not O(1) on current dict implementation.

    We are talking about a dictionary of 512 items in the worst case. On such very tiny collection, benchmarking matters more than O(...) complexity ;-)

    @methane
    Copy link
    Member

    methane commented Dec 18, 2017

    We are talking about a dictionary of 512 items in the worst case. On such very tiny collection, benchmarking matters more than O(...) complexity ;-)

    You're right. Rob Pike said:

    "Fancy algorithms are slow when n is small, and n is usually small."
    http://users.ece.utexas.edu/~adnan/pike.html

    @methane
    Copy link
    Member

    methane commented Dec 18, 2017

    • del cache[next(iter(cache))] happens only when sre.compile() is called.
    • del cache[next(iter(cache))] is much faster than sre.compile().

    OK, performance difference is negligible, surely.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset b931bd0 by Serhiy Storchaka in branch 'master':
    bpo-32338: OrderedDict import is no longer needed in re. (bpo-4891)
    b931bd0

    @miss-islington
    Copy link
    Contributor

    New changeset 39441fc by Miss Islington (bot) in branch '3.7':
    bpo-32338: OrderedDict import is no longer needed in re. (GH-4891)
    39441fc

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants