This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Save OrderedDict import in re
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, methane, miss-islington, mrabarnett, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2017-12-15 18:21 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 4891 merged serhiy.storchaka, 2017-12-15 18:25
PR 5013 closed thatiparthy, 2017-12-26 05:00
PR 6073 merged miss-islington, 2018-03-11 06:39
Messages (10)
msg308418 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-15 18:21
Since regular dicts are now ordered by default, the OrderedDict import is no longer necessary.
msg308516 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-12-18 06:20
Please don't do this.
del d[next(iter(d))] is not O(1) on current dict implementation.
OrderedDict is designed for such use cases. Please keep using it.
msg308524 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-18 09:08
This is surprising. But OrderedDict also can be O(N) here.

Do you have benchmarking results Inada?
msg308525 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-12-18 09:23
> This is surprising. But OrderedDict also can be O(N) here.

Current dict implementation doesn't skip empty entries.
So next(iter(D)) takes time.
On the other hand, OrderedDict uses doubly-linked list to find first entry.

(I fixed it in compact ODict branch, but it added one word to dict)

> Do you have benchmarking results Inada?

$ ./python -m perf timeit -s 'cache={}' -- '
for i in range(10000):
    if len(cache) > 512:
        del cache[next(iter(cache))]
    cache[i]=i
'
.....................
Mean +- std dev: 6.81 ms +- 0.08 ms

$ ./python -m perf timeit -s 'from collections import OrderedDict; cache=OrderedDict()' -- '
for i in range(10000):
    if len(cache) > 512:
        cache.popitem(last=False)
    cache[i]=i
'
.....................
Mean +- std dev: 3.75 ms +- 0.07 ms

Performance difference is measurable even when N is only 512.

Maybe, we can use hack similar to Python 3.5 had for O(1) popitem().
When entries[0].key==NULL, (Py_ssize_t)entries[0].value can be index
to first known non-empty entry.
msg308526 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-12-18 09:25
Hmm, 0.3 μs for each lookup may be negligible compared to re.compile() speed?
msg308527 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-18 09:26
> del d[next(iter(d))] is not O(1) on current dict implementation.

We are talking about a dictionary of 512 items in the worst case. On such very tiny collection, benchmarking matters more than O(...) complexity ;-)
msg308529 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-12-18 09:29
> We are talking about a dictionary of 512 items in the worst case. On such very tiny collection, benchmarking matters more than O(...) complexity ;-)

You're right. Rob Pike said:

"Fancy algorithms are slow when n is small, and n is usually small."
http://users.ece.utexas.edu/~adnan/pike.html
msg308534 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-12-18 09:44
* del cache[next(iter(cache))] happens only when sre.compile() is called.
* del cache[next(iter(cache))] is much faster than sre.compile().

OK, performance difference is negligible, surely.
msg313583 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-11 06:38
New changeset b931bd0a2fe7e9293339019352baf3317166b769 by Serhiy Storchaka in branch 'master':
bpo-32338: OrderedDict import is no longer needed in re. (#4891)
https://github.com/python/cpython/commit/b931bd0a2fe7e9293339019352baf3317166b769
msg313584 - (view) Author: miss-islington (miss-islington) Date: 2018-03-11 07:02
New changeset 39441fce0218a3f51a80cf17aa179a32651a02f6 by Miss Islington (bot) in branch '3.7':
bpo-32338: OrderedDict import is no longer needed in re. (GH-4891)
https://github.com/python/cpython/commit/39441fce0218a3f51a80cf17aa179a32651a02f6
History
Date User Action Args
2022-04-11 14:58:55adminsetgithub: 76519
2018-03-11 07:06:10serhiy.storchakasetstatus: open -> closed
dependencies: - Dict order is now guaranteed, so add tests and doc for it
resolution: fixed
stage: patch review -> resolved
2018-03-11 07:02:00miss-islingtonsetnosy: + miss-islington
messages: + msg313584
2018-03-11 06:39:27miss-islingtonsetpull_requests: + pull_request5833
2018-03-11 06:38:15serhiy.storchakasetmessages: + msg313583
2017-12-26 05:00:56thatiparthysetpull_requests: + pull_request4903
2017-12-18 09:53:44serhiy.storchakasetdependencies: + Dict order is now guaranteed, so add tests and doc for it
2017-12-18 09:52:26serhiy.storchakalinkissue32360 dependencies
2017-12-18 09:44:55methanesetmessages: + msg308534
2017-12-18 09:29:45methanesetmessages: + msg308529
2017-12-18 09:26:16vstinnersetnosy: + vstinner
messages: + msg308527
2017-12-18 09:25:33methanesetmessages: + msg308526
2017-12-18 09:23:33methanesetmessages: + msg308525
2017-12-18 09:08:32serhiy.storchakasetmessages: + msg308524
2017-12-18 06:20:59methanesetnosy: + methane
messages: + msg308516
2017-12-15 18:25:31serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request4784
2017-12-15 18:21:39serhiy.storchakacreate