Message 234555 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	serhiy.storchaka
Recipients	josh.r, pitrou, rhettinger, serhiy.storchaka
Date	2015-01-23.14:00:59
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1422021659.66.0.940723971801.issue23282@psf.upfronthosting.co.za>
In-reply-to

Content
Good point Josh. Yes, it may slow down other cases, and there is a difference between sets and dicts. 1. If the key is not contained in the set, then if the first tested entry table[hash & mask] is empty, then the lookup is slowed down by one pointer comparison (2 comparisons instead of 1). If the entry is used then the lookup is not slow downed or even speed up (if need to test more than one additional entries). 2. If the same case is in the set, then if the first tested entry is the set then the lookup will use one comparison instead of 4. In other cases (the key is placed in other entry) the lookup will use at least one comparison less. 3. If the set contains key equal but not identical to tested key, then the lookup will use at least one comparison less. Only first case can cause slowdown. If we test a lot of keys not existing in the set with small ratio between used and allocated entries numbers. May be the worst case is creating a copy of the set. For other operations I did not find significant difference. $ ./python -m timeit -s "s = set(range(10**4))" -- "frozenset(s)" Unpatched: 1000 loops, best of 3: 658 usec per loop Patched: 1000 loops, best of 3: 668 usec per loop But issue23290 prevents this regression.

Good point Josh. Yes, it may slow down other cases, and there is a difference between sets and dicts.

1. If the key is not contained in the set, then if the first tested entry table[hash & mask] is empty, then the lookup is slowed down by one pointer comparison (2 comparisons instead of 1). If the entry is used then the lookup is not slow downed or even speed up (if need to test more than one additional entries).

2. If the same case is in the set, then if the first tested entry is the set then the lookup will use one comparison instead of 4. In other cases (the key is placed in other entry) the lookup will use at least one comparison less.

3. If the set contains key equal but not identical to tested key, then the lookup will use at least one comparison less.

Only first case can cause slowdown. If we test a lot of keys not existing in the set with small ratio between used and allocated entries numbers. May be the worst case is creating a copy of the set. For other operations I did not find significant difference.

$ ./python -m timeit -s "s = set(range(10**4))" -- "frozenset(s)"
Unpatched: 1000 loops, best of 3: 658 usec per loop
Patched: 1000 loops, best of 3: 668 usec per loop

But issue23290 prevents this regression.

History
Date	User	Action	Args
2015-01-23 14:00:59	serhiy.storchaka	set	recipients: + serhiy.storchaka, rhettinger, pitrou, josh.r
2015-01-23 14:00:59	serhiy.storchaka	set	messageid: <1422021659.66.0.940723971801.issue23282@psf.upfronthosting.co.za>
2015-01-23 14:00:59	serhiy.storchaka	link	issue23282 messages
2015-01-23 14:00:59	serhiy.storchaka	create