Issue 42368: Make set ordered

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86534

classification

Title:	Make set ordered
Type:	enhancement	Stage:	resolved
Components:	Interpreter Core	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	danuker, methane, rhettinger, xtreak
Priority:	normal	Keywords:

Created on 2020-11-16 10:42 by danuker, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
set_test.py	danuker, 2020-11-16 14:06	Naive implementation of set through dict

Messages (5)
msg381088 - (view)	Author: Dan Gheorghe Haiduc (danuker)	Date: 2020-11-16 10:42
Now that dicts are ordered[1], would it by any chance make sense to also order sets? A use case that I ran into is trying to reproducibly get a random.choice from a set (after calling random.seed). At first I did not need reproducibility, and I just called random.choice(list(my_set)). Later when I did need it, it was difficult to find out what was wrong. Then I realized that sets are unordered, and that order is not dependent on random.seed. It seems there are also some other confused newbies out there.[2][3] Thank you for the powerful language that is Python! [1] https://www.python.org/dev/peps/pep-0468 [2] https://stackoverflow.com/q/11929701/235463 [3] https://stackoverflow.com/q/36317520/235463
msg381089 - (view)	Author: Dan Gheorghe Haiduc (danuker)	Date: 2020-11-16 10:46
Oops. The reference number [2] in the previous message should instead be https://stackoverflow.com/q/6614447/235463 .
msg381093 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2020-11-16 12:41
See also https://mail.python.org/pipermail/python-dev/2019-February/156466.html
msg381098 - (view)	Author: Dan Gheorghe Haiduc (danuker)	Date: 2020-11-16 14:06
rhettinger wrote [1]: > The existing setobject code has been finely tuned and micro-optimized over the years, giving it excellent performance on workloads we care about. This worries me also. But I suppose the tuning should make itself visible in benchmarks. Does Python have "canonical" benchmarks or speed tests like PyPy[2]? I am not sure how useful this is, but I made a naive and synthetic line_profiler benchmark of a naive implementation of set through a dict (see attached file). It resulted in roughly the following differences: * Initialization from a 1M-sized list: 41-66% slower * Inserting an item until doubling size: about the same (perhaps faster, due to the size I've chosen not triggering rebuilding of a dict hash table) * Deletion: 7-11% slower * Membership test: 2-15% slower Running them in the opposite order (first dict, then set) gave me the ranges. I have not considered memory usage (I have not profiled memory before). But I suspect this would be larger, since a dict would keep values in addition to keys. Additionally, initializing smaller structures (length = 100) seems to be slower; the initialization takes 2x longer (100% slower), but the O(1) operations take about the same. I suspect methane's implementation linked by xtreak is better (but I have not tried it). Profiled with: kernprof -l set_test.py python -m line_profiler set_test.py.lprof [1] https://mail.python.org/pipermail/python-dev/2019-February/156475.html [2] https://speed.pypy.org/
msg381198 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-11-17 01:01
Thank for the suggestion. We've considered it a couple of times before but there are downsides that get in the way: * The related mathematical concept of sets is unordered. * It isn't clear what ordering should be returned by set operations like intersection, union, and difference especially if we want the first two to be commutative and if we want consistent in-place and data removal options. * It gets in the way of existing performance optimizations some of which are significant. * Other programming languages don't seem to have found compelling the need for this. So yes, it is something we could do, but probably shouldn't.

History
Date	User	Action	Args
2022-04-11 14:59:38	admin	set	github: 86534
2021-09-03 06:22:07	serhiy.storchaka	link	issue45092 superseder
2020-11-17 01:01:32	rhettinger	set	status: open -> closed resolution: rejected messages: + msg381198 stage: resolved
2020-11-16 14:06:03	danuker	set	files: + set_test.py messages: + msg381098
2020-11-16 12:41:20	xtreak	set	nosy: + rhettinger, xtreak, methane messages: + msg381093
2020-11-16 10:46:21	danuker	set	messages: + msg381089
2020-11-16 10:42:46	danuker	create