Message381098
rhettinger wrote [1]:
> The existing setobject code has been finely tuned and micro-optimized over the years, giving it excellent performance on workloads we care about.
This worries me also. But I suppose the tuning should make itself visible in benchmarks. Does Python have "canonical" benchmarks or speed tests, like PyPy does [2]?
I am not sure how useful this is, but I made a synthetic line_profiler benchmark of a naive implementation of set on top of a dict (see attached file).
It resulted in roughly the following differences:
* Initialization from a 1M-sized list: 41-66% slower
* Inserting items until the size doubles: about the same (perhaps faster, possibly because the size I chose does not trigger a rebuild of the dict's hash table)
* Deletion: 7-11% slower
* Membership test: 2-15% slower
Running them in the opposite order (first dict, then set) gave me the ranges.
I have not considered memory usage (I have not profiled memory before), but I suspect it would be higher, since a dict stores values in addition to keys.
Additionally, initializing smaller structures (length = 100) is worse: initialization takes about 2x as long (100% slower), while the O(1) operations take about the same time.
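For reference, a minimal sketch of what such a dict-backed set might look like (the class name and method set here are illustrative; the actual attached set_test.py may differ):

```python
class DictSet:
    """Naive set built on a dict: keys are the members, values are unused."""

    def __init__(self, iterable=()):
        # dict.fromkeys hashes each element once, like set(iterable) would
        self._d = dict.fromkeys(iterable)

    def add(self, item):
        self._d[item] = None

    def discard(self, item):
        # pop with a default avoids KeyError for missing items
        self._d.pop(item, None)

    def __contains__(self, item):
        return item in self._d

    def __len__(self):
        return len(self._d)
```

Since Python 3.7 dicts preserve insertion order, so a dict-backed set would also gain ordering, at the cost of storing a value slot per key.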
I suspect methane's implementation linked by xtreak is better (but I have not tried it).
Profiled with:

    kernprof -l set_test.py
    python -m line_profiler set_test.py.lprof
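For a quicker, coarser cross-check without line_profiler, timeit can compare the same membership operation on a set and on a dict built from the same keys (sizes and iteration counts here are arbitrary, and the timings are machine-dependent):

```python
import timeit

setup = """
import random
random.seed(42)
data = [random.randrange(10**6) for _ in range(10**5)]
s = set(data)
d = dict.fromkeys(data)
probe = random.randrange(10**6)
"""

# Time 1M membership tests against each container
t_set = timeit.timeit("probe in s", setup=setup, number=10**6)
t_dict = timeit.timeit("probe in d", setup=setup, number=10**6)
print(f"set: {t_set:.3f}s  dict: {t_dict:.3f}s")
```

This only measures a single hot operation, so it will not show the initialization or deletion differences above, but it is a cheap sanity check on the membership numbers.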
[1] https://mail.python.org/pipermail/python-dev/2019-February/156475.html
[2] https://speed.pypy.org/

Date                | User    | Action | Args
2020-11-16 14:06:03 | danuker | set    | recipients: + danuker, rhettinger, methane, xtreak
2020-11-16 14:06:03 | danuker | set    | messageid: <1605535563.62.0.465789036383.issue42368@roundup.psfhosted.org>
2020-11-16 14:06:03 | danuker | link   | issue42368 messages
2020-11-16 14:06:02 | danuker | create |