Issue 34719: Deprecate set to frozenset conversion in set.__contains__


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78900

classification

Title:	Deprecate set to frozenset conversion in set.__contains__
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.8

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Javier Dehesa, rhettinger, xtreak
Priority:	normal	Keywords:

Created on 2018-09-18 11:57 by Javier Dehesa, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg325631	Author: Javier Dehesa (Javier Dehesa)	Date: 2018-09-18 11:57
This comes from this SO question: https://stackoverflow.com/q/52382983/1782792

Currently, this works:

    > print({1, 2} in {frozenset({1, 2})})
    # True

This is strange because set is unhashable. Apparently, it is a case-specific feature implemented back in 2003 (https://github.com/python/cpython/commit/19c2d77842290af9b5f470c1eea2a71d1f77c9fe), by which set objects are converted to frozensets when checking for membership in another set. Personally I feel this is a bit surprising and inconsistent, but that is not the only issue with it.

In the original implementation, this conversion was basically free because the created frozenset used the same storage as the given set. In the current implementation, however (https://github.com/python/cpython/blob/3.7/Objects/setobject.c#L1888-L1906), a new frozenset object is created as a copy of the given set. It seems this was done for thread-safety. The problem is that the copy is significantly more expensive:

    s = set(range(100000))
    sf = frozenset(s)
    t = { sf }
    %timeit sf in t  # True
    >>> 31.6 ns ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    %timeit s in t  # True
    >>> 4.9 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In the above case, using the conversion takes five orders of magnitude more time than the regular check. I suppose there is a memory impact too.

I think this (as far as I know) undocumented feature does not provide a significant usability gain, is inconsistent with the documented behavior of set, and gives rise to obscurely underperforming code. Removing it would be a breaking change, but again, affected code would be relying on undocumented behavior (or even "against-documentation" behavior).
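The behavior and the cost difference described above can be reproduced outside IPython with a small script. This is only a sketch: it uses the standard `timeit` module in place of `%timeit`, the variable names and the repeat count of 100 are arbitrary choices, and the absolute timings will vary by machine and CPython version.

```python
import timeit

small = {1, 2}
container = {frozenset({1, 2})}

# A plain set is unhashable, so hash() raises TypeError...
try:
    hash(small)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# ...yet the membership test against a set of frozensets succeeds.
found = small in container

# Rough timing comparison for a large set (numbers are illustrative only).
big = set(range(100_000))
big_frozen = frozenset(big)
big_container = {big_frozen}

t_frozen = timeit.timeit(lambda: big_frozen in big_container, number=100)
t_set = timeit.timeit(lambda: big in big_container, number=100)

print("set hashable:", set_is_hashable)
print("set found in set of frozensets:", found)
print(f"frozenset lookup: {t_frozen:.6f} s for 100 runs")
print(f"set lookup:       {t_set:.6f} s for 100 runs")
```

The frozenset lookup can reuse a cached hash, while the lookup with a plain set has to do work proportional to the size of the set on every call, which is where the gap comes from.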
msg325659	Author: Raymond Hettinger (rhettinger) *	Date: 2018-09-18 15:56
The feature was first implemented in Lib/sets.py (arising from PEP 218) in pure Python. It was found to be useful and carried forward to the C implementation for the built-in type.

The feature is documented but not highlighted in the Library Reference: "Both set and frozenset support set to set comparisons."

You are correct that the feature became less performant when the swap-bodies code was eliminated; however, the feature still has value so that a user can write "s in t" rather than "frozenset(s) in t".

Thank you for the suggestion, but I'm going to elect to leave the code as-is. There's no reason to break other people's code and remove a feature that is intentional, guaranteed, tested, and sometimes useful in applications that have sets of frozensets. If you don't like the feature, it is okay to not use it ;-)
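As an illustration of the "s in t" versus "frozenset(s) in t" point, here is a minimal sketch with a set of frozensets. The names `groups` and `query` are made up for the example. It also shows that `discard()` accepts a plain set in the same way, and that other unhashable types such as lists get no such treatment.

```python
groups = {frozenset({"a", "b"}), frozenset({"c"})}
query = {"a", "b"}

# Both spellings give the same answer; the first relies on the
# set-specific handling discussed in this issue.
implicit = query in groups
explicit = frozenset(query) in groups

# A set that matches no member is simply reported as absent.
missing = {"x"} in groups

# discard() (like remove()) also accepts a plain set as its argument.
groups.discard({"c"})

# Other unhashable types still raise TypeError.
try:
    ["a", "b"] in groups
    list_lookup_failed = False
except TypeError:
    list_lookup_failed = True

print(implicit, explicit, missing, groups, list_lookup_failed)
```

Code that cares about lookup cost on large sets can convert to a frozenset once and reuse it, which avoids repeating the per-lookup work.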

History
Date	User	Action	Args
2022-04-11 14:59:06	admin	set	github: 78900
2018-09-18 15:56:11	rhettinger	set	status: open -> closed resolution: rejected messages: + msg325659 stage: resolved
2018-09-18 15:27:36	rhettinger	set	versions: - Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7
2018-09-18 15:27:30	rhettinger	set	assignee: rhettinger nosy: + rhettinger
2018-09-18 12:31:16	xtreak	set	nosy: + xtreak
2018-09-18 11:57:29	Javier Dehesa	create