Message 296221 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	Dmitry Rubanovich, methane, rhettinger, serhiy.storchaka, tim.peters, xiang.zhang
Date	2017-06-16.23:47:08
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1497656828.37.0.331875222459.issue30671@psf.upfronthosting.co.za>
In-reply-to

Content
Yes, any scheme whatsoever that guarantees to visit every int in range(2**i) meets the "correctness" part. But I concur with Inada: "head arguments" aren't compelling here, not even if I understood what they were saying ;-) If you implement it and demonstrate significant speedups in plausibly realistic settings, then it may be worth pursuing. Anything short of that is at best informed speculation, and so better hashed out on, e.g., the python-ideas mailing list than on the bug tracker. BTW, I realize you're not running tests against random data. But you're not running tests against _any_ real data, which is worrisome. Your code appears to be utterly uniform, thus approximating a uniform random distribution. I have scant idea of what you believe your code shows. For example, it only looks at "hash codes" in `range(1<<p)`, which largely misses the point of the "perturb" logic. In real life, on most boxes these days hash codes are 64 bits (regardless of dict size), and the real perturb code eventually folds in every one of those 64 bits (if the loop goes around often enough to need all of them). Pretending hash codes contain only `p` bits may or may not be theoretically interesting for some purpose(s), but has nothing obvious to do with how the dict code actually works.

Yes, any scheme whatsoever that guarantees to visit every int in range(2**i) meets the "correctness" part.

But I concur with Inada:  "head arguments" aren't compelling here, not even if I understood what they were saying ;-)  If you implement it and demonstrate significant speedups in plausibly realistic settings, then it may be worth pursuing.  Anything short of that is at best informed speculation, and so better hashed out on, e.g., the python-ideas mailing list than on the bug tracker.

BTW, I realize you're not running tests against random data.  But you're not running tests against _any_ real data, which is worrisome.  Your code appears to be utterly uniform, thus approximating a uniform random distribution.  I have scant idea of what you believe your code shows.  For example, it only looks at "hash codes" in `range(1<<p)`, which largely misses the point of the "perturb" logic.  In real life, on most boxes these days hash codes are 64 bits (regardless of dict size), and the real perturb code eventually folds in every one of those 64 bits (if the loop goes around often enough to need all of them).  Pretending hash codes contain only `p` bits may or may not be theoretically interesting for some purpose(s), but has nothing obvious to do with how the dict code actually works.

History
Date	User	Action	Args
2017-06-16 23:47:08	tim.peters	set	recipients: + tim.peters, rhettinger, methane, serhiy.storchaka, xiang.zhang, Dmitry Rubanovich
2017-06-16 23:47:08	tim.peters	set	messageid: <1497656828.37.0.331875222459.issue30671@psf.upfronthosting.co.za>
2017-06-16 23:47:08	tim.peters	link	issue30671 messages
2017-06-16 23:47:08	tim.peters	create