
Author: pitrou
Recipients: Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date: 2012-01-21 22:20:47
Message-id: <1327184329.3382.23.camel@localhost.localdomain>
In-reply-to: <1327176822.4992.257.camel@surprise>
Content
> I wonder if the dict statistics should be exposed with extra attributes
> or a method on the dict; e.g. a __stats__ attribute, something like
> this:
> 
> LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)
> 
> SmallDictStats(keys=3, mask=7)

Sounds like a bit of overkill, and it shouldn't be a public API (which
dunder attributes such as __stats__ are). Even a private API on dicts
would quickly become visible, since dicts are so pervasive.
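
(For reference only, the quoted proposal amounts to roughly the
namedtuple sketch below; purely illustrative, since the point above is
that it shouldn't become an actual API.)

from collections import namedtuple

# Illustrative only: the kind of object the quoted __stats__ idea describes.
LargeDictStats = namedtuple("LargeDictStats",
                            ["keys", "mask", "insertions", "iterations"])
SmallDictStats = namedtuple("SmallDictStats", ["keys", "mask"])

print(LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697))
print(SmallDictStats(keys=3, mask=7))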

> > > Caveats:
> > > * no overflow handling (what happens after 2**32 modifications to a
> > > long-lived dict on a 32-bit build?) - though that's fixable.
> > 
> > How do you suggest to fix it?
> 
> If the dict is heading towards overflow of these counters, it's either
> long-lived, or *huge*.
> 
> Possible approaches:
> (a) use 64-bit counters rather than 32-bit, though that's simply
> delaying the inevitable

Well, even assuming one billion lookup probes per second on a single
dictionary, the inevitable will happen in 584 years with a 64-bit
counter (but only 4 seconds with a 32-bit counter).

A real issue, though, may be the cost of 64-bit arithmetic on 32-bit
CPUs.
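
Back-of-the-envelope, using only the one-billion-probes-per-second
assumption above:

# Rough check of the figures quoted above.
probes_per_second = 10 ** 9
seconds_per_year = 365.25 * 24 * 3600

print(2 ** 32 / probes_per_second)                     # ~4.3 seconds
print(2 ** 64 / probes_per_second / seconds_per_year)  # ~584 years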

> (b) when one of the counters gets large, divide both of them by a
> constant (e.g. 2).  We're interested in their ratio, so dividing both by
> a constant preserves this.

Sounds good, although we may want to pull this outside of the critical
loop.
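
A minimal sketch of that rescaling (in Python for brevity; the real
change would live in the C dict implementation, and the counter names
and threshold here are made up):

RESCALE_THRESHOLD = 1 << 30   # arbitrary threshold, well below overflow

class _DictCounters:
    """Hypothetical per-dict counters; only the probes/insertions ratio matters."""

    def __init__(self):
        self.insertions = 0
        self.probes = 0

    def _maybe_rescale(self):
        # Real code would hoist this check out of the per-probe hot path.
        if self.probes >= RESCALE_THRESHOLD or self.insertions >= RESCALE_THRESHOLD:
            # Halving both counters preserves their ratio, which is all we need.
            self.probes >>= 1
            self.insertions >>= 1

    def record_probe(self):
        self.probes += 1
        self._maybe_rescale()

    def record_insertion(self):
        self.insertions += 1
        self._maybe_rescale()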

> By "a constant" do you mean from the perspective of big-O notation, or
> do you mean that it should be hardcoded (I was wondering if it should be
> a sys variable/environment variable etc?).

Hardcoded, as in your patch.

> > There needs to be a way to disable it: an environment variable would be
> > the minimum IMO.
> 
> e.g. set it to 0 to enable it, set it to nonzero to set the scale
> factor.

Using 0 to enable it sounds misleading. I'd say:
- 0 to disable it
- 1 to enable it with the default scaling factor
- >= 2 to enable it and set the scaling factor

> Any idea what to call it? 

PYTHONDICTPROTECTION?
Most people should either enable or disable it, not change the scaling
factor.
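
As a sketch, a startup helper along these lines (hypothetical function
name and default factor) would implement the 0/1/>=2 scheme above:

import os

DEFAULT_SCALE = 4   # made-up default scaling factor

def dict_protection_scale():
    """Return None if the protection is disabled, else the scaling factor."""
    raw = os.environ.get("PYTHONDICTPROTECTION", "1")
    try:
        value = int(raw)
    except ValueError:
        value = 1                  # unrecognized value: keep the default behaviour
    if value <= 0:
        return None                # 0 disables the protection
    if value == 1:
        return DEFAULT_SCALE       # 1 enables it with the default factor
    return value                   # >= 2 sets the scaling factor explicitly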

> BTW, presumably if we do it, we should do it for sets as well?

Yeah, and use the same env var / sys function.