Message 241179
Matt Mackall brought this over to me today.
If list "l" has one million pairs and you do
dict(l)
it creates the dict, then resizes it to hold a million elements, then starts adding key/value pairs. But since it's sized for exactly a million elements, the insertions push it past its "fill ratio" comfort level of 666k and it resizes again. And when you're up to a million elements, that resize is quite slow. It should have presized the dict correctly from the beginning so that no resize was necessary.
The same is true for dict.fromkeys: passed a list of 1m elements, it uses an initial size of 1m, not 1.5m.
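To make that concrete, here's a quick, throwaway reproducer (not part of the attached patch) that builds a million-element list and times both constructors; the exact numbers and the 666k threshold depend on the build, so treat it only as a way to observe the cost of the extra resize:

# Rough reproducer for the behavior described above (hypothetical,
# not from the attached patch).  Any internal resize of the dict
# shows up as extra time inside these two calls.
import time

n = 1000000
pairs = [(i, i) for i in range(n)]
keys = list(range(n))

start = time.time()
d1 = dict(pairs)             # presized for ~1m, so it resizes once it passes ~666k entries
print("dict(pairs):         %.3f s" % (time.time() - start))

start = time.time()
d2 = dict.fromkeys(keys)     # same presize concern for fromkeys
print("dict.fromkeys(keys): %.3f s" % (time.time() - start))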
Attached is a sample patch that changes the behavior of both these operations so that no resizes are needed during the insertions. I haven't verified that it actually fixes the problem; I just made the change and ran the regression tests. (I'm oversubscribed here at the sprints, and am kind of rushing this diff out just to get the conversation rolling.)
Benjamin: what do you think? Would it be appropriate to make a change like this in 2.7?
Python 3 (or at least 3.5 trunk) is fancy here. It starts with the minimum size of a dict, then keeps doubling until the new size is greater than 1.5x the number of elements. This means the dicts it creates are the same size they would be if we'd started with a minimum-size dict and inserted one element at a time. That might help minimize wear and tear on the small-block allocator for smaller dicts.
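For reference, here's a rough Python model of that sizing rule; the real logic lives in C in Objects/dictobject.c, and the minimum table size (assumed to be 8 below) is an implementation detail, so this is only a sketch:

# Sketch of the Python 3 presize rule described above: start at the
# minimum table size and keep doubling until the table is more than
# 1.5x the expected number of elements, so the 2/3 fill ratio is never
# exceeded during the initial insertions.  The minimum size of 8 is an
# assumption about the CPython build.
def presized_table_size(expected):
    newsize = 8
    while newsize <= expected * 1.5:
        newsize <<= 1
    return newsize

print(presized_table_size(1000000))   # 2097152; 2/3 of that is ~1.4m, so 1m inserts never trigger a resize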
BTW, the internal function _PyDict_NewPresized should really have taken the fill ratio into account too. But, even though it's an internal-only function, it's probably too late to change its behavior.
Date                | User  | Action | Args
--------------------+-------+--------+-----------------------------------------------------------------------
2015-04-15 23:35:44 | larry | set    | recipients: + larry, benjamin.peterson, mpm, marmoute
2015-04-15 23:35:44 | larry | set    | messageid: <1429140944.27.0.450320519696.issue23971@psf.upfronthosting.co.za>
2015-04-15 23:35:44 | larry | link   | issue23971 messages
2015-04-15 23:35:44 | larry | create |