msg183332 - (view) |
Author: Alex Gaynor (alex) * |
Date: 2013-03-02 17:23 |
Following the length_hint PEP, we should expose this facility to end-python programmers. The semantics would be basically: the list has behavior identical to if you hadn't provided length_hint, except the VM is free to preallocate efficiently.
CPython (and PyPy) use this kind of logic internally all over the place when lists are to be returned (including, but not limited to, list() with a single argument!)
If this is considered to be a good idea I'll whip up a patch.
|
msg183339 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-03-02 18:39 |
Definitely needs to be discussed on python-ideas/python-dev.
|
msg183349 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2013-03-02 21:48 |
Guido has previously rejected any kind of user specified presizing parameter for dictionaries. He felt that it was breaking the abstraction in a way that caused people to focus on implementation specific details.
Also, set() and dict() use presizing (and hash value reuse) only in the limited case of the input iterable being an exact dictionary or set. Otherwise, it makes no assumptions about the input iterable and grows as needed. For sets, even when the __len__ is known for a general iterable, we don't know how many unique elements are the input -- consider for example: set([None] * 1000000). In typical use cases for sets, a user also doesn't know in advance how many unique elements will be present -- and to the extent they try to add estimation logic, it would only complicate their programs.
|
msg183353 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-03 01:15 |
I think it is going a bit too far in the direction of exposing low-level optimizations. Python is not C++. I also agree with Raymond that it's rather rare to construct read-only containers (in the sense that they are allocated once and for all).
|
msg183360 - (view) |
Author: Alex Gaynor (alex) * |
Date: 2013-03-03 03:48 |
python-dev/ideas may be a better place to have this discussion, but basically if you're going to insist that stuff like this doesn't go into Python "because it isn't C++", people are going to have to write C++, and that makes me incredibly sad.
There's an obvious need for this, CPython uses optimizations like this internally, as does PyPy.
|
msg183361 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-03-03 03:52 |
I think the emphasis is on "internally" :).
|
msg183365 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-03 10:29 |
> There's an obvious need for this, CPython uses optimizations like this
> internally, as does PyPy.
I don't know if it's a need or if it's just "nice to have".
By the way, in the list case it also makes C code simpler, since once
the list is preallocated you just have to call PyList_SET_ITEM to
populate it, and there's no error return to worry about.
Also, lists are easy to pre-allocate in pure Python as well:
l = [None] * N
# populate
for i in range(N):
l[i] = ...
|
msg183366 - (view) |
Author: Alex Gaynor (alex) * |
Date: 2013-03-03 10:31 |
That strategy only works if you know the exact count, it fails if you only have an estimate (as __length_hint__ gives you).
|
msg183473 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-03-04 16:34 |
This unlikely will be useful. C code uses __length_hint__ because C code is fast enough and fill time can be comparable to resize time. But Python code will be one or two order slower and resize time is insignificant fraction of total time for any real code. You can implement your proposition and do the tests, and I doubt that they will show impressive results.
|
msg183536 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-03-05 17:13 |
My python-ideas posting about the same topic http://mail.python.org/pipermail/python-ideas/2013-March/019807.html
|
msg183545 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-03-05 18:53 |
Here is an experimental patch. The speedup is ... measurable.
$ ./python -m timeit -n1000 "l = []" "l.__preallocate__(10000)" "app = l.append" "for i in range(10000): app(i)" "l.__shrink__()"
1000 loops, best of 3: 3.68 msec per loop
$ ./python -m timeit -n1000 "l = []" "app = l.append" "for i in range(10000): app(i)"
1000 loops, best of 3: 3.75 msec per loop
|
msg183552 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-03-05 19:58 |
> Here is an experimental patch. The speedup is ... measurable.
It's likely to be more useful for dict and set, to avoid/limit rehashs.
Also, the allocation overhead depends on the implementation, I suspect the gain would be more important with PyPy.
FWIW, Java proposes this for lists and maps, but I'm not convinced exposing such tunables really makes sense for Python.
|
msg183558 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-03-05 22:27 |
The gain will be more noticeable the faster the Python implementation it is running under is. It is going to avoid logN relloc's in just about all implementations. That CPython is relatively slow is not a justification to avoid adding the feature.
I like Christian's proposal of not having it be a constructor argument but having it be a __preallocate__ method on the object. That keeps it out of the eyes of most users reading the docs (keeping the main interface clear) and lets us put it in a special section of the python docs. Code that *needs* it could use it but most would not need to bother.
Python VMs would be free to have dummy do nothing implementations (ie: we should *never* guarantee that a python call to thing.__preallocate__(1000) followed immediately by a C extension module call that takes thing as input can assume anything about the underlying object... it needs to check the size or resize itself before using the unchecked _SET_ITEM macros, etc.)
|
msg183580 - (view) |
Author: Stefan Behnel (scoder) * |
Date: 2013-03-06 05:51 |
I agree with basically all counter-arguments that were brought up so far. While I would eventually like to use the available length hints in the special case of Cython's list comprehensions, I'm far from seeing a general applicability for this. The use cases are fairly limited, the (missing) guarantees are easy to misunderstand by users and alternatives exist for most use cases. Python code shouldn't be bothered with memory management.
|
msg183589 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-03-06 12:37 |
Gregory, thanks. :)
By the way CPython's list type does more than log(N) resize ops:
$ ./listresize.py 10 100 1000 10000 100000
10 list.append() do 4 resize ops.
100 list.append() do 11 resize ops.
1000 list.append() do 28 resize ops.
10000 list.append() do 47 resize ops.
100000 list.append() do 66 resize ops.
I suspect that hash types (dict, set) could gain even more speed as it eliminates the need for rehashing. It's more costly than memcpy() inside realloc().
I understand the proposal as a power user tool. Most people don't need a pneumatic hammer, an ordinary hammer suffices. But in some cases you need a tool with more power. "CPython doesn't gain much from the new API, let's not add it to Python" isn't nice to other implementations that may benefit from it.
|
msg183592 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-06 14:14 |
> By the way CPython's list type does more than log(N) resize ops:
Obviously it's an asymptotic complexity, not a hard number.
> I understand the proposal as a power user tool. Most people don't
> need a pneumatic hammer, an ordinary hammer suffices. But in some
> cases you need a tool with more power. "CPython doesn't gain much
> from the new API, let's not add it to Python" isn't nice to other
> implementations that may benefit from it.
I'd still like to see concrete use cases, not micro-benchmarks.
|
msg217843 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2014-05-03 23:17 |
Based on all the negative comments, I'm closing this one. It adds too much complication in return for dubious value.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:42 | admin | set | github: 61540 |
2014-05-03 23:17:55 | rhettinger | set | status: open -> closed resolution: rejected messages:
+ msg217843
|
2013-03-21 03:52:48 | trent | set | nosy:
+ trent
|
2013-03-10 20:49:24 | asvetlov | set | nosy:
+ asvetlov
|
2013-03-07 10:53:11 | giampaolo.rodola | set | nosy:
+ giampaolo.rodola
|
2013-03-06 18:27:00 | vstinner | set | nosy:
+ vstinner
|
2013-03-06 14:14:07 | pitrou | set | messages:
+ msg183592 title: Add length_hint parameter to list, dict, set constructors to allow efficient presizing -> Add length_hint parameter to list, dict, set constructors to allow efficient presizing |
2013-03-06 12:37:17 | christian.heimes | set | files:
+ listresize.py
messages:
+ msg183589 |
2013-03-06 05:51:12 | scoder | set | nosy:
+ scoder messages:
+ msg183580
|
2013-03-05 22:27:59 | gregory.p.smith | set | nosy:
+ gregory.p.smith messages:
+ msg183558
|
2013-03-05 19:58:04 | neologix | set | nosy:
+ neologix messages:
+ msg183552
|
2013-03-05 18:53:35 | christian.heimes | set | files:
+ preallocate.patch keywords:
+ patch messages:
+ msg183545
|
2013-03-05 17:13:02 | christian.heimes | set | messages:
+ msg183536 |
2013-03-05 17:09:48 | christian.heimes | set | nosy:
+ christian.heimes
|
2013-03-04 16:34:15 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg183473
|
2013-03-04 16:27:44 | brett.cannon | set | nosy:
+ brett.cannon
|
2013-03-03 10:31:12 | alex | set | messages:
+ msg183366 |
2013-03-03 10:29:59 | pitrou | set | messages:
+ msg183365 |
2013-03-03 03:52:04 | benjamin.peterson | set | messages:
+ msg183361 |
2013-03-03 03:48:41 | alex | set | messages:
+ msg183360 |
2013-03-03 01:15:43 | pitrou | set | nosy:
+ pitrou messages:
+ msg183353
|
2013-03-02 21:48:38 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg183349
|
2013-03-02 18:39:56 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages:
+ msg183339
|
2013-03-02 17:23:39 | alex | create | |