This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author methane
Recipients methane, rhettinger, serhiy.storchaka
Date 2017-04-02.04:05:55
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1491105957.4.0.947348008783.issue29961@psf.upfronthosting.co.za>
In-reply-to
Content
I also feel set created from iterator is too large sometimes.
Especially, frozenset is used for "unordered tuple" dict key sometimes.

$ python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )'
[(0, 224), (5, 736), (21, 2272), (85, 8416), (341, 32992), (1365, 131296), (5461, 524512)]

$ python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(dict.fromkeys(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )'
[(0, 240), (6, 368), (11, 648), (22, 1184), (43, 2280), (86, 4704), (171, 9320), (342, 18528), (683, 36968), (1366, 73824), (2731, 147560), (5462, 295008)]

set size increases by *4 each resize.
This is code of set_add_entry():

    if ((size_t)so->fill*5 < mask*3)
        return 0;
    return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);

If we simply replace `so->used>50000 ? so->used*2 : so->used*4` with `so->used*2`, there may be not enough room when there are some dummy entries. (used + dummy = fill).

But how about replacing `*4` with `*3`?
In general, load factor after resize will be changed from 12.5% ~ 25% to 16.6% ~ 33.3%. It seems OK to me.
In case of no removal (e.g. constructor called with iterator), set size increases by *2 each time:

>>> for n in range(3, 20):
...     print(math.ceil(2 ** n * 3/5) * 3, 2 ** (n+1))
15 16
30 32
60 64
117 128
231 256
462 512
924 1024
1845 2048
3687 4096
7374 8192
14748 16384
29493 32768
58983 65536
117966 131072
235932 262144
471861 524288
943719 1048576
History
Date User Action Args
2017-04-02 04:05:58methanesetrecipients: + methane, rhettinger, serhiy.storchaka
2017-04-02 04:05:57methanesetmessageid: <1491105957.4.0.947348008783.issue29961@psf.upfronthosting.co.za>
2017-04-02 04:05:57methanelinkissue29961 messages
2017-04-02 04:05:55methanecreate