Classification
Title: new empty dict can be smaller
Type: resource usage Stage: patch review
Components: Interpreter Core Versions: Python 3.7
Process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, josh.r, louielu, r.david.murray, serhiy.storchaka, xiang.zhang
Priority: normal Keywords:

Created on 2017-04-11 09:39 by inada.naoki, last changed 2017-04-18 01:44 by josh.r.

Pull Requests
URL Status Linked
PR 1080 open inada.naoki, 2017-04-11 09:42
Messages (17)
msg291470 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 09:39
dict.clear() turns the dict into an empty key-sharing dict to reduce its size.
A newly created dict can use the same technique.

$ ./python.default 
Python 3.7.0a0 (heads/master:6dfcc81, Apr 10 2017, 19:55:52) 
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> d = {}
>>> sys.getsizeof(d)
240
>>> d.clear()
>>> sys.getsizeof(d)
72

$ ./python.patched 
Python 3.7.0a0 (heads/master-dirty:6dfcc81, Apr 11 2017, 18:11:02) 
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof({})
72
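
The same check as a single script that can be run under both builds (the sizes in the comments are the ones from the sessions above; exact values vary by platform and build):

import sys

d1 = {}            # freshly created empty dict
d2 = {"a": 1}
d2.clear()         # a cleared dict already uses the shared empty-keys object

print(sys.getsizeof(d1))   # 240 on an unpatched 64-bit build, 72 with the patch
print(sys.getsizeof(d2))   # 72 on both builds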
msg291471 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 09:42
Performance impact:

best case:
$ ./python.patched -m perf timeit --compare-to=`pwd`/python.default  -- '{}'
python.default: ..................... 36.9 ns +- 0.9 ns
python.patched: ..................... 25.3 ns +- 0.7 ns
Mean +- std dev: [python.default] 36.9 ns +- 0.9 ns -> [python.patched] 25.3 ns +- 0.7 ns: 1.46x faster (-31%)

worst case:
$ ./python.patched -m perf timeit --compare-to=`pwd`/python.default  -- 'x={}; x["a"]=1'
python.default: ..................... 73.3 ns +- 1.2 ns
python.patched: ..................... 82.8 ns +- 1.8 ns
Mean +- std dev: [python.default] 73.3 ns +- 1.2 ns -> [python.patched] 82.8 ns +- 1.8 ns: 1.13x slower (+13%)
msg291477 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-11 10:36
Isn't the latter case the more common one? Creating an empty dict and then populating it.
msg291481 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 10:45
> Isn't the latter case the more common one? Creating an empty dict and then populating it.

This is a memory usage optimization, not a performance optimization.
(That said, I think memory efficiency can make multi-process applications faster, because the L3 cache is a limited resource.)
The latter case shows how large the performance penalty is.
msg291482 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 10:46
Macro benchmark results:

$ ./python.default -m perf compare_to -G --min-speed=1 default.json patched.json
Slower (11):
- scimark_lu: 362 ms +- 13 ms -> 383 ms +- 22 ms: 1.06x slower (+6%)
- unpickle_pure_python: 882 us +- 18 us -> 924 us +- 18 us: 1.05x slower (+5%)
- regex_v8: 45.4 ms +- 0.6 ms -> 46.7 ms +- 3.1 ms: 1.03x slower (+3%)
- mako: 40.4 ms +- 0.4 ms -> 41.4 ms +- 0.4 ms: 1.03x slower (+3%)
- meteor_contest: 200 ms +- 1 ms -> 204 ms +- 2 ms: 1.02x slower (+2%)
- genshi_text: 88.8 ms +- 1.2 ms -> 90.1 ms +- 1.6 ms: 1.01x slower (+1%)
- scimark_monte_carlo: 255 ms +- 6 ms -> 258 ms +- 7 ms: 1.01x slower (+1%)
- richards: 176 ms +- 4 ms -> 178 ms +- 8 ms: 1.01x slower (+1%)
- pickle: 24.2 us +- 0.5 us -> 24.4 us +- 0.7 us: 1.01x slower (+1%)
- sympy_str: 438 ms +- 3 ms -> 442 ms +- 3 ms: 1.01x slower (+1%)
- genshi_xml: 196 ms +- 3 ms -> 198 ms +- 2 ms: 1.01x slower (+1%)

Faster (7):
- logging_silent: 746 ns +- 12 ns -> 722 ns +- 11 ns: 1.03x faster (-3%)
- xml_etree_generate: 272 ms +- 4 ms -> 264 ms +- 4 ms: 1.03x faster (-3%)
- telco: 20.7 ms +- 0.7 ms -> 20.2 ms +- 0.4 ms: 1.02x faster (-2%)
- xml_etree_parse: 311 ms +- 13 ms -> 305 ms +- 12 ms: 1.02x faster (-2%)
- nqueens: 266 ms +- 4 ms -> 262 ms +- 2 ms: 1.02x faster (-2%)
- unpack_sequence: 123 ns +- 1 ns -> 122 ns +- 2 ns: 1.01x faster (-1%)
- raytrace: 1.27 sec +- 0.01 sec -> 1.25 sec +- 0.01 sec: 1.01x faster (-1%)

Benchmark hidden because not significant (46)
msg291483 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-11 10:55
I mean creating a lone empty dict doesn't seem to make much sense. It saves memory, but once the dict is populated it is resized and the memory usage comes back.

And this makes PyDict_New() harder to understand. :-(
msg291485 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-11 11:53
Use "--duplicate 100" when writing microbenchmarks for such fast operations. The overhead of the iteration itself can be significant and comparable to the time of the operation.
msg291489 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 12:21
While I think it's preferable that {} and d.clear() have the same memory footprint,
I need a real-world example where empty dicts affect overall memory usage.

I'll check the memory usage difference with the application I used in this ML thread:
https://mail.python.org/pipermail/python-dev/2017-January/147194.html
msg291494 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-04-11 13:49
I've worked on an application (proprietary, unfortunately) that created a lot of empty dictionaries that only sometimes got populated.  It involved sqlalchemy, but I don't remember if the dicts came from sqlalchemy itself or from the code that used it.  That application did care about memory, and the shared-key dicts were a big benefit to it.
msg291506 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-11 17:40
Thank you for your reply.
Would you try checking how the patch [1] affects the memory usage of your application?
I think the patch can be applied to 3.6 easily.

[1] https://patch-diff.githubusercontent.com/raw/python/cpython/pull/1080.patch
msg291507 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-04-11 17:54
Sorry, but I no longer have access to that application (I'm a consultant, and the owner is no longer a client).
msg291521 - Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-04-12 00:06
> I mean creating a lone empty dict doesn't seem to make much sense. It saves memory, but once the dict is populated it is resized and the memory usage comes back.

But sometimes it's not populated.

class A:
    def __init__(self, **kwargs):
        self._extra = kwargs

xa = [A() for _ in range(1000)]

So the problem is (a) how many empty dicts exist, and (b) how much memory this patch saves; a rough estimate is sketched below.
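
A rough way to put numbers on (a) and (b) for a case like the one above (a sketch only; the 240- and 72-byte figures are the ones measured in msg291470 and vary by platform and build):

import sys

class A:
    def __init__(self, **kwargs):
        self._extra = kwargs          # usually stays empty

xa = [A() for _ in range(1000)]       # (a) 1000 empty dicts in this example

current = sum(sys.getsizeof(a._extra) for a in xa)   # ~240 bytes each, unpatched
patched = 72 * len(xa)                               # ~72 bytes each as key-sharing empty dicts
print(current - patched, "bytes saved")              # (b) ~168 bytes per instance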

> And this makes PyDict_New() harder to understand. :-(

Yes, but it is not new complexity, because it is the same thing d.clear() already does.
msg291683 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-14 19:41
I got the following results:

$ ./python.patched -m perf timeit --compare-to=./python.default --duplicate=100 -- '{}'
python.default: ..................... 203 ns +- 5 ns
python.patched: ..................... 97.1 ns +- 0.7 ns

Mean +- std dev: [python.default] 203 ns +- 5 ns -> [python.patched] 97.1 ns +- 0.7 ns: 2.09x faster (-52%)

$ ./python.patched -m perf timeit --compare-to=./python.default --duplicate=100 -- 'x={}; x[1]=1'
python.default: ..................... 494 ns +- 5 ns
python.patched: ..................... 592 ns +- 7 ns

Mean +- std dev: [python.default] 494 ns +- 5 ns -> [python.patched] 592 ns +- 7 ns: 1.20x slower (+20%)

It seems something is wrong with resizing an empty dict; it shouldn't take that much time.
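
For anyone without perf installed, roughly the same comparison can be reproduced with the stdlib timeit module (a minimal sketch; the absolute numbers will differ because perf controls for more sources of noise):

import timeit

# Run this same script under ./python.default and ./python.patched and compare.
for stmt in ("{}", "x = {}; x[1] = 1"):
    best = min(timeit.repeat(stmt, repeat=5, number=10**6))   # best of 5 runs of 1e6 loops
    print(f"{stmt!r:>20}: {best / 10**6 * 1e9:.1f} ns per loop")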
msg291776 - Author: Louie Lu (louielu) * Date: 2017-04-17 03:13
Inada's patched version behaves differently inside `PyObject_SetItem`.

When running this code: `x = {}; x['a'] = 123`

at PyObject_SetItem, the patched version goes to this line:
  >│179         if (m && m->mp_ass_subscript)
   │180             return m->mp_ass_subscript(o, key, value);

but the original version goes to:
  >│182         if (o->ob_type->tp_as_sequence) {
   │183             if (PyIndex_Check(key)) {

I think that's where the performance issue comes from; I'm still digging into why this happens.
msg291777 - Author: Louie Lu (louielu) * Date: 2017-04-17 03:22
Please disregard my previous message; I traced the wrong code. Sorry about that.
msg291791 - Author: Louie Lu (louielu) * Date: 2017-04-17 08:41
I'm testing [1] a fast path inside `dictresize` that detects whether the keys object is `empty_keys_struct`. It is faster than the original patch, but still slower than the default (unpatched) build in most cases.


$ ./python.default -m perf compare_to -G --min-speed=1 default.json lpatch.json
Slower (14):
- pickle_dict: 91.1 us +- 2.2 us -> 98.0 us +- 3.3 us: 1.08x slower (+8%)
- xml_etree_parse: 534 ms +- 29 ms -> 565 ms +- 27 ms: 1.06x slower (+6%)
- crypto_pyaes: 679 ms +- 22 ms -> 708 ms +- 22 ms: 1.04x slower (+4%)
- regex_effbot: 12.1 ms +- 0.2 ms -> 12.6 ms +- 0.2 ms: 1.04x slower (+4%)
- tornado_http: 678 ms +- 21 ms -> 704 ms +- 31 ms: 1.04x slower (+4%)
- pidigits: 432 ms +- 7 ms -> 447 ms +- 18 ms: 1.03x slower (+3%)
- spectral_norm: 869 ms +- 21 ms -> 898 ms +- 22 ms: 1.03x slower (+3%)
- unpickle_list: 20.6 us +- 0.6 us -> 21.2 us +- 0.8 us: 1.03x slower (+3%)
- pathlib: 87.9 ms +- 3.0 ms -> 90.6 ms +- 3.5 ms: 1.03x slower (+3%)
- pickle_list: 13.0 us +- 0.3 us -> 13.3 us +- 0.4 us: 1.03x slower (+3%)
- meteor_contest: 367 ms +- 13 ms -> 378 ms +- 14 ms: 1.03x slower (+3%)
- scimark_sor: 991 ms +- 28 ms -> 1.02 sec +- 0.03 sec: 1.03x slower (+3%)
- sympy_expand: 1.73 sec +- 0.05 sec -> 1.77 sec +- 0.04 sec: 1.02x slower (+2%)
- python_startup: 29.5 ms +- 1.1 ms -> 30.1 ms +- 1.9 ms: 1.02x slower (+2%)

Faster (8):
- sympy_integrate: 84.3 ms +- 8.3 ms -> 78.3 ms +- 5.0 ms: 1.08x faster (-7%)
- call_simple: 30.6 ms +- 1.7 ms -> 29.0 ms +- 1.4 ms: 1.06x faster (-5%)
- pickle: 43.2 us +- 3.2 us -> 41.1 us +- 1.9 us: 1.05x faster (-5%)
- call_method_unknown: 36.4 ms +- 1.6 ms -> 35.0 ms +- 1.5 ms: 1.04x faster (-4%)
- scimark_lu: 781 ms +- 42 ms -> 752 ms +- 34 ms: 1.04x faster (-4%)
- sympy_sum: 385 ms +- 21 ms -> 372 ms +- 17 ms: 1.03x faster (-3%)
- logging_silent: 1.30 us +- 0.04 us -> 1.26 us +- 0.04 us: 1.03x faster (-3%)
- django_template: 665 ms +- 20 ms -> 643 ms +- 18 ms: 1.03x faster (-3%)

Benchmark hidden because not significant (42)

[1]: https://github.com/lulouie/cpython/blob/compact_empty_dict/Objects/dictobject.c#L1247
msg291817 - Author: Josh Rosenberg (josh.r) * Date: 2017-04-18 01:44
For the record, a legitimate case where many empty dicts are created and few are populated is the collections-free approach to defaultdict(dict):

    mydict.setdefault(key1, {})[key2] = val

For, say, 100 unique key1s, and 10,000 total key1/key2 pairs, you'd create 10,000 empty dicts, discarding 9,900 of them. Granted, collections.defaultdict(dict) is even better (avoids the 9,900 unused dicts entirely), but I see the setdefault pattern enough, usually with list or dict, that it's not totally unreasonable to account for it.
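
A small sketch of the two patterns side by side (hypothetical data; `pairs` stands for the key1/key2/val triples described above):

from collections import defaultdict

pairs = [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")]   # (key1, key2, val)

# setdefault pattern: a throwaway empty dict is built for every triple,
# but only the first one per key1 is actually kept.
d1 = {}
for key1, key2, val in pairs:
    d1.setdefault(key1, {})[key2] = val

# defaultdict pattern: the inner dict is created only when key1 is missing.
d2 = defaultdict(dict)
for key1, key2, val in pairs:
    d2[key1][key2] = val

assert d1 == d2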
History
Date                 User              Action  Args
2017-04-18 01:44:38  josh.r            set     nosy: + josh.r; messages: + msg291817
2017-04-17 08:41:52  louielu           set     messages: + msg291791
2017-04-17 03:22:45  louielu           set     messages: + msg291777
2017-04-17 03:13:36  louielu           set     nosy: + louielu; messages: + msg291776
2017-04-14 19:41:48  serhiy.storchaka  set     messages: + msg291683
2017-04-12 00:06:48  inada.naoki       set     messages: + msg291521
2017-04-11 17:54:09  r.david.murray    set     messages: + msg291507
2017-04-11 17:40:14  inada.naoki       set     messages: + msg291506
2017-04-11 13:49:01  r.david.murray    set     nosy: + r.david.murray; messages: + msg291494
2017-04-11 12:21:02  inada.naoki       set     messages: + msg291489
2017-04-11 11:53:02  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg291485; stage: patch review
2017-04-11 10:55:29  xiang.zhang       set     messages: + msg291483
2017-04-11 10:46:49  inada.naoki       set     messages: + msg291482
2017-04-11 10:45:48  inada.naoki       set     messages: + msg291481
2017-04-11 10:36:31  xiang.zhang       set     nosy: + xiang.zhang; messages: + msg291477
2017-04-11 09:43:22  inada.naoki       set     type: resource usage; components: + Interpreter Core; versions: + Python 3.7
2017-04-11 09:42:45  inada.naoki       set     messages: + msg291471
2017-04-11 09:42:12  inada.naoki       set     pull_requests: + pull_request1223
2017-04-11 09:39:20  inada.naoki       create