Issue29304
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2017-01-18 02:36 by methane, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
dict-refactor.patch | methane, 2017-01-18 06:09 | review | ||
dict-refactoring-2.patch | methane, 2017-01-26 17:53 | review | ||
dict-refactoring-3.patch | methane, 2017-01-26 18:54 | review | ||
collisions_count.py | Dmitry Rubanovich, 2017-06-15 06:57 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 2273 | merged | methane, 2017-06-19 04:38 | |
PR 2347 | merged | methane, 2017-06-23 12:50 | |
PR 2407 | merged | methane, 2017-06-26 13:25 |
Messages (25) | |||
---|---|---|---|
msg285695 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-18 02:36 | |
All lookdict* functions are implemented like this pseudocode:

```
lookup()
if not collision:
    return result
while True:
    perturb_shift()
    lookup()
    if not collision:
        return result
```

This patch changes it to:

```
while True:
    lookup()
    if not collision:
        return result
    perturb_shift()
```

It removes 100 lines of code. Good. But how about performance? When this patch is applied to 4a534c45bbf6:

```
$ ../python.patched -m perf compare_to default.json patched2.json -G --min-speed=2
Slower (4):
- xml_etree_generate: 271 ms +- 6 ms -> 283 ms +- 9 ms: 1.04x slower (+4%)
- nqueens: 263 ms +- 4 ms -> 272 ms +- 3 ms: 1.04x slower (+4%)
- scimark_monte_carlo: 272 ms +- 10 ms -> 280 ms +- 14 ms: 1.03x slower (+3%)
- scimark_lu: 435 ms +- 23 ms -> 446 ms +- 32 ms: 1.03x slower (+3%)
Faster (7):
- call_method: 15.2 ms +- 0.2 ms -> 14.7 ms +- 0.4 ms: 1.04x faster (-4%)
- call_simple: 14.4 ms +- 0.2 ms -> 13.9 ms +- 0.3 ms: 1.04x faster (-4%)
- xml_etree_iterparse: 227 ms +- 9 ms -> 219 ms +- 7 ms: 1.04x faster (-3%)
- scimark_sor: 527 ms +- 10 ms -> 510 ms +- 11 ms: 1.03x faster (-3%)
- call_method_slots: 14.7 ms +- 0.5 ms -> 14.3 ms +- 0.2 ms: 1.03x faster (-3%)
- genshi_text: 90.2 ms +- 1.1 ms -> 87.8 ms +- 1.1 ms: 1.03x faster (-3%)
- django_template: 403 ms +- 5 ms -> 394 ms +- 4 ms: 1.02x faster (-2%)
Benchmark hidden because not significant (53): 2to3, ...
```

And when this patch is applied to 1a97b10cb420:

```
$ ../python.patched -m perf compare_to default.json patched.json -G --min-speed=2
Slower (6):
- call_simple: 13.5 ms +- 0.5 ms -> 14.4 ms +- 0.4 ms: 1.07x slower (+7%)
- xml_etree_generate: 270 ms +- 6 ms -> 287 ms +- 5 ms: 1.06x slower (+6%)
- xml_etree_process: 240 ms +- 6 ms -> 247 ms +- 4 ms: 1.03x slower (+3%)
- regex_compile: 429 ms +- 3 ms -> 440 ms +- 5 ms: 1.03x slower (+3%)
- call_method_unknown: 16.1 ms +- 0.2 ms -> 16.5 ms +- 0.3 ms: 1.02x slower (+2%)
- logging_simple: 31.2 us +- 0.4 us -> 32.0 us +- 0.3 us: 1.02x slower (+2%)
Faster (8):
- genshi_text: 90.6 ms +- 1.4 ms -> 87.6 ms +- 1.2 ms: 1.03x faster (-3%)
- scimark_sor: 513 ms +- 11 ms -> 497 ms +- 12 ms: 1.03x faster (-3%)
- genshi_xml: 200 ms +- 2 ms -> 194 ms +- 2 ms: 1.03x faster (-3%)
- unpickle_pure_python: 857 us +- 21 us -> 835 us +- 13 us: 1.03x faster (-3%)
- python_startup_no_site: 9.95 ms +- 0.02 ms -> 9.74 ms +- 0.02 ms: 1.02x faster (-2%)
- json_dumps: 29.7 ms +- 0.4 ms -> 29.1 ms +- 0.4 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 225 ms +- 9 ms -> 220 ms +- 5 ms: 1.02x faster (-2%)
- chameleon: 31.1 ms +- 0.3 ms -> 30.5 ms +- 0.5 ms: 1.02x faster (-2%)
Benchmark hidden because not significant (50): 2to3, ...
```

I can't see any stable and significant performance regression. I'll try to create some micro benchmarks. |
|||
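The restructured control flow discussed in msg285695 can be sketched in runnable Python. This is a toy model, not CPython's C implementation: `probe_sequence`, `insert`, and `lookup` are illustrative names, and the table is a plain list of `(hash, key, value)` tuples. It mirrors CPython's probe recurrence `i = (i*5 + perturb + 1) & mask` with `perturb >>= 5` each step, and does the probe and perturb-shift inside one loop instead of duplicating a "first try" before the loop.

```python
# Toy model of the single-loop dict probe discussed above (not CPython's C code).
PERTURB_SHIFT = 5

def probe_sequence(hash_, mask):
    """Yield the slot indices examined for `hash_` in a table of size mask+1.
    Mirrors CPython's recurrence: i = (i*5 + perturb + 1) & mask."""
    i = hash_ & mask
    perturb = hash_
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask

def insert(slots, key, value):
    """Toy insert; assumes the table always keeps at least one empty slot."""
    h = hash(key)
    mask = len(slots) - 1
    for i in probe_sequence(h, mask):
        if slots[i] is None or (slots[i][0] == h and slots[i][1] == key):
            slots[i] = (h, key, value)
            return

def lookup(slots, key, hash_):
    """Single-loop lookup: probe, test, then shift -- no duplicated first try.
    Would loop forever on a full table; CPython guarantees a free slot."""
    mask = len(slots) - 1
    for i in probe_sequence(hash_, mask):
        entry = slots[i]
        if entry is None:                 # empty slot: key is absent
            return None
        h, k, v = entry
        if h == hash_ and k == key:       # hit
            return v
        # collision: the generator computes the next probe index
```

For small non-negative integer keys, `hash(k) == k` in CPython, so the probe order is easy to follow by hand.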
msg285696 - (view) | Author: Xiang Zhang (xiang.zhang) * | Date: 2017-01-18 03:26 | |
Ahh, I had the same idea before. But then I persuaded myself that the first "lookup()" was deliberately extracted from the loop, since it's very likely there is no collision and you hit the result (empty or not) on the first try. |
|||
msg285698 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2017-01-18 03:54 | |
I worry you guys are rapidly sloshing around and churning code that was developed slowly and methodically over many years. A lot of people have looked at and tweaked the dictionary code, but now it seems like there is a rewrite-everything festival. When looking at potential changes, please consider that the people who came before you knew what they were doing. If there is a pre-loop step that could have been folded into a single loop to save a little code, it isn't there because the people who wrote it are daft. That expansion was made for a reason. Even if you get an improved timing here or there, consider that it may vary from compiler to compiler, from OS to OS, and may vary widely across different kinds of applications. Branch prediction and cache effect issues don't readily show up in simple timing loops. Particularly as a new core dev, I recommend resisting the urge to rewrite everything you touch. That will only destabilize Python. In the past couple of years, two or three devs have churned an enormous amount of code. |
|||
msg285702 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-18 05:42 | |
Raymond, I understand your worries. I won't commit this until I do a more careful survey.

But my motivation is not only that I found duplicated code: I think the benefit of this technique has been reduced for historical reasons.

In Python 2.7, there were only two lookup functions: lookdict and lookdict_string. Both functions supported dicts containing dummy entries. The first comparison in lookdict_string():

```
if (ep->me_key == NULL || ep->me_key == key)
    return ep;
if (ep->me_key == dummy)
    freeslot = ep;
else {
    if (ep->me_hash == hash && _PyString_Eq(ep->me_key, key))
        return ep;
    freeslot = NULL;
}
```

And similar code in the same function, but inside the loop:

```
if (ep->me_key == NULL)
    return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
    || (ep->me_hash == hash
        && ep->me_key != dummy
        && _PyString_Eq(ep->me_key, key)))
    return ep;
if (ep->me_key == dummy && freeslot == NULL)
    freeslot = ep;
```

The first part can ignore whether the freeslot variable is NULL or not, but the second lookup can't. Since most lookups succeed on the first try, this technique may have had a significant effect.

But now we have five lookdict functions (if we include lookdict_index). Three of them (lookdict_unicode_nodummy, lookdict_split, and lookdict_index) only handle dicts without dummies. They don't have a freeslot variable, so the first part and the loop body are almost the same; the performance benefit of this technique may be negligible.

For the remaining two functions (lookdict_unicode and lookdict), the technique may still have an effect. But their performance is not as important now as it was in Python 2.7: they were used for all name lookups in 2.7, but now they are only fallbacks for lookdict_split and lookdict_unicode_nodummy.

On the other hand, having 2.5x (2 -> 5) lookdict functions means the maintenance cost of the duplicated code has increased.

At first, I'll start with the three functions which don't take care of dummies. I think there is almost zero performance regression there. |
|||
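The freeslot bookkeeping described above can be sketched as a toy Python model. The `DUMMY` sentinel and `find_slot` helper are illustrative names, not CPython API: the point is that a lookup over a table containing deletion markers must remember the first dummy it passes so a later insert can reuse that slot, which is exactly the extra state the dummy-free lookdict variants don't need.

```python
# Toy model of the freeslot logic in dummy-aware lookups (not CPython's C code).
DUMMY = object()          # sentinel marking a deleted slot
PERTURB_SHIFT = 5

def find_slot(table, key, hash_):
    """Return (index, found). On a miss, `index` is the best insertion slot:
    the first dummy passed, else the first empty slot.
    Assumes the table always keeps at least one empty slot."""
    mask = len(table) - 1
    i = hash_ & mask
    perturb = hash_
    freeslot = -1
    while True:
        entry = table[i]
        if entry is None:                       # empty: key is absent
            return (i if freeslot < 0 else freeslot), False
        if entry is DUMMY:
            if freeslot < 0:
                freeslot = i                    # remember the first dummy
        elif entry[0] == hash_ and entry[1] == key:
            return i, True                      # hit
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask
```

The `if freeslot < 0` test in the loop body is the check the historical code duplication tried to hoist out of the hot path.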
msg285703 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-18 06:09 | |
I attached the wrong file. This patch is the first version I wanted to post. It dedupes all five functions. |
|||
msg286320 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-26 16:24 | |
Digging into the history, the duplicated code was introduced here (1997-01-17): https://github.com/python/cpython/commit/99304174680d4c724476dad300ae7fc638842bf0#diff-2131209d0deb0e50c93a88ec6c7b0d52

```
/* Optimizations based on observations by Jyrki Alakuijala (paraphrased):
   - This routine is very heavily used, so should be AFAP (As Fast As Possible).
   - Most of the time, the first try is a hit or a definite miss; so postpone
     the calculation of incr until we know the first try was a miss.
   - Write the loop twice, so we can move the test for freeslot==NULL out of
     the loop.
   - Write the loop using pointer increments and comparisons rather than using
     an integer loop index.
   Note that it behooves the compiler to calculate the values of
   incr*sizeof(*ep) outside the loops and use this in the increment of ep.
   I've reduced the number of register variables to the two most obvious
   candidates.
```

At the time, there was a single lookdict function, and three tries.

* The comment says the first try was for skipping the `incr` calculation. `incr` has been removed already, so I think this now only implies skipping the freeslot==NULL test.
* After the first try, there were two loops, for skipping the freeslot==NULL test until the first dummy is found. That optimization looks gone.

As I said above, lookdict_unicode_nodummy and lookdict_split only search tables without dummies. And lookdict_unicode() and lookdict() are not as important as lookmapping() was in 1997, so duplicating code only to skip one freeslot==NULL test doesn't make sense.

One possible optimization is removing freeslot completely, because:

* freeslot is used only when inserting; finding is more important.
* insertdict() can find a dummy quickly by looking only at dk_indices.

But before trying that optimization, I suggest removing the duplicated code first. At least, macro benchmarks don't show a significant difference. |
|||
msg286322 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-26 16:32 | |
BTW, the perturb loop uses (i << 2) + i instead of i*5. I feel that doesn't make sense in the 21st century; compilers do this strength reduction themselves. I'll change it too. |
|||
msg286323 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-01-26 17:02 | |
I agree that the duplication was made for reasons, but those reasons may no longer be valid, or may be less weighty now, after a number of changes made in recent years, like shared keys and compact dicts. The overhead of additional levels of indirection and the complex code of dk_get_index() can be much larger than the benefit of the code duplication. I think neither macrobenchmarks nor microbenchmarks would show the true effect of this change. The code should be analyzed with more precise tools, similarly to how the set implementation was tuned. I think the patch should not be pushed without such analysis. Perhaps Raymond will find time to do this. |
|||
msg286328 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-26 17:53 | |
> I think the patch should not be pushed without such analysis. Perhaps Raymond will find time to do this.

Ok, I won't push until an expert's LGTM.

> One possible optimization is removing freeslot completely, because:
>
> * freeslot is used only when inserting; finding is more important.
> * insertdict() can find a dummy quickly by looking only at dk_indices.
>
> But before trying that optimization,

I found this is not only an optimization, but also a refactoring. There is already a function for finding the insertion point, and insertdict() and PyDict_SetDefault() use it in some cases. So this can be done without adding any code.

```
$ ../python.default -m perf compare_to default.json patched.json -G --min-speed=2
Slower (5):
- logging_silent: 736 ns +- 7 ns -> 758 ns +- 15 ns: 1.03x slower (+3%)
- unpickle: 32.2 us +- 0.4 us -> 33.0 us +- 0.3 us: 1.03x slower (+3%)
- unpickle_pure_python: 829 us +- 13 us -> 849 us +- 13 us: 1.02x slower (+2%)
- meteor_contest: 200 ms +- 2 ms -> 204 ms +- 2 ms: 1.02x slower (+2%)
- scimark_sor: 482 ms +- 8 ms -> 493 ms +- 9 ms: 1.02x slower (+2%)
Faster (6):
- unpack_sequence: 132 ns +- 2 ns -> 123 ns +- 0 ns: 1.07x faster (-7%)
- call_simple: 14.3 ms +- 0.5 ms -> 13.7 ms +- 0.3 ms: 1.05x faster (-5%)
- sqlite_synth: 10.3 us +- 0.3 us -> 9.91 us +- 0.21 us: 1.03x faster (-3%)
- mako: 40.7 ms +- 0.5 ms -> 39.8 ms +- 0.3 ms: 1.02x faster (-2%)
- xml_etree_parse: 319 ms +- 11 ms -> 311 ms +- 14 ms: 1.02x faster (-2%)
- chameleon: 30.4 ms +- 0.4 ms -> 29.8 ms +- 0.3 ms: 1.02x faster (-2%)
Benchmark hidden because not significant (53): ...
``` |
|||
msg286331 - (view) | Author: Inada Naoki (methane) * | Date: 2017-01-26 18:54 | |
dict-refactoring-3.patch:

```
$ ../python.default -m perf compare_to default.json patched2.json -G --min-speed=2
Slower (7):
- scimark_lu: 422 ms +- 35 ms -> 442 ms +- 11 ms: 1.05x slower (+5%)
- logging_silent: 736 ns +- 7 ns -> 761 ns +- 21 ns: 1.03x slower (+3%)
- scimark_sor: 482 ms +- 8 ms -> 494 ms +- 7 ms: 1.03x slower (+3%)
- meteor_contest: 200 ms +- 2 ms -> 205 ms +- 2 ms: 1.02x slower (+2%)
- unpickle: 32.2 us +- 0.4 us -> 32.9 us +- 0.5 us: 1.02x slower (+2%)
- unpickle_pure_python: 829 us +- 13 us -> 848 us +- 14 us: 1.02x slower (+2%)
- scimark_sparse_mat_mult: 8.71 ms +- 0.32 ms -> 8.89 ms +- 0.13 ms: 1.02x slower (+2%)
Faster (8):
- unpack_sequence: 132 ns +- 2 ns -> 123 ns +- 2 ns: 1.07x faster (-7%)
- call_simple: 14.3 ms +- 0.5 ms -> 13.4 ms +- 0.3 ms: 1.07x faster (-6%)
- call_method: 15.1 ms +- 0.1 ms -> 14.5 ms +- 0.2 ms: 1.04x faster (-4%)
- mako: 40.7 ms +- 0.5 ms -> 39.6 ms +- 0.5 ms: 1.03x faster (-3%)
- scimark_monte_carlo: 266 ms +- 7 ms -> 258 ms +- 6 ms: 1.03x faster (-3%)
- chameleon: 30.4 ms +- 0.4 ms -> 29.6 ms +- 0.4 ms: 1.03x faster (-3%)
- xml_etree_parse: 319 ms +- 11 ms -> 312 ms +- 15 ms: 1.02x faster (-2%)
- pickle_pure_python: 1.28 ms +- 0.03 ms -> 1.26 ms +- 0.02 ms: 1.02x faster (-2%)
```

# microbench

```
$ ./python -m perf timeit --compare-to=`pwd`/python.default -s 'r=range(1000)' -- '{k:k for k in r}'
python.default: ..................... 60.0 us +- 0.3 us
python: ..................... 61.7 us +- 0.4 us
Median +- std dev: [python.default] 60.0 us +- 0.3 us -> [python] 61.7 us +- 0.4 us: 1.03x slower (+3%)

$ ./python -m perf timeit --compare-to=`pwd`/python.default -s 'ks=[str(k) for k in range(1000)]; d={k:k for k in ks}' -- 'for k in ks: d[k]'
python.default: ..................... 37.1 us +- 0.9 us
python: ..................... 37.7 us +- 0.9 us
Median +- std dev: [python.default] 37.1 us +- 0.9 us -> [python] 37.7 us +- 0.9 us: 1.02x slower (+2%)
```

Hmm, 3% slower? I'll rerun benchmarks with a PGO+LTO build. |
|||
msg296070 - (view) | Author: Dmitry Rubanovich (Dmitry Rubanovich) * | Date: 2017-06-15 06:57 | |
lookdict_index() (and the rest of the lookdict* functions in dictobject.c) use an unnecessarily complicated perturb mechanism. And, in fact, it's slower than the simpler case. Instead of this:

```
for (size_t perturb = hash;;) {
    perturb >>= PERTURB_SHIFT;
    i = mask & ((i << 2) + i + perturb + 1);
    ....
```

it should do this:

```
for (size_t perturb = hash;;) {
    i = mask & ((i << 1) + perturb + 1);
    perturb >>= PERTURB_SHIFT;
    ....
```

This would not only save an instruction (a minor issue), but it would also reduce collisions. I've attached a file which calculates the frequencies of collisions for demonstration purposes. It shows that the calculation, as it stands right now, does not create a 1-1 mapping even on the 1st iteration through the loop. Moving PERTURB_SHIFT to the line before the calculation does reduce the density of the collision space. But using the calculation which I proposed eliminates collisions on the 1st iteration completely and reduces them on most subsequent iterations. |
|||
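The first-iteration collision claim above can be checked with a few lines of Python. This is only a sketch, not the attached collisions_count.py, and it makes a simplifying assumption: hash values range only over [0, size), while real hashes are machine-word sized. For each hash it computes the slot the second probe lands on under the current recurrence (shift perturb, then `i*5 + perturb + 1`) and under the proposed one (`i*2 + unshifted perturb + 1`), then counts distinct second-probe slots; a scheme with no first-iteration collisions maps distinct hashes to distinct slots.

```python
# Sketch: count distinct second-probe slots for the two perturb schemes.
PERTURB_SHIFT = 5

def second_probe_current(h, mask):
    # current CPython: shift perturb first, then i = i*5 + perturb + 1
    i, perturb = h & mask, h
    perturb >>= PERTURB_SHIFT
    return ((i << 2) + i + perturb + 1) & mask

def second_probe_proposed(h, mask):
    # Dmitry's proposal: i = i*2 + perturb + 1 with the *unshifted* perturb
    i, perturb = h & mask, h
    return ((i << 1) + perturb + 1) & mask

def distinct_second_slots(fn, size):
    """How many distinct slots do the `size` possible hashes map to?"""
    mask = size - 1
    return len({fn(h, mask) for h in range(size)})
```

For a size-64 table, the proposed scheme reduces to `3h + 1 (mod 64)`, an odd multiplier and hence a bijection, while the current scheme's `5h + (h >> 5) + 1 (mod 64)` is not injective, matching the message's claim under this restricted hash range.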
msg296071 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-06-15 07:11 | |
Dmitry, please open a new issue for your proposal. It changes visible behavior, while Inada's patch doesn't. |
|||
msg296305 - (view) | Author: Inada Naoki (methane) * | Date: 2017-06-19 04:48 | |
PR 2273 only changes the lookdict_index() function. The function is used only from dict_popitem(), so it's not a performance-critical part. And the microbenchmark doesn't show significant performance changes:

```
$ ./python.default -m perf timeit -q -l 2000 -s 'd=dict.fromkeys(str(i) for i in range(2000))' -- 'd.popitem()'
Mean +- std dev: 96.1 ns +- 1.3 ns
$ ./python.patched -m perf timeit -q -l 2000 -s 'd=dict.fromkeys(str(i) for i in range(2000))' -- 'd.popitem()'
Mean +- std dev: 96.4 ns +- 1.3 ns
``` |
|||
msg296680 - (view) | Author: Inada Naoki (methane) * | Date: 2017-06-23 06:22 | |
New changeset 073ae487b3ff9001c69d530c7555ddaa530dee16 by INADA Naoki in branch 'master': bpo-29304: simplify lookdict_index() function. (GH-2273) https://github.com/python/cpython/commit/073ae487b3ff9001c69d530c7555ddaa530dee16 |
|||
msg296702 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2017-06-23 11:32 | |
From a purely human and social perspective, I agree that it's beneficial to minimize duplicated code. From a performance perspective, I can see two possible consequences:

- either compilers are already smart enough to undo the code duplication, and the generated code will remain the same => no performance difference
- or compilers are not able to undo the code duplication, and the reduced I-cache pressure may be beneficial on non-trivial workloads => a potential improvement (though probably minor) from the reduced code size |
|||
msg296848 - (view) | Author: Tim Peters (tim.peters) * | Date: 2017-06-26 05:15 | |
I suggest reading the thread I started here[1] before pursuing this: it looks very likely that the entire collision resolution scheme should be replaced with one of the "double hashing" ones given there, a bona fide algorithmic improvement for small tables and pathological key sets. Whether it actually runs faster remains a mystery ;-) The loop guts change from a shift, three adds, and a mask (or a multiply, two adds, and a mask) to just one add and a mask. But the post-first-probe pre-loop setup gets more expensive. [1] https://mail.python.org/pipermail/python-ideas/2017-June/046143.html |
|||
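The "double hashing" family Tim refers to can be sketched as follows. This is an assumed generic variant, not necessarily the exact scheme from the linked thread: the probe stride is derived from the hash once, before the loop, and forced odd so it is coprime with the power-of-two table size, which guarantees the sequence visits every slot. As the message notes, the loop guts shrink to one add and a mask, at the cost of extra pre-loop setup.

```python
# Hedged sketch of a generic double-hashing probe over a power-of-two table.
def double_hash_probes(h, size):
    """Yield each slot index of a size-`size` (power of two) table exactly once."""
    mask = size - 1
    i = h & mask
    # Pre-loop setup: derive the stride from the high hash bits; forcing the
    # low bit to 1 makes the stride odd, hence coprime with 2**k.
    stride = ((h >> 5) | 1) & mask
    for _ in range(size):
        yield i
        i = (i + stride) & mask   # loop guts: just one add and a mask
```

Because the stride is fixed per key, there is no `perturb` shift inside the loop at all, which is the extra shift Tim's follow-up message mentions.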
msg296849 - (view) | Author: Tim Peters (tim.peters) * | Date: 2017-06-26 05:20 | |
Oops! I undercounted the shifts in the current scheme: there's an additional shift for "perturb". That doesn't exist in the "double hashing" alternatives. |
|||
msg296890 - (view) | Author: Inada Naoki (methane) * | Date: 2017-06-26 13:19 | |
Antoine: Thank you for your comment. Actually, the size of the generated instructions is reduced on amd64 with gcc. lookdict_unicode_nodummy and lookdict_split don't have any reason to keep the duplicated code anymore; I'll dedupe them first. lookdict and lookdict_unicode have one difference between the code before the loop and the inner loop:

```
// before the loop:
if (ix == DKIX_DUMMY) {
    freeslot = i;

// inner loop:
if (ix == DKIX_DUMMY) {
    if (freeslot == -1)
        freeslot = i;
```

Since lookdict_unicode may be used for namespaces, I'll keep it for now. But lookdict() is very unlikely to be used for a namespace; I'll dedupe it after checking some macro/micro benchmarks. |
|||
msg296891 - (view) | Author: Inada Naoki (methane) * | Date: 2017-06-26 13:22 | |
Tim Peters: Thanks for your suggestion. But in this issue I want to focus on a very simple code cleanup, without any change to the probing algorithm or the memory access pattern. I'll reply with my thoughts about reducing collisions on the ML. |
|||
msg297060 - (view) | Author: Inada Naoki (methane) * | Date: 2017-06-27 17:37 | |
Finally, I removed `freeslot` and all lookdict*() functions are much simpler in GH-2407. Macro bench doesn't show any significant performance change. I'll try to write micro benchmarks. |
|||
msg299709 - (view) | Author: Inada Naoki (methane) * | Date: 2017-08-03 14:42 | |
# microbench https://gist.github.com/methane/db16224a99d12ad4ed170cf3bd45b819

```
Slower (2):
- fromkeys_str: 3.85 us +- 0.07 us -> 4.09 us +- 0.09 us: 1.06x slower (+6%)
- fromkeys_int: 4.24 us +- 0.13 us -> 4.48 us +- 0.16 us: 1.06x slower (+6%)
Faster (2):
- unicode_nodummy_6: 604 ns +- 37 ns -> 577 ns +- 8 ns: 1.05x faster (-4%)
- unicode_nodummy_5: 626 ns +- 31 ns -> 600 ns +- 21 ns: 1.04x faster (-4%)
Benchmark hidden because not significant (1): unicode_split
```

# pyperformance

```
Slower (10):
- xml_etree_iterparse: 191 ms +- 5 ms -> 200 ms +- 5 ms: 1.05x slower (+5%)
- pickle_list: 7.80 us +- 0.19 us -> 8.08 us +- 0.24 us: 1.04x slower (+4%)
- unpack_sequence: 116 ns +- 5 ns -> 119 ns +- 4 ns: 1.03x slower (+3%)
- meteor_contest: 183 ms +- 3 ms -> 187 ms +- 2 ms: 1.02x slower (+2%)
- logging_silent: 330 ns +- 6 ns -> 336 ns +- 5 ns: 1.02x slower (+2%)
- sqlite_synth: 8.65 us +- 0.16 us -> 8.78 us +- 0.14 us: 1.02x slower (+2%)
- sympy_sum: 172 ms +- 1 ms -> 174 ms +- 1 ms: 1.01x slower (+1%)
- json_dumps: 27.0 ms +- 0.4 ms -> 27.3 ms +- 0.3 ms: 1.01x slower (+1%)
- pickle_dict: 65.1 us +- 1.9 us -> 65.8 us +- 1.7 us: 1.01x slower (+1%)
- pickle: 20.4 us +- 0.3 us -> 20.6 us +- 0.2 us: 1.01x slower (+1%)
Faster (20):
- nbody: 287 ms +- 4 ms -> 231 ms +- 6 ms: 1.24x faster (-19%)
- scimark_sparse_mat_mult: 8.94 ms +- 1.35 ms -> 8.26 ms +- 0.36 ms: 1.08x faster (-8%)
- chaos: 248 ms +- 3 ms -> 229 ms +- 2 ms: 1.08x faster (-8%)
- pickle_pure_python: 1.01 ms +- 0.02 ms -> 953 us +- 13 us: 1.06x faster (-5%)
- scimark_fft: 689 ms +- 22 ms -> 654 ms +- 14 ms: 1.05x faster (-5%)
- regex_v8: 41.5 ms +- 1.8 ms -> 39.8 ms +- 0.2 ms: 1.04x faster (-4%)
- xml_etree_parse: 279 ms +- 12 ms -> 271 ms +- 10 ms: 1.03x faster (-3%)
- chameleon: 23.2 ms +- 0.5 ms -> 22.6 ms +- 0.5 ms: 1.03x faster (-2%)
- dulwich_log: 146 ms +- 2 ms -> 143 ms +- 1 ms: 1.02x faster (-2%)
- unpickle_pure_python: 725 us +- 15 us -> 709 us +- 15 us: 1.02x faster (-2%)
- nqueens: 211 ms +- 3 ms -> 206 ms +- 2 ms: 1.02x faster (-2%)
- json_loads: 56.6 us +- 2.6 us -> 55.7 us +- 0.4 us: 1.02x faster (-2%)
- 2to3: 636 ms +- 3 ms -> 625 ms +- 4 ms: 1.02x faster (-2%)
- richards: 148 ms +- 4 ms -> 145 ms +- 2 ms: 1.02x faster (-2%)
- spectral_norm: 253 ms +- 9 ms -> 250 ms +- 5 ms: 1.01x faster (-1%)
- xml_etree_generate: 203 ms +- 5 ms -> 201 ms +- 4 ms: 1.01x faster (-1%)
- html5lib: 194 ms +- 6 ms -> 191 ms +- 6 ms: 1.01x faster (-1%)
- sqlalchemy_imperative: 55.0 ms +- 1.7 ms -> 54.3 ms +- 1.7 ms: 1.01x faster (-1%)
- sqlalchemy_declarative: 285 ms +- 3 ms -> 281 ms +- 3 ms: 1.01x faster (-1%)
- genshi_xml: 171 ms +- 2 ms -> 169 ms +- 2 ms: 1.01x faster (-1%)
Benchmark hidden because not significant (30): ...
``` |
|||
msg299711 - (view) | Author: Inada Naoki (methane) * | Date: 2017-08-03 14:45 | |
New changeset 778928b0c7aa438c282727535814d73df850693a by INADA Naoki in branch 'master': bpo-29304: Simplify dict lookup functions (GH-2407) https://github.com/python/cpython/commit/778928b0c7aa438c282727535814d73df850693a |
|||
msg299713 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2017-08-03 16:05 | |
Thank you Inada! |
|||
msg299765 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2017-08-05 01:18 | |
It would have been nice to have timed this on ARM chips or 32-bit builds. I suspect that the previous code was mainly beneficial in register-starved environments, where it would have saved some memory accesses whenever there was an immediate hit on the first dictionary probe (which is common). Also, it is more likely to make a difference on larger dicts where the memory accesses aren't all in L1 cache. |
|||
msg299777 - (view) | Author: Inada Naoki (methane) * | Date: 2017-08-05 11:51 | |
On an i386 docker image, pyperformance:

```
./python -m perf compare_to before.json after.json -G --min-speed=2
Slower (4):
- regex_dna: 279 ms +- 1 ms -> 288 ms +- 3 ms: 1.03x slower (+3%)
- deltablue: 17.9 ms +- 0.3 ms -> 18.3 ms +- 1.0 ms: 1.03x slower (+3%)
- scimark_monte_carlo: 258 ms +- 4 ms -> 264 ms +- 4 ms: 1.02x slower (+2%)
- unpack_sequence: 115 ns +- 1 ns -> 117 ns +- 3 ns: 1.02x slower (+2%)
Faster (9):
- scimark_lu: 408 ms +- 16 ms -> 389 ms +- 13 ms: 1.05x faster (-5%)
- spectral_norm: 315 ms +- 3 ms -> 302 ms +- 2 ms: 1.04x faster (-4%)
- xml_etree_process: 235 ms +- 2 ms -> 228 ms +- 3 ms: 1.03x faster (-3%)
- telco: 19.8 ms +- 0.4 ms -> 19.3 ms +- 0.3 ms: 1.03x faster (-3%)
- scimark_sor: 487 ms +- 6 ms -> 474 ms +- 4 ms: 1.03x faster (-3%)
- django_template: 385 ms +- 3 ms -> 376 ms +- 3 ms: 1.02x faster (-2%)
- unpickle_pure_python: 934 us +- 13 us -> 913 us +- 11 us: 1.02x faster (-2%)
- pathlib: 68.2 ms +- 0.8 ms -> 66.7 ms +- 0.5 ms: 1.02x faster (-2%)
- xml_etree_generate: 277 ms +- 3 ms -> 271 ms +- 7 ms: 1.02x faster (-2%)
Benchmark hidden because not significant (44)
```

and microbench:

```
./python -m perf compare_to before-dict.json after-dict.json -G
Slower (2):
- fromkeys_str: 5.12 us +- 0.10 us -> 5.27 us +- 0.09 us: 1.03x slower (+3%)
- unicode_split: 636 ns +- 11 ns -> 648 ns +- 16 ns: 1.02x slower (+2%)
Faster (2):
- unicode_nodummy_5: 837 ns +- 28 ns -> 818 ns +- 21 ns: 1.02x faster (-2%)
- unicode_nodummy_6: 824 ns +- 10 ns -> 808 ns +- 20 ns: 1.02x faster (-2%)
Benchmark hidden because not significant (1): fromkeys_int
``` |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:42 | admin | set | github: 73490 |
2017-08-05 11:51:39 | methane | set | messages: + msg299777 |
2017-08-05 01:18:56 | rhettinger | set | messages: + msg299765 |
2017-08-03 16:05:42 | pitrou | set | messages: + msg299713 |
2017-08-03 14:45:59 | methane | set | status: open -> closed resolution: fixed stage: resolved |
2017-08-03 14:45:17 | methane | set | messages: + msg299711 |
2017-08-03 14:42:32 | methane | set | messages: + msg299709 |
2017-06-27 17:37:15 | methane | set | messages: + msg297060 |
2017-06-26 13:25:19 | methane | set | pull_requests: + pull_request2455 |
2017-06-26 13:22:51 | methane | set | messages: + msg296891 |
2017-06-26 13:19:50 | methane | set | messages: + msg296890 |
2017-06-26 05:20:00 | tim.peters | set | messages: + msg296849 |
2017-06-26 05:15:15 | tim.peters | set | nosy:
+ tim.peters messages: + msg296848 |
2017-06-26 04:52:08 | xgdomingo | set | nosy:
+ xgdomingo |
2017-06-23 12:50:23 | methane | set | pull_requests: + pull_request2402 |
2017-06-23 11:32:22 | pitrou | set | nosy:
+ pitrou messages: + msg296702 |
2017-06-23 06:22:52 | methane | set | messages: + msg296680 |
2017-06-19 04:48:49 | methane | set | messages:
+ msg296305 title: dict: simplify lookup function -> dict: simplify lookup functions |
2017-06-19 04:38:15 | methane | set | pull_requests: + pull_request2323 |
2017-06-15 07:11:53 | serhiy.storchaka | set | messages: + msg296071 |
2017-06-15 06:57:07 | Dmitry Rubanovich | set | files:
+ collisions_count.py type: enhancement components: + Interpreter Core versions: + Python 3.7 nosy: + Dmitry Rubanovich messages: + msg296070 |
2017-01-26 18:54:32 | methane | set | files:
+ dict-refactoring-3.patch messages: + msg286331 |
2017-01-26 17:53:09 | methane | set | files:
+ dict-refactoring-2.patch messages: + msg286328 |
2017-01-26 17:02:38 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg286323 |
2017-01-26 16:32:05 | methane | set | messages: + msg286322 |
2017-01-26 16:24:26 | methane | set | messages: + msg286320 |
2017-01-18 06:09:20 | methane | set | files:
+ dict-refactor.patch messages: + msg285703 |
2017-01-18 06:06:31 | methane | set | files: - dictlook-refactoring.patch |
2017-01-18 05:42:22 | methane | set | messages: + msg285702 |
2017-01-18 03:54:41 | rhettinger | set | nosy:
+ rhettinger messages: + msg285698 |
2017-01-18 03:26:45 | xiang.zhang | set | nosy:
+ xiang.zhang messages: + msg285696 |
2017-01-18 02:36:13 | methane | create |