Author vstinner
Recipients jstasiak, larry, rhettinger, serhiy.storchaka, vstinner, yselivanov
Date 2016-05-19.13:30:45
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1463664646.25.0.773910000579.issue26814@psf.upfronthosting.co.za>
In-reply-to
Content
Hi,

I made progress on my FASTCALL branch. I removed tp_fastnew, tp_fastinit and
tp_fastnew fields from PyTypeObject to replace them with new type flags (ex:
Py_TPFLAGS_FASTNEW) to avoid code duplication and reduce the memory footprint.
Before, each function was simply duplicated.

This change introduces a backward incompatibility change: it's not more
possible to call directly tp_new, tp_init and tp_call. I don't know yet if such
change would be acceptable in Python 3.6, nor if it is worth it.

I spent a lot of ot time on the CPython benchmark suite to check for
performance regression. In fact, I spent most of my time to try to understand
why most benchmarks looked completly unstable. I now tuned correctly my
system and patched perf.py to get reliable benchmarks.

On the latest run of the benchmark suite, most benchmarks are faster! I have to investigate why 3 benchmarks are still slower. In the run, normal_startup was not significant, etree_parse was faster (instead of slower), but raytrace was already slower (but only 1.13x slower). It may be the "noise" of the PGO compilation. I already noticed that once: see the issue #27056 "pickle: constant propagation in _Unpickler_Read()".

Result of the benchmark suite:

slower (3):

* raytrace: 1.06x slower
* etree_parse: 1.03x slower
* normal_startup: 1.02x slower

faster (18):

* unpickle_list: 1.11x faster
* chameleon_v2: 1.09x faster
* etree_generate: 1.08x faster
* etree_process: 1.08x faster
* mako_v2: 1.06x faster
* call_method_unknown: 1.06x faster
* django_v3: 1.05x faster
* regex_compile: 1.05x faster
* etree_iterparse: 1.05x faster
* fastunpickle: 1.05x faster
* meteor_contest: 1.05x faster
* pickle_dict: 1.05x faster
* float: 1.04x faster
* pathlib: 1.04x faster
* silent_logging: 1.04x faster
* call_method: 1.03x faster
* json_dump_v2: 1.03x faster
* call_simple: 1.03x faster

not significant (21):

* 2to3
* call_method_slots
* chaos
* fannkuch
* fastpickle
* formatted_logging
* go
* json_load
* nbody
* nqueens
* pickle_list
* pidigits
* regex_effbot
* regex_v8
* richards
* simple_logging
* spectral_norm
* startup_nosite
* telco
* tornado_http
* unpack_sequence

I know that my patch is simply giant and cannot be merged like that.

Since the performance is still promising, I plan to split my giant
patch into smaller patches, easier to review. I will try to check that
individual patches don't make Python slower. This work will take time.
History
Date User Action Args
2016-05-19 13:30:46vstinnersetrecipients: + vstinner, rhettinger, larry, serhiy.storchaka, yselivanov, jstasiak
2016-05-19 13:30:46vstinnersetmessageid: <1463664646.25.0.773910000579.issue26814@psf.upfronthosting.co.za>
2016-05-19 13:30:46vstinnerlinkissue26814 messages
2016-05-19 13:30:45vstinnercreate