Message 265856 - Python tracker


Author	vstinner
Recipients	jstasiak, larry, rhettinger, serhiy.storchaka, vstinner, yselivanov
Date	2016-05-19.13:30:45
Message-id	<1463664646.25.0.773910000579.issue26814@psf.upfronthosting.co.za>
In-reply-to

Content

Hi,

I made progress on my FASTCALL branch. I removed the tp_fastnew, tp_fastinit and
tp_fastcall fields from PyTypeObject and replaced them with new type flags (e.g.
Py_TPFLAGS_FASTNEW) to avoid code duplication and reduce the memory footprint.
Before, each function was simply duplicated.
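
To illustrate the idea of signalling a capability with a type flag bit rather than a dedicated PyTypeObject field, here is a minimal sketch in Python. Py_TPFLAGS_FASTNEW exists only on the branch, so the sketch uses two flags that already exist in CPython (Py_TPFLAGS_HEAPTYPE and Py_TPFLAGS_BASETYPE); the helper name has_flag is hypothetical, and type.__flags__ is a CPython implementation detail.

```python
# Existing CPython flag values (Include/object.h); Py_TPFLAGS_FASTNEW is
# NOT defined here because it only exists on the FASTCALL branch.
Py_TPFLAGS_HEAPTYPE = 1 << 9   # type object is allocated on the heap
Py_TPFLAGS_BASETYPE = 1 << 10  # type can be used as a base class


def has_flag(cls, flag):
    """Return True if the given bit is set in the type's tp_flags.

    Hypothetical helper: a single bit test replaces the need for a
    separate per-type field.
    """
    return bool(cls.__flags__ & flag)


class Example:
    """A class created by a class statement is a heap type."""


# int is a static builtin type, Example is a heap type.
int_is_heap = has_flag(int, Py_TPFLAGS_HEAPTYPE)
example_is_heap = has_flag(Example, Py_TPFLAGS_HEAPTYPE)
# bool cannot be subclassed, so it lacks the BASETYPE bit.
bool_is_base = has_flag(bool, Py_TPFLAGS_BASETYPE)
```

A flag costs one bit in a field that every type already has, whereas each extra slot adds a pointer to every type object.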

This introduces a backward incompatible change: it is no longer possible to
call tp_new, tp_init and tp_call directly. I don't know yet whether such a
change would be acceptable in Python 3.6, nor whether it is worth it.

I spent a lot of time on the CPython benchmark suite to check for
performance regressions. In fact, I spent most of my time trying to understand
why most benchmarks looked completely unstable. I have now tuned my system
correctly and patched perf.py to get reliable benchmarks.
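
As a rough illustration of what "unstable" means here, the sketch below repeats a timing several times and looks at the run-to-run spread. It is not the perf.py patch; the function names and the 5% threshold are assumptions chosen for the example.

```python
import statistics
import timeit


def sample_timings(stmt, repeat=5, number=1000):
    """Time stmt `repeat` times, each run executing it `number` times."""
    return timeit.repeat(stmt, repeat=repeat, number=number)


def relative_spread(samples):
    """Return stdev / mean, a unitless measure of run-to-run noise."""
    return statistics.stdev(samples) / statistics.mean(samples)


def looks_stable(samples, threshold=0.05):
    """Hypothetical rule of thumb: spread within 5% of the mean."""
    return relative_spread(samples) <= threshold


if __name__ == "__main__":
    timings = sample_timings("sum(range(100))")
    print("spread: %.1f%%" % (100 * relative_spread(timings)))
    print("stable:", looks_stable(timings))
```

If the spread is of the same order as the differences being measured (a few percent in the results below), a single run cannot distinguish a real change from noise.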

On the latest run of the benchmark suite, most benchmarks are faster! I still
have to investigate why 3 benchmarks are slower. In the previous run,
normal_startup was not significant, etree_parse was faster (instead of
slower), and raytrace was already slower (but only 1.13x slower). It may be
"noise" from the PGO compilation. I have already seen this once: see issue
#27056 "pickle: constant propagation in _Unpickler_Read()".

Result of the benchmark suite:

slower (3):

* raytrace: 1.06x slower
* etree_parse: 1.03x slower
* normal_startup: 1.02x slower

faster (18):

* unpickle_list: 1.11x faster
* chameleon_v2: 1.09x faster
* etree_generate: 1.08x faster
* etree_process: 1.08x faster
* mako_v2: 1.06x faster
* call_method_unknown: 1.06x faster
* django_v3: 1.05x faster
* regex_compile: 1.05x faster
* etree_iterparse: 1.05x faster
* fastunpickle: 1.05x faster
* meteor_contest: 1.05x faster
* pickle_dict: 1.05x faster
* float: 1.04x faster
* pathlib: 1.04x faster
* silent_logging: 1.04x faster
* call_method: 1.03x faster
* json_dump_v2: 1.03x faster
* call_simple: 1.03x faster

not significant (21):

* 2to3
* call_method_slots
* chaos
* fannkuch
* fastpickle
* formatted_logging
* go
* json_load
* nbody
* nqueens
* pickle_list
* pidigits
* regex_effbot
* regex_v8
* richards
* simple_logging
* spectral_norm
* startup_nosite
* telco
* tornado_http
* unpack_sequence
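
For reading the figures above, a minimal sketch of how a label such as "1.06x slower" can be derived from a pair of timings, assuming the label is simply the ratio of the two average times. The function name and the sample timings are hypothetical, and the significance test that decides the "not significant" group is not modelled.

```python
def describe(base_time, changed_time):
    """Label a (reference, patched) timing pair as a speed ratio.

    Assumption: both arguments are average run times in the same unit.
    Equal timings are reported as "1.00x faster".
    """
    if changed_time > base_time:
        return f"{changed_time / base_time:.2f}x slower"
    return f"{base_time / changed_time:.2f}x faster"


# Made-up timings in seconds, chosen only to reproduce the label format.
slower_label = describe(2.00, 2.12)
faster_label = describe(2.22, 2.00)
```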

I know that my patch is simply giant and cannot be merged as is.

Since the performance is still promising, I plan to split my giant
patch into smaller patches that are easier to review. I will try to check
that the individual patches don't make Python slower. This work will take time.

History
Date	User	Action	Args
2016-05-19 13:30:46	vstinner	set	recipients: + vstinner, rhettinger, larry, serhiy.storchaka, yselivanov, jstasiak
2016-05-19 13:30:46	vstinner	set	messageid: <1463664646.25.0.773910000579.issue26814@psf.upfronthosting.co.za>
2016-05-19 13:30:46	vstinner	link	issue26814 messages
2016-05-19 13:30:45	vstinner	create