Message 354399 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	brandtbucher
Recipients	brandtbucher
Date	2019-10-10.19:16:03
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1570734964.11.0.75032337331.issue38436@roundup.psfhosted.org>
In-reply-to

Content
The attached PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1. In this case, we instead do a PySequence_InPlaceConcat operation. This has the affect of avoiding quadratic complexity for list summations, by keeping only one intermediate result and extending it in-place. For (potentially large) lists: a + b + c + d ... Currently executed as: tmp = a + b tmp = tmp + c tmp = tmp + d ... With this change: tmp = a + b tmp += c tmp += d ... Here are the pyperformance results (optimizations enabled): Slower (6): - pickle_list: 3.63 us +- 0.08 us -> 3.78 us +- 0.10 us: 1.04x slower (+4%) - xml_etree_generate: 102 ms +- 2 ms -> 104 ms +- 3 ms: 1.02x slower (+2%) - xml_etree_parse: 166 ms +- 4 ms -> 169 ms +- 9 ms: 1.02x slower (+2%) - scimark_sparse_mat_mult: 5.42 ms +- 0.11 ms -> 5.47 ms +- 0.16 ms: 1.01x slower (+1%) - pickle_dict: 23.6 us +- 0.8 us -> 23.8 us +- 0.4 us: 1.01x slower (+1%) - scimark_fft: 413 ms +- 6 ms -> 416 ms +- 6 ms: 1.01x slower (+1%) Faster (25): - 2to3: 394 ms +- 18 ms -> 381 ms +- 8 ms: 1.03x faster (-3%) - python_startup_no_site: 12.0 ms +- 0.2 ms -> 11.7 ms +- 0.2 ms: 1.03x faster (-3%) - nqueens: 121 ms +- 2 ms -> 118 ms +- 3 ms: 1.03x faster (-2%) - pathlib: 36.0 ms +- 1.3 ms -> 35.1 ms +- 0.9 ms: 1.02x faster (-2%) - pickle_pure_python: 523 us +- 15 us -> 511 us +- 13 us: 1.02x faster (-2%) - deltablue: 8.54 ms +- 0.25 ms -> 8.35 ms +- 0.27 ms: 1.02x faster (-2%) - logging_silent: 220 ns +- 6 ns -> 215 ns +- 8 ns: 1.02x faster (-2%) - raytrace: 566 ms +- 8 ms -> 554 ms +- 11 ms: 1.02x faster (-2%) - hexiom: 11.2 ms +- 0.3 ms -> 11.0 ms +- 0.2 ms: 1.02x faster (-2%) - go: 294 ms +- 6 ms -> 288 ms +- 5 ms: 1.02x faster (-2%) - python_startup: 15.7 ms +- 0.4 ms -> 15.4 ms +- 0.5 ms: 1.02x faster (-2%) - float: 136 ms +- 4 ms -> 134 ms +- 6 ms: 1.02x faster (-2%) - scimark_sor: 232 ms +- 8 ms -> 228 ms +- 5 ms: 1.02x faster (-2%) - fannkuch: 554 ms +- 8 ms -> 545 ms +- 9 ms: 1.02x faster (-2%) - meteor_contest: 115 ms +- 2 ms -> 113 ms +- 3 ms: 1.02x faster (-2%) - telco: 6.58 ms +- 0.22 ms -> 6.48 ms +- 0.15 ms: 1.02x faster (-2%) - regex_effbot: 3.76 ms +- 0.08 ms -> 3.70 ms +- 0.09 ms: 1.02x faster (-2%) - chaos: 129 ms +- 3 ms -> 127 ms +- 2 ms: 1.01x faster (-1%) - regex_v8: 26.3 ms +- 0.4 ms -> 25.9 ms +- 0.6 ms: 1.01x faster (-1%) - regex_compile: 200 ms +- 5 ms -> 197 ms +- 5 ms: 1.01x faster (-1%) - logging_simple: 9.78 us +- 0.19 us -> 9.65 us +- 0.26 us: 1.01x faster (-1%) - sqlite_synth: 3.24 us +- 0.10 us -> 3.20 us +- 0.06 us: 1.01x faster (-1%) - unpickle: 16.4 us +- 0.5 us -> 16.2 us +- 0.5 us: 1.01x faster (-1%) - logging_format: 10.8 us +- 0.2 us -> 10.7 us +- 0.3 us: 1.01x faster (-1%) - json_loads: 30.2 us +- 0.8 us -> 29.9 us +- 0.6 us: 1.01x faster (-1%) Benchmark hidden because not significant (14): json_dumps, nbody, pickle, pidigits, regex_dna, richards, scimark_lu, scimark_monte_carlo, spectral_norm, unpack_sequence, unpickle_list, unpickle_pure_python, xml_etree_iterparse, xml_etree_process This is related to my earlier work in bpo-36229, however this is a much less invasive change that's limited to ceval.c, where our knowledge of context is much better.

The attached PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1.

In this case, we instead do a PySequence_InPlaceConcat operation. This has the affect of avoiding quadratic complexity for list summations, by keeping only one intermediate result and extending it in-place.

For (potentially large) lists:

    a + b + c + d ...

Currently executed as:

    tmp = a + b
    tmp = tmp + c
    tmp = tmp + d
    ...

With this change:

    tmp = a + b
    tmp += c
    tmp += d
    ...

Here are the pyperformance results (optimizations enabled):

Slower (6):
- pickle_list: 3.63 us +- 0.08 us -> 3.78 us +- 0.10 us: 1.04x slower (+4%)
- xml_etree_generate: 102 ms +- 2 ms -> 104 ms +- 3 ms: 1.02x slower (+2%)
- xml_etree_parse: 166 ms +- 4 ms -> 169 ms +- 9 ms: 1.02x slower (+2%)
- scimark_sparse_mat_mult: 5.42 ms +- 0.11 ms -> 5.47 ms +- 0.16 ms: 1.01x slower (+1%)
- pickle_dict: 23.6 us +- 0.8 us -> 23.8 us +- 0.4 us: 1.01x slower (+1%)
- scimark_fft: 413 ms +- 6 ms -> 416 ms +- 6 ms: 1.01x slower (+1%)

Faster (25):
- 2to3: 394 ms +- 18 ms -> 381 ms +- 8 ms: 1.03x faster (-3%)
- python_startup_no_site: 12.0 ms +- 0.2 ms -> 11.7 ms +- 0.2 ms: 1.03x faster (-3%)
- nqueens: 121 ms +- 2 ms -> 118 ms +- 3 ms: 1.03x faster (-2%)
- pathlib: 36.0 ms +- 1.3 ms -> 35.1 ms +- 0.9 ms: 1.02x faster (-2%)
- pickle_pure_python: 523 us +- 15 us -> 511 us +- 13 us: 1.02x faster (-2%)
- deltablue: 8.54 ms +- 0.25 ms -> 8.35 ms +- 0.27 ms: 1.02x faster (-2%)
- logging_silent: 220 ns +- 6 ns -> 215 ns +- 8 ns: 1.02x faster (-2%)
- raytrace: 566 ms +- 8 ms -> 554 ms +- 11 ms: 1.02x faster (-2%)
- hexiom: 11.2 ms +- 0.3 ms -> 11.0 ms +- 0.2 ms: 1.02x faster (-2%)
- go: 294 ms +- 6 ms -> 288 ms +- 5 ms: 1.02x faster (-2%)
- python_startup: 15.7 ms +- 0.4 ms -> 15.4 ms +- 0.5 ms: 1.02x faster (-2%)
- float: 136 ms +- 4 ms -> 134 ms +- 6 ms: 1.02x faster (-2%)
- scimark_sor: 232 ms +- 8 ms -> 228 ms +- 5 ms: 1.02x faster (-2%)
- fannkuch: 554 ms +- 8 ms -> 545 ms +- 9 ms: 1.02x faster (-2%)
- meteor_contest: 115 ms +- 2 ms -> 113 ms +- 3 ms: 1.02x faster (-2%)
- telco: 6.58 ms +- 0.22 ms -> 6.48 ms +- 0.15 ms: 1.02x faster (-2%)
- regex_effbot: 3.76 ms +- 0.08 ms -> 3.70 ms +- 0.09 ms: 1.02x faster (-2%)
- chaos: 129 ms +- 3 ms -> 127 ms +- 2 ms: 1.01x faster (-1%)
- regex_v8: 26.3 ms +- 0.4 ms -> 25.9 ms +- 0.6 ms: 1.01x faster (-1%)
- regex_compile: 200 ms +- 5 ms -> 197 ms +- 5 ms: 1.01x faster (-1%)
- logging_simple: 9.78 us +- 0.19 us -> 9.65 us +- 0.26 us: 1.01x faster (-1%)
- sqlite_synth: 3.24 us +- 0.10 us -> 3.20 us +- 0.06 us: 1.01x faster (-1%)
- unpickle: 16.4 us +- 0.5 us -> 16.2 us +- 0.5 us: 1.01x faster (-1%)
- logging_format: 10.8 us +- 0.2 us -> 10.7 us +- 0.3 us: 1.01x faster (-1%)
- json_loads: 30.2 us +- 0.8 us -> 29.9 us +- 0.6 us: 1.01x faster (-1%)

Benchmark hidden because not significant (14): json_dumps, nbody, pickle, pidigits, regex_dna, richards, scimark_lu, scimark_monte_carlo, spectral_norm, unpack_sequence, unpickle_list, unpickle_pure_python, xml_etree_iterparse, xml_etree_process

This is related to my earlier work in bpo-36229, however this is a much less invasive change that's limited to ceval.c, where our knowledge of context is much better.

History
Date	User	Action	Args
2019-10-10 19:16:04	brandtbucher	set	recipients: + brandtbucher
2019-10-10 19:16:04	brandtbucher	set	messageid: <1570734964.11.0.75032337331.issue38436@roundup.psfhosted.org>
2019-10-10 19:16:04	brandtbucher	link	issue38436 messages
2019-10-10 19:16:03	brandtbucher	create