This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _collectionsmodule.c: Replace `n++; while (--n)` with `for (; n; --n)`
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Kojoley, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-03-02 16:46 by Kojoley, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
_collectionsmodule.s rhettinger, 2017-03-02 22:21 GCC-6 Disassembly of reverse() in Modules/_collections.c
Pull Requests
URL Status Linked Edit
PR 396 closed Kojoley, 2017-03-02 16:46
Messages (6)
msg288815 - (view) Author: Nikita Kniazev (Kojoley) * Date: 2017-03-02 16:46
I have failed to find previous discussion when `while (n--)` was changed to `n++; while (--n)`. (commits 306d6b1ea6bf2582b9284be2fd27275abbade3e1, 165eee214bc388eb588db33385ca49ddbb305565)

It is clear for me that n++; while (--n) and for (; n; --n) are interchangable statements, but here is the prof of it http://coliru.stacked-crooked.com/a/a6fc4108b223e7b2.

According to asm out https://godbolt.org/g/heHM33 the `for` loop is even shorter (takes less instructions).

While I believe that location of `cmp`/`jmp` instruction makes no sense and performance is the same, but I have made a benchmark.


```
Run on (4 X 3310 MHz CPU s)
02/27/17 22:10:55
Benchmark                      Time           CPU Iterations
------------------------------------------------------------
BM_while_loop/2               13 ns         13 ns   56089384
BM_while_loop/4               17 ns         16 ns   40792279
BM_while_loop/8               24 ns         24 ns   29914338
BM_while_loop/16              40 ns         40 ns   20396140
BM_while_loop/32              84 ns         80 ns    8974301
BM_while_loop/64             146 ns        146 ns    4487151
BM_while_loop/128            270 ns        269 ns    2492862
BM_while_loop/128            267 ns        266 ns    2639500
BM_while_loop/512           1022 ns       1022 ns     641022
BM_while_loop/4096          8203 ns       8344 ns      89743
BM_while_loop/32768        66971 ns      66750 ns      11218
BM_while_loop/262144      545833 ns     546003 ns       1000
BM_while_loop/2097152    4376095 ns    4387528 ns        160
BM_while_loop/8388608   17654654 ns   17883041 ns         41
BM_for_loop/2                 13 ns         13 ns   56089384
BM_for_loop/4                 15 ns         15 ns   49857230
BM_for_loop/8                 21 ns         21 ns   32051077
BM_for_loop/16                37 ns         37 ns   19509351
BM_for_loop/32                81 ns         80 ns    8974301
BM_for_loop/64               144 ns        128 ns    4985723
BM_for_loop/128              265 ns        263 ns    3205108
BM_for_loop/128              265 ns        266 ns    2639500
BM_for_loop/512             1036 ns       1022 ns     641022
BM_for_loop/4096            8314 ns       8344 ns      89743
BM_for_loop/32768          67345 ns      66750 ns      11218
BM_for_loop/262144        541310 ns     546004 ns       1000
BM_for_loop/2097152      4354986 ns    4387528 ns        160
BM_for_loop/8388608     17592428 ns   17122061 ns         41
```

```cpp
#include <benchmark/benchmark.h>

#define MAKE_ROTL_BENCHMARK(name) \
  static void BM_##name(benchmark::State& state) { \
    while (state.KeepRunning()) { \
      int n = name(state.range(0)); \
    } \
  } \
  /**/


int while_loop(int n)
{
  int sum = 0;
  n++;
  while (--n) {
    sum += 1;
  }
  return sum;
}

int for_loop(int n)
{
  int sum = 0;
  for(; n; --n) {
    sum += 1;
  }
  return sum;
}

MAKE_ROTL_BENCHMARK(while_loop)
MAKE_ROTL_BENCHMARK(for_loop)
BENCHMARK(BM_while_loop)->RangeMultiplier(2)->Range(2, 8<<4);
BENCHMARK(BM_while_loop)->Range(8<<4, 8<<20);
BENCHMARK(BM_for_loop)->RangeMultiplier(2)->Range(2, 8<<4);
BENCHMARK(BM_for_loop)->Range(8<<4, 8<<20);

BENCHMARK_MAIN()
```
msg288816 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:12
The purpose of the issue is unclear to me. Why do you want to replace while with for? For readability?

I agree that "n++; while (--n) ..." is surprising. It seems strange to start with a "n++" to decrement it in the next instruction :-) It's also unusual to write loops like that.


> I have failed to find previous discussion when `while (n--)` was changed to `n++; while (--n)`.

You should ask Raymond Hettinger for the rationale. The commit message says "better generated code (on both GCC and CLang)", but there is no benchmark nor any data to validate this.


> I have made a benchmark.

Nowadays, C compilers are very smart and implement crazy optimizations. Python releases are compiled using PGO, or even PGO+LTO. Please compile your benchmark using PGO. I expect that compilers emit the same machine code at the end for "while" and "for" loops.

The question is also if your benchmark is revelant for the _collections module.

What should I see in your benchmark? I see that results are the same for while and for_loop except a minor noise in the benchmark.

--

I also dislike spending time on such minor micro-optimization...
msg288817 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:15
Oh, a little bit more context: Raymond Hettinger is the maintainer of _collectionsmodule.c and spent a lot of time to optimize this file!
msg288818 - (view) Author: Nikita Kniazev (Kojoley) * Date: 2017-03-02 17:28
> The purpose of the issue is unclear to me.

I was asked to open an issue by Serhiy Storchaka on the GitHub PR.

> Why do you want to replace while with for? For readability?

Yes, I have open a PR just to improve the readability, because I was surprised by this incrementing-decrementing statements like you.

> You should ask Raymond Hettinger for the rationale. The commit message says "better generated code (on both GCC and CLang)", but there is no benchmark nor any data to validate this.

The purpose is clear to me - to eliminate postincrement and temporary variable that it requires.

> Nowadays, C compilers are very smart and implement crazy optimizations. Python releases are compiled using PGO, or even PGO+LTO. Please compile your benchmark using PGO. I expect that compilers emit the same machine code at the end for "while" and "for" loops.
> The question is also if your benchmark is revelant for the _collections module.
> What should I see in your benchmark? I see that results are the same for while and for_loop except a minor noise in the benchmark.

Benchmark was made to show that location of cmp+jmp location (look at asm output for more info) makes no sense to performance.
Actually I do not want to spend more time on a such minor change.
msg288819 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:31
> Yes, I have open a PR just to improve the readability, because I was surprised by this incrementing-decrementing statements like you.

Honestly, I don't really care of the exact syntax of C loops in
_collectionmodule.c.

But if your patch is rejected (right now, I'm neutral), it would be
nice to add a comment with a reference to this issue on the loops to
avoid other questions in the future.
msg288838 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-02 22:21
Sorry, but I'm going to keep the code as-is.  To mine eyes (the maintainer), the while-loop better expresses my intention (a decrement-skip-on-zero step).

On at least one compiler (GCC-6), it allows better code to generated (a sub-and-test instead of a sub-compare-and-test).
History
Date User Action Args
2022-04-11 14:58:43adminsetgithub: 73884
2017-03-02 22:21:41rhettingersetstatus: open -> closed
files: + _collectionsmodule.s
messages: + msg288838

resolution: not a bug
stage: patch review -> resolved
2017-03-02 17:31:53vstinnersetmessages: + msg288819
2017-03-02 17:28:17Kojoleysetmessages: + msg288818
2017-03-02 17:15:26vstinnersetmessages: + msg288817
2017-03-02 17:12:08vstinnersetnosy: + vstinner
messages: + msg288816
2017-03-02 17:00:12serhiy.storchakasetversions: - Python 3.6
nosy: + rhettinger, serhiy.storchaka

assignee: rhettinger
components: + Extension Modules, - Interpreter Core
stage: patch review
2017-03-02 16:46:33Kojoleycreate