classification
Title: a suboptimal check in list_extend()
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Oren Milman, rhettinger
Priority: normal Keywords:

Created on 2017-08-08 15:30 by Oren Milman, last changed 2017-08-13 19:14 by Oren Milman. This issue is now closed.

Messages (4)
msg299933 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-08-08 15:30
in listobject.c, in case list_extend() receives an 'iterable' which isn't 'self'
nor a tuple nor a list, we have the following (heavily edited for brevity):
    mn = Py_SIZE(self) + PyObject_LengthHint(iterable);
    list_resize(self, mn);

    ... // self is extended to also include the elements of 'iterable'.

    // (*)
    if (Py_SIZE(self) < self->allocated) {
        list_resize(self, Py_SIZE(self));
    }

IMHO, the condition in (*) is mostly useless, for two reasons:
1. the list_resize() which is called in (*) does nothing in case
   (Py_SIZE(self) >= (self->allocated >> 1)) is true. In particular, this call
   to list_resize() would have done nothing if it had been called while the
   condition in (*) was false.
2. the condition in (*) is false only in the following cases:
    - list_resize(self, mn) caused
      (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be
      true. e.g.:
          Py_SIZE(self) = 58 and PyObject_LengthHint(iterable) == 8 and
          actual_length_of_iterable == 22
          (because 66 + 66 // 8 + 6 == 80 == 58 + 22).
    - list_resize(self, mn) caused
      (self->allocated < Py_SIZE(self) + actual_length_of_iterable), which
      sometime later caused list_extend() to call app1(), which called
      list_resize(), which caused
      (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be true.
      e.g.:
          Py_SIZE(self) == 58 and PyObject_LengthHint(iterable) == 8 and
          actual_length_of_iterable == 39
          (because 66 + 66 // 8 + 6 == 80 and
                   81 + 81 // 8 + 6 == 97 == 58 + 39)

   so ISTM that the condition is only rarely false, especially as
   PyObject_LengthHint(iterable) >= actual_length_of_iterable is usually true
   (i guess).


Thus, i believe we should either change the condition in (*) to
(Py_SIZE(self) < (self->allocated >> 1)), or remove it (and always call
list_resize()).


(note that i ignored error cases, as ISTM they are quite irrelevant to the
issue.)
msg299967 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-09 04:09
-0 The test "Py_SIZE(self) < self->allocated" is very cheap so I don't mind leaving it in to handle the rare cases (or to prevent future bugs if the list_resize logic ever changes).
msg300211 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-13 05:12
I'm concerned that is would provide near zero benefit but would set us up for future bugs by tightly coupling two ideas that I would like to be able to reason about separately.  

The logic for list.extend() makes a potential overallocation, fills the list, and fixes-up the overallocation if needed (I want that logic should all stay together).  And list_resize() should have its own separate logic as well.  

For a course grained operation like extend(), I really don't mind having a code path that makes a relatively cheap test that only occasionally useful.

So, thank you for the suggestion, but I'm going to decline on the grounds that 1) the current code is sometimes useful, 2) it is always cheap, 3) that removing it is odds with the principles of loose coupling and high cohesion, 4) it creates a risk of introducing unnoticed errors in the future the list_resize logic were to change, and 5) no one else has stepped forward to advocate this patch.
msg300229 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-08-13 19:14
thank you for the elaborate reply :)

do you feel the same about changing the check to
(Py_SIZE(self) < (self->allocated >> 1)) ?
History
Date User Action Args
2017-08-13 19:14:51Oren Milmansetmessages: + msg300229
2017-08-13 05:12:20rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg300211

stage: resolved
2017-08-12 10:40:35Oren Milmansettitle: a mostly useless check in list_extend() -> a suboptimal check in list_extend()
2017-08-09 04:09:30rhettingersetnosy: + rhettinger
messages: + msg299967
2017-08-08 15:30:36Oren Milmancreate