classification
Title: optimize bytes.join()
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: loewis, pitrou, rhettinger, terry.reedy
Priority: normal Keywords: patch

Created on 2008-07-28 18:51 by pitrou, last changed 2008-08-02 20:36 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
bytesjoin.patch pitrou, 2008-07-28 18:51
Messages (7)
msg70365 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-28 18:51
When the separator is empty and the sequence only contains one non-empty
item, it is possible to optimize b"".join([...]) by simply returning the
non-empty item.

Since b"".join() is the recommended idiom to join large strings
together, I think it can be useful. I encountered this when trying to
optimize io.BufferedReader.

./python -m timeit -s "l=[b'', b'a' * 10000]" "b''.join(l)"

before the patch:
100000 loops, best of 3: 2.35 usec per loop
after the patch:
1000000 loops, best of 3: 0.335 usec per loop
msg70389 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-29 13:00
Raymond, Martin, any opinion on this?
msg70605 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-01 23:20
How much does the optimization speed up (or slow down?) a more normal
case when it is applicable?
  bl = [b'', b'a']

How much does the optimization slow down the normal case where it is not
applied?
  bl = [b'a']*2
  bl = [b'a']*10000

Could not the .join user simply not add empty list items?

Looking at the code, there appear to be 4 extra operations for every
item in the normal case: assign item_size, test item_size, assign
nonempty, increment nb_nonempty.  I believe this alternative might be
generally faster.  Before the normal scan,

if seplen == 0:
  for item in seq:
    <if second non-null item>: break
  else:
    <do shortcut and return>

<do normal process>

Then normal cases will bail out on the second item and continue without
further impact.
msg70607 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-01 23:44
bl = [b'', b'a'] gives a 20% speedup:

before patch:
1000000 loops, best of 3: 0.443 usec per loop
after patch:
1000000 loops, best of 3: 0.349 usec per loop
(20% speedup)

bl = [b'a']*2 gives a 2% slowdown:

before patch:
1000000 loops, best of 3: 0.439 usec per loop
after patch:
1000000 loops, best of 3: 0.447 usec per loop

bl = [b'a']*10000 gives a 1% slowdown:

before patch:
1000 loops, best of 3: 510 usec per loop
after patch:
1000 loops, best of 3: 515 usec per loop

So, even in the worst case of joining the shortest possible non-empty
strings, the overhead is still minimal.

> Could not the .join user simply not add empty list items?

The .join user could also detect whether there is a single element in
the list and the separator is empty, and then avoid calling join :)

The point of integrating the optimization in bytes.join is that:
1) it makes it implicit, so that user code stays clean of "performance
hacks"
2) the optimization itself has much less overhead, since it is written
in C rather than in Python
msg70635 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-08-02 17:22
I dislike this change, not because of the potential slowdown, but
because of the added code complexity.

How plausible is it that you want to join a list of bytes objects, and
the list has more than one item, yet all but one item are empty? If you
have such an application, and performance matters, you shouldn't add the
empty bytes to the list in the first place.

Tentatively rejecting the patch.
msg70636 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-02 18:57
Well as I said it occurred to me when doing some measurements of
BufferedReader performance, but I agree that the gain may not justify
the complexity. If nobody objects, feel free to close the issue.
msg70642 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-02 20:36
>bl = [b'a']*2 gives a 2% slowdown:
>bl = [b'', b'a'] gives a 20% speedup:

If the second case is less than 9% of cases, which I expect is true,
the change would cause an average slowdown ;-)

>I encountered this when trying to optimize io.BufferedReader.

Is there something peculiar about that code (and potentially fixable)
that it tries to join lots of null strings?

>I dislike this change... because of the added code complexity.

That was part of my reason for suggesting the new code be separated from
the old, but it would not much change the calculation above.
So I agree on closing this.
History
Date User Action Args
2008-08-02 20:36:33terry.reedysetstatus: pending -> closed
messages: + msg70642
2008-08-02 18:57:35pitrousetmessages: + msg70636
2008-08-02 17:22:57loewissetstatus: open -> pending
resolution: rejected
messages: + msg70635
2008-08-01 23:44:05pitrousetmessages: + msg70607
2008-08-01 23:20:51terry.reedysetnosy: + terry.reedy
messages: + msg70605
2008-07-29 13:00:03pitrousetnosy: + loewis, rhettinger
messages: + msg70389
2008-07-28 18:51:26pitroucreate