Title: array compare is hideously slow
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Adrian Wielgosik, josh.r, lou1306, pitrou, rhettinger, swanson
Priority: normal Keywords:

Created on 2015-07-23 23:01 by swanson, last changed 2017-08-17 12:46 by pitrou. This issue is now closed.

File name Uploaded Description Edit lou1306, 2017-02-27 18:35 Testing comparison with subclassed array
Pull Requests
URL Status Linked Edit
PR 3009 merged Adrian Wielgosik, 2017-08-06 19:03
Messages (8)
msg247234 - (view) Author: (swanson) Date: 2015-07-23 23:01
Comparing two array.array objects is much slower than it ought to be.

The whole point of array.array is to be efficient:
 "array — Efficient arrays of numeric values"

But comparing them is orders of magnitude less efficient than comparing tuples or lists of numbers.  It would seem that array's __eq__ is converting every single number into an int, float, etc. first instead of just comparing the arrays in their native format.
If arrays can be copied efficiently, and bytearray can be copied or compared efficiently, there's no good reason for array's equality test to be so stupid.

example code:

from timeit import timeit

setup = '''
from array import array
a = array("I", %s)
ac = a.__copy__
b = ac()
t = tuple(a)
u = t[:1] + t[1:]
for init in ("[1]*10", "[0xffffffff]*10", "[1]*1000", "[0xffffffff]*1000"):
 print("\n", init)
 for action in ("ac()", "a == b", "a == ac()", "t == u"):
  print(("%6.2f" % timeit(action, setup % init)), action)


  0.31 ac()
  0.50 a == b
  0.73 a == ac()
  0.17 t == u

  0.29 ac()
  1.59 a == b
  1.87 a == ac()
  0.15 t == u

  0.84 ac()
 37.06 a == b
 37.72 a == ac()
  2.91 t == u

  0.84 ac()
146.03 a == b
145.97 a == ac()
  2.90 t == u
msg247236 - (view) Author: Josh Rosenberg (josh.r) * Date: 2015-07-24 00:34
You're correct about what is going on; aside from bypassing a bounds check (when not compiled with asserts enabled), the function it uses to get each index is the same as that used to implement indexing at the Python layer. It looks up the getitem function appropriate to the type code over and over, then calls it to create the PyLongObject and performs a rich compare.

The existing behavior is probably necessary to work with array subclasses, but it's also incredibly slow as you noticed. Main question is whether to keep the slow path for subclasses, or (effectively) require that array subclasses overriding __getitem__ also override he rich comparison operators to make them work as expected.

For cases where the signedness and element size are identical, it's trivial to acquire readonly buffers for both arrays and directly compare the memory (with memcmp for EQ/NE or size 1 elements, wmemcmp for appropriately sized wider elements, and simple loops for anything else).
msg288661 - (view) Author: lou1306 (lou1306) Date: 2017-02-27 18:35
I noticed the issue is still there in Python 3.6.
But I can't see why array subclasses should be the reason for this implementation.
By looking at getarrayitem, it looks like __getitem__ does not play any role in how the elements are accessed.
Consider the attached example, where SubclassedArray.__getitem__ is overridden to always return 0: nonetheless, equality checks with an array.array containing the same elements always succeed.

> For cases where the signedness and element size are identical, it's trivial to acquire readonly buffers for both arrays and directly compare the memory

I would argue that _integerness_ sholud also be identical: otherwise
array("l", range(10)) == array("f", range(10))
would evaluate to False, while it is True in the current implementation.
msg299811 - (view) Author: Adrian Wielgosik (Adrian Wielgosik) * Date: 2017-08-06 19:05
Added a PR with a fast path that triggers when compared arrays store values of the same type. In this fast path, no Python objects are created. For big arrays the runtime reduction can reach 50-100x.

It's possible to optimize the comparison loop a bit more - for example array('B') comparison could use memcmp and become extra 10x faster, and other types could receive similar treatment - but I wanted to keep the code relatively simple.
Code duplication with macros is a bit ugly, but that's the best I could come up with so far.

Benchmark results:

Test                                Before      After       % of old time
Equal, same stored type (new fast path)
array('i', range(0))                20.4ns      22.07ns     108.15%
array('i', range(1))                33.39ns     22.32ns     66.86%
array('i', range(10))               152.0ns     31.21ns     20.54%
array('i', range(10000))            447.7us     6.571us     1.47%
array('i', range(100000))           4.626ms     67.24us     1.45%
array('i', [1<<30]*100000))         5.234ms     65.8us      1.26%
array('B', range(10))               151.8ns     28.53ns     18.79%
array('B', range(250))              3.14us      194.5ns     6.19%
array('B', [1,2,3]*1000)            37.76us     2.088us     5.53%
array('d', range(10))               311.9ns     31.22ns     10.01%
array('d', range(100000))           2.889ms     99.3us      3.44%
Equal, different types (slow path)
array('X', range(0))                20.37ns     19.45ns     95.48%
array('X', range(1))                34.87ns     34.42ns     98.72%
array('X', range(10))               169.1ns     169.0ns     99.97%
array('X', range(10000))            462.2us     444.8us     96.23%
array('X', range(100000))           4.752ms     4.571ms     96.20%
Not equal: first element (X) different
array('i', [X] + [1,2,3]*10000)     42.77ns     21.84ns     51.06%
Not equal: last element (X) different
array('i', [1,2,3]*10000 + [X])     375.4us     19.8us      5.27%
msg299860 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-08-07 18:29
Not saying that this issue shouldn't be fixed, but Numpy arrays are a much more powerful and well-optimized replacement for array.array.
msg299900 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-08 06:07
The PR looks very reasonable.  This is a nice optimization.
msg300415 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-08-17 12:46
New changeset 7c17e2304b9387f321c813516bf134e4f0bd332a by Antoine Pitrou (Adrian Wielgosik) in branch 'master':
bpo-24700: Add a fast path for comparing array.array of equal type (#3009)
msg300416 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-08-17 12:46
Adrian's PR was merged. Thank you Adrian!
Date User Action Args
2017-08-17 12:46:35pitrousetstatus: open -> closed
resolution: fixed
messages: + msg300416

stage: patch review -> resolved
2017-08-17 12:46:08pitrousetmessages: + msg300415
2017-08-08 09:55:06pitrousetstage: patch review
versions: + Python 3.7, - Python 3.6
2017-08-08 06:07:14rhettingersetnosy: + rhettinger
messages: + msg299900
2017-08-07 18:29:55pitrousetnosy: + pitrou
messages: + msg299860
2017-08-06 19:05:38Adrian Wielgosiksetnosy: + Adrian Wielgosik
messages: + msg299811
2017-08-06 19:03:36Adrian Wielgosiksetpull_requests: + pull_request3042
2017-02-27 18:35:02lou1306setfiles: +
nosy: + lou1306
messages: + msg288661

2015-07-24 02:36:11rhettingersetversions: + Python 3.6, - Python 3.4
2015-07-24 00:34:08josh.rsetnosy: + josh.r
messages: + msg247236
2015-07-23 23:01:14swansoncreate