
classification
Title: allow array.array construction from memoryview w/o copy
Type: enhancement Stage: patch review
Components: Library (Lib) Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bjkeen, davin, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2020-04-29 19:01 by bjkeen, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 19800 open bjkeen, 2020-04-29 21:33
Messages (5)
msg367688 - (view) Author: Benjamin Keen (bjkeen) * Date: 2020-04-29 19:01
Currently an array.array object can export a memoryview, but there is no way to construct an array.array from a memoryview without copying the underlying data.  In that sense array.array supports only one half of the buffer protocol, and this proposal adds the other half.

This proposal is to allow the array object to be constructed from an existing exported buffer without copying and reallocating the memory, permitting operations that can modify the underlying buffer's contents but not the allocation.

This is useful when working with many small pieces of one very large underlying buffer that you do not want to copy, when desiring to work with different parts of it with different types, and as part of a way to work with shared memory in multiple processes.
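
As a point of reference for the "different parts with different types" use case, here is a minimal sketch of how it can be approximated today with memoryview slicing and cast(). The bytearray stands in for a large buffer, and the names buf, head and tail are illustrative only; nothing here comes from the patch.

```python
import struct

# Native item sizes for the two type codes used below.
int_size = struct.calcsize('I')
short_size = struct.calcsize('H')

# One allocation: room for one unsigned int followed by four unsigned shorts.
buf = bytearray(int_size + 4 * short_size)
whole = memoryview(buf)

# Two typed views over disjoint parts of the same memory; no bytes are copied.
head = whole[:int_size].cast('I')
tail = whole[int_size:].cast('H')

head[0] = 1
tail[3] = 65535

# The writes are visible through the original buffer.
first_int = struct.unpack_from('I', buf, 0)[0]
last_short = struct.unpack_from('H', buf, int_size + 3 * short_size)[0]
```

The views give typed item access, but they are still memoryview objects rather than arrays, which is the gap this proposal is about.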

I will shortly have a PR for this, including updates for the documentation and unit tests.

- Modules/arraymodule.c must already check whether the array object has exported a buffer in any method that might resize it. If the array was constructed from an imported buffer, the same restrictions apply.  So the object just needs to know whether it was constructed from a Py_buffer and check that in the same places where it checks for a nonzero export count. The code doesn't need to be perturbed much.

- Only memoryview objects whose layout is contiguous and whose size and alignment are compatible with the array's element type are allowed.

- I'm proposing this is only for when it's an actual memoryview object, not just if the object can export buffers. This preserves more of the existing behavior.

- Currently you /can/ initialize an array from a type-compatible memoryview, but it makes a copy by iterating over the elements, and the types have to match exactly, not just in size. We could maintain exact backward compatibility by adding an extra argument to array.array() or another letter to the format specifier; my current patch doesn't do this though.
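
The compatibility constraint in the list above can be roughly approximated in pure Python. This is only a sketch: can_wrap is a hypothetical helper name, not part of the patch, and it checks contiguity and size only. Alignment is not checked because it cannot be inspected portably from Python.

```python
import array


def can_wrap(view, typecode):
    """Hypothetical helper: could `view` plausibly back an array of `typecode`?

    Checks only that the view is contiguous and that its byte length is a
    whole number of array items. Alignment is NOT checked here.
    """
    itemsize = array.array(typecode).itemsize
    return view.contiguous and view.nbytes % itemsize == 0


ok = can_wrap(memoryview(bytearray(8)), 'h')            # contiguous, 8 bytes
strided = can_wrap(memoryview(bytearray(8))[::2], 'b')  # every second byte
ragged = can_wrap(memoryview(bytearray(7)), 'h')        # 7 bytes, 2-byte items
```

A strided slice is rejected because it is not contiguous, and a 7-byte buffer is rejected for 'h' because it does not hold a whole number of items.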

-----------------------------------------------------------
Example of current behavior:

>>> import array
>>> x = array.array('b', [1,2,3,4])
>>> y = memoryview(x)
>>> z = array.array('b', y)
>>> z
array('b', [1, 2, 3, 4])
>>> z[0] = 42
>>> x
array('b', [1, 2, 3, 4])
>>> z
array('b', [42, 2, 3, 4])
     # x and z are backed by different memory
>>> x.append(17)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: cannot resize an array that is exporting buffers
     # this is because y is still a live object
>>> z.append(17)
     # it is really a copy, x and y are irrelevant to z
>>> z
array('b', [42, 2, 3, 4, 17])

----------------------------------------
Example of new behavior:

>>> import array
>>> x = array.array('b', [1,2,3,4])
>>> x
array('b', [1, 2, 3, 4])
>>> y = memoryview(x)
>>> z = array.array('b', y)
>>> z
array('b', [1, 2, 3, 4])
>>> z[0] = 42
>>> x
array('b', [42, 2, 3, 4])
>>> x.append(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: cannot resize an array that is exporting buffers
>>> z.append(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: cannot resize an array constructed from an imported buffer
msg367705 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-29 21:56
array.array should copy the content, to be able to modify it. It implements both the storage for data and the view of that storage.

What you want is already implemented as the memoryview object.

>>> import array
>>> x = array.array('b', [1,2,3,4])
>>> x
array('b', [1, 2, 3, 4])
>>> z = memoryview(x).cast('h')
>>> z
<memory at 0x7f31e79d2c80>
>>> list(z)
[513, 1027]
>>> z[0] = 42
>>> x
array('b', [42, 0, 3, 4])
>>> x.append(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: cannot resize an array that is exporting buffers
msg367879 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2020-05-01 19:59
Being able to create an array.array without making a copy of a memoryview's contents does sound valuable.  We do not always want to modify the size of the array, as evidenced by array.array's existing functionality where its size-changing manipulations (like append) are suppressed when exporting a buffer.  So I think it is okay to not require a copy be made when constructing an array.array in this way.

Serhiy's example is a good one for demonstrating how different parts of an array.array can be treated as having different types as far as getting and setting items.  I have met a number of hardware groups in mostly larger companies that use array.array to expose raw data being read directly from devices.  They wastefully make copies of their often large array.array objects, each with a distinct type code, so that they can make use of array.array's index() and count() and other functions, which are not available on a memoryview.

Within the core of Python (that is, including the standard library but excluding 3rd party packages), we have a healthy number of examples of objects that expose a buffer via the Buffer Protocol but they lack the symmetry of going the other way to enable creation from an existing buffer.  My sense is it would be a welcome thing to see something like array.array, that is designed to work with low-level data types, support creation from an existing buffer without the need for a copy -- this is the explicit purpose of the Buffer Protocol after all but array.array only supports export, not creation, which currently makes array.array feel inconsistent.
msg367883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-01 20:10
> My sense is it would be a welcome thing to see something like array.array, that is designed to work with low-level data types, support creation from an existing buffer without the need for a copy

It is called memoryview.
msg368258 - (view) Author: Benjamin Keen (bjkeen) * Date: 2020-05-06 14:32
memoryview has a lot of overlap with array, but there are still useful methods (index and count for instance) that memoryview does not have. I don't see a workaround that will run with equivalent speed without writing some extension or adding them to memoryview.
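
For comparison, a pure-Python stand-in for count() and index() over a memoryview might look like the sketch below. The names view_count and view_index are hypothetical. The loops run in the interpreter, so they illustrate the speed gap described above rather than close it.

```python
import array


def view_count(view, value):
    """Count items equal to `value`, iterating the view without copying it."""
    return sum(1 for item in view if item == value)


def view_index(view, value):
    """Return the first index of `value`, or raise ValueError like list.index."""
    for i, item in enumerate(view):
        if item == value:
            return i
    raise ValueError(f"{value!r} is not in the view")


data = array.array('h', [5, 7, 5, 9])
view = memoryview(data)

n_fives = view_count(view, 5)
where_nine = view_index(view, 9)
```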

Constructing an array from the memoryview when one wants these isn't a workaround, because there may not be enough memory to make a copy. One example is a memoryview of a mapped disk file that is much larger than the physical memory in the machine.
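
A small sketch of that mapped-file situation, using a 16-byte temporary file in place of a very large one. The typed access goes through memoryview.cast(), since that is what works without a copy today; the variable names are illustrative.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(bytes(16))
    f.flush()
    with mmap.mmap(f.fileno(), 0) as mm:
        # Both views must be released before the mapping is closed, so they
        # are managed by the with-statement (released in reverse order).
        with memoryview(mm) as raw, raw.cast('H') as words:
            n_words = len(words)
            words[0] = 0xFFFF  # writes straight into the mapped file
        first_two = bytes(mm[:2])
```

No copy of the file contents is made at any point; the write lands in the mapping itself.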

When writing functions that use an array you don't always know ahead of time whether you are going to be in this situation.  The functions may come from someone else with a bigger machine. This lets the client of the function decide what the right thing to do is as needed without changing the function itself.

So for that reason this brings something that just using the memoryview directly won't provide.  There's value in being able to write things that use the array interface consistently.

You could also think of this as a way of providing compiled-speed index() and count() on certain memoryviews without needing to add new code for that to memoryview itself.
History
Date User Action Args
2022-04-11 14:59:30 admin set github: 84620
2020-05-06 14:32:22 bjkeen set messages: + msg368258
2020-05-01 20:10:57 serhiy.storchaka set messages: + msg367883
2020-05-01 19:59:37 davin set nosy: + davin; messages: + msg367879
2020-04-29 21:56:05 serhiy.storchaka set nosy: + skrah, serhiy.storchaka; messages: + msg367705
2020-04-29 21:33:42 bjkeen set keywords: + patch; stage: patch review; pull_requests: + pull_request19121
2020-04-29 19:01:33 bjkeen create