classification
Title: Change format of a memoryview
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.3
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Problems with Py_buffer management in memoryobject.c (and elsewhere?) (issue 10181)
Assigned To:
Nosy List: gregory.p.smith, haypo, jcon, mark.dickinson, ncoghlan, pitrou, python-dev, skrah, teoliphant, xuanji
Priority: normal
Keywords:

Created on 2009-02-12 21:39 by pitrou, last changed 2012-02-25 11:25 by python-dev. This issue is now closed.

Messages (21)
msg81823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-12 21:39
Memoryview objects provide a structured view over a memory area, meaning
the length, indexing and slicing operations respect the itemsize:

>>> import array
>>> a = array.array('i', [1,2,3])
>>> m = memoryview(a)
>>> len(m)
3
>>> m.itemsize
4
>>> m.format
'i'

However, in some cases, you want the memoryview to behave as a chunk of
pure bytes regardless of the original object *and without making a
copy*. Therefore, it would be handy to be able to change the format of
the memoryview, or ask for a new memoryview with another format.

An example of use could be:
>>> a = array.array('i', [1,2,3])
>>> m = memoryview(a).with_format('B')
>>> len(m), m.itemsize, m.format
(12, 1, 'B')
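
For comparison, here is a minimal sketch of the same operation written with memoryview.cast(), the method name this discussion later settles on (available in Python 3.3 and later). with_format() above is only the proposed spelling and does not exist. The variable name flat is an illustration, and sizes are computed from a.itemsize rather than assumed to be 4.

```python
import array

a = array.array('i', [1, 2, 3])
m = memoryview(a)        # structured view: one item per array element
flat = m.cast('B')       # unsigned-byte view over the same memory, no copy

# The byte view's length is the buffer size in bytes.
print(len(m), m.itemsize, m.format)
print(len(flat), flat.itemsize, flat.format)

# Writing through the byte view changes the array, since memory is shared.
flat[:] = bytes(len(flat))
print(a)
```

Zeroing the byte view and then printing the array is one way to confirm that no copy was made.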
msg81824 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-12 21:47
(Another way to see it is as supplying a Python equivalent to the C
buffer API, with access to the raw Py_buffer)
msg81839 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-02-12 23:53
Agreed, this would be useful. See http://codereview.appspot.com/12470/show if anyone doesn't believe us. ;)
msg128486 - (view) Author: Xuanji Li (xuanji) Date: 2011-02-13 12:09
Is this issue from 2 years ago still open? I checked the docs and it seems to be.

If it is, I would like to work on a patch and submit it soon.
msg128488 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-02-13 13:07
It is, but keep issue 10181 in mind (since that may lead to some restructuring of the memoryview code, potentially leading to a need to update your patch).
msg135600 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-09 15:32
In the meantime I had to resort to dirty hacks in 1ac03e071d65 (such as using io.BytesIO.write(), which I know is implemented in C and doesn't care about item size).

At the minimum, a memoryview.getflatview() function would be nice (and probably easier to code than the generic version). Or a "flat" optional argument in the memoryview constructor.
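
A rough sketch of the kind of workaround described above, assuming the point is simply that io.BytesIO.write() consumes any buffer as raw bytes. This is not the code from changeset 1ac03e071d65, and the names buf, written and raw are illustrative only.

```python
import array
import io

a = array.array('i', [1, 2, 3])

buf = io.BytesIO()
written = buf.write(memoryview(a))   # counts bytes, not items
raw = buf.getvalue()                 # a bytes copy of the array's memory

print(written, len(a) * a.itemsize)
print(raw == a.tobytes())
```

Note that this goes through a copy of the data, which is presumably part of why it is called a hack rather than a solution.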
msg135601 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-09 15:35
Reading an int32 array as a raw byte string is useful, but the opposite is also useful.
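
A small sketch of that opposite direction, using memoryview.cast() as it exists in Python 3.3 and later (it was not available when this message was written). The names raw and ints are illustrative, and the item size comes from struct.calcsize('i') rather than being assumed to be 4.

```python
import struct

n = struct.calcsize('i')           # native size of a C int
raw = bytearray(3 * n)             # plain bytes, initially all zero
ints = memoryview(raw).cast('i')   # view the same memory as C ints

# Assigning through the int view writes into the bytearray.
ints[0], ints[1], ints[2] = 1, 2, 3

print(len(ints), ints.format)
print(bytes(raw) == struct.pack('3i', 1, 2, 3))
```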
msg135976 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-14 15:47
Unassigning.  Sorry;  no time for this at the moment.
msg142820 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-23 13:10
I think this would be useful and I'll try it out in features/pep-3118#memoryview.

Syntax options that I'd prefer:

a = array.array('i', [1,2,3])
m = memoryview(a, 'B')


Or go all the way and make memoryview take any flag:

a = array.array('i', [1,2,3])
m = memoryview(a, getbuf=PyBUF_SIMPLE)


This is what I currently do in _testbuffer.c:


>>> from _testbuffer import *
>>> import array
>>> a = array.array('i', [1,2,3])
>>> nd = ndarray(a, getbuf=PyBUF_SIMPLE)
>>> nd.format
''
>>> nd.len
12
>>> nd.shape
()
>>> nd.strides
()
>>> nd.itemsize # XXX array_getbuf should set this to 1.
4



We would need to fix various getbuffer() methods to adhere to
strict rules that I've posed here:

http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058189.html
msg142821 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-23 13:24
> Or go all the way and make memoryview take any flag:
> 
> a = array.array('i', [1,2,3])
> m = memoryview(a, getbuf=PyBUF_SIMPLE)

This is good for testing, but Python developers shouldn't have to know
about the low-level flags.
msg142826 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-23 13:51
Antoine Pitrou <report@bugs.python.org> wrote:
> > Or go all the way and make memoryview take any flag:
> > 
> > a = array.array('i', [1,2,3])
> > m = memoryview(a, getbuf=PyBUF_SIMPLE)
> 
> This is good for testing, but Python developers shouldn't have to know
> about the low-level flags.

Hmm, indeed. How about:

1) memoryview(a, format='B')

Shadows a builtin function; annoying syntax highlighting in current Vim.

2) memoryview(a, fmt='B')

I'm fully expecting a comment about 'strpbrk' again, but I like it. :)

Also, we have to see about speed implications. My current version of memoryview
(not pushed yet to the public repo) also solves #10227, but is pretty sensitive
even to small changes.
msg142828 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-23 14:06
> Hmm, indeed. How about:
> 
> 1) memoryview(a, format='B')
> 
> Shadows a builtin function; annoying syntax highlighting in current Vim.
> 
> 2) memoryview(a, fmt='B')
> 
> I'm fully expecting a comment about 'strpbrk' again, but I like it. :)

I really prefer "format", it's the natural word to use there.
I don't think this is the only place where we shadow a builtin function.
There are probably variables named "dict" in many places.

> Also, we have to see about speed implications. My current version of memoryview
> (not pushed yet to the public repo) also solves #10227, but is pretty sensitive
> even to small changes.

Well, solving #10227 would be nice, but I don't think it's critical
either.
msg142830 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-23 14:28
Good, I'll use 'format'. I was mainly worried about the shadowing
issue.
msg142832 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-23 15:15
Rethinking a bit: Casting to arbitrary formats might go a bit far.

Currently, the combination (format=NULL, shape=NULL) can serve as
a warning "This buffer has been cast to unsigned bytes".

If we allow casts from bytes to int32, we'll have (format="i", shape=x)
and consumers of that buffer have no indication that the original
exporter had a different format.

If you know what you are doing, fine. On the other hand following
the buffer paths in #12817 quickly turned into a very complex
maze of getbuffer requests.


So, an option would be to try out the cast to bytes first and
disallow other casts.
msg142833 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-08-23 15:22
Casting to a flat 1-D array of bytes is reasonable (it's essentially saying 'look, just give me the raw data, it's on my own head if I stuff up the formatting').

However, requiring an explicit two-step process for any other casting (i.e. take a 1-D view, then a shaped view of that flat 1-D view) also sounds reasonable.

So I agree with Victor that 1-D bytes -> any shape/format and any shape/format -> 1-D bytes should be allowed, but I think we should hold off on allowing arbitrary transformations in a single step.
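
The two-step rule can be illustrated with memoryview.cast() as shipped in Python 3.3 and later, which rejects a cast when neither the source nor the target is a byte format. This is a sketch of observable behaviour, not of the patch itself. The names direct_error and unsigned are illustrative.

```python
import array

a = array.array('i', [1, 2, 3, 4])
m = memoryview(a)

# One step from a non-byte format to another non-byte format is refused.
try:
    m.cast('I')
except TypeError as exc:
    direct_error = exc
else:
    direct_error = None

# Two steps: flatten to unsigned bytes first, then reinterpret.
unsigned = m.cast('B').cast('I')

print(direct_error)
print(unsigned.format, unsigned.tolist())
```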
msg142834 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-23 15:31
> However, requiring an explicit two-step process for any other casting
> (i.e. take a 1-D view, then a shaped view of that flat 1-D view) also
> sounds reasonable.
> 
> So I agree with Victor that 1-D bytes -> any shape/format and any
> shape/format -> 1-D bytes should be allowed, but I think we should
> hold off on allowing arbitrary transformations in a single step.

Converting to 1-D bytes is my main motivation for this feature request,
so I'm fine with such a limitation.

The point is to be able to do in Python what we can do in C, take an
arbitrary buffer and handle it as pure bytes (for I/O or cryptography
purposes, for example).
msg142842 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-23 16:27
Nick Coghlan <report@bugs.python.org> wrote:
> So I agree with Victor that 1-D bytes -> any shape/format and any
> shape/format -> 1-D bytes should be allowed, but I think we should
> hold off on allowing arbitrary transformations in a single step.

1-D bytes -> any shape/format would work if everyone agrees on the
Numpy mailing list post that I linked to in an earlier message.

[Summary: PyBUF_SIMPLE may downcast any C-contiguous array to unsigned bytes.]

Otherwise a PyBUF_SIMPLE getbuffer call to the newly shaped memoryview
would be required to fail, and these calls are almost certain to occur
somewhere, e.g. in PyObject_AsWriteBuffer().

But then memoryview would also need a 'shape' parameter:

m = memoryview(x, format='L', shape=[3, 4])

In that case, making it a method might indeed be more clear to underline
that something extraordinary is going on:

m = memoryview(x).cast(format='L', shape=[3, 4])

It also takes away a potential speed loss for regular uses.

1-D bytes would then be defined as 'b', 'B' and 'c', I presume? Being able
to cast to 'c' would also solve certain memoryview index assignment problems
that arise if we opt for strict typing as the struct module does.
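
A sketch of both ideas using memoryview.cast() as it behaves in Python 3.3 and later. The size of 'L' is platform dependent, so the buffer is sized with struct.calcsize('L'). The names rows, cols, text and chars are illustrative.

```python
import struct

rows, cols = 3, 4
x = bytearray(rows * cols * struct.calcsize('L'))

# Reinterpret flat bytes as a 3x4 array of C unsigned longs.
m = memoryview(x).cast('L', shape=[rows, cols])
print(m.ndim, m.shape)
print(m.tolist())

# With format 'c', items are length-1 bytes objects, so assigning
# a bytes object by index works; with 'B' an int would be required.
text = bytearray(b'abc')
chars = memoryview(text).cast('c')
chars[0] = b'x'
print(text)
```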
msg143729 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-09-08 14:56
The cast method is completely implemented over at #10181.
msg152256 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-29 20:13
Shouldn't this be closed in favour of #10181?
msg152259 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-29 20:59
Yes, it's really superseded by #10181 now. I'm closing as 'duplicate',
since technically it'll be fixed once the patch for #10181 is committed.
msg154238 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-25 11:25
New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':
- Issue #10181: New memoryview implementation fixes multiple ownership
http://hg.python.org/cpython/rev/3f9b3b6f7ff0
History
Date  User  Action  Args
2012-02-25 11:25:29  python-dev  set  nosy: + python-dev; messages: + msg154238
2012-01-29 20:59:16  skrah  set  status: open -> closed; superseder: Problems with Py_buffer management in memoryobject.c (and elsewhere?); messages: + msg152259; dependencies: - Problems with Py_buffer management in memoryobject.c (and elsewhere?); resolution: duplicate; stage: needs patch -> resolved
2012-01-29 20:13:12  pitrou  set  messages: + msg152256
2011-09-08 14:56:21  skrah  set  dependencies: + Problems with Py_buffer management in memoryobject.c (and elsewhere?); messages: + msg143729
2011-08-23 16:27:03  skrah  set  messages: + msg142842
2011-08-23 15:31:53  pitrou  set  messages: + msg142834
2011-08-23 15:22:39  ncoghlan  set  messages: + msg142833
2011-08-23 15:15:01  skrah  set  messages: + msg142832
2011-08-23 14:28:06  skrah  set  messages: + msg142830
2011-08-23 14:06:34  pitrou  set  messages: + msg142828
2011-08-23 13:51:58  skrah  set  messages: + msg142826
2011-08-23 13:24:17  pitrou  set  messages: + msg142821
2011-08-23 13:10:40  skrah  set  nosy: + skrah; messages: + msg142820
2011-06-20 18:35:46  jcon  set  nosy: + jcon
2011-05-14 15:47:51  mark.dickinson  set  messages: + msg135976
2011-05-14 15:47:14  mark.dickinson  set  assignee: mark.dickinson ->
2011-05-09 15:35:32  haypo  set  nosy: + haypo; messages: + msg135601
2011-05-09 15:32:23  pitrou  set  stage: patch review -> needs patch
2011-05-09 15:32:17  pitrou  set  stage: test needed -> patch review; messages: + msg135600; versions: + Python 3.3, - Python 3.2
2011-02-13 13:07:51  ncoghlan  set  nosy: gregory.p.smith, teoliphant, mark.dickinson, ncoghlan, pitrou, xuanji; messages: + msg128488
2011-02-13 12:09:25  xuanji  set  nosy: gregory.p.smith, teoliphant, mark.dickinson, ncoghlan, pitrou, xuanji; messages: + msg128486
2011-02-13 11:53:28  xuanji  set  nosy: + xuanji
2011-01-04 01:44:06  pitrou  set  assignee: mark.dickinson; nosy: + mark.dickinson
2010-08-09 03:19:09  terry.reedy  set  stage: test needed; versions: + Python 3.2, - Python 3.1
2009-02-12 23:53:36  gregory.p.smith  set  messages: + msg81839
2009-02-12 21:47:20  pitrou  set  messages: + msg81824
2009-02-12 21:39:02  pitrou  create