classification
Title: struct module 'c' specifier does not follow PEP-3118
Type: Stage: resolved
Components: Versions: Python 3.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, chris.jerdonek, loewis, ncoghlan, pitrou, skrah
Priority: normal Keywords:

Created on 2012-08-11 07:48 by loewis, last changed 2012-08-11 17:09 by loewis. This issue is now closed.

Messages (15)
msg167937 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 07:48
PEP 3118 specifies that the 'c' format denotes UCS-1 characters, yet .tolist() converts the memoryview into a list of bytes objects. This is incorrect; it ought to be a list of string objects (as it should for the 'u' and 'w' codes). The same holds for item access.
msg167938 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 07:54
To reproduce:

>>> import array
>>> memoryview(array.array('B',b'foo')).cast('c').tolist()
[b'f', b'o', b'o']
msg167940 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-11 08:04
You have rejected the PEP-3118 'u' and 'w' specifiers here:

http://mail.python.org/pipermail/python-dev/2012-March/117390.html


Otherwise, memoryview follows the existing struct module syntax:

http://docs.python.org/dev/library/struct.html#format-characters



I hope it did not escape you that _testbuffer.c *uses* the struct
module to verify the correctness of memoryview.
msg167943 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 09:18
No, I haven't rejected the format codes. What I did ask to revert is the change that made 'u' in the array module denote Py_UCS4; I requested that it continue to be compatible with 3.2. I didn't have an opinion on memoryview at all then.

It's unfortunate that PEP 3118 deviates from the struct module, however, memoryview is based on the buffer interface, and its format codes ought to conform to the PEP, not to the struct module (IMO).

It's easy to see that it *doesn't* follow the struct syntax, as it is possible to create memoryview objects with other format codes in 3.3.
msg167944 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 09:32
That the struct module hasn't been updated to support PEP 3118 is already reported as issue 3132; please don't confuse the issues. This issue is about memoryview.

One solution would be to revert the PEP's decision that 'c' is UCS-1.
msg167945 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-11 09:39
I don't know which behaviour is more desirable, but I would consider PEP 3118 a historical document more than a normative spec. Especially when it comes to struct format codes.
msg167948 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-11 09:44
Martin v. Löwis <report@bugs.python.org> wrote:
> It's unfortunate that PEP 3118 deviates from the struct module, however,
> memoryview is based on the buffer interface, and its format codes ought to
> conform to the PEP, not to the struct module (IMO).

The struct module itself should conform to PEP-3118, see #3132.

I think the struct module should be updated first. The proliferation of
subtly different format codes is not manageable. For example, if you use
NumPy, there are already differences between NumPy syntax and struct syntax.

Also, one should always be able to unpack the tobytes() representation
using the struct module and get the same result as from flatten(tolist()).

> It's easy to see that it *doesn't* follow the struct syntax, as it is
> possible to create memoryview objects with other format codes in 3.3.

memoryview has *always* allowed arbitrary format strings during construction.
In 3.3, it keeps this property for backwards compatibility.

It does follow struct syntax whenever it *uses* one of the format codes,
like in tolist().
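
The invariant described in this message can be checked with a short sketch. It assumes Python 3.3 or later (where memoryview.cast() is available) and a one-dimensional view, so no flattening step is needed; the variable names are illustrative only.

```python
import array
import struct

# One-dimensional view over three bytes, reinterpreted with the 'c' format.
view = memoryview(array.array('B', b'foo')).cast('c')

# Unpack the raw bytes with struct, reusing the view's own format code.
fmt = '{}{}'.format(len(view), view.format)  # builds '3c'
unpacked = list(struct.unpack(fmt, view.tobytes()))

# Both routes yield length-one bytes objects, not str objects.
assert unpacked == view.tolist() == [b'f', b'o', b'o']
```

If the two results ever differed for a supported format code, that would point to a disagreement between memoryview and struct rather than to anything in PEP 3118.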
msg167949 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-11 10:07
Martin v. Löwis <report@bugs.python.org> wrote:
> That the struct module hasn't been updated to support PEP 3118 is
> already reported as issue 3132; please don't confuse the issues.
> This issue is about memoryview.

No, it isn't. It was always planned to use struct to do the unpacking for
memoryview, see msg71338.

On a meta note, I'd appreciate it if you were less liberal with words like
"confusing", especially if you are just beginning to work on an issue
that other people have already spent a lot of time on.
msg167951 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 10:35
Do you agree or disagree that memoryview.tolist should return a list of str objects for the 'c' code?

If you agree, can you please change the title back?

If you disagree, please explain why, change the title back, and close the issue as rejected.

If you agree, but think that struct should be changed first, create a new issue for the struct change, make that a dependency of this issue, and change the title back.
msg167957 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-11 14:20
Whatever the struct module produces for a format code is the same thing that memoryview.tolist() should produce.

PEP 3118 contains way too many errors (as has been found out the hard way) to be considered a normative document.
msg167961 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-11 14:40
<Closing with rationale, as Martin requested>

The struct module documentation takes precedence over PEP 3118 when it comes to pre-existing format codes, as changing struct is not feasible due to backwards compatibility concerns, and we don't want two conflicting notations for binary format descriptions. PEP 3118 was intended only to define *additional* format characters, which may or may not yet be understood by the struct module.

As 'c' is defined by the struct module as returning a bytes object of length one, this is the same interpretation used by memoryview.

Thus the current behaviour of both memoryview and struct is considered correct, while it is PEP 3118 that is incorrect in this case: the 'c' entry should not have been in the table, as 'c' was already defined at least as long ago as 1.5.2 (returning an 8-bit string, which then became a bytes object in 3.x).

The PEP was also written in a 2.x context (note the mention of "2.5" above the table of new format codes), where the idea of providing a separate code that implicitly performed x.decode("latin-1") to produce a unicode object instead of an 8-bit string object wouldn't necessarily come up.
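
As a rough illustration of the distinction drawn in this rationale, the sketch below shows what struct's 'c' code returns in Python 3 and how a caller who wants text could decode the result explicitly. The byte value 0xE9 is an arbitrary choice for the example, and nothing here is proposed as a new format code.

```python
import struct

# struct's 'c' code yields a bytes object of length one.
(item,) = struct.unpack('c', b'\xe9')
assert item == b'\xe9'

# Callers who want a str can decode explicitly. Latin-1 maps each byte
# value 0-255 to the code point with the same number.
char = item.decode('latin-1')
assert char == '\xe9' and ord(char) == 0xE9
```

The decode step is what a UCS-1 reading of 'c' would have performed implicitly.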
msg167969 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-11 15:03
However, based on this issue, I have added some comments to #3132 (I think PEP 3118's simplistic approach to embedded text data is broken and a bad idea).
msg167970 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-11 15:06
> <Closing with rationale, as Martin requested>

Status was still open. Was that a tracker bug?
msg167973 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-11 15:21
Pretty sure it was just an error on my part.
msg167980 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-11 17:09
Nick: that's a reasonable view, thanks - in particular the point that PEP 3118 should not be considered normative.

I still think that the c code in struct is fairly redundant (with B) as it stands, so I think it should get deprecated and removed - but that's a different issue.
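
The overlap between the 'c' and 'B' codes mentioned here can be seen in a small sketch; the sample data is arbitrary, and the conversion at the end is just one possible way to move between the two representations.

```python
import struct

data = b'foo'
as_c = struct.unpack('3c', data)  # length-one bytes objects
as_b = struct.unpack('3B', data)  # integers in range(256)

assert as_c == (b'f', b'o', b'o')
assert as_b == (102, 111, 111)

# Each representation can be derived from the other.
assert tuple(bytes([n]) for n in as_b) == as_c
assert tuple(b[0] for b in as_c) == as_b
```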
History
Date User Action Args
2012-08-11 17:09:51 loewis set messages: + msg167980
2012-08-11 15:21:46 ncoghlan set messages: + msg167973
2012-08-11 15:06:19 chris.jerdonek set nosy: + chris.jerdonek
    messages: + msg167970
2012-08-11 15:03:45 ncoghlan set status: open -> closed
2012-08-11 15:03:26 ncoghlan set messages: + msg167969
2012-08-11 14:40:31 ncoghlan set resolution: not a bug
    dependencies: - implement PEP 3118 struct changes
    messages: + msg167961
    stage: resolved
2012-08-11 14:20:31 ncoghlan set messages: + msg167957
2012-08-11 10:35:20 loewis set messages: + msg167951
2012-08-11 10:08:27 skrah set dependencies: + implement PEP 3118 struct changes
    title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118
2012-08-11 10:07:10 skrah set messages: + msg167949
    title: struct module 'c' specifier does not follow PEP-3118 -> memoryview.to_list() incorrect for 'c' format
2012-08-11 10:00:19 Arfrever set nosy: + Arfrever
2012-08-11 09:44:06 skrah set messages: + msg167948
    title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118
2012-08-11 09:39:30 pitrou set nosy: + pitrou
    messages: + msg167945
2012-08-11 09:32:02 loewis set messages: + msg167944
    title: struct module 'c' specifier does not follow PEP-3118 -> memoryview.to_list() incorrect for 'c' format
2012-08-11 09:18:15 loewis set messages: + msg167943
2012-08-11 08:06:01 skrah set title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118
2012-08-11 08:04:03 skrah set nosy: + ncoghlan
    messages: + msg167940
2012-08-11 07:54:10 loewis set messages: + msg167938
2012-08-11 07:48:53 loewis create