msg167937 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 07:48 |
PEP 3118 specifies that the 'c' format denotes UCS-1 characters, yet .tolist() converts the memoryview into a list of bytes objects. This is incorrect; it ought to be a list of string objects (as it should for the 'u' and 'w' codes). The same holds for item access.
|
msg167938 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 07:54 |
To reproduce:
>>> memoryview(array.array('B',b'foo')).cast('c').tolist()
[b'f', b'o', b'o']
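
A self-contained version of the reproduction, as a minimal sketch that also covers the item-access case mentioned in the report. The variable names are illustrative only.

```python
import array

# Same buffer as in the report: three unsigned bytes, re-cast to the 'c' format.
view = memoryview(array.array('B', b'foo')).cast('c')

items = view.tolist()   # list conversion
first = view[0]         # item access goes through the same unpacking

print(items)                  # [b'f', b'o', b'o']
print(type(first).__name__)   # bytes
```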
|
msg167940 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-08-11 08:04 |
You have rejected the PEP-3118 'u' and 'w' specifiers here:
http://mail.python.org/pipermail/python-dev/2012-March/117390.html
Otherwise, memoryview follows the existing struct module syntax:
http://docs.python.org/dev/library/struct.html#format-characters
I hope it did not escape you that _testbuffer.c *uses* the struct
module to verify the correctness of memoryview.
|
msg167943 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 09:18 |
No, I haven't rejected the format codes. What I did ask to revert is that 'u' in the array module denotes Py_UCS4; I requested that it should continue to be compatible with 3.2. I didn't have an opinion on memoryview at all then.
It's unfortunate that PEP 3118 deviates from the struct module; however, memoryview is based on the buffer interface, and its format codes ought to conform to the PEP, not to the struct module (IMO).
It's easy to see that it *doesn't* follow the struct syntax, as it is possible to create memoryview objects with other format codes in 3.3.
|
msg167944 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 09:32 |
That the struct module hasn't been updated to support PEP 3118 is already reported as issue 3132; please don't confuse the issues. This issue is about memoryview.
One solution would be to revert the PEP's decision that 'c' is UCS-1.
|
msg167945 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-08-11 09:39 |
I don't know which behaviour is more desirable, but I would consider PEP 3118 a historical document more than a normative spec. Especially when it comes to struct format codes.
|
msg167948 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-08-11 09:44 |
Martin v. Löwis <report@bugs.python.org> wrote:
> It's unfortunate that PEP 3118 deviates from the struct module; however,
> memoryview is based on the buffer interface, and its format codes ought to
> conform to the PEP, not to the struct module (IMO).
The struct module itself should conform to PEP-3118, see #3132.
I think the struct module should be updated first. The proliferation of
subtly different format codes is not manageable. For example, if you use
NumPy, there are already differences between NumPy syntax and struct syntax.
Also, one should always be able to unpack the tobytes() representation
using the struct module and get the same result as from flatten(tolist()).
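
The round-trip property described here can be sketched as follows. This is an illustration for 1-D views only (so no flattening is needed), and `unpack_like_tolist` is a hypothetical helper name, not part of any library.

```python
import struct

def unpack_like_tolist(view):
    """Unpack a 1-D memoryview's bytes with struct, using the view's own format."""
    count = view.nbytes // view.itemsize
    # e.g. '3c' or '4H': a repeat count followed by the single-character code
    return list(struct.unpack('{}{}'.format(count, view.format), view.tobytes()))

chars = memoryview(b'foo').cast('c')              # 'c' items
shorts = memoryview(bytes(range(8))).cast('H')    # native unsigned shorts

print(unpack_like_tolist(chars) == chars.tolist())
print(unpack_like_tolist(shorts) == shorts.tolist())
```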
> It's easy to see that it *doesn't* follow the struct syntax, as it is
> possible to create memoryview objects with other format codes in 3.3.
memoryview has *always* allowed arbitrary format strings during construction.
In 3.3, it keeps this property for backwards compatibility.
It does follow struct syntax whenever it *uses* one of the format codes,
like in tolist().
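
A hedged illustration of that distinction, assuming a ctypes array as the exporter (the thread does not name one): ctypes reports a format string with an explicit byte-order prefix, which memoryview accepts at construction but declines to interpret in tolist().

```python
import ctypes

# Assumption for illustration: a ctypes array stands in for "an exporter
# with a format string that is not a single native struct code".
arr = (ctypes.c_int * 3)(1, 2, 3)
view = memoryview(arr)          # construction succeeds with any format string
fmt = view.format               # typically something like '<i'

try:
    view.tolist()               # using the format is where the restriction applies
    raised = False
except NotImplementedError:
    raised = True

print(fmt, raised)
```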
|
msg167949 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-08-11 10:07 |
Martin v. Löwis <report@bugs.python.org> wrote:
> That the struct module hasn't been updated to support PEP 3118 is
> already reported as issue 3132; please don't confuse the issues.
> This issue is about memoryview.
No, it isn't. It was always planned to use struct to do the unpacking for
memoryview, see msg71338.
On a meta note, I'd appreciate it if you were less liberal with words like
"confusing", especially if you are just beginning to work on an issue
that other people have already spent a lot of time on.
|
msg167951 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 10:35 |
Do you agree or disagree that memoryview.tolist should return a list of str objects for the 'c' code?
If you agree, can you please change the title back?
If you disagree, please explain why, change the title back, and close the issue as rejected.
If you agree, but think that struct should be changed first, create a new issue for the struct change, make that a dependency of this issue, and change the title back.
|
msg167957 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2012-08-11 14:20 |
Whatever the struct module produces for a format code is the same thing that memoryview.tolist() should produce.
PEP 3118 contains way too many errors (as has been found out the hard way) to be considered a normative document.
|
msg167961 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2012-08-11 14:40 |
<Closing with rationale, as Martin requested>
The struct module documentation takes precedence over PEP 3118 when it comes to pre-existing format codes, as changing struct is not feasible due to backwards compatibility concerns, and we don't want two conflicting notations for binary format descriptions. PEP 3118 was intended only to define *additional* format characters, which may or may not yet be understood by the struct module.
As 'c' is defined by the struct module as returning a bytes object of length one, this is the same interpretation used by memoryview.
Thus the current behaviour of both memoryview and struct is considered correct, while it is PEP 3118 that is incorrect in this case: the 'c' entry should not have been in the table, as 'c' was already defined at least as long ago as 1.5.2 (returning an 8-bit string, which then became a bytes object in 3.x).
The PEP was also written in a 2.x context (note the mention of "2.5" above the table of new format codes), where the idea of providing a separate code that implicitly performed x.decode("latin-1") to produce a unicode object instead of an 8-bit string object wouldn't necessarily come up.
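
For illustration, a minimal sketch of the struct behaviour this rationale relies on, together with the explicit latin-1 decoding that a text-producing code would have had to perform implicitly. The sample bytes are made up for the example.

```python
import struct

raw = b'caf\xe9'                      # four bytes; 0xE9 is 'é' in latin-1

# 'c' yields one bytes object of length one per item.
chars = struct.unpack('4c', raw)
print(chars)                          # (b'c', b'a', b'f', b'\xe9')

# Getting str objects requires an explicit decode step by the caller.
text = ''.join(c.decode('latin-1') for c in chars)
print(text)                           # café
```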
|
msg167969 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2012-08-11 15:03 |
However, based on this issue, I have added some comments to #3132 (I think PEP 3118's simplistic approach to embedded text data is broken and a bad idea).
|
msg167970 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2012-08-11 15:06 |
> <Closing with rationale, as Martin requested>
Status was still open. Was that a tracker bug?
|
msg167973 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2012-08-11 15:21 |
Pretty sure it was just an error on my part.
|
msg167980 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-08-11 17:09 |
Nick: that's a reasonable view, thanks - in particular the point that PEP 3118 should not be considered normative.
I still think that the 'c' code in struct is fairly redundant (with 'B') as it stands, so I think it should get deprecated and removed - but that's a different issue.
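
A small sketch of the overlap being described, assuming a plain bytes buffer: the 'c' and 'B' views carry the same information and differ only in whether each item is a length-one bytes object or an int.

```python
data = b'foo'

as_c = memoryview(data).cast('c').tolist()   # [b'f', b'o', b'o']
as_B = memoryview(data).tolist()             # [102, 111, 111]; bytes exports 'B'

# Converting between the two item types is a one-liner in each direction.
print([bytes([n]) for n in as_B] == as_c)
print([c[0] for c in as_c] == as_B)
```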
|
|
Date | User | Action | Args |
2022-04-11 14:57:34 | admin | set | github: 59827 |
2012-08-11 17:09:51 | loewis | set | messages: + msg167980 |
2012-08-11 15:21:46 | ncoghlan | set | messages: + msg167973 |
2012-08-11 15:06:19 | chris.jerdonek | set | nosy: + chris.jerdonek; messages: + msg167970 |
2012-08-11 15:03:45 | ncoghlan | set | status: open -> closed |
2012-08-11 15:03:26 | ncoghlan | set | messages: + msg167969 |
2012-08-11 14:40:31 | ncoghlan | set | resolution: not a bug; dependencies: - implement PEP 3118 struct changes; messages: + msg167961; stage: resolved |
2012-08-11 14:20:31 | ncoghlan | set | messages: + msg167957 |
2012-08-11 10:35:20 | loewis | set | messages: + msg167951 |
2012-08-11 10:08:27 | skrah | set | dependencies: + implement PEP 3118 struct changes; title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118 |
2012-08-11 10:07:10 | skrah | set | messages: + msg167949; title: struct module 'c' specifier does not follow PEP-3118 -> memoryview.to_list() incorrect for 'c' format |
2012-08-11 10:00:19 | Arfrever | set | nosy: + Arfrever |
2012-08-11 09:44:06 | skrah | set | messages: + msg167948; title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118 |
2012-08-11 09:39:30 | pitrou | set | nosy: + pitrou; messages: + msg167945 |
2012-08-11 09:32:02 | loewis | set | messages: + msg167944; title: struct module 'c' specifier does not follow PEP-3118 -> memoryview.to_list() incorrect for 'c' format |
2012-08-11 09:18:15 | loewis | set | messages: + msg167943 |
2012-08-11 08:06:01 | skrah | set | title: memoryview.to_list() incorrect for 'c' format -> struct module 'c' specifier does not follow PEP-3118 |
2012-08-11 08:04:03 | skrah | set | nosy: + ncoghlan; messages: + msg167940 |
2012-08-11 07:54:10 | loewis | set | messages: + msg167938 |
2012-08-11 07:48:53 | loewis | create | |