msg107744 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-06-13 19:42 |
Currently, the array constructor, if given a bytearray, detects this with PyByteArray_Check, and hands it on to array_fromstring, which does not support bytearray (by using "s#" with PyArg_ParseTuple) and raises TypeError.
>>> array('h', bytearray(b'xyxyxyxyxyxyxyxyxy'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be bytes or read-only buffer, not bytearray
>>>
I see no reason to insist on read-only buffers. I'm attaching a patch that I think fixes this.
|
msg108159 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2010-06-19 01:05 |
Summary: real bug, needs test, patch provided, looks plausible but I am not qualified to even read-review it, let alone apply and test.
(I am removing 3.3 because that is only listed for syntax feature requests that cannot go into 3.2 because of its moratorium on such.)
I agree that this is a bug. For 3.1, the '8.6. array' doc says
"class array.array(typecode[, initializer])
A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, object supporting the buffer interface, or iterable over elements of the appropriate type."
I believe bytearray 'supports the buffer interface', whatever that means in 3.x. 'Readonly' is definitely *not* specified. Even if bytearray did not qualify on that, instances are *definitely* iterable.
The doc goes on: "If given a list or string, the initializer is passed to the new array’s fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method."
For 3.x, this is pretty bad as it is copied without change from 2.x and uses 'string' in the 2.x sense. It seems to me that fromstring/fromunicode should have been renamed to frombytes/fromstring with a 2to3 fixer. Too late now, I suppose. However, if indeed 'fromstring' is intended to mean 'from bytes only and not bytearrays', then bytearrays should be passed on to .extend, which accepts them as well as anything else.
>>> from array import array
>>> a=array('h')
>>> a.extend(bytearray(b'xjxjx'))
>>> a
array('h', [120, 106, 120, 106, 120])
The two possible fixes:
1. Extend .fromstring to accept bytearrays. Its current doc "Appends items from the string, interpreting the string as an array of machine values " to me supports doing that. This is what Thomas's patch does.
2. If that is rejected, then bytearray initializers should be passed to .extend.
Thomas: a complete patch needs to include a update to the array unit test suite. This can then move to the patch review stage.
|
msg108161 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-06-19 02:17 |
Thomas' patch does more than just allow bytearray. It allows any object that can present itself as a buffer with byte-size items. It is a bit unfortunate that such method will end up being called "fromstring" in 3.x. With string being unicode, this is hopelessly ambiguous. Is the string interpreted as array of bytes, UTF-8 style, 16 or 32-bit integers? It would be much better to have frombuffer, and even better extendfrombuffer method with this functionality.
I am +1 on Terry's option 2. It may be possible to optimize .extend to detect objects that support buffer interface and bypass iterator protocol. Of course, when receiving array is not type 'b', extend and fromxyz are not the same, so maybe we should just rename fromunicode to fromstring and fromstring to frombytes?
|
msg108163 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2010-06-19 03:19 |
Whatever is done, I think a bytearray should be handled the same as bytes. It must be that they give the same result. In basic operations, I believe that bytearrays can *always* substitute for bytes. "Bytes and bytearray objects contain single bytes – the former is immutable while the latter is a mutable sequence." For example:
>>> b'abc'.capitalize()
b'Abc'
>>> bytearray(b'abc').capitalize()
bytearray(b'Abc')
This, to me, implies that .fromstring should accept bytearray (though probably not general buffer objects). In 2.x, I understand .fromstring to initialize an array from machine bytes read into a string, but not .fromunicode. This is *not* a text method, and the result may vary for 2 and 4-byte unicode builds. So I can see that 3.x needs .fromstring(bytes) but not .fromunicode(string).
|
msg108164 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-06-19 03:34 |
> This, to me, implies that .fromstring should accept bytearray (though
> probably not general buffer objects). In 2.x, I understand .fromstring
> to initialize an array from machine bytes read into a string, but not
> .fromunicode. This is *not* a text method, and the result may vary for 2
> and 4-byte unicode builds. So I can see that 3.x needs
> .fromstring(bytes) but not .fromunicode(string).
No, fromunicode is still needed. It is used to fill type 'u' array with unicode wide characters. In 3.x unicode type was renamed to str and str to bytes, so fromunicode should become fromstring (or maybe fromstr to avoid confusion) and fromstring should become frombytes.
|
msg108186 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-06-19 15:24 |
On the principle the patch looks good. In practice, it lacks a call to `PyBuffer_Release(&buffer)` in the various error cases (before returning NULL). It also lacks some tests in Lib/test/test_array.py.
> In 3.x unicode type was renamed to str and str to bytes, so fromunicode
> should become fromstring (or maybe fromstr to avoid confusion) and
> fromstring should become frombytes.
This should be discussed separately, perhaps on python-dev first.
|
msg108226 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-06-20 08:53 |
About the patch:
- Why not reading itemsize from buffer.itemsize?
- A test would be nice in Lib/test/test_array.py (in test_tofromstring?). Please add also a.extend(bytearray(b'xjxjx')) to the test suite.
I agree that it is a bugfix and so i can be applied to 3.1 (and 3.2).
--
> fromstring should become frombytes.
I remember an issue proposing that. I also prefer .frombytes() name: we can add .frombytes() and keep .fromstring() as an alias to .frombytes() (and maybe mark it as deprecated).
A message:
http://bugs.python.org/issue3565#msg71205
An issue starting at:
http://bugs.python.org/issue1023290#msg92069
Can someone open a issue about .frombytes()?
> fromunicode is still needed.
Can someone open a new issue about .fromunicodes()?
|
msg108312 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-06-21 20:09 |
Thanks for the input. I'm going to re-work the patch a bit (releasing buffers and such) and add a test within the next few days.
The question remains whether or not to accept other buffers with itemsize == 1. The way I understand it, fromstring already accepted any read-only buffer object, no matter the item size / whether it actually makes sense to call it a "string". I don't think accepting a hypothetical read-only buffer with items wider than 1 in fromstring (yes, bad naming) is desirable behaviour - I see a few options on how to deal with input validation:
1. ignore the item size. This'd be similar to current behaviour, plus r/w buffers
2. only accept byte-based buffers. ("things that look like 'const char*'") - this is what I've been aiming at.
3. only accept bytes and bytearray, and let the user think about how to deal with other objects. Question is - shouldn't array('B') be treated like bytearray in this respect?
|
msg108419 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-06-22 21:11 |
OK, here's the new patch. I added tests for array(typecode, bytearray(b'abab')), a.extend(b'123') and a.extend(bytearray(b'123')).
@Victor: int itemsize is the array's item size, buffer.itemsize is the strings' (and must be 1)
PROBLEM with this patch:
I changed "s#" to "y*". This means that str arguments are no longer accepted by fromstring. I don't think they ever should have been in 3.x, but it is an incompatible change and this got the test suite, which (I assume the code hasn't changed since 2.x) used a str argument. (changed in patch). It might be best to use "s*" instead of "y*", especially if this is applied to 3.1?
|
msg109028 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-07-01 02:30 |
array2.diff:
- can you reindent the line "Py_ssize_t old_size = Py_SIZE(self);"?(even if it's not part of your patch)
- you can avoid the "char *str;" variable (use directly buffer.buf)
--
> I changed "s#" to "y*". This means that str arguments are no longer
> accepted by fromstring (...) it is an incompatible change
It's maybe time to create .frombytes() and .tobytes() methods:
- .tostring() will be a (deprecated?) alias to .tobytes()
- .frombytes() only accepts bytes, bytearray and buffer compatible objects: use "y*" format
- .fromstring() accepts str, bytes, bytearray and buffer compatible objects (encode str to utf-8): use "s*" format
But I still don't understand why array.fromstring() accepts character strings. So an easier solution is to apply array2.diff to Python 3.2, and replace "y*" by "s*" when the patch is applied to 3.1.
|
msg109049 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-07-01 14:00 |
Two more patches:
Firstly, this patch (array_3.2_fromstring.diff) is nearly identical to array2.diff. "y*" would (again) have to be changed to "s*" to apply this to 3.1
|
msg109051 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-07-01 14:09 |
Secondly, this is my attempt to add the more sensibly named {to|from}bytes methods, and to deprecate {to|from}string.
I doubt it's perfect, maybe there's some policy on deprecating methods that I didn't find? This may be better discussed in a separate forum, eg a separate issue or python-dev maybe? I wouldn't know.
The unpatched test suite passes with the rest of this patch applied. (unless using -Werror)
|
msg111872 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-07-28 23:25 |
I prefer the second solution (add to/frombytes, deprecate to/fromstring) because I prefer the new method names and it keeps backward compatibility (until we choose to remove the old methods, which should be in Python 3.3).
About the patch (tofrombytes.diff). You should use PyErr_WarnEx() result: if it is not nul (eg. if the user choosed to raise exceptions on warnings), you have to exit.
|
msg114675 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-08-22 12:35 |
Hello again, sorry for the absense.
Victor, thanks for the input. I've attached a new patch that checks the PyErr_WarnEx return value.
|
msg114684 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-08-22 17:59 |
From a quick glance, the latest patch looks ok.
|
msg115271 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-08-31 18:10 |
The patch shouldn't remove the tests for tostring()/fromstring() (they might be deprecated, but they are still supported).
|
msg115274 - (view) |
Author: Thomas Jollans (tjollans) |
Date: 2010-08-31 19:35 |
That sounds reasonable. I've updated the patch to keep the old test_tofromstring testcase.
I'll also attach another patch in a moment that removes what I'm reasonably sure is all the uses of array.tostring and .fromstring in the standard library and the other modules' tests.
|
msg115331 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-09-01 20:30 |
The patches were committed in r84403. Thank you very much!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:02 | admin | set | github: 53236 |
2011-07-12 19:51:43 | pitrou | link | issue3565 superseder |
2010-09-01 20:30:46 | pitrou | set | status: open -> closed versions:
- Python 3.1 messages:
+ msg115331
resolution: fixed stage: test needed -> resolved |
2010-08-31 19:35:55 | tjollans | set | files:
- tofrombytes.diff |
2010-08-31 19:35:48 | tjollans | set | files:
+ tostring_usage.diff |
2010-08-31 19:35:15 | tjollans | set | files:
+ tofrombytes.diff
messages:
+ msg115274 |
2010-08-31 18:10:26 | pitrou | set | messages:
+ msg115271 |
2010-08-22 17:59:30 | pitrou | set | messages:
+ msg114684 |
2010-08-22 17:59:15 | pitrou | set | files:
- tofrombytes.diff |
2010-08-22 12:35:42 | tjollans | set | files:
+ tofrombytes.diff
messages:
+ msg114675 |
2010-08-22 12:11:24 | tjollans | set | files:
- tofrombytes.diff |
2010-08-22 12:03:47 | tjollans | set | files:
+ tofrombytes.diff |
2010-07-28 23:25:32 | vstinner | set | messages:
+ msg111872 |
2010-07-01 14:23:18 | tjollans | set | files:
- tofrombytes.diff |
2010-07-01 14:23:13 | tjollans | set | files:
+ tofrombytes.diff |
2010-07-01 14:09:17 | tjollans | set | files:
+ tofrombytes.diff
messages:
+ msg109051 |
2010-07-01 14:00:52 | tjollans | set | files:
+ array_3.2_fromstring.diff
messages:
+ msg109049 |
2010-07-01 02:30:17 | vstinner | set | messages:
+ msg109028 |
2010-06-22 21:11:32 | tjollans | set | files:
+ array2.diff
messages:
+ msg108419 |
2010-06-21 20:09:50 | tjollans | set | messages:
+ msg108312 |
2010-06-20 08:53:27 | vstinner | set | messages:
+ msg108226 |
2010-06-19 15:24:09 | pitrou | set | nosy:
+ vstinner, pitrou messages:
+ msg108186
|
2010-06-19 03:34:13 | belopolsky | set | messages:
+ msg108164 |
2010-06-19 03:19:03 | terry.reedy | set | messages:
+ msg108163 |
2010-06-19 02:17:25 | belopolsky | set | nosy:
+ belopolsky messages:
+ msg108161
|
2010-06-19 01:05:26 | terry.reedy | set | versions:
- Python 3.3 nosy:
+ terry.reedy
messages:
+ msg108159
stage: test needed |
2010-06-13 19:42:52 | tjollans | create | |