New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
array constructor and array.fromstring should accept bytearray. #53236
Comments
Currently, the array constructor, if given a bytearray, detects this with PyByteArray_Check, and hands it on to array_fromstring, which does not support bytearray (by using "s#" with PyArg_ParseTuple) and raises TypeError. >>> array('h', bytearray(b'xyxyxyxyxyxyxyxyxy'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be bytes or read-only buffer, not bytearray
>>> I see no reason to insist on read-only buffers. I'm attaching a patch that I think fixes this. |
Summary: real bug, needs test, patch provided, looks plausible but I am not qualified to even read-review it, let alone apply and test. (I am removing 3.3 because that is only listed for syntax feature requests that cannot go into 3.2 because of its moratorium on such.) I agree that this is a bug. For 3.1, the '8.6. array' doc says I believe bytearray 'supports the buffer interface', whatever that means in 3.x. 'Readonly' is definitely *not* specified. Even if bytearray did not qualify on that, instances are *definitely* iterable. The doc goes on: "If given a list or string, the initializer is passed to the new array’s fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method." For 3.x, this is pretty bad as it is copied without change from 2.x and uses 'string' in the 2.x sense. It seems to me that fromstring/fromunicode should have been renamed to frombytes/fromstring with a 2to3 fixer. Too late now, I suppose. However, if indeed 'fromstring' is intended to mean 'from bytes only and not bytearrays', then bytearrays should be passed on to .extend, which accepts them as well as anything else. >>> from array import array
>>> a=array('h')
>>> a.extend(bytearray(b'xjxjx'))
>>> a
array('h', [120, 106, 120, 106, 120]) The two possible fixes:
Thomas: a complete patch needs to include a update to the array unit test suite. This can then move to the patch review stage. |
Thomas' patch does more than just allow bytearray. It allows any object that can present itself as a buffer with byte-size items. It is a bit unfortunate that such method will end up being called "fromstring" in 3.x. With string being unicode, this is hopelessly ambiguous. Is the string interpreted as array of bytes, UTF-8 style, 16 or 32-bit integers? It would be much better to have frombuffer, and even better extendfrombuffer method with this functionality. I am +1 on Terry's option 2. It may be possible to optimize .extend to detect objects that support buffer interface and bypass iterator protocol. Of course, when receiving array is not type 'b', extend and fromxyz are not the same, so maybe we should just rename fromunicode to fromstring and fromstring to frombytes? |
Whatever is done, I think a bytearray should be handled the same as bytes. It must be that they give the same result. In basic operations, I believe that bytearrays can *always* substitute for bytes. "Bytes and bytearray objects contain single bytes – the former is immutable while the latter is a mutable sequence." For example:
>>> b'abc'.capitalize()
b'Abc'
>>> bytearray(b'abc').capitalize()
bytearray(b'Abc') This, to me, implies that .fromstring should accept bytearray (though probably not general buffer objects). In 2.x, I understand .fromstring to initialize an array from machine bytes read into a string, but not .fromunicode. This is *not* a text method, and the result may vary for 2 and 4-byte unicode builds. So I can see that 3.x needs .fromstring(bytes) but not .fromunicode(string). |
No, fromunicode is still needed. It is used to fill type 'u' array with unicode wide characters. In 3.x unicode type was renamed to str and str to bytes, so fromunicode should become fromstring (or maybe fromstr to avoid confusion) and fromstring should become frombytes. |
On the principle the patch looks good. In practice, it lacks a call to
This should be discussed separately, perhaps on python-dev first. |
About the patch:
I agree that it is a bugfix and so i can be applied to 3.1 (and 3.2). --
I remember an issue proposing that. I also prefer .frombytes() name: we can add .frombytes() and keep .fromstring() as an alias to .frombytes() (and maybe mark it as deprecated). A message: An issue starting at: Can someone open a issue about .frombytes()?
Can someone open a new issue about .fromunicodes()? |
Thanks for the input. I'm going to re-work the patch a bit (releasing buffers and such) and add a test within the next few days. The question remains whether or not to accept other buffers with itemsize == 1. The way I understand it, fromstring already accepted any read-only buffer object, no matter the item size / whether it actually makes sense to call it a "string". I don't think accepting a hypothetical read-only buffer with items wider than 1 in fromstring (yes, bad naming) is desirable behaviour - I see a few options on how to deal with input validation:
|
OK, here's the new patch. I added tests for array(typecode, bytearray(b'abab')), a.extend(b'123') and a.extend(bytearray(b'123')). @victor: int itemsize is the array's item size, buffer.itemsize is the strings' (and must be 1) PROBLEM with this patch: I changed "s#" to "y*". This means that str arguments are no longer accepted by fromstring. I don't think they ever should have been in 3.x, but it is an incompatible change and this got the test suite, which (I assume the code hasn't changed since 2.x) used a str argument. (changed in patch). It might be best to use "s*" instead of "y*", especially if this is applied to 3.1? |
array2.diff:
--
It's maybe time to create .frombytes() and .tobytes() methods:
But I still don't understand why array.fromstring() accepts character strings. So an easier solution is to apply array2.diff to Python 3.2, and replace "y*" by "s*" when the patch is applied to 3.1. |
Two more patches: Firstly, this patch (array_3.2_fromstring.diff) is nearly identical to array2.diff. "y*" would (again) have to be changed to "s*" to apply this to 3.1 |
Secondly, this is my attempt to add the more sensibly named {to|from}bytes methods, and to deprecate {to|from}string. I doubt it's perfect, maybe there's some policy on deprecating methods that I didn't find? This may be better discussed in a separate forum, eg a separate issue or python-dev maybe? I wouldn't know. The unpatched test suite passes with the rest of this patch applied. (unless using -Werror) |
I prefer the second solution (add to/frombytes, deprecate to/fromstring) because I prefer the new method names and it keeps backward compatibility (until we choose to remove the old methods, which should be in Python 3.3). About the patch (tofrombytes.diff). You should use PyErr_WarnEx() result: if it is not nul (eg. if the user choosed to raise exceptions on warnings), you have to exit. |
Hello again, sorry for the absense. Victor, thanks for the input. I've attached a new patch that checks the PyErr_WarnEx return value. |
From a quick glance, the latest patch looks ok. |
The patch shouldn't remove the tests for tostring()/fromstring() (they might be deprecated, but they are still supported). |
That sounds reasonable. I've updated the patch to keep the old test_tofromstring testcase. I'll also attach another patch in a moment that removes what I'm reasonably sure is all the uses of array.tostring and .fromstring in the standard library and the other modules' tests. |
The patches were committed in r84403. Thank you very much! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: