classification
Title: array constructor and array.fromstring should accept bytearray.
Type: behavior Stage: resolved
Components: Extension Modules, Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, haypo, pitrou, terry.reedy, tjollans
Priority: normal Keywords: patch

Created on 2010-06-13 19:42 by tjollans, last changed 2010-09-01 20:30 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
array.diff tjollans, 2010-06-13 19:42 array.fromstring patch
array2.diff tjollans, 2010-06-22 21:11 patch including test
array_3.2_fromstring.diff tjollans, 2010-07-01 14:00 updated version of the patch
tofrombytes.diff tjollans, 2010-08-31 19:35 patch introducing to/frombytes
tostring_usage.diff tjollans, 2010-08-31 19:35 remove to/fromstring usage in stdlib
Messages (18)
msg107744 - (view) Author: Thomas Jollans (tjollans) Date: 2010-06-13 19:42
Currently, the array constructor, if given a bytearray, detects this with PyByteArray_Check, and hands it on to array_fromstring, which does not support bytearray (by using "s#" with PyArg_ParseTuple) and raises TypeError.

>>> array('h', bytearray(b'xyxyxyxyxyxyxyxyxy'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes or read-only buffer, not bytearray
>>>  


I see no reason to insist on read-only buffers. I'm attaching a patch that I think fixes this.
msg108159 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-19 01:05
Summary: real bug, needs test, patch provided, looks plausible but I am not qualified to even read-review it, let alone apply and test.

(I am removing 3.3 because that is only listed for syntax feature requests that cannot go into 3.2 because of its moratorium on such.)

I agree that this is a bug. For 3.1, the '8.6. array' doc says
"class array.array(typecode[, initializer]) 
A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, object supporting the buffer interface, or iterable over elements of the appropriate type."

I believe bytearray 'supports the buffer interface', whatever that means in 3.x. 'Readonly' is definitely *not* specified. Even if bytearray did not qualify on that, instances are *definitely* iterable.

The doc goes on: "If given a list or string, the initializer is passed to the new array’s fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method."

For 3.x, this is pretty bad as it is copied without change from 2.x and uses 'string' in the 2.x sense. It seems to me that fromstring/fromunicode should have been renamed to frombytes/fromstring with a 2to3 fixer. Too late now, I suppose. However, if indeed 'fromstring' is intended to mean 'from bytes only and not bytearrays', then bytearrays should be passed on to .extend, which accepts them as well as anything else.

>>> from array import array
>>> a=array('h')
>>> a.extend(bytearray(b'xjxjx'))
>>> a
array('h', [120, 106, 120, 106, 120])

The two possible fixes:

1. Extend .fromstring to accept bytearrays. Its current doc "Appends items from the string, interpreting the string as an array of machine values " to me supports doing that. This is what Thomas's patch does.

2. If that is rejected, then bytearray initializers should be passed to .extend.

Thomas: a complete patch needs to include a update to the array unit test suite. This can then move to the patch review stage.
msg108161 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-19 02:17
Thomas' patch does more than just allow bytearray.  It allows any object that can present itself as a buffer with byte-size items.  It is a bit unfortunate that such method will end up being called "fromstring" in 3.x.    With string being unicode, this is hopelessly ambiguous. Is the string interpreted as array of bytes, UTF-8 style, 16 or 32-bit integers?  It would be much better to have frombuffer, and even better extendfrombuffer method with this functionality.

I am +1 on Terry's option 2.  It may be possible to optimize .extend to detect objects that support buffer interface and bypass iterator protocol.  Of course, when receiving array is not type 'b', extend and fromxyz are not the same, so maybe we should just rename fromunicode to fromstring and fromstring to frombytes?
msg108163 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-19 03:19
Whatever is done, I think a bytearray should be handled the same as bytes. It must be that they give the same result. In basic operations, I believe that bytearrays can *always* substitute for bytes. "Bytes and bytearray objects contain single bytes – the former is immutable while the latter is a mutable sequence." For example:
>>> b'abc'.capitalize()
b'Abc'
>>> bytearray(b'abc').capitalize()
bytearray(b'Abc')

This, to me, implies that .fromstring should accept bytearray (though probably not general buffer objects). In 2.x, I understand .fromstring to initialize an array from machine bytes read into a string, but not .fromunicode. This is *not* a text method, and the result may vary for 2 and 4-byte unicode builds. So I can see that 3.x needs .fromstring(bytes) but not .fromunicode(string).
msg108164 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-19 03:34
> This, to me, implies that .fromstring should accept bytearray (though
> probably not general buffer objects). In 2.x, I understand .fromstring 
> to initialize an array from machine bytes read into a string, but not
> .fromunicode. This is *not* a text method, and the result may vary for 2
> and 4-byte unicode builds. So I can see that 3.x needs
>  .fromstring(bytes) but not .fromunicode(string).


No, fromunicode is still needed.  It is used to fill type 'u' array with unicode wide characters.  In 3.x unicode type was renamed to str and str to bytes, so fromunicode should become fromstring (or maybe fromstr to avoid confusion) and fromstring should become frombytes.
msg108186 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-19 15:24
On the principle the patch looks good. In practice, it lacks a call to `PyBuffer_Release(&buffer)` in the various error cases (before returning NULL). It also lacks some tests in Lib/test/test_array.py.

> In 3.x unicode type was renamed to str and str to bytes, so fromunicode 
> should become fromstring (or maybe fromstr to avoid confusion) and
> fromstring should become frombytes.

This should be discussed separately, perhaps on python-dev first.
msg108226 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-20 08:53
About the patch:

 - Why not reading itemsize from buffer.itemsize?
 - A test would be nice in Lib/test/test_array.py (in test_tofromstring?). Please add also a.extend(bytearray(b'xjxjx')) to the test suite.


I agree that it is a bugfix and so i can be applied to 3.1 (and 3.2).

--

> fromstring should become frombytes.

I remember an issue proposing that. I also prefer .frombytes() name: we can add .frombytes() and keep .fromstring() as an alias to .frombytes() (and maybe mark it as deprecated).

A message:
http://bugs.python.org/issue3565#msg71205

An issue starting at:
http://bugs.python.org/issue1023290#msg92069

Can someone open a issue about .frombytes()?

> fromunicode is still needed. 

Can someone open a new issue about .fromunicodes()?
msg108312 - (view) Author: Thomas Jollans (tjollans) Date: 2010-06-21 20:09
Thanks for the input. I'm going to re-work the patch a bit (releasing buffers and such) and add a test within the next few days. 

The question remains whether or not to accept other buffers with itemsize == 1. The way I understand it, fromstring already accepted any read-only buffer object, no matter the item size / whether it actually makes sense to call it a "string". I don't think accepting a hypothetical read-only buffer with items wider than 1 in fromstring (yes, bad naming) is desirable behaviour - I see a few options on how to deal with input validation:

1. ignore the item size. This'd be similar to current behaviour, plus r/w buffers

2. only accept byte-based buffers. ("things that look like 'const char*'") - this is what I've been aiming at.

3. only accept bytes and bytearray, and let the user think about how to deal with other objects. Question is - shouldn't array('B') be treated like bytearray in this respect?
msg108419 - (view) Author: Thomas Jollans (tjollans) Date: 2010-06-22 21:11
OK, here's the new patch. I added tests for array(typecode, bytearray(b'abab')), a.extend(b'123') and a.extend(bytearray(b'123')). 

@Victor: int itemsize is the array's item size, buffer.itemsize is the strings' (and must be 1)

PROBLEM with this patch:

I changed "s#" to "y*". This means that str arguments are no longer accepted by fromstring. I don't think they ever should have been in 3.x, but it is an incompatible change and this got the test suite, which (I assume the code hasn't changed since 2.x) used a str argument. (changed in patch). It might be best to use "s*" instead of "y*", especially if this is applied to 3.1?
msg109028 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-07-01 02:30
array2.diff:
 - can you reindent the line "Py_ssize_t old_size = Py_SIZE(self);"?(even if it's not part of your patch)
 - you can avoid the "char *str;" variable (use directly buffer.buf)

--

> I changed "s#" to "y*". This means that str arguments are no longer
> accepted by fromstring (...) it is an incompatible change

It's maybe time to create .frombytes() and .tobytes() methods:

 - .tostring() will be a (deprecated?) alias to .tobytes()
 - .frombytes() only accepts bytes, bytearray and buffer compatible objects: use "y*" format
 - .fromstring() accepts str, bytes, bytearray and buffer compatible objects (encode str to utf-8): use "s*" format

But I still don't understand why array.fromstring() accepts character strings. So an easier solution is to apply array2.diff to Python 3.2, and replace "y*" by "s*" when the patch is applied to 3.1.
msg109049 - (view) Author: Thomas Jollans (tjollans) Date: 2010-07-01 14:00
Two more patches:

Firstly, this patch (array_3.2_fromstring.diff) is nearly identical to array2.diff. "y*" would (again) have to be changed to "s*" to apply this to 3.1
msg109051 - (view) Author: Thomas Jollans (tjollans) Date: 2010-07-01 14:09
Secondly, this is my attempt to add the more sensibly named {to|from}bytes methods, and to deprecate {to|from}string. 

I doubt it's perfect, maybe there's some policy on deprecating methods that I didn't find? This may be better discussed in a separate forum, eg a separate issue or python-dev maybe? I wouldn't know.

The unpatched test suite passes with the rest of this patch applied. (unless using -Werror)
msg111872 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-07-28 23:25
I prefer the second solution (add to/frombytes, deprecate to/fromstring) because I prefer the new method names and it keeps backward compatibility (until we choose to remove the old methods, which should be in Python 3.3).

About the patch (tofrombytes.diff). You should use PyErr_WarnEx() result: if it is not nul (eg. if the user choosed to raise exceptions on warnings), you have to exit.
msg114675 - (view) Author: Thomas Jollans (tjollans) Date: 2010-08-22 12:35
Hello again, sorry for the absense.

Victor, thanks for the input. I've attached a new patch that checks the PyErr_WarnEx return value.
msg114684 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-22 17:59
From a quick glance, the latest patch looks ok.
msg115271 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-31 18:10
The patch shouldn't remove the tests for tostring()/fromstring() (they might be deprecated, but they are still supported).
msg115274 - (view) Author: Thomas Jollans (tjollans) Date: 2010-08-31 19:35
That sounds reasonable. I've updated the patch to keep the old test_tofromstring testcase. 

I'll also attach another patch in a moment that removes what I'm reasonably sure is all the uses of array.tostring and .fromstring in the standard library and the other modules' tests.
msg115331 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-01 20:30
The patches were committed in r84403. Thank you very much!
History
Date User Action Args
2011-07-12 19:51:43pitroulinkissue3565 superseder
2010-09-01 20:30:46pitrousetstatus: open -> closed
versions: - Python 3.1
messages: + msg115331

resolution: fixed
stage: test needed -> resolved
2010-08-31 19:35:55tjollanssetfiles: - tofrombytes.diff
2010-08-31 19:35:48tjollanssetfiles: + tostring_usage.diff
2010-08-31 19:35:15tjollanssetfiles: + tofrombytes.diff

messages: + msg115274
2010-08-31 18:10:26pitrousetmessages: + msg115271
2010-08-22 17:59:30pitrousetmessages: + msg114684
2010-08-22 17:59:15pitrousetfiles: - tofrombytes.diff
2010-08-22 12:35:42tjollanssetfiles: + tofrombytes.diff

messages: + msg114675
2010-08-22 12:11:24tjollanssetfiles: - tofrombytes.diff
2010-08-22 12:03:47tjollanssetfiles: + tofrombytes.diff
2010-07-28 23:25:32hayposetmessages: + msg111872
2010-07-01 14:23:18tjollanssetfiles: - tofrombytes.diff
2010-07-01 14:23:13tjollanssetfiles: + tofrombytes.diff
2010-07-01 14:09:17tjollanssetfiles: + tofrombytes.diff

messages: + msg109051
2010-07-01 14:00:52tjollanssetfiles: + array_3.2_fromstring.diff

messages: + msg109049
2010-07-01 02:30:17hayposetmessages: + msg109028
2010-06-22 21:11:32tjollanssetfiles: + array2.diff

messages: + msg108419
2010-06-21 20:09:50tjollanssetmessages: + msg108312
2010-06-20 08:53:27hayposetmessages: + msg108226
2010-06-19 15:24:09pitrousetnosy: + haypo, pitrou
messages: + msg108186
2010-06-19 03:34:13belopolskysetmessages: + msg108164
2010-06-19 03:19:03terry.reedysetmessages: + msg108163
2010-06-19 02:17:25belopolskysetnosy: + belopolsky
messages: + msg108161
2010-06-19 01:05:26terry.reedysetversions: - Python 3.3
nosy: + terry.reedy

messages: + msg108159

stage: test needed
2010-06-13 19:42:52tjollanscreate