classification
Title: Strange behavior of bytearray slice assignment
Type: behavior
Stage: patch review
Components: Interpreter Core
Versions: Python 3.2, Python 3.1, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: georg.brandl
Nosy List: abacabadabacaba, ezio.melotti, georg.brandl, loewis, pitrou
Priority: normal
Keywords: patch

Created on 2010-04-14 15:41 by abacabadabacaba, last changed 2010-09-04 08:02 by georg.brandl.

Files
File name | Uploaded | Description
issue8401.diff | ezio.melotti, 2010-04-16 06:45 | Proof of concept against trunk.
Messages (12)
msg103133 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-14 15:41
>>> a = bytearray()
>>> a[:] = 0 # Is it a feature?
>>> a
bytearray(b'')
>>> a[:] = 10 # If so, why not document it?
>>> a
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
>>> a[:] = -1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative count
>>> a[:] = -1000000000000000000000 # This should raise ValueError, not TypeError.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> a[:] = 1000000000000000000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> a[:] = 1000000000000000000000 # This should raise OverflowError, not TypeError.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> a[:] = [] # Are some empty sequences better than others?
>>> a[:] = ()
>>> a[:] = list("")
>>> a[:] = ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
msg103135 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-14 15:47
It looks rather like a bug to me, and should be forbidden.
msg103138 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-14 18:39
pitrou: I agree, it should be a TypeError.
msg103292 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-04-16 05:56
This happens because bytearray_ass_subscript() (Objects/bytearrayobject.c:588) calls PyByteArray_FromObject() (:641) that in turn calls bytearray_init() (:746), so the results are similar to the ones returned by calling bytearray(...) directly.
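The call chain can be modeled in pure Python. This is only a sketch of the reported behavior, not the C implementation: `legacy_slice_assign` is a hypothetical helper that converts the right-hand side with `bytearray(...)` before assigning, which is roughly what routing the value through PyByteArray_FromObject() amounts to.

```python
def legacy_slice_assign(target, value):
    # Hypothetical model: convert the right-hand side exactly as
    # bytearray(value) would, then replace the whole contents.
    converted = bytearray(value)
    target[:] = converted
    return target


a = bytearray()
legacy_slice_assign(a, 10)      # an int is taken as a length
print(a)                        # ten null bytes

try:
    legacy_slice_assign(a, -1)  # same error as bytearray(-1)
except ValueError as exc:
    print("ValueError:", exc)
```

Because the conversion happens before the assignment, a failed conversion leaves the target untouched.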
msg103296 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-04-16 06:45
Here is a proof of concept that fixes the problem.

The doc of bytearray() says about its first arg:
 * If it is a string, you must also give the encoding [...].
 * If it is an integer, the array will have that size and will be initialized with null bytes.
 * If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
 * If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

All these things except the string[1] and the integer seem OK to me while assigning to a slice, so in the patch I've special-cased ints to raise a TypeError (it fails already for unicode strings).

[1]: note that string here means unicode string (the doc should probably be more specific about it). Byte strings work fine, but for unicode strings there's no way to specify the encoding while doing ba[:] = u'ustring'.
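The rule described above (accept buffers and iterables of ints, reject ints and unicode strings) can be sketched at the Python level. This is not the contents of issue8401.diff: `strict_slice_assign` and its error messages are hypothetical, and the sketch assumes Python 3, where `str` is the unicode string type.

```python
def strict_slice_assign(target, start, stop, value):
    # Hypothetical Python-level equivalent of the special case in the patch:
    # refuse ints (which the constructor would treat as a length) and
    # str (which would need an encoding) before converting.
    if isinstance(value, int):
        raise TypeError("can't assign an int to a bytearray slice")
    if isinstance(value, str):
        raise TypeError("can't assign a str to a bytearray slice without encoding it")
    target[start:stop] = bytearray(value)
    return target


buf = bytearray(b"abcdef")
strict_slice_assign(buf, 0, 2, b"XY")              # bytes: buffer interface
strict_slice_assign(buf, 2, 4, [48, 49])           # iterable of ints
strict_slice_assign(buf, 4, 6, memoryview(b"!?"))  # buffer interface
print(buf)
```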
msg103298 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-16 07:27
Empty string is an iterable of integers in the range 0 <= x < 256, so it should be allowed.

>>> all(isinstance(x, int) and 0 <= x < 256 for x in "")
True
>>> bytearray()[:] = ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
msg103299 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-04-16 07:35
Not really, chars are not ints, and anyway the empty string falls into the first case.
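A short illustration of both points, assuming Python 3 (the helper name `is_iterable_of_byte_ints` is made up for this example): the `all(...)` check is vacuously true for any empty iterable, so it passes for an empty string only because there is nothing to check, and it fails as soon as the string has one character.

```python
def is_iterable_of_byte_ints(obj):
    # Same predicate as in the message above, wrapped in a function.
    return all(isinstance(x, int) and 0 <= x < 256 for x in obj)


print(is_iterable_of_byte_ints(""))    # True: no elements to check
print(is_iterable_of_byte_ints("a"))   # False: iterating a str yields str
print(is_iterable_of_byte_ints(b"a"))  # True: iterating bytes yields ints
```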
msg103302 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-16 08:13
> Not really, chars are not ints
Yes, but an empty string contains exactly zero chars.
> and anyway the empty string falls into the first case.
Strings aren't mentioned in the documentation of bytearray slice assignment. However, I think the bytearray constructor should accept an empty string too, without an encoding, for consistency.
msg103306 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-16 08:51
__doc__ of bytearray says:
> bytearray(iterable_of_ints) -> bytearray
> bytearray(string, encoding[, errors]) -> bytearray
> bytearray(bytes_or_bytearray) -> mutable copy of bytes_or_bytearray
> bytearray(memory_view) -> bytearray
So, unless an encoding is given, an empty string should be interpreted as an iterable of ints. BTW, the documentation and the docstring should be made consistent with each other.
msg103345 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-16 17:54
-1 on assigning strings to slices of bytearrays. As Ezio mentions, this operation conceptually requires an encoding, and no encoding is readily available in the slice assignment.

-1 on special-casing empty strings.
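For illustration, a minimal sketch (assuming Python 3) of what this means for callers: the text is encoded explicitly first, so the slice assignment only ever receives bytes. The sample strings are arbitrary.

```python
buf = bytearray(b"hello world")

# The caller picks the encoding; the slice assignment only ever sees bytes.
buf[0:5] = "HELLO".encode("ascii")
print(buf)

# The same text gives different bytes under different encodings,
# which is why no default can be assumed.
utf8_bytes = "\u00e9".encode("utf-8")      # two bytes
latin1_bytes = "\u00e9".encode("latin-1")  # one byte
print(utf8_bytes, latin1_bytes)
```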
msg103402 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-17 14:15
-1 on special-casing strings without an encoding. Current code does (almost) this:
...
if argument_is_a_string:
	if not encoding_is_given: # Special case
		raise TypeError("string argument without an encoding")
	encode_argument()
	return
if encoding_is_given:
	raise TypeError("encoding or errors without a string argument")
...
IMO, it should do this instead:
...
if encoding_is_given:
	if not argument_is_a_string:
		raise TypeError("encoding or errors without a string argument")
	encode_argument()
	return
...
This way, bytearray("") would work without any special cases.
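The two orderings can be compared with a runnable model. This is a sketch under assumptions, not the interpreter's code: `init_current` and `init_proposed` are hypothetical names, the `errors` argument is left out, Python 3 is assumed, and the final fall-through simply delegates to the real `bytearray` constructor.

```python
def init_current(arg, encoding=None):
    # Model of the current order: the string check comes first.
    if isinstance(arg, str):
        if encoding is None:  # the special case under discussion
            raise TypeError("string argument without an encoding")
        return bytearray(arg.encode(encoding))
    if encoding is not None:
        raise TypeError("encoding or errors without a string argument")
    return bytearray(arg)  # int, buffer, or iterable of ints


def init_proposed(arg, encoding=None):
    # Model of the proposed order: the encoding check comes first.
    if encoding is not None:
        if not isinstance(arg, str):
            raise TypeError("encoding or errors without a string argument")
        return bytearray(arg.encode(encoding))
    if isinstance(arg, str):
        # Without an encoding a str is just another iterable; going through
        # iter() keeps the real constructor from applying its own str check.
        return bytearray(iter(arg))
    return bytearray(arg)


print(init_proposed(""))  # empty bytearray
try:
    init_proposed("abc")  # non-empty str: its items are not ints
except TypeError as exc:
    print("TypeError:", exc)
```

In this model the proposed order accepts only the empty string; any non-empty string still fails, but with a different error than today.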
msg103474 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-04-18 09:27
Python is not (e.g.) Haskell; Python strings are not lists whose contents happen to be characters.  Allowing an empty string here is a step backwards in the direction of "why not allow any string whose contents have an unambiguous meaning as bytes", i.e. the default encoding ASCII in Python 2.x.  Passing a string where bytes are expected is a programming error, and it should be rewarded with an exception, no matter if the string happens to be empty or not.
History
Date | User | Action | Args
2010-09-04 08:02:38 | georg.brandl | set | assignee: georg.brandl
2010-09-04 00:13:27 | pitrou | set | stage: test needed -> patch review; versions: - Python 2.6
2010-04-18 09:27:01 | georg.brandl | set | nosy: + georg.brandl; messages: + msg103474
2010-04-17 14:15:42 | abacabadabacaba | set | messages: + msg103402
2010-04-16 17:54:25 | loewis | set | messages: + msg103345
2010-04-16 08:51:54 | abacabadabacaba | set | messages: + msg103306
2010-04-16 08:13:45 | abacabadabacaba | set | messages: + msg103302
2010-04-16 07:35:36 | ezio.melotti | set | messages: + msg103299
2010-04-16 07:27:12 | abacabadabacaba | set | messages: + msg103298
2010-04-16 06:45:05 | ezio.melotti | set | files: + issue8401.diff; keywords: + patch; messages: + msg103296; stage: needs patch -> test needed
2010-04-16 05:56:55 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg103292
2010-04-14 18:39:04 | loewis | set | nosy: + loewis; messages: + msg103138
2010-04-14 15:47:04 | pitrou | set | priority: normal; versions: + Python 2.6, Python 2.7, Python 3.2; nosy: + pitrou; messages: + msg103135; stage: needs patch
2010-04-14 15:41:48 | abacabadabacaba | create |