classification
Title: Regression in bytes constructor
Type: behavior Stage: resolved
Components: Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, inada.naoki, ncoghlan, python-dev, serhiy.storchaka
Priority: normal Keywords: 3.6regression, patch

Created on 2017-01-04 19:49 by belopolsky, last changed 2018-10-14 16:56 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
29159-index-fallback.patch inada.naoki, 2017-01-05 00:39 review
Messages (13)
msg284663 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-04 19:49
Consider the following code:

$ cat bug.py
from array import array
class C(array):
    def __new__(cls):
        return array.__new__(cls, 'B', b'abc')
    def __index__(self):
        raise TypeError
x = C()
print(bytes(x))

It works under python 3.5:

$ python3.5 bug.py
b'abc'

but raises a TypeError under python 3.6:

$ python3.6 bug.py
Traceback (most recent call last):
  File "bug.py", line 8, in <module>
    print(bytes(x))
  File "bug.py", line 6, in __index__
    raise TypeError
TypeError

It looks like this was introduced in issue #27704.

(Ref: e/pyq#827)
msg284664 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-04 20:12
My test code may seem contrived, but numpy arrays exhibit similar behavior:

>>> a = numpy.array([2, 2])
>>> bytes(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer arrays with one element can be converted to an index

Under python 3.5, the result was

>>> bytes(a)
b'\x02\x00\x00\x00\x02\x00\x00\x00'
msg284665 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-04 20:24
Indeed, there is behavior change. It is easy to revert the old behavior. But 
was the old behavior designed? It looks as a side effect of the implementation.

What is a reason of making an array an integer type?
msg284666 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-04 20:27
Ah, if numpy arrays are affected, this should be restored. But swallowing an arbitrary exception doesn't look good to me.
msg284671 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-04 21:38
I don't think we should put too much effort into preserving numpy behavior.  Consider this python 3.5 session:

>>> import numpy
>>> a = numpy.array([1])
>>> bytes(a)
__main__:1: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
b'\x00'
>>> a = numpy.array([2, 2])
>>> bytes(a)
b'\x02\x00\x00\x00\x02\x00\x00\x00'

It looks like this behavior is just an artifact of ndarray providing both __index__ and buffer protocol and not something thought out by numpy developers.

I wonder if we could check for buffer protocol support before detecting an integer argument?  I also recall a discussion of deprecating bytes(int) altogether.  See <https://mail.python.org/pipermail/python-ideas/2014-March/027295.html>.
msg284672 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-04 21:51
Also relevant:

  * #20895 - Add bytes.empty_buffer and deprecate bytes(17) for the same purpose
  * PEP 467 - Minor API improvements for binary sequences
msg284689 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-05 02:00
Inada,

Can you explain what your patch does?  I don't understand why you single out the OverflowError.  When is __index__ expected to raise it?

>>> (2**300).__index__()
2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376
msg284699 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2017-01-05 03:35
@belopolsky
It is backported behavior from Python 3.5, to fix regression.

See code review discussion in #27704.
The first patch kept the behavior, but we simplified it in second patch.
That cause this regression.
msg284767 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-05 19:17
@inada.naoki

Sorry, I still don't understand the role of OverflowError.

With respect to my earlier suggestion that buffer protocol should have priority over __index__, note that the documentation implies the same order:

>>> help(bytes)

class bytes(object)
 |  bytes(iterable_of_ints) -> bytes
 |  bytes(string, encoding[, errors]) -> bytes
 |  bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
 |  bytes(int) -> bytes object of size given by the parameter initialized with null bytes
 |  bytes() -> empty bytes object
 |
 |  Construct an immutable array of bytes from:
 |    - an iterable yielding integers in range(256)
 |    - a text string encoded using the specified encoding
 |    - any object implementing the buffer API.
 |    - an integer
..
msg284768 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2017-01-05 19:36
On the other hand, the documentation for the bytearray constructor [1] lists integer argument first. 

[1]: https://docs.python.org/3/library/functions.html#bytearray
msg284774 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-05 22:04
The code restored by 29159-index-fallback.patch is slightly different from the code before issue27704, but the patch LGTM except that arguments of assertEqual are in reversed order.

> Can you explain what your patch does?  I don't understand why you single out the OverflowError.  When is __index__ expected to raise it?

PyNumber_AsSsize_t(arg, PyExc_OverflowError) raises an OverflowError when the result of __index__ don't fit in Py_ssize_t.

> I wonder if we could check for buffer protocol support before detecting an integer argument?

This would fix an issue with NumPy arrays, but it is much harder to implement.
msg284801 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2017-01-06 08:45
New changeset 505cc50ddc82 by INADA Naoki in branch '3.6':
Issue #29159: Fix regression in bytes(x) when x.__index__() raises Exception.
https://hg.python.org/cpython/rev/505cc50ddc82
msg284802 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2017-01-06 08:46
> patch LGTM except that arguments of assertEqual are in reversed order.

fixed.
History
Date User Action Args
2018-10-14 16:56:09serhiy.storchakasetpull_requests: - pull_request1003
2017-03-31 16:36:26dstufftsetpull_requests: + pull_request1003
2017-01-06 08:46:25inada.naokisetstatus: open -> closed
resolution: fixed
messages: + msg284802

stage: resolved
2017-01-06 08:45:05python-devsetnosy: + python-dev
messages: + msg284801
2017-01-05 22:04:54serhiy.storchakasetmessages: + msg284774
2017-01-05 19:36:40belopolskysetmessages: + msg284768
2017-01-05 19:17:35belopolskysetmessages: + msg284767
2017-01-05 03:35:58inada.naokisetmessages: + msg284699
2017-01-05 02:00:24belopolskysetmessages: + msg284689
2017-01-05 00:39:25inada.naokisetfiles: + 29159-index-fallback.patch
keywords: + patch
2017-01-04 21:51:37belopolskysetmessages: + msg284672
2017-01-04 21:38:36belopolskysetnosy: + ncoghlan
2017-01-04 21:38:21belopolskysetmessages: + msg284671
2017-01-04 20:28:05serhiy.storchakasetnosy: + inada.naoki
2017-01-04 20:27:53serhiy.storchakasetmessages: + msg284666
2017-01-04 20:24:47serhiy.storchakasetmessages: + msg284665
2017-01-04 20:12:51belopolskysetmessages: + msg284664
2017-01-04 19:49:54belopolskycreate