classification
Title: Support bytes-like objects when base is given to int()
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, r.david.murray, rhettinger, serhiy.storchaka, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-07-19 08:56 by xiang.zhang, last changed 2018-09-17 07:17 by serhiy.storchaka.

Files
File name Uploaded Description
bytes_like_support_to_int.patch xiang.zhang, 2016-07-19 09:01
bytes_like_support_to_int_v2.patch xiang.zhang, 2016-07-22 06:55
deprecate_byte_like_support_in_int.patch serhiy.storchaka, 2016-09-27 10:24
Pull Requests
URL Status Linked
PR 779 open serhiy.storchaka, 2017-03-23 09:09
Messages (11)
msg270818 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-07-19 08:56
Right now, int() supports bytes-like objects when *base* is not given:

>>> int(memoryview(b'100'))
100

When *base* is given bytes-like objects are not supported:

>>> int(memoryview(b'100'), base=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() can't convert non-string with explicit base

Is there any obvious reason not to support it when *base* is given? I suggest adding it.
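To make the inconsistency concrete, here is a small sketch that tries the same base-2 conversion on three bytes-like types. The variable names are only illustrative, and the memoryview outcome reflects the behaviour reported in this issue, so it may differ on an interpreter where this has since been changed.

```python
# Try int(x, base=2) on several bytes-like inputs and record what happens.
samples = [b'100', bytearray(b'100'), memoryview(b'100')]
results = {}
for obj in samples:
    name = type(obj).__name__
    try:
        results[name] = int(obj, base=2)
    except TypeError as exc:
        # Reported in this issue for memoryview when base is given.
        results[name] = f'TypeError: {exc}'

for name, outcome in results.items():
    print(f'{name}: {outcome}')
```

bytes and bytearray are parsed with the explicit base, while memoryview is rejected even though it is accepted in the single-argument form.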
msg270972 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-07-22 06:55
Thanks for the reviews, Martin. I have changed the doc and test.
msg270973 - Author: Martin Panter (martin.panter) (Python committer) Date: 2016-07-22 07:17
I am torn on this. On one hand, it would be good to be consistent with the single-argument behaviour. But on the other hand, APIs normally accept arbitrary bytes-like objects (like memoryview) to minimise unnecessary copying, whereas this case has to make a copy to append a null terminator.

Perhaps another option is to deprecate int(byteslike) support instead, in favour of explicitly making a copy using bytes(byteslike). Similarly for float, compile, eval, exec, which also do copying thanks to Issue 24802. But PyNumber_Long() has called PyObject_AsCharBuffer() (predecessor of Python 3’s bytes-like objects) since 1.5.2 (revision 74b7213fb609). So this option would probably need wider discussion.
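As a rough illustration of the "explicit copy" alternative, the sketch below converts each bytes-like object with bytes() before handing it to the builtin. The variable names are made up for the example; it only shows what caller code would look like if direct bytes-like support were deprecated.

```python
# Explicitly copy the buffer to bytes, then call the builtin on the copy.
int_view = memoryview(b'100')
float_view = memoryview(b'1.5')
source_view = memoryview(b'1 + 1')

number = int(bytes(int_view))       # parses the copied b'100'
ratio = float(bytes(float_view))    # parses the copied b'1.5'
total = eval(bytes(source_view))    # evaluates the copied source b'1 + 1'

print(number, ratio, total)
```

The copy that the interpreter currently makes internally becomes visible in the calling code instead.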
msg270975 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-07-22 07:27
It's reasonable. My original intention is to make the behaviour consistent. If the single-argument behaviour is OK with bytes-like objects, why not others? So I think we'd better wait for other developers to see what their opinions are.
msg270989 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2016-07-22 14:01
Since bytes are accepted in both cases, the inconsistency does seem odd.  Looking at the history, I think the else statement that checks the types that can be handled was introduced during the initial py3k conversion, and I'm guessing that else was just forgotten in subsequent updates that added additional bytes-like types.  The non-base branch calls PyNumber_Long, where I presume it picked up the additional type support.

If a copy has to be done anyway, perhaps we can future proof the code by doing a bytes conversion internally in long_new?

Disallowing something that currently works without a good reason isn't good for backward compatibility, so I'd vote for making this work consistently one way or another.
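The idea of converting to bytes internally can be modelled at the Python level. The helper below is hypothetical (int_from_byteslike is not an existing function, and this is not the C code in long_new); it is only a sketch of the behaviour that suggestion would give.

```python
import array


def int_from_byteslike(value, base=10):
    """Hypothetical model: copy any bytes-like object to bytes, then parse."""
    if isinstance(value, (str, bytes, bytearray)):
        # Already accepted by int() together with an explicit base.
        return int(value, base)
    # memoryview() raises TypeError for objects without buffer support,
    # so non-bytes-like inputs such as plain integers are still rejected.
    with memoryview(value) as view:
        return int(view.tobytes(), base)


print(int_from_byteslike(memoryview(b'100'), 2))
print(int_from_byteslike(array.array('B', b'ff'), 16))
```

With this approach every object that supports the buffer protocol takes the same path, at the cost of one copy per call.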
msg270994 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-07-22 15:32
It looks to me as though support for bytes-like objects other than bytes and bytearray was added accidentally, as a side effect of supporting Unicode. Note that this support had a bug until issue24802, so correct support for other bytes-like objects has existed for less than a year. The option of deprecating support for other bytes-like objects looks reasonable to me, especially in light of the deprecation of bytearray paths support (issue26800).

On the other hand, the need to copy a buffer can be considered an implementation detail, since the low-level int parsing functions require NUL-terminated C strings. We could add alternative low-level functions that work with strings that are not NUL-terminated. This needs more work.
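To show what parsing without a terminator means, here is a pure-Python sketch of a length-bounded parser. It is not the proposed C function: parse_int_bounded is a hypothetical name, it assumes a C-contiguous buffer, and it ignores base prefixes and underscores. The point is only that the parser stops at the buffer length instead of at a NUL byte, so no copy is needed.

```python
import string

# Map ASCII digit and letter codes to their numeric values (0..35).
_DIGITS = {}
for _value, _char in enumerate(string.digits + string.ascii_lowercase):
    _DIGITS[ord(_char)] = _value
    _DIGITS[ord(_char.upper())] = _value

_WHITESPACE = b' \t\n\r\v\f'


def parse_int_bounded(buf, base=10):
    """Hypothetical sketch: parse an integer using only indices and length."""
    if not 2 <= base <= 36:
        raise ValueError('this sketch only handles bases 2 to 36')
    view = memoryview(buf).cast('B')  # assumes a C-contiguous buffer
    start, end = 0, len(view)
    # Trim surrounding ASCII whitespace by moving indices, without copying.
    while start < end and view[start] in _WHITESPACE:
        start += 1
    while end > start and view[end - 1] in _WHITESPACE:
        end -= 1
    sign = 1
    if start < end and view[start] in b'+-':
        if view[start] == ord('-'):
            sign = -1
        start += 1
    if start == end:
        raise ValueError('no digits found')
    value = 0
    for index in range(start, end):
        digit = _DIGITS.get(view[index])
        if digit is None or digit >= base:
            # Covers embedded NUL bytes as well as any other invalid byte.
            raise ValueError(f'invalid byte at offset {index}')
        value = value * base + digit
    return sign * value


print(parse_int_bounded(memoryview(b'123A')[1:3]))
print(parse_int_bounded(b' -ff ', 16))
```

Because the end of input is given by the length, a slice of a larger buffer parses correctly even though the byte after it is not NUL.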
msg270999 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2016-07-22 16:31
So less than a year means only some versions of 3.5?  So we could drop it in 3.6 and hope we don't break anybody's code?  I'm not sure I like that...I think the real problem is the complexity of handling multiple bytes types, and that ought to have a more general solution.  I'm not volunteering to work on it, though, so I'm not voting against dropping it.
msg271009 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-07-22 18:32
No, the fix was applied to all maintained versions (2.7 and 3.4+). This means that we need a deprecation period before dropping this feature (if we decide to drop it).

What about other Python implementations? Do they support bytes-like objects other than bytes and bytearray? Do they correctly handle embedded NUL bytes and buffers that are not NUL-terminated?
msg271101 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-07-23 19:00
PyPy seems to.

[PyPy 5.2.0-alpha0 with GCC 4.8.2] on linux
>>>> int(memoryview(b'123A'[1:3]))
23
>>>> int(memoryview(b'123 '[1:3]))
23
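Note that the session above slices the bytes object before wrapping it, so the memoryview covers a freshly created b'23'. A stricter check, sketched below, slices the view itself so that the byte following the buffer is 'A' rather than a terminator, and also tries an embedded NUL. The variable names are illustrative, and the results assume an interpreter that includes the issue24802 fix.

```python
# Slice the memoryview, not the bytes, so the buffer is not NUL-terminated.
data = b'123A'
view = memoryview(data)[1:3]   # covers b'23'; the next byte in memory is b'A'
sliced_value = int(view)

# An embedded NUL should be rejected rather than silently truncating the input.
try:
    int(memoryview(b'12\x0034'))
    embedded_nul_rejected = False
except ValueError:
    embedded_nul_rejected = True

print(sliced_value, embedded_nul_rejected)
```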
msg277509 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-09-27 10:08
Here is a patch that deprecates support for bytes-like objects other than bytes and bytearray in int(), float(), compile(), eval() and exec(). I'm not arguing for it; this is just to give the discussion something concrete to work from.
msg290032 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-03-23 09:10
Created PR 779 for the deprecation.
History
Date User Action Args
2018-09-17 07:17:27 serhiy.storchaka set versions: + Python 3.8, - Python 3.7
2017-03-23 09:10:55 serhiy.storchaka set stage: patch review; messages: + msg290032; versions: + Python 3.7, - Python 3.6
2017-03-23 09:09:47 serhiy.storchaka set pull_requests: + pull_request683
2016-09-27 10:24:08 serhiy.storchaka set files: + deprecate_byte_like_support_in_int.patch
2016-09-27 10:23:37 serhiy.storchaka set files: - deprecate_byte_like_support_in_int.patch
2016-09-27 10:08:17 serhiy.storchaka set files: + deprecate_byte_like_support_in_int.patch; messages: + msg277509
2016-07-23 19:00:17 xiang.zhang set messages: + msg271101
2016-07-22 18:32:34 serhiy.storchaka set messages: + msg271009
2016-07-22 16:31:50 r.david.murray set messages: + msg270999
2016-07-22 15:32:02 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg270994
2016-07-22 14:01:55 r.david.murray set nosy: + r.david.murray; messages: + msg270989
2016-07-22 07:27:06 xiang.zhang set nosy: + rhettinger; messages: + msg270975
2016-07-22 07:17:19 martin.panter set messages: + msg270973
2016-07-22 06:55:09 xiang.zhang set files: + bytes_like_support_to_int_v2.patch; messages: + msg270972
2016-07-19 09:01:58 xiang.zhang set files: + bytes_like_support_to_int.patch
2016-07-19 09:01:44 xiang.zhang set files: - bytes_like_support_to_int.patch
2016-07-19 08:56:23 xiang.zhang set type: enhancement
2016-07-19 08:56:16 xiang.zhang create