classification
Title: Get rid of C limitation for shift count in right shift
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Oren Milman, haypo, mark.dickinson, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-03-15 08:55 by serhiy.storchaka, last changed 2017-04-22 18:50 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
patchDraft1.diff Oren Milman, 2017-03-15 21:00 a patch draft for reference only. also handles big positive ints
long-shift-overflow-long-long.diff serhiy.storchaka, 2017-03-20 19:02
long-shift-overflow-divrem1.diff serhiy.storchaka, 2017-03-20 19:02
Pull Requests
URL Status Linked
PR 680 merged serhiy.storchaka, 2017-03-15 20:25
PR 1258 merged serhiy.storchaka, 2017-04-22 17:50
Messages (19)
msg289650 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-15 08:55
Currently the value of the right operand of the right shift operator is limited by the C Py_ssize_t type.

>>> 1 >> 10**100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> (-1) >> 10**100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> 1 >> -10**100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> (-1) >> -10**100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

But this is an artificial limitation. Right shift can be extended to support arbitrary integers. `x >> very_large_value` should be 0 for non-negative x and -1 for negative x. `x >> negative_value` should raise ValueError.

>>> 1 >> 10
0
>>> (-1) >> 10
-1
>>> 1 >> -10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative shift count
>>> (-1) >> -10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative shift count
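The proposed semantics can be modelled in pure Python. The sketch below is illustrative only (the name `rshift_model` is hypothetical, not a CPython API). It relies on `x >> n` being `floor(x / 2**n)`: once `n` reaches `x.bit_length()`, that quotient is 0 for non-negative x and -1 for negative x, so a huge shift count never has to be converted to a fixed-width C integer.

```python
def rshift_model(x: int, n: int) -> int:
    """Hypothetical model of x >> n for an arbitrarily large shift count n."""
    if n < 0:
        # Negative counts are rejected regardless of their magnitude.
        raise ValueError("negative shift count")
    if n >= x.bit_length():
        # |x| < 2**bit_length, so floor(x / 2**n) is 0 or -1 depending on sign.
        return -1 if x < 0 else 0
    # Here n < x.bit_length(), so n is bounded by the size of x itself.
    return x >> n


print(rshift_model(1, 10**100))   # 0
print(rshift_model(-1, 10**100))  # -1
```

Calling `rshift_model(1, -10**100)` raises ValueError rather than OverflowError, matching the behaviour requested above.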
msg289651 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-15 08:57
If we change something, I suggest to be consistent with lshift. I expect a memory error on "1 << (1 << 1024)" (no unlimited loop before a global system collapse please ;-))
msg289652 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-15 09:00
FYI I saw recently that the C limitation of len() was reported in the "owasp-pysec" project:
https://github.com/ebranca/owasp-pysec/wiki/Overflow-in-len-function

I don't understand why such a "deliberate" limitation was reported in a hardened CPython project.
msg289654 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-15 09:19
> If we change something, I suggest to be consistent with lshift. I expect a memory error on "1 << (1 << 1024)" (no unlimited loop before a global system collapse please ;-))

I agree that left shift should raise a ValueError rather than an OverflowError for large negative shifts. But it is hard to handle large positive shifts. `1 << count` consumes `count*2/15` bytes of memory. There is a gap between the maximal number of bits representable as a Py_ssize_t (PY_SSIZE_T_MAX) and the number of bits of the maximal Python int (PY_SSIZE_T_MAX*15/2). _PyLong_NumBits() suffers from the same issue. I think an OverflowError is appropriate here for denoting the platform and implementation limitation.
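The figures above can be checked against `sys.int_info`, which reports CPython's digit width (`bits_per_digit`) and digit storage size (`sizeof_digit`). This is a rough sketch with hypothetical helper names; it assumes CPython's layout (15-bit digits in 2 bytes or 30-bit digits in 4 bytes, both giving 2/15 bytes per bit) and ignores the fixed object header.

```python
import sys

BITS = sys.int_info.bits_per_digit   # 15 or 30 on CPython
SIZE = sys.int_info.sizeof_digit     # 2 or 4 bytes per digit


def digit_bytes_for_shift(count: int) -> int:
    """Approximate digit storage for 1 << count (object header not included)."""
    ndigits = count // BITS + 1  # 1 << count has count + 1 bits
    return ndigits * SIZE


def max_int_bits() -> int:
    """Rough bit capacity of an int whose digits fill PY_SSIZE_T_MAX bytes."""
    return sys.maxsize // SIZE * BITS


# Close to count * 2 / 15 for large counts: about 1.33e8 bytes here.
print(digit_bytes_for_shift(10**9))
# Bit counts between sys.maxsize and this bound do not fit in a Py_ssize_t,
# even though an int of that size could in principle exist.
print(max_int_bits() > sys.maxsize)
```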
msg289658 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-15 09:52
This may be a part of this issue or a separate issue: bytes(-1) raises a ValueError, but bytes(-10**100) raises an OverflowError.
msg289660 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-15 10:03
> I think an OverflowError is appropriate here for denoting the platform and implementation limitation.

It's common that integer overflow on memory allocation in C code raises a MemoryError, not an OverflowError.

>>> "x" * (2**60)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

I suggest raising a MemoryError.
msg289662 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-15 10:17
This is not a MemoryError. On a 32-bit platform `1 << (sys.maxsize + 1)` raises an OverflowError, but `1 << sys.maxsize << 1` can be calculated.
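The point is that the limit applies to the shift count, not to the result: the same value can be reached by composing shifts whose counts each fit. A minimal sketch of that idea follows; `lshift_in_chunks` is a hypothetical helper, and its `limit` parameter stands in for PY_SSIZE_T_MAX so it can be exercised with small numbers.

```python
import sys


def lshift_in_chunks(x: int, count: int, limit: int = sys.maxsize) -> int:
    """Compute x << count using only shifts whose count is at most `limit`.

    `limit` stands in for PY_SSIZE_T_MAX; a small value lets the idea be
    tried without allocating huge integers.
    """
    if count < 0:
        raise ValueError("negative shift count")
    if limit <= 0:
        raise ValueError("limit must be positive")
    while count > limit:
        x <<= limit      # each individual shift count stays within the limit
        count -= limit
    return x << count


# Same result as a single shift, although no single count exceeds 100.
print(lshift_in_chunks(3, 250, limit=100) == 3 << 250)  # True
```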
msg289692 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-15 20:30
Unfortunately it is hard to avoid OverflowError in right shift entirely. Right shift of a huge positive value can give a nonzero result even if the shift count is larger than PY_SSIZE_T_MAX. PR 680 just reduces the chance of getting an OverflowError.
msg289697 - Author: Oren Milman (Oren Milman) * Date: 2017-03-15 21:00
I played a little with a patch earlier today, but stopped because I
am short on time.

Anyway, just in case my code is not totally rubbish, I attach my
patch draft, which should avoid OverflowError also for big positive
ints.

(Of course, I don't suggest using my code instead of PR 680. I just
put it here in case it might be useful for someone.)

(On my Windows 10 machine, it passed some manual tests and the test
suite, except for test_venv, which also fails without the patch.)
msg289751 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-17 09:36
Thank you Oren, but your code doesn't work when PY_SSIZE_T_MAX < b < PY_SSIZE_T_MAX * PyLong_SHIFT and a > 2 ** b. When you drop wordshift and leave only loshift_d, you should drop the lower wordshift digits of a.

The code for left shift would be even more complex.
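The split into whole digits (`wordshift`) and leftover bits (`loshift`) can be sketched in pure Python on a little-endian list of digits. This is a model, not the CPython code: `SHIFT` plays the role of `PyLong_SHIFT` (assumed to be 30 here) and all helper names are hypothetical. Because `divmod` runs on arbitrary-precision ints, the count is never truncated, loosely in the spirit of the PyLong-arithmetic approach; if `wordshift` is at least the number of digits, the result is 0 immediately. Negative values use the identity `a >> b == ~(~a >> b)`.

```python
SHIFT = 30                 # assumed digit width, like PyLong_SHIFT on 64-bit builds
MASK = (1 << SHIFT) - 1


def to_digits(n):
    """Split a non-negative int into little-endian base-2**SHIFT digits."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits


def from_digits(digits):
    """Reassemble a non-negative int from little-endian digits."""
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n


def rshift_digits(a: int, b: int) -> int:
    """Hypothetical digit-level model of a >> b for arbitrarily large b."""
    if b < 0:
        raise ValueError("negative shift count")
    if a < 0:
        # For negative a, floor(a / 2**b) equals ~(~a >> b), with ~a >= 0.
        return ~rshift_digits(~a, b)
    wordshift, loshift = divmod(b, SHIFT)   # works for any size of b
    digits = to_digits(a)
    if wordshift >= len(digits):
        return 0                            # every digit is shifted out
    digits = digits[wordshift:]             # drop the low wordshift digits
    out = []
    for i, d in enumerate(digits):
        hi = digits[i + 1] if i + 1 < len(digits) else 0
        # Low bits come from this digit, high bits from the next digit up.
        out.append(((d >> loshift) | (hi << (SHIFT - loshift))) & MASK)
    return from_digits(out)
```

Dropping the low `wordshift` digits before applying `loshift` is exactly the step described above; skipping it leaves those digits in the result.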
msg289767 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-17 16:06
Updated PR. Now OverflowError is never raised if the result is representable.

Mark, could you please make a review?
msg289878 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-03-20 08:53
> Mark, could you please make a review?

I'll try to find time this week. At least in principle, the change sounds good to me.
msg289898 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-20 19:01
Here are two patches. The first uses C long long arithmetic (it corresponds to the current PR 680), the second uses PyLong arithmetic. Which is easier to read and verify?
msg289984 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-03-22 14:04
I much prefer the `divrem1`-based version: it makes fewer assumptions about relative sizes of long / long long / size_t and about the number of bits per digit. I'd rather not have another place that would have to be carefully examined in the future if the number of bits per digit changed again. Overall, Objects/longobject.c is highly portable, and I'd like to keep it that way as much as possible.
msg290011 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-22 19:35
Updated the PR to the divrem1-based version. The drawback is that divrem1 can fail with MemoryError, while C long long arithmetic always works for integers smaller than 1 exbibyte.
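One way to read the "smaller than 1 exbibyte" bound (an interpretation, not taken from the patch): if a C `long long` has to hold a bit count of the int, an int occupying n bytes has at most 8*n bits, and 8 * 2**60 = 2**63 is exactly one more than `LLONG_MAX`. The constants below assume a 64-bit `long long`, and `bit_count_fits` is a hypothetical helper.

```python
LLONG_MAX = 2**63 - 1   # assumes a 64-bit C long long
EIB = 2**60             # one exbibyte, in bytes


def bit_count_fits(nbytes: int) -> bool:
    """True if 8 * nbytes (an upper bound on the bits stored) fits in a long long."""
    return 8 * nbytes <= LLONG_MAX


print(bit_count_fits(EIB - 1))  # True
print(bit_count_fits(EIB))      # False
```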
msg290012 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-22 19:52
The special case would not be needed if Python ints on 32-bit platforms were limited to approximately 2**2**28. int.bit_length() could be simpler too.
msg290824 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 06:47
New changeset 918403cfc3304d27e80fb792357f40bb3ba69c4e by Serhiy Storchaka in branch 'master':
bpo-29816: Shift operation now has less opportunity to raise OverflowError. (#680)
https://github.com/python/cpython/commit/918403cfc3304d27e80fb792357f40bb3ba69c4e
msg290826 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 07:00
Thank you for your review, Mark.
msg292133 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-22 18:50
New changeset 997a4adea606069e01beac6269920709db3994d1 by Serhiy Storchaka in branch 'master':
Remove outdated note about constraining of the bit shift right operand. (#1258)
https://github.com/python/cpython/commit/997a4adea606069e01beac6269920709db3994d1
History
Date                 User              Action  Args
2017-04-22 18:50:11  serhiy.storchaka  set     messages: + msg292133
2017-04-22 17:50:19  serhiy.storchaka  set     pull_requests: + pull_request1371
2017-03-30 07:00:24  serhiy.storchaka  set     status: open -> closed
                                               resolution: fixed
                                               messages: + msg290826
                                               stage: patch review -> resolved
2017-03-30 06:47:09  serhiy.storchaka  set     messages: + msg290824
2017-03-22 19:52:55  serhiy.storchaka  set     messages: + msg290012
2017-03-22 19:35:29  serhiy.storchaka  set     messages: + msg290011
2017-03-22 14:04:56  mark.dickinson    set     messages: + msg289984
2017-03-20 19:02:33  serhiy.storchaka  set     files: + long-shift-overflow-divrem1.diff
2017-03-20 19:02:18  serhiy.storchaka  set     files: + long-shift-overflow-long-long.diff
2017-03-20 19:01:21  serhiy.storchaka  set     messages: + msg289898
2017-03-20 08:53:27  mark.dickinson    set     messages: + msg289878
2017-03-17 16:06:38  serhiy.storchaka  set     messages: + msg289767
2017-03-17 09:36:10  serhiy.storchaka  set     messages: + msg289751
2017-03-17 08:36:16  serhiy.storchaka  link    issue29833 dependencies
2017-03-15 21:00:49  Oren Milman       set     files: + patchDraft1.diff
                                               keywords: + patch
                                               messages: + msg289697
2017-03-15 20:30:52  serhiy.storchaka  set     messages: + msg289692
                                               stage: needs patch -> patch review
2017-03-15 20:25:42  serhiy.storchaka  set     pull_requests: + pull_request557
2017-03-15 10:17:07  serhiy.storchaka  set     messages: + msg289662
2017-03-15 10:03:27  haypo             set     messages: + msg289660
2017-03-15 09:52:04  serhiy.storchaka  set     messages: + msg289658
2017-03-15 09:19:13  serhiy.storchaka  set     messages: + msg289654
2017-03-15 09:06:14  serhiy.storchaka  link    issue15988 dependencies
2017-03-15 09:00:29  haypo             set     messages: + msg289652
2017-03-15 08:57:40  haypo             set     nosy: + haypo
                                               messages: + msg289651
2017-03-15 08:55:17  serhiy.storchaka  create