Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of C limitation for shift count in right shift #74002

Closed
Tracked by #74019
serhiy-storchaka opened this issue Mar 15, 2017 · 19 comments
Closed
Tracked by #74019

Get rid of C limitation for shift count in right shift #74002

serhiy-storchaka opened this issue Mar 15, 2017 · 19 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 29816
Nosy @mdickinson, @vstinner, @serhiy-storchaka, @orenmn
PRs
  • bpo-29816: Shift operation now has less opportunity to raise OverflowError. #680
  • Remove outdated note about the constraning of the bit shift right operand. #1258
  • Files
  • patchDraft1.diff: a patch draft for reference only. also handles big positive ints
  • long-shift-overflow-long-long.diff
  • long-shift-overflow-divrem1.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-03-30.07:00:24.059>
    created_at = <Date 2017-03-15.08:55:17.278>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = 'Get rid of C limitation for shift count in right shift'
    updated_at = <Date 2017-04-22.18:50:11.618>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-04-22.18:50:11.618>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-03-30.07:00:24.059>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2017-03-15.08:55:17.278>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['46727', '46747', '46748']
    hgrepos = []
    issue_num = 29816
    keywords = ['patch']
    message_count = 19.0
    messages = ['289650', '289651', '289652', '289654', '289658', '289660', '289662', '289692', '289697', '289751', '289767', '289878', '289898', '289984', '290011', '290012', '290824', '290826', '292133']
    nosy_count = 4.0
    nosy_names = ['mark.dickinson', 'vstinner', 'serhiy.storchaka', 'Oren Milman']
    pr_nums = ['680', '1258']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29816'
    versions = ['Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    Currently the value of right operand of the right shift operator is limited by C Py_ssize_t type.

    >>> 1 >> 10**100
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t
    >>> (-1) >> 10**100
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t
    >>> 1 >> -10**100
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t
    >>> (-1) >> -10**100
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t

    But this is artificial limitation. Right shift can be extended to support arbitrary integers. x >> very_large_value should be 0 for non-negative x and -1 for negative x. x >> negative_value should raise ValueError.

    >>> 1 >> 10
    0
    >>> (-1) >> 10
    -1
    >>> 1 >> -10
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: negative shift count
    >>> (-1) >> -10
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: negative shift count

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Mar 15, 2017
    @vstinner
    Copy link
    Member

    If we change something, I suggest to be consistent with lshift. I expect a memory error on "1 << (1 << 1024)" (no unlimited loop before a global system collapse please ;-))

    @vstinner
    Copy link
    Member

    FYI I saw recently that the C limitation of len() was reported in the "owasp-pysec" project:
    https://github.com/ebranca/owasp-pysec/wiki/Overflow-in-len-function

    I don't understand what such "deliberate" limitation was reported in a hardened CPython project?

    @serhiy-storchaka
    Copy link
    Member Author

    If we change something, I suggest to be consistent with lshift. I expect a memory error on "1 << (1 << 1024)" (no unlimited loop before a global system collapse please ;-))

    I agree that left shift should raise an ValueError rather than OverflowError for large negative shifts. But is hard to handle large positive shifts. 1 << count consumes count*2/15 bytes of memory. There is a gap between the maximal value of bits represented as Py_ssize_t (PY_SSIZE_T_MAX) and the number of bits of maximal Python int (PY_SSIZE_T_MAX*15/2). _PyLong_NumBits() starves from the same issue. I think an OverflowError is appropriate here for denoting the platform and implementation limitation.

    @serhiy-storchaka
    Copy link
    Member Author

    This may be a part of this issue or a separate issue: bytes(-1) raises a ValueError, but bytes(-10**100) raises an OverflowError.

    @vstinner
    Copy link
    Member

    I think an OverflowError is appropriate here for denoting the platform and implementation limitation.

    It's common that integer overflow on memory allocation in C code raises a MemoryError, not an OverflowError.

    >>> "x" * (2**60)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError

    I suggest to raise a MemoryError.

    @serhiy-storchaka
    Copy link
    Member Author

    This is not MemoryError. On 32-bit platform 1 << (sys.maxsize + 1) raises an OverflowError, but 1 << sys.maxsize << 1 can be calculated.

    @serhiy-storchaka
    Copy link
    Member Author

    Unfortunately it is hard to totally avoid OverflowError in right shift. Righ shift of huge positive value can get non-zero result even if shift count is larger than PY_SSIZE_T_MAX. PR 680 just decreases the opportunity of getting a OverflowError.

    @orenmn
    Copy link
    Mannequin

    orenmn mannequin commented Mar 15, 2017

    i played a little with a patch earlier today, but stopped because I
    am short on time.

    anyway, just in case my code is not totally rubbish, I attach my
    patch draft, which should avoid OverflowError also for big positive
    ints.

    (of course, I don't suggest to use my code instead of PR 680. I just
    put it here in case it might be useful for someone.)

    (on my Windows 10, it passed some manual tests by me, and the test
    module (except for test_venv, which fails also without the patch))

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you Oren, but your code doesn't work when PY_SSIZE_T_MAX < b < PY_SSIZE_T_MAX * PyLong_SHIFT and a > 2 ** b. When you drop wordshift and left only loshift_d you should drop lower wordshift digits in a.

    The code for left shift would be even more complex.

    @serhiy-storchaka
    Copy link
    Member Author

    Updated PR. Now OverflowError is never raised if the result is representable.

    Mark, could you please make a review?

    @mdickinson
    Copy link
    Member

    Mark, could you please make a review?

    I'll try to find time this week. At least in principle, the change sounds good to me.

    @serhiy-storchaka
    Copy link
    Member Author

    Here are two patches. The first uses C long long arithmetic (it corresponds current PR 680), the second uses PyLong arithmetic. What is easier to read and verify?

    @mdickinson
    Copy link
    Member

    I much prefer the divrem1-based version: it makes fewer assumptions about relative sizes of long / long long / size_t and about the number of bits per digit. I'd rather not have another place that would have to be carefully examined in the future if the number of bits per digit changed again. Overall, Objects/longobject.c is highly portable, and I'd like to keep it that way as much as possible.

    @serhiy-storchaka
    Copy link
    Member Author

    Updated the PR to divrem1-based version. The drawback is that divrem1 can fail with MemoryError while C long long arithmetic always works for integers of the size less than 1 exbibyte.

    @serhiy-storchaka
    Copy link
    Member Author

    The special case would be not needed if limit Python ints on 32-bit platforms to approximately 2**2**28. int.bit_length() could be simpler too.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 918403c by Serhiy Storchaka in branch 'master':
    bpo-29816: Shift operation now has less opportunity to raise OverflowError. (#680)
    918403c

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for your review Mark.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 997a4ad by Serhiy Storchaka in branch 'master':
    Remove outdated note about constraining of the bit shift right operand. (bpo-1258)
    997a4ad

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants