Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange error when convert hexadecimal with underscores to int #75800

Closed
serhiy-storchaka opened this issue Sep 28, 2017 · 14 comments
Closed

Strange error when convert hexadecimal with underscores to int #75800

serhiy-storchaka opened this issue Sep 28, 2017 · 14 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@serhiy-storchaka
Copy link
Member

BPO 31619
Nosy @birkenfeld, @mdickinson, @skrah, @serhiy-storchaka, @nitishch
PRs
  • bpo-31619 Compare only the number of digits in the check for overflow #3816
  • bpo-31619: Fixed a ValueError when convert a string with large number #3827
  • [3.6] bpo-31619: Fixed a ValueError when convert a string with large number of underscores (GH-3827) #3863
  • bpo-31619: Fixed integer overflow in converting huge strings to int. #3884
  • [3.6] bpo-31619: Fixed integer overflow in converting huge strings to int. (GH-3884) #4690
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-12-03.21:28:05.191>
    created_at = <Date 2017-09-28.14:14:07.175>
    labels = ['interpreter-core', '3.7']
    title = 'Strange error when convert hexadecimal with underscores to int'
    updated_at = <Date 2017-12-03.21:28:05.190>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-12-03.21:28:05.190>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-12-03.21:28:05.191>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2017-09-28.14:14:07.175>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31619
    keywords = ['patch']
    message_count = 14.0
    messages = ['303240', '303263', '303310', '303321', '303331', '303337', '303339', '303346', '303347', '303595', '303606', '303686', '307523', '307526']
    nosy_count = 5.0
    nosy_names = ['georg.brandl', 'mark.dickinson', 'skrah', 'serhiy.storchaka', 'nitishch']
    pr_nums = ['3816', '3827', '3863', '3884', '4690']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue31619'
    versions = ['Python 3.6', 'Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    >>> int('1_2_3_4_5_6_7_89', 16)
    4886718345
    >>> int('1_2_3_4_5_6_7_8_9', 16)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: int string too large to convert

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 28, 2017
    @mdickinson
    Copy link
    Member

    Looks like an overflow check that was questionable in the first place (depending on undefined behaviour), and is now both questionable and wrong. I'll see if I can find time for a fix this evening (UTC+1).

    @nitishch
    Copy link
    Mannequin

    nitishch mannequin commented Sep 29, 2017

    The problem is with the following check:

        n = digits * bits_per_char + PyLong_SHIFT - 1;
        if (n / bits_per_char < p - start) {
            PyErr_SetString(PyExc_ValueError,
                            "int string too large to convert");

    The problem is that n / bits_per_char gives the number of digits required. But p - start has the number of '_'s included in it. So even for valid cases, if the number of '_'s is above a limit, this check holds true and an overflow is raised.

    Fix: Replace p - start with the actual number of digits.

    @serhiy-storchaka
    Copy link
    Member Author

    PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.

    @mdickinson
    Copy link
    Member

    There's also the (lesser) issue that we're using C ints for things that should really be ssize_ts. But that can be fixed / changed separately for the fix for this issue.

    I'll see if I can find time for a fix this evening (UTC+1).

    Yep, sorry; this didn't happen. I blame the children. :-(

    @nitishch
    Copy link
    Mannequin

    nitishch mannequin commented Sep 29, 2017

    PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.

    I don't understand this check completely actually. When exactly is an int too large to convert?

    @nitishch
    Copy link
    Mannequin

    nitishch mannequin commented Sep 29, 2017

    > PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.

    I don't understand this check completely actually. When exactly is an int too large to convert?

    We see if digits * bits_per_char + PyLong_SHIFT -1 overflows an int?

    @serhiy-storchaka
    Copy link
    Member Author

    The error is raised when the number of underscores is at least (PyLong_SHIFT - 1) // bits_per_char.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Sep 29, 2017

    We see if digits * bits_per_char + PyLong_SHIFT -1 overflows an int?

    Yes, but the check is too late: UB can already occur in this calculation
    and then all bets are off.

    It looks like 'n' was Py_ssize_t in 2.7. :)

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 85c0b89 by Serhiy Storchaka in branch 'master':
    bpo-31619: Fixed a ValueError when convert a string with large number of underscores (bpo-3827)
    85c0b89

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset b5a630f by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
    [3.6] bpo-31619: Fixed a ValueError when convert a string with large number of underscores (GH-3827) (bpo-3863)
    b5a630f

    @serhiy-storchaka
    Copy link
    Member Author

    I missed the issue mentioned by Mark msg303331. I looked at the type of n, and okay, it is Py_ssize_t. But I forgot to check the type of digits and bits_per_char, and it is int. My fix doesn't guard from integer overflow.

    The following PR fixes this. It also tries to guard against integer overflow in other place. Unfortunately there is yet one issue with that code. The guard works on 32-bit platform, where the precision of double is large than the size of Py_size_t. But on 64-bit platform it is possible that (Py_ssize_t)(digits * log_base_BASE[base]) takes the value not enough for writing the result of the parsing, because its lowest bits are lost.

    I know that converting str to int with non-binary base has nonlinear complexity.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 29ba688 by Serhiy Storchaka in branch 'master':
    bpo-31619: Fixed integer overflow in converting huge strings to int. (bpo-3884)
    29ba688

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 30a6bc8 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
    bpo-31619: Fixed integer overflow in converting huge strings to int. (GH-3884) (bpo-4690)
    30a6bc8

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants