Strange error when converting a hexadecimal string with underscores to int #75800
Comments
```pycon
>>> int('1_2_3_4_5_6_7_89', 16)
4886718345
>>> int('1_2_3_4_5_6_7_8_9', 16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int string too large to convert
```
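For reference, both inputs denote the same value once the underscores (PEP 515 visual separators) are removed, so both calls should succeed. A quick sanity check, which passes on Python versions with this bug fixed:

```python
# Underscores in numeric strings are purely visual separators (PEP 515),
# so int() must ignore them; both spellings name the same hex number.
for s in ('1_2_3_4_5_6_7_89', '1_2_3_4_5_6_7_8_9'):
    assert int(s, 16) == int(s.replace('_', ''), 16) == 0x123456789
print(int('1_2_3_4_5_6_7_8_9', 16))  # 4886718345
```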
Looks like an overflow check that was questionable in the first place (depending on undefined behaviour), and is now both questionable and wrong. I'll see if I can find time for a fix this evening (UTC+1).
The problem is with the following check:

```c
n = digits * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError,
                    "int string too large to convert");
```

The problem is that `n / bits_per_char` gives the number of digits required, but `p - start` also counts the `'_'` characters. So even for valid input, once the number of underscores exceeds a limit, the check fires and a spurious overflow error is raised. Fix: replace `p - start` with the actual number of digits.
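To make the arithmetic concrete, here is a small Python model of the buggy check (a sketch, not CPython source), assuming `PyLong_SHIFT = 30` and `bits_per_char = 4` for base 16. The helper name `triggers_false_overflow` is invented for illustration:

```python
PYLONG_SHIFT = 30  # assumed value of CPython's PyLong_SHIFT on this build

def triggers_false_overflow(s, bits_per_char=4):
    """Model of the buggy pre-check: True if it would raise for string s."""
    digits = sum(c != '_' for c in s)  # actual hex digits
    span = len(s)                      # p - start: includes the underscores
    n = digits * bits_per_char + PYLONG_SHIFT - 1
    return n // bits_per_char < span   # bug: compares against span, not digits

print(triggers_false_overflow('1_2_3_4_5_6_7_89'))   # False -> parses fine
print(triggers_false_overflow('1_2_3_4_5_6_7_8_9'))  # True  -> spurious error
```

With 9 digits, `n // bits_per_char` is `9 + (30 - 1) // 4 = 16`, so the check trips exactly when the string length exceeds 16, i.e. at the eighth underscore, matching the transcript above.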
PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.
There's also the (lesser) issue that we're using C
Yep, sorry; this didn't happen. I blame the children. :-(
I don't understand this check completely actually. When exactly is an int too large to convert? |
We check whether `digits * bits_per_char + PyLong_SHIFT - 1` overflows an int?
The error is raised when the number of underscores is more than (PyLong_SHIFT - 1) // bits_per_char.
Yes, but the check is too late: UB can already occur in this calculation. It looks like 'n' was Py_ssize_t in 2.7. :)
I missed the issue mentioned by Mark in msg303331. I looked at the type of n, and okay, it is Py_ssize_t. But I forgot to check the types of digits and bits_per_char, which are int, so my fix doesn't guard against integer overflow. The following PR fixes this. It also tries to guard against integer overflow in another place.

Unfortunately there is one more issue with that code. The guard works on 32-bit platforms, where the precision of double is larger than the size of Py_ssize_t. But on 64-bit platforms it is possible that (Py_ssize_t)(digits * log_base_BASE[base]) is too small to hold the result of the parsing, because its lowest bits are lost. I know that converting str to int with a non-binary base has nonlinear complexity.