Strange error when converting a hexadecimal string with underscores to int #75800
Comments
```pycon
>>> int('1_2_3_4_5_6_7_89', 16)
4886718345
>>> int('1_2_3_4_5_6_7_8_9', 16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int string too large to convert
```
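For reference, both inputs denote the same value once the underscores (PEP 515 visual separators) are removed, so both calls should succeed. A quick sanity check, which passes on Python versions with this bug fixed:

```python
# Underscores in numeric strings are purely visual separators (PEP 515),
# so int() must ignore them; both spellings name the same hex number.
for s in ('1_2_3_4_5_6_7_89', '1_2_3_4_5_6_7_8_9'):
    assert int(s, 16) == int(s.replace('_', ''), 16) == 0x123456789
print(int('1_2_3_4_5_6_7_8_9', 16))  # 4886718345
```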
Looks like an overflow check that was questionable in the first place (depending on undefined behaviour), and is now both questionable and wrong. I'll see if I can find time for a fix this evening (UTC+1).
The problem is with the following check:

```c
n = digits * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError,
                    "int string too large to convert");
```

The problem is that `n / bits_per_char` gives the number of digits required, but `p - start` also counts the `'_'` characters. So even for valid input, once the number of underscores exceeds a limit, the check fires and a spurious overflow error is raised. Fix: replace `p - start` with the actual number of digits.
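To make the arithmetic concrete, here is a small Python model of the buggy check (a sketch, not CPython source), assuming `PyLong_SHIFT = 30` and `bits_per_char = 4` for base 16. The helper name `triggers_false_overflow` is invented for illustration:

```python
PYLONG_SHIFT = 30  # assumed value of CPython's PyLong_SHIFT on this build

def triggers_false_overflow(s, bits_per_char=4):
    """Model of the buggy pre-check: True if it would raise for string s."""
    digits = sum(c != '_' for c in s)  # actual hex digits
    span = len(s)                      # p - start: includes the underscores
    n = digits * bits_per_char + PYLONG_SHIFT - 1
    return n // bits_per_char < span   # bug: compares against span, not digits

print(triggers_false_overflow('1_2_3_4_5_6_7_89'))   # False -> parses fine
print(triggers_false_overflow('1_2_3_4_5_6_7_8_9'))  # True  -> spurious error
```

With 9 digits, `n // bits_per_char` is `9 + (30 - 1) // 4 = 16`, so the check trips exactly when the string length exceeds 16, i.e. at the eighth underscore, matching the transcript above.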
PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.
There's also the (lesser) issue that we're using C
Yep, sorry; this didn't happen. I blame the children. :-(
I don't understand this check completely actually. When exactly is an int too large to convert? |
We check whether `digits * bits_per_char + PyLong_SHIFT - 1` overflows an int?
The error is raised when the number of underscores is more than (PyLong_SHIFT - 1) // bits_per_char.
Yes, but the check is too late: UB can already occur in this calculation. It looks like 'n' was Py_ssize_t in 2.7. :)
I missed the issue mentioned by Mark in msg303331. I looked at the type of n, and okay, it is Py_ssize_t. But I forgot to check the types of digits and bits_per_char, which are int, so my fix doesn't guard against integer overflow. The following PR fixes this. It also tries to guard against integer overflow in another place.

Unfortunately there is one more issue with that code. The guard works on 32-bit platforms, where the precision of double is larger than the size of Py_ssize_t. But on 64-bit platforms it is possible that (Py_ssize_t)(digits * log_base_BASE[base]) is too small to hold the result of the parsing, because its lowest bits are lost. I know that converting str to int with a non-binary base has nonlinear complexity.