This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Strange error when convert hexadecimal with underscores to int
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, mark.dickinson, nitishch, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2017-09-28 14:14 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3816 closed nitishch, 2017-09-29 08:22
PR 3827 merged serhiy.storchaka, 2017-09-29 17:41
PR 3863 merged python-dev, 2017-10-03 11:14
PR 3884 merged serhiy.storchaka, 2017-10-04 13:11
PR 4690 merged python-dev, 2017-12-03 20:17
Messages (14)
msg303240 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-28 14:14
>>> int('1_2_3_4_5_6_7_89', 16)
4886718345
>>> int('1_2_3_4_5_6_7_8_9', 16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int string too large to convert
msg303263 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-09-28 17:11
Looks like an overflow check that was questionable in the first place (depending on undefined behaviour), and is now both questionable and wrong. I'll see if I can find time for a fix this evening (UTC+1).
msg303310 - (view) Author: Nitish (nitishch) * Date: 2017-09-29 08:28
The problem is with the following check:

    n = digits * bits_per_char + PyLong_SHIFT - 1;
    if (n / bits_per_char < p - start) {
        PyErr_SetString(PyExc_ValueError,
                        "int string too large to convert");

The problem is that n / bits_per_char gives the number of digits required. But p - start has the number of '_'s included in it. So even for valid cases, if the number of '_'s is above a limit, this check holds true and an overflow is raised.

Fix: Replace p - start with the actual number of digits.
msg303321 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-29 10:37
PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.
msg303331 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-09-29 13:36
There's also the (lesser) issue that we're using C `int`s for things that should really be `ssize_t`s. But that can be fixed / changed separately for the fix for this issue.

> I'll see if I can find time for a fix this evening (UTC+1).

Yep, sorry; this didn't happen. I blame the children. :-(
msg303337 - (view) Author: Nitish (nitishch) * Date: 2017-09-29 16:27
> PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.

I don't understand this check completely actually. When exactly is an int too large to convert?
msg303339 - (view) Author: Nitish (nitishch) * Date: 2017-09-29 16:45
>> PR 3816 fixes the symptom, but not the core issue -- an overflow check depending on undefined behaviour.

> I don't understand this check completely actually. When exactly is an int too large to convert?

We see if digits * bits_per_char + PyLong_SHIFT -1 overflows an int?
msg303346 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-29 17:44
The error is raised when the number of underscores is at least (PyLong_SHIFT - 1) // bits_per_char.
msg303347 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-09-29 17:49
> We see if digits * bits_per_char + PyLong_SHIFT -1 overflows an int?

Yes, but the check is too late: UB can already occur in this calculation
and then all bets are off.


It looks like 'n' was Py_ssize_t in 2.7. :)
msg303595 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 11:13
New changeset 85c0b8941f0c8ef3ed787c9d504712c6ad3eb5d3 by Serhiy Storchaka in branch 'master':
bpo-31619: Fixed a ValueError when convert a string with large number of underscores (#3827)
https://github.com/python/cpython/commit/85c0b8941f0c8ef3ed787c9d504712c6ad3eb5d3
msg303606 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 12:38
New changeset b5a630f3dd30ed628e088efe7523e650087adba2 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
[3.6] bpo-31619: Fixed a ValueError when convert a string with large number of underscores (GH-3827) (#3863)
https://github.com/python/cpython/commit/b5a630f3dd30ed628e088efe7523e650087adba2
msg303686 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 13:23
I missed the issue mentioned by Mark msg303331. I looked at the type of n, and okay, it is Py_ssize_t. But I forgot to check the type of digits and bits_per_char, and it is int. My fix doesn't guard from integer overflow.

The following PR fixes this. It also tries to guard against integer overflow in other place. Unfortunately there is yet one issue with that code. The guard works on 32-bit platform, where the precision of double is large than the size of Py_size_t. But on 64-bit platform it is possible that (Py_ssize_t)(digits * log_base_BASE[base]) takes the value not enough for writing the result of the parsing, because its lowest bits are lost.

I know that converting str to int with non-binary base has nonlinear complexity.
msg307523 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-03 20:16
New changeset 29ba688034fc4eef0693b86002cf7bee55d692af by Serhiy Storchaka in branch 'master':
bpo-31619: Fixed integer overflow in converting huge strings to int. (#3884)
https://github.com/python/cpython/commit/29ba688034fc4eef0693b86002cf7bee55d692af
msg307526 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-03 21:27
New changeset 30a6bc842945e3e9c9c7db887ab495c428ec7074 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-31619: Fixed integer overflow in converting huge strings to int. (GH-3884) (#4690)
https://github.com/python/cpython/commit/30a6bc842945e3e9c9c7db887ab495c428ec7074
History
Date User Action Args
2022-04-11 14:58:52adminsetgithub: 75800
2017-12-03 21:28:05serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-12-03 21:27:23serhiy.storchakasetmessages: + msg307526
2017-12-03 20:17:33python-devsetpull_requests: + pull_request4603
2017-12-03 20:16:23serhiy.storchakasetmessages: + msg307523
2017-10-04 13:23:32serhiy.storchakasetmessages: + msg303686
2017-10-04 13:11:23serhiy.storchakasetpull_requests: + pull_request3857
2017-10-03 12:38:48serhiy.storchakasetmessages: + msg303606
2017-10-03 11:14:03python-devsetpull_requests: + pull_request3843
2017-10-03 11:13:46serhiy.storchakasetmessages: + msg303595
2017-09-29 17:49:43skrahsetnosy: + skrah
messages: + msg303347
2017-09-29 17:44:29serhiy.storchakasetmessages: + msg303346
2017-09-29 17:41:35serhiy.storchakasetpull_requests: + pull_request3811
2017-09-29 16:45:11nitishchsetmessages: + msg303339
2017-09-29 16:27:38nitishchsetmessages: + msg303337
2017-09-29 13:36:49mark.dickinsonsetmessages: + msg303331
2017-09-29 10:37:14serhiy.storchakasetmessages: + msg303321
2017-09-29 08:28:48nitishchsetnosy: + nitishch
messages: + msg303310
2017-09-29 08:22:18nitishchsetkeywords: + patch
stage: patch review
pull_requests: + pull_request3800
2017-09-28 17:11:48mark.dickinsonsetmessages: + msg303263
2017-09-28 14:14:07serhiy.storchakacreate