Issue 19638: dtoa: conversion from '__int64' to 'int', possible loss of data

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63837

classification

Title:	dtoa: conversion from '__int64' to 'int', possible loss of data
Type:	compile error	Stage:	resolved
Components:	Windows	Versions:	Python 3.3, Python 3.4, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	christian.heimes, eric.smith, jkloth, mark.dickinson, python-dev, vstinner, zach.ware
Priority:	normal	Keywords:	patch

Created on 2013-11-18 09:23 by christian.heimes, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
dtoa.patch	christian.heimes, 2013-11-18 09:23		review
limit_dtoa_string.patch	mark.dickinson, 2013-11-23 19:59		review
float_overflow.patch	christian.heimes, 2013-11-23 20:10		review
limit_dtoa_string_v2.patch	mark.dickinson, 2013-11-24 11:44		review
limit_dtoa_string_v3.patch	mark.dickinson, 2013-11-24 11:51		review
limit_dtoa_string_v4.patch	mark.dickinson, 2013-11-24 12:46		review
limit_dtoa_string_v5.patch	mark.dickinson, 2013-11-24 12:59		review

Messages (27)
msg203264 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-11-18 09:23
The patch silences three compiler warnings on Win64
msg203299 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-18 15:28
I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here: the dtoa.c code knows almost nothing about Python.h (aside from right at the top), and I'd like to keep it that way if possible. And in fact I'd say that we shouldn't be silencing these warnings; rather, we should take them seriously. It looks to me as though it is possible for that conversion to overflow. I'll try to take a look sometime soon. Adding 2.7 and 3.3, since the bug is present there, too.
msg203300 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-18 15:34
"I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here (...) And in fact I'd say that we shouldn't be silencing these warnings; rather, we should take them seriously." So size_t should be used instead ("s - s1" is unsigned, s >= s1).
msg203302 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-18 15:38
See also the "meta-issue" #9566: "Compilation warnings under x64 Windows".
msg204078 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 19:20
So I think the right approach here is to signal an error if the input string is longer than INT_MAX. The the safe downcast really is a safe downcast.
msg204082 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 19:49
I'm thinking of something like the attached patch, which limits nd to 2 billion, comfortably within the range of int on machines that Python's likely to meet. (If anyone's worried about machines with ints smaller than 32 bits, we could add a configure check for that.) I don't think we can take the limit all the way to 2**31 - 1, since nd may be combined with the exponent (which is limited to less than 20000 in the current code), but a limit of 2 billion should be safe. With this limit in place, it should then be safe to silence the warnings.
msg204085 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 19:51
> So size_t should be used instead ("s - s1" is unsigned, s >= s1). That was one of my original thoughts, but the size_t would then creep into all the exponent calculations, requiring fairly substantial changes throughout the code. That's not something I'd want to consider on the maintenance branches, at least. I think just limiting the size of the input string is a simpler option.
msg204087 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 19:59
Whoops; half-baked patch. This one's better (but still not as tested as I'd like).
msg204090 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 20:08
Christian: I don't think this code is safe: - nd0 = nd = s - s1; + tmp = s - s1; + nd0 = nd = Py_SAFE_DOWNCAST(tmp, Py_SSIZE_T, int); The result of the Py_SAFE_DOWNCAST could be almost anything, and in particular could be negative. It would take a careful examination of the code to guarantee that a negative nd or nd0 won't lead to difficulties further down the algorithm. I think we need to raise an error if tmp is too large, before the downcast.
msg204091 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-11-23 20:10
Mark, I was talking about the attached "fix" (or more a workaround). You are right, my initial patch wasn't a good idea.
msg204095 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 20:14
Ah, sorry; I misunderstood. Yep, that looks like a fine solution to me.
msg204096 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 20:16
And if you limit the length to a little less than INT_MAX, I don't think you need to modify dtoa.c at all.
msg204098 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-11-23 20:23
PyOS_string_to_double() is a public function. The check is needed in case 3rd party code uses it without extra checks.
msg204099 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-23 20:25
> PyOS_string_to_double() is a public function. The check is needed in case > 3rd party code uses it without extra checks. In that case, I'd suggest something more like my modifications to dtoa.c. By the time that you end up with a negative 'e', all sorts of other things have already gone wrong, possibly including undefined behaviour from signed integer overflow. IF we need to modify dtoa.c, the check should be earlier IMO.
msg204195 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-24 11:08
Picking this up again; a better patch is on the way.
msg204198 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-24 11:44
Updated patch. I've also taken the opportunity to increase MAX_ABS_EXP, which fixes the following long-standing bug (not that anyone has noticed it in the last 4 years, apparently): >>> s = '0.' + '0'*19999 + '1e+20000' >>> float(s) # should be 1.0 0.1 (The new limit on the number of digits means that the clipping of the exponent can no longer result in incorrect results from strtod.) Unlike the previous patch, I'm reasonably confident in this one. :-) I'll plan to apply it soon, though it probably won't make 3.4 beta 1.
msg204199 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-24 11:51
Tweak: make MAX_DIGITS unsigned to avoid compiler complaints about comparing signed with unsigned.
msg204200 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-24 12:27
Could you add a test with more than more MAX_DIGITS (and maybe another one with more than MAX_ABS_EXP) using @bigmemtest()?
msg204206 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-24 12:46
Done. I've tested this locally by reducing the limits and the sizes; can someone with access to a beefy machine can verify that the new test passes on that machine?
msg204211 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-24 12:59
And now with the @bigmemtest fixed to take a second parameter (thanks, Larry!).
msg204497 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-26 16:19
New changeset eac133e13bb5 by Mark Dickinson in branch '3.3': Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float. http://hg.python.org/cpython/rev/eac133e13bb5 New changeset 9c7ab3e68243 by Mark Dickinson in branch 'default': Issue #19638: Merge from 3.3 http://hg.python.org/cpython/rev/9c7ab3e68243
msg204498 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-26 16:38
New changeset 6d6a018a3bb0 by Mark Dickinson in branch '2.7': Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float. http://hg.python.org/cpython/rev/6d6a018a3bb0
msg204499 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-26 16:39
Now fixed.
msg204500 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-26 16:45
I ran test_strtod using: ./python -m test -M 6G test_strtod. The test passed successfully on my 64-bit Linux box. But there is an issue with the announced memory limit: it looks closer to 6 GB than 5 GB. $ ./python -m test -M 6G -v test_strtod == CPython 3.4.0b1 (default:9c7ab3e68243, Nov 26 2013, 17:24:05) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] == Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow little-endian == hash algorithm: siphash24 64bit == /home/haypo/prog/python/default/build/test_python_2997 Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0) [1/1] test_strtod test_bigcomp (test.test_strtod.StrtodTests) ... ok test_boundaries (test.test_strtod.StrtodTests) ... ok test_halfway_cases (test.test_strtod.StrtodTests) ... ok test_large_exponents (test.test_strtod.StrtodTests) ... ok test_oversized_digit_strings (test.test_strtod.StrtodTests) ... ... expected peak memory use: 5.0G ... process data size: 2.1G (...) ... process data size: 4.1G (...) ... process data size: 6.2G (...) ... process data size: 2.1G ok test_parsing (test.test_strtod.StrtodTests) ... ok test_particular (test.test_strtod.StrtodTests) ... ok test_short_halfway_cases (test.test_strtod.StrtodTests) ... ok test_underflow_boundary (test.test_strtod.StrtodTests) ... ok ---------------------------------------------------------------------- Ran 9 tests in 63.871s OK 1 test OK.
msg204501 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-26 16:52
This caused test failures on 32-bit machines on 2.7, since they can't create these huge strings in the first place. I'm working on a fix. > But there is an issue with the announced memory limit: Thanks. I'll look into it.
msg204505 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-26 17:03
New changeset 4c523ea2b429 by Mark Dickinson in branch '2.7': Issue #19638: Skip large digit string tests on 32-bit platforms. http://hg.python.org/cpython/rev/4c523ea2b429
msg204525 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-11-26 20:02
Peak memory usage appears to be around 4 times the string length on Python 3.3, and around 3 times the string length on Python 3.4. For 3.4, the peak occurs while formatting the exception message; presumably at that point we've got all three of (a) the original string, (b) the exception message string being built, and (c) some sort of temporary string used during formatting. For the purposes of this issue, I'll update the constants in the bigmemtest decorators.

History
Date	User	Action	Args
2022-04-11 14:57:53	admin	set	github: 63837
2013-11-30 09:23:15	mark.dickinson	set	status: open -> closed
2013-11-26 20:02:33	mark.dickinson	set	status: closed -> open messages: + msg204525
2013-11-26 17:03:07	python-dev	set	messages: + msg204505
2013-11-26 16:52:01	mark.dickinson	set	messages: + msg204501
2013-11-26 16:45:43	vstinner	set	messages: + msg204500
2013-11-26 16:39:55	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg204499 stage: patch review -> resolved
2013-11-26 16:38:42	python-dev	set	messages: + msg204498
2013-11-26 16:19:58	python-dev	set	nosy: + python-dev messages: + msg204497
2013-11-24 12:59:04	mark.dickinson	set	files: + limit_dtoa_string_v5.patch messages: + msg204211
2013-11-24 12:46:46	mark.dickinson	set	files: + limit_dtoa_string_v4.patch messages: + msg204206
2013-11-24 12:27:56	vstinner	set	messages: + msg204200
2013-11-24 11:51:27	mark.dickinson	set	files: + limit_dtoa_string_v3.patch messages: + msg204199
2013-11-24 11:44:46	mark.dickinson	set	files: + limit_dtoa_string_v2.patch messages: + msg204198
2013-11-24 11:08:01	mark.dickinson	set	assignee: mark.dickinson messages: + msg204195
2013-11-23 20:25:39	mark.dickinson	set	messages: + msg204099
2013-11-23 20:23:31	christian.heimes	set	messages: + msg204098
2013-11-23 20:16:10	mark.dickinson	set	messages: + msg204096
2013-11-23 20:14:58	mark.dickinson	set	messages: + msg204095
2013-11-23 20:10:24	christian.heimes	set	files: + float_overflow.patch messages: + msg204091
2013-11-23 20:08:24	mark.dickinson	set	messages: + msg204090
2013-11-23 19:59:52	mark.dickinson	set	files: - limit_dtoa_string.patch
2013-11-23 19:59:35	mark.dickinson	set	files: + limit_dtoa_string.patch messages: + msg204087
2013-11-23 19:51:22	mark.dickinson	set	messages: + msg204085
2013-11-23 19:49:03	mark.dickinson	set	files: + limit_dtoa_string.patch messages: + msg204082
2013-11-23 19:20:48	mark.dickinson	set	messages: + msg204078
2013-11-23 19:09:04	mark.dickinson	set	assignee: mark.dickinson -> (no value)
2013-11-18 16:51:37	jkloth	set	nosy: + jkloth
2013-11-18 15:38:11	vstinner	set	messages: + msg203302
2013-11-18 15:34:14	vstinner	set	messages: + msg203300
2013-11-18 15:29:10	mark.dickinson	set	assignee: mark.dickinson
2013-11-18 15:28:57	mark.dickinson	set	messages: + msg203299 versions: + Python 2.7, Python 3.3
2013-11-18 12:32:45	eric.smith	set	nosy: + eric.smith
2013-11-18 11:03:57	skrah	set	nosy: + mark.dickinson
2013-11-18 09:23:20	christian.heimes	create