classification
Title: dtoa: conversion from '__int64' to 'int', possible loss of data
Type: compile error Stage: resolved
Components: Windows Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: christian.heimes, eric.smith, haypo, jkloth, mark.dickinson, python-dev, zach.ware
Priority: normal Keywords: patch

Created on 2013-11-18 09:23 by christian.heimes, last changed 2013-11-30 09:23 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
dtoa.patch christian.heimes, 2013-11-18 09:23 review
limit_dtoa_string.patch mark.dickinson, 2013-11-23 19:59 review
float_overflow.patch christian.heimes, 2013-11-23 20:10 review
limit_dtoa_string_v2.patch mark.dickinson, 2013-11-24 11:44 review
limit_dtoa_string_v3.patch mark.dickinson, 2013-11-24 11:51 review
limit_dtoa_string_v4.patch mark.dickinson, 2013-11-24 12:46 review
limit_dtoa_string_v5.patch mark.dickinson, 2013-11-24 12:59 review
Messages (27)
msg203264 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-11-18 09:23
The patch silences three compiler warnings on Win64
msg203299 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-18 15:28
I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here:  the dtoa.c code knows almost nothing about Python.h (aside from right at the top), and I'd like to keep it that way if possible.

And in fact I'd say that we *shouldn't* be silencing these warnings; rather, we should take them seriously.  It looks to me as though it is possible for that conversion to overflow.

I'll try to take a look sometime soon.  Adding 2.7 and 3.3, since the bug is present there, too.
msg203300 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-11-18 15:34
"I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here (...) 
And in fact I'd say that we *shouldn't* be silencing these warnings; rather, we should take them seriously."

So size_t should be used instead ("s - s1" is unsigned, s >= s1).
msg203302 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-11-18 15:38
See also the "meta-issue" #9566: "Compilation warnings under x64 Windows".
msg204078 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 19:20
So I think the right approach here is to signal an error if the input string is longer than INT_MAX.  The the safe downcast really *is* a safe downcast.
msg204082 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 19:49
I'm thinking of something like the attached patch, which limits nd to 2 billion, comfortably within the range of int on machines that Python's likely to meet.  (If anyone's worried about machines with ints smaller than 32 bits, we could add a configure check for that.)

I don't think we can take the limit all the way to 2**31 - 1, since nd may be combined with the exponent (which is limited to less than 20000 in the current code), but a limit of 2 billion should be safe.

With this limit in place, it should then be safe to silence the warnings.
msg204085 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 19:51
> So size_t should be used instead ("s - s1" is unsigned, s >= s1).

That was one of my original thoughts, but the size_t would then creep into all the exponent calculations, requiring fairly substantial changes throughout the code.  That's not something I'd want to consider on the maintenance branches, at least.  I think just limiting the size of the input string is a simpler option.
msg204087 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 19:59
Whoops; half-baked patch.  This one's better (but still not as tested as I'd like).
msg204090 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 20:08
Christian:  I don't think this code is safe:

-    nd0 = nd = s - s1;
+    tmp = s - s1;
+    nd0 = nd = Py_SAFE_DOWNCAST(tmp, Py_SSIZE_T, int);

The result of the Py_SAFE_DOWNCAST could be almost anything, and in particular could be negative.  It would take a careful examination of the code to guarantee that a negative nd or nd0 won't lead to difficulties further down the algorithm.  I think we need to raise an error if tmp is too large, *before* the downcast.
msg204091 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-11-23 20:10
Mark, I was talking about the attached "fix" (or more a workaround). You are right, my initial patch wasn't a good idea.
msg204095 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 20:14
Ah, sorry; I misunderstood.

Yep, that looks like a fine solution to me.
msg204096 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 20:16
And if you limit the length to a little less than INT_MAX, I don't think you need to modify dtoa.c at all.
msg204098 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-11-23 20:23
PyOS_string_to_double() is a public function. The check is needed in case 3rd party code uses it without extra checks.
msg204099 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-23 20:25
> PyOS_string_to_double() is a public function. The check is needed in case 
> 3rd party code uses it without extra checks.

In that case, I'd suggest something more like my modifications to dtoa.c.  By the time that you end up with a negative 'e', all sorts of other things have already gone wrong, possibly including undefined behaviour from signed integer overflow.  IF we need to modify dtoa.c, the check should be earlier IMO.
msg204195 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-24 11:08
Picking this up again;  a better patch is on the way.
msg204198 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-24 11:44
Updated patch.  I've also taken the opportunity to increase MAX_ABS_EXP, which fixes the following long-standing bug (not that anyone has noticed it in the last 4 years, apparently):

>>> s = '0.' + '0'*19999 + '1e+20000'
>>> float(s)  # should be 1.0
0.1

(The new limit on the number of digits means that the clipping of the exponent can no longer result in incorrect results from strtod.)

Unlike the previous patch, I'm reasonably confident in this one. :-)  I'll plan to apply it soon, though it probably won't make 3.4 beta 1.
msg204199 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-24 11:51
Tweak: make MAX_DIGITS unsigned to avoid compiler complaints about comparing signed with unsigned.
msg204200 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-11-24 12:27
Could you add a test with more than more MAX_DIGITS (and maybe another
one with more than MAX_ABS_EXP) using @bigmemtest()?
msg204206 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-24 12:46
Done.  I've tested this locally by reducing the limits and the sizes;  can someone with access to a beefy machine can verify that the new test passes on that machine?
msg204211 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-24 12:59
And now with the @bigmemtest fixed to take a second parameter (thanks, Larry!).
msg204497 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-26 16:19
New changeset eac133e13bb5 by Mark Dickinson in branch '3.3':
Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float.
http://hg.python.org/cpython/rev/eac133e13bb5

New changeset 9c7ab3e68243 by Mark Dickinson in branch 'default':
Issue #19638: Merge from 3.3
http://hg.python.org/cpython/rev/9c7ab3e68243
msg204498 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-26 16:38
New changeset 6d6a018a3bb0 by Mark Dickinson in branch '2.7':
Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float.
http://hg.python.org/cpython/rev/6d6a018a3bb0
msg204499 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-26 16:39
Now fixed.
msg204500 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-11-26 16:45
I ran test_strtod using: ./python -m test -M 6G test_strtod. The test passed successfully on my 64-bit Linux box. But there is an issue with the announced memory limit: it looks closer to 6 GB than 5 GB.

$ ./python -m test -M 6G -v test_strtod
== CPython 3.4.0b1 (default:9c7ab3e68243, Nov 26 2013, 17:24:05) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
==   Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow little-endian
==   hash algorithm: siphash24 64bit
==   /home/haypo/prog/python/default/build/test_python_2997
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
[1/1] test_strtod
test_bigcomp (test.test_strtod.StrtodTests) ... ok
test_boundaries (test.test_strtod.StrtodTests) ... ok
test_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_large_exponents (test.test_strtod.StrtodTests) ... ok
test_oversized_digit_strings (test.test_strtod.StrtodTests) ... 
 ... expected peak memory use: 5.0G
 ... process data size: 2.1G
(...)
 ... process data size: 4.1G
(...)
 ... process data size: 6.2G
(...)
 ... process data size: 2.1G
ok
test_parsing (test.test_strtod.StrtodTests) ... ok
test_particular (test.test_strtod.StrtodTests) ... ok
test_short_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_underflow_boundary (test.test_strtod.StrtodTests) ... ok

----------------------------------------------------------------------
Ran 9 tests in 63.871s

OK
1 test OK.
msg204501 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-26 16:52
This caused test failures on 32-bit machines on 2.7, since they can't create these huge strings in the first place.  I'm working on a fix.

> But there is an issue with the announced memory limit:

Thanks.  I'll look into it.
msg204505 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-26 17:03
New changeset 4c523ea2b429 by Mark Dickinson in branch '2.7':
Issue #19638: Skip large digit string tests on 32-bit platforms.
http://hg.python.org/cpython/rev/4c523ea2b429
msg204525 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-11-26 20:02
Peak memory usage appears to be around 4 times the string length on Python 3.3, and around 3 times the string length on Python 3.4.

For 3.4, the peak occurs while formatting the exception message;  presumably at that point we've got all three of (a) the original string, (b) the exception message string being built, and (c) some sort of temporary string used during formatting.

For the purposes of this issue, I'll update the constants in the bigmemtest decorators.
History
Date User Action Args
2013-11-30 09:23:15mark.dickinsonsetstatus: open -> closed
2013-11-26 20:02:33mark.dickinsonsetstatus: closed -> open

messages: + msg204525
2013-11-26 17:03:07python-devsetmessages: + msg204505
2013-11-26 16:52:01mark.dickinsonsetmessages: + msg204501
2013-11-26 16:45:43hayposetmessages: + msg204500
2013-11-26 16:39:55mark.dickinsonsetstatus: open -> closed
resolution: fixed
messages: + msg204499

stage: patch review -> resolved
2013-11-26 16:38:42python-devsetmessages: + msg204498
2013-11-26 16:19:58python-devsetnosy: + python-dev
messages: + msg204497
2013-11-24 12:59:04mark.dickinsonsetfiles: + limit_dtoa_string_v5.patch

messages: + msg204211
2013-11-24 12:46:46mark.dickinsonsetfiles: + limit_dtoa_string_v4.patch

messages: + msg204206
2013-11-24 12:27:56hayposetmessages: + msg204200
2013-11-24 11:51:27mark.dickinsonsetfiles: + limit_dtoa_string_v3.patch

messages: + msg204199
2013-11-24 11:44:46mark.dickinsonsetfiles: + limit_dtoa_string_v2.patch

messages: + msg204198
2013-11-24 11:08:01mark.dickinsonsetassignee: mark.dickinson
messages: + msg204195
2013-11-23 20:25:39mark.dickinsonsetmessages: + msg204099
2013-11-23 20:23:31christian.heimessetmessages: + msg204098
2013-11-23 20:16:10mark.dickinsonsetmessages: + msg204096
2013-11-23 20:14:58mark.dickinsonsetmessages: + msg204095
2013-11-23 20:10:24christian.heimessetfiles: + float_overflow.patch

messages: + msg204091
2013-11-23 20:08:24mark.dickinsonsetmessages: + msg204090
2013-11-23 19:59:52mark.dickinsonsetfiles: - limit_dtoa_string.patch
2013-11-23 19:59:35mark.dickinsonsetfiles: + limit_dtoa_string.patch

messages: + msg204087
2013-11-23 19:51:22mark.dickinsonsetmessages: + msg204085
2013-11-23 19:49:03mark.dickinsonsetfiles: + limit_dtoa_string.patch

messages: + msg204082
2013-11-23 19:20:48mark.dickinsonsetmessages: + msg204078
2013-11-23 19:09:04mark.dickinsonsetassignee: mark.dickinson -> (no value)
2013-11-18 16:51:37jklothsetnosy: + jkloth
2013-11-18 15:38:11hayposetmessages: + msg203302
2013-11-18 15:34:14hayposetmessages: + msg203300
2013-11-18 15:29:10mark.dickinsonsetassignee: mark.dickinson
2013-11-18 15:28:57mark.dickinsonsetmessages: + msg203299
versions: + Python 2.7, Python 3.3
2013-11-18 12:32:45eric.smithsetnosy: + eric.smith
2013-11-18 11:03:57skrahsetnosy: + mark.dickinson
2013-11-18 09:23:20christian.heimescreate