msg203264 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-18 09:23 |
The patch silences three compiler warnings on Win64
|
msg203299 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-18 15:28 |
I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here: the dtoa.c code knows almost nothing about Python.h (aside from right at the top), and I'd like to keep it that way if possible.
And in fact I'd say that we *shouldn't* be silencing these warnings; rather, we should take them seriously. It looks to me as though it is possible for that conversion to overflow.
I'll try to take a look sometime soon. Adding 2.7 and 3.3, since the bug is present there, too.
|
msg203300 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-18 15:34 |
"I don't much like the use of Py_SSIZE_T and Py_SAFE_DOWNCAST here (...)
And in fact I'd say that we *shouldn't* be silencing these warnings; rather, we should take them seriously."
So size_t should be used instead ("s - s1" is unsigned, s >= s1).
|
msg203302 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-18 15:38 |
See also the "meta-issue" #9566: "Compilation warnings under x64 Windows".
|
msg204078 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 19:20 |
So I think the right approach here is to signal an error if the input string is longer than INT_MAX. The the safe downcast really *is* a safe downcast.
|
msg204082 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 19:49 |
I'm thinking of something like the attached patch, which limits nd to 2 billion, comfortably within the range of int on machines that Python's likely to meet. (If anyone's worried about machines with ints smaller than 32 bits, we could add a configure check for that.)
I don't think we can take the limit all the way to 2**31 - 1, since nd may be combined with the exponent (which is limited to less than 20000 in the current code), but a limit of 2 billion should be safe.
With this limit in place, it should then be safe to silence the warnings.
|
msg204085 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 19:51 |
> So size_t should be used instead ("s - s1" is unsigned, s >= s1).
That was one of my original thoughts, but the size_t would then creep into all the exponent calculations, requiring fairly substantial changes throughout the code. That's not something I'd want to consider on the maintenance branches, at least. I think just limiting the size of the input string is a simpler option.
|
msg204087 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 19:59 |
Whoops; half-baked patch. This one's better (but still not as tested as I'd like).
|
msg204090 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 20:08 |
Christian: I don't think this code is safe:
- nd0 = nd = s - s1;
+ tmp = s - s1;
+ nd0 = nd = Py_SAFE_DOWNCAST(tmp, Py_SSIZE_T, int);
The result of the Py_SAFE_DOWNCAST could be almost anything, and in particular could be negative. It would take a careful examination of the code to guarantee that a negative nd or nd0 won't lead to difficulties further down the algorithm. I think we need to raise an error if tmp is too large, *before* the downcast.
|
msg204091 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-23 20:10 |
Mark, I was talking about the attached "fix" (or more a workaround). You are right, my initial patch wasn't a good idea.
|
msg204095 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 20:14 |
Ah, sorry; I misunderstood.
Yep, that looks like a fine solution to me.
|
msg204096 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 20:16 |
And if you limit the length to a little less than INT_MAX, I don't think you need to modify dtoa.c at all.
|
msg204098 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-23 20:23 |
PyOS_string_to_double() is a public function. The check is needed in case 3rd party code uses it without extra checks.
|
msg204099 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-23 20:25 |
> PyOS_string_to_double() is a public function. The check is needed in case
> 3rd party code uses it without extra checks.
In that case, I'd suggest something more like my modifications to dtoa.c. By the time that you end up with a negative 'e', all sorts of other things have already gone wrong, possibly including undefined behaviour from signed integer overflow. IF we need to modify dtoa.c, the check should be earlier IMO.
|
msg204195 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-24 11:08 |
Picking this up again; a better patch is on the way.
|
msg204198 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-24 11:44 |
Updated patch. I've also taken the opportunity to increase MAX_ABS_EXP, which fixes the following long-standing bug (not that anyone has noticed it in the last 4 years, apparently):
>>> s = '0.' + '0'*19999 + '1e+20000'
>>> float(s) # should be 1.0
0.1
(The new limit on the number of digits means that the clipping of the exponent can no longer result in incorrect results from strtod.)
Unlike the previous patch, I'm reasonably confident in this one. :-) I'll plan to apply it soon, though it probably won't make 3.4 beta 1.
|
msg204199 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-24 11:51 |
Tweak: make MAX_DIGITS unsigned to avoid compiler complaints about comparing signed with unsigned.
|
msg204200 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-24 12:27 |
Could you add a test with more than more MAX_DIGITS (and maybe another
one with more than MAX_ABS_EXP) using @bigmemtest()?
|
msg204206 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-24 12:46 |
Done. I've tested this locally by reducing the limits and the sizes; can someone with access to a beefy machine can verify that the new test passes on that machine?
|
msg204211 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-24 12:59 |
And now with the @bigmemtest fixed to take a second parameter (thanks, Larry!).
|
msg204497 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-26 16:19 |
New changeset eac133e13bb5 by Mark Dickinson in branch '3.3':
Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float.
http://hg.python.org/cpython/rev/eac133e13bb5
New changeset 9c7ab3e68243 by Mark Dickinson in branch 'default':
Issue #19638: Merge from 3.3
http://hg.python.org/cpython/rev/9c7ab3e68243
|
msg204498 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-26 16:38 |
New changeset 6d6a018a3bb0 by Mark Dickinson in branch '2.7':
Issue #19638: Raise ValueError instead of crashing when converting billion character strings to float.
http://hg.python.org/cpython/rev/6d6a018a3bb0
|
msg204499 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-26 16:39 |
Now fixed.
|
msg204500 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-26 16:45 |
I ran test_strtod using: ./python -m test -M 6G test_strtod. The test passed successfully on my 64-bit Linux box. But there is an issue with the announced memory limit: it looks closer to 6 GB than 5 GB.
$ ./python -m test -M 6G -v test_strtod
== CPython 3.4.0b1 (default:9c7ab3e68243, Nov 26 2013, 17:24:05) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
== Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow little-endian
== hash algorithm: siphash24 64bit
== /home/haypo/prog/python/default/build/test_python_2997
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
[1/1] test_strtod
test_bigcomp (test.test_strtod.StrtodTests) ... ok
test_boundaries (test.test_strtod.StrtodTests) ... ok
test_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_large_exponents (test.test_strtod.StrtodTests) ... ok
test_oversized_digit_strings (test.test_strtod.StrtodTests) ...
... expected peak memory use: 5.0G
... process data size: 2.1G
(...)
... process data size: 4.1G
(...)
... process data size: 6.2G
(...)
... process data size: 2.1G
ok
test_parsing (test.test_strtod.StrtodTests) ... ok
test_particular (test.test_strtod.StrtodTests) ... ok
test_short_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_underflow_boundary (test.test_strtod.StrtodTests) ... ok
----------------------------------------------------------------------
Ran 9 tests in 63.871s
OK
1 test OK.
|
msg204501 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-26 16:52 |
This caused test failures on 32-bit machines on 2.7, since they can't create these huge strings in the first place. I'm working on a fix.
> But there is an issue with the announced memory limit:
Thanks. I'll look into it.
|
msg204505 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-26 17:03 |
New changeset 4c523ea2b429 by Mark Dickinson in branch '2.7':
Issue #19638: Skip large digit string tests on 32-bit platforms.
http://hg.python.org/cpython/rev/4c523ea2b429
|
msg204525 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-11-26 20:02 |
Peak memory usage appears to be around 4 times the string length on Python 3.3, and around 3 times the string length on Python 3.4.
For 3.4, the peak occurs while formatting the exception message; presumably at that point we've got all three of (a) the original string, (b) the exception message string being built, and (c) some sort of temporary string used during formatting.
For the purposes of this issue, I'll update the constants in the bigmemtest decorators.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:53 | admin | set | github: 63837 |
2013-11-30 09:23:15 | mark.dickinson | set | status: open -> closed |
2013-11-26 20:02:33 | mark.dickinson | set | status: closed -> open
messages:
+ msg204525 |
2013-11-26 17:03:07 | python-dev | set | messages:
+ msg204505 |
2013-11-26 16:52:01 | mark.dickinson | set | messages:
+ msg204501 |
2013-11-26 16:45:43 | vstinner | set | messages:
+ msg204500 |
2013-11-26 16:39:55 | mark.dickinson | set | status: open -> closed resolution: fixed messages:
+ msg204499
stage: patch review -> resolved |
2013-11-26 16:38:42 | python-dev | set | messages:
+ msg204498 |
2013-11-26 16:19:58 | python-dev | set | nosy:
+ python-dev messages:
+ msg204497
|
2013-11-24 12:59:04 | mark.dickinson | set | files:
+ limit_dtoa_string_v5.patch
messages:
+ msg204211 |
2013-11-24 12:46:46 | mark.dickinson | set | files:
+ limit_dtoa_string_v4.patch
messages:
+ msg204206 |
2013-11-24 12:27:56 | vstinner | set | messages:
+ msg204200 |
2013-11-24 11:51:27 | mark.dickinson | set | files:
+ limit_dtoa_string_v3.patch
messages:
+ msg204199 |
2013-11-24 11:44:46 | mark.dickinson | set | files:
+ limit_dtoa_string_v2.patch
messages:
+ msg204198 |
2013-11-24 11:08:01 | mark.dickinson | set | assignee: mark.dickinson messages:
+ msg204195 |
2013-11-23 20:25:39 | mark.dickinson | set | messages:
+ msg204099 |
2013-11-23 20:23:31 | christian.heimes | set | messages:
+ msg204098 |
2013-11-23 20:16:10 | mark.dickinson | set | messages:
+ msg204096 |
2013-11-23 20:14:58 | mark.dickinson | set | messages:
+ msg204095 |
2013-11-23 20:10:24 | christian.heimes | set | files:
+ float_overflow.patch
messages:
+ msg204091 |
2013-11-23 20:08:24 | mark.dickinson | set | messages:
+ msg204090 |
2013-11-23 19:59:52 | mark.dickinson | set | files:
- limit_dtoa_string.patch |
2013-11-23 19:59:35 | mark.dickinson | set | files:
+ limit_dtoa_string.patch
messages:
+ msg204087 |
2013-11-23 19:51:22 | mark.dickinson | set | messages:
+ msg204085 |
2013-11-23 19:49:03 | mark.dickinson | set | files:
+ limit_dtoa_string.patch
messages:
+ msg204082 |
2013-11-23 19:20:48 | mark.dickinson | set | messages:
+ msg204078 |
2013-11-23 19:09:04 | mark.dickinson | set | assignee: mark.dickinson -> (no value) |
2013-11-18 16:51:37 | jkloth | set | nosy:
+ jkloth
|
2013-11-18 15:38:11 | vstinner | set | messages:
+ msg203302 |
2013-11-18 15:34:14 | vstinner | set | messages:
+ msg203300 |
2013-11-18 15:29:10 | mark.dickinson | set | assignee: mark.dickinson |
2013-11-18 15:28:57 | mark.dickinson | set | messages:
+ msg203299 versions:
+ Python 2.7, Python 3.3 |
2013-11-18 12:32:45 | eric.smith | set | nosy:
+ eric.smith
|
2013-11-18 11:03:57 | skrah | set | nosy:
+ mark.dickinson
|
2013-11-18 09:23:20 | christian.heimes | create | |