classification
Title: random.choice IndexError due to double-rounding
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Serge Anuchin, mark.dickinson, pitrou, r.david.murray, rhettinger, serhiy.storchaka, skrah, steven.daprano, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2015-07-05 03:04 by steven.daprano, last changed 2018-06-27 09:37 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
choice_timings.txt rhettinger, 2015-07-11 23:45
shuffle_timings.txt rhettinger, 2015-07-11 23:55
handle_double_rounding2.diff rhettinger, 2015-07-12 06:26
handle_double_rounding3.diff rhettinger, 2015-07-12 07:50
sample_timings.txt serhiy.storchaka, 2015-07-16 19:34 review
random_double_round.diff rhettinger, 2018-06-25 01:26 For Python 3.6 and 3.7
Pull Requests
URL Status Linked Edit
PR 7954 merged rhettinger, 2018-06-27 07:38
PR 7955 merged miss-islington, 2018-06-27 08:08
PR 7956 merged miss-islington, 2018-06-27 08:09
Messages (61)
msg246291 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-07-05 03:03
While investigating issue 24546, I discovered that at least some versions of Python on 32-bit Linux have a double-rounding bug in multiplication which may lead to an unexpected IndexError in random.choice.

See http://bugs.python.org/issue24546 for additional discussion, but the short version is that due to double-rounding, if random.random returns (1. - 2.**-53), int(i * random.random()) may return 1 for some i >= 2049, and random.choice will then evaluate seq[len(seq)], raising IndexError.
msg246297 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-07-05 09:24
I've finally gotten hold of a gcc (5.1.0) that produces the unwanted code.

For a *different* test case, even the -O0 and -O2 results differ, since
with -O2 gcc uses mpfr (I think!) for constant folding, which produces
the correct result.


Using -mfpmath=sse seems to fix all test cases that I've tried.
msg246303 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-05 10:36
There is a simple fix. If the result is invalid, drop it and generate a new
number. It should keep the distribution uniform.

Another fix is to only use integers. Why do we use float in choice by the
way?
msg246317 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-07-05 14:28
Note that this affects other Random methods, too.  There are several places in the source where something of the form `int(i * random.random())` is used to select an integer in the half-open range [0, i).

And a nitpick: it's a stretch to describe double-rounding on multiplication (or any other arithmetic operation) as a bug: Python-the-language makes no promises about adhering to IEEE 754 arithmetic rule.  (In fact, it makes no promises about using IEEE 745 formats in the first place.)
msg246318 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-07-05 14:29
> IEEE 745

IEEE 754.  Stupid fingers.
msg246336 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-05 19:42
I suppose the simplest "fix" would be to replace relevant instances of

int(random() * N)

with

min(int(random() * N), N-1)

That would be robust against many kinds of arithmetic quirks, and ensure that platforms with and without such quirks would, if forced to the same internal random state, yield the same result when random() happens to return 1. - 2.**-53.

Victor, what - precisely - do you have in mind with "only use integers"?  The current method was used because it's simple, efficient, and obvious.  Python 3.4.3 appears to (usually) use a different method, though, relying on C code to generate random bytes without going through floats.
msg246352 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-07-06 10:47
> Python-the-language makes no promises about adhering to IEEE 754 arithmetic rule.

Still, I think we could switch to -mfpmath=sse, which would at least
guarantee consistent behavior across different gcc versions.
msg246363 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-07-06 13:57
> Still, I think we could switch to -mfpmath=sse, which would at least
guarantee consistent behavior across different gcc versions.

... which would fix fsum on 32-bit Linux, too.  (I thought there was an open issue for this, but now I can't find it.)
msg246366 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-06 14:44
Mark, closest I could find to a substantive SSE-vs-fsum report is here, but it was closed (because the fsum tests were changed to ignore the problem ;-) ):

http://bugs.python.org/issue5593
msg246484 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-07-09 11:12
Qt had a similar initiative regarding -msse2 -mfpmath:

http://lists.qt-project.org/pipermail/development/2013-December/014530.html


They say that also Visual Studio 2012 has switched to sse2 by default.

The only problem are the Linux distributions that are stuck in i386 land
and who would probably disable the flags in their builds.
msg246530 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-09 23:09
> I suppose the simplest "fix" would be to replace relevant instances of
> 
> int(random() * N)
> 
> with
> 
> min(int(random() * N), N-1)

That sounds simple and generic. It skews the distribution a tiny little bit, but it doesn't sound significant (perhaps Mark would disagree).
msg246531 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-09 23:42
> It skews the distribution a tiny little bit, ...

But it doesn't - that's the point ;-)

If double-rounding doesn't occur at all (which appears to be the case on most platforms), absolutely nothing changes (because min(int(random() * N), N-1) == int(random() * N) on such boxes).

If double-rounding does occur, double-rounding itself may change results "all over the place", and I haven't tried to analyze what effects that has on the distribution.  In the comparative handful of cases where int(random() * N) == N on such boxes, clamping that back to N-1 just yields the same result we would have gotten on a box that didn't do double-rounding.
msg246532 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-09 23:44
Again, why not using only integers?

Pseudo-code to only use integers:

def randint(a, b):
    num = b - a
    if not num:
        return a
    nbits = (num + 1).bit_length()
    while True:
        x = random.getrandbits(nbits)
        if x <= num:
            break
    return a + x

(It doesn't handle negative numbers nor empty ranges.)
msg246534 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-09 23:59
Victor, if people want to use getrandbits(), we should backport the Python3 code, not reinvent it from scratch.

Note too Mark's comment:  "There are several places in the source where something of the form `int(i * random.random())` is used".  The `min()` trick is relatively easy to apply via simple, local editing of such places.  

Your "It doesn't handle negative numbers nor empty ranges" should be a hint about what a pain it is to rewrite everything wholesale to use a _fundamentally_ different method.
msg246536 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-10 00:13
> Victor, if people want to use getrandbits(), we should backport the Python3 code, not reinvent it from scratch.

Sorry, I don't understand your comment. Backport what? getrandbits() is available on Python 2 and Python 3, for Random and SystemRandom.

I propose to rewrite Random.randint() in random.py.

> Your "It doesn't handle negative numbers nor empty ranges" should be a hint about what a pain it is to rewrite everything wholesale to use a _fundamentally_ different method.

In fact, I didn't check if the my code works for negative numbers. A few more lines should be enough to handle them. I just wanted to show a short pseudo-code to discuss the idea.

I don't understand your point on the pain. It's easy to replace int(i * random.random()) with randint(0, i) (sorry, I'm too lazy to check if it should be i, i-1 or i+1 :-)).

The Glib library used floats in their g_rand_int_range() method, but the function now uses only integers. The GSL library also uses integers. I'm surprised that Python uses floats to generate random numbers. For me, they are less reliable than integers, I always fear a bias caused by complex rounding and corner cases.

Do other libraries and/or programming languages use float to generate a random integer?
msg246537 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-10 00:25
Victor, don't ask me, look at the code:  the random.choice() implementations in Python 2 and Python 3 have approximately nothing in common, and "the bug" here should already be impossible in Python 3 (but I can't check that, because I don't have a platform that does double-rounding).  I already pointed out (in my 2015-07-05 15:42 note) that Python 3 uses "only integers" in most cases.

"It's easy to replace int(i * random.random()) with randint(0, i)".  You didn't say that was your _intent_, and I didn't guess it.  In that case you also have to weigh in the considerable extra expense of adding another Python-level function call.  The speed of these things is important, and it's annoying enough just to add the expense of calling the C-level `min()`.
msg246548 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-10 08:15
> I propose to rewrite Random.randint() in random.py.

If that would give a different sequence of random numbers, I'm not sure that's acceptable in a bugfix release. Raymond can shed a light.
msg246550 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-10 08:35
As I wrote, glib switch from floats to integers to generate random
numbers. To provide reproductible "random" numbers, an environment
variable was added to select the old PRNG.

Anyway, if we modify random.py, the generated numbers should be different, no?

To me, it's always a strange concept of having reproductible "random"
numbers :-) But I understand the use case. For example, to run a test
suite, we want randomization and to be able to replay a test suite in
the same order.
msg246551 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-10 09:02
> As I wrote, glib switch from floats to integers to generate random
> numbers.

And as I wrote, this would be accepted in a feature release. Not necessarily a bugfix release.
msg246565 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-10 14:49
> Anyway, if we modify random.py, the generated
> numbers should be different, no?

Not in a bugfix release.  The `min()` trick changes no results whatsoever on a box that doesn't do double-rounding.

On a box that does do double-rounding, the only difference in results is that the `min()` trick stops a nasty exception in a relative handful of cases (& makes no difference to any case in which that exception isn't raised).

The sequence of results may be different on platforms with double-rounding and without double-rounding, but that's always been true.  The `min()` trick changes nothing about that either, except to prevent unintended exceptions on double-rounding boxes.

Note that switching to use SSE2 instead also changes nothing on boxes that don't do double-rounding.  It would change some results (beyond _just_ stopping bogus exceptions) on boxes that do double-rounding.
msg246566 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-10 14:53
Ok, it looks like most people are in favor of min(). Can anyone propose a patch?
msg246581 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-10 19:46
There are other affected methods: randrange(), randint(), shuffle(), sample().
msg246615 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-11 21:57
[Antoine]
> If that would give a different sequence of random numbers, 
> I'm not sure that's acceptable in a bugfix release. Raymond
> can shed a light.

You're right.  It is not acceptable to give a different sequence of random numbers within a bugfix release.  

[Victor]
> Ok, it looks like most people are in favor of min(). 
> Can anyone propose a patch?

I will work up a patch, but I can't say that I feel good about making everyone pay a price for a problem that almost no one ever has.
msg246616 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-11 22:52
FWIW, here are some variants (with differing degrees of brevity, clarity, and performance):

   def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        n = len(seq)
        i = int(self.random() * n)
        if i == n:
            i = n - 1
        return seq[i]

    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        try:
            return seq[int(self.random() * len(seq))]                                                                                                         
        except IndexError:
            return seq[-1]

    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        n = len(seq)
        return seq[min(int(self.random() * n), n-1)]
msg246623 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-12 00:58
[Raymond]
> I can't say that I feel good about making everyone pay
> a price for a problem that almost no one ever has.

As far as I know, nobody has ever had the problem.  But if we know a bug exists, I think it's at best highly dubious to wait for a poor user to get bitten by it.  Our bugs aren't their fault, and we have no idea in advance how much our bugs may cost them.

Note that there's no need to change anything on boxes without double-rounding, and those appear to be the majority of platforms now, and "should" eventually become all platforms as people migrate to 64-bit platforms.  So, e.g.,

    if double_rounding_happens_on_this_box:
        def choice(...):
            # fiddled code
    else:
        def choice(...):
            # current code just indented a level

Then most platforms pay nothing beyond a single import-time test.  Note that there's already code to detect double-rounding in test_math.py, but in that context not to _fix_ a problem but to ignore fsum() tests that fail in its presence.

And just to be annoying ;-) , here's another timing variation:

    i = int(random() * n)
    return seq[i - (i == n)]
msg246628 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-12 03:41
See the attached timings.  The performance hit isn't bad and the code's beauty isn't marred terribly.   Yes, we'll fix it, but no I don't have to feel good about it ;-)
msg246630 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-12 04:04
For shuffle I would write "if j < i".

I think 3.x should be fixed as well as 2.7. And I like Tim's suggestion about import-time patching.
msg246636 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-12 06:57
Unfortunately a link to Rietveld is not available for review and inlined comments.

In sample() the selected set can be initialized to {n} to avoid additional checks in the loop for large population. In the branch for small population, non-empty pool can be extended ("pool.append(pool[-1])") to avoid additional check in the loop.

In choice() I would write the condition as "i == n > 0" to avoid indexing with negative index. Custom collection can has non-standard behavior with negative indices. This doesn't add additional cost in normal case.
msg246639 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-12 07:50
> In sample() the selected set can be initialized to {n}

I originally used the {n} approach but it was less clear and it lead to a re-selection rather than undoing the rounding.  That would change the output.

> I like Tim's suggestion about import-time patching.

Sorry, but there's a limit to how much I'm willing to garbage-up the code over this issue.

> In choice() I would write the condition as "i == n > 0" to 
> avoid indexing with negative index

I'll write that as "'i == n and n > 0" which reads better and runs faster in the common case (look at the disassembly of each).

Attaching a revised patch.

> here's another timing variation:
>
>    i = int(random() * n)
>    return seq[i - (i == n)]

This ran a little slower than the conditional approach.
msg246641 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-12 08:30
> I originally used the {n} approach but it was less clear and it lead to a re-selection rather than undoing the rounding.  That would change the output.

In normal case j is never equal to n. In very rare cases on platforms with double rounding the unpatched code generates an IndexError, and changing this is the purpose of the patch. Re-selection is so good as undoing the rounding.

Please also note that double rounding causes an error not only when int(random() * n) == n. If random() returns 0.5 - 2**-54 and n = 2029*2, the result is different with double rounding and without it. This causes small bias even if IndexError is not raised.
msg246642 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-07-12 09:05
Serhiy: there's already a small bias inherent in using `int(random() * n)`, regardless of double rounding, since in general `random()` gives 2**53 equally likely (modulo deficiencies in the source generator) outcomes and `n` need not be a divisor of `2**53`.  I don't think the double rounding is going to make that bias noticeably worse.

See issue 23974, which was resolved as "wont fix".
msg246650 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-12 13:31
I'm aware of issue 23974. But I want to say that double rounding causes not only bias from ideal distribution, but a difference between platforms. The result of sample() (with the same seed) can vary between platforms.
msg246656 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-07-12 17:40
The double rounding problem even occurs between different versions
of gcc. I think ultimately we should put our foot down and ship with
sse2 enabled (not possible for 2.7 though, unless an LTS-PEP emerges).

If distributions want to break that, it's their business. :)
The gcc manual is quite upfront about the issue and really
recommends sse2 for reproducible results.
msg246657 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-12 17:42
> I think ultimately we should put our foot down and ship with
> sse2 enabled

That's good for x86 but that wouldn't solve the issue if it exists on other archs.
I trust Raymond to come up with an acceptable solution; though I think it would be good to add a comment linking to this issue, in any case.
msg246664 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-12 19:36
I have a question about this new snippet in choice():

+        if i == n and n > 0:
+            i = n - 1

What's the purpose of the "and n > 0" clause?  Without it, if i == n == 0 then i will be set to -1, which is just as good as 0 for the purpose of raising IndexError (seq[any_int] raises IndexError when seq is empty).
msg246668 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-12 19:52
Hmm.  Looks like the answer to my question came before, via "Custom collection can has non-standard behavior with negative indices."  Really?  We're worried about a custom collection that assigns some crazy-ass meaning to a negative index applied to an empty sequence?  Like what?  I don't really care about supporting insane sequences ;-)
msg246674 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-07-12 21:42
[Serhiy Storchaka]
> ... I want to say that double rounding causes not
> only bias from ideal distribution, but a difference
> between platforms

That's so, but not too surprising.  On those platforms users will see differences between "primitive" floating addition and multiplication (etc) operations too.

There's a tradeoff here.  In Python's earliest days, the bias was strongly in favor of letting platform C quirks shine through.  Partly to make Python-C interoperability easiest, but at least as much because hiding cross-platform fp quirks is a major effort and even Guido had only finite time ;-)  As the years go by, the bias has been switching in favor of hiding platform quirks.  I hope Python 3 continues in that vein, but Python 2 probably needs to stay pretty much where it is.
msg246815 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-16 19:34
See the attached timings for sample().  Patched sample2 is at worst 4% slower than original sample, and optimized sample3 is between sample and sample2. In any case the difference is pretty small, so I'm good with Raymond's variant if it looks better for him.

Please note that 3.x also needs the patch.
msg267785 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-08 05:16
Ping.
msg267874 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-08 17:38
Will get to it shortly.
msg275594 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-10 07:54
A similar bug affects the new `choices` method: in the `choices` source, if `random() * total` happens to round up to `total`, the bisect call returns an out-of-range index.

There are two ways that that could happen: (1) double rounding, as in this issue (which will occur very rarely and is hard to reproduce), and (2) `total` being subnormal (easy to reproduce, but unlikely to occur in practice).

>>> from random import choices
>>> choices(500, population=[1, 2], weights=[1e-323, 1e-323])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mdickinson/Python/cpython/Lib/random.py", line 360, in choices
    return [population[bisect(cum_weights, random() * total)] for i in range(k)]
  File "/Users/mdickinson/Python/cpython/Lib/random.py", line 360, in <listcomp>
    return [population[bisect(cum_weights, random() * total)] for i in range(k)]
IndexError: list index out of range
msg277199 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-22 06:43
Ping.
msg277200 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-22 06:48
> I'll write that as "'i == n and n > 0" which reads better and runs faster in the common case (look at the disassembly of each).

issue27236
msg304734 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-22 08:14
Do you mind to create a pull request Raymond?
msg320392 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-25 01:26
Attaching a patch for 3.6 and 3.7.  Haven't had a chance to run timings yet.  Since it involves adding code to the inner loop, presumably it will have a noticeable affect on speed.  Am not sure how to test this (I don't seem to be able to reproduce the OP's problem on any system I have access to).  I'm not even sure whether the double rounding still exists any extant platform and if so whether we could just turn it off as part of the configuration.

At any rate, I don't think the code in 2.7 should change at all.  That tool is almost at end-of-life and we should be very conservative with it at this point.
msg320394 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-25 01:38
There are a couple bug reports here that have been open for years, and it's about time we closed them.

My stance:  if any platform still exists on which "double rounding" is still a potential problem, Python _configuration_ should be changed to disable double rounding on such platforms (there's usually a C compiler flag that can make this happen, but even if there isn't a couple lines of inline assembler could be added at Python startup to set the Pentium's FPU "precision control" bits to "round to 53 bits" mode).

`random` is a red herring here!  We don't want gratuitously different results across 754 boxes in any operations.
msg320423 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-06-25 13:30
> Python _configuration_ should be changed to disable double rounding on such platforms

That sounds good to me, with the nitpick that setting an x87 FPU to 53-bit precision still doesn't avoid potential double rounding on underflow, and I'm not aware of any easy way to "fix" x87-using code to avoid double rounding on underflow.

That said, the vast majority of practical applications, including random.choice, math.fsum, dtoa.c, aren't going to be affected by double rounding on underflow, so it should only rear its ugly head in corner cases.

If we do this, can we also persuade Guido to Pronounce that Python implementations assume IEEE 754 format and semantics for floating-point?
msg320434 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-06-25 19:45
[Tim]
> a couple lines of inline assembler could be added at Python startup to
> set the Pentium's FPU "precision control" bits to "round to 53 bits" mode

On second thoughts, I don't think this will work; or at least, not easily. The libm on such a platform may expect the FPU to be in 64-bit precision mode. So we'd potentially need to switch the precision before calling any math library function and then switch it back again afterwards. And then there may be 3rd party libraries that need the same treatment.

On the existence of platforms with double rounding, last time I checked, 32-bit Linux was the most obvious offender on Intel hardware. Windows used to use the x87 but set the precision to 53 bits, and I think newer versions may well use SSE2 in preference to x87. macOS has used SSE2 for a good number of releases now.

And indeed, I just tested Python 3.5.2 on Ubuntu 16.04.4 LTS 32-bit (running in a VM on a not-too-ancient MacBook Pro), and it exhibits double rounding:

>>> (1e16 + 2.9999).hex()
'0x1.1c37937e08002p+53'
>>> (1/2731).hex()
'0x1.7ff4005ffd002p-12'
>>> 1/(1 - 2**-53)
1.0

Expected results without double rounding:

>>> (1e16 + 2.9999).hex()
'0x1.1c37937e08001p+53'
>>> (1/2731).hex()
'0x1.7ff4005ffd001p-12'
>>> 1/(1 - 2**-53)
1.0000000000000002
msg320443 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-25 21:19
Mark, do you believe that 32-bit Linux uses a different libm?  One that fails if, e.g., SSE2 were used instead?  I don't know, but I'd sure be surprised it if did.  Very surprised - compilers have been notoriously unpredictable in exactly when and where extended precision gets used in compiled code, so sane code (outside of assembler) doesn't rely on it.

I'd be similarly surprised if hypothetical 3rd party libraries _assuming_ extended arithmetic existed.  Any sane person writing such a library would take it upon themselves to force extended precision on entry (if that's what they wanted), and restore the original FPU control state on exit.

I'm no more worried about this than, say, worried about that some dumbass platform may set the rounding mode to "to plus infinity" by default - and I wouldn't hesitate there either for Python startup to force it to nearest/even rounding.  Sure, there _may_ be some library out there for such a platform that assumes +Inf rounding, but I fundamentally don't care ;-)

In any case, `random` remains a red herring.  There are potential gratuitous numeric differences all over the place.
msg320474 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-06-26 07:58
[Tim]
> Mark, do you believe that 32-bit Linux uses a different libm?

I don't know. But I'd expect to see accuracy losses as a result of forcing 53-bit precision, and I wouldn't be totally surprised to see other more catastrophic failure modes (infinite iterations, for example). But I don't have any concrete information on the subject.

There's this from (an unofficial mirror of) the glibc x86 sources: https://github.com/bminor/glibc/blob/43b1048ab9418e902aac8c834a7a9a88c501620a/sysdeps/x86/fpu_control.h#L69

> #define _FPU_EXTENDED 0x300	/* libm requires double extended precision.  */

But it's not clear to me whether that comment is intended to be a statement of fact, or indeed whether it's still true or just an ancient comment that should have been culled long ago.

Like I said, I just don't know, but I'd be worried about the law of unintended consequences.

Of course, with my Enthought hat on, I don't care, because we stopped worrying about 32-bit Linux long ago... :-)
msg320479 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-06-26 08:15
Some relevant notes from Vincent Lefèvre here: http://www.vinc17.net/research/extended.en.html
msg320502 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-26 16:21
Mark, ya, I agree it's most prudent to let sleeping dogs lie.

In the one "real" complaint we got (issue 24546) the cause was never determined - but double rounding was ruled out in that specific case, and no _plausible_ cause was identified (short of transient HW error).

As the years go by, my interest in "doing something" about a bug nobody has apparently encountered in the wild, and remains theoretically possible on an ever-diminishing number of platforms ("32-bit Linux not using faster SSE arithmetic" seems to be the universe), has faded somewhat  ;-)

So I'd close as "won't fix".  If I had to do something, I'd add different implementations under an "if this box does double-rounding" import-time test so nobody else has to pay for it.

But I'm cool with whatever Raymond settles for.
msg320511 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-26 21:28
[Mark]
> If we do this, can we also persuade Guido to Pronounce that
> Python implementations assume IEEE 754 format and semantics
> for floating-point?

On its own, I don't think a change to force 53-bit precision _on_ 754 boxes would justify that.  That's just changing oddball boxes that already implement 754 to do so in the way most other 754 boxes currently do it (and in the way all future 754 platforms are expected to do it).

To go beyond that seems to require addressing bigger questions, like:

- _Are_ there currently, or expected to be, non-754 Python platforms?  There have been in the past.

- What does "754 semantics" actually mean?  For example, as these bug reports point out, 754 is silent about whether the possibility of double rounding - even in primitive atomic operations - is "a bug" or "a feature" :-(  Would we insist that Python may not work on platforms that flush denorms to 0?  Etc.

- What, concretely, would change in CPython if the Pronouncement were made?  As I recall, we inherit most fp behaviors from the platform C compiler and libraries.  That is, how would the Pronouncement benefit CPython?  For example, do we have significant blobs of code that are trying to cater to non-754 boxes?

Not that I'm asking for answers here.  If there's something to be pursued here, I'd suggest it's better as a topic in python-ideas or python-dev.
msg320516 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-26 21:55
The error goes avoid if we stop using floating point number to generate random integers. Can't we drop/deprecate support of RNG only providing getrandom() but not getrandbits()? I'm talking about the master branch.

Things are simpler when you only manipulate integers...
msg320519 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-26 22:17
Victor, look at Raymond's patch.  In Python 3, `randrange()` and friends already use the all-integer `getrandbits()`.  He's changing three other lines, where some variant of `int(random() * someinteger)` is being used in an inner loop for speed.

Presumably the

            return int(random() * n)

line in the `n >= maxsize` branch of `_randbelow_without_getrandbits()` should also get clamped.
msg320521 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-26 22:32
Let me see random_double_round.diff.

In master:

    def shuffle(self, x, random=None):
        """Shuffle list x in place, and return None.

        Optional argument random is a 0-argument function returning a
        random float in [0.0, 1.0); if it is the default None, the
        standard random.random will be used.

        """

This method has a weird API. What is the point of passing a random function, whereas shuffle() is already a method of an object which generates random numbers?! The bug only affects the case when random is set. I proposed to deprecate this argument and remove it later.

                return [population[_min(_int(random() * total), total)]
                        for i in range(k)]

Why not using _randbelow() here? For speed according to:

> some variant of `int(random() * someinteger)` is being used in an inner loop for speed.

Why not optimizing _randbelow() in this case? Like implementing it in C?
msg320526 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-06-27 00:05
[Victor]
> This method [shuffle()] has a weird API. What is
> the point of passing a random function,
> ... I proposed to deprecate this argument and remove it later.

I don't care here.  This is a bug report.  Making backward-incompatible API changes does nothing to repair existing programs.

> ...
>  return [population[_min(_int(random() * total), total)]
>                        for i in range(k)]
>
> Why not using _randbelow() here? For speed

Yup.


> Why not optimizing _randbelow() in this case? Like
> implementing it in C?

This is a bug report, not an invitation to redesign APIs and complicate implementations in non-trivial ways.  If you're keen on that, I suggest opening a different issue and writing the patch.

I'd personally oppose the API change (pointless thrashing), but it's possible the speed benefit of C acceleration of _randbelow() would outweigh the considerable downsides of moving code from Python source to C source.

But none of that is needed to "fix the bug" at hand, if - indeed - people still think it's worth fixing at all.  In the years that have passed since this started, I no longer do.
msg320549 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-27 07:07
I concur with Tim.  Marking the original bug as "won't fix" for the reasons that he listed.

In a separate PR, I'll modify the last line random.choices() to be "return [population[bisect(cum_weights, random() * total, 0, hi)] for i in range(k)]".  That will make the function more robust (handling the exotic case with subnormal weights).
msg320553 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-27 08:08
New changeset ddf7171911e117aa7ad4b0f9ded4f0c3a4ca0fec by Raymond Hettinger in branch 'master':
bpo-24567: Random subnormal.diff (#7954)
https://github.com/python/cpython/commit/ddf7171911e117aa7ad4b0f9ded4f0c3a4ca0fec
msg320555 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-27 08:53
New changeset 0eaf7b975bd61169a8d78945d2d12f23299f61a8 by Raymond Hettinger (Miss Islington (bot)) in branch '3.7':
bpo-24567: Random subnormal.diff (GH-7954) (GH-7955)
https://github.com/python/cpython/commit/0eaf7b975bd61169a8d78945d2d12f23299f61a8
msg320563 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-27 09:37
New changeset acda5ea916f4233ab90ca7b4d28af735aa962af3 by Raymond Hettinger (Miss Islington (bot)) in branch '3.6':
bpo-24567: Random subnormal.diff (GH-7954) (GH-7956)
https://github.com/python/cpython/commit/acda5ea916f4233ab90ca7b4d28af735aa962af3
History
Date User Action Args
2018-06-27 09:37:20rhettingersetmessages: + msg320563
2018-06-27 08:53:13rhettingersetmessages: + msg320555
2018-06-27 08:09:42miss-islingtonsetpull_requests: + pull_request7563
2018-06-27 08:08:54miss-islingtonsetpull_requests: + pull_request7562
2018-06-27 08:08:40rhettingersetmessages: + msg320553
2018-06-27 07:38:48rhettingersetpull_requests: + pull_request7561
2018-06-27 07:07:38rhettingersetstatus: open -> closed
resolution: wont fix
messages: + msg320549

stage: needs patch -> resolved
2018-06-27 00:05:46tim.peterssetmessages: + msg320526
2018-06-26 22:32:21vstinnersetmessages: + msg320521
2018-06-26 22:17:52tim.peterssetmessages: + msg320519
2018-06-26 21:55:07vstinnersetmessages: + msg320516
2018-06-26 21:28:11tim.peterssetmessages: + msg320511
2018-06-26 16:21:48tim.peterssetmessages: + msg320502
2018-06-26 08:15:16mark.dickinsonsetmessages: + msg320479
2018-06-26 07:58:46mark.dickinsonsetmessages: + msg320474
2018-06-25 21:19:27tim.peterssetmessages: + msg320443
2018-06-25 19:45:22mark.dickinsonsetmessages: + msg320434
2018-06-25 13:30:03mark.dickinsonsetmessages: + msg320423
2018-06-25 01:38:32tim.peterssetmessages: + msg320394
2018-06-25 01:26:06rhettingersetfiles: + random_double_round.diff

messages: + msg320392
2017-10-22 08:14:32serhiy.storchakasetmessages: + msg304734
components: + Library (Lib)
versions: - Python 3.5
2016-09-22 06:48:31serhiy.storchakasetmessages: + msg277200
2016-09-22 06:43:32serhiy.storchakasetmessages: + msg277199
versions: + Python 3.7, - Python 3.4
2016-09-10 07:54:05mark.dickinsonsetmessages: + msg275594
2016-06-08 17:38:03rhettingersetmessages: + msg267874
2016-06-08 05:16:23serhiy.storchakasetmessages: + msg267785
2015-07-16 19:34:46serhiy.storchakasetfiles: + sample_timings.txt

messages: + msg246815
versions: + Python 3.4, Python 3.5, Python 3.6
2015-07-12 21:42:44tim.peterssetmessages: + msg246674
2015-07-12 19:52:35tim.peterssetmessages: + msg246668
2015-07-12 19:36:54tim.peterssetmessages: + msg246664
2015-07-12 17:42:45pitrousetmessages: + msg246657
2015-07-12 17:40:47skrahsetmessages: + msg246656
2015-07-12 13:31:39serhiy.storchakasetmessages: + msg246650
2015-07-12 09:05:49mark.dickinsonsetmessages: + msg246642
2015-07-12 08:30:27serhiy.storchakasetmessages: + msg246641
2015-07-12 07:50:37rhettingersetfiles: + handle_double_rounding3.diff

messages: + msg246639
2015-07-12 06:57:23serhiy.storchakasetmessages: + msg246636
2015-07-12 06:27:07rhettingersetfiles: - handle_double_rounding.diff
2015-07-12 06:26:59rhettingersetfiles: + handle_double_rounding2.diff
2015-07-12 06:10:11rhettingersetfiles: + handle_double_rounding.diff
keywords: + patch
2015-07-12 04:04:31serhiy.storchakasetmessages: + msg246630
2015-07-12 03:41:40rhettingersetmessages: + msg246628
2015-07-12 00:58:04tim.peterssetmessages: + msg246623
2015-07-11 23:55:57rhettingersetfiles: + shuffle_timings.txt
2015-07-11 23:45:05rhettingersetfiles: + choice_timings.txt
2015-07-11 22:52:51rhettingersetassignee: rhettinger
2015-07-11 22:52:36rhettingersetmessages: + msg246616
2015-07-11 21:57:23rhettingersetmessages: + msg246615
2015-07-10 19:46:32serhiy.storchakasetmessages: + msg246581
2015-07-10 14:53:02vstinnersetmessages: + msg246566
2015-07-10 14:49:57tim.peterssetmessages: + msg246565
2015-07-10 09:02:48pitrousetmessages: + msg246551
2015-07-10 08:35:36vstinnersetmessages: + msg246550
2015-07-10 08:15:58pitrousetstage: needs patch
messages: + msg246548
versions: - Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
2015-07-10 00:25:05tim.peterssetmessages: + msg246537
2015-07-10 00:13:31vstinnersetmessages: + msg246536
2015-07-09 23:59:04tim.peterssetmessages: + msg246534
2015-07-09 23:44:35vstinnersetmessages: + msg246532
2015-07-09 23:42:22tim.peterssetmessages: + msg246531
2015-07-09 23:09:18pitrousetnosy: + pitrou
messages: + msg246530
2015-07-09 11:12:19skrahsetmessages: + msg246484
2015-07-06 14:44:31tim.peterssetmessages: + msg246366
2015-07-06 13:57:21mark.dickinsonsetmessages: + msg246363
2015-07-06 10:47:48skrahsetmessages: + msg246352
2015-07-05 19:42:13tim.peterssetmessages: + msg246336
2015-07-05 14:29:13mark.dickinsonsetmessages: + msg246318
2015-07-05 14:28:29mark.dickinsonsetmessages: + msg246317
2015-07-05 10:36:51vstinnersetmessages: + msg246303
2015-07-05 09:24:11skrahsetmessages: + msg246297
2015-07-05 03:04:00steven.dapranocreate