Author mark.dickinson
Recipients mark.dickinson, rhettinger, steven.daprano, tim.peters
Date 2021-11-26.18:22:54
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Since I've failed to find a coherent statement and proof of the general principle I articulated anywhere online, I've included one below. (To be absolutely clear, the principle is not new - it's very well known, but oddly hard to find written down anywhere.)


Setup: introducing jots

*Notation.* R is the set of real numbers, Z is the set of integers, operators
like *, / and ^ have their mathematical interpretations, not Python ones.

Fix a precision p > 0 and an IEEE 754 binary floating-point format with
precision p; write F for the set of representable values in that format,
including zeros and infinities. (We don't need NaNs, but feel free to include
them if you want to.)

Let rnd : R -> F be the rounding function corresponding to any of the standard
IEEE 754 rounding modes. We're not ignoring overflow and underflow here: rnd is
assumed to round tiny values to +/-0 and large values to +/-infinity as normal.
(We only really care about round-ties-to-even, but all of the below is
perfectly general.)

*Definition.* For the given fixed precision p, a *jot* is a subinterval of the
positive reals of the form (m 2^e, (m+1) 2^e) for some integers m and e, with
m satisfying 2^p <= m < 2^(p+1).

This is a definition-of-convenience, invented purely for the purposes of this
proof. (And yes, the name is silly. Suggestions for a better name to replace
"jot" are welcome. Naming things is hard.)

We've chosen the size of a jot so that between each consecutive pair a and b of
positive normal floats in F, there are exactly two jots: one spanning from a to
the midpoint (a+b)/2, and another spanning from (a+b)/2 to b. (Since jots are
open, the midpoint itself and the floats a and b don't belong to any jot.)

Now here's the key point: for values that aren't exactly representable and
aren't perfect midpoints, the standard rounding modes, whether directed or
round-to-nearest, only ever care about which side of the midpoint the value to
be rounded lies. In other words:

*Observation.* If x and y belong to the same jot, then rnd(x) = rnd(y).

This is the point of jots: they represent the wiggle-room that we have to
perturb a real number without affecting the way that it rounds under any
of the standard rounding modes.

*Note.* Between any two consecutive *subnormal* values, we have 4 or more
jots, and above the maximum representable float we have infinitely many, but
the observation that rnd is constant on jots remains true at both ends of the
spectrum. Also note that jots, as defined above, don't cover the negative
reals, but we don't need them to for what follows.

Here's a lemma that we'll need shortly.

*Lemma.* Suppose that I is an open interval of the form (m 2^e, (m+1) 2^e)
for some integers m and e satisfying 2^p <= m. Then I is either a jot, or a
subinterval of a jot.

*Proof.* If m < 2^(p+1) then this is immediate from the definition. In
the general case, m satisfies 2^q <= m < 2^(q+1) for some integer q with
p <= q. Write n = floor(m / 2^(q-p)). Then:

    n <= m / 2^(q-p) < n + 1, so
    n * 2^(q-p) <= m < (n + 1) * 2^(q-p), so
    n * 2^(q-p) <= m and m + 1 <= (n + 1) * 2^(q-p)


   n * 2^(e+q-p) <= m * 2^e and (m + 1) * 2^e <= (n + 1) * 2^(e+q-p)

So I is a subinterval of (n * 2^(e+q-p), (n+1) * 2^(e+q-p)), which is a jot.

The magic of round-to-odd

*Definition.* The function to-odd : R -> Z is defined by:

- to-odd(x) = x if x is an integer
- to-odd(x) = floor(x) if x is not an integer and floor(x) is odd
- to-odd(x) = ceil(x) if x is not an integer and floor(x) is even

*Properties.* Some easy monotonicity properties of to-odd, with proofs left
to the reader:

- If x < 2n for real x and integer n, then to-odd(x) < to-odd(2n)
- If 2n < x for real x and integer n, then to-odd(2n) < to-odd(x)

Here's a restatement of the main principle.

*Proposition.* With p and rnd as in the previous section, suppose that x is a
positive real number and that e is any integer satisfying 2^e <= x. Define a
real number y by:

    y = 2^(e-p-1) to-odd(x / 2^(e-p-1))

Then rnd(y) = rnd(x).

Proof of the principle

In a nutshell, we show that either

- y = x, or
- x and y belong to the same jot

Either way, since rnd is constant on jots, we get rnd(y) = rnd(x).

Case 1: x = m * 2^(e-p) for some integer m. Then x / 2^(e-p-1) = 2m is
an (even) integer, so to-odd(x / 2^(e-p-1)) = (x / 2^(e-p-1)) and y = x.
Hence rnd(y) = rnd(x).

Case 2: m * 2^(e-p) < x < (m + 1) * 2^(e-p) for some integer m.

Then rearranging, 2m < x / 2^(e-p-1) < 2(m+1). So from the monotonicity
properties of to-odd we have:

   2m < to-odd(x / 2^(e-p-1)) < 2(m+1)

And multiplying through by 2^(e-p-1) we get

   m * 2^(e-p) < y < (m+1) * 2^(e-p).

So both x and y belong to the interval I = (m*2^(e-p), (m+1)*2^(e-p-1)).

Furthermore, 2^e <= x < (m + 1) * 2^(e-p), so 2^p < m + 1, and 2^p <= m.
So by the lemma above, I is either a jot or a subinterval of a jot, so x
and y belong to the same jot and rnd(x) = rnd(y). This completes the proof.
Date User Action Args
2021-11-26 18:22:55mark.dickinsonsetrecipients: + mark.dickinson, tim.peters, rhettinger, steven.daprano
2021-11-26 18:22:55mark.dickinsonsetmessageid: <>
2021-11-26 18:22:55mark.dickinsonlinkissue45876 messages
2021-11-26 18:22:54mark.dickinsoncreate