Message407078

Author | mark.dickinson |
---|---|

Recipients | mark.dickinson, rhettinger, steven.daprano, tim.peters |

Date | 2021-11-26.18:22:54 |

SpamBayes Score | -1.0 |

Marked as misclassified | Yes |

Message-id | <1637950975.07.0.707061882118.issue45876@roundup.psfhosted.org> |

In-reply-to |

Content | |
---|---|

Since I've failed to find a coherent statement and proof of the general principle I articulated anywhere online, I've included one below. (To be absolutely clear, the principle is not new - it's very well known, but oddly hard to find written down anywhere.) ---------------------- Setup: introducing jots ======================= *Notation.* R is the set of real numbers, Z is the set of integers, operators like *, / and ^ have their mathematical interpretations, not Python ones. Fix a precision p > 0 and an IEEE 754 binary floating-point format with precision p; write F for the set of representable values in that format, including zeros and infinities. (We don't need NaNs, but feel free to include them if you want to.) Let rnd : R -> F be the rounding function corresponding to any of the standard IEEE 754 rounding modes. We're not ignoring overflow and underflow here: rnd is assumed to round tiny values to +/-0 and large values to +/-infinity as normal. (We only really care about round-ties-to-even, but all of the below is perfectly general.) *Definition.* For the given fixed precision p, a *jot* is a subinterval of the positive reals of the form (m 2^e, (m+1) 2^e) for some integers m and e, with m satisfying 2^p <= m < 2^(p+1). This is a definition-of-convenience, invented purely for the purposes of this proof. (And yes, the name is silly. Suggestions for a better name to replace "jot" are welcome. Naming things is hard.) We've chosen the size of a jot so that between each consecutive pair a and b of positive normal floats in F, there are exactly two jots: one spanning from a to the midpoint (a+b)/2, and another spanning from (a+b)/2 to b. (Since jots are open, the midpoint itself and the floats a and b don't belong to any jot.) Now here's the key point: for values that aren't exactly representable and aren't perfect midpoints, the standard rounding modes, whether directed or round-to-nearest, only ever care about which side of the midpoint the value to be rounded lies. In other words: *Observation.* If x and y belong to the same jot, then rnd(x) = rnd(y). This is the point of jots: they represent the wiggle-room that we have to perturb a real number without affecting the way that it rounds under any of the standard rounding modes. *Note.* Between any two consecutive *subnormal* values, we have 4 or more jots, and above the maximum representable float we have infinitely many, but the observation that rnd is constant on jots remains true at both ends of the spectrum. Also note that jots, as defined above, don't cover the negative reals, but we don't need them to for what follows. Here's a lemma that we'll need shortly. *Lemma.* Suppose that I is an open interval of the form (m 2^e, (m+1) 2^e) for some integers m and e satisfying 2^p <= m. Then I is either a jot, or a subinterval of a jot. *Proof.* If m < 2^(p+1) then this is immediate from the definition. In the general case, m satisfies 2^q <= m < 2^(q+1) for some integer q with p <= q. Write n = floor(m / 2^(q-p)). Then: n <= m / 2^(q-p) < n + 1, so n * 2^(q-p) <= m < (n + 1) * 2^(q-p), so n * 2^(q-p) <= m and m + 1 <= (n + 1) * 2^(q-p) so n * 2^(e+q-p) <= m * 2^e and (m + 1) * 2^e <= (n + 1) * 2^(e+q-p) So I is a subinterval of (n * 2^(e+q-p), (n+1) * 2^(e+q-p)), which is a jot. The magic of round-to-odd ========================= *Definition.* The function to-odd : R -> Z is defined by: - to-odd(x) = x if x is an integer - to-odd(x) = floor(x) if x is not an integer and floor(x) is odd - to-odd(x) = ceil(x) if x is not an integer and floor(x) is even *Properties.* Some easy monotonicity properties of to-odd, with proofs left to the reader: - If x < 2n for real x and integer n, then to-odd(x) < to-odd(2n) - If 2n < x for real x and integer n, then to-odd(2n) < to-odd(x) Here's a restatement of the main principle. *Proposition.* With p and rnd as in the previous section, suppose that x is a positive real number and that e is any integer satisfying 2^e <= x. Define a real number y by: y = 2^(e-p-1) to-odd(x / 2^(e-p-1)) Then rnd(y) = rnd(x). Proof of the principle ====================== In a nutshell, we show that either - y = x, or - x and y belong to the same jot Either way, since rnd is constant on jots, we get rnd(y) = rnd(x). Case 1: x = m * 2^(e-p) for some integer m. Then x / 2^(e-p-1) = 2m is an (even) integer, so to-odd(x / 2^(e-p-1)) = (x / 2^(e-p-1)) and y = x. Hence rnd(y) = rnd(x). Case 2: m * 2^(e-p) < x < (m + 1) * 2^(e-p) for some integer m. Then rearranging, 2m < x / 2^(e-p-1) < 2(m+1). So from the monotonicity properties of to-odd we have: 2m < to-odd(x / 2^(e-p-1)) < 2(m+1) And multiplying through by 2^(e-p-1) we get m * 2^(e-p) < y < (m+1) * 2^(e-p). So both x and y belong to the interval I = (m*2^(e-p), (m+1)*2^(e-p-1)). Furthermore, 2^e <= x < (m + 1) * 2^(e-p), so 2^p < m + 1, and 2^p <= m. So by the lemma above, I is either a jot or a subinterval of a jot, so x and y belong to the same jot and rnd(x) = rnd(y). This completes the proof. |

History | |||
---|---|---|---|

Date | User | Action | Args |

2021-11-26 18:22:55 | mark.dickinson | set | recipients: + mark.dickinson, tim.peters, rhettinger, steven.daprano |

2021-11-26 18:22:55 | mark.dickinson | set | messageid: <1637950975.07.0.707061882118.issue45876@roundup.psfhosted.org> |

2021-11-26 18:22:55 | mark.dickinson | link | issue45876 messages |

2021-11-26 18:22:54 | mark.dickinson | create |