Issue 36493

Created on **2019-03-31 13:10** by **scoder**, last changed **2019-04-02 16:08** by **steven.daprano**. This issue is now **closed**.

Messages (7)

**msg339257** | Author: Stefan Behnel (scoder) * | Date: 2019-03-31 13:10

I recently read a paper¹ about the difficulty of computing the most exact midpoint between two floating point values, in the face of overflow, underflow and rounding errors. The intention is to ensure that midpoint(a, b) is

- actually within the interval [a, b]
- the floating point number in that interval that is closest to the real midpoint, i.e. (a+b)/2

It feels like a function that should be part of Python's math module.

¹ https://hal.archives-ouvertes.fr/file/index/docid/576641/filename/computing-midpoint.pdf

The author proposes the following implementation (pages 20/21):

```
midpoint(a, b) =
    a         if a == b       # covers midpoint(inf, inf)
    0         if a == -b      # covers midpoint(-inf, +inf)
    -realmax  if a == -inf    # min. double value
    +realmax  if b == +inf    # max. double value
    round_to_nearest_even((a - a/2) + b/2)  otherwise
```

I guess NaNs should simply pass through as in other functions, i.e. midpoint(a, nan) == midpoint(nan, b) == nan.

The behaviour for [a, inf] is decidedly up for discussion. There are certainly cases where midpoint(a, +inf) would best return +inf, but +realmax as an actual finite value also seems reasonable. OTOH, it's easy for users to promote inf -> realmax or realmax -> inf or even inf -> a themselves, just like it's easy to check for +/-inf before calling the function. It just takes a bit longer to do these checks on the user side.

There could also be a "mode" argument that makes it return one of: a or b (i.e. the finite bound), +/-realmax, or +/-inf in the two half-infinity cases.

What do you think about this addition?
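[Editor's note] The proposed rules transcribe almost directly into Python. The following is only a sketch of the paper's proposal, not part of the thread: the NaN handling follows the suggestion above, `realmax` is taken to be `sys.float_info.max`, and the formula assumes a <= b.

```python
import math
import sys

def midpoint(a: float, b: float) -> float:
    """Sketch of the paper's midpoint(a, b), assuming a <= b."""
    if math.isnan(a) or math.isnan(b):
        return math.nan               # NaNs pass through
    if a == b:
        return a                      # covers midpoint(inf, inf)
    if a == -b:
        return 0.0                    # covers midpoint(-inf, +inf)
    if a == -math.inf:
        return -sys.float_info.max    # -realmax: min. double value
    if b == math.inf:
        return sys.float_info.max     # +realmax: max. double value
    # CPython floats are IEEE 754 doubles rounding to nearest/even, so
    # this matches the paper's round_to_nearest_even((a - a/2) + b/2).
    return (a - a/2) + b/2

print(midpoint(1.0, 2.0))        # 1.5
print(midpoint(1e308, 1.5e308))  # 1.25e+308, where (a + b)/2 would overflow
```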

**msg339275** | Author: Christian Heimes (christian.heimes) * | Date: 2019-03-31 20:22

What's the argument for midpoint(inf, -inf) == 0? This feels wrong to me. I'm certainly not an expert on math, but I still remember that there are different kinds of infinities. For example, 'n**n' approaches infinity 'faster' than '-(2**n)'.

**msg339281** | Author: Mark Dickinson (mark.dickinson) * | Date: 2019-04-01 07:03

Yes, I'd definitely expect `midpoint(inf, -inf)` to be `nan`.

**msg339282** | Author: Mark Dickinson (mark.dickinson) * | Date: 2019-04-01 07:12

Special cases aside, I think this is a YAGNI. The "obvious" formulas `(a + b)/2` and `0.5 * (a + b)` _do_ do exactly the right thing (including giving a perfectly correctly-rounded answer with round-ties-to-even on a typical IEEE 754-using machine) provided that subnormals and values very close to the upper limit are avoided. If you're doing floating-point arithmetic with values of size > 1e300, you've probably already got significant issues.

I could see specialist uses for this, e.g. in a general-purpose bisection algorithm, but I'm not convinced it's worth adding something to the math library just for that.
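[Editor's note] To make the caveat about the extremes concrete (my own illustration, not from the thread): each simple rearrangement of the midpoint formula fails at one end of the double range, which is what the paper's special-cased formula is designed to sidestep.

```python
import math
import sys

big = sys.float_info.max    # largest finite double, ~1.798e308
print((big + big) / 2)      # inf: the sum overflows before the division
print(big / 2 + big / 2)    # the halved form recovers big exactly here

tiny = 5e-324               # smallest positive subnormal double
print((tiny + tiny) / 2)    # 5e-324: the naive form is exact here
print(tiny / 2 + tiny / 2)  # 0.0: halving a subnormal rounds to zero
```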

**msg339285** | Author: Tim Peters (tim.peters) * | Date: 2019-04-01 08:02

I'm inclined to agree with Mark - the wholly naive algorithm is already "perfect" away from extreme (whether large or small) magnitudes, and can't be beaten for speed either. This kind of thing gets much more attractive in IEEE single precision, where getting near the extremes can be common in some apps.

That said, at a higher level there's no end to FP functions that _could_ be implemented to be more accurate in some (even many) cases. That it's possible isn't enough, on its own, to justify all the eternal costs of supplying it. Significant real-life use cases make for far more compelling justification.

For new FP stuff, the first thing I ask is "does numpy/scipy/mpmath supply it?". I don't know in this case, but if the answer is "no", then it's a pretty safe bet that number-crunching Python users have insignificant real use for it. If the answer is "yes", then the question changes to whether Pythoneers other than those _already_ using the major number-crunching Python packages have significant real uses for it.

On that basis, e.g., accepting the whole `statistics` module was an easy call (many real uses outside of just specialists), but adding a function to do LU decomposition of a matrix is an easy reject (exceedingly useful, in fact, but those who need it almost certainly already use the Python packages that already supply it).

**msg339304** | Author: Stefan Behnel (scoder) * | Date: 2019-04-01 19:41

I buy the YAGNI argument and won't fight for this. Thanks for your input.

**msg339349** | Author: Steven D'Aprano (steven.daprano) * | Date: 2019-04-02 16:08

I see this has been rejected for the math module, but I wonder whether it should be considered for Decimal? I recall Mark Dickinson demonstrating this lovely little bit of behaviour:

```
py> from decimal import getcontext, Decimal
py> getcontext().prec = 3
py> x = Decimal('0.516')
py> y = Decimal('0.518')
py> (x + y) / 2
Decimal('0.515')
```

To prove it's not a fluke, or an artifact of tiny numbers:

```
py> getcontext().prec = 28
py> a = Decimal('0.10000000000000000009267827205')
py> b = Decimal('0.10000000000000000009267827207')
py> a <= (a + b)/2 <= b
False
py> a = Decimal('710000000000000000009267827205')
py> b = Decimal('710000000000000000009267827207')
py> a <= (a + b)/2 <= b
False
```

Thoughts?

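[Editor's note] One way to see why the first example above goes wrong (my annotation, not part of the thread): at precision 3 the sum is rounded before the division, and the division of the rounded sum then lands below the interval.

```python
from decimal import Decimal, getcontext

getcontext().prec = 3
x = Decimal('0.516')
y = Decimal('0.518')

s = x + y
print(s)             # 1.03: the exact sum 1.034 is rounded to 3 digits first
m = s / 2
print(m)             # 0.515: already below the interval [0.516, 0.518]
print(x <= m <= y)   # False
```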
History

Date | User | Action | Args
---|---|---|---
2019-04-02 16:08:42 | steven.daprano | set | nosy: + steven.daprano; messages: + msg339349
2019-04-01 19:41:14 | scoder | set | status: open -> closed; resolution: rejected; messages: + msg339304; stage: resolved
2019-04-01 08:02:44 | tim.peters | set | messages: + msg339285
2019-04-01 07:13:32 | mark.dickinson | set | nosy: + tim.peters
2019-04-01 07:12:31 | mark.dickinson | set | messages: + msg339282
2019-04-01 07:03:08 | mark.dickinson | set | messages: + msg339281
2019-03-31 20:22:44 | christian.heimes | set | nosy: + christian.heimes; messages: + msg339275
2019-03-31 13:10:06 | scoder | create |