Fused multiply-add (henceforth FMA) is an operation that computes the product of two numbers and adds a third number to that product with only one floating-point rounding. More concretely:
r = x*y + z
The value of `r` is the same as if the right-hand side were calculated with infinite precision and then rounded to a 32-bit single-precision or 64-bit double-precision floating-point number [1].
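To make the single-rounding property concrete, here is a minimal sketch that emulates the operation with exact rational arithmetic. The helper name `fma_exact` is hypothetical and is not part of any library; it assumes finite inputs and does not reproduce the signed-zero, infinity or NaN behaviour of a real `fma`.

```python
from fractions import Fraction

def fma_exact(x, y, z):
    """Hypothetical reference helper: x*y + z computed exactly, rounded once.

    Assumes finite float inputs; infinities and NaNs are not handled.
    """
    # Fraction arithmetic on floats is exact, and float() rounds the exact
    # rational result to the nearest double a single time.
    return float(Fraction(x) * Fraction(y) + Fraction(z))

x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30
z = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision,
# so the separate multiply and add lose the small term entirely.
two_roundings = x * y + z
one_rounding = fma_exact(x, y, z)

print(two_roundings)  # 0.0
print(one_rounding)   # -2**-60, about -8.67e-19
```

The emulation is far slower than a hardware instruction and is only meant to show which value a correctly rounded FMA should return.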
Even though one FMA CPU instruction might execute faster than separate multiply and add instructions, its main advantage comes from the increased precision of numerical computations that involve the accumulation of products. Examples that benefit from using FMA include the dot product [2], compensated arithmetic [3], polynomial evaluation [4], matrix multiplication, Newton's method and many more [5].
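As an illustration of accumulating products, the following sketch compares a plain dot product with one that uses a fused step. It relies on a hypothetical `fma_exact` helper that emulates FMA with exact rational arithmetic for finite inputs; the function names and the sample vectors are assumptions chosen for the example.

```python
from fractions import Fraction

def fma_exact(x, y, z):
    """Hypothetical FMA emulation for finite floats: exact x*y + z, rounded once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def dot_plain(a, b):
    """Dot product with a rounded multiply followed by a rounded add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = acc + x * y  # two roundings per step
    return acc

def dot_fma(a, b):
    """Dot product where each step is a single fused multiply-add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = fma_exact(x, y, acc)  # one rounding per step
    return acc

a = [1.0, 1.0 + 2.0**-30]
b = [-1.0, 1.0 - 2.0**-30]

# The exact result is -1 + (1 - 2**-60) = -2**-60.
print(dot_plain(a, b))  # 0.0, the small term is lost when the product is rounded
print(dot_fma(a, b))    # -2**-60, about -8.67e-19
```

The vectors are picked so that cancellation exposes the rounding of the intermediate product; for well-conditioned inputs the two versions usually agree to within a few units in the last place.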
C99 adds [6] the `fma` function to `math.h`, and C library implementations emulate the calculation if the FMA instruction is not present on the host CPU [7]. PEP 7 states that "Python versions greater than or equal to 3.6 use C89 with several select C99 features" and that "Future C99 features may be added to this list in the future depending on compiler support" [8].
This proposal is therefore about adding a new `fma` function with the following signature to the `math` module:
math.fma(x, y, z)
'''Return a float representing the result of the operation `x*y + z` with a single rounding, as defined by the platform C library. The result is the same as if the operation was carried out with infinite precision and then rounded to a floating-point number.'''
Attached is a simple module for Python 3 demonstrating the fused multiply-add operation. On my machine, `example.py` prints:
40037.524591982365 horner_double
40037.48821639768 horner_fma
40037.49486325783 horner_compensated
40037.49486325783 horner_decimal
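For readers without the attachment, here is a minimal sketch of how a Horner scheme with and without a fused step might look. It is not the attached `example.py`: the helper `fma_exact` is a hypothetical emulation based on exact rational arithmetic for finite inputs, and the polynomial and evaluation point are assumptions chosen so that the difference is easy to verify by hand.

```python
from fractions import Fraction

def fma_exact(x, y, z):
    """Hypothetical FMA emulation for finite floats: exact x*y + z, rounded once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def horner_double(coeffs, x):
    """Horner scheme with separate multiply and add (coefficients highest degree first)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def horner_fma(coeffs, x):
    """Horner scheme where every step is one fused multiply-add."""
    acc = 0.0
    for c in coeffs:
        acc = fma_exact(acc, x, c)
    return acc

def horner_exact(coeffs, x):
    """Exact reference value computed with rational arithmetic."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * Fraction(x) + Fraction(c)
    return acc

# p(x) = x**2 - 2*x + 1 = (x - 1)**2, evaluated very close to its double root.
coeffs = [1.0, -2.0, 1.0]
x = 1.0 + 2.0**-30

print(horner_double(coeffs, x))        # 0.0
print(horner_fma(coeffs, x))           # 2**-60, about 8.67e-19
print(float(horner_exact(coeffs, x)))  # 2**-60, the exact value
```

The fused variant happens to be exact here because only the last step needs rounding; in general it reduces the rounding error per step rather than eliminating it.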
[1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
[2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA. 2006.
[3] S. Graillat. Accurate Floating Point Product and Exponentiation. 2007.
[4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner scheme with a Fused Multiply and Add. 2006.
[5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, S. Torres. Handbook of Floating-Point Arithmetic. 2010. Chapter 5.
[6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft - September 7, 2007.
[7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c
[8] https://www.python.org/dev/peps/pep-0007/