msg285547 - (view) |
Author: Juraj Sukop (juraj.sukop) |
Date: 2017-01-16 10:01 |
Fused multiply-add (henceforth FMA) is an operation which calculates the product of two numbers and then the sum of the product and a third number with just one floating-point rounding. More concretely:
r = x*y + z
The value of `r` is the same as if the RHS was calculated with infinite precision and the rounded to a 32-bit single-precision or 64-bit double-precision floating-point number [1].
Even though one FMA CPU instruction might be calculated faster than the two separate instructions for multiply and add, its main advantage comes from the increased precision of numerical computations that involve the accumulation of products. Examples which benefit from using FMA are: dot product [2], compensated arithmetic [3], polynomial evaluation [4], matrix multiplication, Newton's method and many more [5].
C99 includes [6] `fma` function to `math.h` and emulates the calculation if the FMA instruction is not present on the host CPU [7]. PEP 7 states that "Python versions greater than or equal to 3.6 use C89 with several select C99 features" and that "Future C99 features may be added to this list in the future depending on compiler support" [8].
This proposal is then about adding new `fma` function with the following signature to `math` module:
math.fma(x, y, z)
'''Return a float representing the result of the operation `x*y + z` with single rounding error, as defined by the platform C library. The result is the same as if the operation was carried with infinite precision and rounded to a floating-point number.'''
Attached is a simple module for Python 3 demonstrating the fused multiply-add operation. On my machine, `example.py` prints:
40037.524591982365 horner_double
40037.48821639768 horner_fma
40037.49486325783 horner_compensated
40037.49486325783 horner_decimal
[1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
[2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA. 2006
[3] S. Graillat, Accurate Floating Point Product and Exponentiation. 2007.
[4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner scheme with a Fused Multiply and Add. 2006
[5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, S. Torres. Handbook of Floating-Point Arithmetic. 2010. Chapter 5
[6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft - Septermber 7, 2007
[7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c
[8] https://www.python.org/dev/peps/pep-0007/
|
msg285549 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-01-16 10:04 |
Thread on python-ideas:
https://mail.python.org/pipermail/python-ideas/2017-January/044300.html
|
msg285550 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2017-01-16 10:09 |
What's the point of adding this in the math module rather than a more specialized library like Numpy?
|
msg285551 - (view) |
Author: Juraj Sukop (juraj.sukop) |
Date: 2017-01-16 10:40 |
I would say because it has wide applicability, especially considering the amount of code it adds. It is similar in spirit to `copysign` or `fsum` which are already there and C99 includes it anyway. For simpler things like dot product or polynomial evaluation importing Numpy might be too much of an dependency.
|
msg285552 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2017-01-16 10:46 |
I don't know. If I want to compute a dot product, the first thing I'll do is import Numpy and then use the `@` operator on Numpy arrays.
|
msg285553 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-16 11:05 |
The performance argument unlikely is applicable in this case. I suppose that an overhead of function call in Python make two operators faster than one function call.
Alternatives to fma() for exact computations are integer arithmetic (if all values can be scaled to integers), fractions and decimal numbers.
But since fma() is a part of C (C99), C++ (C++11) and POSIX (POSIX.1-2001) standards for long time, I don't have objections against including it in the math module.
|
msg285557 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-16 11:36 |
> The performance argument unlikely is applicable in this case.
Agreed. This is mainly about accuracy, not speed: the FMA operation is a fundamental building block for floating-point arithmetic, is useful in some numerical algorithms, and essential in others (especially when doing things like double-double arithmetic). It would be valuable to have when prototyping numerical algorithms in pure Python.
Given that it's supported in C99 and on current Windows, I'm +1 on including it in the math module.
Note that implementing this it not quite as straightforward as simply wrapping the libm version, since we'll also want the correct exceptional behaviour, for consistency with the rest of the math module: i.e., we should be raising ValueError where the fma operation would signal the invalid FPE, and OverflowError where the fma operation would signal the overflow FPE.
|
msg285564 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-16 13:11 |
An implementation note: IEEE 754-2008 leaves it to the implementation to decide whether FMA operations like:
0 * inf + nan
and
inf * 0 + nan
(where nan represents a quiet NaN and the inf and 0 can have arbitrary signs) signal the invalid operation FPE or not. (Ref: 7.2(c) in IEEE 754-2008.)
I'd suggest that in this case we follow what Intel does in its x64 chips with FMA3 support.(according to ). If I'm reading the table in section 2.3 of the Intel Advanced Vector Extensions Programming Reference correctly, Intel does *not* signal the invalid operation FPE in this case. That is, we're following the usual rule of quiet NaN in => quiet NaN out with no exception. This does unfortunately conflict with the IBM decimal specification and Python's decimal module, where these operations *do* set the "invalid" flag (see the spec, and test fmax0809 in the decimal test set).
|
msg285565 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2017-01-16 13:14 |
Isn't the behaviour of quiet NaNs kindof implementation-dependent already?
|
msg285566 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-16 13:26 |
> Isn't the behaviour of quiet NaNs kindof implementation-dependent already?
Not as far as IEEE 754-2008 is concerned, and not as far as Python's math module is concerned, either: handling of special cases is, as far as I know, both consistent across platforms and compliant with IEEE 754. That's not true across Python as a whole, but it should be true for the math module.
If you find an exception to the above statement, please do open a bug report!
|
msg285582 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-16 19:04 |
Here's a pure Python reference implementation, with tests.
|
msg285682 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-17 21:02 |
And here's a patch.
|
msg285685 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-17 21:47 |
LGTM except that needed the versionadded directive and What's New entry. And maybe few additional tests.
|
msg285836 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-19 19:44 |
Serhiy, Victor: thanks for the reviews. Here's a new patch. Differences w.r.t. the old one:
- Converted to argument clinic.
- Updated docstring to talk about special cases.
- More tests, as suggested by Serhiy.
- whatsnew entry and ..versionadded in docs.
|
msg285837 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-19 19:46 |
Whoops; looks like I failed to attach the updated patch. Here it is.
|
msg285839 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-19 20:00 |
LGTM except that lines in What's New are too long.
|
msg285840 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-19 20:03 |
> lines in What's New are too long.
Thanks. Fixed (I think). I'm not sure what the limit is, but the lines are now all <= 79 characters long.
|
msg285841 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-19 20:04 |
Ah, the dev guide says 80 characters. (https://docs.python.org/devguide/documenting.html)
|
msg285843 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-19 20:13 |
Then LGTM unconditionally.
|
msg285844 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-19 20:23 |
Thanks, Serhiy. I'm going to hold off committing this for 24 hours or so, because I want to follow the buildbots when I do (and I don't have time for that right now). I wouldn't be at all surprised to see platform-specific test failures.
|
msg285955 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2017-01-21 12:43 |
New changeset b33012ef1417 by Mark Dickinson in branch 'default':
Issue #29282: add fused multiply-add function, math.fma.
https://hg.python.org/cpython/rev/b33012ef1417
|
msg285956 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-21 12:45 |
Cross fingers...
|
msg285957 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-21 13:07 |
Failures on the Windows buildbot (http://buildbot.python.org/all/builders/AMD64%20Windows8.1%20Non-Debug%203.x/builds/238/steps/test/logs/stdio) shown below.
It looks as though Windows is emulating the FMA operation on this machine (and not doing a particularly good job of it). That means that if we want to support Windows (and we do), we may have to emulate ourselves, preferably using something a bit more efficient than the fractions.Fraction module.
I'll let the buildbots complete, to see what else fails, and then roll back the commit. The patch clearly isn't good enough in its current state.
======================================================================
ERROR: test_fma_overflow (test.test_math.FMATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\buildarea\3.x.ware-win81-release\build\lib\test\test_math.py", line 1565, in test_fma_overflow
self.assertEqual(math.fma(a, b, -c),
OverflowError: overflow in fma
======================================================================
FAIL: test_fma_zero_result (test.test_math.FMATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\buildarea\3.x.ware-win81-release\build\lib\test\test_math.py", line 1524, in test_fma_zero_result
self.assertIsNegativeZero(math.fma(tiny, -tiny, 0.0))
File "D:\buildarea\3.x.ware-win81-release\build\lib\test\test_math.py", line 1642, in assertIsNegativeZero
msg="Expected a negative zero, got {!r}".format(value)
AssertionError: False is not true : Expected a negative zero, got 0.0
======================================================================
FAIL: test_random (test.test_math.FMATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\buildarea\3.x.ware-win81-release\build\lib\test\test_math.py", line 1623, in test_random
self.assertEqual(math.fma(a, b, c), expected)
AssertionError: 0.5506672157701096 != 0.5506672157701097
|
msg285958 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2017-01-21 13:11 |
New changeset b5a5f13500b9 by Mark Dickinson in branch 'default':
Issue #29282: Backed out changeset b33012ef1417
https://hg.python.org/cpython/rev/b5a5f13500b9
|
msg285959 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-21 13:17 |
Also failures on Gentoo: here b is positive (possibly +inf), and c is finite, so we expect an infinite result. Instead, we're apparently getting a NaN. I don't have a good guess about what's causing this: the rest of the tests are passing, so it's unlikely that we're using a bad FMA emulation. Maybe an optimization bug?
======================================================================
ERROR: test_fma_infinities (test.test_math.FMATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/buildbot/buildarea/3.x.ware-gentoo-x86.installed/build/target/lib/python3.7/test/test_math.py", line 1482, in test_fma_infinities
self.assertEqual(math.fma(math.inf, b, c), math.inf)
ValueError: invalid operation in fma
|
msg286201 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2017-01-24 18:14 |
The patch needs tests for the case where a*b overflows and c is infinite (either of the same sign as a*b or not). This combination should never return NaN, but a poor emulation of fma might do so.
|
msg312431 - (view) |
Author: Nico Schlömer (nschloe) |
Date: 2018-02-20 20:07 |
Do I read this thread correctly assuming that this hasn't been implemented yet? If not, I would probably make my own little library for this -- I really need the feature for the precision.
|
msg312432 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2018-02-20 20:10 |
> Do I read this thread correctly assuming that this hasn't been implemented yet?
Yes. Existing libm implementations don't work, so simply wrapping the libm function isn't enough. And writing an implementation from scratch is non-trivial.
|
msg312433 - (view) |
Author: Nico Schlömer (nschloe) |
Date: 2018-02-20 20:18 |
> Existing libm implementations don't work,
Okay. Is this because of the inf/NaN discrimination hiccups mentioned above or are there any other pitfalls?
|
msg312480 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2018-02-21 09:12 |
> Is this because of the inf/NaN discrimination hiccups [...]
No, more than that. If it were just that, we could work around it by adding the appropriate special case handling before calling the libm fma (as has been done, reluctantly, with some of the other math module functions; see the source for math.pow, for example).
But the fma implementation on Windows is fundamentally broken. For finite numbers, it simply doesn't give what it's supposed to (a * b + c, computed with a _single_ rounding). Since that single rounding is most of the point of fma, that makes the libm fma not fit for purpose on Windows.
It _is_ possible, with care, to code up a not-too-inefficient fma emulation using clever tricks like Veltkamp splitting and Dekker multiplication. I have half such an implementation sitting on my home computer, but so far have not had the cycles to finish it (and it's not high on the priority list right now).
|
msg312488 - (view) |
Author: Nico Schlömer (nschloe) |
Date: 2018-02-21 10:50 |
Okay, thanks for the info.
As a stop-gap measure, I've created pyfma [1, 2]. Install with
```
pip install pyfma
```
and use with
```
pyfma.fma(3.0, 2.0, 1.0)
```
Only works on Unix reliable then, but that's all I care about. :)
Cheers,
Nico
[1] https://github.com/nschloe/pyfma
[2] https://pypi.python.org/pypi/pyfma
|
msg359903 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-01-13 14:11 |
I converted https://hg.python.org/cpython/rev/b33012ef1417 written by Mark Dickinson into a GitHub PR: PR 17987. I still expect tests failures. I plan to use the PR as a starting point to implement math.fma(). If tests continue to fail on some platforms, I plan to manually handle NaN and INF in the C code, before calling libc fma().
|
msg359911 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2020-01-13 14:39 |
> If tests continue to fail on some platforms, I plan to manually handle NaN and INF in the C code, before calling libc fma().
For Windows, you need to do much more than this: it's not just about handling NaNs and infinities, it's about reimplementing the entire function from scratch to give correctly rounded results. Without correctly-rounded results, there's very little point in having fma.
If it were a couple of niche platforms that gave bad results, then we could push this through. But it's Windows. :-(
|
msg359913 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2020-01-13 14:53 |
Okay, looks like Windows is happy in the PR's continuous integration. If the buildbots are also happy, then I'm content to have this pushed through.
|
msg359921 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-01-13 18:04 |
> For Windows, you need to do much more than this: it's not just about handling NaNs and infinities, it's about reimplementing the entire function from scratch to give correctly rounded results. Without correctly-rounded results, there's very little point in having fma.
Would it make sense to only make the function available on non-Windows platforms? As we do for other Unix-only functions in the os module. Maybe even skip more platforms if they provide a broken implementation.
We could implement a test suite in configure to decide if fma() fits our requirements or not.
|
msg359926 - (view) |
Author: Juraj Sukop (juraj.sukop) |
Date: 2020-01-13 19:17 |
FWIW, there is a new implementation of FMA [1] which is licensed very permissively [2]. Perhaps it could be used here as well..?
[1] https://github.com/smasher164/fma
[2] https://github.com/smasher164/fma/commit/4e85d2388c7d4d850be12df918f9431ca687f57a
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:42 | admin | set | github: 73468 |
2021-06-19 09:24:45 | mark.dickinson | set | assignee: mark.dickinson -> |
2020-01-13 19:17:41 | juraj.sukop | set | messages:
+ msg359926 |
2020-01-13 18:04:03 | vstinner | set | messages:
+ msg359921 |
2020-01-13 14:53:17 | mark.dickinson | set | messages:
+ msg359913 |
2020-01-13 14:39:15 | mark.dickinson | set | messages:
+ msg359911 |
2020-01-13 14:11:05 | vstinner | set | messages:
+ msg359903 |
2020-01-13 14:07:29 | vstinner | set | stage: commit review -> patch review pull_requests:
+ pull_request17391 |
2020-01-13 02:30:39 | steven.daprano | set | nosy:
+ steven.daprano
|
2019-05-16 23:11:19 | donovick | set | nosy:
+ donovick
|
2018-02-21 10:50:37 | nschloe | set | messages:
+ msg312488 |
2018-02-21 09:12:24 | mark.dickinson | set | messages:
+ msg312480 |
2018-02-20 20:18:02 | nschloe | set | messages:
+ msg312433 |
2018-02-20 20:10:17 | mark.dickinson | set | messages:
+ msg312432 |
2018-02-20 20:07:36 | nschloe | set | nosy:
+ nschloe messages:
+ msg312431
|
2017-01-24 18:17:04 | eric.fahlgren | set | nosy:
+ eric.fahlgren
|
2017-01-24 18:14:11 | mark.dickinson | set | messages:
+ msg286201 |
2017-01-21 13:17:43 | mark.dickinson | set | messages:
+ msg285959 |
2017-01-21 13:11:09 | python-dev | set | messages:
+ msg285958 |
2017-01-21 13:07:14 | mark.dickinson | set | messages:
+ msg285957 |
2017-01-21 12:45:13 | serhiy.storchaka | set | messages:
+ msg285956 |
2017-01-21 12:43:25 | python-dev | set | nosy:
+ python-dev messages:
+ msg285955
|
2017-01-19 20:23:13 | mark.dickinson | set | messages:
+ msg285844 |
2017-01-19 20:13:18 | serhiy.storchaka | set | messages:
+ msg285843 |
2017-01-19 20:04:38 | mark.dickinson | set | messages:
+ msg285841 |
2017-01-19 20:03:34 | mark.dickinson | set | files:
+ math_fma3.patch
messages:
+ msg285840 |
2017-01-19 20:00:44 | serhiy.storchaka | set | assignee: mark.dickinson messages:
+ msg285839 stage: patch review -> commit review |
2017-01-19 19:48:43 | serhiy.storchaka | set | messages:
- msg285838 |
2017-01-19 19:48:04 | serhiy.storchaka | set | messages:
+ msg285838 |
2017-01-19 19:46:16 | mark.dickinson | set | files:
+ math_fma2.patch
messages:
+ msg285837 |
2017-01-19 19:44:53 | mark.dickinson | set | messages:
+ msg285836 |
2017-01-17 21:47:28 | serhiy.storchaka | set | messages:
+ msg285685 |
2017-01-17 21:15:21 | mark.dickinson | set | stage: patch review |
2017-01-17 21:02:46 | mark.dickinson | set | files:
+ math_fma.patch keywords:
+ patch messages:
+ msg285682
|
2017-01-16 19:04:14 | mark.dickinson | set | files:
+ fma_reference.py
messages:
+ msg285582 |
2017-01-16 13:26:09 | mark.dickinson | set | messages:
+ msg285566 |
2017-01-16 13:14:42 | pitrou | set | messages:
+ msg285565 |
2017-01-16 13:11:01 | mark.dickinson | set | nosy:
+ skrah messages:
+ msg285564
|
2017-01-16 11:36:54 | mark.dickinson | set | messages:
+ msg285557 |
2017-01-16 11:05:48 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg285553
|
2017-01-16 10:46:49 | pitrou | set | messages:
+ msg285552 |
2017-01-16 10:40:50 | juraj.sukop | set | messages:
+ msg285551 |
2017-01-16 10:09:47 | pitrou | set | nosy:
+ pitrou messages:
+ msg285550
|
2017-01-16 10:04:40 | vstinner | set | messages:
+ msg285549 |
2017-01-16 10:03:26 | vstinner | set | nosy:
+ vstinner
|
2017-01-16 10:03:00 | vstinner | set | nosy:
+ mark.dickinson
|
2017-01-16 10:02:45 | juraj.sukop | set | files:
+ setup.py |
2017-01-16 10:02:37 | juraj.sukop | set | files:
+ example.py |
2017-01-16 10:02:24 | juraj.sukop | set | files:
+ xmathmodule.c |
2017-01-16 10:01:37 | juraj.sukop | create | |