classification
Title: [AIX] test_math: test_nextafter(float('nan'), 1.0) does not return a NaN on AIX
Type:
Stage: resolved
Components: Tests
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: David.Edelsohn, Michael.Felt, lemburg, mark.dickinson, rhettinger, stutzbach, vstinner
Priority: normal
Keywords: patch

Created on 2020-11-11 11:34 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24265 merged vstinner, 2021-01-20 10:38
PR 24381 merged vstinner, 2021-01-29 21:42
Messages (24)
msg380751 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-11 11:34
https://buildbot.python.org/all/#/builders/302/builds/338

FAIL: test_nextafter (test.test_math.MathTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aixtools/buildarea/3.9.aixtools-aix-power6/build/Lib/test/test_math.py", line 1968, in test_nextafter
    self.assertIsNaN(math.nextafter(NAN, 1.0))
  File "/home/aixtools/buildarea/3.9.aixtools-aix-power6/build/Lib/test/test_math.py", line 2015, in assertIsNaN
    self.fail("Expected a NaN, got {!r}.".format(value))
AssertionError: Expected a NaN, got 1.0.

The test:

        # NaN
        self.assertIsNaN(math.nextafter(NAN, 1.0))   # <=== HERE
        self.assertIsNaN(math.nextafter(1.0, NAN))
        self.assertIsNaN(math.nextafter(NAN, NAN))

The Linux manual page says: "If x or y is a NaN, a NaN is returned."
https://man7.org/linux/man-pages/man3/nextafter.3.html

But it seems like the AIX libc doesn't implement this rule. Should we implement this rule in Python on AIX?
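
The rule quoted from the manual page can be probed directly. What follows is a minimal sketch, assuming Python 3.9 or later (where math.nextafter() exists). The names nan_rule_violations and simulated_broken_nextafter are hypothetical, and the second one only imitates the result seen in the traceback above (y is returned when x is a NaN); it is not the real AIX libm.

```python
import math

NAN = float("nan")


def nan_rule_violations(nextafter=math.nextafter):
    """Return the (x, y) pairs for which nextafter(x, y) is not a NaN."""
    cases = [(NAN, 1.0), (1.0, NAN), (NAN, NAN)]
    return [(x, y) for x, y in cases if not math.isnan(nextafter(x, y))]


def simulated_broken_nextafter(x, y):
    # Hypothetical stand-in for a libm that ignores a NaN in x. It only
    # mimics the "Expected a NaN, got 1.0" result from the traceback.
    return y if math.isnan(x) else math.nextafter(x, y)


if __name__ == "__main__":
    print("libm violations:", nan_rule_violations())
    print("simulated violations:",
          nan_rule_violations(simulated_broken_nextafter))
```

Running the probe against the platform's own math.nextafter() lists whichever of the three NaN cases it gets wrong, if any.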

The strange thing is that it worked previously. test.pythoninfo of build 338:

platform.platform: AIX-2-00F9C1964C00-powerpc-32bit
sysconfig[HOST_GNU_TYPE]: powerpc-ibm-aix7.2.4.0
platform.architecture: 32bit

The latest green build is build 347. test.pythoninfo of build 347:

platform.architecture: 32bit
platform.platform: AIX-2-00F9C1964C00-powerpc-32bit
sysconfig[HOST_GNU_TYPE]: powerpc-ibm-aix7.2.0.0

Was the machine updated two days ago (2020-11-09), between build 338 and build 347?
msg380754 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-11-11 12:39
If AIX were one of our officially supported platforms, then yes, I'd say that we should add a workaround to handle special cases ourselves, similarly to what we already do for a number of math module functions (like math.pow, for example).

But given that it's only a "best effort" platform, I'm not convinced that it's worth the effort or the extra complication in the codebase.

-0 from me, I guess.
msg380755 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-11-11 12:41
Is there any reasonable channel for reporting the issue upstream?
msg380756 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-11 13:26
My worry is that I'm getting emails about AIX buildbot failures. I see different options:

* Skip the test on AIX
* Fix nextafter() on AIX
* Turn off AIX buildbot email notifications
* Remove the AIX buildbot
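
The first option could look roughly like the sketch below. It is an illustration, not the code in test_math.py: the class name and the skip message are made up, and sys.platform.startswith("aix") is assumed to be an adequate way to detect AIX. It also assumes Python 3.9 or later for math.nextafter().

```python
import math
import sys
import unittest

NAN = float("nan")
# Assumption: sys.platform starts with "aix" on every AIX build of Python.
IS_AIX = sys.platform.startswith("aix")


class NextafterNaNTests(unittest.TestCase):
    """Hypothetical stand-alone version of the NaN checks."""

    @unittest.skipIf(IS_AIX, "AIX libm nextafter() does not return NaN "
                             "for NaN arguments (bpo-42323)")
    def test_nextafter_nan(self):
        for x, y in [(NAN, 1.0), (1.0, NAN), (NAN, NAN)]:
            with self.subTest(x=x, y=y):
                self.assertTrue(math.isnan(math.nextafter(x, y)))


if __name__ == "__main__":
    unittest.main()
```

A skip keeps the buildbot green, but it also hides the libm bug from anyone running the suite on AIX.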
msg380769 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-11-11 16:31
> My worry is that I'm getting emails about AIX buildbot failures.

That sounds more like a process problem than a CPython codebase one. The ideal would be that the machinery sending those notifications can be configured to ignore known failures when deciding whether to send email. Is that remotely feasible? (I have zero familiarity with the buildbot machinery.)

Skipping the test on AIX sounds like a reasonable option, but I kinda *want* IBM developers running the Python test suite on AIX to see those failures, in the hope that they might then be motivated to push for a fix to the relevant AIX bug. :-)
msg380774 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-11 16:54
> That sounds more like a process problem than a CPython codebase one. The ideal would be that the machinery sending those notifications can be configured to ignore known failures when deciding whether to send email. Is that remotely feasible? (I have zero familiarity with the buildbot machinery.)

If a test fails all the time and not randomly, a single email is sent at the first failure.

I'm annoyed by test_threading which crashes randomly on AIX: https://bugs.python.org/issue40068 It's a known issue, I already fixed 3/4 of the issue, but I didn't fix the remaining part.

In the past, I already disabled AIX email notifications simply because there was nobody to fix issues, and so emails were just spam.


> but I kinda *want* IBM developers running the Python test suite on AIX to see those failures, in the hope that they might then be motivated to push for a fix to the relevant AIX bug. :-)

Well, that would be great.

I'm not sure if Michael Felt is still working on supporting AIX in Python. David Edelsohn might help.
msg380778 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-11-11 17:37
nextafter is a known problem on AIX.  I believe that it is being addressed in newer releases of AIX.

Michael and I are helping the IBM AIX Open Source team to increase their attention on Python, but things only move so fast.
msg381096 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-11-16 12:53
I have been experimenting with different hardware and AIX versions.

When building on AIX 5.3 - and the oldest libraries - test_math passes.

When I run the test on POWER8, using either xlc or gcc, test_math fails on just one element of the test.

When I run the test on POWER6 I get many more errors - that I never had before. These are all after OS updates (I was not going to build for AIX 5.3 any more).

One idea that might explain the sudden change in behavior: the libraries may have been optimized to always use DFP (decimal floating point) internally, behind what is, from the application's perspective, the normal interface with no hardware acceleration for FP.

I know there are ways to 'discover' this, but I'll need to write some tests so that I can see whether or not linking to different libraries triggers the DFP performance counters.

At this point, this feels like a potential explanation.
msg381111 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-11-16 15:22
I investigated another problem with nextafter() in 2015 and opened an internal IBM AIX PMR.  At the time it was not using decimal float code.

The earlier problem was the handling of -0.0.  At the time, the code was hand-written assembly language that did not check for IEEE floating point corner cases.
msg381115 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-16 16:20
> The earlier problem was the handling of -0.0.  At the time, the code was hand-written assembly language that did not check for IEEE floating point corner cases.

I'm quite happy that my hand written tests detect bugs in nextafter() implementations ;-)
msg381119 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-16 16:48
How can we fix the buildbot? Add #ifdef in mathmodule.c to implement the special cases, but only on AIX? Skip the test?
msg381128 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-11-16 17:27
There seems to be a lot of interaction of OS level and compiler used.

* Waiting for the next bot run to get a different compiler.

+++
AIX 6.1.6 and older libraries - no test errors reported

AIX 7.1.4 and newer libraries - when using the binary built on 6.1.6 (AIX 6.1 TL6) - no error

AIX 7.1.4, using xlc-v13.1.2 (try and buy version), same AIX level as the bot (7.1 TL4 SP8) - no error

Back to the bot: AIX 7.1 TL4 SP8 and gcc-4.7.4 - strange errors. See, e.g., https://buildbot.python.org/all/#/builders/302/builds/373/steps/5/logs/stdio with additional errors such as:

======================================================================
FAIL: testHypotAccuracy (test.test_math.MathTests) (hx='0x1.89d8c423ea0c6p+29', hy='0x1.d35dcfe902bc3p+29', x=825956484.4892814, y=980138493.1263355)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/test_math.py", line 867, in testHypotAccuracy
    self.assertEqual(hypot(x, y), z)
AssertionError: 1281747081.1271062 != 1281747081.127106
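
The size of a mismatch like this one can be expressed in units in the last place (ulp). This is a small sketch, assuming Python 3.9 or later for math.ulp(); ulp_distance is a hypothetical helper, not something test_math.py provides.

```python
import math


def ulp_distance(a, b):
    """Distance between two finite floats, in ulps of the larger magnitude."""
    return abs(a - b) / math.ulp(max(abs(a), abs(b)))


if __name__ == "__main__":
    # The two values from the assertion message above.
    print(ulp_distance(1281747081.1271062, 1281747081.127106))
```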

Waiting for bot run 374 - to see if the results change when the compiler changes.

I'll try moving the bot to another system - as the system the bot is on is more than just the bot. Maybe there are side-effects coming in, unexpectedly, from other sources. ** The two systems just mentioned are fresh installs.
msg381164 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-16 21:31
> Back to the bot: AIX 7.1 TL4 SP8 and gcc-4.7.4

The latest GCC version is GCC 10. Is it still relevant to test GCC 4.7 released 8 years ago? (Well, I'm not sure that the C compiler explains all issues.)
msg381166 - (view) Author: David Edelsohn (David.Edelsohn) * Date: 2020-11-16 21:33
I believe that Michael was trying to probe under what circumstances the failure appears.  But no, GCC 4.7 is not relevant.
msg381238 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-11-17 14:23
Yes, just probing, the version of gcc is irrelevant.

What I do believe is important is that bot runs 374, 375 and 376 passed on AIX 7.1 TL4 SP8.

The failure starting with 377 is an undefined variable.

"./Modules/posixmodule.c", line 15146.53: 1506-045 (S) Undeclared identifier SPLICE_F_MOVE.
"./Modules/posixmodule.c", line 15147.57: 1506-045 (S) Undeclared identifier SPLICE_F_NONBLOCK.
"./Modules/posixmodule.c", line 15148.53: 1506-045 (S) Undeclared identifier SPLICE_F_MORE.

So, let's put this on hold. I'll get a new environment built up specifically for testing Python on AIX. Hopefully by Friday.
msg381260 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-17 17:38
> "./Modules/posixmodule.c", line 15146.53: 1506-045 (S) Undeclared identifier SPLICE_F_MOVE.

This is unrelated: https://bugs.python.org/issue41625#msg381259 Please continue the SPLICE discussion there.
msg381262 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-11-17 17:45
[Victor]

> How can we fix the buildbot? Add #ifdef in mathmodule.c to implement the special cases, but only on AIX? Skip the test?

I'm not super-keen on using #ifdefs to implement the special-case handling _just_ for AIX: that opens the door to a labyrinth of #ifdef'ery working around various different problems on various different platforms. If we're going to handle special cases ourselves, let's do it for all platforms.

But I'd also be fine with skipping the test (just on AIX, of course) for now. If we can also find a way to remind ourselves to revisit once the upstream bug has been fixed, so much the better.
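
Handling the special cases on every platform could be sketched as follows. The sketch is in Python for brevity, whereas the real change would live in C in Modules/mathmodule.c. The name nextafter_checked and its libm_nextafter parameter are hypothetical, and the special cases are limited to the ones discussed in this issue (NaN arguments and x == y).

```python
import math


def nextafter_checked(x, y, libm_nextafter=math.nextafter):
    """nextafter() with the special cases handled before calling libm."""
    x = float(x)
    y = float(y)
    if math.isnan(x):
        return x  # a NaN argument propagates to the result
    if math.isnan(y):
        return y
    if x == y:
        return y  # also covers nextafter(+0.0, -0.0) returning -0.0
    # Ordinary case: delegate to the platform implementation.
    return libm_nextafter(x, y)
```

The platform function is only reached for ordinary arguments, so a libm that mishandles these corner cases no longer affects the result.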
msg383535 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-12-21 15:53
I have been doing a lot of research on this. Wish I had thought to start the way I finished.

Basically, when math.nextafter() was added all the AIX bots were on systems running AIX earlier than AIX 7.2 TL2.

When AIX 7.2 TL2 was released (roughly Q3 2017) a (major?) change was made to the nextafter() function.

root@gcc119:[/home2/root]instfix -k IV95512 -a
IV95512 Abstract: nextafter(+0.0, -0.0) returns +0.0 instead of -0.0.

IV95512 Symptom Text:
 If(x==y) nextafter returns x instead of y.
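
Since +0.0 and -0.0 compare equal, the symptom described by the APAR only shows up in the sign bit. Here is a small sketch of how it could be checked from Python; both function names are hypothetical, and the default argument assumes Python 3.9 or later.

```python
import math


def is_negative_zero(value):
    """True only for -0.0; a plain == comparison cannot tell the zeros apart."""
    return value == 0.0 and math.copysign(1.0, value) < 0


def equal_args_return_y(nextafter=math.nextafter):
    """Check that nextafter(+0.0, -0.0) gives -0.0, i.e. y rather than x."""
    return is_negative_zero(nextafter(0.0, -0.0))


if __name__ == "__main__":
    print("nextafter(+0.0, -0.0) returns y:", equal_args_return_y())
```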

At first glance - it appears the CPython code is reversing the arguments:

The lines in test_math.py are currently:
 +2026          # NaN
 +2027          self.assertIsNaN(math.nextafter(NAN, 1.0))
 +2028          self.assertIsNaN(math.nextafter(1.0, NAN))
 +2029          self.assertIsNaN(math.nextafter(NAN, NAN))

Moving line 2027 (which is what is failing) to 2029 - the other two lines pass on an AIX system with IV95512 applied. 

As IEEE754 says (and seems to have always said):

https://pubs.opengroup.org/onlinepubs/9699919799: 

If x or y is NaN, a NaN shall be returned.

The current test in Modules/mathmodule.c might be too simple.

I am working on a PR where I check for presence of APAR IV95512 - with the nextafter() changes.
msg383548 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-12-21 18:37
While my patch is working (it was successful in what it attempted to do), it did not fix this test issue.

Instead, I reinstalled the `bos.adt.libm-7.2.0.0` fileset to back out the so-called bugfix/APAR IV95512.

@David - can you take this up with AIX support? IV95512 (and more) does not seem to return NaN when one of the arguments is NaN.
msg385326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-20 10:43
I wrote PR 24265 to fix the issue. math.nextafter(x, y) already had a special case for AIX for x==y.

test_nextafter fails on PPC64 AIX 3.x (build 749).

test_nextafter passes on POWER6 AIX 3.x (build 701).
msg385343 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-20 14:20
New changeset c1c3493fb7a3af8efdc50175e592d29e8cb93886 by Victor Stinner in branch 'master':
bpo-42323: Fix math.nextafter() for NaN on AIX (GH-24265)
https://github.com/python/cpython/commit/c1c3493fb7a3af8efdc50175e592d29e8cb93886
msg385348 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-20 15:36
Oh, it seems like Python no longer builds on the PPC64 AIX 3.x buildbot :-(
https://bugs.python.org/issue42604#msg385347
msg385953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-29 22:04
New changeset 0837f99d3367ecf200033bbddfa05d061ae9f483 by Victor Stinner in branch 'master':
bpo-42323: Fix math.nextafter() on AIX (GH-24381)
https://github.com/python/cpython/commit/0837f99d3367ecf200033bbddfa05d061ae9f483
msg385960 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-30 00:00
> test_nextafter fails on PPC64 AIX 3.x (build 749).

It passes again in build 788, so I'm closing the issue:
https://buildbot.python.org/all/#/builders/438/builds/788

It would be great if the AIX libm could be fixed, but I wanted to fix the AIX buildbots, to be able to detect other Python regressions.
History
Date User Action Args
2022-04-11 14:59:38 admin set github: 86489
2021-01-30 00:00:29 vstinner set status: open -> closed; resolution: fixed; messages: + msg385960; stage: patch review -> resolved
2021-01-29 22:04:53 vstinner set messages: + msg385953
2021-01-29 21:42:19 vstinner set pull_requests: + pull_request23198
2021-01-20 15:36:16 vstinner set messages: + msg385348
2021-01-20 14:20:27 vstinner set messages: + msg385343
2021-01-20 10:43:56 vstinner set messages: + msg385326
2021-01-20 10:38:42 vstinner set keywords: + patch; stage: patch review; pull_requests: + pull_request23090
2020-12-21 18:37:22 Michael.Felt set messages: + msg383548
2020-12-21 15:53:31 Michael.Felt set messages: + msg383535
2020-11-17 17:45:58 mark.dickinson set messages: + msg381262
2020-11-17 17:38:30 vstinner set messages: + msg381260
2020-11-17 14:23:43 Michael.Felt set messages: + msg381238
2020-11-16 21:33:06 David.Edelsohn set messages: + msg381166
2020-11-16 21:31:08 vstinner set messages: + msg381164
2020-11-16 17:27:33 Michael.Felt set messages: + msg381128
2020-11-16 16:48:08 vstinner set messages: + msg381119
2020-11-16 16:20:26 vstinner set messages: + msg381115
2020-11-16 15:22:31 David.Edelsohn set messages: + msg381111
2020-11-16 12:53:46 Michael.Felt set messages: + msg381096
2020-11-11 17:37:17 David.Edelsohn set messages: + msg380778
2020-11-11 16:54:11 vstinner set nosy: + David.Edelsohn; messages: + msg380774
2020-11-11 16:31:16 mark.dickinson set messages: + msg380769
2020-11-11 13:26:31 vstinner set messages: + msg380756
2020-11-11 12:41:05 mark.dickinson set messages: + msg380755
2020-11-11 12:39:26 mark.dickinson set messages: + msg380754
2020-11-11 12:07:01 xtreak set nosy: + Michael.Felt
2020-11-11 11:34:58 vstinner create