classification
Title: AIX: makexp_aix, parallel build (failures) and ld WARNINGS
Type: behavior Stage: resolved
Components: Build Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Parallel build race condition on AIX since python-2.7
View: 19521
Assigned To: Nosy List: BTaskaya, Michael.Felt, kadler, skrah
Priority: normal Keywords: patch

Created on 2020-04-28 16:20 by Michael.Felt, last changed 2020-08-16 21:57 by Michael.Felt. This issue is now closed.

Files
File name Uploaded Description Edit
python3-makexp_aix.patch kadler, 2020-06-15 17:04
Pull Requests
URL Status Linked Edit
PR 19759 closed Michael.Felt, 2020-04-28 16:29
Messages (8)
msg367541 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-04-28 16:19
Currently, on AIX, whenever the -j option is passed to make there are many WARNINGS from the loader (ld) re: duplicate symbols.

While it is not possible to eliminate these warnings completely - as some are not related to the Python build, but external (3rd party) packaging - MOST of these warnings can be eliminated by ensuring that the export file creation completes before additional steps try to use it.

By adding a small test to see if the export file is in the process of being made - and waiting for that to finish - the messages "go away".
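
As a rough illustration of the "test and wait" idea, here is a minimal sketch. It is not the code from the PR: the function names (write_exports, make_export_file), the lock-directory convention and the placeholder symbols are all assumptions made for this example. It relies only on mkdir being atomic, so exactly one process creates the export file while any others wait.

```shell
#!/bin/sh
# Sketch only: names and file contents are hypothetical, not taken from the PR.

# Stand-in for the real symbol extraction done by makexp_aix.
write_exports() {
    printf '%s\n' '#!' 'sym_a' 'sym_b' > "$1"
}

# Create the export file $1 unless another process is already creating it.
make_export_file() {
    exp_file=$1
    lock_dir="$exp_file.lock"
    if mkdir "$lock_dir" 2>/dev/null; then
        # mkdir is atomic: exactly one caller gets here and does the work.
        write_exports "$exp_file"
        rmdir "$lock_dir"
    else
        # Another process holds the lock: wait for it instead of writing.
        while [ -d "$lock_dir" ]; do
            sleep 1
        done
    fi
}

demo_exp="${TMPDIR:-/tmp}/demo.$$.exp"
make_export_file "$demo_exp"
cat "$demo_exp"
```

One known weakness of this kind of scheme: if the process holding the lock is killed, the lock directory stays behind and every later caller waits forever, so a real script would need a timeout or cleanup step.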

The PR that is being proposed only affects AIX (a script named makexp_aix). The script has not been modified in 22 years - so I guess the -j option is something that showed up after 1998 :)

I know it is not perfect - but removes a tremendous amount of noise - most of the time.

Michael

p.s. requesting backport to 3.8 so all buildbots benefit.
msg371258 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-06-11 08:18
Specifically, makexp_aix - from 1998-1999 - did not consider parallelization.

make -j2 is sufficient to create the following issue - that frequently leads to a failed compile/build.

./Modules/makexp_aix Modules/python.exp . libpython3.9d.a;  gcc -pthread     -Wl,-bE:Modules/python.exp -lld -o python Programs/python.o libpython3.9d.a -lintl -ldl  -lm   -lm 
./Modules/makexp_aix Modules/python.exp . libpython3.9d.a;  gcc -pthread     -Wl,-bE:Modules/python.exp -lld -o Programs/_testembed Programs/_testembed.o libpython3.9d.a -lintl -ldl  -lm   -lm 
ld: 0711-418 ERROR: Import or export file Modules/python.exp at line 2:
	A symbol name may only be followed by an export attribute
	or an address. The line is being ignored.
ld: 0711-415 WARNING: Symbol PyAST_Check is already exported.
ld: 0711-415 WARNING: Symbol PyAST_Compile is already exported.
ld: 0711-415 WARNING: Symbol PyAST_CompileEx is already exported.
ld: 0711-415 WARNING: Symbol PyAST_CompileObject is already exported.
...
Over 4000 lines of warnings later:
ld: 0711-415 WARNING: Symbol _Py_write is already exported.
ld: 0711-415 WARNING: Symbol _Py_write_noraise is already exported.
collect2: error: ld returned 8 exit status
Makefile:598: recipe for target 'python' failed
make: *** [python] Error 1
program finished with exit code 2

Explanation: makexp_aix is running in parallel - and writing to python.exp in parallel.
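
To make the failure mode concrete, the sketch below replays one possible interleaving of two writers, step by step and deterministically. The file name and the symbols are placeholders, and the real script writes more than this; the point is only that truncate-then-append from two processes leaves every symbol in the file twice, which is the kind of input that triggers "already exported" warnings.

```shell
#!/bin/sh
# Sketch only: a deterministic replay of one possible interleaving of two
# writers that each truncate the file and then append to it.
race_exp="${TMPDIR:-/tmp}/race.$$.exp"

echo "#!" > "$race_exp"                      # writer A: truncate, write header
echo "#!" > "$race_exp"                      # writer B: truncate again, write header
printf '%s\n' sym_a sym_b >> "$race_exp"     # writer A: append its symbols
printf '%s\n' sym_a sym_b >> "$race_exp"     # writer B: append the same symbols

# List every line that occurs more than once in the result.
dups=$(sort "$race_exp" | uniq -d)
echo "$dups"
```

In a real parallel build the interleaving is not fixed, so the damage varies from run to run.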

The patch/PR "tames" this - and, hopefully, the multiple "fails" per day of the AIX bots will cease.

p.s. needed in 3.8, 3.9 and the new master (3.10)
msg371553 - (view) Author: Kevin (kadler) * Date: 2020-06-15 14:34
This seems to be a duplicate of https://bugs.python.org/issue19521

The PR for that one seems a little less hacky since it uses make rules to prevent duplication instead of lock files.
msg371573 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-06-15 16:50
Yes, it is less hacky - and something to pursue later - as a better
solution. Even the idea of perhaps no longer needing makexp_aix and/or
ld_so_aix and python.exp is a much better solution.

However, the goal of this PR is to have something now - that removes the
pain (e.g., false bot failures and bot report storage impact) asap.

Many thanks for looking - and commenting!

msg371576 - (view) Author: Kevin (kadler) * Date: 2020-06-15 17:04
FYI, here's a patch we've been using with our builds on PASE (an AIX compatibility layer on the IBM i OS). It runs all the echos and nm in a sub-shell so that all the output appears as a continuous stream instead of 3 separate open/write/close events.

There's still a race condition, but since it no longer appends, the last one in will win instead of the mixed result there is now. AFAICT, it gets created much earlier than it gets used so nothing _should_ be reading it while the writers are racing. At least it works for us on PASE with -j16 when building Python 3.6.
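
The difference between the two write patterns can be sketched as follows. This is not the attached patch: list_symbols is a hypothetical stand-in for the real nm pipeline, and the header lines are placeholders. The first function opens the file three times (one truncate, two appends); the second groups the commands in a sub-shell so the file is opened once and everything arrives as a single stream.

```shell
#!/bin/sh
# Sketch only: list_symbols stands in for the real nm pipeline.
list_symbols() {
    printf '%s\n' 'sym_a' 'sym_b'
}

# Before: three separate opens. The appends of two concurrent writers can mix.
write_appending() {
    echo "#!" > "$1"
    echo "* placeholder comment line" >> "$1"
    list_symbols >> "$1"
}

# After: one open with truncation. Nothing is ever appended to old content.
write_grouped() {
    (
        echo "#!"
        echo "* placeholder comment line"
        list_symbols
    ) > "$1"
}

grouped_exp="${TMPDIR:-/tmp}/grouped.$$.exp"
write_grouped "$grouped_exp"
cat "$grouped_exp"
```

As noted above, this narrows the race rather than removing it: two grouped writers can still overlap, but neither one appends to what the other has already written.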
msg373033 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-07-05 15:28
Thanks Kevin.

I took your patch (added your name to blurb as well).

Only difference was to remove Qsystem (or something), from the pathnames.
msg375496 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2020-08-15 21:05
I understand that both of you are in favor of #19521 (the patch of
which I have not tried yet).

Can we close this as a duplicate? Please just reopen if you disagree.
msg375522 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020-08-16 21:57
If #19521 had been merged I would be all for closing this as a duplicate. However, if I have read all the comments correctly, no one has tested the other PR.

As the approaches are quite different I think both should be open until a decision is made on the better approach. 

Closing one (asap) is a good idea, especially if that leads to something being merged so this is finally repaired. 

History
Date User Action Args
2020-08-16 21:57:52 Michael.Felt set messages: + msg375522
2020-08-15 21:05:31 skrah set status: open -> closed
superseder: Parallel build race condition on AIX since python-2.7
nosy: + skrah
messages: + msg375496
resolution: duplicate
stage: patch review -> resolved
2020-07-05 15:28:06 Michael.Felt set messages: + msg373033
versions: + Python 3.7
2020-06-15 17:04:20 kadler set files: + python3-makexp_aix.patch
messages: + msg371576
2020-06-15 16:50:41 Michael.Felt set messages: + msg371573
2020-06-15 14:34:19 kadler set nosy: + kadler
messages: + msg371553
2020-06-11 14:28:55 BTaskaya set nosy: + BTaskaya
2020-06-11 08:19:46 Michael.Felt set title: AIX: parallel build and ld WARNINGS -> AIX: makexp_aix, parallel build (failures) and ld WARNINGS
2020-06-11 08:18:55 Michael.Felt set messages: + msg371258
versions: + Python 3.10
2020-04-28 16:29:50 Michael.Felt set keywords: + patch
stage: patch review
pull_requests: + pull_request19081
2020-04-28 16:20:00 Michael.Felt create