classification
Title: Support POSIX_SPAWN_USEVFORK flag in posix_spawn
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: fweimer, gregory.p.smith, pablogsal, ronaldoussoren, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2018-09-13 18:11 by pablogsal, last changed 2018-09-20 14:14 by ronaldoussoren. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9271 closed pablogsal, 2018-09-13 18:15
Messages (29)
msg325272 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-13 18:11
In some systems, posix_spawn has an optional flag (POSIX_SPAWN_USEVFORK) that is GNU specific and allows the user to force posix_spawn to spawn the child using vfork instead of fork. This is very beneficial as it gives great speedups compare with normal fork+execv.
msg325287 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-13 19:09
I suggest to name the parameter "use_vfork", or maybe even "vfork".
msg325295 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-13 20:38
We discussed with Gregory P. Smith, and we agreed on the "use_vfork=True" name. He is a native english speaker, so I rely on him :-) Moreover, "use_vfork" is closer to POSIX_SPAWN_USEVFORK constant than "vfork".
msg325404 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-14 21:43
I discussed with Pablo (at the CPython sprint) about the case of FreeBSD.

On FreeBSD, posix_spawn() always uses vfork, so it can be surprising to get an error when using use_vfork=True. But, the error message is now very explicit: it mentions that a specific constant is not available on FreeBSD. The documentation is also clear on that point.
msg325418 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-09-15 02:31
Given the FreeBSD default and seeming desirability of vfork for this use case, is there a good reason using vfork could not be the default behavior on any OS that supports it?
msg325422 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-15 04:37
If vfork() is used ob FreeBSD, wouldn't be better to make use_vfork=True the default on FreeBSD and raise an error on use_vfork=False?
msg325428 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-15 09:10
Gregory: on Linux, it does change the behavior. The parent blocks until the child is spawned. Python should let the developer decide to opt-in.

Serhiy: IMHO it's better to make posix_spawn() as dumb as possible, a simple wrapper to the C call. use_vfork=True is currently designed for the GNU constant.

posix_spawn(use_vfork=True) raises on FreeBSD: only pass use_vfork=True on Linux.

What do you think?

The problem of changing the default is that we don't know the full list of all platforms that use vfork() by default. If we have a shoet list, how do we know if it's complete? For example, macOS doesn't say anything about vfork. Does it use it? Maybe not. Maybe yes. What if the default changes on a platform?
msg325625 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-18 10:26
@vstinner: another option is to ignore "use_vfork" on platforms that don't have POSIX_SPAWN_USEVFORK. Using vfork or not is primarily a optimisation, ignoring the flag should not result in different behaviour (other than speed).
msg325628 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-18 10:37
If this is an optimization, what is the downside of always using vfork()?
msg325629 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-18 10:44
Good question. A comment on <https://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux> says that glibc already uses vfork where appropriate, explicitly using the flag may not be necessary. 

Note that I haven't verified if the comment is correct.
msg325630 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-18 11:27
The documentation says:

 The child process is created using vfork(2) instead of fork(2) when
       either of the following is true:

       *  the spawn-flags element of the attributes object pointed to by
          attrp contains the GNU-specific flag POSIX_SPAWN_USEVFORK; or

       *  file_actions is NULL and the spawn-flags element of the attributes
          object pointed to by attrp does not contain
          POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF,
          POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER,
          POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS.

So using the flag is necessary if you want to force the use of vfork when passing any of the flags specified in the second point.
msg325645 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-18 13:54
Serhiy Storchaka: "If this is an optimization, what is the downside of always using vfork()?"

I don't know the vfork() function, but you can find articles like:
"vfork considered dangerous" (old article of 2012)
https://ewontfix.com/7/

But it's unclear to me if vfork() drawbacks also affect posix_spawn(). posix_spawn() is well defined: call vfork() and then immediately exec().

Another article:

"First is that vfork pauses the parent thread while the child executes and eventually calls an exec family function, this is a huge latency problem for applications."

https://developers.redhat.com/blog/2015/08/19/launching-helper-process-under-memory-and-latency-constraints-pthread_create-and-vfork/
msg325647 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-18 14:05
vfork() is more dangerous than fork() because the parent and child processes share memory (not copy-on-write, but really the same memory). Whether or not this affects posix_spawn depends on its implementation (to give a very vague statement).

Glibc already uses vfork() in a number of cases, I'd expect that those are the cases where it is safe to use vfork() in the implementation of posix_spawn in the context of glibc.  I'd therefore carefully test the use of vfork() in other cases to make sure those don't affect the parent process.
msg325651 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-18 14:18
If glibc already uses vfork() if it is safe, I suppose that using vfork() in other cases is unsafe, and we shouldn't support this nonstandard option.
msg325660 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-18 15:57
You still need to allow the flag as being safe or unsafe depends on the user code as glibc cannot know about the details of what is going to be executed. That is the reason they have the flag, so the user can disambiguate if is safe or not. If we don't expose the flag, the user may know that what is going to do is safe or acceptable but then they cannot activate the use of vfork.
msg325661 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-18 16:04
Some interesting read:

Go is using vfork/posix_spawn when possible:

https://github.com/golang/go/issues/5838

And it seems that have interesting results:

https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/
msg325673 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-18 19:32
I get the impression that go uses vfork on Linux, not posix_spawn, because go doesn't use libc (based on reading the referenced issue, not on deep knowledge of go and its implementation).

I do wonder why glibc's implementation of posix_spawn doesn't use vfork more often, and in which cases it is safe to explicitly use vfork even when glibc won't do so by default.

Enabling the use of vfork without determining when it safe to do so is asking for problems, and hard to debug/reproduce ones at that (due to vfork semantics).  Adding a "use_vfork" keyword argument to posix_spawn is IMHO not the right way to go, it would be better to determine when using vfork is safe and then unconditionally enable it.  Otherwise users will have to do the research.
msg325677 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-18 19:57
I did some more research:

* <https://sourceware.org/bugzilla/show_bug.cgi?id=10354> seems to indicate that glibc switched to a different implementation of posix_spawn that uses clone(2) instead of fork()/vfork() to avoid some problems with vfork(). 

The start of the issue also contains some information on why glibc is (was?) so conservative about using vfork, and a possible work around (disable cancelation points around the call to posix_spawn).

* <https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.c> is the source for the guts of posix_spawn for some version of glibc, and that indeed does use clone(...|CLONE_VFORK...) unconditionally and does not appear to test for POSIX_SPAWN_USEVFORK. 



It looks like the advise to use POSIX_SPAWN_USEVFORK is outdated, although I'm not 100% sure of my conclusion.  A glibc expert should be able to confirm or refute this. 

@pablogsal: do you have more information on why you want to enable this flag? Do you have measurements that show that adding this flag helps?
msg325678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-18 20:05
Oh wow, I didn't expect that exposing a constant would be a source of such deep debate!

(I'm not saying that the debate is useless or negative, it's useful and constructive, just I'm surprised how system programming can be hard sometimes!)
msg325697 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-09-18 23:38
Give all of this, the lesson I'd take away is perhaps that we should just provide the constant in the os module when available at build time (a configure check) and let people who find a need to use it on their system check for its presence and use it in their code.

The general theme of the posix / os module is intentionally low level:  Expose the APIs and let people make decisions about when to use what at a higher level.

It sounds like a use_vfork=XXX parameter is undesirable at this point unless you wanted to make it clear by its name that it is a glibc only thing feature glibc_use_vfork=XXX perhaps (if exposing the POSIX_SPAWN_USEVFORK flag name itself to be or'ed in is undesirable from a Pythonic API point of view).

The problem with a parameter such named is that you then have to decide on error semantics and need a way for people to know when they can or can't use it prior to calling posix_spawn() itself.

With a constant that you or into flags, you can use hasattr to check the presence of the constant to determine if you can try using it.  (the libc call could still return an error which we'd turn into an OSError exception if the API call doesn't support that flag or otherwise doesn't like the combination of flags you passed - but that situation is always possible in any API)
msg325743 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-19 10:43
Note that the POSIX_SPAWN_USEVFORK may not do anything at this point (see the link in my previous message, if I read the code correctly POSIX_SPAWN_USEVFORK is no longer used in the implementation of posix_spawn(3)).

Even if it did do something the user that uses the flag needs to evaluate whether or not it is safe to do so, and this requires inspecting the os.posix_spawn implementation and not just the Python code that calls it. 

IMHO we shouldn't expose or use this flag.

If it were useful to do anything with the flag the os.posix_spawn implementation should do so automatically when it is safe to do so (which may require additional steps around calling posix_spawn(3)).
msg325744 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-19 10:53
I'm discussing with Pablo to see how to use posix_spawn() in the Python subprocess module. IMHO we should consider the specific case of subprocess.

What is the expected API?

kw = {}
if hasattr(os, 'POSIX_SPAWN_USEVFORK'):
   kw['flags'] = os.POSIX_SPAWN_USEVFORK
posix_spawn(*args, **kw)

or

posix_spawn(*args, use_vfork=True)

or

kw = {}
if sys.platform == 'linux':
   kw['use_vfork'] = True
posix_spawn(*args, **kw)

?

For example, if we consider that it's safe to use POSIX_SPAWN_USEVFORK in all cases for posix_spawn(), maybe we should not add an option at the Python level, and hardcode the POSIX_SPAWN_USEVFORK flag in the C code?


--

> if I read the code correctly POSIX_SPAWN_USEVFORK is no longer used in the implementation of posix_spawn(3))

Ok, now I'm confused: what's the point of this issue if the flag became useless? :-)
msg325747 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-19 11:26
The removal of POSIX_SPAWN_USEVFORK in glibc was somewhat recent. There is also multiple implementations of posix_spawn in glibc source tree.

vfork was introduced in
commit 9ad684229e7cf2f0b3c6068b2122701d167a5794
Author: Ulrich Drepper <drepper@redhat.com>
Date:   Sun Sep 12 18:05:37 2004 +0000

The new implementation of posix_spawn for unix (/sysdeps/unix/sysv/linux/spawni.c) was introduced in
commit 9ff72da471a509a8c19791efe469f47fa6977410
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 19 17:33:32 2016 -0200

The new implementation of posix_spawn for posix (sysdeps/posix/spawni.c) was introduced in
commit ccfb2964726512f6669fea99a43afa714e2e6a80
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Mon Apr 24 15:50:19 2017 -0300

In these two last commits, posix_spawn removed the flag and made it a no-op. But notice that these commits are "recent" (2016 and 2017). Any older version of glibc that uses python will not benefit from these changes and if it wants to use vfork, it needs the flag.

At this point and after this interesting discussion I am not sure that we should expose the flag or not, but I still want to remark that even if glibc removed the flag for both implementations (unix and posix) we may want to expose the flag for older versions of glibc. I discover this problem when I was benchmarking posix_spawn in different platforms and systems and I notice a performance decrease in some older version of ubuntu from 2015. In that platform (older version of glibc) you need the flag to activate vfork and get the performance benefit. The glibc version for this systems is straightforward:

-  if ((flags & POSIX_SPAWN_USEVFORK) != 0
-      /* If no major work is done, allow using vfork.  Note that we
-        might perform the path searching.  But this would be done by
-        a call to execvp(), too, and such a call must be OK according
-        to POSIX.  */
-      || ((flags & (POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF
-                   | POSIX_SPAWN_SETSCHEDPARAM | POSIX_SPAWN_SETSCHEDULER
-                   | POSIX_SPAWN_SETPGROUP | POSIX_SPAWN_RESETIDS
-                   | POSIX_SPAWN_SETSID)) == 0
-         && file_actions == NULL))
-    new_pid = __vfork ();
-  else
-    new_pid = __fork ();

I am happy to close the issue if everyone agrees that we should look at the future and not care about these older versions of glibc < 2017/2016 (depending on posix/unix version) or this does not worth the hassle.

I am still happy with this great and interesting discussion, so thank you everyone :)


Complete story of posix_spawn in glibc:

Whole story of posix_spawn:

commit c181840c93d3a8fcb50991b2326c40f34eb5e82b
commit c22845744cd931267e0d7cb2e2b35a0da9c9510a
commit db6b2f25220f1cf345656164211fd549c22189dd
commit ccfb2964726512f6669fea99a43afa714e2e6a80
commit daeb1fa2e1b33323e719015f5f546988bd4cc73b
commit 1a2325c06cf309d1d8b4aafcfb1a3d43905baf9b
commit ecb1482ffd85fd3279642b1dc045aa867ad4d415
commit d96de9634a334af16c0ac711074c15ac1762b23c
commit f574184a0e4b6ed69a5d9a3234543fba6d2a7367
commit cfa28e560ef69372b9e15e9a2d924a0fbcfc7bca
commit 1b8373f475105307ee3b64d423ffec995ddd6cde
commit 7231452e5cffcd506f7e7402484708740bc07c18
commit a334319f6530564d22e775935d9c91663623a1b4
commit 0ecb606cb6cf65de1d9fc8a919bceb4be476c602
commit 9ad684229e7cf2f0b3c6068b2122701d167a5794
commit 73299943388c0eebf6a9c8d6288e9da8289f9dca
commit 284128f68f27567f9cad0078c97d7d807475e0a7
commit ba3752d5322448ab54b71c82e63a09542042a3b6
commit 2958e6cc5f39ac2487b4fcbc2db48462a34ce23d
commit 4aebaa6bd906383aca1b7a5e1251b0d1652f9f7c
commit 08c7f6b0082b1b645348518fdc42643b5580d87c
commit a5a6f9262eeffab9f78622258fae306d1bf99d04
msg325765 - (view) Author: Florian Weimer (fweimer) Date: 2018-09-19 14:15
I wouldn't bother with POSIX_SPAWN_USEVFORK on GNU/Linux.  Current versions of glibc always use a vfork-style clone call, so there would be a difference on older versions only.  But there, the vfork code has subtle bugs, so using POSIX_SPAWN_USEVFORK there is not a good idea, either.
msg325770 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-09-19 14:41
Ok, I am going to close the issue and the PR unless someone comments on this issue in 24h saying that we still need to expose the flag and providing an explanation.

Thank you everyone for this interesting discussion :)
msg325773 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-19 15:14
I rely on Florian Weimer who maintains the glibc for Red Hat, and so I agree to close the issue. It seems like Pablo wants to keep the issue open 24h, so I let him close it later ;-)

The good news is that calling posix_spawn() with no flag is safe in all glibc versions and it's fast by default on recent glibc versions ;-)
msg325807 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-09-19 22:20
Agreed on not exposing it.

It seems obsolete in recent glibc and the older glibc implementations that had it may have made questionable decisions. :)

Thanks for chiming in Florian, and thanks Pablo for your detailed investigation. :)
msg325871 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-20 13:34
> Thanks for chiming in Florian, and thanks Pablo for your detailed investigation. :)

I concur, very interesting talk ;-)
msg325879 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-09-20 14:14
Same here, even if I learned more about the implementation of posix_spawn than I should have to know ;-)
History
Date User Action Args
2018-09-20 14:14:34ronaldoussorensetmessages: + msg325879
2018-09-20 13:34:41vstinnersetmessages: + msg325871
2018-09-20 10:49:55pablogsalsetstatus: open -> closed
resolution: rejected
stage: patch review -> resolved
2018-09-19 22:20:19gregory.p.smithsetmessages: + msg325807
2018-09-19 15:14:55vstinnersetmessages: + msg325773
2018-09-19 14:41:35pablogsalsetmessages: + msg325770
2018-09-19 14:15:37fweimersetnosy: + fweimer
messages: + msg325765
2018-09-19 11:26:03pablogsalsetmessages: + msg325747
2018-09-19 10:53:03vstinnersetmessages: + msg325744
2018-09-19 10:43:03ronaldoussorensetmessages: + msg325743
2018-09-18 23:38:04gregory.p.smithsetmessages: + msg325697
2018-09-18 20:05:04vstinnersetmessages: + msg325678
2018-09-18 19:57:24ronaldoussorensetmessages: + msg325677
2018-09-18 19:32:01ronaldoussorensetmessages: + msg325673
2018-09-18 16:04:51pablogsalsetmessages: + msg325661
2018-09-18 15:57:02pablogsalsetmessages: + msg325660
2018-09-18 14:18:02serhiy.storchakasetmessages: + msg325651
2018-09-18 14:05:55ronaldoussorensetmessages: + msg325647
2018-09-18 13:54:55vstinnersetmessages: + msg325645
2018-09-18 11:27:00pablogsalsetmessages: + msg325630
2018-09-18 10:44:10ronaldoussorensetmessages: + msg325629
2018-09-18 10:37:54serhiy.storchakasetmessages: + msg325628
2018-09-18 10:26:44ronaldoussorensetnosy: + ronaldoussoren
messages: + msg325625
2018-09-15 09:10:09vstinnersetmessages: + msg325428
2018-09-15 04:37:09serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg325422
2018-09-15 02:31:59gregory.p.smithsetmessages: + msg325418
2018-09-14 21:43:30vstinnersetnosy: + gregory.p.smith
messages: + msg325404
2018-09-13 20:38:19vstinnersetmessages: + msg325295
2018-09-13 19:09:11vstinnersetnosy: + vstinner
messages: + msg325287
2018-09-13 18:15:45pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request8703
2018-09-13 18:11:38pablogsalcreate