This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Windows and Unix run-time differences
Type: behavior Stage: resolved
Components: Windows Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Kallah, eryksun, paul.moore, steve.dower, steven.daprano, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2020-01-08 08:44 by Kallah, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
sync.py Kallah, 2020-01-08 08:44 Program to highlight the issue
Messages (8)
msg359569 - (view) Author: Kallah (Kallah) * Date: 2020-01-08 08:44
Running the attached sync.py on Windows and Unix (Ubuntu and OSX tested) produces different results. On Windows it will output:
x = 1
x = 2
x = 3
y = 1
x = 4
x = 5
x = 6
x = 7
y = 1
While on Ubuntu it will output:
x = 1
x = 2
x = 3
y = 4
x = 4
x = 5
x = 6
x = 7
y = 8
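The attached sync.py is not reproduced in this archive. As a reconstruction (not the original attachment), a minimal script that shows the same divergence: a module-level multiprocessing.Value is inherited intact by children under fork, but each child rebuilds its own unrelated copy under spawn, so only fork lets the children's increments reach the parent.

```python
import multiprocessing as mp

# Module-level shared value: under fork, children inherit this exact
# object; under spawn, each child re-imports the module and creates a
# brand-new, unrelated Value with its own shared-memory segment.
y = mp.Value('i', 1)

def bump():
    with y.get_lock():
        y.value += 1

def run(start_method):
    with y.get_lock():
        y.value = 1  # reset so each run is independent
    ctx = mp.get_context(start_method)
    procs = [ctx.Process(target=bump) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return y.value

if __name__ == '__main__':
    print('fork:', run('fork'))    # children's increments land in shared memory: 4
    print('spawn:', run('spawn'))  # parent's y is untouched: still 1
```

Note that the 'fork' start method does not exist on Windows, where get_context('fork') raises ValueError; there only the spawn behaviour can be observed.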
msg359573 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-01-08 09:57
I'm sorry, but perhaps I have missed something here. The behaviour you show is what I would expect. In fact, I would expect any two runs of your code to likely produce different output, even on the same machine using the same OS. I just ran it twice in Python 3.5 and got different results on the same machine:

    Run 1           Run 2
    x = 1           x = 1
    x = 2           x = 2
    x = 3           x = 3
    x = 4           x = 4
    y = 5           y = 5
    x = 5           x = 5
    x = 6           x = 6
    x = 7           x = 7
    x = 8           y = 8
    y = 9           x = 8
    x = 9           x = 9


You are running code concurrently in multiple processes. The order that the results are printed is unpredictable and will depend on many factors, including the OS.

Can you give any reasons why you consider this to be a bug rather than normal behaviour?

I'm not going to close this as "Not a bug", since I'm not a multiprocessing expert and I may have misunderstood something, but it looks to me like normal behaviour when using threading or multiprocessing.
msg359574 - (view) Author: Kallah (Kallah) * Date: 2020-01-08 10:17
The difference here is that on Windows y never changes: it stays 1 forever, while on Unix systems y increments. Having done a bit more research, it seems this is due to the way multiprocessing works on Windows vs Unix systems. On Unix systems the new process is a fork of the parent, while on Windows it is a whole new process built from scratch (if I am understanding it correctly). I am not going to close it, as I am unsure whether it is by design that Python acts differently on Windows and Unix.
msg359581 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-08 11:38
> I am not going to close it as I am unsure if it is by design that 
> Windows and Unix python acts differently.

For compatibility, a script should support the spawn start method. Spawning child processes is the only available start method in Windows, and, as of Python 3.8 (see issue 33725), it's the default start method on macOS. This entails passing picklable objects as arguments or with a multiprocessing.Queue or multiprocessing.Pipe -- instead of relying on global values that get inherited via fork. 
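As a sketch of that explicit-passing approach (not code from the original thread): handing the Value to each Process at construction time works under every start method, because multiprocessing pickles it for the child as part of spawning.

```python
import multiprocessing as mp

def bump(val):
    # val is delivered at process creation: inherited under fork,
    # pickled and sent to the child under spawn and forkserver.
    with val.get_lock():
        val.value += 1

def demo(start_method):
    ctx = mp.get_context(start_method)
    y = ctx.Value('i', 1)
    procs = [ctx.Process(target=bump, args=(y,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return y.value

if __name__ == '__main__':
    print(demo('spawn'))  # 4 -- and the same result under fork
```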

With a pool you can set up globals with an initializer function. Here's an example of the latter that manually selects the spawn start method:

    import multiprocessing as mp

    def pool_init(x_value, y_value):
        global x, y
        x = x_value
        y = y_value

    def work(i):
        # uses the globals installed in each worker by pool_init
        with x.get_lock():
            x.value += 1
        return x.value

    if __name__ == '__main__':
        mp.set_start_method('spawn')
        pool = mp.Pool(processes=2, initializer=pool_init,
                       initargs=(mp.Value('i', 0), mp.Value('i', 0)))
        print(pool.map(work, range(4)))
        pool.close()
        pool.join()
msg359609 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-01-08 16:37
Agreed it's not a bug.

The best we could do is display a warning that fork is not portable (won't work on macOS anymore either, IIRC) and you should at least verify that spawn behaves the same.
msg359613 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2020-01-08 17:13
Agreed it's not a bug, but I will say it took me a while to work out *why* it's not a bug (namely, that even though the OP is using shared memory values, the code relies on fork semantics to share the two Value objects that *reference* the shared memory).

It would be worth adding a note to the documentation on shared memory values at https://docs.python.org/3.8/library/multiprocessing.html#sharing-state-between-processes to make it clearer that it's the user's responsibility to make sure the Value object is passed to every subprocess that wants to interact with it.
msg359618 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-08 17:53
> Agreed it's not a bug, but I will say it took me a while to work out 
> *why* it's not a bug (namely, that even though the OP is using shared 
> memory values, the code relies on fork semantics to share the two 
> Value objects that *reference* the shared memory).

The programming guidelines cover this under "explicitly pass resources to child processes" and "the spawn and forkserver start methods". Even for scripts that will only ever use the fork start method, it explains why inheriting globals may be a problem due to garbage collection in the parent. What can be done to make the advice there more visible and easily understood?

I'd guess that, even though it took you a while to spot the problem, you wouldn't make the same mistake if writing this from scratch -- assuming you've read and understood the programming guidelines. There's nothing about an arbitrary Value instance that would allow a spawned child process to map it to the shared memory of a Value in the parent process. That information has to be pickled and sent to the child.
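A quick way to see this at the interpreter (a sketch, not from the original thread): plain pickle refuses a synchronized Value outright, precisely because the shared-memory mapping is only allowed to travel to a child at process-creation time.

```python
import multiprocessing as mp
import pickle

v = mp.Value('i', 0)
try:
    pickle.dumps(v)
except RuntimeError as e:
    # multiprocessing only permits sharing these objects through
    # inheritance, i.e. while a child process is being spawned
    print('refused:', e)
```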
msg359630 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2020-01-08 19:53
For me, I headed straight for "Sharing state between processes" and the "Shared memory" object. That's probably because I was reviewing someone else's code, rather than writing my own, but nevertheless when coding I do tend to dive straight for the section that describes what I want to do, and miss "overview" type discussions.

The way the shared memory object is described, it reads as just that: shared. And so I'd assume that if a shared memory object is used in multiple processes in a pool, it would be the *same* shared memory region, and the value would be accessible from all the processes.

From there, for me at least, it's easy to proceed to the mistake of thinking that the global initialisation of the x and y variables creates the *same* shared memory objects in each process in the pool. Clearly it doesn't, hence this is "not a bug" but for me it's an easy mistake to make.

Maybe it would be enough just to add a comment to the shared memory object documentation that said "every shared memory object is independent - there is no way to create a reference to the same shared memory object in multiple processes, instead you need to create the object in one process and pass it to all of the others". That would probably have made me stop and think long enough to not make the mistake I did.
History
Date User Action Args
2022-04-11 14:59:25 admin set github: 83436
2020-01-08 19:53:42 paul.moore set messages: + msg359630
2020-01-08 17:53:04 eryksun set messages: + msg359618
2020-01-08 17:13:39 paul.moore set messages: + msg359613
2020-01-08 16:37:42 steve.dower set messages: + msg359609
2020-01-08 11:38:59 eryksun set status: open -> closed; nosy: + eryksun; messages: + msg359581; resolution: not a bug; stage: resolved
2020-01-08 10:17:48 Kallah set messages: + msg359574
2020-01-08 09:57:10 steven.daprano set nosy: + steven.daprano; messages: + msg359573
2020-01-08 08:44:16 Kallah create