classification
Title: Modifications to global variables ignored after instantiating multiprocessing.Pool
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Naftali.Harris, sbt, tim.peters
Priority: normal
Keywords:

Created on 2014-02-25 22:00 by Naftali.Harris, last changed 2014-02-26 04:40 by Naftali.Harris. This issue is now closed.

Files
File name      Uploaded                          Description
reproduces.py  Naftali.Harris, 2014-02-25 22:00  Reproduces the described behavior
Messages (3)
msg212221 - Author: Naftali Harris (Naftali.Harris) Date: 2014-02-25 22:00
Hi everyone,

It appears that if you use a global variable in a function that you pass to Pool.map, but modify that global variable after instantiating the Pool, then the modification will not be reflected when Pool.map calls that function.

Here's a short script (also attached) that demonstrates what I mean:

$ cat reproduces.py
from multiprocessing import Pool

name = "Not Updated"
def f(ignored):
    print(name)


def main():
    global name
    p = Pool(3)
    name = "Updated"
    p.map(f, range(3))

if __name__ == "__main__":
    main()
$ python reproduces.py 
Not Updated
Not Updated
Not Updated


If the `name = "Updated"` line is moved above the `p = Pool(3)` line, then the script will print "Updated" three times instead.

This behavior is present in versions 2.6, 2.7, 3.1, 3.2, 3.3, and 3.4. I run Linux Mint 14 (nadia), on an Intel i5-3210M processor (four cores).

Is this expected behavior?

Thanks very much,

Naftali
msg212240 - Author: Tim Peters (tim.peters) (Python committer) Date: 2014-02-26 04:30
This is expected.  "global" has only to do with the visibility of a name within a module; it has nothing to do with visibility of mutations across processes.  On a Linux-y system, executing Pool(3) creates 3 child processes, each of which sees its own private *copy* of the state of the module at the time (the usual copy-on-write fork() semantics).  From that point on, nothing done in the main program can have any effect on the data values seen by the child processes, nor can anything done by a child process have any effect on the data values seen by the main program or by the other child processes, unless such data values are _explicitly_ shared via one of the cross-process data sharing mechanisms the multiprocessing module supports.
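
For illustration, here is a minimal sketch of one such explicit sharing mechanism, a Manager-backed dict. It assumes Python 3.3 or later (for the `with` forms), and the names `read_name` and `shared` are hypothetical, not from the original script. The update is made after the Pool is created, yet the workers still see it, because every lookup goes through the manager process rather than through a forked copy of the module state.

```python
from functools import partial
from multiprocessing import Manager, Pool


def read_name(shared, ignored):
    # `shared` is a proxy object; the lookup is forwarded to the manager process.
    return shared["name"]


def main():
    with Manager() as manager:
        # Hypothetical shared state, held by the manager process.
        shared = manager.dict(name="Not Updated")
        with Pool(3) as pool:
            # Changed after the workers exist, but still visible to them.
            shared["name"] = "Updated"
            return pool.map(partial(read_name, shared), range(3))


if __name__ == "__main__":
    print(main())
```

The price is a round trip to the manager process on every access, so this is probably only worth it when the value really has to change while the workers are running.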

So, in your program, all child processes see name == "Not Updated", because that's the value `name` had at the time the processes were created.  The later

    name = "Updated"

changes the binding in the main program, and only in the main program.  If you want child processes to see the new value you should, e.g., pass `name` to f().
msg212241 - Author: Naftali Harris (Naftali.Harris) Date: 2014-02-26 04:40
Oh, ok, that makes a lot of sense. Thanks for the clear and patient explanation, Tim! Sorry to have bothered the Python bug tracker with this.

--Naftali
History
Date                 User            Action  Args
2014-02-26 04:40:55  Naftali.Harris  set     status: open -> closed; resolution: not a bug; messages: + msg212241
2014-02-26 04:30:36  tim.peters      set     nosy: + tim.peters; messages: + msg212240
2014-02-26 04:00:24  ned.deily       set     nosy: + sbt
2014-02-25 22:00:00  Naftali.Harris  create