classification
Title: multiprocessing package doesn't flush stderr on child exception
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5, Python 3.4, Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brandjon, jnoller, memeplex, neologix, pitrou, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2012-01-18 01:15 by brandjon, last changed 2015-08-27 21:42 by memeplex. This issue is now closed.

Files
File name Uploaded Description Edit
mpstderrflush.patch pitrou, 2012-01-23 17:10
mpstderrflush2.patch pitrou, 2012-01-24 09:36 review
Messages (21)
msg151508 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-18 01:15
When a child process exits due to an exception, a traceback is written, but stderr is not flushed. Thus I see a header like "Process 1:\n", but no traceback.

I don't have a development environment or any experience with Mecurial, so I'm afraid there's no patch, but it's a one-liner.

In   /Lib/multiprocess/process.py :: Process._bootstrap

    except:
        exitcode = 1
        import traceback
        sys.stderr.write('Process %s:\n' % self.name)
        sys.stderr.flush()
        traceback.print_exc()

Append a "sys.stderr.flush()" to the suite.

It surprised me that flushing was even necessary. I would've thought that the standard streams would all be closed just before the process terminated, regardless of exit status. But I observe that unless I explicitly flush stdout and stderr before terminating, the output is lost entirely, even if the exit is not abnormal. This isn't the desired behavior, is it?
msg151509 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-18 01:17
(Er, that should be /Lib/multiprocessing/process.py :: Process._bootstrap of course.)
msg151779 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-22 17:32
> But I observe that unless I explicitly flush stdout and stderr before 
> terminating, the output is lost entirely, even if the exit is not 
> abnormal. This isn't the desired behavior, is it?

Indeed that's a bit surprising. Which Python version are you using? Under Unix or Windows?
msg151781 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-22 17:58
On Windows, the problem appears under Python 3.2.2 and 3.1.3, but not under 2.7.1. On Linux, I have not reproduced the problem on versions 2.6.3, 2.7.2, 3.1.1, or 3.2.2.

So to summarize:

  - It seems there should be a stderr flush call on the line I indicated, for symmetry with the surrounding code.
  - Even without this line, it should still be flushed automatically upon child process exit, but this doesn't happen under Windows and Python 3.x.
msg151782 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-22 18:05
Le dimanche 22 janvier 2012 à 17:58 +0000, Jon Brandvein a écrit :
> Jon Brandvein <jon.brandvein@gmail.com> added the comment:
> 
> On Windows, the problem appears under Python 3.2.2 and 3.1.3, but not
> under 2.7.1. On Linux, I have not reproduced the problem on versions
> 2.6.3, 2.7.2, 3.1.1, or 3.2.2.

Thanks.

>   - Even without this line, it should still be flushed automatically
> upon child process exit, but this doesn't happen under Windows and
> Python 3.x.

Yes, that's what surprises me. There's no reason for stderr not to be
flushed implicitly at process end, since that's part of sys.stderr's
destructor, which should be called at shutdown.

That said, it certainly doesn't harm to add a flush() call there.
msg151792 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-22 23:12
I've been looking over this package some more, and in particular, /Lib/multiprocessing/forking.py. There's plenty I don't understand, and I do have questions, if you would be willing to indulge me.

I see that both the unix and windows codepaths define an "exit" alias for terminating the child process without performing any cleanup. On unix, this is os._exit (though it calls it directly in Popen.__init__() instead of using the alias). On Windows, it is the win32 ExitProcess() function. A quick check confirms that replacing this windows alias with "exit = sys.exit" causes flushing to occur.

So my main question is: What's wrong with using sys.exit()? I would speculate it's either because there's shared state between child and parent, or to avoid propagating SystemExit through user code in the case freeze_support() was used.

If forking.py is to terminate directly via the OS, I think it's forking.py's responsibility to flush stdout and stderr in main() on the Windows side, the way it does in Popen.__init__() on the unix side.

I also want to point out that the sys.exit() in freeze_support() is unreachable due to the exit() in main().

So it no longer surprises me that the output is not being flushed in Python 3 under windows. What surprises me is that it *is* flushed in Python 2. I remember hearing something about differences between 2 and 3 in how they handle buffering, so I'm investigating this for my own curiosity.

Incidentally, Python 2 still flushes when I remove newlines from my output text, so line buffering doesn't seem to be impacting my observations.
msg151797 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-23 03:58
Some more information: When I write to a new file created by open(), all versions flush correctly. However, if I reassign sys.stdout to that file, Python 3.x does not (again, under Windows). I wonder what it is that causes these other files to flush.

(Note: I am testing by calling time.sleep() and inspecting the output file during and after the pause.)
msg151799 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-23 06:38
It turns out the file output was flushing due to garbage collection. When I created and held a global reference to it, it ceased to flush. Clearly, reassigning sys.stdout also held a reference to it. So it wasn't any kind of special sys.stdout-specific logic.

I tried using os.fsync to determine whether data was being saved in an OS-level buffer. But since the OS may be free to ignore fsync, it's kind of worthless for this purpose.

I also found that under Python 2.x, even a low-level exit like os._exit or multiprocessing.win32.ExitProcess, called from within a user-level function in the child, caused flushing.

My best guess is that
  1. Python 2.x is not buffering data at the Python level. I can't see how it could be and still flush it out when calling _exit().
  2. Python 3.x is buffering at the Python level, and the Python File object needs to be destroyed or explicitly flushed before the hard low-level exit() in forking.py.

The solutions I can think of for Python 3.x are:
  1. Replace "exit = win32.ExitProcess" with "exit = sys.exit". All outstanding file objects will be destroyed and flushed naturally as the interpreter is torn down.
  2. Add an explicit stdout/stderr flush where appropriate in forking.py and process.py, to ensure tracebacks get written and to match the unix behavior. Leave it to the user to worry about flushing their own streams.
  3. Continue to use win32.ExitProcess, but add some kind of mechanism for walking through all existing Python File objects and flushing/destroying them. This was a fleeting thought; it really amounts to reimplementing the behavior of destructing the Python interpreter.

I'd really like to hear if there are good reasons for why (1) isn't how it's done currently. I'd also like to hear an explanation of Python 2.x's buffering.
msg151808 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-23 11:10
> I also found that under Python 2.x, even a low-level exit like
> os._exit or multiprocessing.win32.ExitProcess, called from within a
> user-level function in the child, caused flushing.

The difference is the following:
- Python 2.x uses C stdio (write() calls C fwrite(), flush() calls C
fflush(), etc.); buffering is managed by the libc and what you see at
shutdown is the behaviour of your platform's libc
- Python 3.x uses its own buffering mechanism; it flushes automatically
when the object destructor is called, but os._exit() bypasses all
destructors

Now why os._exit() is used. The problem with sys.exit() is that it's too
high-level: it merely raises a SystemExit exception. That exception can
be caught by upper levels in the code stack. When you use fork() and you
are in the child process, you don't want to give back control to the
calling function, especially if that function isn't fork()-aware (think
about what happens if that function writes to a file, both in the child
and the parent).

This happens for example when running multiprocessing's own test suite:
if forking.py used sys.exit(), the child's SystemExit exception would be
caught by the unittest framework, be interpreted as a test failure, and
the rest of the test suite would proceed... in both processes!

It is less obvious, however, why ExitProcess is used under Windows.
Windows doesn't use fork(), it launches a separate process from scratch.
Perhaps for "consistency" with the Unix behaviour.

>   2. Add an explicit stdout/stderr flush where appropriate in
> forking.py and process.py, to ensure tracebacks get written and to
> match the unix behavior. Leave it to the user to worry about flushing
> their own streams.

That's the simplest solution, and the least likely to break
compatibility with user code.
msg151828 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-23 17:10
I propose applying the following patch.
We could open a separate issue to suggest "exit = sys.exit" under Windows (it doesn't seem to break any tests).
msg151833 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-23 18:08
Regarding the patch: I'd also like to see sys.stdout.flush() and sys.stderr.flush() between "exitcode = self._boostrap()" and "exit(exitcode)" in /Lib/multiprocessing/forking.py :: main(). (The extra stderr flush would be for symmetry with Popen.__init__() for unix.)

The remainder of this post is what I wrote before seeing your patch.


Thank you for the explanation concerning libc and buffering.

As for os._exit, I can see the need under unix, where the child process has a stack inherited from the parent. Also, allowing the child to cleanup shared resources would be disastrous for the parent. (I was going to propose the idea of making a sys.exit()-like function that did not rely on the exception mechanism to unwind the stack. But it wouldn't be usable here for this reason.)

Under Windows, the child process's multiprocessing.forking.main() is launched either by directly calling main() on the command line, or from within freeze_support() in the user program. In the latter case, the user can be advised not to catch SystemExit around freeze_support(), just as they are already advised to make calling freeze_support() the first statement within "if __name__ == '__main__':". So I don't see any harm in switching to sys.exit() there.


I think that in general, the differences between multiprocessing's behavior under unix and windows are a weakness, as is the lack of a precise specification for this behavior.

At the same time, multiprocessing is a lot more convenient to use than the subprocess module when the child process is a Python program. In particular, I use multiprocessing for its support of passing pickled objects between parent and child. With subprocess, I don't think it's even possible to share a file descriptor between parent and child under Windows.

I'm wondering whether it would be desirable to make an option for the unix side of multiprocessing to behave more like the windows one, and invoke the Python interpreter from the beginning with no trace of the parent's stack. That is, it'd follow a more traditional fork()-then-exec*() pattern. Then you'd be able to have the interpreter destruct naturally under both platforms, close resources, etc.

Are there use cases that require just forking under unix? Is the performance significantly better without an exec*()?

Perhaps it would be better If I took this discussion to python-ideas.
msg151834 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2012-01-23 18:21
There's already a bug / pending patch for this behavior here:

http://bugs.python.org/issue8713

No need to take it to -ideas.
msg151886 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-01-24 09:07
> I propose applying the following patch.

The test looks good to me (except we don't remove TESTFN explicitely,
but I'm not sure it's really necessary).

As for the patch, couldn't we put all the file stream flushing in one place?

stdout and stderr are already flushed on Unix (which explains why this
only happens on Windows: the libc puts non-tty streams in _IOFBF, so
this wouldn't work without explicit flushing neither).

Lib/multiprocessing/forking.py:Popen.__init__:
           code = process_obj._bootstrap()
           sys.stdout.flush()
           sys.stderr.flush()
           os._exit(code)

Also, there's already a flush() a couple lines above (in _bootstrap):
           sys.stderr.write(e.args[0] + '\n')
           sys.stderr.flush()

So I'd suggest to just put:

           sys.stdout.flush()
           sys.stderr.flush()

At the end of _bootstrap, and remove the flushing done by
Popen.__init__ on Unix.
msg151888 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-24 09:36
Updated patch sanitizing the various flushes done on exit, as per Charles-François's recommendation. Also removes TESTFN explicitly.
msg151890 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-01-24 10:32
LGTM.

(I just noticed a bug in Rietveld: when one selects "expand 10 after",
the line right after that marker appears duplicated in the new view).
msg151917 - (view) Author: Jon Brandvein (brandjon) Date: 2012-01-24 17:09
Patch looks fine. I like the use of "finally" for the flush.
msg152078 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-27 09:58
New changeset 2863d9273abd by Antoine Pitrou in branch '3.2':
Issue #13812: When a multiprocessing Process child raises an exception, flush stderr after printing the exception traceback.
http://hg.python.org/cpython/rev/2863d9273abd

New changeset 96c1de5acbd3 by Antoine Pitrou in branch 'default':
Issue #13812: When a multiprocessing Process child raises an exception, flush stderr after printing the exception traceback.
http://hg.python.org/cpython/rev/96c1de5acbd3
msg152079 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-27 09:58
Patch now committed, thanks.
msg249251 - (view) Author: Memeplex (memeplex) Date: 2015-08-27 19:06
I would like to reopen this issue because of the following, very related, behavior. Try this:

```
import multiprocessing as mp
import time


def g():
    time.sleep(100)

def f():
    mp.Process(target=g).start()
    1/0

mp.Process(target=f).start()
```

It won't show the ZeroDivisionError until you keyboard interrupt the g() process or wait for it to end. This is because _exit_function will join every active non-daemon child, which happens before printing and flushing the error. IMO the exception should be shown before joining children, since keeping the error silent is asking for trouble.
msg249255 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-27 19:47
You should open a new issue.  Bonus if you can supply a unit test and patch :)
msg249265 - (view) Author: Memeplex (memeplex) Date: 2015-08-27 21:42
http://bugs.python.org/issue24948

I posted a tentative solution there.
History
Date User Action Args
2015-08-27 21:42:44memeplexsetmessages: + msg249265
2015-08-27 19:47:01r.david.murraysetnosy: + r.david.murray
messages: + msg249255
2015-08-27 19:07:09memeplexsetversions: + Python 3.4, Python 3.5
2015-08-27 19:06:46memeplexsetnosy: + memeplex
messages: + msg249251
2012-01-27 09:58:32pitrousetstatus: open -> closed
resolution: fixed
messages: + msg152079

stage: patch review -> resolved
2012-01-27 09:58:04python-devsetnosy: + python-dev
messages: + msg152078
2012-01-24 17:09:56brandjonsetmessages: + msg151917
2012-01-24 10:32:01neologixsetmessages: + msg151890
2012-01-24 09:36:22pitrousetfiles: + mpstderrflush2.patch

messages: + msg151888
2012-01-24 09:07:39neologixsetmessages: + msg151886
2012-01-23 18:21:39jnollersetmessages: + msg151834
2012-01-23 18:08:58brandjonsetmessages: + msg151833
2012-01-23 17:10:03pitrousetfiles: + mpstderrflush.patch

nosy: + neologix
messages: + msg151828

keywords: + patch
stage: test needed -> patch review
2012-01-23 11:10:52pitrousetmessages: + msg151808
2012-01-23 06:38:41brandjonsetmessages: + msg151799
2012-01-23 03:58:14brandjonsetmessages: + msg151797
2012-01-22 23:12:56brandjonsetmessages: + msg151792
2012-01-22 18:05:46pitrousetmessages: + msg151782
2012-01-22 17:58:03brandjonsetmessages: + msg151781
2012-01-22 17:32:30pitrousetnosy: + pitrou
messages: + msg151779
2012-01-21 03:14:17terry.reedysetnosy: + jnoller
stage: test needed

versions: - Python 3.4
2012-01-18 01:17:52brandjonsetmessages: + msg151509
2012-01-18 01:15:45brandjoncreate