classification
Title: subprocess with stderr=subprocess.STDOUT hang
Type: behavior Stage: resolved
Components: Versions: Python 2.7
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, l4mer, martin.panter, r.david.murray
Priority: normal Keywords:

Created on 2017-10-06 11:39 by l4mer, last changed 2017-10-07 10:28 by l4mer. This issue is now closed.

Messages (12)
msg303815 - (view) Author: (l4mer) Date: 2017-10-06 11:39
I am using ssh mux + authorized key. So can exectute commands without passwod.

1. disconnect mux by: ssh user@localhost -O exit
2. run simple script
import subprocess

if __name__ == '__main__':
    cmd = ["ssh", "user@localhost", "id"]
    try:
        buf = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        buf = e.output
3. This always hang in read(3,
fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f036b5f1000
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
munmap(0x7f036b5f1000, 4096)            = 0
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "uid=0(", 6)                    = 6
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "root) ", 6)                    = 6
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "gid=0(r", 7)                   = 7
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "oot) gro", 8)                  = 8
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "ups=0(roo", 9)                 = 9
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "t)\n", 10)                     = 3
read(3, 0x7f036b480653, 7)              = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21620, si_status=0, si_utime=1, si_stime=0} ---
read(3,

4. After ctr+c
read(3, ^CProcess 21619 detached
 <detached ...>
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    buf = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python2.7/subprocess.py", line 567, in check_output
    output, unused_err = process.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 791, in communicate
wigig@wigig-Latitude-E5270:~$     stdout = _eintr_retry_call(self.stdout.read)
  File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)

BTW, when MUX is created it always works.
msg303816 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-10-06 13:11
Given the title, are you saying that it works if you omit the stderr=subprocess.STDOUT?  At the beginning you say you are using an authorized key and mux so you don't need to enter a password, and at the end you say it works fine if the MUX is created.  So that sounds like the failure is when you are trying to establish the first connection.  Perhaps the problem is that the subprocess doesn't have access to your key for some reason, and therefore ssh is prompting for a password?
msg303817 - (view) Author: (l4mer) Date: 2017-10-06 13:23
yes, works perfectly when omit stderr=subprocess.STDOUT no matter if mux exists.
Also from bash: ssh user@locahost id always works.

Problem exists only when stderr=subprocess.STDOUT and no MUX yet.
BTW, first connection create a MUX - but this seems to be an unimportant detail.

This is 100% reproducible.
msg303818 - (view) Author: (l4mer) Date: 2017-10-06 13:35
BTW, when you will check point 3 you will notice that we already read id output, so command execute correctly: "uid=0(root" ...
And problem is after that.

Steps I reproduce this issue:
- standard user have keys and
 in ~/.ssh/config
Host *
   ControlMaster auto
   ControlPath /tmp/ssh-%r@%h:%p
   ControlPersist yes

- on root (authorized_key installed)
cat ~user/.ssh/id_rsa.pub >> ~root/.ssh/authorized_keys

- on user, reproduce
ssh root@localhost -O exit # destroy mux
strace python test.py # script I paste before

I hit this problem of few machines, with different linux kernels ...
Seems like generic problem?
msg303865 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-10-07 00:41
Presumuing your file descriptor 3 is the read end of the pipe to the child’s output, then there is probably a process somewhere that could still write to the write end. Normally “check_output” waits until it has read all possible output from the pipe(s).

This is probably not a bug in Python. Maybe it is a bug with SSH or your “MUX” (you didn’t explain what that is) leaving a process running in the background that could output to stderr. Try to track down what processes have your pipe open. Find out the number that identifies the pipe. It is the node number in the “lsof” command, or in the symbolic link under /proc:

$ lsof -a -c python -d 3
COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
python3 26025 vadmium    3r  FIFO    0,8      0t0 4913217 pipe
$ ls -l "/proc/$(pgrep python)/fd/3"
lr-x------ 1 vadmium vadmium 64 Oct  6 22:17 /proc/26025/fd/3 -> pipe:[4913217]

Then look through the other files to find if other process(es) have the write end of the pipe open; “cat” in my example:

$ lsof | grep 4913217
python3   26025       vadmium    3r     FIFO        0,8      0t0    4913217 pipe
cat       26026       vadmium    1w     FIFO        0,8      0t0    4913217 pipe
$ ls -l /proc/*/fd/* | grep 4913217
lr-x------ 1 vadmium vadmium 64 Oct  6 22:17 /proc/26025/fd/3 -> pipe:[4913217]
l-wx------ 1 vadmium vadmium 64 Oct  6 22:16 /proc/26026/fd/1 -> pipe:[4913217]

The general problem does seem to be a recurring thing with the subprocess module, so maybe more documentation or other enhancements could help. Similar reports:

* Issue 31447: communicate not respecting timeout due to grandchild process
* Issue 30154: subprocess.run with timeout and output pipe from grandchild
* Issue 26534: check_output with shell=True and timeout doesn’t kill shell’s child
* Issue 23213: communicate hang with stderr leaked in “systemd” daemon
* Issue 13422: communicate hangs as long as a daemon leaves pipes open
* Issue 4216: communicate hang after starting “cpboot” service
msg303866 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-10-07 03:50
Please try to reproduce this with Python 3.6.

On 2.7, I highly recommend you stop using the Python 2.7 subprocess module and install the subprocess32 backport available on PyPI. It is *much* more reliable.
msg303870 - (view) Author: (l4mer) Date: 2017-10-07 08:31
janusz@nc6120:~$ ssh -V
OpenSSH_7.2p2 Ubuntu-4ubuntu2.2, OpenSSL 1.0.2g  1 Mar 2016
janusz@nc6120:~$

I) manual check

janusz@nc6120:~$ ssh root@localhost -p 50494 -O exit
janusz@nc6120:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> cmd = ["ssh", "root@localhost", "-p", "50494"]
>>> buf = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
Terminated

janusz@nc6120:~$ ssh root@localhost -p 50494 -O exit
Exit request sent.
janusz@nc6120:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess32
>>> cmd= ["ssh", "root@localhost", "-p", "50494"]
>>> buf = subprocess32.check_output(cmd, stderr=subprocess32.STDOUT)
Terminated

II) strace 

Correct case (mux exist):
pipe2([3, 4], O_CLOEXEC)                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl64(3, F_GETFL)                     = 0 (flags O_RDONLY)
pipe2([5, 6], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb747a768) = 2207
close(6)                                = 0
close(4)                                = 0
brk(0x9741000)                          = 0x9741000
read(5, "", 50000)                      = 0
brk(0x9735000)                          = 0x9735000
close(5)                                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
_llseek(3, 0, 0xbfc1d350, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
read(3, "root\n", 6)                    = 5
read(3, "", 1)                          = 0
close(3)                                = 0
waitpid(2207, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 2207
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2207, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {0x8183c10, [], 0}, 8) = 0
brk(0x9733000)                          = 0x9733000
exit_group(0)                           = ?
+++ exited with 0 +++

Fail case:
read(7, "menting schedulers (this is what"..., 8192) = 8192
read(7, "\0\0\0_lent\1\0\0\0ht\10\0\0\0h_appendt\5\0\0\0i"..., 4096) = 2240
read(7, "", 4096)                       = 0
close(7)                                = 0
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb70c7000
close(6)                                = 0
close(5)                                = 0
close(4)                                = 0
ugetrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
close(3)                                = 0
pipe2([3, 4], O_CLOEXEC)                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl64(3, F_GETFL)                     = 0 (flags O_RDONLY)
pipe2([5, 6], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7539768) = 2238
close(6)                                = 0
close(4)                                = 0
brk(0xa30e000)                          = 0xa30e000
read(5, "", 50000)                      = 0
brk(0xa302000)                          = 0xa302000
close(5)                                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
_llseek(3, 0, 0xbf8a0600, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
read(3, "root\n", 6)                    = 5
read(3, 0xb710db19, 1)                  = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2238, si_uid=1000, si_status=0, si_utime=4, si_stime=1} ---
read(3,
d(3, ^Cstrace: Process 2237 detached
 <detached ...>
janusz@nc6120:~$ Traceback (most recent call last):
  File "test.py", line 6, in <module>
    buf = subprocess32.check_output(cmd, stderr=subprocess32.STDOUT)
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 628, in check_output
    output, unused_err = process.communicate(timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 915, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 543, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

When hang:
janusz@nc6120:~$ ls -l "/proc/$(pgrep python)/fd/3"
lr-x------ 1 janusz janusz 64 paź  7 10:25 /proc/2336/fd/3 -> pipe:[29858]
janusz@nc6120:~$ lsof |grep 29858
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
      Output information may be incomplete.
python    2336                janusz    3r     FIFO       0,10      0t0      29858 pipe
ssh       2342                janusz    2w     FIFO       0,10      0t0      29858 pipe
janusz@nc6120:~$
msg303871 - (view) Author: (l4mer) Date: 2017-10-07 08:50
I) Manual test for subprocess32:

janusz@nc6120:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess32
>>> cmd = ["ssh", "root@localhost", "-p", "50494", "whoami"]
>>> buf = subprocess32.check_output(cmd, stderr=subprocess32.STDOUT)
>>> print buf
root

>>>
janusz@nc6120:~$ ssh root@localhost -p 50494 -O exit
Exit request sent.
janusz@nc6120:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess32
>>> cmd = ["ssh", "root@localhost", "-p", "50494", "whoami"]
>>> buf = subprocess32.check_output(cmd, stderr=subprocess32.STDOUT)
after 20 seconds
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 628, in check_output
    output, unused_err = process.communicate(timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 915, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "/usr/lib/python2.7/dist-packages/subprocess32.py", line 543, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
>>>

II) When omit stderr:

janusz@nc6120:~$ ssh root@localhost -p 50494 -O exit
Exit request sent.
janusz@nc6120:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess32
>>> cmd = ["ssh", "root@localhost", "-p", "50494", "whoami"]
>>> buf = subprocess32.check_output(cmd)
>>> print buf
root

>>>

Strace for stderr=None:

mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7071000
close(6)                                = 0
close(5)                                = 0
close(4)                                = 0
ugetrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
close(3)                                = 0
pipe2([3, 4], O_CLOEXEC)                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl64(3, F_GETFL)                     = 0 (flags O_RDONLY)
pipe2([5, 6], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb74e3768) = 2728
close(6)                                = 0
close(4)                                = 0
brk(0x93d2000)                          = 0x93d2000
read(5, "", 50000)                      = 0
brk(0x93c6000)                          = 0x93c6000
close(5)                                = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
_llseek(3, 0, 0xbfe23f90, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
read(3, "root\n", 6)                    = 5
read(3, "", 1)                          = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2728, si_uid=1000, si_status=0, si_utime=6, si_stime=0} ---
close(3)                                = 0
waitpid(2728, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 2728
rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {0x8183c10, [], 0}, 8) = 0
brk(0x93c4000)                          = 0x93c4000
exit_group(0)                           = ?
+++ exited with 0 +++
msg303872 - (view) Author: (l4mer) Date: 2017-10-07 08:57
Interesting when using faster machine with newer ssh don't see this problem :)

janusz@t560:~$ ssh -V
OpenSSH_7.3p1 Ubuntu-1ubuntu0.1, OpenSSL 1.0.2g  1 Mar 2016

janusz@t560:~$ ssh root@localhost -O exit; strace python test.py

getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=4*1024}) = 0
close(3)                                = 0
pipe2([3, 4], O_CLOEXEC)                = 0
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
pipe2([5, 6], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb7ba6fc9d0) = 22284
close(6)                                = 0
close(4)                                = 0
brk(0xca73792000)                       = 0xca73792000
read(5, "", 50000)                      = 0
brk(0xca73786000)                       = 0xca73786000
close(5)                                = 0
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "root\n", 6)                    = 5
read(3, "", 1)                          = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22284, si_uid=1000, si_status=0, si_utime=1, si_stime=0} ---
close(3)                                = 0
wait4(22284, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 22284
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fb7ba2f1630}, {0xca720ef1a0, [], SA_RESTORER, 0x7fb7ba2f1630}, 8) = 0
exit_group(0)                           = ?
+++ exited with 0 +++

So, two options here:
- new version of ssh
- faster CPU

Will try to update ssh on the old/slow machine.
msg303873 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-10-07 09:39
Scanning over the Open SSH commits for 7.3p1

https://github.com/openssh/openssh-portable/compare/V_7_2_P2...V_7_3_P1

it looks like this commit

https://github.com/openssh/openssh-portable/commit/d2d6bf864e52af8491a60dd507f85b74361f5da3

may fix your problem. It points to

https://bugzilla.mindrot.org/show_bug.cgi?id=1988
msg303875 - (view) Author: (l4mer) Date: 2017-10-07 10:28
confirm, OpenSSH_7.3p1 solve the problem.
Thanks for help.
msg303876 - (view) Author: (l4mer) Date: 2017-10-07 10:28
confirm, OpenSSH_7.3p1 solve the problem.
Thanks for help.
History
Date User Action Args
2017-10-07 10:28:27l4mersetstatus: closed
resolution: third party
messages: + msg303876
2017-10-07 10:28:07l4mersetstatus: closed -> (no value)
resolution: third party -> (no value)
messages: + msg303875
2017-10-07 09:39:11martin.pantersetstatus: open -> closed
resolution: third party
messages: + msg303873

stage: resolved
2017-10-07 08:57:57l4mersetmessages: + msg303872
2017-10-07 08:50:15l4mersetmessages: + msg303871
2017-10-07 08:31:09l4mersetmessages: + msg303870
2017-10-07 03:50:11gregory.p.smithsetmessages: + msg303866
2017-10-07 00:41:15martin.pantersetnosy: + martin.panter
messages: + msg303865
2017-10-06 13:35:28l4mersetmessages: + msg303818
2017-10-06 13:23:46l4mersetmessages: + msg303817
2017-10-06 13:11:37r.david.murraysetnosy: + gregory.p.smith, r.david.murray
messages: + msg303816
2017-10-06 11:39:10l4mercreate