classification
Title: test_os.test_chown() failure on koobs-freebsd-{current,9}
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Claudiu.Popa, berker.peksag, koobs, serhiy.storchaka, vajrasky, vstinner
Priority: normal Keywords: buildbot, patch

Created on 2016-08-23 13:40 by vstinner, last changed 2018-07-11 21:47 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
groups-test.log koobs, 2016-09-28 09:23
test_os_chown.patch serhiy.storchaka, 2016-10-26 08:14 review
koobs-freebsd-current-python-3.5-debug-build-773.txt koobs, 2016-11-11 09:10
python-initial.txt koobs, 2017-06-09 04:39
python-sudo.txt koobs, 2017-06-09 04:39
Messages (21)
msg273444 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-23 13:40
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Debug%203.x/builds/940/steps/test/logs/stdio

======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_os.py", line 1138, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_12983_tmp'

----------------------------------------------------------------------
msg273640 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-08-25 10:02
This appears to have spontaneously resolved itself after build #957 after many failures. For reference, other builders on the same host were failing as well:

AMD64 FreeBSD CURRENT Debug 3.5 (#567, #568)
AMD64 FreeBSD CURRENT Non-Debug 3.5 (#8, #9)

And possibly others (Didn't listed any that hadn't failed in the last 5 builds)

According to my (non-expert) reading of the code, the test skips (or is supposed to skip) unless group count of uid is > 1.

The group membership of the buildbot user this worker runs as is only 'buildbot' and on that basis wouldn't a skip expected?

Open questions are:

1) Why/how did it suddenly *start* failing. (I cant see any relevant commits at or around the time)
2) Why/how did it suddenly stop failing (I made no worker/buildbot changes

Nosy vajrasky (original unit test creator) and Claudiu (who reviewed), who might be able to shine a light on what might be going on
msg273641 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-25 10:59
Hum, the test has a fail ratio somewhere near 1/5. It fails for 6 months,
maybe longer.
msg276272 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2016-09-13 13:54
"According to my (non-expert) reading of the code, the test skips (or is supposed to skip) unless group count of uid is > 1.

The group membership of the buildbot user this worker runs as is only 'buildbot' and on that basis wouldn't a skip expected?"

The group here is not the group of buildbot user. The group here refers to all groups in the system.

I am the creator of this test. I will investigate this issue but since I am not BSD user, it may take a while.
msg276273 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2016-09-13 13:57
"The group here is not the group of buildbot user. The group here refers to all groups in the system." -> I retract back this statement.
msg276278 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2016-09-13 14:26
The only way I can reproduce this in Linux (still downloading FreeBSD Current), is to remove user from the group before (I did it in different terminal) executing os.chown method to that specific group id.

I am thinking to add more information in the exception message but let's wait until I finish downloading FreeBSD Current. Maybe I am luckier in BSD.
msg277498 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-09-27 07:07
This started failing once again on koobs-freebsd-current:

======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_os.py", line 1168, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_15221_tmp'

Looking into this a bit further:

buildbot defaults (via twisted) to a umask of 077 for the build worker build/ directory, resulting in build directories with permissions as follows:

drwx------   3 buildbot  buildbot         4 May 17 12:36 2.7.koobs-freebsd-current
drwx------   3 buildbot  buildbot         4 Aug 21 11:28 2.7.koobs-freebsd-current.nondebug

Looking inside these directories to where TESTFN's (I think) are written, we find:

[root@CURRENT-amd64:/usr/home/buildbot/python/2.7.koobs-freebsd-current/build/build] ls -la
total 18
drwx------   8 buildbot  buildbot   8 Sep 27 17:03 .
drwx------  19 buildbot  buildbot  45 Sep 27 16:59 ..
drwx------   2 buildbot  buildbot  68 Sep 27 16:59 lib.freebsd-12.0-CURRENT-amd64-2.7-pydebug
drwx------   2 buildbot  buildbot   6 Sep 27 16:59 scripts-2.7
drwx------   4 buildbot  buildbot   4 Sep 27 16:59 temp.freebsd-12.0-CURRENT-amd64-2.7-pydebug
drwx------   2 buildbot  buildbot   2 Sep 27 16:59 test_python_45339
drwx------   2 buildbot  buildbot   2 Sep 27 17:03 test_python_46278
drwx------   2 buildbot  buildbot   2 Sep 27 17:03 test_python_47336

Does this explain the "Operation not permitted" error above?

If so, the question is, why/how is this test being run if it requires an environment within which two groups can read and/or write files?

Again looking at group membership for the buildbot user on the worker host in question (koobs-freebsd-current), the user only has a single group:

# groups buildbot
buildbot
msg277548 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-27 21:00
Perhaps running a script like below on the host would help to identify the problem?

import getpass
import grp
import pprint

pprint.pprint(getpass.getuser())
pprint.pprint([(g.gr_gid, g.gr_name) for g in grp.getgrall()])
pprint.pprint([(g.gr_gid, g.gr_name, g.gr_mem) for g in grp.getgrall() if getpass.getuser() in g.gr_mem])
msg277599 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-09-28 09:23
Attach file with test results.

It's worth mentioning that these results may (or may not) be different than output when running under the process id started by twistd, which is executed by root (startup script), and results in the following command:

buildbot  9627   0.0  0.3 116200 11112  -  I    Thu17       4:51.87 /usr/local/bin/python2.7 /usr/local/bin/twistd --uid=buildbot --gid=buildbot --pidfile=/usr/home/buildbot/python/twistd.pid --python=/usr/home/buildbot/python/buildbot.tac
msg278857 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-10-18 12:14
Ping. All branches on the koobs-freebsd-current buildbot are still failing due to this issue

I could recreate the entire worker environment from scratch, but:

a) I'm not sure it will resolve the issue
b) I'd rather fix the root cause
msg278870 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-10-18 13:04
All builders (branches) are failing, not just 3.x DEBUG. Failures also appear no longer random or intermittent
msg279490 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-26 08:14
Proposed patch adds more verbose output in tests. Hope this will help to diagnose a problem.

It also fixes test_chown_without_permission which can fail if run as a user next after root (uid=1 or like).
msg279594 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-28 11:22
Kubilay, could you please run tests with my patch?
msg279596 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-10-28 11:36
Serhiy, ah I thought the patch would be applied to say the 'custom' builder for this buildbot or the branch in general :)

If not, I can test this in ~<= 2 days locally, though its worth noting that the issue quite likely not be reproducible outside of the Python buildbot environment.
msg280567 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-11-11 09:09
It appears something has changed in the past few weeks (with no changes made to the buildbot worker).

Now only 3.5 branch (both debug and non-debug) builders are failing, except there are now many failing tests, many different errors, and test clean is also failing.

Among others errors:

FileExistsError: [Errno 17] File exists: '@test_42517_tmp/TEST1/SUB1/SUB11'
PermissionError: [Errno 1] Operation not permitted: '@test_42517_tmp'
IsADirectoryError: [Errno 21] Is a directory: '@test_42517_tmp'
PermissionError: [Errno 13] Permission denied: 'SUB21'

Attached is the full log.

Looking into some of the test_xxxxx_tmp directories referenced, I see a SUB21 directory created with no permissions

[root@CURRENT-amd64:/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build/test_python_85793/@test_85793_tmp/TEST1/SUB2.new] ls -la
total 5
drwx------  3 buildbot  buildbot    5 Nov 11 16:44 .
drwx------  3 buildbot  buildbot    4 Nov 11 16:44 ..
lrwx------  1 buildbot  buildbot   11 Nov 11 16:44 broken_link2 -> tmp3/broken
lrwx------  1 buildbot  buildbot  103 Nov 11 16:44 link -> /usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build/test_python_85793/@test_85793_tmp/TEST2
d---------  2 buildbot  buildbot    3 Nov 11 16:44 SUB21

This is observed across all test_python_xxxx directories:

[root@CURRENT-amd64:/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build] find . -perm 000
./test_python_85793/@test_85793_tmp/TEST1/SUB2.new/SUB21
./test_python_96334/@test_96334_tmp/TEST1/SUB2.new/SUB21
./test_python_53788/@test_53788_tmp/TEST1/SUB2.new/SUB21
./test_python_56622/@test_56622_tmp/TEST1/SUB2.new/SUB21
./test_python_38482/@test_38482_tmp/TEST1/SUB2.new/SUB21
./test_python_58380/@test_58380_tmp/TEST1/SUB2.new/SUB21
./test_python_42517/@test_42517_tmp/TEST1/SUB2.new/SUB21
msg280570 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-11-11 09:19
I also note *one* failure on koobs-freebsd-9 on 3.x and 3.6 branches, identical errors:

Nov 09 14:53 c27269c0d619... failure AMD64 FreeBSD 9.x 3.x #5304 Failed test
Nov 09 14:53 b671ac7ae620... failure AMD64 FreeBSD 9.x 3.6 #282  Failed test

======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.6.koobs-freebsd9/build/Lib/test/test_os.py", line 1200, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_83654_tmp'

----------------------------------------------------------------------


======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_os.py", line 1200, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_93884_tmp'

----------------------------------------------------------------------

I cannot explain what would cause persistent failure on one host and intermittent failure on another, except perhaps different host resources (cpu/mem) creating favourable conditions on one, but not the other.
msg280837 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-11-15 13:44
@Serhiy

I have noticed that the failure is reproducible in the buildbot workers only when startup (of buildbot) is invoked via sudo, and not when started on first-boot (rc runs as root). 

In both situations, twistd then drops privs to --uid=buildbot --gid=buildbot). 

However, I *cannot* reproduce the failure (python -m test.regrtest test_os) on a clean checkout and build of 3.x either under a normal user, *or* under sudo.

I think we need some instrumentation added to test to see whats happening. 

We can add that instrumentation and also test your patch committed to a private hg branch using the 'custom' builder.

I can also provide SSH access to the buildbot hosts.
msg291062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-03 10:58
Similar (or same?) failure on "x86 Gentoo Non-Debug with X 3.x":

======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-gentoo-x86.nondebug/build/Lib/test/test_os.py", line 1218, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_19914_tmp'

http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%20with%20X%203.x/builds/542/steps/test/logs/stdio
msg295266 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-06 15:38
New failure on AMD64 FreeBSD CURRENT Debug 3.5.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Debug%203.5/builds/112/steps/test/logs/stdio

======================================================================
ERROR: test_chown (test.test_os.ChownFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/Lib/test/test_os.py", line 1209, in test_chown
    os.chown(support.TESTFN, uid, gid_1)
PermissionError: [Errno 1] Operation not permitted: '@test_10547_tmp'
msg295486 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2017-06-09 04:39
as per msg280837, this beings to happen when I restart the buildbot worker via sudo (does not fail on initial startup, executed/invoked using the same script, which does not use sudo) and the environment the worker starts with appears to be relevant.

Attached are the environments for the worker at initial startup (python-initial) and subsequent service restart using sudo (python-sudo, which fails tests).


The delta between the two is:

--- python-initial      2017-06-09 14:35:49.557098000 +1000
+++ python-sudo 2017-06-09 14:35:13.665893000 +1000
@@ -6,17 +6,29 @@
   BLOCKSIZE=K
   EDITOR=vi
   GROUP=buildbot
-  HOME=/
+  HOME=/root
   HOST=CURRENT-amd64
   HOSTTYPE=FreeBSD
-  LOGNAME=buildbot
+  LANG=en_US.UTF-8
+  LC_ALL=en_US.UTF-8
+  LC_CTYPE=en_US.UTF-8
+  LOGNAME=root
   MACHTYPE=x86_64
+  MAIL=/var/mail/root
   OSTYPE=FreeBSD
   PAGER=more
-  PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin://bin
+  PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin
+  PS1=%B[%{%}%n%{%}%b@%B%{%}%m%b%{%}:%~%B]%b
   PWD=/usr/home/buildbot/python/3.6.koobs-freebsd-current.nondebug/build
-  RC_PID=23
+  RC_PID=22356
+  SHELL=/bin/csh
   SHLVL=1
-  USER=buildbot
+  SUDO_COMMAND=/usr/local/etc/rc.d/buildslave restart
+  SUDO_GID=1001
+  SUDO_UID=1001
+  SUDO_USER=koobs
+  TERM=screen-256color
+  USER=root
+  USERNAME=root
   VENDOR=amd
  using PTY: False
msg321502 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-11 21:47
I didn't see this failrue since one year, so I close the issue.
History
Date User Action Args
2018-07-11 21:47:39vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg321502

stage: needs patch -> resolved
2017-06-09 04:39:13koobssetfiles: + python-sudo.txt
2017-06-09 04:39:01koobssetfiles: + python-initial.txt

messages: + msg295486
2017-06-06 15:38:30vstinnersetmessages: + msg295266
2017-04-03 10:58:54vstinnersetmessages: + msg291062
2016-11-15 13:44:53koobssetmessages: + msg280837
2016-11-11 09:19:07koobssetmessages: + msg280570
title: test_os.test_chown() failure on koobs-freebsd-current -> test_os.test_chown() failure on koobs-freebsd-{current,9}
2016-11-11 09:10:13koobssetfiles: + koobs-freebsd-current-python-3.5-debug-build-773.txt
2016-11-11 09:09:48koobssetmessages: + msg280567
2016-10-28 11:36:32koobssetmessages: + msg279596
2016-10-28 11:22:37serhiy.storchakasetmessages: + msg279594
2016-10-26 08:14:21serhiy.storchakasetfiles: + test_os_chown.patch

nosy: + serhiy.storchaka
messages: + msg279490

keywords: + patch
2016-10-18 13:04:38koobssetmessages: + msg278870
title: test_os.test_chown() random failure on "AMD64 FreeBSD CURRENT Debug 3.x" buildbot -> test_os.test_chown() failure on koobs-freebsd-current
2016-10-18 12:14:46koobssetmessages: + msg278857
2016-09-28 09:23:45koobssetfiles: + groups-test.log

messages: + msg277599
2016-09-27 21:00:44berker.peksagsetversions: + Python 3.5, Python 3.7
nosy: + berker.peksag

messages: + msg277548

type: behavior
stage: needs patch
2016-09-27 07:07:58koobssetmessages: + msg277498
2016-09-13 14:26:27vajraskysetmessages: + msg276278
2016-09-13 13:57:33vajraskysetmessages: + msg276273
2016-09-13 13:54:35vajraskysetmessages: + msg276272
2016-08-25 10:59:06vstinnersetmessages: + msg273641
2016-08-25 10:02:09koobssetnosy: + Claudiu.Popa, vajrasky
messages: + msg273640
2016-08-23 13:40:58vstinnercreate