classification
Title: test_getgroups of test_posix fails (on OS X 10.10)
Type: behavior Stage:
Components: Tests Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: JDLH, ned.deily, ronaldoussoren
Priority: normal Keywords:

Created on 2017-02-15 02:03 by JDLH, last changed 2017-02-20 07:32 by JDLH.

Messages (7)
msg287806 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-15 02:03
When I run test.test_posix.PosixTester.test_getgroups on my Mac OS X system, it fails:

% ./python.exe -m unittest -v test.test_posix.PosixTester.test_getgroups
test_getgroups (test.test_posix.PosixTester) ... FAIL

======================================================================
FAIL: test_getgroups (test.test_posix.PosixTester)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jdlh/workspace/cpython/Lib/test/test_posix.py", line 824, in test_getgroups
    self.assertTrue(not symdiff or symdiff == {posix.getegid()})
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.013s

FAILED (failures=1)


Details of my system:
% sw_vers
ProductName:	Mac OS X
ProductVersion:	10.10.5
BuildVersion:	14F2109

% id -G
20 507 12 61 80 
98 399 33 100 
204 395 398 
701
% id -G -n
staff xampp everyone localaccounts admin 
_lpadmin com.apple.access_ssh _appstore _lpoperator 
_developer com.apple.access_ftp com.apple.access_screensharing 
com.apple.sharepoint.group.1
# I wrapped these lines similarly, to make the correspondence clearer

% ./python.exe -c 'import grp,os; g={i: (n, p, i, mem) for (n, p, i, mem) in grp.getgrall()}; print(sorted([(i, g[i][0]) for i in os.getgroups()]) )'
[(12, 'everyone'), (20, 'staff'), (33, '_appstore'), (61, 'localaccounts'), (80, 'admin'), (98, '_lpadmin'), (100, '_lpoperator'), (204, '_developer'), (395, 'com.apple.access_ftp'), (399, 'com.apple.access_ssh'), (507, 'xampp')]

So the difference, which triggers the test failure, is that id -G is returning groups (701, 'com.apple.sharepoint.group.1'), and (398, 'com.apple.access_screensharing'), while posix.getgroups() is not.  I do not yet understand why.
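
The comparison described above can be sketched as a small standalone script. This is only an illustration, not the code in test_posix.py: the helper name `compare_group_sets` is hypothetical, and the script assumes an `id` command on the PATH that accepts `-G`.

```python
import grp
import os
import subprocess


def group_name(gid):
    """Return the group's name, or a placeholder if the database has no entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return "<unknown>"


def compare_group_sets(id_gids, process_gids):
    """Return (only_in_id, only_in_process) as sorted lists of gids.

    Hypothetical helper: order and duplicates are ignored, as in the test.
    """
    id_set, proc_set = set(id_gids), set(process_gids)
    return sorted(id_set - proc_set), sorted(proc_set - id_set)


if __name__ == "__main__":
    # Assumes `id -G` prints whitespace-separated numeric group ids.
    id_output = subprocess.run(
        ["id", "-G"], capture_output=True, text=True, check=True
    ).stdout
    id_gids = [int(tok) for tok in id_output.split()]
    only_id, only_proc = compare_group_sets(id_gids, os.getgroups())
    print("In id -G but not os.getgroups():",
          [(g, group_name(g)) for g in only_id])
    print("In os.getgroups() but not id -G:",
          [(g, group_name(g)) for g in only_proc])
```

Fed the two lists shown above, the helper reports gids 398 and 701 as present only in the `id -G` output.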

Others say this test works on their OS X 10.10 system, so maybe it's triggered by something in my environment. 

Also: python3.6 from MacPorts, and python2.7 from MacPorts, return the same set of groupids as does the dev build of python3.7.

This bug affects the same test, and the same posix.getgroups() call, as http://bugs.python.org/issue17557 "test_getgroups of test_posix can fail on OS X 10.8 if more than 16 groups" (2013-2014, closed).  But I think it is a different problem: issue17557 is related to how posix.getgroups() deals with large numbers of groups, and it is fixed.

I would appreciate help in getting this test to pass. Maybe my environment is wrong, in which case I should fix my environment. But maybe the cpython code is sensitive to some detail of my environment, in which case perhaps I should fix the cpython code.
msg287807 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-15 02:19
I have pushed a branch for this issue to my cpython fork:

https://github.com/JDLH/cpython/tree/bpo-29562_failing_test_getgroups_on_os_x

It modifies test_getgroups in test_posix.py to give better diagnostics in the event of a test failure. It reports specifically which groups appear in the output of id -G but not in posix.getgroups(), and vice versa.

% ./python.exe -m unittest -v test.test_posix.PosixTester.test_getgroups 
test_getgroups (test.test_posix.PosixTester) ... FAIL

======================================================================
FAIL: test_getgroups (test.test_posix.PosixTester)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jdlh/workspace/cpython/Lib/test/test_posix.py", line 841, in test_getgroups
    self.assertEqual(len(symdiff), 0, msg)
AssertionError: 2 != 0 : id -G and posix.groups() should have zero difference.
Groups in id -G but not posix.groups(): [(701, 'com.apple.sharepoint.group.1'), (398, 'com.apple.access_screensharing')]
Groups in posix.groups() but not id -G: []
(Effective GID (20) was disregarded.)

----------------------------------------------------------------------
Ran 1 test in 0.020s

I don't think this branch is ready yet to submit to the main codebase, but it may help people diagnose the issue.
msg287911 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-16 03:54
Some diagnosis.

Group `com.apple.sharepoint.group.1` appears to be related to a certain kind of file sharing, but I don't have hard evidence. 

Its only member was a test user I created as part of screen sharing with Apple Support. 
```
% dscacheutil -q group -a name com.apple.sharepoint.group.1
name: com.apple.sharepoint.group.1
password: *
gid: 701
users: testuser
```

I removed File Sharing for this user's home directory.

1. Open System Preferences... Sharing. 
2. Click on "File Sharing", which is checked. In the right pane, a list of shared folders appears.
3. Click on the entry "Testuser Public Folder" in the Shared Folders list.
4. Click on the "-" button below the Shared Folders list. The "Testuser Public Folder" entry disappears.

Having done that, the group `com.apple.sharepoint.group.1` no longer appeared.

```
% dscacheutil -q group -a name com.apple.sharepoint.group.1
%
```

Interestingly, `test_getgroups` still failed, and still had a discrepancy of two groups from the output of `id -G`.

```
% ./python.exe -m unittest -v test.test_posix.PosixTester.test_getgroups
test_getgroups (test.test_posix.PosixTester) ... FAIL

======================================================================
FAIL: test_getgroups (test.test_posix.PosixTester)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jdlh/workspace/cpython/Lib/test/test_posix.py", line 841, in test_getgroups
    self.assertEqual(len(symdiff), 0, msg)
AssertionError: 2 != 0 : id -G and posix.groups() should have zero difference.
Groups in id -G but not posix.groups(): [(395, 'com.apple.access_ftp'), (398, 'com.apple.access_screensharing')]
Groups in posix.groups() but not id -G: []
(Effective GID (20) was disregarded.)

----------------------------------------------------------------------
Ran 1 test in 0.013s

FAILED (failures=1)
```

Earlier, group `com.apple.access_ftp` was not part of the difference. Now it is. The output of `id -G` didn't change. The implementation of `posix.getgroups()` didn't change. It calls getgroups(2), I believe: https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L6078-L6103

That makes me think that getgroups(2) on Mac OS behaves differently than we expect.

`man 2 getgroups` gives documentation. (I can't find this page at an apple URL, but http://www.manpagez.com/man/2/getgroups/ seems to have the same content.) It says, 

>>> "To provide compatibility with applications that use getgroups() in environments where users may be in more than {NGROUPS_MAX} groups, a variant of getgroups(), obtained when compiling with either the macros _DARWIN_UNLIMITED_GETGROUPS or _DARWIN_C_SOURCE defined, can be used that is not limited to {NGROUPS_MAX} groups.  However, this variant only returns the user's default group access list and not the group list modified by a call to setgroups(2) (either in the current process or an ancestor process).  Use of setgroups(2) is highly discouraged, and there is no foolproof way to determine if it has been previously called."

I don't know how to determine if my copy of Mac OS X 10.10 was compiled with either of these two macros.

On my system, I chased NGROUPS_MAX down to /usr/include/sys/syslimits.h:84, where it is set to 16. That is more than the number of groups `id -G` is reporting, so I don't see how that is relevant.

```
% id -G
20 507 12 61 80 98 399 33 100 204 395 398
```

This is 12 groups, whereas before it was 13 groups (see my message from 2017-02-15 02:03). This is unsurprising.  However, the number of groups returned by posix.getgroups() has also shrunk by 1:

```
% ./python.exe -c 'import grp,os; g={i: (n, p, i, mem) for (n, p, i, mem) in grp.getgrall()}; print(sorted([(i, g[i][0]) for i in os.getgroups()]) )'
[(12, 'everyone'), (20, 'staff'), (33, '_appstore'), (61, 'localaccounts'), (80, 'admin'), (98, '_lpadmin'), (100, '_lpoperator'), (204, '_developer'), (399, 'com.apple.access_ssh'), (507, 'xampp')]
```

Notice that group (395, 'com.apple.access_ftp') is no longer being returned by os.getgroups().  This happened as a consequence of a different group being deleted.

The test_getgroups comment asserts: "# 'id -G' and 'os.getgroups()' should return the same groups, ignoring order, duplicates, and the effective gid." https://github.com/python/cpython/blob/master/Lib/test/test_posix.py#L819-L820

I'm getting skeptical about that claim. Does Mac OS X actually guarantee that 'id -G' and 'getgroups(2)' return the same groups?
msg287915 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-16 05:19
I guess I didn't state what I find odd about the new test_getgroups results.

1. `os.getgroups()` used to return group (395, 'com.apple.access_ftp'), but no longer does.  I don't see a reason why.

2. `os.getgroups()` is returning 2 fewer group IDs than `id -G`, even as the total number of groups is reduced.  This is not the behaviour of an API limited by {NGROUPS_MAX}.
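
The {NGROUPS_MAX} reasoning in point 2 can be checked from Python without reading the system headers. The sketch below is an illustration under assumptions: `truncation_plausible` is a hypothetical helper, and the script assumes the platform exposes the `SC_NGROUPS_MAX` name through `os.sysconf`.

```python
import os


def truncation_plausible(returned_count, ngroups_max):
    """Return True if a list of this size could have been cut off at the limit.

    Hypothetical helper: a list shorter than the limit cannot have been
    truncated by that limit.
    """
    return returned_count >= ngroups_max


if __name__ == "__main__":
    # Assumes SC_NGROUPS_MAX is a valid sysconf name on this platform.
    ngroups_max = os.sysconf("SC_NGROUPS_MAX")
    count = len(os.getgroups())
    print(f"NGROUPS_MAX={ngroups_max}, os.getgroups() returned {count} groups")
    print("Truncation plausible:", truncation_plausible(count, ngroups_max))
```

With the figures reported in this issue (10 or 11 groups returned against a limit of 16), the helper returns False, which matches the observation that the limit does not explain the missing groups.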
msg287917 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-16 06:53
The Mac OS 10.10 man page for initgroups(3) says:

"Processes should not use the group ID numbers from getgroups(2) to determine a user's group membership.  The list obtained from getgroups() may only be a partial list of a user's group membership.  Membership checks should use the mbr_gid_to_uuid(3), mbr_uid_to_uuid(3), and mbr_check_membership(3) functions."
(http://www.manpagez.com/man/3/initgroups/ -- not official Apple page, but it matches what I see in my OS.)

When the man page says, "The list obtained from getgroups() may only be a partial list of a user's group membership.", and the list from `id -G` is presumably a complete list, should we understand that Apple is saying their getgroups(2) implementation isn't POSIX-compliant? If so, maybe we should skip test_getgroups on Mac OS X systems?

Or, should we consider rewriting os_getgroups_impl() to use a Mac-specific implementation on Mac OS X?
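
For illustration only, the "skip on Mac OS X" option could look roughly like the sketch below. It is not a patch against test_posix.py: the class name `GetGroupsSketch` is hypothetical, the assertion is copied from the traceback in the first message, and whether skipping is the right fix is still an open question in this issue.

```python
import os
import subprocess
import sys
import unittest


class GetGroupsSketch(unittest.TestCase):
    # Hypothetical test class; the real test lives in Lib/test/test_posix.py.

    @unittest.skipIf(sys.platform == "darwin",
                     "getgroups(2) may return only a partial list on macOS")
    def test_getgroups_matches_id(self):
        # Assumes an `id` command that prints numeric gids for `-G`.
        id_output = subprocess.run(
            ["id", "-G"], capture_output=True, text=True, check=True
        ).stdout
        id_groups = {int(tok) for tok in id_output.split()}
        symdiff = id_groups.symmetric_difference(os.getgroups())
        # Same condition as the original assertion: the sets match,
        # or differ only by the effective gid.
        self.assertTrue(not symdiff or symdiff == {os.getegid()})


if __name__ == "__main__":
    unittest.main()
```

A skip hides the discrepancy rather than explaining it, so it would only make sense if the partial-list behaviour is confirmed to be intended on this platform.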
msg288185 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2017-02-20 07:22
Note that the result of getgroups(2) is fixed on login, while "id -G" reflects the current state of the user database on macOS. Could this explain this failure? That is, have you tried logging out and in again before running the test suite?
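
One way to probe this hypothesis without shelling out to `id` is to compare the process's group list with a fresh lookup in the user database. The sketch below is an assumption-laden illustration: `groups_added_since_login` is a hypothetical helper, and it assumes `os.getgrouplist()` reflects the current user database, which this thread does not confirm for macOS.

```python
import os
import pwd


def groups_added_since_login(process_gids, database_gids):
    """Return gids found in the user database but not in the process list.

    Hypothetical helper: if the process list was fixed at login, groups
    the user joined afterwards would show up here.
    """
    return sorted(set(database_gids) - set(process_gids))


if __name__ == "__main__":
    user = pwd.getpwuid(os.getuid())
    # Fresh query of the group database for the current user.
    database_gids = os.getgrouplist(user.pw_name, user.pw_gid)
    print("Possibly added since login:",
          groups_added_since_login(os.getgroups(), database_gids))
```

If the list is non-empty before logging out and empty after logging back in, that would be consistent with the explanation above.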
msg288186 - Author: Jim DeLaHunt (JDLH) * Date: 2017-02-20 07:32
> Note that the result of getgroups(2) is fixed on login, while "id -G" reflects the current state of the user database on macOS.

Wow, that's interesting!  Thank you for this information.

The test code for test_getgroups does not mention this interaction.  I can certainly see how it could affect the test. Maybe it should be added?

Since I last tried that test, I've logged out and restarted several times, and changed OS to Mac OS X 10.11 El Capitan. Nothing like changing several independent variables at once while diagnosing! I will try the test again and report back.
History
Date                 User            Action  Args
2017-02-20 07:32:53  JDLH            set     messages: + msg288186
2017-02-20 07:22:43  ronaldoussoren  set     messages: + msg288185
2017-02-20 06:11:16  xiang.zhang     set     nosy: + ronaldoussoren, ned.deily
2017-02-16 06:53:19  JDLH            set     messages: + msg287917
2017-02-16 05:19:21  JDLH            set     messages: + msg287915
2017-02-16 03:54:16  JDLH            set     messages: + msg287911
2017-02-15 02:19:33  JDLH            set     messages: + msg287807
2017-02-15 02:03:05  JDLH            create