posix.getgroups() failure on Mac OS X #52148
test_posix fails on trunk on Mac OS X (Snow Leopard):
test.test_support.TestFailed: Traceback (most recent call last):
File "Lib/test/test_posix.py", line 42, in testNoArgFunctions
posix_func()
OSError: [Errno 22] Invalid argument

Python 2.7a3+ (trunk:78129M, Feb 10 2010, 10:40:28)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
>>> import posix
>>> posix.getgroups()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument |
I don't see any issue here; it runs perfectly fine on Mac OS X (Snow Leopard):
Shashwat-Anands-MacBook-Pro:test l0nwlf$ pwd
----------------------------------------------------------------------
OK
Shashwat-Anands-MacBook-Pro:test l0nwlf$ python2.7 --version
Python 2.7a3+
Shashwat-Anands-MacBook-Pro:test l0nwlf$ python2.7
Python 2.7a3+ (trunk:78165, Feb 12 2010, 22:36:03)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import posix
>>> posix.getgroups()
[20, 204, 100, 98, 81, 80, 79, 61, 12, 402, 401]
>>> |
What is the difference between (Apple Inc. build 5646) (dot 1) and the plain (Apple Inc. build 5646)? Also, ronald.oussoren made a number of changes recently (r78149 to r78152). This fix could have been a side effect of one of them, though I could not find a direct correlation. |
I still see it on trunk (revision 78165). No idea what the (dot 1) means. |
It seems they are basically the same thing: the version of GCC and the build of OS X (the latest in this case). I was not able to figure out the (dot 1) part, though. |
Please don't remove the nosy list. (I guess you did it by accident.) |
Thanks for restoring it. I did not even realize it. |
5646 and 5646.1 are Apple's builds of GCC. The various builds of gcc are available at http://www.opensource.apple.com/source/gcc/
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin -> http://www.opensource.apple.com/source/gcc/gcc-5646.1/
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin |
Michael:
I cannot reproduce this with r78205, OSX 10.6.2/10C540, gcc version 4.2.1 (Apple Inc. build 5659), Xcode 3.2.2/10M2135. |
A related question: is this issue present in the 3.x trunk? (BTW: feel free to assign all OSX related issues to me) |
I'm not seeing the same issue on my Macbook Pro. I can get all this info from my desktop machine (Mac Pro) when I return from PyCon. |
Michael, can you post the output of the "groups" and "id" commands from your Mac? It looks like posix_getgroups cannot handle more than NGROUPS_MAX groups, and NGROUPS_MAX is 16 on Mac OS. |
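To check whether an account is affected by that limit, you can compare the number of groups reported by the `id -G` command with the runtime value of `sysconf(_SC_NGROUPS_MAX)`. Below is a minimal sketch. It assumes that `id` is on the PATH and that `os.sysconf` accepts the `"SC_NGROUPS_MAX"` name on the platform. `exceeds_ngroups_max` is a hypothetical helper name.

```python
import os
import subprocess


def exceeds_ngroups_max():
    """Return (group_count, limit, exceeded) for the current process.

    The group list comes from `id -G` rather than os.getgroups(), because
    on the affected builds os.getgroups() itself raises OSError (EINVAL).
    Note: `id -G` also lists the effective gid, so its count may differ by
    one from what getgroups(2) would return.
    """
    out = subprocess.run(
        ["id", "-G"], capture_output=True, text=True, check=True
    ).stdout
    group_count = len(out.split())
    # Runtime limit (the reports in this thread show 16 on Mac OS X 10.6).
    limit = os.sysconf("SC_NGROUPS_MAX")
    return group_count, limit, group_count > limit


if __name__ == "__main__":
    count, limit, exceeded = exceeds_ngroups_max()
    print(f"{count} groups, NGROUPS_MAX={limit}, exceeded={exceeded}")
```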
I was able to reproduce the error. First, add your user name to multiple test groups as follows:
$ sudo dscl . -create /Groups/testN GroupMembership username
(repeat 16 times with different Ns)
$ ./python.exe
Python 2.7a3+ (trunk:78265M, Feb 20 2010, 13:18:22)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import posix
>>> posix.getgroups()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument |
I am submitting a fix. I am using the following feature documented in getgroups(2): It appears that _DARWIN_C_SOURCE is defined in the standard python configuration on Mac OS X. Tested on 10.6 only. |
It looks like the current implementation is not POSIX compliant, because it assumes that NGROUPS_MAX is a compile-time constant. However, according to <http://www.opengroup.org/onlinepubs/000095399/functions/getgroups.html>, "Application writers should note that {NGROUPS_MAX} is not necessarily a constant on all implementations." I would suggest using my _DARWIN_C_SOURCE implementation unconditionally and making similar changes to posix_setgroups, but this is probably a subject for a separate issue. |
I would propose a different strategy: if _SC_NGROUPS_MAX is defined, use |
On Sun, Feb 21, 2010 at 1:58 PM, Martin v. Löwis <report@bugs.python.org> wrote:
I am afraid the following is evidence that it won't:
Python 2.7a3+ (trunk:78265M, Feb 20 2010, 15:20:36)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.sysconf('SC_NGROUPS_MAX')
16
>>> len(os.getgroups()) # with the patch
22 |
Here is another interesting fact: Mac OS 10.6 comes with python 2.5 and 2.6 preinstalled:
$ python2.5 -V
Python 2.5.3c1
$ python2.6 -V
Python 2.6.1
Neither of these exhibits the same bug, but both are broken in some way. Given:
$ cat tg.py
import os
g = os.getgroups()
print g
os.setgroups(g[:5])
print os.getgroups()
$ sudo python2.5 tg.py
[0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
[0, 101, 204, 100, 98]
$ sudo python2.6 tg.py
[0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2, 1, 401]
[0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2, 1, 401]
Note that python2.5 truncates the group list, but setgroups works as expected. In contrast, python2.6 reports all groups correctly, but setgroups has no effect. |
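The two failure modes can be told apart with two checks. First, compare the groups `getgroups()` reported against the full membership. Second, compare the list read back after `setgroups(g[:5])` against the list that was requested. Below is a minimal sketch of that classification. `classify` is a hypothetical helper, and the sample lists are the root outputs quoted above, taking the python2.6 list as the full membership.

```python
def classify(full, reported, requested, after):
    """Label the getgroups/setgroups symptoms seen in the tg.py runs."""
    problems = []
    # Groups the process belongs to that getgroups() did not report.
    if set(full) - set(reported):
        problems.append("getgroups truncated")
    # setgroups() should leave exactly the requested groups in place.
    if set(after) != set(requested):
        problems.append("setgroups had no effect")
    return problems


FULL = [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2, 1, 401]

# python2.5: list cut to 16 entries, but setgroups(g[:5]) is honoured.
py25 = classify(FULL, FULL[:16], FULL[:5], [0, 101, 204, 100, 98])
# python2.6: full list reported, but it is unchanged after setgroups.
py26 = classify(FULL, FULL, FULL[:5], FULL)
print(py25)  # ['getgroups truncated']
print(py26)  # ['setgroups had no effect']
```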
Apparently, Apple patches the posix_[gs]etgroups functions as follows:
for 2.5: http://www.opensource.apple.com/source/python/python-44/2.5/fix/posixmodule.c.ed
for 2.6: http://www.opensource.apple.com/source/python/python-44/2.6/fix/posixmodule.c.ed |
And as usual they can't be bothered to describe what the patch does, or even to use regular unified diffs. |
I've converted Apple's patches to unified diffs, but I cannot reproduce the 2.5 behavior. |
After some head-scratching, I figured out how to reproduce the stock python2.5 behavior. It turns out that defining _DARWIN_C_SOURCE not only allows getgroups() output to exceed NGROUPS_MAX (as documented), but also effectively disables setgroups(), which is not documented. With the no-darwin-ext.diff patch and the previously attached tg.py, I see:
$ cat tg.py
import os
g = os.getgroups()
print(g)
os.setgroups(g[:5])
print(os.getgroups())
$ sudo ./python.exe tg.py
[0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
[0, 101, 204, 100, 98]
which is the same as with the stock python2.5:
$ sudo python2.5 tg.py
[0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
[0, 101, 204, 100, 98]
Note that root is a member of 18 groups on my system, but the last two are truncated by os.getgroups(). It is tempting to adopt no-darwin-ext.diff as a solution to this issue, because allowing more than NGROUPS_MAX (or sysconf(_SC_NGROUPS_MAX), which should be the same) groups is really a Mac OS bug. To have both a working os.setgroups() and an os.getgroups() that supports more than NGROUPS_MAX results, it appears that the two functions would have to be compiled in separate compilation units, which is probably too big a price to pay for the functionality. Also, my bpo-7900.diff, while likely to work in most practical situations, is vulnerable to a race condition if group membership is expanded between two calls to getgroups. |
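The race comes from using one `getgroups(0, NULL)` call to size the buffer and a second call to fill it. If membership grows in between, the second call fails with EINVAL. One way to tolerate this is to retry the size-then-fill sequence a few times. The sketch below shows that loop from Python via `ctypes`. It is not the bpo-7900.diff patch, which is C code in posixmodule.c. It assumes that `gid_t` is a 32-bit unsigned integer, as on Linux and Mac OS X. `getgroups_retry` and `max_attempts` are hypothetical names.

```python
import ctypes
import ctypes.util
import errno
import os

# Assumption: gid_t is a 32-bit unsigned int (true on Linux and Mac OS X).
gid_t = ctypes.c_uint32

# find_library may return None; CDLL(None) then uses the already-loaded libc.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.getgroups.argtypes = [ctypes.c_int, ctypes.POINTER(gid_t)]
_libc.getgroups.restype = ctypes.c_int


def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def getgroups_retry(max_attempts=5):
    """Call getgroups(2) to size, then to fill; retry if the list grew."""
    for _ in range(max_attempts):
        n = _libc.getgroups(0, None)  # size 0: only report the count
        if n < 0:
            _raise_errno()
        if n == 0:
            return []
        buf = (gid_t * n)()
        got = _libc.getgroups(n, buf)
        if got >= 0:
            return list(buf[:got])
        if ctypes.get_errno() != errno.EINVAL:
            _raise_errno()
        # EINVAL: membership grew between the two calls; size it again.
    raise OSError(errno.EINVAL, "group list kept changing")
```

The loop narrows the window but cannot remove it: the list can still change right after the successful call returns.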
I am reclassifying this as a crash because os.getgroups() crashes the interpreter when python is running as root on an unmodified system:
$ sudo ./python.exe -c "import os; os.getgroups()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
OSError: [Errno 22] Invalid argument
This is also a regression, apparently introduced in r63955. |
Alexander: What makes you think r63955 introduced the problem? BTW, this does not crash the interpreter: the example you give causes an exception and cleanly shuts down the interpreter. The exception is unwanted, but I wouldn't call it a crash. The Apple fix for getgroups in python2.6 is odd; it uses an undocumented API (getgrouplist_2). If I read the manpage correctly, there is a posixly correct way to implement os.getgroups:
I'll work on a patch that implements this. |
s/2.7/2.7.1/ |
I've added bpo-9344 for adding os.getgroupslist. I'd prefer to keep adding that function separate from this issue. BTW, I'm +1 on adding such a function. I will shortly commit a port of os-getgroups-v3.patch to 3.2, but without the tests in "PosixGroupsTester", because those explicitly exclude OSX. |
Committed a port of os-getgroups-v3.patch to python3 in r83088. Backports: I'll backport to 2.7 and 2.6 tomorrow. To complete the documentation for picking this patch: I've spoken with an Apple engineer about this issue. He says the _DARWIN_C_SOURCE behavior is intentional and will not be reverted. Apple's build of python and other system tools (including perl) also use the _DARWIN_C_SOURCE behavior. |
2.7: r83124
The fix is now in all active branches, and I am therefore closing the issue. |
Reopening. This seems to have broken a couple of buildbots (two different issues): If you want to have a global look at buildbot status, you can use bbreport: Please don't commit platform-dependent code without at least watching the buildbots afterwards... |
The 2.6 problem (the solaris buildbot you link to) should be fixed in r83420. |
The other problem is fixed in r83431 for the py3k trunk. I'll check the buildbot status tomorrow morning; if that shows that the issue is truly gone, I'll backport to the other branches and close this issue. |
Someone else backported to 3.1 (that is, 3.1 already contained the fix when I tried the svnmerge).
Backported to 2.7 in r83643.
Backported to 2.6 in r83650. |
This test is failing again, and IIUC, largely due to the same sort of issues: http://www.python.org/dev/buildbot/all/builders/AMD64%20Leopard%203.1/builds/65

I was able to track down what exactly caused it to fail in this case on my box, though. Whatever "posix.getgroups()" ends up calling appears to be tied to the current user's login, or at least it doesn't get updated when new groups are added to the user. This failure happened because, at some point after the buildbot was up and running, I added a new user to the machine (totally unconnected to the existing buildbot runner): this caused a new group to be added to the buildbot runner's user. "id -G" starts returning that group immediately, but "posix.getgroups()" returns the same list as it had before.

I was further able to reproduce it in Terminal: with a console open, I compiled 3.1 there, then added a user and ran the test. It fails. In a newly opened terminal window, the test succeeds. The original console continues to fail. |
This is the expected behavior on OSX. Apple has a pretty odd interpretation of the standards with respect to getgroups and setgroups behavior. This behavior is not a bug in Python.
On 11 nov. 2010, at 22:17, Stephen Hansen <report@bugs.python.org> wrote:
|
Well, yes: the result of posix.getgroups is not a bug in Python, but is it a bug in the test? Should it be skipped on OSX, or some other solution? Having buildbots fail because of something that's expected behavior is bad, isn't it? |
Right, regardless of whether or not it is a bug in Python, IMO it *is* a bug in the Python test suite, since we *expect* buildbots to be long-running processes, and therefore they are going to get hit by this failure on OSX periodically with a pretty high likelihood. Yes, it is easily fixable (restart the builder), but it seems to me the test should be fixed somehow instead of putting that burden on the buildbot owner. A skip on OSX would certainly be the simplest solution, and we could thereby indicate that we consider this behavior to be a bug in OSX. |
If anything should be done, the test that checks the output of id -G should be removed, so that the buildbot keeps running without problems when you change the buildbot's account. After reading the message about the new failures again, I don't think this is the OSX issue I mentioned (and which is explained in painful detail earlier in the message list): it's just that the buildbot account got changed (unintentionally) while buildbot was running. BTW, I don't understand why adding a new account to an OSX machine adds existing accounts to a new group; I have never seen that behaviour before (on OSX). I'm -1 on changing anything for now and do not consider this to be a bug in Python or its test suite. |
Ronald, on a normal unix system if you add a user to a group, any existing process/terminal session that runs 'id -G' will return the *old* group list. Only a new process/terminal session will see the new group. On OSX, 'id -G' returns the new group when run in an existing process/terminal session, according to what you wrote. You can't just remove the 'id -G' from that test, because the test is using 'id -G' to get an independent verification of the list of group numbers as a check against what getgroups returns. On a normal unix system, these two would match. On OSX, they don't. At the moment I don't see any alternative to skipping the test on OSX with a message that 'id -G' and 'getgroups' do not return the same group list on OSX. |
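Here is a sketch of the check under discussion together with the proposed skip. It is not the actual `test_posix` code, and the class and helper names are hypothetical. Adding `os.getegid()` rests on an assumption: `id -G` also prints the effective group ID, which `getgroups()` may omit.

```python
import os
import subprocess
import sys
import unittest


def id_groups():
    """Group IDs reported by the external `id -G` command, as a set."""
    out = subprocess.run(
        ["id", "-G"], capture_output=True, text=True, check=True
    ).stdout
    return {int(g) for g in out.split()}


def python_groups():
    """Group IDs from os.getgroups(), plus the effective gid id -G prints."""
    return set(os.getgroups()) | {os.getegid()}


class GetgroupsMatchesIdTest(unittest.TestCase):
    # The proposed skip: on OSX the two sources can legitimately disagree
    # inside a long-running process after group membership changes.
    @unittest.skipIf(sys.platform == "darwin",
                     "id -G and getgroups() can differ on OSX")
    def test_getgroups_matches_id(self):
        self.assertEqual(id_groups(), python_groups())


if __name__ == "__main__":
    unittest.main()
```

Ronald's objection still applies to this form: the skip also removes the check on OSX for interactive command-line runs.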
I'm still -1 on changing the test. The test only fails when run from the buildbot and the buildbot account is changed without restarting buildbot. Changing the buildbot account should almost never happen, and IMO you should restart the buildbot daemon when you do so (that's just good practice). Disabling the test on OSX means that os.getgroups will not get tested at all on OSX, even when I run the test suite from the command line. |
The test is clearly verifying a *wrong* assumption: that id -G will match posix.getgroups(), which simply does not hold on OSX. I can reproduce this reliably on a completely clean, brand-new installation of 10.5. The only things done to the box since then are updating to 10.5.8 and downloading the latest Xcode tools that run on Leopard. From there, launch Terminal and leave the console open. Run id -G; then run python and look at posix.getgroups(). Now go into System Preferences and add a new user. Don't do anything else, and don't change anything about the existing user. In the console that was already open, run id -G again. Now run python again and call posix.getgroups(): the two no longer match.

Clearly, IMHO, the result the test declares to be expected simply is not true in an OSX Unix environment. Yes, if I go and *edit the actual slave user*, then surely I can expect failures until I restart the buildslave. But if merely adding a user changes the buildslave's user without any action of my own, and that invalidates this test, then the test itself seems to be founded on assumptions that are not reliably true.

I understand that disabling the test means os.getgroups() will no longer be tested on OSX. And yet, the test currently checks a specific behavior of os.getgroups() that is *not* actually guaranteed for that operation. There is at least one very easy-to-reproduce situation in which id -G and posix.getgroups() do not match; I don't know if there are more. For the test to assert that the result is correct only when they match seems to be a mistake. |
I agree with Stephen. The test in question is *not a valid test* on OSX. Therefore on OSX it should be skipped. If you can think of a way to test the actual behavior of getgroups on OSX, that's even better. |
Please explain how the failure can be reproduced. I've done some testing on my machine using Apple's copy of python 2.6.1 (on OSX 10.6), which has the same getgroups implementation as the current heads of the active branches.
>>> os.getgroups()
[20, 402, 204, 61, 12, 401]
>>> os.system("id -G")
20 402 204 61 12 401
0
(Now open the Accounts preference pane and add a new user)
>>> os.getgroups()
[20, 403, 402, 204, 61, 12, 401]
>>> os.system("id -G")
20 403 402 204 61 12 401
0
Note how the results of both os.getgroups and id -G change. That should mean tests don't fail unless you happened to add a new account in the split second between the "calls" to os.getgroups and "id -G" in a test run. Was the buildbot started using launchd (the recipe at <http://buildbot.net/trac/wiki/UsingLaunchd> seems correct)? If not, how is it started? |
Having just reread this issue more carefully, my understanding is that Ronald had elected to make the results returned from os.getgroups match that returned by "system tools" (by which I understood him to mean the 'id' command). Since Ronald reports he sees the intended behavior, Stephen's results seem to show that there is a problem with the fix in some circumstances which need to be understood. Alexander noted that this should all be documented, and I agree, so I'm opening a new issue for the doc update. |
And it's entirely possible (even likely) that what Stephen is seeing here is a platform bug in OSX's quirky implementation of group management. |
On 11/16/10 5:44 AM, Ronald Oussoren wrote:
I have. But to do so more directly:
As I said, the slave is running the latest on 10.5.
Perhaps it's a
Perhaps the test should only be skipped on 10.5?
I am happy to provide a
I verified posix.getgroups() on 10.6 does not appear to exhibit this
It was started with launchd, yes: with a variation of that recipe. |
The problem Stephen is seeing with the buildbot machine is ABI-dependent; the behavior of getgroups(2) changed in 10.6. You can demonstrate all of this on a 10.6 system. Open a terminal session and verify the process's groups:
$ id -G
20 40200 401 204 100 98 80 61 12 403 40100 103
$ /usr/local/bin/python3.2 -c 'import posix; print(posix.getgroups())'
[20, 40200, 401, 204, 100, 98, 80, 61, 12, 403, 40100, 103]

Now create a new user with System Preferences. One of the quirks here is that OS X 10.5 and 10.6 create a new group for that user and assign other existing users to that group. (The new group is one of the somewhat mysteriously named com.apple.sharepoint.group.n groups.) Still in the same terminal session, after the new user/group was created and the existing user we are running under was automatically added to the new group:

Only the version built with a deployment target of 10.6 (that is, using the 10.6 SDK and the 10.6 ABI) reflects the updated group list. And that difference can be seen, as Alexander noted earlier, in the symbols referenced. An nm ./python | grep getgroups for each shows:

So unless you build for a deployment target of 10.6 (or higher), expect the output of /usr/bin/id not to match the results of getgroups(2) if the user's group membership changes during the run (as can happen when another user is created or deleted). This particular problem should only be an issue when running on 10.5 and higher and using a 10.5 or earlier ABI. On 10.4, neither getgroups(2) (as expected) nor /usr/bin/id sees updates to group memberships made during the lifetime of the parent terminal session; starting a new login terminal session does see the updates.

Also note that this issue would be observable with all existing python.org OS X installers running on 10.5 or 10.6. Most have been built with a 10.3 deployment target, while 2.7 also provides an additional 32-/64-bit one with a 10.5 deployment target. (I believe Ronald intends to build future 32-/64-bit installers with a 10.6 deployment target, so they would be the first not subject to this issue.)
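The rule above can be written as a small predicate over a build's deployment target. This is a sketch that encodes only the analysis above. Builds targeting 10.6 or later see live membership changes, and builds targeting 10.5 or earlier do not. `reflects_live_membership` is a hypothetical helper. On a given interpreter, the target can be read with `sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")`, which is empty or None on non-macOS builds.

```python
import sysconfig


def reflects_live_membership(target):
    """Predict whether getgroups(2) sees group changes made after login.

    Assumption (from the analysis above): only builds with a deployment
    target of 10.6 or later do; 10.5 and earlier ABIs do not.
    Returns None when there is no macOS deployment target.
    """
    if not target:
        return None
    # Compare numerically: as strings, "10.10" < "10.6" would be wrong.
    parts = tuple(int(p) for p in str(target).split("."))
    return parts >= (10, 6)


if __name__ == "__main__":
    current = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")
    print(current, reflects_live_membership(current))
```

Parsing the version into an integer tuple avoids plain string comparison, under which "10.10" would sort before "10.6".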
FTR, here are the configure options I used for each build:
./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk --with-universal-archs=32-bit MACOSX_DEPLOYMENT_TARGET=10.4
./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk --with-universal-archs=intel MACOSX_DEPLOYMENT_TARGET=10.5
./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.6.sdk --with-universal-archs=intel MACOSX_DEPLOYMENT_TARGET=10.6 |
(Argh! Just to be very clear, those ./configure commands are all one line, including the MACOSX_DEPLOYMENT_TARGET as an argument to the configure script.) |
I'm closing this issue again, the current behavior is intended (as it mirrors platform behavior). |