classification
Title: os.makedirs breaks under autofs directories
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: amcnabb, hynek, loewis, neologix, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2012-04-30 19:14 by amcnabb, last changed 2012-06-25 17:46 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
mkdir-p.out amcnabb, 2012-05-16 18:03 strace from mkdir -p
makedirs.out amcnabb, 2012-05-16 18:05 strace of python script running os.makedirs
Messages (35)
msg159728 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-04-30 19:14
When a os.makedirs is used under an autofs directory, it crashes. For example, on my machine, `os.makedirs('/net/prodigy/tmp')` crashes with the following traceback:

Traceback (most recent call last):
  ...
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/net/prodigy/tmp'

In this case, '/net' is an autofs directory that automatically mounts the "prodigy" directory by connecting to a host called "prodigy" using NFS. The problem seems to be related to the fact that the "/net/prodigy" directory does not actually exist until it is first accessed.

I tried running `mkdir -p /net/prodigy/tmp`, and it succeeds even though the "/net/prodigy" directory did not exist before the "mkdir" command was run.

I'm not sure exactly how `mkdir -p` is implemented, but one potential workaround for Python's makedirs would be to add the following at the top of the function:

    os.stat(name)

This stat call really only needs to be run the first time makedirs is called (it does not need to be used for each recursive call).
msg159730 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-04-30 19:32
As makedirs in 3.x doesn’t handle EPERM and is otherwise the same, I presume the error is there as well.

I also presume, that after the failed makedirs(), the directory is mounted? I'd just handle the error just we handle EEXIST in 3.x now.
msg159977 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-05-05 01:45
Hynek: it doesn't seem like that would work, since legitimate EPERM errors are much more likely.
msg159980 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-05 08:23
David: What do you mean? I'd just catch EPERM and look whether the directory "appeared" in the meantime. If it exists, continue. If not: re-raise. Am I missing something? I may have a look at GNU mkdir if it doesn't help.
msg159983 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-05 10:05
To me, this doesn't look like a os.makedirs() bug, but rather an autofs one (didn't know it was still in use).
I don't like the idea of adding such a kludge (i.e. catching EPERM and checking whether the directory appeared in between).

Could you please provide an strace output of both "os.makedirs()" and "mkdir -p"?

Thanks.
msg159988 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-05 11:37
Charles, I don't think you can blame autofs here. The problem at hand is that makedirs() never checks whether the directory exists (that would trigger the mount too I presume). Instead, it tries a mkdir and looks if it gets an EEXIST.

If you try that approach in this case where /net is non-writable and /net/prodigy appears only on demand, it fails with an EPERM instead.

Have a look at http://hg.python.org/cpython/file/8215aaccc9dd/Lib/os.py#l136 especially line 151.
msg159990 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-05 13:27
> Charles, I don't think you can blame autofs here. The problem at hand is that makedirs() never checks whether the directory exists (that would trigger the mount too I presume).

Yes, it does. Have a look at line 148:
"""
     if head and tail and not path.exists(head):
"""

> Instead, it tries a mkdir and looks if it gets an EEXIST.

Actually, EEXIST is just caught to cope with race conditions (i.e. the
directory got created in between, TOCTTOU race).

> If you try that approach in this case where /net is non-writable and /net/prodigy appears only on demand, it fails with an EPERM instead.

Actually, no.
makedirs() does a recursive depth-first traversal:
makedirs('/net/prodigy/foo') will actually do something like:
"""
stat('/net/prodigy/foo') == ENOENT
stat('/net/prodigy') == ENONENT
mkdir('/net/prodigy') == EPERM
"""
The NFS mount should appear upon the first - or second - stat() call,
before mkdir().

But I'd like to be sure about that, that's why I think an strace()
output would be useful ;-)
msg159991 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-05-05 13:30
Hynek: you said "just like EEXIST", which doesn't check to see if the directory exists before continuing, thus my confusion.

If mkdir -p does a stat first, then changing the makedirs algorithm to match is probably not a bad idea for OS compatibility reasons, with autofs just being an example of why.
msg159993 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-05 14:08
>> Charles, I don't think you can blame autofs here. The problem at hand is that makedirs() never checks whether the directory exists (that would trigger the mount too I presume).
> 
> Yes, it does. Have a look at line 148:
> """
>      if head and tail and not path.exists(head):
> """

Now that's embarrassing. I take everything back and claim the opposite.
 
>> If you try that approach in this case where /net is non-writable and /net/prodigy appears only on demand, it fails with an EPERM instead.
> 
> Actually, no.
> makedirs() does a recursive depth-first traversal:
> makedirs('/net/prodigy/foo') will actually do something like:
> """
> stat('/net/prodigy/foo') == ENOENT
> stat('/net/prodigy') == ENONENT
> mkdir('/net/prodigy') == EPERM
> """
> The NFS mount should appear upon the first - or second - stat() call,
> before mkdir().
> 
> But I'd like to be sure about that, that's why I think an strace()
> output would be useful ;-)

I glanced over coreutil's mkdir code and it seems that it changes the directory while creating the directories. I suspect it's related to that. At least it does no fstab parsing. ;) The code is wayyy too intricate to be grokable at glancing though. mkdir -p seems to be quite a can of worms.


Andrew, please send straces. :)
msg160850 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-16 12:40
Andrew, are you still with us?
msg160892 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-16 17:31
> Andrew, are you still with us?

I'm here, but it's been a busy few weeks. I'll see if I can spend some time on this today.
msg160899 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-16 17:59
Some interesting information. If I do `os.mkdir('/net/prodigy/tmp')`, it gives "OSError: [Errno 13] Permission denied: '/net/prodigy/tmp'". However, if I instead do `os.mkdir('/net/prodigy/tmp/hi')`, it succeeds. (Note that I'm being careful to start from a fresh unmounted state in both cases).

I will post straces in a few minutes.
msg160920 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-16 19:21
I guess this is the magic in mkdir -p:

mkdir("expert", 0755)                   = -1 EACCES (Permission denied)
chdir("expert")                         = 0
mkdir("tmp", 0755)                      = -1 EEXIST (File exists)

I'm not sure how I feel about that. I think we have more or less a policy in os & shutil not to change directories unless really necessary (like make_archive).
msg160979 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-17 16:35
> I guess this is the magic in mkdir -p:
>
> mkdir("expert", 0755)                   = -1 EACCES (Permission denied)
> chdir("expert")                         = 0
> mkdir("tmp", 0755)                      = -1 EEXIST (File exists)
>
> I'm not sure how I feel about that. I think we have more or less a policy in os & shutil not to change directories unless really necessary (like make_archive).

Calling chdir() should be avoided, as it is not thread safe.

Actually, the real problem is that stat() alone doesn't trigger
automount, to avoid mount storms e.g. when using find, see:
https://bugzilla.redhat.com/show_bug.cgi?id=497830
http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00100.html

Since calling chdir() is not an option, if we want to fix this, I
guess the way to go would be to open() the directory, and not use
path.exists().

But I find really uncomfortable with adding such a kludge, especially
since this problem is probably present in many different places
throughout the stdlib.

Furthermore, it seems that kernel from 2.6.38 do trigger automount
upon stat() (see
http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00210.html), so
I'd suggest to drop this as "won't fix".
msg160995 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-17 18:24
I agree, closing.

Thank you for your valuable assistance Andrew!
msg160997 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-17 18:31
This isn't fixed. All of the examples I've given were with a 3.3.0 kernel. Doing a stat would be a fix.
msg161003 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-17 18:59
If I read it correctly, we _do_ a stat:

stat("/net/prodigy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mkdir("/net/prodigy/tmp", 0777)         = -1 EACCES (Permission denied)

ISTM that this is a different problem altogether, or am I missing something?

If mkdir'ing of deeper directories works, it sounds like a race condition to me.
msg161006 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-17 19:07
My suspicion re the race condition is that a chdir waits for autofs to mount but mkdir doesn't. You could check that yourself if a chdir/sleep would help.
msg161007 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-17 19:22
Hmm. Maybe there's a difference between doing stat('/net/prodigy') vs. stat('/net/prodigy/tmp'). Just a guess, but maybe the former can succeed before the mount completes, but the latter has to wait for the mount to complete.
msg161008 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-17 19:24
By the way, if my hunch about the difference in stat of '/net/prodigy' vs. '/net/prodigy/tmp' is correct, then this would explain why makedirs on deeper directories work. Specifically, one of the shallower stat calls would force the mount to complete before proceeding to the next deeper level.
msg161018 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-17 21:34
Yes, creating the directories in a bottom-up way (i.e. '/net', '/net/prodigy', '/net/prodigy/foo') could maybe avoid this problem.
But this is definietely an autofs bug, and there are probably many other places where such code might break.
msg161024 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-17 22:52
I sense a very easy fix: just do a stat() on the final path before we do anything (and quickly return if present). Moving current makedirs into a private _makedirs that gets called if the stat fails.

I'm still feeling uneasy about that though. It _is_ a kludge.
msg161028 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-18 01:19
I agree that the proposed solutions are hacks, and still propose to close this as "won't fix".

I think this is a bug in Linux. mkdir("/net/prodigy/tmp") ought to trigger the mount, then fail with EEXIST.
msg161038 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-18 06:46
Alright, closing for good then.
Andrew, if you want to get this fixed, you should report this to the autofs folks, because it's definitely not a Python bug.
msg161055 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-18 15:33
I see no evidence that this is a bug in Linux, and I think it's ridiculous to close it when a trivial one-line fix is available. I won't reopen it because it's obvious no one wants to address this. :(
msg161057 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-18 15:45
I posted a bug report with the kernel here:

https://bugzilla.kernel.org/show_bug.cgi?id=43262
msg161116 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-05-19 13:57
> I see no evidence that this is a bug in Linux,

"""
stat("/net/prodigy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mkdir("/net/prodigy/tmp", 0777)         = -1 EACCES (Permission denied)
"""

As you can see, a stat() is already done on /net/prodigy.
Event if it wasn't done, calling mkdir() ought to trigger the mount.
So this is *definitely* as bug in autofs.

> and I think it's ridiculous to close it when a trivial one-line fix is available.

Which one-line fix do you propose?

> I won't reopen it because it's obvious no one wants to address this. :(

It's not that we don't want to address this, but rather that we want
to avoid introducing hacks to work around OS bugs.
msg161118 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-19 14:11
> """
> stat("/net/prodigy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> mkdir("/net/prodigy/tmp", 0777)         = -1 EACCES (Permission denied)
> """
> 
> As you can see, a stat() is already done on /net/prodigy.

To be fair, that shouldn’t trigger a mount. Otherwise a `ls -l` on /net
would mount all volumes.

> Event if it wasn't done, calling mkdir() ought to trigger the mount.

I’m not sure if/where the behavior is defined – let’s what the Linux
people say.
msg161120 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-19 14:17
>> """
>> stat("/net/prodigy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> mkdir("/net/prodigy/tmp", 0777)         = -1 EACCES (Permission denied)
>> """
>>
>> As you can see, a stat() is already done on /net/prodigy.
>
> To be fair, that shouldn’t trigger a mount. Otherwise a `ls -l` on /net
> would mount all volumes.

Not sure what "that" is: my view is that mkdir should trigger the mount,
/net/prodigy is already there and available. ls -l doesn't invoke mkdir(2),
so you wouldn't get a mount storm when it is mkdir that triggers the mount.
msg161121 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-05-19 14:20
>>> """
>>> stat("/net/prodigy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>>> mkdir("/net/prodigy/tmp", 0777)         = -1 EACCES (Permission denied)
>>> """
>>>
>>> As you can see, a stat() is already done on /net/prodigy.
>>
>> To be fair, that shouldn’t trigger a mount. Otherwise a `ls -l` on /net
>> would mount all volumes.
> 
> Not sure what "that" is: my view is that mkdir should trigger the mount,
> /net/prodigy is already there and available. ls -l doesn't invoke mkdir(2),
> so you wouldn't get a mount storm when it is mkdir that triggers the mount.

Sure, I was refering (as I hoped that my quoting would indicate) to the
stat on /net/prodigy, not the mkdir. I commented on the mkdir in the
next paragraph.
msg161280 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-21 15:35
> Which one-line fix do you propose?

Doing a stat("/net/prodigy/tmp") before mkdir("/net/prodigy/tmp") is an extremely simple workaround.

Of course, I would love to see clear documentation of how the kernel is defined to behave in this situation. It would certainly be intuitive for mkdir("/net/prodigy/tmp") to force a mount of "/net/prodigy", but defined behavior isn't always intuitive. :)
msg161284 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-21 17:48
> Of course, I would love to see clear documentation of how the kernel  
> is defined to behave in this situation. It would certainly be  
> intuitive for mkdir("/net/prodigy/tmp") to force a mount of  
> "/net/prodigy", but defined behavior isn't always intuitive. :)

Maybe I'm confused, but the presence of "/net/prodigy" is *not*
the issue here, and what gets mounted is *not* "/net/prodigy",
but "/net/prodigy/tmp" (do "mount" to confirm or dispute).
msg161285 - (view) Author: Andrew McNabb (amcnabb) Date: 2012-05-21 17:58
> Maybe I'm confused, but the presence of "/net/prodigy" is *not*
> the issue here, and what gets mounted is *not* "/net/prodigy",
> but "/net/prodigy/tmp" (do "mount" to confirm or dispute).

No, /net/prodigy is the mountpoint in this case:

amcnabb@sage:~ :) mount |grep prodigy
prodigy:/ on /net/prodigy type nfs4 ([redacted])
msg163997 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-06-25 17:37
I still don't like the idea of adding such a kludge to work around OS
bugs, so I'd suggest closing as won't fix.
msg163998 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-25 17:46
+1 for closing.
History
Date User Action Args
2012-06-25 17:46:33pitrousetnosy: + pitrou
messages: + msg163998
2012-06-25 17:37:10neologixsetmessages: + msg163997
2012-05-21 17:58:18amcnabbsetmessages: + msg161285
2012-05-21 17:48:24loewissetmessages: + msg161284
2012-05-21 15:35:12amcnabbsetmessages: + msg161280
2012-05-19 14:20:32hyneksetmessages: + msg161121
2012-05-19 14:17:19loewissetmessages: + msg161120
2012-05-19 14:11:27hyneksetmessages: + msg161118
2012-05-19 13:57:34neologixsetmessages: + msg161116
2012-05-18 15:45:34amcnabbsetmessages: + msg161057
2012-05-18 15:33:40amcnabbsetmessages: + msg161055
2012-05-18 06:46:23neologixsetstatus: open -> closed

messages: + msg161038
stage: resolved
2012-05-18 01:19:30loewissetnosy: + loewis
messages: + msg161028
2012-05-17 22:53:00hyneksetmessages: + msg161024
2012-05-17 21:34:31neologixsetmessages: + msg161018
2012-05-17 19:24:54amcnabbsetmessages: + msg161008
2012-05-17 19:22:47amcnabbsetmessages: + msg161007
2012-05-17 19:07:06hyneksetmessages: + msg161006
2012-05-17 18:59:02hyneksetmessages: + msg161003
2012-05-17 18:31:07amcnabbsetstatus: closed -> open

messages: + msg160997
2012-05-17 18:24:40hyneksetstatus: open -> closed
resolution: wont fix
messages: + msg160995

stage: needs patch -> (no value)
2012-05-17 16:35:20neologixsetmessages: + msg160979
2012-05-16 19:21:43hyneksetmessages: + msg160920
2012-05-16 18:05:09amcnabbsetfiles: + makedirs.out
2012-05-16 18:03:07amcnabbsetfiles: + mkdir-p.out
2012-05-16 17:59:46amcnabbsetmessages: + msg160899
2012-05-16 17:31:47amcnabbsetmessages: + msg160892
2012-05-16 12:40:41hyneksetmessages: + msg160850
2012-05-05 14:08:57hyneksetkeywords: - easy

messages: + msg159993
2012-05-05 13:30:32r.david.murraysetmessages: + msg159991
2012-05-05 13:27:29neologixsetmessages: + msg159990
2012-05-05 11:37:24hyneksetmessages: + msg159988
2012-05-05 10:05:13neologixsetnosy: + neologix
messages: + msg159983
2012-05-05 08:23:50hyneksetmessages: + msg159980
2012-05-05 01:45:53r.david.murraysetnosy: + r.david.murray
messages: + msg159977
2012-04-30 20:02:11hyneksetkeywords: + easy
2012-04-30 19:32:16hyneksetversions: + Python 3.2, Python 3.3
nosy: + hynek

messages: + msg159730

type: behavior
stage: needs patch
2012-04-30 19:14:17amcnabbcreate