classification
Title: os.getcwd causes infinite loop on solaris
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: csernazs, pitrou, python-dev, skrah
Priority: normal Keywords: needs review, patch

Created on 2010-07-07 08:56 by csernazs, last changed 2012-08-29 13:21 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
issue9185.patch skrah, 2010-07-13 11:15
issue9185-2.patch skrah, 2010-07-13 16:50
Messages (17)
msg109459 - (view) Author: Zsolt Cserna (csernazs) * Date: 2010-07-07 08:56
os.getcwd() causes infinite loop on solaris10 when the length of the current directory is greater than 1024 (them limit of the maximum absolute path).

os.getcwd is implemented by a while loop in python, see the function posix_getcwd in posixmodule.c. That while loop loops forever because on solaris it sets errno to ERANGE and res to NULL - even if there will be no positive results from getcwd due to the extra long path.
This infinite while cycle also causes eating up the memory.

I think the solution would be implementing a hard limit for this while loop. It could be either fixed (16k for example), or dymanic: comparing the MAXPATHLEN macro to the size of the allocated buffer and if the size of the buffer is greater than MAXLPATHLEN then raise an OSError exception with the current errno.

That bug was triggered by test_posix unittest.
msg109462 - (view) Author: Zsolt Cserna (csernazs) * Date: 2010-07-07 09:18
There would be also a solution to allocate a fix buffer with MAXLPATHLEN  size and call the getcwd function only one time - if MAXLPATHLEN is available.
msg109464 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-07 09:40
I've encountered this a while ago on OpenSolaris. According to the man
page

  http://docs.sun.com/app/docs/doc/816-5168/getcwd-3c?a=view

this would be a bug in getcwd():


ERANGE

The size argument is greater than 0 and less than the length of the
pathname plus 1.
msg109465 - (view) Author: Zsolt Cserna (csernazs) * Date: 2010-07-07 09:48
You are right, but this bug could be easily avoided by using one of the suggested solutions. In my experience a fix sized buffer (whose size is MAXLPATHLEN - or 1024 if not defined) is usually passed to getcwd and the errno is propagated back to the caller if the result is NULL.

While getcwd() is buggy on solaris I think it's not the correct behavior from python to get to an infinite loop and eat up the memory.
msg109466 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-07 09:59
I agree that there should be a workaround. In py3k the function is
implemented like you suggest.
msg110170 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 11:15
Here's a patch. If PATH_MAX is defined, a static buffer is used. I left
the arbitrary path length version since apparently some systems (HURD)
don't have any limits for PATH_MAX.


In case anyone is mystified how to reproduce the original problem on
OpenSolaris:

# This worked fine:
./python Lib/test/regrtest.py -uall test_posix


# This showed the bug:
./python Lib/test/test_posix.py


The bug could also be reproduced by creating a path name longer than
1024 chars, changing into that directory and attempting to start python.


Antoine, could I ask you to comment on the patch?
msg110176 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 12:06
I'm not sure I understand the cause of the problem. Does getcwd() fail on Solaris when the path length is higher than PATH_MAX, even if you pass a big enough buffer?

First, your patch makes getcwd() return an error when the path length is longer than PATH_MAX, which it doesn't today. This is a regression and shouldn't probably go in.

Second, the test_posix change is a bit too tolerant. IMO it should check that the error is ERANGE, and that we are under Solaris. Otherwise the error shouldn't happen, should it?

Also, I wonder why py3k uses a simple hard-coded buffer of 1026 bytes (not even PATH_MAX+2). This is even worse than what you are proposing. I'll open a separate issue for it.
msg110182 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 12:41
Thanks for having a look at the patch!

Antoine Pitrou <report@bugs.python.org> wrote:
> I'm not sure I understand the cause of the problem. Does getcwd() fail on Solaris when the path length is higher than PATH_MAX, even if you pass a big enough buffer?

If the path length exceeds PATH_MAX, getcwd() keeps returning NULL and setting ERANGE,
so it enters an infinite loop, exhausting all memory. This is a bug in Solaris getcwd().

> First, your patch makes getcwd() return an error when the path length is longer than PATH_MAX, which it doesn't today. This is a regression and shouldn't probably go in.

Hm, on Linux I can't use os.getcwd() with paths longer than PATH_MAX as things are now:

$ cd /tmp/
$ for i in `seq 1 410`; do mkdir "123456789"; cd "123456789"; done
cd: error retrieving current directory: getcwd: cannot access parent directories: File name too long
$ python2.6
Python 2.6.5+ (release26-maint:81682M, Jun  6 2010, 11:22:46) 
[GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 36] File name too long

> Second, the test_posix change is a bit too tolerant. IMO it should check that the error is ERANGE, and that we are under Solaris. Otherwise the error shouldn't happen, should it?

If you change 1027 to 4098, the test currently fails on Linux, too. I think the only
reason why it never failed is that most systems have PATH_MAX=4096.

OSError: [Errno 36] File name too long

> Also, I wonder why py3k uses a simple hard-coded buffer of 1026 bytes (not even PATH_MAX+2). This is even worse than what you are proposing. I'll open a separate issue for it.

Good question. posix_getcwdu() also uses a buffer of 1026 in 2.7.
msg110184 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 12:51
> Hm, on Linux I can't use os.getcwd() with paths longer than PATH_MAX
> as things are now:

Oh, right. I was assuming 1024 for PATH_MAX when doing my tests, but it
really is 4096.

> > Second, the test_posix change is a bit too tolerant. IMO it should
> check that the error is ERANGE, and that we are under Solaris.
> Otherwise the error shouldn't happen, should it?
> 
> If you change 1027 to 4098, the test currently fails on Linux, too. I
> think the only
> reason why it never failed is that most systems have PATH_MAX=4096.

Ok, then perhaps the test should be fixed?
msg110188 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 13:28
Antoine Pitrou <report@bugs.python.org> wrote:
> > If you change 1027 to 4098, the test currently fails on Linux, too. I
> > think the only
> > reason why it never failed is that most systems have PATH_MAX=4096.
> 
> Ok, then perhaps the test should be fixed?

I wasn't really precise. The test fails on Linux, but for a different reason.
Linux legitimately sets ENAMETOOLONG and raises OSError. This only becomes
apparent when using 4098 in the test.

Solaris, on the other hand, does not even raise, since it keeps setting
ERANGE and thus does not leave the loop in posix_getcwd(). IOW, only the
fix in posixmodule.c allows the test to fail properly in the first place.

If you prefer, of course it's possible to be conservative and make the new
version of posix_getcwd() Solaris specific.
msg110196 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 14:00
> I wasn't really precise. The test fails on Linux, but for a different reason.
> Linux legitimately sets ENAMETOOLONG and raises OSError. This only becomes
> apparent when using 4098 in the test.
> 
> Solaris, on the other hand, does not even raise, since it keeps setting
> ERANGE and thus does not leave the loop in posix_getcwd(). IOW, only the
> fix in posixmodule.c allows the test to fail properly in the first place.

Ok. Still, silencing all OSErrors in test_posix is much too broad. The
code should check for expected error codes, possibly depending on the
OS.

> If you prefer, of course it's possible to be conservative and make the new
> version of posix_getcwd() Solaris specific.

I think this would be the most reasonable solution for 2.7, indeed.
msg110212 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 16:50
OpenBSD has the same getcwd() bug. It was uncovered by raising current_path_length
to 4099 in test_posix.

Here is a new patch that enables the changed posix_getcwd() function
on Solaris and OpenBSD only. I've tested the patch on Linux, OpenSolaris,
OpenBSD and FreeBSD.


So far, there are three categories of behavior if PATH_MAX is exceeded:


1) Solaris, OpenBSD: buggy, getcwd() keeps returning NULL/ERANGE.

2) Linux: getcwd() returns NULL/ENAMETOOLONG.

3) FreeBSD: getcwd() returns SUCCESS/0.


So FreeBSD is one of the systems that benefits from principally
unlimited path lengths (though I doubt it is used much).


I think the changes in the unit test handle all categories well,
and perhaps they will uncover more problem systems.
msg110213 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 17:03
Nice! The patch looks good to me.
msg110221 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 19:22
Antoine, thanks! Committed in release27-maint (r82853).


Zsolt, could you confirm that issue9185-2.patch (or r82853) works for you?
msg110765 - (view) Author: Zsolt Cserna (csernazs) * Date: 2010-07-19 15:23
I confirm that test_posix passes after applying the patch issue9185-2.patch on solaris 8.
Thank you. Now solaris and openbsd have a clean os.getcwd() implementation.
msg110768 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-19 15:47
Thanks for testing! - Fix also backported to release26-maint in r82975.
Closing this one.
msg169376 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-08-29 13:21
New changeset bfa5d0ddfbeb by Trent Nelson in branch '2.7':
Issue #15765: Fix quirky NetBSD getcwd() behaviour.
http://hg.python.org/cpython/rev/bfa5d0ddfbeb
History
Date User Action Args
2012-08-29 13:21:34python-devsetnosy: + python-dev
messages: + msg169376
2010-07-20 18:48:26skrahlinkissue6976 superseder
2010-07-19 15:47:05skrahsetstatus: open -> closed
resolution: fixed
messages: + msg110768

stage: patch review -> resolved
2010-07-19 15:23:40csernazssetmessages: + msg110765
2010-07-13 19:22:36skrahsetmessages: + msg110221
2010-07-13 17:03:24pitrousetmessages: + msg110213
2010-07-13 16:50:10skrahsetfiles: + issue9185-2.patch

messages: + msg110212
2010-07-13 14:00:57pitrousetmessages: + msg110196
2010-07-13 13:28:09skrahsetmessages: + msg110188
2010-07-13 12:51:19pitrousetmessages: + msg110184
2010-07-13 12:41:54skrahsetmessages: + msg110182
2010-07-13 12:06:02pitrousetmessages: + msg110176
2010-07-13 11:15:49skrahsetfiles: + issue9185.patch

nosy: + pitrou
messages: + msg110170

keywords: + needs review, patch
stage: needs patch -> patch review
2010-07-07 09:59:06skrahsetmessages: + msg109466
stage: needs patch
2010-07-07 09:48:03csernazssetmessages: + msg109465
2010-07-07 09:40:34skrahsetnosy: + skrah
messages: + msg109464
2010-07-07 09:18:33csernazssetmessages: + msg109462
2010-07-07 08:56:35csernazscreate