Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.getcwd causes infinite loop on solaris #53431

Closed
csernazs mannequin opened this issue Jul 7, 2010 · 17 comments
Closed

os.getcwd causes infinite loop on solaris #53431

csernazs mannequin opened this issue Jul 7, 2010 · 17 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@csernazs
Copy link
Mannequin

csernazs mannequin commented Jul 7, 2010

BPO 9185
Nosy @csernazs, @pitrou, @skrah
Files
  • issue9185.patch
  • issue9185-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-07-19.15:47:05.824>
    created_at = <Date 2010-07-07.08:56:35.518>
    labels = ['type-bug', 'library']
    title = 'os.getcwd causes infinite loop on solaris'
    updated_at = <Date 2012-08-29.13:21:33.994>
    user = 'https://github.com/csernazs'

    bugs.python.org fields:

    activity = <Date 2012-08-29.13:21:33.994>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-07-19.15:47:05.824>
    closer = 'skrah'
    components = ['Library (Lib)']
    creation = <Date 2010-07-07.08:56:35.518>
    creator = 'csernazs'
    dependencies = []
    files = ['17980', '17989']
    hgrepos = []
    issue_num = 9185
    keywords = ['patch', 'needs review']
    message_count = 17.0
    messages = ['109459', '109462', '109464', '109465', '109466', '110170', '110176', '110182', '110184', '110188', '110196', '110212', '110213', '110221', '110765', '110768', '169376']
    nosy_count = 4.0
    nosy_names = ['csernazs', 'pitrou', 'skrah', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9185'
    versions = ['Python 2.7']

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jul 7, 2010

    os.getcwd() causes infinite loop on solaris10 when the length of the current directory is greater than 1024 (them limit of the maximum absolute path).

    os.getcwd is implemented by a while loop in python, see the function posix_getcwd in posixmodule.c. That while loop loops forever because on solaris it sets errno to ERANGE and res to NULL - even if there will be no positive results from getcwd due to the extra long path.
    This infinite while cycle also causes eating up the memory.

    I think the solution would be implementing a hard limit for this while loop. It could be either fixed (16k for example), or dymanic: comparing the MAXPATHLEN macro to the size of the allocated buffer and if the size of the buffer is greater than MAXLPATHLEN then raise an OSError exception with the current errno.

    That bug was triggered by test_posix unittest.

    @csernazs csernazs mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 7, 2010
    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jul 7, 2010

    There would be also a solution to allocate a fix buffer with MAXLPATHLEN size and call the getcwd function only one time - if MAXLPATHLEN is available.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 7, 2010

    I've encountered this a while ago on OpenSolaris. According to the man
    page

    http://docs.sun.com/app/docs/doc/816-5168/getcwd-3c?a=view

    this would be a bug in getcwd():

    ERANGE

    The size argument is greater than 0 and less than the length of the
    pathname plus 1.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jul 7, 2010

    You are right, but this bug could be easily avoided by using one of the suggested solutions. In my experience a fix sized buffer (whose size is MAXLPATHLEN - or 1024 if not defined) is usually passed to getcwd and the errno is propagated back to the caller if the result is NULL.

    While getcwd() is buggy on solaris I think it's not the correct behavior from python to get to an infinite loop and eat up the memory.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 7, 2010

    I agree that there should be a workaround. In py3k the function is
    implemented like you suggest.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 13, 2010

    Here's a patch. If PATH_MAX is defined, a static buffer is used. I left
    the arbitrary path length version since apparently some systems (HURD)
    don't have any limits for PATH_MAX.

    In case anyone is mystified how to reproduce the original problem on
    OpenSolaris:

    # This worked fine:
    ./python Lib/test/regrtest.py -uall test_posix

    # This showed the bug:
    ./python Lib/test/test_posix.py

    The bug could also be reproduced by creating a path name longer than
    1024 chars, changing into that directory and attempting to start python.

    Antoine, could I ask you to comment on the patch?

    @pitrou
    Copy link
    Member

    pitrou commented Jul 13, 2010

    I'm not sure I understand the cause of the problem. Does getcwd() fail on Solaris when the path length is higher than PATH_MAX, even if you pass a big enough buffer?

    First, your patch makes getcwd() return an error when the path length is longer than PATH_MAX, which it doesn't today. This is a regression and shouldn't probably go in.

    Second, the test_posix change is a bit too tolerant. IMO it should check that the error is ERANGE, and that we are under Solaris. Otherwise the error shouldn't happen, should it?

    Also, I wonder why py3k uses a simple hard-coded buffer of 1026 bytes (not even PATH_MAX+2). This is even worse than what you are proposing. I'll open a separate issue for it.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 13, 2010

    Thanks for having a look at the patch!

    Antoine Pitrou <report@bugs.python.org> wrote:

    I'm not sure I understand the cause of the problem. Does getcwd() fail on Solaris when the path length is higher than PATH_MAX, even if you pass a big enough buffer?

    If the path length exceeds PATH_MAX, getcwd() keeps returning NULL and setting ERANGE,
    so it enters an infinite loop, exhausting all memory. This is a bug in Solaris getcwd().

    First, your patch makes getcwd() return an error when the path length is longer than PATH_MAX, which it doesn't today. This is a regression and shouldn't probably go in.

    Hm, on Linux I can't use os.getcwd() with paths longer than PATH_MAX as things are now:

    $ cd /tmp/
    $ for i in `seq 1 410`; do mkdir "123456789"; cd "123456789"; done
    cd: error retrieving current directory: getcwd: cannot access parent directories: File name too long
    $ python2.6
    Python 2.6.5+ (release26-maint:81682M, Jun  6 2010, 11:22:46) 
    [GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.getcwd()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 36] File name too long

    Second, the test_posix change is a bit too tolerant. IMO it should check that the error is ERANGE, and that we are under Solaris. Otherwise the error shouldn't happen, should it?

    If you change 1027 to 4098, the test currently fails on Linux, too. I think the only
    reason why it never failed is that most systems have PATH_MAX=4096.

    OSError: [Errno 36] File name too long

    Also, I wonder why py3k uses a simple hard-coded buffer of 1026 bytes (not even PATH_MAX+2). This is even worse than what you are proposing. I'll open a separate issue for it.

    Good question. posix_getcwdu() also uses a buffer of 1026 in 2.7.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 13, 2010

    Hm, on Linux I can't use os.getcwd() with paths longer than PATH_MAX
    as things are now:

    Oh, right. I was assuming 1024 for PATH_MAX when doing my tests, but it
    really is 4096.

    > Second, the test_posix change is a bit too tolerant. IMO it should
    check that the error is ERANGE, and that we are under Solaris.
    Otherwise the error shouldn't happen, should it?

    If you change 1027 to 4098, the test currently fails on Linux, too. I
    think the only
    reason why it never failed is that most systems have PATH_MAX=4096.

    Ok, then perhaps the test should be fixed?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 13, 2010

    Antoine Pitrou <report@bugs.python.org> wrote:

    > If you change 1027 to 4098, the test currently fails on Linux, too. I
    > think the only
    > reason why it never failed is that most systems have PATH_MAX=4096.

    Ok, then perhaps the test should be fixed?

    I wasn't really precise. The test fails on Linux, but for a different reason.
    Linux legitimately sets ENAMETOOLONG and raises OSError. This only becomes
    apparent when using 4098 in the test.

    Solaris, on the other hand, does not even raise, since it keeps setting
    ERANGE and thus does not leave the loop in posix_getcwd(). IOW, only the
    fix in posixmodule.c allows the test to fail properly in the first place.

    If you prefer, of course it's possible to be conservative and make the new
    version of posix_getcwd() Solaris specific.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 13, 2010

    I wasn't really precise. The test fails on Linux, but for a different reason.
    Linux legitimately sets ENAMETOOLONG and raises OSError. This only becomes
    apparent when using 4098 in the test.

    Solaris, on the other hand, does not even raise, since it keeps setting
    ERANGE and thus does not leave the loop in posix_getcwd(). IOW, only the
    fix in posixmodule.c allows the test to fail properly in the first place.

    Ok. Still, silencing all OSErrors in test_posix is much too broad. The
    code should check for expected error codes, possibly depending on the
    OS.

    If you prefer, of course it's possible to be conservative and make the new
    version of posix_getcwd() Solaris specific.

    I think this would be the most reasonable solution for 2.7, indeed.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 13, 2010

    OpenBSD has the same getcwd() bug. It was uncovered by raising current_path_length
    to 4099 in test_posix.

    Here is a new patch that enables the changed posix_getcwd() function
    on Solaris and OpenBSD only. I've tested the patch on Linux, OpenSolaris,
    OpenBSD and FreeBSD.

    So far, there are three categories of behavior if PATH_MAX is exceeded:

    1. Solaris, OpenBSD: buggy, getcwd() keeps returning NULL/ERANGE.

    2. Linux: getcwd() returns NULL/ENAMETOOLONG.

    3. FreeBSD: getcwd() returns SUCCESS/0.

    So FreeBSD is one of the systems that benefits from principally
    unlimited path lengths (though I doubt it is used much).

    I think the changes in the unit test handle all categories well,
    and perhaps they will uncover more problem systems.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 13, 2010

    Nice! The patch looks good to me.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 13, 2010

    Antoine, thanks! Committed in release27-maint (r82853).

    Zsolt, could you confirm that bpo-9185-2.patch (or r82853) works for you?

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jul 19, 2010

    I confirm that test_posix passes after applying the patch bpo-9185-2.patch on solaris 8.
    Thank you. Now solaris and openbsd have a clean os.getcwd() implementation.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 19, 2010

    Thanks for testing! - Fix also backported to release26-maint in r82975.
    Closing this one.

    @skrah skrah mannequin closed this as completed Jul 19, 2010
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 29, 2012

    New changeset bfa5d0ddfbeb by Trent Nelson in branch '2.7':
    Issue bpo-15765: Fix quirky NetBSD getcwd() behaviour.
    http://hg.python.org/cpython/rev/bfa5d0ddfbeb

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant