Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ntpath.abspath fails for long str paths #48321

Closed
JDay mannequin opened this issue Oct 8, 2008 · 22 comments
Closed

ntpath.abspath fails for long str paths #48321

JDay mannequin opened this issue Oct 8, 2008 · 22 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

@JDay
Copy link
Mannequin

JDay mannequin commented Oct 8, 2008

BPO 4071
Nosy @loewis, @amauryfa, @tjguk, @zware, @serhiy-storchaka, @eryksun, @zooba
Files
  • fix_getfullpathname.patch
  • quick_hack_for_getfullpathname_v2.patch: 1st PyArg_ParseTuple: "et#" => "S"
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-16.07:32:12.858>
    created_at = <Date 2008-10-08.00:27:14.716>
    labels = ['type-bug', 'OS-windows']
    title = 'ntpath.abspath fails for long str paths'
    updated_at = <Date 2019-03-16.07:32:12.857>
    user = 'https://bugs.python.org/JDay'

    bugs.python.org fields:

    activity = <Date 2019-03-16.07:32:12.857>
    actor = 'eryksun'
    assignee = 'ocean-city'
    closed = True
    closed_date = <Date 2019-03-16.07:32:12.858>
    closer = 'eryksun'
    components = ['Windows']
    creation = <Date 2008-10-08.00:27:14.716>
    creator = 'JDay'
    dependencies = []
    files = ['11755', '11758']
    hgrepos = []
    issue_num = 4071
    keywords = ['patch']
    message_count = 22.0
    messages = ['74502', '74504', '74508', '74510', '74519', '74522', '74530', '74549', '74593', '74594', '74607', '75628', '77500', '77501', '77523', '77737', '77799', '175113', '220007', '236972', '237007', '338059']
    nosy_count = 9.0
    nosy_names = ['loewis', 'amaury.forgeotdarc', 'ocean-city', 'tim.golden', 'JDay', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4071'
    versions = ['Python 2.7']

    @JDay
    Copy link
    Mannequin Author

    JDay mannequin commented Oct 8, 2008

    On my system (Windows Server 2008 SP1 - 64-bit, Python 2.5.2 - 32-bit),
    simple actions like:
    >>> help(help) # Or any function
    or
    >>> import tempfile
    >>> f = tempfile.mktemp()
    result in this (rather confusing) error:
    TypeError: _getfullpathname() argument 1 must be (buffer overflow), not str

    Apparently, _getfullpathname() chokes on certain paths if they are not
    supplied as unicode. Locally, I was able to work around the issue by
    changing the call to _getfullpathname in ntpath.abspath to:
    path = str(_getfullpathname(unicode(path)))

    @JDay JDay mannequin added type-crash A hard crash of the interpreter, possibly with a core dump OS-windows labels Oct 8, 2008
    @amauryfa
    Copy link
    Member

    amauryfa commented Oct 8, 2008

    The (buffer overflow) message indicates that the argument is too long to
    be converted. In your case, it seems that the path is longer than
    MAX_PATH=255 characters. What is the value of the argument of _getfullpathname()?

    @JDay
    Copy link
    Mannequin Author

    JDay mannequin commented Oct 8, 2008

    Running help() or mktemp() causes _getfullpathname to be called with the whole system path (791 characters). If you pass that to _getfullpathname as str it throws the aforementioned TypeError. If it's passed as unicode, it returns an empty string.
    The offending _getfullpathname call occurs on the first call to one of these methods. Future calls to either do not call it (unless, of course, the first failed).

    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Oct 8, 2008

    According to http://msdn.microsoft.com/en-us/library/aa364963.aspx,
    current implementation have several problems.

    In the ANSI version of this function, the name is limited to MAX_PATH
    characters. To extend this limit to 32,767 wide characters, call the
    Unicode version of the function and prepend "\\?\" to the path.

    :-(

    If the lpBuffer buffer is too small to contain the path, the return
    value is the size, in TCHARs, of the buffer that is required to hold
    the path and the terminating null character.

    We should allocate new buffer with this size and retry like already
    doing for GetCurrentDirectory.

    And one decision problem... What should we do when too long str is
    passed to ntpath._getfullpathname? Report overflow error? Or convert to
    unicode and retry with GetFullPathNameW? Hmm....

    @amauryfa
    Copy link
    Member

    amauryfa commented Oct 8, 2008

    Running help() or mktemp() causes _getfullpathname to be called
    with the whole system path (791 characters)

    I am not sure to understand. Do you mean the whole PATH environment
    variable? I doubt that it is passed to _getfullpathname.
    Or do you have very long paths for one directory? the TEMP environment
    variable, for example? I'd be curious to see its value.

    @JDay
    Copy link
    Mannequin Author

    JDay mannequin commented Oct 8, 2008

    I am not sure to understand. Do you mean the whole PATH environment
    variable? I doubt that it is passed to _getfullpathname.
    Or do you have very long paths for one directory? the TEMP environment
    variable, for example? I'd be curious to see its value.

    I don't have it offhand, but it was the whole PATH environment variable, complete with semicolons. That's probably the *real* bug. Whatever was passing that into abspath didn't seem to mind getting back an empty string (although that may have been further processed in the function, I didn't follow past the call to _getfullpathname).

    And one decision problem... What should we do when too long str is
    passed to ntpath._getfullpathname? Report overflow error? Or convert to
    unicode and retry with GetFullPathNameW? Hmm....

    abspath should be able to be called with str or unicode of arbitrary lengths. Consumers of it shouldn't have to be concerned with the platform implementation when it can be smoothed over by the module. Whether this is done in abspath or _getfullpathname probably isn't too important, since end-users generally shouldn't be calling _getfullpathname, directly.

    @amauryfa
    Copy link
    Member

    amauryfa commented Oct 8, 2008

    I don't have it offhand, but it was the whole PATH environment
    variable, complete with semicolons. That's probably the *real* bug.

    Indeed. Do you happen to have the complete traceback of the failing
    tempfile.mktemp() call? I don't see where it can use the PATH
    environment variable.

    @JDay JDay mannequin changed the title ntpath.abspath can fail on Win Server 2008 (64-bit) ntpath.abspath fails for long str paths Oct 8, 2008
    @JDay
    Copy link
    Mannequin Author

    JDay mannequin commented Oct 8, 2008

    Indeed. Do you happen to have the complete traceback of the failing
    tempfile.mktemp() call? I don't see where it can use the PATH
    environment variable.

    The problem was that somehow, on our systems, the TEMP environmental variable had been copied over with PATH. Most likely some batch file tried to store a copy of PATH, without realizing the significance of TEMP. [groan]

    Anyway, I still think that it's a bug that abspath() can't be called with a perfectly good str path, because of limitations with the windows api. I edited the bug title to reflect the actual bug.

    The str path length could be checked and upgraded to the Unicode version, if necessary (or try again with the unicode version, in the case of an exception). I think it's important to ensure that when abspath() is called with str, it returns str, even if it was upgraded to the unicode call.

    @JDay JDay mannequin changed the title ntpath.abspath fails for long str paths ntpath.abspath can fail on Win Server 2008 (64-bit) Oct 8, 2008
    @JDay JDay mannequin changed the title ntpath.abspath can fail on Win Server 2008 (64-bit) ntpath.abspath fails for long str paths Oct 9, 2008
    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Oct 9, 2008

    I think attached patch "fix_getfullpathname.patch" will fix unicode
    issue at least. For ansi issue, I followed the manner of win32_chdir for
    now.

    After some investigation, GetFullPathNameA fails if output size is more
    than MAX_PATH even if input size is less than MAX_PATH. I feel it's
    difficult check this before invoking GetFullPathNameA.

    This is test for unicode issue.

    /////////////////////////////////////////////////////////

    import unittest
    import ntpath
    import os
    
    class TestCase(unittest.TestCase):
        def test_getfullpathname(self):
            for count in xrange(1, 1000):
                name = u"x" * count
                path = ntpath._getfullpathname(name)
                self.assertEqual(os.path.basename(path), name)
    
    if __name__ == '__main__':
        unittest.main()

    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Oct 9, 2008

    I feel it's difficult to check this before invoking GetFullPathNameA.

    And error number via GetLastError() is vogus, sometimes 0, sometimes others.

    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Oct 9, 2008

    Or, if PyArg_ParseTuple overflowed or GetFullPathNameA failed, (not
    check GetLastError() because it's vogus) try GetFullPathNameW like
    attached file "quick_hack_for_getfullpathname.patch".

    This inverses flow

        if (unicode_file_names()) {
            /* unicode */
        }
        /* ascii */

    # Maybe it would be nice if convert_to_unicode() functionality is built
    in PyArg_ParseTuple. (inverse of "et")

    Be care, this is quick hack, so maybe buggy. I confirmed test_os and
    test_ntpath passed though.

    /////////////////////////////////////////////////////

    import unittest
    import ntpath
    import os
    
    class TestCase(unittest.TestCase):
        def test_getfullpathname(self):
            for c in ('x', u'x'):
                for count in xrange(1, 1000):
                    name = c * count
                    path = ntpath._getfullpathname(name)
                    self.assertEqual(os.path.basename(path), name)
    
    if __name__ == '__main__':
        unittest.main()

    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Nov 8, 2008

    Fixed unicode issue in r67154(trunk). I'm not sure how to handle long
    str, so I didn't touch it for now.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Dec 10, 2008

    ocean-city, can you please backport this to the 2.5 branch?

    @loewis loewis mannequin added the release-blocker label Dec 10, 2008
    @loewis loewis mannequin assigned ocean-city Dec 10, 2008
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Dec 10, 2008

    As a follow-up: please don't forget to add Misc/NEWS entries (both for
    the already-committed patch, as well as for the backport).

    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Dec 10, 2008

    can you please backport this to the 2.5 branch?

    Sorry for clarification. Which should I backport r67154 or
    quick_hack_for_getfullpathname_v2.patch?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Dec 13, 2008

    As it is apparently not clear what change exactly needs to be applied to
    2.5, I'm declaring that this fix will not be backported.

    I would appreciate if somebody could summarize what still needs to be
    done, or (if nothing needs to be done), close this issue.

    @loewis loewis mannequin removed the release-blocker label Dec 13, 2008
    @ocean-city
    Copy link
    Mannequin

    ocean-city mannequin commented Dec 14, 2008

    GetFullPathNameW may return the required buffer size (non-zero value)
    when buffer is too short. Before r67154, this case was treated as
    success, so there was possibility of access to uninitialized buffer
    woutbuf. Fortunately, GetFullPathNameW sets '\0' to woutbuf (this is
    undocumented behavior), so abspath() returned empty string instead of
    segmentation fault. But this is still potentially dangerous, so I fixed
    this by allocating required size buffer and calling GetFullPathNameW
    again. abspath() returns correct result for any long unicode path now.

    But original poster hopes abspath() should return correct result for
    any long both *str* and unicode path. For str, this issue is not solved
    yet.

    --
    I'm skeptical abspath() should support longer path than MAX_PATH if
    ANSI version of Windows API cannot handle such path. Anyway, I won't
    use such long path. ;-)

    @serhiy-storchaka
    Copy link
    Member

    I doubt this issue exists on Python >= 3.2.

    See also bpo-1776160.

    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Dec 20, 2012
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jun 8, 2014

    I don't see that any further work can be done here owing to limitations of the Windows str API. Note that the same argument can be applied to bpo-1776160.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Mar 1, 2015

    Can we close this as I'm not aware of any possible way to fix this? See also bpo-1776160.

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 2, 2015

    Can we close this as I'm not aware of any possible way to fix this?

    Windows system and C runtime calls that take paths could be restricted to wide-character APIs, such as calling GetFullPathnameW in this case, or _wexecve instead of execve (bpo-23462). Then for bytes paths an extension can call PyUnicode_FSDecoder (PyUnicode_DecodeMBCS). In posxmodule.c this can be handled in the path_converter function.

    path_converter
    https://hg.python.org/cpython/file/5d4b6a57d5fd/Modules/posixmodule.c#l698

    path_converter could be moved to Python/fileutils.c to make it available for use by other modules such as io.

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 16, 2019

    I'm closing this as a resolved issue. Python 2 is approaching end of life, and I don't see a pressing need for my suggestion in msg237007. We should be using unicode for paths in Windows anyway since its file systems are natively Unicode.

    @eryksun eryksun closed this as completed Mar 16, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants