classification
Title: ntpath.abspath fails for long str paths
Type: behavior Stage: resolved
Components: Windows Versions: Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: ocean-city Nosy List: JDay, amaury.forgeotdarc, eryksun, loewis, ocean-city, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2008-10-08 00:27 by JDay, last changed 2019-03-16 07:32 by eryksun. This issue is now closed.

Files
File name Uploaded Description Edit
fix_getfullpathname.patch ocean-city, 2008-10-09 18:37 review
quick_hack_for_getfullpathname_v2.patch ocean-city, 2008-10-09 22:41 1st PyArg_ParseTuple: "et#" => "S" review
Messages (22)
msg74502 - (view) Author: Jason Day (JDay) Date: 2008-10-08 00:27
On my system (Windows Server 2008 SP1 - 64-bit, Python 2.5.2 - 32-bit),
simple actions like:
>>> help(help) # Or any function
or
>>> import tempfile
>>> f = tempfile.mktemp()
result in this (rather confusing) error:
TypeError: _getfullpathname() argument 1 must be (buffer overflow), not str

Apparently, _getfullpathname() chokes on certain paths if they are not
supplied as unicode. Locally, I was able to work around the issue by
changing the call to _getfullpathname in ntpath.abspath to:
                path = str(_getfullpathname(unicode(path)))
msg74504 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 00:42
The (buffer overflow) message indicates that the argument is too long to 
be converted. In your case, it seems that the path is longer than 
MAX_PATH=255 characters. What is the value of the argument of _getfullpathname()?
msg74508 - (view) Author: Jason Day (JDay) Date: 2008-10-08 01:21
Running help() or mktemp() causes _getfullpathname to be called with the whole system path (791 characters). If you pass that to _getfullpathname as str it throws the aforementioned TypeError. If it's passed as unicode, it returns an empty string.
The offending _getfullpathname call occurs on the first call to one of these methods. Future calls to either do not call it (unless, of course, the first failed).
msg74510 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-08 02:23
According to http://msdn.microsoft.com/en-us/library/aa364963.aspx,
current implementation have several problems.

>In the ANSI version of this function, the name is limited to MAX_PATH
>characters. To extend this limit to 32,767 wide characters, call the
>Unicode version of the function and prepend "\\?\" to the path.

:-(

>If the lpBuffer buffer is too small to contain the path, the return
>value is the size, in TCHARs, of the buffer that is required to hold
>the path and the terminating null character.

We should allocate new buffer with this size and retry like already
doing for GetCurrentDirectory.

And one decision problem... What should we do when too long str is
passed to ntpath._getfullpathname? Report overflow error? Or convert to
unicode and retry with GetFullPathNameW? Hmm....
msg74519 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 08:03
> Running help() or mktemp() causes _getfullpathname to be called 
> with the whole system path (791 characters)

I am not sure to understand. Do you mean the whole PATH environment
variable? I doubt that it is passed to _getfullpathname. 
Or do you have very long paths for one directory? the TEMP environment
variable, for example? I'd be curious to see its value.
msg74522 - (view) Author: Jason Day (JDay) Date: 2008-10-08 09:07
> I am not sure to understand. Do you mean the whole PATH environment
> variable? I doubt that it is passed to _getfullpathname.
> Or do you have very long paths for one directory? the TEMP environment
> variable, for example? I'd be curious to see its value.

I don't have it offhand, but it was the whole PATH environment variable, complete with semicolons. That's probably the *real* bug. Whatever was passing that into abspath didn't seem to mind getting back an empty string (although that may have been further processed in the function, I didn't follow past the call to _getfullpathname).

> And one decision problem... What should we do when too long str is
> passed to ntpath._getfullpathname? Report overflow error? Or convert to
> unicode and retry with GetFullPathNameW? Hmm....

abspath should be able to be called with str or unicode of arbitrary lengths. Consumers of it shouldn't have to be concerned with the platform implementation when it can be smoothed over by the module. Whether this is done in abspath or _getfullpathname probably isn't too important, since end-users generally shouldn't be calling _getfullpathname, directly.
msg74530 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 14:49
> I don't have it offhand, but it was the whole PATH environment
> variable, complete with semicolons. That's probably the *real* bug. 

Indeed. Do you happen to have the complete traceback of the failing
tempfile.mktemp() call? I don't see where it can use the PATH
environment variable.
msg74549 - (view) Author: Jason Day (JDay) Date: 2008-10-08 22:00
> Indeed. Do you happen to have the complete traceback of the failing
> tempfile.mktemp() call? I don't see where it can use the PATH
> environment variable.

The problem was that somehow, on our systems, the TEMP environmental variable had been copied over with PATH. Most likely some batch file tried to store a copy of PATH, without realizing the significance of TEMP. [groan]

Anyway, I still think that it's a bug that abspath() can't be called with a perfectly good str path, because of limitations with the windows api. I edited the bug title to reflect the actual bug.

The str path length could be checked and upgraded to the Unicode version, if necessary (or try again with the unicode version, in the case of an exception). I think it's important to ensure that when abspath() is called with str, it returns str, even if it was upgraded to the unicode call.
msg74593 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 18:37
I think attached patch "fix_getfullpathname.patch" will fix unicode
issue at least. For ansi issue, I followed the manner of win32_chdir for
now.

After some investigation, GetFullPathNameA fails if output size is more
than MAX_PATH even if input size is less than MAX_PATH. I feel it's
difficult check this before invoking GetFullPathNameA.

This is test for unicode issue.

/////////////////////////////////////////////////////////

import unittest
import ntpath
import os

class TestCase(unittest.TestCase):
    def test_getfullpathname(self):
        for count in xrange(1, 1000):
            name = u"x" * count
            path = ntpath._getfullpathname(name)
            self.assertEqual(os.path.basename(path), name)

if __name__ == '__main__':
    unittest.main()
msg74594 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 18:39
>I feel it's difficult to check this before invoking GetFullPathNameA.

And error number via GetLastError() is vogus, sometimes 0, sometimes others.
msg74607 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 21:33
Or, if PyArg_ParseTuple overflowed or GetFullPathNameA failed, (not
check GetLastError() because it's vogus) try GetFullPathNameW like
attached file "quick_hack_for_getfullpathname.patch".

This inverses flow

    if (unicode_file_names()) {
        /* unicode */
    }
    /* ascii */

# Maybe it would be nice if convert_to_unicode() functionality is built
in PyArg_ParseTuple. (inverse of "et")

Be care, this is quick hack, so maybe buggy. I confirmed test_os and
test_ntpath passed though.

/////////////////////////////////////////////////////

import unittest
import ntpath
import os

class TestCase(unittest.TestCase):
    def test_getfullpathname(self):
        for c in ('x', u'x'):
            for count in xrange(1, 1000):
                name = c * count
                path = ntpath._getfullpathname(name)
                self.assertEqual(os.path.basename(path), name)

if __name__ == '__main__':
    unittest.main()
msg75628 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-11-08 03:49
Fixed unicode issue in r67154(trunk). I'm not sure how to handle long
str, so I didn't touch it for now.
msg77500 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:38
ocean-city, can you please backport this to the 2.5 branch?
msg77501 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:39
As a follow-up: please don't forget to add Misc/NEWS entries (both for
the already-committed patch, as well as for the backport).
msg77523 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-12-10 09:28
>can you please backport this to the 2.5 branch?

Sorry for clarification. Which should I backport r67154 or
quick_hack_for_getfullpathname_v2.patch?
msg77737 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-13 14:53
As it is apparently not clear what change exactly needs to be applied to
2.5, I'm declaring that this fix will not be backported.

I would appreciate if somebody could summarize what still needs to be
done, or (if nothing needs to be done), close this issue.
msg77799 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-12-14 14:59
GetFullPathNameW may return the required buffer size (non-zero value) 
when buffer is too short. Before r67154, this case was treated as 
success, so there was possibility of access to uninitialized buffer 
woutbuf. Fortunately, GetFullPathNameW sets '\0' to woutbuf (this is 
undocumented behavior), so abspath() returned empty string instead of 
segmentation fault. But this is still potentially dangerous, so I fixed 
this by allocating required size buffer and calling GetFullPathNameW 
again. abspath() returns correct result for any long unicode path now.

But original poster hopes abspath() should return correct result for 
any long both *str* and unicode path. For str, this issue is not solved 
yet.

--
I'm skeptical abspath() should support longer path than MAX_PATH if 
ANSI version of Windows API cannot handle such path. Anyway, I won't 
use such long path. ;-)
msg175113 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-07 17:50
I doubt this issue exists on Python >= 3.2.

See also issue1776160.
msg220007 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-08 00:46
I don't see that any further work can be done here owing to limitations of the Windows str API.  Note that the same argument can be applied to issue1776160.
msg236972 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-01 18:29
Can we close this as I'm not aware of any possible way to fix this?  See also issue1776160.
msg237007 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-03-02 02:47
> Can we close this as I'm not aware of any possible way to fix this? 

Windows system and C runtime calls that take paths could be restricted to wide-character APIs, such as calling GetFullPathnameW in this case, or _wexecve instead of execve (issue 23462). Then for bytes paths an extension can call PyUnicode_FSDecoder (PyUnicode_DecodeMBCS). In posxmodule.c this can be handled in the path_converter function.

path_converter
https://hg.python.org/cpython/file/5d4b6a57d5fd/Modules/posixmodule.c#l698

path_converter could be moved to Python/fileutils.c to make it available for use by other modules such as io.
msg338059 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-03-16 07:32
I'm closing this as a resolved issue. Python 2 is approaching end of life, and I don't see a pressing need for my suggestion in msg237007. We should be using unicode for paths in Windows anyway since its file systems are natively Unicode.
History
Date User Action Args
2019-03-16 07:32:12eryksunsetstatus: open -> closed

messages: + msg338059
stage: resolved
2019-03-15 22:34:26BreamoreBoysetnosy: - BreamoreBoy
2015-03-02 02:47:22eryksunsetnosy: + eryksun
messages: + msg237007
2015-03-01 18:29:10BreamoreBoysetnosy: + tim.golden, zach.ware, steve.dower
messages: + msg236972
2014-06-08 00:46:57BreamoreBoysetnosy: + BreamoreBoy
messages: + msg220007
2012-12-20 20:06:01serhiy.storchakasettype: crash -> behavior
2012-11-07 17:50:36serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg175113
versions: - Python 3.2, Python 3.3, Python 3.4
2012-10-02 05:54:06ezio.melottisetversions: + Python 3.2, Python 3.3, Python 3.4, - Python 2.6, Python 3.0
2008-12-14 14:59:25ocean-citysetmessages: + msg77799
2008-12-13 14:53:57loewissetpriority: release blocker -> normal
messages: + msg77737
versions: - Python 2.5, Python 2.5.3
2008-12-10 09:28:38ocean-citysetmessages: + msg77523
2008-12-10 08:39:28loewissetmessages: + msg77501
2008-12-10 08:38:38loewissetpriority: release blocker
assignee: ocean-city
messages: + msg77500
resolution: accepted
nosy: + loewis
2008-11-08 03:49:44ocean-citysetmessages: + msg75628
versions: + Python 2.7, Python 2.5.3
2008-10-09 22:41:33ocean-citysetfiles: - quick_hack_for_getfullpathname.patch
2008-10-09 22:41:22ocean-citysetfiles: + quick_hack_for_getfullpathname_v2.patch
2008-10-09 21:33:24ocean-citysetfiles: + quick_hack_for_getfullpathname.patch
messages: + msg74607
2008-10-09 18:39:53ocean-citysetmessages: + msg74594
2008-10-09 18:37:38ocean-citysetfiles: + fix_getfullpathname.patch
keywords: + patch
messages: + msg74593
versions: + Python 2.6, Python 3.0
2008-10-09 07:02:03JDaysettitle: ntpath.abspath can fail on Win Server 2008 (64-bit) -> ntpath.abspath fails for long str paths
2008-10-08 22:00:12JDaysetmessages: + msg74549
title: ntpath.abspath fails for long str paths -> ntpath.abspath can fail on Win Server 2008 (64-bit)
2008-10-08 21:57:09JDaysettitle: ntpath.abspath can fail on Win Server 2008 (64-bit) -> ntpath.abspath fails for long str paths
2008-10-08 14:49:48amaury.forgeotdarcsetmessages: + msg74530
2008-10-08 09:07:06JDaysetmessages: + msg74522
2008-10-08 08:03:41amaury.forgeotdarcsetmessages: + msg74519
2008-10-08 02:23:07ocean-citysetnosy: + ocean-city
messages: + msg74510
2008-10-08 01:21:30JDaysetmessages: + msg74508
2008-10-08 00:42:55amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg74504
2008-10-08 00:27:14JDaycreate