Issue22719
Created on 2014-10-24 16:00 by hosford42, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (14) | |||
---|---|---|---|
msg229936 - (view) | Author: Aaron (hosford42) | Date: 2014-10-24 16:00 | |
When using os.path.isfile() and os.path.exists() in a while loop under certain conditions, os.path.isfile() returns True for paths that do not actually exist.

Conditions:
The folder "C:\Users\EAARHOS\Desktop\Python Review" exists, as do the files "C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py" and "C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak". (Note that I also tested this on a path that contained no spaces, and got the same results.)

Code:
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.isfile(bak_path):
...     bak_path += '.bak'
...     if not os.path.isfile(bak_path):
...         break
Traceback (most recent call last):
  File "<interactive input>", line 3, in <module>
  File "C:\Installs\Python33\Lib\genericpath.py", line 29, in isfile
    st = os.stat(path)
ValueError: path too long for Windows
>>> os.path.isfile(r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak.bak")
False
>>>
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.exists(bak_path):
...     bak_path += '.bak'
...     if not os.path.exists(bak_path):
...         break
Traceback (most recent call last):
  File "<interactive input>", line 3, in <module>
  File "C:\Installs\Python33\Lib\genericpath.py", line 18, in exists
    st = os.stat(path)
ValueError: path too long for Windows
>>> os.path.exists(r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak.bak")
False
>>>
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path += '.bak'
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path += '.bak'
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>> temp = bak_path
>>> os.path.isfile(temp), os.path.exists(temp)
(True, True)
>>> os.path.isfile('C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'), os.path.exists('C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak')
(False, False)
>>>

On the other hand, this code works as expected:
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.isfile(bak_path):
...     temp = bak_path + '.bak'
...     bak_path = temp
...
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>>
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.exists(bak_path):
...     temp = bak_path + '.bak'
...     bak_path = temp
...
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>> |
|||
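For readers who want to try the report's working variant outside an interactive session, here is a minimal sketch. The helper name `next_backup_path` and the throwaway directory are illustrative assumptions, not part of the original report; the loop simply mirrors the reporter's "temp variable" version.

```python
import os
import tempfile


def next_backup_path(path, suffix=".bak"):
    """Return the first of path, path+suffix, path+suffix*2, ... that is not a file."""
    candidate = path
    while os.path.isfile(candidate):
        # Mirror the reporter's working variant: the longer path is bound to a
        # second name first, then assigned back.
        longer = candidate + suffix
        candidate = longer
    return candidate


# Usage: rebuild the reporter's layout (a file plus one .bak) in a temp folder.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "baseExcel.py")
    for name in (original, original + ".bak"):
        with open(name, "w") as handle:
            handle.write("placeholder\n")
    found_name = os.path.basename(next_backup_path(original))
```

On an interpreter without the bug, `found_name` should be the first name that does not exist yet, i.e. the one with two `.bak` suffixes.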
msg229940 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-24 16:34 | |
Interesting bug. The obvious difference between the two cases is that in the += version the address of the string holding the file path doesn't change, whereas when you use a temp variable it does (there's an optimization in += that reuses the same memory location if possible). It looks like something is seeing the repeated address and returning the same result as the last time that address was passed, which is wrong. I don't see anything obvious in the os module. Although I can't rule out a Python bug, since this works fine on Unix I suspect this is a Windows CRT bug. |
|||
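One way to observe the address reuse described above is to compare `id()` before and after each append. This is only a sketch: the function name `append_and_track_ids` is made up for illustration, `id()` being a memory address is a CPython implementation detail, and whether a given append is done in place depends on the interpreter version and the allocator, so no particular pattern of results is promised.

```python
def append_and_track_ids(prefix, suffix, rounds=5):
    """Append suffix repeatedly and record whether id() stayed the same each time."""
    text = prefix
    same_id = []
    for _ in range(rounds):
        before = id(text)  # an int; it does not keep the string alive
        text += suffix
        same_id.append(id(text) == before)
    return text, same_id


final_text, reused = append_and_track_ids("baseExcel.py", ".bak")
# The first entry is False for a non-empty suffix: the original string is
# still referenced by the `prefix` parameter, so a new object must be created.
# Later entries may be True or False depending on the interpreter.
print(final_text, reused)
```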
msg229942 - (view) | Author: Steve Dower (steve.dower) * | Date: 2014-10-24 16:37 | |
I wonder whether the same thing occurs if you're not appending a new extension each time? There could be some optimisation (from the dark old days of 8.3 filenames) that compares "baseExcel" and ".bak" separately and assumes that the name is known. Last I looked at the code for stat() and isfile(), it was going directly to the Win32 API and not via the CRT. Though that may not have been the case in 3.3... |
|||
msg229944 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-10-24 17:16 | |
Could we encode both paths to the unicode_internal encoding and check if results are equal? |
|||
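The comparison suggested above could look roughly like the sketch below. It substitutes `utf-16-le` for `unicode_internal`, because the `unicode_internal` codec was deprecated in Python 3.3 and removed in 3.8; that substitution and the helper name `same_encoded_form` are assumptions for illustration, not what was proposed verbatim.

```python
def same_encoded_form(left, right, encoding="utf-16-le"):
    """Compare two strings by their encoded bytes rather than by str equality."""
    return left.encode(encoding) == right.encode(encoding)


# Build one path by repeated appends and the other as a single literal.
grown = "baseExcel.py"
grown += ".bak"
grown += ".bak"
literal = "baseExcel.py.bak.bak"

matches = same_encoded_form(grown, literal)
```

If the two forms ever differed while `grown == literal` held, that would point at the string object itself rather than at the file-system call.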
msg229945 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-24 17:53 | |
Looking at the code, it looks like it calls the Win32 API directly if path->wide is true, which I'm guessing is the case unless you are using bytes paths on Windows? It looks like the critical call, then, is CreateFileA (why A in a _w method I have no idea... so my reading of this code is suspect :) |
|||
msg229949 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-10-24 18:49 | |
What do you get for os.stat?

bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
print(os.stat(bak_path))
bak_path += '.bak'
print(os.stat(bak_path))
bak_path += '.bak'
print(os.stat(bak_path)) # This should raise FileNotFoundError |
|||
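The probe above depends on the reporter's own folder. A portable variant might look like the following sketch, which recreates the two files in a temporary directory and records `st_size` (or `None` when the file is missing) at each step. The helper names and the file contents are assumptions made up for this example.

```python
import os
import tempfile


def stat_size_or_none(path):
    """Return st_size for path, or None if os.stat reports the file is missing."""
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None


def probe(base, suffix=".bak", steps=3):
    """Stat base, base+suffix, base+suffix*2, ... growing the path with +=."""
    path = base
    sizes = []
    for _ in range(steps):
        sizes.append(stat_size_or_none(path))
        path += suffix
    return sizes


with tempfile.TemporaryDirectory() as folder:
    base = os.path.join(folder, "baseExcel.py")
    with open(base, "wb") as handle:
        handle.write(b"abc")      # 3 bytes
    with open(base + ".bak", "wb") as handle:
        handle.write(b"abcde")    # 5 bytes
    sizes = probe(base)
```

Giving the two files different sizes makes a stale result easy to spot: on an unaffected interpreter the third entry is `None`, whereas the behaviour reported in this issue would repeat the second file's size.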
msg229950 - (view) | Author: Aaron (hosford42) | Date: 2014-10-24 19:24 | |
Interesting. It continues to reuse the last one's stats once the path is no longer valid.

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=8162774324652726, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29874, st_atime=1413389016, st_mtime=1413389016, st_ctime=1413388655)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> |
|||
msg229951 - (view) | Author: Aaron (hosford42) | Date: 2014-10-24 19:30 | |
If I use a separate temp variable, the bug doesn't show, but if I use the same variable, even with + instead of +=, it still happens.

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=8162774324652726, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29874, st_atime=1413389016, st_mtime=1413389016, st_ctime=1413388655)
>>> temp = bak_path + '.bak'
>>> bak_path = temp
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> temp = bak_path + '.bak'
>>> bak_path = temp
>>> print(os.stat(bak_path))
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0, st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088, st_mtime=1413389088, st_ctime=1413388654)
>>> |
|||
msg229961 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-10-24 21:46 | |
When appending to a singly-referenced string, the interpreter tries to reallocate the string in place. This applies to both `s += 'text'` and `s = s + 'text'`. Storing to a temp variable is adding a 2nd reference, so a new string gets allocated instead.

If the former is the case (i.e. the object id is the same after appending), use ctypes to check the string's cached wide-string (wchar_t *) representation:

from ctypes import *
pythonapi.PyUnicode_AsUnicode.argtypes = [py_object]
pythonapi.PyUnicode_AsUnicode.restype = c_wchar_p
print(pythonapi.PyUnicode_AsUnicode(bak_path))

The wstr cache should be cleared when the string is reallocated in place, so this is probably a dead end. |
|||
msg229962 - (view) | Author: Eryk Sun (eryksun) * | Date: 2014-10-24 22:12 | |
> i.e. the object id is the same after appending

Actually, that's wrong. bak_path is a compact string. So the whole object is realloc'd, and the base address (i.e. id) could change. Check PyUnicode_AsUnicode even if the id changes. |
|||
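The ctypes check above relies on `PyUnicode_AsUnicode` and the cached wstr representation, both of which were removed in Python 3.12. On a current CPython, a related (but not equivalent) check is to ask the C API for a fresh wide-character copy with `PyUnicode_AsWideCharString` and compare it with the expected text. The sketch below assumes CPython with `ctypes.pythonapi` available; the helper name `wide_form` is made up for this example, and it inspects a newly allocated copy, not any cache.

```python
import ctypes
from ctypes import pythonapi

# Assumption: CPython, where ctypes.pythonapi exposes the C API.
_as_wide = pythonapi.PyUnicode_AsWideCharString
_as_wide.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_wide.restype = ctypes.c_void_p  # keep the raw address so it can be freed

_mem_free = pythonapi.PyMem_Free
_mem_free.argtypes = [ctypes.c_void_p]
_mem_free.restype = None


def wide_form(text):
    """Return text as rebuilt from the wchar_t buffer the C API produces for it."""
    length = ctypes.c_ssize_t(0)
    address = _as_wide(text, ctypes.byref(length))
    try:
        return ctypes.wstring_at(address, length.value)
    finally:
        # The buffer is allocated by the interpreter and must be released.
        _mem_free(address)


grown = "baseExcel.py"
grown += ".bak"
grown += ".bak"
round_trip_ok = wide_form(grown) == "baseExcel.py.bak.bak"
```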
msg230577 - (view) | Author: Zachary Ware (zach.ware) * | Date: 2014-11-04 05:15 | |
Aaron, what version of Python are you using on what version of Windows? Also, 32 or 64 bit on both? I can't reproduce this with any Python 3.3.6 or newer on 64-bit Windows 8.1. |
|||
msg230982 - (view) | Author: Aaron (hosford42) | Date: 2014-11-10 23:27 | |
Python 3.3.0, Windows 7, both 64 bit. Has it been resolved with the newer version, then? |
|||
msg230985 - (view) | Author: Zachary Ware (zach.ware) * | Date: 2014-11-10 23:46 | |
I haven't built 3.3.0 again yet to try to reproduce with it, but there have been enough bug and security fixes in the more recent 3.3 releases that I'd strongly advise updating on general principle and seeing if this issue goes away. If not to 3.4.2, at least to 3.3.5 (the last 3.3 version to have a Windows installer). |
|||
msg231000 - (view) | Author: Zachary Ware (zach.ware) * | Date: 2014-11-11 06:35 | |
I have had a chance to build 3.3.0 and I was able to reproduce the bug with it, so it is in fact fixed in later versions. |
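The version information scattered through the thread can be summarised in a small check. This sketch encodes only what the messages state: the bug was reproduced on 3.3.0 and could not be reproduced on 3.3.6 or newer; nothing in the thread says which release between them contains the fix, and all observations were made on Windows. The names `classify`, `CONFIRMED_AFFECTED`, and `FIRST_REPORTED_CLEAN` are illustrative assumptions.

```python
import sys

CONFIRMED_AFFECTED = (3, 3, 0)      # reproduced in this thread
FIRST_REPORTED_CLEAN = (3, 3, 6)    # "3.3.6 or newer" could not reproduce it


def classify(version=None):
    """Classify a (major, minor, micro) version using only this thread's reports."""
    if version is None:
        version = sys.version_info[:3]
    version = tuple(version[:3])
    if version == CONFIRMED_AFFECTED:
        return "affected"
    if version >= FIRST_REPORTED_CLEAN:
        return "not reproduced"
    # Releases between 3.3.0 and 3.3.6 (and anything older) were not tested here.
    return "unknown"


print(classify())
```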
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:09 | admin | set | github: 66908 |
2014-11-11 06:35:32 | zach.ware | set | status: open -> closed resolution: out of date messages: + msg231000 stage: resolved |
2014-11-10 23:46:31 | zach.ware | set | messages: + msg230985 |
2014-11-10 23:27:17 | hosford42 | set | messages: + msg230982 |
2014-11-04 05:15:10 | zach.ware | set | messages: + msg230577 |
2014-10-24 22:12:40 | eryksun | set | messages: + msg229962 |
2014-10-24 21:46:42 | eryksun | set | messages: + msg229961 |
2014-10-24 19:30:31 | hosford42 | set | messages: + msg229951 |
2014-10-24 19:24:24 | hosford42 | set | messages: + msg229950 |
2014-10-24 18:49:56 | eryksun | set | nosy: + eryksun messages: + msg229949 |
2014-10-24 17:53:26 | r.david.murray | set | messages: + msg229945 |
2014-10-24 17:16:50 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg229944 |
2014-10-24 16:37:55 | steve.dower | set | messages: + msg229942 |
2014-10-24 16:34:09 | r.david.murray | set | nosy: + r.david.murray messages: + msg229940 |
2014-10-24 16:03:00 | hosford42 | set | title: os.path.isfile & os.path.exists but in while loop -> os.path.isfile & os.path.exists bug in while loop |
2014-10-24 16:00:12 | hosford42 | create |