classification
Title: os.path.isfile & os.path.exists bug in while loop
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.3
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, hosford42, r.david.murray, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2014-10-24 16:00 by hosford42, last changed 2014-11-11 06:35 by zach.ware. This issue is now closed.

Messages (14)
msg229936 - (view) Author: Aaron (hosford42) Date: 2014-10-24 16:00
When using os.path.isfile() and os.path.exists() in a while loop under certain conditions, os.path.isfile() returns True for paths that do not actually exist.

Conditions:
The folder "C:\Users\EAARHOS\Desktop\Python Review" exists, as do the files "C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py" and "C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak". (Note that I also tested this on a path that contained no spaces, and got the same results.)

Code:
>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.isfile(bak_path):
...     bak_path += '.bak'
...     if not os.path.isfile(bak_path):
...         break
Traceback (most recent call last):
  File "<interactive input>", line 3, in <module>
  File "C:\Installs\Python33\Lib\genericpath.py", line 29, in isfile
    st = os.stat(path)
ValueError: path too long for Windows
>>> os.path.isfile(r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak.bak")
False
>>> 

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.exists(bak_path):
...     bak_path += '.bak'
...     if not os.path.exists(bak_path):
...         break
Traceback (most recent call last):
  File "<interactive input>", line 3, in <module>
  File "C:\Installs\Python33\Lib\genericpath.py", line 18, in exists
    st = os.stat(path)
ValueError: path too long for Windows
>>> os.path.exists(r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak.bak")
False
>>> 

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path += '.bak'
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path += '.bak'
>>> os.path.isfile(bak_path), os.path.exists(bak_path)
(True, True)
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>> temp = bak_path
>>> os.path.isfile(temp), os.path.exists(temp)
(True, True)
>>> os.path.isfile('C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'), os.path.exists('C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak')
(False, False)
>>> 

On the other hand, this code works as expected:

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.isfile(bak_path):
...     temp = bak_path + '.bak'
...     bak_path = temp
... 
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>> 

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> while os.path.exists(bak_path):
...     temp = bak_path + '.bak'
...     bak_path = temp
... 
>>> bak_path
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'
>>>
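For reference, the working loop above can be wrapped in a small helper. This is only a sketch of the temp-variable workaround from this report, not a confirmed fix; the function name `first_free_backup_path` and the `max_tries` bound are hypothetical additions for illustration.

```python
import os


def first_free_backup_path(path, suffix=".bak", max_tries=100):
    """Append `suffix` to `path` until no file system entry has that name.

    Mirrors the temp-variable loop from the report. `max_tries` is an
    assumed safety bound so a misbehaving exists() cannot grow the path
    until it is too long for the platform.
    """
    candidate = path
    for _ in range(max_tries):
        if not os.path.exists(candidate):
            return candidate
        # Build the new name under a separate variable, as in the
        # working example, then rebind.
        temp = candidate + suffix
        candidate = temp
    raise RuntimeError("no free backup name after %d tries" % max_tries)
```

With baseExcel.py and baseExcel.py.bak present, this is expected to return the path ending in baseExcel.py.bak.bak, and to raise RuntimeError instead of looping until the path is too long.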
msg229940 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-24 16:34
Interesting bug.  The obvious difference between the two cases is that in the += version the address of the string holding the file path doesn't change, whereas when you use a temp variable it does (there's an optimization in += that reuses the same memory location if possible).  It looks like something is seeing the repeated address and returning the same result as the last time that address was passed, which is wrong.

I don't see anything obvious in the os module.  Although I can't rule out a Python bug, since this works fine on Unix I suspect this is a Windows CRT bug.
msg229942 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2014-10-24 16:37
I wonder whether the same thing occurs if you're not appending a new extension each time? There could be some optimisation (from the dark old days of 8.3 filenames) that compares "baseExcel" and ".bak" separately and assumes that the name is known.

Last I looked at the code for stat() and isfile(), it was going directly to the Win32 API and not via the CRT. Though that may not have been the case in 3.3...
msg229944 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-24 17:16
Could we encode both paths to the unicode_internal encoding and check if results are equal?
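A minimal sketch of that comparison follows. It assumes UTF-16-LE as a stand-in encoding, since the `unicode_internal` codec is not available in current Python releases; the helper name `same_code_units` is hypothetical.

```python
def same_code_units(a, b, encoding="utf-16-le"):
    """Return True if both strings encode to identical bytes."""
    return a.encode(encoding) == b.encode(encoding)


# Path built up with +=, as in the failing loop from the report.
built = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
built += ".bak"
built += ".bak"

# The same path written out as a literal.
literal = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py.bak.bak"

print(same_code_units(built, literal))
```

If this prints True while os.path.exists() disagrees between the two objects, the string contents are not the problem and the difference lies in how the path is handed to the operating system.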
msg229945 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-24 17:53
Looking at the code, it looks like it calls the Win32 API directly if path->wide is true, which I'm guessing is the case unless you are using bytes paths on Windows?  It looks like the critical call, then, is CreateFileA (why A in a _w method I have no idea...so my reading of this code is suspect :) ).
msg229949 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-10-24 18:49
What do you get for os.stat?

    bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
    print(os.stat(bak_path))
    bak_path += '.bak'
    print(os.stat(bak_path))
    bak_path += '.bak'
    print(os.stat(bak_path)) # This should raise FileNotFoundError
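The same probe can be written so that a missing file does not stop the run. This is a sketch only; the helper names `stat_or_none` and `probe_chain` are hypothetical, and it reports just the file size for brevity.

```python
import os


def stat_or_none(path):
    """Return os.stat(path), or None if the path does not exist."""
    try:
        return os.stat(path)
    except FileNotFoundError:
        return None


def probe_chain(path, suffix=".bak", steps=3):
    """Stat `path`, then `path + suffix`, and so on, for `steps` names.

    Returns a list of (path, st_size or None) pairs. The name is extended
    with += on purpose, to match the pattern from the report.
    """
    results = []
    current = path
    for _ in range(steps):
        st = stat_or_none(current)
        results.append((current, None if st is None else st.st_size))
        current += suffix
    return results
```

On an unaffected interpreter, the entry for the first name that does not exist should carry None rather than a repeat of the previous size.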
msg229950 - (view) Author: Aaron (hosford42) Date: 2014-10-24 19:24
Interesting. It keeps returning the previous path's stat result once the path is no
longer valid.

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=8162774324652726, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29874, st_atime=1413389016,
st_mtime=1413389016, st_ctime=1413388655)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path += '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>>

msg229951 - (view) Author: Aaron (hosford42) Date: 2014-10-24 19:30
If I use a separate temp variable, the bug doesn't show, but if I use the
same variable, even with + instead of +=, it still happens.

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=8162774324652726, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29874, st_atime=1413389016,
st_mtime=1413389016, st_ctime=1413388655)
>>> temp = bak_path + '.bak'
>>> bak_path = temp
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> temp = bak_path + '.bak'
>>> bak_path = temp
>>> print(os.stat(bak_path))
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
FileNotFoundError: [WinError 2] The system cannot find the file specified:
'C:\\Users\\EAARHOS\\Desktop\\Python Review\\baseExcel.py.bak.bak'

>>> bak_path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>> bak_path = bak_path + '.bak'
>>> print(os.stat(bak_path))
nt.stat_result(st_mode=33206, st_ino=42502721483352490, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=29999, st_atime=1413389088,
st_mtime=1413389088, st_ctime=1413388654)
>>>

msg229961 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-10-24 21:46
When appending to a singly-referenced string, the interpreter tries to reallocate the string in place. This applies to both `s += 'text'` and `s = s + 'text'`. Storing to a temp variable is adding a 2nd reference, so a new string gets allocated instead. If the former is the case (i.e. the object id is the same after appending), use ctypes to check the string's cached wide-string (wchar_t *) representation:

    from ctypes import *
                                             
    pythonapi.PyUnicode_AsUnicode.argtypes = [py_object]
    pythonapi.PyUnicode_AsUnicode.restype = c_wchar_p

    print(pythonapi.PyUnicode_AsUnicode(bak_path))

The wstr cache should be cleared when the string is reallocated in place, so this is probably a dead end.
msg229962 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-10-24 22:12
> i.e. the object id is the same after appending

Actually, that's wrong. bak_path is a compact string. So the whole object is realloc'd, and the base address (i.e. id) could change. Check PyUnicode_AsUnicode even if the id changes.
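To see whether the interpreter at hand actually reuses the object when appending, something like the following can be run. It is a rough sketch: in-place reallocation is a CPython implementation detail, so the first result may differ between versions and platforms. The helper names are hypothetical.

```python
def fresh_copy(s):
    """Return a new str object with the same contents (via a bytes round trip)."""
    return s.encode("utf-8").decode("utf-8")


def append_same_name(base, suffix):
    """Append to a string that only one local name refers to."""
    s = fresh_copy(base)
    before = id(s)
    s += suffix  # CPython may resize the object in place here
    return s, id(s) == before


def append_via_temp(base, suffix):
    """Append while the original string is still referenced."""
    s = fresh_copy(base)
    before = id(s)
    temp = s + suffix  # s stays alive, so temp has to be a separate object
    return temp, id(temp) == before


path = r"C:\Users\EAARHOS\Desktop\Python Review\baseExcel.py"
print("same name, id preserved:", append_same_name(path, ".bak")[1])
print("temp variable, id preserved:", append_via_temp(path, ".bak")[1])
```

The second line always reports False, because both objects are alive at the same time. The first may report either value, which matches the point above that the id can change even when the object is reallocated.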
msg230577 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-11-04 05:15
Aaron, what version of Python are you using on what version of Windows?  Also, 32 or 64 bit on both?

I can't reproduce this with any Python 3.3.6 or newer on 64-bit Windows 8.1.
msg230982 - (view) Author: Aaron (hosford42) Date: 2014-11-10 23:27
Python 3.3.0, Windows 7, both 64 bit.

Has it been resolved with the newer version, then?

msg230985 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-11-10 23:46
I haven't built 3.3.0 again yet to try to reproduce with it, but there
have been enough bug and security fixes in the more recent 3.3
releases that I'd strongly advise updating on general principle and
seeing if this issue goes away.  If not to 3.4.2, at least to 3.3.5
(the last 3.3 version to have a Windows installer).
msg231000 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-11-11 06:35
I have had a chance to build 3.3.0 and I was able to reproduce the bug with it, so it is in fact fixed in later versions.
History
Date                 User              Action  Args
2014-11-11 06:35:32  zach.ware         set     status: open -> closed; resolution: out of date; messages: + msg231000; stage: resolved
2014-11-10 23:46:31  zach.ware         set     messages: + msg230985
2014-11-10 23:27:17  hosford42         set     messages: + msg230982
2014-11-04 05:15:10  zach.ware         set     messages: + msg230577
2014-10-24 22:12:40  eryksun           set     messages: + msg229962
2014-10-24 21:46:42  eryksun           set     messages: + msg229961
2014-10-24 19:30:31  hosford42         set     messages: + msg229951
2014-10-24 19:24:24  hosford42         set     messages: + msg229950
2014-10-24 18:49:56  eryksun           set     nosy: + eryksun; messages: + msg229949
2014-10-24 17:53:26  r.david.murray    set     messages: + msg229945
2014-10-24 17:16:50  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg229944
2014-10-24 16:37:55  steve.dower       set     messages: + msg229942
2014-10-24 16:34:09  r.david.murray    set     nosy: + r.david.murray; messages: + msg229940
2014-10-24 16:03:00  hosford42         set     title: os.path.isfile & os.path.exists but in while loop -> os.path.isfile & os.path.exists bug in while loop
2014-10-24 16:00:12  hosford42         create