classification
Title: os.path.isdir() is slow on windows
Type: performance Stage: committed/rejected
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brian.curtin Nosy List: brian.curtin, eli.bendersky, giampaolo.rodola, loewis, python-dev, stutzbach, tim.golden
Priority: normal Keywords: needs review, patch

Created on 2011-03-17 09:38 by eli.bendersky, last changed 2011-06-09 15:01 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
issue11583.diff brian.curtin, 2011-06-03 18:59 review
issue11583_v2.diff brian.curtin, 2011-06-03 20:55 review
Messages (14)
msg131238 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-03-17 09:38
Report here: http://stackoverflow.com/questions/5316928/python-os-path-isdir-is-slow-on-windows

It could be a problem with Windows itself, but I'm opening this to keep track of the issue, just to be on the safe side.
msg131239 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-17 10:46
How do you expect this to be resolved? Or will it stay open until Microsoft improves the NTFS performance?
msg131241 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-03-17 11:20
On Thu, Mar 17, 2011 at 12:46, Martin v. Löwis <report@bugs.python.org>wrote:

I opened this in order not to forget to look at the implementation of isdir
on windows, making sure it's the most optimal thing we can do. If you know
it is, the issue can be closed as far as I'm concerned.
msg131257 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-17 14:44
Ok. It's probably not the most optimal thing we can do. It opens the directory with CreateFileW (specifying OPEN_EXISTING and BACKUP_SEMANTICS), then GetFileInformationByHandle. If we only want to find out whether the file is a directory, we could just to FindFirstFile, which wouldn't need to go to the MFT record.

I would be surprised if this makes a difference in practice, though.
msg131260 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-03-17 14:48
I made a bunch of the stat changes in 3.2 so I'll assign this to myself and take a look.
msg137571 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-06-03 18:59
Attached is a patch that makes this about twice as fast for regular files and about 15 times faster for symbolic links and directories.

Not that this would be anyone's performance bottleneck, but it does make the time more of a constant due to the recursion and extra overhead when stat'ing symbolic paths. The numbers were tested with the patch on #12084 (new stat impl).


# The numbers were run before I hooked nt._isdir up through Lib/ntpath.py, so os.path.isdir is the old way.
# Regular file
>PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('README')"
10000 loops, best of 3: 25 usec per loop

>PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('README')"
10000 loops, best of 3: 43 usec per loop

# Regular directory

>PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('Lib')"
10000 loops, best of 3: 24.3 usec per loop

>PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('Lib')"
10000 loops, best of 3: 41.4 usec per loop

# testlink is a symbolically linked directory

>PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('testlink')"
10000 loops, best of 3: 26.1 usec per loop

>PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('testlink')"
1000 loops, best of 3: 415 usec per loop

# setup.py.link is a symbolic link

>PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('setup.py.link')"
10000 loops, best of 3: 25.8 usec per loop

>PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('setup.py.link')"
1000 loops, best of 3: 336 usec per loop
msg137579 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-06-03 20:03
Looks like I didn't test this enough - many other test failures are occurring. Disregard this patch for now.
msg137581 - (view) Author: Tim Golden (tim.golden) (Python committer) Date: 2011-06-03 20:11
One (presumably unintended) side-effect is that you can now do os.path.isdir on a wildcard and the first returned entry will be tested for directoryness:

import os
os.path.isdir ("c:/tem*")
# => True where c:/temp exists
msg137585 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-06-03 20:55
Here's a patch that works. All tests are passing.

I changed from using FindFirstFile to GetFileAttributes, which provides basically the same performance characteristics.

One change in this implementation is to not raise a WindowsError when the file cannot be opened. The original os.stat would return False in that case, so now this function returns False (before it just passed on the WindowsError, which killed lots of tests).
msg137629 - (view) Author: Tim Golden (tim.golden) (Python committer) Date: 2011-06-04 09:18
Code looks good and tests pass. Only one thing: the code in the patched lib\ntpath.py still refers to FindFirstFile
msg137931 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-09 00:00
New changeset 88e318166eaf by Brian Curtin in branch '3.2':
Fix #11583. Changed os.path.isdir to use GetFileAttributes instead of os.stat.
http://hg.python.org/cpython/rev/88e318166eaf

New changeset 567f30527913 by Brian Curtin in branch 'default':
Fix #11583. Changed os.path.isdir to use GetFileAttributes instead of os.stat.
http://hg.python.org/cpython/rev/567f30527913
msg137932 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2011-06-09 00:31
This was also pushed to 2.7 in f1509fc75435.
msg137939 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-06-09 04:19
Brian - great, thanks.
msg137982 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-09 15:01
New changeset d40609dd01e0 by Brian Curtin in branch '3.2':
Correction to 88e318166eaf - Issue #11583
http://hg.python.org/cpython/rev/d40609dd01e0

New changeset fe813f5711a5 by Brian Curtin in branch '2.7':
Correction to f1509fc75435 - Issue #11583
http://hg.python.org/cpython/rev/fe813f5711a5
History
Date User Action Args
2011-06-09 15:01:24python-devsetmessages: + msg137982
2011-06-09 04:19:54eli.benderskysetmessages: + msg137939
2011-06-09 00:31:48brian.curtinsetstatus: open -> closed
versions: - Python 3.1
messages: + msg137932

resolution: fixed
stage: patch review -> committed/rejected
2011-06-09 00:00:27python-devsetnosy: + python-dev
messages: + msg137931
2011-06-04 09:18:26tim.goldensetmessages: + msg137629
2011-06-03 20:55:51brian.curtinsetfiles: + issue11583_v2.diff

messages: + msg137585
2011-06-03 20:11:41tim.goldensetmessages: + msg137581
2011-06-03 20:03:33brian.curtinsetmessages: + msg137579
2011-06-03 18:59:22brian.curtinsetkeywords: + patch, needs review
files: + issue11583.diff
messages: + msg137571

stage: needs patch -> patch review
2011-03-19 18:36:04giampaolo.rodolasetnosy: + giampaolo.rodola
2011-03-17 18:08:11stutzbachsetnosy: + stutzbach
2011-03-17 14:48:58brian.curtinsetassignee: brian.curtin
messages: + msg131260
nosy: loewis, tim.golden, eli.bendersky, brian.curtin
stage: needs patch
2011-03-17 14:44:05loewissetnosy: loewis, tim.golden, eli.bendersky, brian.curtin
messages: + msg131257
2011-03-17 11:26:44SilentGhostsetfiles: - unnamed
nosy: loewis, tim.golden, eli.bendersky, brian.curtin
2011-03-17 11:20:21eli.benderskysetfiles: + unnamed

messages: + msg131241
nosy: loewis, tim.golden, eli.bendersky, brian.curtin
2011-03-17 10:46:38loewissetnosy: + loewis
messages: + msg131239
2011-03-17 09:38:32eli.benderskycreate