classification
Title: shutil.which wrong result on Windows
Type: behavior Stage:
Components: Library (Lib), Windows Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bar.harel, bobjalex, eryksun, paul.moore, r.david.murray, steve.dower, tim.golden, tobytobkin, zach.ware
Priority: normal Keywords: easy, patch

Created on 2015-06-24 22:09 by bobjalex, last changed 2016-04-04 22:28 by bobjalex.

Files
File name Uploaded Description Edit
which2.py bobjalex, 2015-06-24 22:09 A sample of 'which' function corrected
issue 24505 proposed patch windows cmd semantics.patch tobytobkin, 2015-12-17 19:47 Proposed patch following cmd.exe semantics review
issue-24505-proposed-patch-2.patch tobytobkin, 2015-12-31 05:42
new_which_bug_files.zip bobjalex, 2016-04-04 16:48 ZIP file with suggested file updates
Messages (18)
msg245780 - (view) Author: Bob Alexander (bobjalex) * Date: 2015-06-24 22:09
Python session with 3.5b2

Showing existing error:
>>> from shutil import which

Works OK
>>> which("python")
'C:\\Python27\\python.EXE'

Also works OK
>>> which('C:\\Python27\\python.EXE')
'C:\\Python27\\python.EXE'

Fails
>>> which('C:\\Python27\\python')
>>>

Showing better results with my fix (attached):
>>> from which2 import which
>>> which("python")
'C:\\Python27\\python.exe'
>>> which('C:\\Python27\\python.exe')
'C:\\Python27\\python.exe'
>>> which('C:\\Python27\\python')
'C:\\Python27\\python.exe'

Problem is with the short-circuiting code near the beginning of the function. It fails to check the extensions for Windows when short-circuited.

My fix has a few other improvements -- efficiency and nicer output without those ugly upper-case extensions.
msg245792 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-25 03:26
Could you please submit your proposal as a patch?  (A diff -u against the existing version).
msg245885 - (view) Author: Bob Alexander (bobjalex) * Date: 2015-06-27 17:33
Hi R. David --

My report is just to notify y'all of a bug that I noticed. The bug is
causing me no problem, and it's your option as to whether to fix it or not.

I offered a fix, but I haven't the time to perform diffs, etc. You could
make that diff yourself, since I sent my modified "which" function. Just
replace the existing "which" function in the version of shutil you would
like to fix, and perform the diff.

Bob

On Wed, Jun 24, 2015 at 8:26 PM, R. David Murray <report@bugs.python.org>
wrote:

>
> R. David Murray added the comment:
>
> Could you please submit your proposal as a patch?  (A diff -u against the
> existing version).
>
> ----------
> nosy: +r.david.murray
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24505>
> _______________________________________
>
msg245887 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-27 19:19
That's fine.  Perhaps someone will be interested enough to pick this up and work on it.
msg252329 - (view) Author: Bar Harel (bar.harel) * Date: 2015-10-05 15:14
Unfortunately that patch will not help for files like "python.exe.exe", and double ext which sometimes happens.
I'm on it. Should take me a day or two, more problems might arise along the way. Thanks Bob for alerting.

Regarding the uppercase, I believe it is better to leave the casing as it is. It should be up to the user to mess with it as converting case is a one way ticket. If someone entered an uppercase, perhaps he wants it like that. Also it's less prune to errors and behaves the same for Unix and Windows.
msg252341 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-10-05 16:39
Please don't default to searching the current directory:

    if is_windows:
        # The current directory takes precedence on Windows.
        if not os.curdir in path:
            path.insert(0, os.curdir)

Add a keyword option to enable this, with a warning that including the current directory isn't secure. This option should not be able to override the environment variable [NoDefaultCurrentDirectoryInExePath][1].

[1]: https://msdn.microsoft.com/en-us/library/ms684269
msg252355 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-05 18:26
Well, that's a newish situation for Windows, something I certainly wasn't aware of.  Which is trying to find the executable that would be found if typed at the prompt.  So, what is the correct algorithm for emulating how Windows does that search?  (Or some way to hook directly in to Windows search...)
msg252358 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-10-05 19:58
The Win32 API function linked by eryksun is the correct way to determine whether to include the current directory in the search - the doc specifically refers to matching cmd.exe's behaviour (searching ".;%PATH%" vs. just "%PATH%").

The PATHEXT behaviour should be to look for the name as provided first, followed be appending each extension. So looking for "python" will look for ["python", *("python" + p for p in getenv('PATHEXT').split(';'))] in that order.
msg252359 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-10-05 20:43
My suggestion is only for 3.5 / 3.6, because Windows XP is no longer supported. Starting with Windows Vista, both cmd.exe and CreateProcess support this behavior. I realize that disabling searching the current director by default is going beyond what cmd.exe does, which is to require opting in via the environment variable. But Python can choose to follow the more-modern behavior of PowerShell as a precedent here. 

As Steve mentioned, technically a program isn't supposed to check for the environment variable, but instead call NeedCurrentDirectoryForExePath. This would require either adding a built-in function, e.g. to the _winapi module, or require ctypes:

    >>> kernel32.NeedCurrentDirectoryForExePathW('command')
    1
    >>> os.environ['NoDefaultCurrentDirectoryInExePath'] = '1'
    >>> kernel32.NeedCurrentDirectoryForExePathW('command')
    0

Note that it requires first normalizing the path to replace slashes with backslashes, which is unusual for the Windows API.

    >>> kernel32.NeedCurrentDirectoryForExePathW(r'.\command')
    1
    >>> kernel32.NeedCurrentDirectoryForExePathW('./command')
    0
msg252367 - (view) Author: Bar Harel (bar.harel) * Date: 2015-10-05 23:18
Thanks eryksun.

Regarding the slashes, it should be fine as the input should be with
backslashes anyway. We don't need to fix the input as fixing it would
result in inaccurate data. Just as it won't work without backslashes on
CMD, it shouldn't work here.

On Mon, Oct 5, 2015 at 11:43 PM eryksun <report@bugs.python.org> wrote:

>
> eryksun added the comment:
>
> My suggestion is only for 3.5 / 3.6, because Windows XP is no longer
> supported. Starting with Windows Vista, both cmd.exe and CreateProcess
> support this behavior. I realize that disabling searching the current
> director by default is going beyond what cmd.exe does, which is to require
> opting in via the environment variable. But Python can choose to follow the
> more-modern behavior of PowerShell as a precedent here.
>
> As Steve mentioned, technically a program isn't supposed to check for the
> environment variable, but instead call NeedCurrentDirectoryForExePath. This
> would require either adding a built-in function, e.g. to the _winapi
> module, or require ctypes:
>
>     >>> kernel32.NeedCurrentDirectoryForExePathW('command')
>     1
>     >>> os.environ['NoDefaultCurrentDirectoryInExePath'] = '1'
>     >>> kernel32.NeedCurrentDirectoryForExePathW('command')
>     0
>
> Note that it requires first normalizing the path to replace slashes with
> backslashes, which is unusual for the Windows API.
>
>     >>> kernel32.NeedCurrentDirectoryForExePathW(r'.\command')
>     1
>     >>> kernel32.NeedCurrentDirectoryForExePathW('./command')
>     0
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24505>
> _______________________________________
>
msg256242 - (view) Author: Toby Tobkin (tobytobkin) * Date: 2015-12-11 22:44
I'm working on a patch and an accompanying unit test for this now. Should have it out today.
msg256373 - (view) Author: Toby Tobkin (tobytobkin) * Date: 2015-12-14 08:29
Hopefully this isn't too much of an amateur question, but I ran into a semantics issue: should which be imitating the semantics of the Windows shell or of CreateProcess[3]? The current implementation of shutil.which implies Windows shell, but Python uses CreateProcess in subprocess for executing commands, not cmd.exe.

In order to correctly emulate the behavior of the Windows command search sequence[1] (e.g. for cmd.exe or Powershell), one needs to check whether or not a given command matches one of the internal Windows shell commands. For instance, if we have an executable at C:\xyz\chdir.exe, the following outputs should be observed from which if our current directory is C:\xyz:

>>> which('chdir')
(none)

>>> which('.\\chdir')
'C:\\xyz\\chdir.exe'

On the other hand, CreateProcess[3] would work this way:

CreateProcess(NULL, "chdir", ...) --> executes C:\xyz\chdir.exe
CreateProcess(NULL, "chdir.exe", ...) --> executes C:\xyz\chdir.exe

There are other semantic differences as well. For example, CreateProcess will not do path extension of e.g. "abc" to "abc.bat", but Powershell and cmd will.

Which semantics do I follow? Powershell/cmd or CreateProcess?

[1] Subsection "Command Search Sequence" of https://technet.microsoft.com/en-us/library/cc723564.aspx#XSLTsection127121120120
[2] Subsection "Internal and External Commands" of https://technet.microsoft.com/en-us/library/cc723564.aspx#XSLTsection127121120120
[3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425(v=vs.85).aspx
msg256613 - (view) Author: Toby Tobkin (tobytobkin) * Date: 2015-12-17 19:47
Patch and tests attached. All Python tests passed on Windows 8 and Ubuntu 14.04LTS.
msg257248 - (view) Author: Toby Tobkin (tobytobkin) * Date: 2015-12-31 05:42
Patch incorporating Eryk Sun's feedback for various Windows-specific behavior attached.
msg258413 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-01-16 21:31
I'm good with the patch, except we can't use ctypes in the standard library. NeedCurrentDirectoryForExePathW will need to be wrapped in Modules/_winapi.c.

Should be fairly simple to do, though it probably requires learning argument clinic (which admittedly I haven't done enough to write the function without copying from elsewhere). I may get back to it eventually, but if someone posts the implementation separately I'm happy to merge two patches to get this in.
msg258415 - (view) Author: Toby Tobkin (tobytobkin) * Date: 2016-01-16 21:36
Steve, yeah I saw that and started playing with Clinic last night. I'll have the updated patch out soon--I've just been slow because I've been flying around to interviews trying to get my first job out of college and it's been eating all my free time.
msg262854 - (view) Author: Bob Alexander (bobjalex) * Date: 2016-04-04 16:48
Since there seems to be ongoing work on the "which"
function, here are a few more thoughts on this function's
future:

  - The existing version does not prepend the current
    directory to the path if it is already in the path.
    If the current directory is in the path but is not
    the first element, it will not be the first directory
    searched. It seems that the desired behavior is to
    search the current directory first, so the current
    directory should *always* be prepended. The existing
    "which" function already has an optimization to only
    search each directory once, so it's not a problem if
    the current directory is unconditionally prepended
    and may end up in there twice. This change would
    actually be a "correction", since the doc says the
    current directory is prepended

  - The function should always return an absolute path,
    which is the behavior of the Unix(1) "which" command
    and, I think, is the typical expected behavior of a
    "which"-type request. The existing implementation
    returns a relative path in certain cases, such as if
    the file is found via a relative directory reference
    in the path. This change is not inconsistent with
    the doc, since the doc does not address it.

  - It would be nice if the extension added when the
    given command has no extension were lower case,
    instead of the upper case extension retrieved from
    the PATHEXT environment variable. Several other
    "which" implementations work that way (e.g.
    see Go's os/exec.LookPath function), producing
    a more aesthetically pleasing name, as well as
    being more consistent with naming practices of the
    last 20+ years. The shocking-looking upper case
    sxtensions remind me of VERY OLD DOS programs  :-)
    This presents no conflict with the doc, and does
    not affect "correctness" since Windows file names
    are case-independent.

    A previous commenter objected to adding lower case
    extensions, but consider:
      - The current version never returns the extension
        case of the actual file. It returns the extension
        of the command string passed to the function,
        if any, otherwise it adds the extension from the
        PATHEXT environment variable, either of which
        might not match the actual file.
      - Returning the actual extension of the found file
        might be nice, but would require additional I/O;
        added expense for little return. This is not
        done in the current version.
      - The PATHEXT variable that ships with Windows
        contains the allowable extensions in upper case,
        an old DOS artifact. Few executable files these
        days have uppercase extensions. Using a lower
        case extension when the function has to invent
        one is a "modernization".

  - It would be nice if the returned file path were
    normalized. Currently, sometimes an unnormalized
    path is returned with a leading ".\".

I did write an update to "which" with the above changes,
and also updated the unit tests with:
  - 2 new tests to catch the bug that is the subject of
    this issue.
  - some tests were updated for the small changes such
    as normalization and lower case added extensions.
A zip file is attached with my updates. I'm not an
official "contributor", but feel free incorporate the
contents in any way you deem appropriate. The files are:

shutil.py
    updated shutil module
shutil.py_3_5_1
    existing 3.5.1 shutil module -- basis for updates
test_shutil_for_current_w_added_tests.py
    unit tests for existing 3.5.1 version of shutil with
    new tests to catch this bug
test_shutil_for_new_version.py
    unit tests for attached updated version of shutil
test_shutil_3_5_1.py
    existing 3.5.1 unit tests -- basis for updates

Bob
msg262875 - (view) Author: Bob Alexander (bobjalex) * Date: 2016-04-04 22:28
Oops, clarification...

I just reread my kind of long previous post, and realized it wasn't very explicit that anything concerning file extensions or prepending the current directory to the PATH apply to Windows only; not Unix (of course).

The "returning absolute, normalized paths" suggestions are multi-platform

I only tested the modified code on Windows.
History
Date User Action Args
2016-04-04 22:28:28bobjalexsetmessages: + msg262875
2016-04-04 16:48:43bobjalexsetfiles: + new_which_bug_files.zip

messages: + msg262854
2016-01-16 21:36:26tobytobkinsetmessages: + msg258415
2016-01-16 21:31:49steve.dowersetmessages: + msg258413
2015-12-31 05:42:37tobytobkinsetfiles: + issue-24505-proposed-patch-2.patch

messages: + msg257248
2015-12-17 19:47:33tobytobkinsetfiles: + issue 24505 proposed patch windows cmd semantics.patch
keywords: + patch
messages: + msg256613
2015-12-14 08:29:28tobytobkinsetmessages: + msg256373
2015-12-11 22:44:05tobytobkinsetnosy: + tobytobkin
messages: + msg256242
2015-10-05 23:18:07bar.harelsetmessages: + msg252367
2015-10-05 20:43:31eryksunsetmessages: + msg252359
2015-10-05 19:58:56steve.dowersetmessages: + msg252358
2015-10-05 18:26:45r.david.murraysetmessages: + msg252355
2015-10-05 16:39:58eryksunsetversions: + Python 3.6, - Python 3.5
nosy: + eryksun, paul.moore, tim.golden, zach.ware, steve.dower

messages: + msg252341

components: + Windows
2015-10-05 15:14:11bar.harelsetnosy: + bar.harel
messages: + msg252329
2015-06-27 19:19:43r.david.murraysetkeywords: + easy

messages: + msg245887
2015-06-27 17:33:48bobjalexsetmessages: + msg245885
2015-06-25 03:26:03r.david.murraysetnosy: + r.david.murray
messages: + msg245792
2015-06-24 22:09:39bobjalexcreate