Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior of pathlib.WindowsPath with drive paths #80486

Closed
kmaork mannequin opened this issue Mar 15, 2019 · 22 comments
Closed

Inconsistent behavior of pathlib.WindowsPath with drive paths #80486

kmaork mannequin opened this issue Mar 15, 2019 · 22 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes OS-windows stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error

Comments

@kmaork
Copy link
Mannequin

kmaork mannequin commented Mar 15, 2019

BPO 36305
Nosy @pfmoore, @tjguk, @zware, @serhiy-storchaka, @eryksun, @zooba, @kmaork, @nmat
PRs
  • bpo-36305: Fixes to path handling and parsing in pathlib #12361
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-03-15.15:37:05.961>
    labels = ['type-bug', '3.8', '3.9', '3.10', 'library', 'OS-windows']
    title = 'Inconsistent behavior of pathlib.WindowsPath with drive paths'
    updated_at = <Date 2021-03-20.06:55:56.836>
    user = 'https://github.com/kmaork'

    bugs.python.org fields:

    activity = <Date 2021-03-20.06:55:56.836>
    actor = 'eryksun'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2019-03-15.15:37:05.961>
    creator = 'kmaork'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36305
    keywords = ['patch']
    message_count = 18.0
    messages = ['337999', '338007', '338011', '338663', '338667', '338669', '338675', '338677', '338680', '338685', '338708', '338727', '338768', '340388', '340396', '340826', '343621', '343784']
    nosy_count = 8.0
    nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'kmaork', 'nmat']
    pr_nums = ['12361']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue36305'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    Linked PRs

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 15, 2019

    The behavior of WindowsPath is undefined when used with drive paths, specifically with no trailing slash:

    WindowsPath('cc:').absolute() -> WindowsPath('C:/Users/maor/cc:')
    WindowsPath('c:').absolute() -> WindowsPath('c:')
    WindowsPath('c:').is_absolute() -> False
    WindowsPath('c:') / 'p' -> WindowsPath('c:p')
    WindowsPath('c:p').absolute() -> WindowsPath('c:p')
    WindowsPath('c:p').is_absolute() -> False

    @kmaork kmaork mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 15, 2019
    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 15, 2019

    WindowsPath('cc:').absolute() -> WindowsPath('C:/Users/maor/cc:')

    This is correct. "cc:" is not a drive, i.e. the drive attribute is an empty string. pathlib handles this as an unqualified filename that gets resolved relative to the process current working directory.

    Note that Windows allows a trailing colon for DOS devices such as "con:". Colons can also be used for file streams, but "cc:" isn't valid. The anonymous stream can only be specified explicitly by including the stream type, e.g. "cc::$DATA".

    WindowsPath('c:').is_absolute() -> False
    WindowsPath('c:p').is_absolute() -> False
    WindowsPath('c:') / 'p' -> WindowsPath('c:p')

    These are correct. Drive-relative paths depend on the current working directory of the drive. By definition they cannot be absolute. And the last one can be no more resolved than WindowsPath('spam/eggs') / 'p'.

    Note that Windows gets the working directory for drives from 'hidden' environment variables such as "=C:". If no value is set for a drive, it defaults to the root directory. Windows uses these values, but never sets them itself. Applications and libraries such as the C runtime library are responsible for setting the values. CMD and Python both set them, but PowerShell does not.

    WindowsPath('c:').absolute() -> WindowsPath('c:')
    WindowsPath('c:p').absolute() -> WindowsPath('c:p')

    This is a bug. absolute() should resolve "C:" to the current directory on the drive and join it with the remaining parts.

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 15, 2019

    I agree.
    So I'll open a PR that will:

    1. Make the .absolute method return correct values in these cases
    2. Add a regression test for this behavior

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 23, 2019

    Update after editing my PR - the bugs are:

    1. WindowsPath('C:a').absolute() should return WindowsPath('C:\\d\\a') but returns WindowsPath('C:a').
      This is caused by flawed logic in the parse_parts method of the _Flavour class.

    2. WindowsPath('./b:a').absolute() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
      This is caused by the limited interface of parse_parts, and affects the Path.absolute, Path.expanduser and Path.__rtruediv__ methods.

    3. WindowsPath('./b:a').resolve() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
      This is caused by missing logic in the resolve method and in Path.__str__

    It'd be great if someone could review the PR so we can make progress with fixing the bugs.

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 23, 2019

    (Note: I consider all of these to be *extremely* obscure corner cases)

    1. WindowsPath('C:a').absolute() should return WindowsPath('C:\\d\\a') but returns WindowsPath('C:a').
      This is caused by flawed logic in the parse_parts method of the _Flavour class.

    2. WindowsPath('./b:a').absolute() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
      This is caused by the limited interface of parse_parts, and affects the Path.absolute, Path.expanduser and Path.__rtruediv__ methods.

    [Note there is no absolute() method - I assume you mean resolve()]

    Why? If './b:a' resulted from joining '.' and 'b:a', then it seems to
    me that 'b:a' is "absolute-ish" (drive-relative) and so any preceding
    path should be ignored (just like joining '.' and '/a/b/c' on POSIX).
    The problem here is more to do with the fact that the simple POSIX
    "absolute or relative" dichotomy isn't complete for Windows -
    drive-relative paths like C:foo have some of the characteristics of
    absolute paths and some of the characteristics of relative paths.

    To put it another way, I'm comfortable with WindowsPath("./b:a")
    returning "WindowsPath("b:a"). Of course, that's because I read b:a as
    a drive plus a filename. If you read it as a filename with a stream,
    then your interpretation is correct. But unless you check for all of
    the valid drives currently available, it's not possible to make that
    choice - both interpretations are equally valid. In fact, if you have
    a file "C", and add a stream "file1" to it, is "C:file1" a file on the
    C drive, or a stream in the file C? The problem is genuinely ambiguous
    in that case, and cannot be solved without making an arbitrary choice.
    Worse, WindowsPath("w:fred").resolve(strict=False) technically can't
    even take account of whether drive w exists or file w exists or has a
    stream fred. In that case, the question is fundamentally unanswerable.

    I'd be reluctant to "solve" this issue with a fix that doesn't address
    that problem - it would simply be replacing one weird behaviour with
    another in an obscure corner case. (It may be that the "fix" is simply
    to document the choices that the code currently makes).

    1. WindowsPath('./b:a').resolve() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
      This is caused by missing logic in the resolve method and in Path.__str__

    Same as (2)

    It'd be great if someone could review the PR so we can make progress with fixing the bugs.

    I don't think we should worry about the PR until it's clearly
    established what the correct resolution actually is (or even whether
    we consider all of these cases as bugs).

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 23, 2019

    (Note: I consider all of these to be *extremely* obscure corner cases)
    One bug was enough for me :)

    [Note there is no absolute() method - I assume you mean resolve()]
    Of course there is an absolute() method, I'm not sure what you are saying...

    it seems to me that 'b:a' is "absolute-ish" (drive-relative)
    I think that is incorrect. As written here: https://docs.microsoft.com/en-us/windows/desktop/fileio/naming-a-file#fully-qualified-vs-relative-paths, "If a file name begins with only a disk designator but not the backslash after the colon, it is interpreted as a relative path to the current directory on the drive with the specified letter."
    In that case, WindowsPath('C:a').is_absolute() should return False, (as it does today) and WindowsPath('C:a').absolute() should return a path on drive C:, with 'a' joined with the working directory in drive C:.

    I'm comfortable with WindowsPath("./b:a") returning WindowsPath("b:a")
    I disagree with that as well. "./b:a" is explicitly a path relative to the CWD (to a file named b:a). On the other hand, "b:a" should be (an is, in most windows programs) interpreted as a drive relative path.
    For example, the ntpath module handles these cases correctly. When located in the directory C:\\d, this is the ntpath behavior:
    ntpath.abspath('b:a') -> 'B:\\a'
    ntpath.abspath('.\\b:a') -> 'C:\\d\\b:a'

    In conclusion, I stand by my original fix offers. They are correct according to windows' documentation and behavior.

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 23, 2019

    > [Note there is no absolute() method - I assume you mean resolve()]
    Of course there is an absolute() method, I'm not sure what you are saying...

    Huh, weird. It's not in
    https://docs.python.org/3.7/library/pathlib.html But you're right, it
    does exist...

    > it seems to me that 'b:a' is "absolute-ish" (drive-relative)
    I think that is incorrect. As written here: https://docs.microsoft.com/en-us/windows/desktop/fileio/naming-a-file#fully-qualified-vs-relative-paths, "If a file name begins with only a disk designator but not the backslash after the colon, it is interpreted as a relative path to the current directory on the drive with the specified letter."
    In that case, WindowsPath('C:a').is_absolute() should return False, (as it does today) and WindowsPath('C:a').absolute() should return a path on drive C:, with 'a' joined with the working directory in drive C:.

    OK, sure. My point is that "relative path to the current directory on
    the drive with the specified letter" isn't a concept that matches how
    "relative paths" are typically interpreted (most code interprets
    "relative" as "relative to the CWD", and doesn't deal with the
    possibility of per-drive CWDs).

    > I'm comfortable with WindowsPath("./b:a") returning WindowsPath("b:a")
    I disagree with that as well. "./b:a" is explicitly a path relative to the CWD (to a file named b:a). On the other hand, "b:a" should be (an is, in most windows programs) interpreted as a drive relative path.

    Windows is inconsistent here - it can *also* be interpreted as a path
    to a stream within a file. But it (virtually) never is.

    For example, the ntpath module handles these cases correctly. When located in the directory C:\\d, this is the ntpath behavior:
    ntpath.abspath('b:a') -> 'B:\\a'
    ntpath.abspath('.\\b:a') -> 'C:\\d\\b:a'

    That second case only results in a valid filename if b:a is viewed as
    a file-with-stream, but the first only makes sense if b:a is viewed as
    a directory-relative file.

    Also, in effect it means that Path(".") / some_path can return a
    completely different location than some_path. Following documented
    behaviour or not, that violates a pretty fundamental assumption that
    users would expect to hold. And I think the problem is that
    "drive-relative paths" are somewhat odd things that don't fit well in
    the POSIX-derived model that Python's path APIs follow.

    In conclusion, I stand by my original fix offers. They are correct according to windows' documentation and behavior.

    I remain of the view that the Windows documentation introduces a
    concept ("relative path to the current directory on a specific drive")
    that isn't well modelled by the current APIs, and the only "proper"
    solution is to extend the API (like with the ideas of "drive" and
    "reserved filenames", which are Windows-specific, but supported
    everywhere). In the absence of that, I believe that any "fix" within
    the existing model will have odd edge cases, and I don't think these
    ones (specifically the "./b:a" cases) have *any* "obvious" answer.

    I'm happy to agree to differ on this point, though. If the new
    behaviour is just a side-effect of fixing absolute() to match the
    cases Eryk commented on, then that's fine - I just wouldn't describe
    the particular ./b:c cases as "bug fixes", rather as "changes in the
    behaviour of cases no-one should actually care about" :-)

    BTW, was there an actual use case for this issue, or was it simply a
    theoretical concern? I'm actually much happier considering these cases
    as "undefined" in practice (and just "not mentioned" in the docs),
    rather than trying to pin it down precisely. I don't know of a case
    where it actually benefits us to document this level of arguable
    behaviour.

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 23, 2019

    OK, sure. My point is that "relative path to the current directory on
    the drive with the specified letter" isn't a concept that matches how
    "relative paths" are typically interpreted (most code interprets
    "relative" as "relative to the CWD", and doesn't deal with the
    possibility of per-drive CWDs).

    In pathlib, each path object has a drive, a root and path parts. Seeing pathlib's logic and tests (many tests are specifically testing the behavior of drive-relative paths), I think pathlib completely supports drive relative paths, with the exception of a few small bugs.

    > > I'm comfortable with WindowsPath("./b:a") returning WindowsPath("b:a")
    > I disagree with that as well. "./b:a" is explicitly a path relative to the CWD (to a file named b:a). On the other hand, "b:a" should be (an is, in most windows programs) interpreted as a drive relative path.
    Windows is inconsistent here - it can *also* be interpreted as a path
    to a stream within a file. But it (virtually) never is.

    I think that when parsing windows paths, pathlib should behave exactly like the windows API does. This is crucial for interaction with the windows API itself or with other applications that might use it. I don't see any other way to parse windows paths other than according to the normal windows behavior.
    Having said that, pathlib does a pretty good keeping the compatibility with the windows API, except for the small cases I found and brought forward in this issue report. From the information I gathered, when a path starts with one letter followed by a colon, windows treats it as a drive and continues parsing the rest of the path separately. That means that if you want to specify a path to a file in the CWD, with a single-character name and a file stream, you must precede the path with a "./" (See eryksun's comment on my PR before I fixed it #12361 (comment)).
    Here is an example for the behavior of the windows API in this case:
    win32api.GetFullPathName('b:a') -> 'B:\\a'
    win32api.GetFullPathName('./b:a') -> 'C:\\Users\\maor\\b:a'

    Also, in effect it means that Path(".") / some_path can return a
    completely different location than some_path.

    This behavior is completely normal. Should WindowsPath('C:\\f') / WindowsPath('D:\\f2') return anything other than WindowsPath('D:/f2')?

    And I think the problem is that "drive-relative paths" are somewhat odd things that don't fit well in the POSIX-derived model that Python's path APIs follow.

    As I wrote earlier, I think this is incorrect as the pathlib.Path class holds the attributes _drv, _root and _parts, which allows it to fully support drive-relative paths, by having a _drv and not having a _root.

    I'm happy to agree to differ on this point, though. If the new
    behaviour is just a side-effect of fixing absolute() to match the
    cases Eryk commented on, then that's fine - I just wouldn't describe
    the particular ./b:c cases as "bug fixes", rather as "changes in the
    behaviour of cases no-one should actually care about" :-)

    I'm still that my case is convincing enough, but if not - does that require me to make any changes in order to make progress with my PR?

    BTW, was there an actual use case for this issue, or was it simply a
    theoretical concern?

    I've had an annoying bug using pathlib, traced it to the first bug I've presented in this issue, and discovered a few similar unhandled edge cases. Again, the "bugginess" I set upon to fix (call it a bug or an undefined behavior) is an incompatibility issue with the way paths are normally treated in windows.

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 23, 2019

    does that require me to make any changes in order to make progress with my PR?

    I'm not going to block this PR. I'd prefer it if we at least
    documented the agreed behaviour, so that in future people don't come
    along and say the new behaviour is wrong, but again I'm not going to
    insist. I doubt I'll ever hit this edge case myself (either in code of
    my own, or when working with others) so my only real interest is in
    flagging up the concern. I'd like to hear what Eryk has to say on the
    matter, though.

    Ultimately the only person whose views matter are yours (as the person
    writing the code) whoever commits the change.

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 23, 2019

    Alright, documentation is always good :)
    I'll be glad to add some, but could you please point me to the place in the code where you think it should go? (or just comment on the PR)

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 24, 2019

    Paul, I agree that joining Path(".") and Path("c:a") should yield Path("c:a"). However, I disagree that we should always be able to construct Path("a/b") from components Path("a") and Path("b"). It doesn't apply to Path("./c:a"). There's no way to split it up as the joining of two pathlib paths because there is no way to represent "c:a" by itself as anything other than a drive-relative path. The name "./c:a" has to be taken as a unit, which is fundamentally different from "c:a". pathlib agrees:

        >>> p1 = Path('./c:a')
        >>> p1
        WindowsPath('c:a')
        >>> [p1.drive, p1.root, p1.parts]
        ['', '', ('c:a',)]
    
        >>> p2 = Path('.') / Path('c:a')
        >>> p2
        WindowsPath('c:a')
        >>> [p2.drive, p2.root, p2.parts]
        ['c:', '', ('c:', 'a')]

    Path('./c:a') is correctly parsed as a relative filename (no root and no drive). So, if it helps any, on the PR I wasn't requesting to change how it's parsed. The ambiguity is due to the way pathlib always collapses all "." components. I would like it to retain an initial "." component. That way the string representation will come out correctly as ".\\c:a" as opposed to the drive-relative path "c:a".

    Some Windows API and runtime library functions behave differently depending whether a relative path has a leading "." or ".." component. We're at a disadvantage if we throw this information away. For example, "./spam/eggs.ext" and "spam/eggs.ext" can yield different results when searching for the file via SearchPathW, CreateProcessW (if using lpCommandLine, not lpApplicationName), or LoadLibraryExW (data/image DLL loading, not normal module loading). "./spam/eggs.ext" will be resolved relative to the process working directory, but "spam/eggs.ext" will be tried against every directory in the default file, executable, or library search path, which may not even include the working directory. (The latter behavior is unique to Windows. POSIX never searches for a name with a slash in it.)

    The CreateProcessW case is a generalization of the case that we're used to across various platforms, in which, for the sake of security, the "." entry is excluded from PATH. In this case, the only way to run an executable in the working directory is to reference it explicitly. For example (in Linux):

        >>> p = Path('./test.sh')
        >>> open(p, 'w').write('#!/bin/sh\necho spam\n')
        20
        >>> os.chmod(p, 0o700)
        >>> subprocess.call(['./test.sh'])
        spam
        0
        >>> try: subprocess.call([str(p)])
        ... except FileNotFoundError: print('eggs')
        ... 
        eggs
        >>> str(p)
        'test.sh'

    This would work if pathlib kept the initial "." component.

    An example where we currently retain information that's not obviously needed is with ".." components. Even Path.absolute() retains ".." components. It's important in POSIX. For example, "spam/../eggs" shouldn't be reduced to "eggs" because "spam" might be a symlink. This doesn't generally matter in Windows, since it normalizes paths in user mode as strings before they're passed to the kernel, but we still don't throw the information away because it could be useful to code that implements POSIX-like behavior.

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 24, 2019

    There's no way to split it up as the joining of two pathlib paths because there is no way to represent "c:a" by itself as anything other than a drive-relative path. The name "./c:a" has to be taken as a unit, which is fundamentally different from "c:a". pathlib agrees:

    \>\>\> p1 = Path('./c:a')
    \>\>\> p1
    WindowsPath('c:a')
    \>\>\> [p1.drive, p1.root, p1.parts]
    ['', '', ('c:a',)]
    
    \>\>\> p2 = Path('.') / Path('c:a')
    \>\>\> p2
    WindowsPath('c:a')
    \>\>\> [p2.drive, p2.root, p2.parts]
    ['c:', '', ('c:', 'a')]
    

    Path('./c:a') is correctly parsed as a relative filename (no root and no drive). So, if it helps any, on the PR I wasn't requesting to change how it's parsed. The ambiguity is due to the way pathlib always collapses all "." components. I would like it to retain an initial "." component. That way the string representation will come out correctly as ".\\c:a" as opposed to the drive-relative path "c:a".

    Ah, I think I follow now. But I'm not sure what you mean by wanting it
    to "retain an initial '.' component" - how would you expect that to
    work in practice? p1.parts == ('.', 'c:a')? I suspect that could break
    existing code. In 99% of cases an initial ./ *is* semantically
    irrelevant, and people expect it to be omitted. Upsetting that
    expectation for something so rare, while technically correct, is
    something we need to be careful of. Maybe it needs to be retained in a
    new attribute in the Path object, which affects the str() conversion,
    but not the existing attributes like parts.

    Some Windows API and runtime library functions behave differently depending whether a relative path has a leading "." or ".." component. We're at a disadvantage if we throw this information away.

    Yes, I see your point now. Whether the initial string representation
    was foo or ./foo is semantic information that we need to retain for
    those places where it matters. But my concern is at the other end of
    the equation - we need to be careful, having retained that semantic
    information, not to have it intrude on existing, working use cases.

    The CreateProcessW case is a generalization of the case that we're used to across various platforms, in which, for the sake of security, the "." entry is excluded from PATH. In this case, the only way to run an executable in the working directory is to reference it explicitly. For example (in Linux):
    [...]
    This would work if pathlib kept the initial "." component.

    Thanks, this is a really useful example, as it makes it clear that
    this is a general issue, not a platform-specific quirk.

    An example where we currently retain information that's not obviously needed is with ".." components. Even Path.absolute() retains ".." components. It's important in POSIX. For example, "spam/../eggs" shouldn't be reduced to "eggs" because "spam" might be a symlink. This doesn't generally matter in Windows, since it normalizes paths in user mode as strings before they're passed to the kernel, but we still don't throw the information away because it could be useful to code that implements POSIX-like behavior.

    Yes. Maybe not stripping an initial ./ can be modelled on that example

    • the documentation
      (https://docs.python.org/3.7/library/pathlib.html#pure-paths) says
      "Spurious slashes and single dots are collapsed, but double dots
      ('..') are not, since this would change the meaning of a path in the
      face of symbolic links" - that should be expanded to clarify that an
      initial "." is similarly retained because removing it would change the
      meaning in the face of subprocess invocatioons which rely on it to
      explicitly allow running a file from the current directory, or Windows
      files with streams.

    The exact behaviour needs to be clarified, of course:

    Path('./a')
    Path('./a:b')
    Path('.', 'a')
    Path('./', 'a')
    Path('.', '.', 'a')

    ... etc. We should have well-defined behaviour for all of these (I'm
    not saying it's *hard* to define the behaviour), and tests to ensure
    it's followed.

    Having said all of this, I'm not at all sure how much it relates to
    the original description of this issue, which didn't mention initial
    './' components at all. Is the originally reported behaviour a
    *consequence* of not retaining './', or is it a separate problem? If
    the latter, then maybe "Pathlib should retain an initial './'" would
    be better raised as a separate bpo item (and PR)?

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Mar 24, 2019

    Ah, I think I follow now. But I'm not sure what you mean by wanting it to "retain an initial '.' component" - how would you expect that to work in practice? p1.parts == ('.', 'c:a')? I suspect that could break existing code.

    I've dealt with that in my PR by checking in the __str__ method if the path could be misinterpreted, and if so prepending a './'.

    > The CreateProcessW case is a generalization of the case that we're used to across various platforms, in which, for the sake of security, the "." entry is excluded from PATH. In this case, the only way to run an executable in the working directory is to reference it explicitly. For example (in Linux):
    [...]
    > This would work if pathlib kept the initial "." component.

    Thanks, this is a really useful example, as it makes it clear that this is a general issue, not a platform-specific quirk.

    In my opinion, this is a different problem, that its solution doesn't necessarily belong in pathlib. Pathlib is responsible to parse, manipulate and display paths correctly (which it fails to do with the case of Path('./a:b') -> Path('a:b')). In contrast, in the case of CreateProcessW, both Path('./exec') and Path('exec') are the *correct* paths to the executable. The ./ is not necessary in that case to display the path correctly, but just to interact correctly with CreateProcessW's interface.

    Having said all of this, I'm not at all sure how much it relates to the original description of this issue, which didn't mention initial './' components at all. Is the originally reported behaviour a *consequence* of not retaining './', or is it a separate problem? If the latter, then maybe "Pathlib should retain an initial './'" would be better raised as a separate bpo item (and PR)?

    I completely agree that any change made to support retaining the initial './' for process invocation should be made in a different bpo item and PR. I do disagree though, that such change should be made, as like I wrote above, this is only needed for CreateProcessW's interface, and isn't related to pathlib's logic itself.
    Maybe, a nice idea would be to open a separate PR that would add a utility for dealing with executing paths. Something like Path.popen(cmd_args, **kwargs), that will call Popen with a path prepended with a './' if needed.

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Apr 17, 2019

    What can we do in order to make progress here?

    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 17, 2019

    In contrast, in the case of CreateProcessW, both Path('./exec') and
    Path('exec') are the *correct* paths to the executable. The ./ is not
    necessary in that case to display the path correctly, but just to
    interact correctly with CreateProcessW's interface.

    It's semantic information, just for different reasons than the "./c:a" case. In this case, it's about what a path means in a search context. I think specifying a filesystem path for a search disposition is as valid a use case as is specifying a path for an open or create disposition.

    In Windows, the qualified path r".\exec" is resolved relative to just the working directory. Classically, for an unqualified name such as "exec", the working directory is checked after the application directory. Starting with Windows Vista, CreateProcessW and the CMD shell won't check the working directory at all for unqualified "exec" if NoDefaultCurrentDirectoryInExePath is set in the environment, not unless a "." entry is explicitly added to PATH.

    The behavior of CreateProcessW also differs for r".\spam\exec" vs r"spam\exec". Without explicitly including the leading ".", the system searches every directory in the executable search path (i.e. the application directory, working directory, system directories, and PATH). This differs from Unix, in which any path with a slash in it is always resolved relative to the working directory. Getting this behavior in Windows requires using the qualified path r".\spam\exec".

    We may want either behavior. I think if a path is specified explicitly with a leading "." component, or joined from one, the "." component should be preserved, just as we already preserve a leading ".." component.

    This is a separate issue, as I suggested in the PR comment. It was just supposed to be a related issue that you might be interested in. I didn't intend to conflate it with this issue.

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented Apr 25, 2019

    Alright. And what is left to do with the current issue?

    @nmat
    Copy link
    Mannequin

    nmat mannequin commented May 27, 2019

    Is this related to the weird paths I am seeing when getting different output in venv compared to without venv:

    from pathlib import Path
    
    filename = Path(__file__)
    filename2 = Path('C:\\path\\to\\file.py')
    
    print(filename)
    print(filename2)

    Where the result is:
    ----------------------------
    Powershell (python.exe run):
    file.py
    C:\path\to\file.py

    Venv run:
    C:\path\to\file.py
    C:\path\to\file.py

    @kmaork
    Copy link
    Mannequin Author

    kmaork mannequin commented May 28, 2019

    Is this related to the weird paths I am seeing when getting different output in venv compared to without venv

    This is probably not related, sounds like something caused by the venv implementation.

    On a different note, how do I get my PR reviewed? (#12361)

    @eryksun eryksun added 3.9 only security fixes 3.10 only security fixes and removed 3.7 (EOL) end of life labels Mar 20, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @barneygale
    Copy link
    Contributor

    This can be solved quite simply once #101667 lands.

    @barneygale
    Copy link
    Contributor

    I've opened #102454, which brings @kmaork's patch up-to-date and solves the issue (I think) with a small change to pathlib.

    @pfmoore @eryksun I appreciate it's been 4 years, but would you be interested in reviewing that PR? Thanking you!

    @barneygale
    Copy link
    Contributor

    It looks like ntpath.realpath() is affected by a similar bug. I've logged a new issue:

    zooba pushed a commit that referenced this issue Mar 10, 2023
    @zooba zooba closed this as completed Mar 10, 2023
    @barneygale
    Copy link
    Contributor

    Resolved in #102454 / 90f1d77, merged into the 3.12 but not backported.

    Thanks again @kmaork for reporting this and working on the initial patch!

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes OS-windows stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants