classification
Title: support "UNC" device paths in ntpath.splitdrive
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, nsiregar, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2019-07-17 11:26 by eryksun, last changed 2021-04-09 17:58 by steve.dower.

Files
File name Uploaded Description
splitdrive.py eryksun, 2020-10-27 17:15
Pull Requests
URL Status Linked
PR 14841 open nsiregar, 2019-07-18 14:37
PR 25261 closed steve.dower, 2021-04-07 19:18
Messages (8)
msg348055 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-17 11:26
Windows Python includes UNC shares such as "//server/spam" in its definition of a drive. This is natural because Windows supports setting a UNC path as the working directory and handles the share component as the working drive when resolving rooted paths such as "/eggs". For the sake of generality when working with \\?\ extended paths, Python should expand its definition of a UNC drive to include "UNC" device paths.

A practical example is calling glob.glob with a "//?/UNC" device path.

    >>> import os, sys, glob
    >>> sys.addaudithook(lambda s,a: print('#', a[0]) if s == 'glob.glob' else None)

regular UNC path:

    >>> glob.glob('//localhost/C$/Sys*')
    # //localhost/C$/Sys*
    ['//localhost/C$/System Volume Information']

"UNC" device path:

    >>> glob.glob('//?/UNC/localhost/C$/Sys*')
    # //?/UNC/localhost/C$/Sys*
    # //?/UNC/localhost/C$
    # //?/UNC/localhost
    # //?/UNC/
    []

Since the magic character "?" is in the path (IMO, the drive should be excluded from this check, but that's a separate issue), the internal function glob._iglob calls itself recursively until it reaches the base case of dirname == pathname, where dirname is from os.path.split(pathname). The problem here is that ntpath.split doesn't stop at the proper base case of "//?/UNC/localhost/C$". This is due to ntpath.splitdrive. For example:

    >>> os.path.splitdrive('//?/UNC/localhost/C$/Sys*')
    ('//?/UNC', '/localhost/C$/Sys*')

    >>> os.path.splitdrive('//./UNC/localhost/C$/Sys*')
    ('//./UNC', '/localhost/C$/Sys*')

The results should be "//?/UNC/localhost/C$" and "//./UNC/localhost/C$". 

In other cases, returning a device as the drive is fine, if not exactly meaningful (e.g. "//./NUL"). I don't think this needs to change.
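The recursion described above can be illustrated with a minimal sketch. This is not the actual glob._iglob code, and the helper name dirname_chain is hypothetical. It simply applies ntpath.split repeatedly until the head stops changing, which is the base case the glob recursion relies on. With a splitdrive() that treats "//?/UNC/localhost/C$" as the drive, the walk would be expected to stop at that path rather than continue down to "//?/UNC/". The result for device paths depends on the ntpath implementation of the Python version in use, so no output is shown for them.

```python
import ntpath


def dirname_chain(path, split=ntpath.split):
    """Return the successive heads produced by split() until a fixed point.

    Hypothetical helper for illustration only; it mirrors the
    'dirname == pathname' base case described above.
    """
    chain = [path]
    while True:
        head, _tail = split(path)
        if head == path:        # fixed point: split() can go no further
            return chain
        chain.append(head)
        path = head


# A plain DOS drive path bottoms out at the drive root.
print(dirname_chain('C:/a/b'))

# For a "UNC" device path, the last element shows where split() stops.
print(dirname_chain('//?/UNC/localhost/C$/Sys*'))
```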
msg348099 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-18 06:26
Do you want to create a PR, Eryk?
msg348206 - Author: Ngalim Siregar (nsiregar) * Date: 2019-07-20 01:23
I was unsure about the implementation in the patch. Do you have a specification for the UNC format?
msg352110 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-12 11:12
For clarity, given Eryk's examples above, both "\\?\UNC\" and "//?/UNC/" are okay (as are any combination of forward and backslashes in the prefix, as normalization will be applied for any except the "\\?\" version). "UNC" is also case-insensitive.
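Taking the examples from the first message together with the notes on separators and case, the expected split for a "UNC" device path can be sketched as below. This is only an illustration under stated assumptions, not the attached splitdrive.py or the code from either PR. The helper name split_unc_device_drive is hypothetical. It treats "/" and "\" as interchangeable everywhere (a simplification), does not skip repeated separators, and returns an empty drive for anything that is not a complete "UNC" device path.

```python
def split_unc_device_drive(path):
    """Hypothetical helper: split '//?/UNC/server/share/rest' style paths.

    Returns (drive, rest). The drive is '' unless the prefix, a server
    name and a share name are all present.
    """
    norm = path.replace('/', '\\')
    # The prefix is 8 characters long: \\?\UNC\ or \\.\UNC\ in any case.
    if norm[:8].upper() not in ('\\\\?\\UNC\\', '\\\\.\\UNC\\'):
        return '', path
    server_end = norm.find('\\', 8)
    if server_end in (-1, 8):           # no share part, or empty server name
        return '', path
    share_end = norm.find('\\', server_end + 1)
    if share_end == -1:
        share_end = len(norm)
    if share_end == server_end + 1:     # empty share name
        return '', path
    return path[:share_end], path[share_end:]


print(split_unc_device_drive('//?/UNC/localhost/C$/Sys*'))
print(split_unc_device_drive(r'\\?\unc\localhost\C$\Sys*'))
```

Slicing the original string rather than the normalized copy keeps the caller's separators intact in both parts of the result.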
msg352355 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-09-13 16:05
Please consult the attached file "splitdrive.py". I redesigned splitdrive() to support "UNC" and "GLOBAL" junctions in device paths. I relaxed the design to allow repeated separators everywhere except for the UNC root. IIRC, Windows has supported this since XP. For example:

    >>> print(nt._getfullpathname('//server///share'))
    \\server\share
    >>> print(nt._getfullpathname(r'\\server\\\share'))
    \\server\share

There are also a couple of minor behavior changes in the new implementation.

The old implementation would split "//server/" as ('//server/', ''). Since there's no share, this should not count as a drive. The new implementation splits it as ('', '//server/'). Similarly it splits '//?/UNC/server/' as ('', '//?/UNC/server/'). 

The old implementation also allowed any character as a drive 'letter'. For example, it would split '/:/spam' as ('/:', '/spam'). The new implementation ensures that the drive letter in a DOS drive is alphabetic.

I also extended test_splitdrive to use a list of test cases in order to avoid having to define each case twice. It calls tester() a second time for each case, with slash and backslash swapped.
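The table-driven approach described above, where each case is checked a second time with slash and backslash swapped, can be sketched as follows. This is not the actual test_splitdrive code; the names swap_slashes, run_cases and CASES are hypothetical, and the two sample cases are ordinary DOS drive and UNC share paths.

```python
import ntpath

# Translation table that exchanges '/' and '\'.
_SWAP = str.maketrans('/\\', '\\/')


def swap_slashes(text):
    """Return text with every slash and backslash exchanged."""
    return text.translate(_SWAP)


def run_cases(splitdrive, cases):
    """Check each (path, expected) pair as written and with separators swapped.

    Returns a list of (path, expected, actual) tuples for the failures.
    """
    failures = []
    for path, expected in cases:
        swapped_expected = tuple(swap_slashes(part) for part in expected)
        for candidate, want in ((path, expected),
                                (swap_slashes(path), swapped_expected)):
            got = splitdrive(candidate)
            if got != want:
                failures.append((candidate, want, got))
    return failures


CASES = [
    ('c:/foo/bar', ('c:', '/foo/bar')),
    ('//conky/mountpoint/foo/bar', ('//conky/mountpoint', '/foo/bar')),
]

print(run_cases(ntpath.splitdrive, CASES))
```

Collecting failures instead of asserting inside the loop means one run reports every mismatching case.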
msg379780 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-10-27 17:15
I'm attaching a rewrite of splitdrive() from msg352355. This version uses an internal _next() function to get the indices of the next path component, ignoring repeated separators. It also flattens the nested structure of the previous implementation by adding multiple return statements.
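For readers without access to the attachment, the idea of a helper that returns the indices of the next path component while ignoring repeated separators might look roughly like this. It is a guess at the general shape only, not the code from splitdrive.py; the name next_component and its signature are assumptions.

```python
SEPS = '\\/'


def next_component(path, start=0):
    """Return (begin, end) indices of the next component at or after start.

    Leading separators, including repeated ones, are skipped. If no
    component remains, begin == end == len(path).
    """
    n = len(path)
    begin = start
    while begin < n and path[begin] in SEPS:
        begin += 1
    end = begin
    while end < n and path[end] not in SEPS:
        end += 1
    return begin, end


path = '//server///share/dir'
b1, e1 = next_component(path, 2)     # server name, after the UNC root
b2, e2 = next_component(path, e1)    # share name, skipping '///'
print(path[b1:e1], path[b2:e2])
```

Returning indices rather than substrings lets the caller slice the drive straight out of the original string.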
msg390375 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-04-06 20:58
Once issue43105 is merged, I've got a fairly simple implementation for this using the (new) nt._path_splitroot native method, as well as improved tests that cover both the native and emulated calculations.
msg390391 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-04-07 00:26
> I've got a fairly simple implementation for this using the (new) 
> nt._path_splitroot native method

It's for the best to let the system path API handle this, especially if doing so gets this issue resolved, as well as others like it and those that depend on it. I'm a bit disappointed, however, that PathCchSkipRoot() doesn't handle some of the cases that I handled in splitdrive.py. 

PathCchSkipRoot() doesn't support the "Global" link in device paths. Fortunately this case is uncommon. In practice "Global" is only needed when a DOS device name has to be created globally, e.g. "\\.\Global\SomeGlobalDevice".

PathCchSkipRoot() splits "\\.\UNC\server\share" as "\\.\UNC\" and "server\share", but that's okay since no one uses "\\.\UNC". 

PathCchSkipRoot() restricts the "\\?\" prefix to drive names, volume GUID names, and the "UNC" device -- such as "\\?\X:", "\\?\Volume{12345678-1234-1234-1234-123456789ABC}", and "\\?\UNC\server\share". Other device names such as the "PIPE" device have to use the "\\.\" prefix. Rarely, a device path may need "\\?\" if it's a long path or needs to bypass normalization. I wanted splitdrive() to remain neutral for such cases, but that will have to be sacrificed.

PathCchSkipRoot() doesn't ignore repeated slashes in the drive part of a UNC "\\server\share" or "\\?\UNC\server\share" path, even though GetFullPathNameW() collapses all but the initial two slashes. (More than two initial slashes is invalid.) For example, the system normalizes "//localhost///C$" as "\\localhost\C$":

    >>> os.chdir('//localhost///C$/Temp')
    >>> print(os.getcwd())
    \\localhost\C$\Temp

PathCchSkipRoot() also allows just a UNC server to count as a root, e.g. "\\server" or "\\?\UNC\server", though a valid UNC path requires a share name. Without a share, FindFirstFileW(L"//server/*", &find_data) will fail to parse the directory name correctly and try to open "//server/*", which is an invalid name. If splitdrive() requires a valid drive name, it should not return "//server" or "//server/" as a drive name.
History
Date User Action Args
2021-04-09 17:58:00 steve.dower set assignee: steve.dower ->
2021-04-07 19:18:46 steve.dower set pull_requests: + pull_request23997
2021-04-07 00:26:39 eryksun set messages: + msg390391
2021-04-06 20:58:54 steve.dower set assignee: steve.dower
messages: + msg390375
versions: + Python 3.10, - Python 3.8, Python 3.9
2021-03-28 02:09:36 eryksun link issue38948 dependencies
2021-02-25 15:56:06 eryksun set files: - splitdrive.py
2020-10-27 17:15:45 eryksun set files: + splitdrive.py
messages: + msg379780
2020-10-27 14:18:18 eryksun link issue42170 superseder
2019-09-13 16:05:15 eryksun set files: + splitdrive.py
messages: + msg352355
2019-09-12 11:12:24 steve.dower set messages: + msg352110
2019-07-20 01:23:47 nsiregar set nosy: + nsiregar
messages: + msg348206
2019-07-18 14:37:15 nsiregar set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request14632
2019-07-18 06:26:06 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg348099
2019-07-17 11:26:34 eryksun create