classification
Title: support "UNC" device paths in ntpath.splitdrive
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, nsiregar, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2019-07-17 11:26 by eryksun, last changed 2019-09-13 16:05 by eryksun.

Files
File name Uploaded Description
splitdrive.py eryksun, 2019-09-13 16:05
Pull Requests
URL Status Linked
PR 14841 open nsiregar, 2019-07-18 14:37
Messages (5)
msg348055 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-17 11:26
Windows Python includes UNC shares such as "//server/spam" in its definition of a drive. This is natural because Windows supports setting a UNC path as the working directory and handles the share component as the working drive when resolving rooted paths such as "/eggs". For the sake of generality when working with \\?\ extended paths, Python should expand its definition of a UNC drive to include "UNC" device paths.

A practical example is calling glob.glob with a "//?/UNC" device path.

    >>> import os, sys, glob
    >>> sys.addaudithook(lambda s,a: print('#', a[0]) if s == 'glob.glob' else None)

regular UNC path:

    >>> glob.glob('//localhost/C$/Sys*')
    # //localhost/C$/Sys*
    ['//localhost/C$/System Volume Information']

"UNC" device path:

    >>> glob.glob('//?/UNC/localhost/C$/Sys*')
    # //?/UNC/localhost/C$/Sys*
    # //?/UNC/localhost/C$
    # //?/UNC/localhost
    # //?/UNC/
    []
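
The audit lines show glob repeatedly splitting off the last path component until the dirname stops changing. A simplified model of that loop, for illustration only: dirname_chain() is a hypothetical helper, not the code in glob._iglob, and it uses ntpath explicitly so it behaves the same on any platform.

```python
import ntpath


def dirname_chain(path, split=ntpath.split):
    """Return path followed by each successive dirname from split().

    Stops at the fixed point where split() returns a dirname equal to
    the path it was given, which is the base case glob relies on.
    """
    chain = [path]
    while True:
        dirname, _ = split(path)
        if dirname == path:
            return chain
        chain.append(dirname)
        path = dirname
```

With the ntpath.splitdrive from 3.8 and 3.9, dirname_chain('//?/UNC/localhost/C$/Sys*') should walk the same four entries the audit hook printed, ending at '//?/UNC/' rather than stopping at the share.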

Since the magic character "?" is in the path (IMO, the drive should be excluded from this check, but that's a separate issue), the internal function glob._iglob calls itself recursively until it reaches the base case of dirname == pathname, where dirname is from os.path.split(pathname). The problem here is that ntpath.split doesn't stop at the proper base case of "//?/UNC/localhost/C$". This is due to ntpath.splitdrive. For example:

    >>> os.path.splitdrive('//?/UNC/localhost/C$/Sys*')
    ('//?/UNC', '/localhost/C$/Sys*')

    >>> os.path.splitdrive('//./UNC/localhost/C$/Sys*')
    ('//./UNC', '/localhost/C$/Sys*')

The drive components should be "//?/UNC/localhost/C$" and "//./UNC/localhost/C$".

In other cases, returning a device as the drive is fine, if not exactly meaningful (e.g. "//./NUL"). I don't think this needs to change.
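
A minimal sketch of the proposed rule, for illustration only: splitdrive_sketch() is a hypothetical function, not ntpath.splitdrive from any release and not the code in the PR. It treats "/" and "\" as equivalent throughout, accepts "UNC" case-insensitively after a "\\?\" or "\\.\" prefix, requires a non-empty server and share, and otherwise keeps returning a device path's first component as the drive.

```python
import string


def splitdrive_sketch(p):
    """Split p into (drive, rest), treating "UNC" device paths like UNC shares.

    Hypothetical illustration only; not ntpath.splitdrive.
    """
    norm = p.replace('/', '\\')
    # DOS drive: one ASCII letter followed by a colon, e.g. "C:".
    if len(norm) >= 2 and norm[1] == ':' and norm[0] in string.ascii_letters:
        return p[:2], p[2:]
    # Every other kind of drive starts with two separators.
    if not norm.startswith('\\\\'):
        return '', p
    if norm[2:4] in ('?\\', '.\\'):
        # Device path "\\?\NAME" or "\\.\NAME": NAME ends at the next separator.
        end = norm.find('\\', 4)
        if end == -1:
            end = len(norm)
        if norm[4:end].upper() != 'UNC':
            return p[:end], p[end:]  # e.g. "\\.\NUL" or "\\?\C:"
        start = end + 1  # "UNC" junction: server\share follows
    else:
        start = 2  # plain UNC path: server\share follows
    # Both the server and the share must be non-empty.
    server_end = norm.find('\\', start)
    if server_end <= start:
        return '', p
    share_end = norm.find('\\', server_end + 1)
    if share_end == -1:
        share_end = len(norm)
    if share_end == server_end + 1:
        return '', p
    return p[:share_end], p[share_end:]
```

For example, splitdrive_sketch('//?/UNC/localhost/C$/Sys*') returns ('//?/UNC/localhost/C$', '/Sys*'), while splitdrive_sketch('//./NUL') still returns ('//./NUL', '').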
msg348099 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-18 06:26
Do you want to create a PR Eryk?
msg348206 - Author: Ngalim Siregar (nsiregar) * Date: 2019-07-20 01:23
I'm unsure about the implementation in the patch. Is there a specification of the UNC path format?
msg352110 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-12 11:12
For clarity, given Eryk's examples above, both "\\?\UNC\" and "//?/UNC/" are okay (as are any combination of forward and backslashes in the prefix, as normalization will be applied for any except the "\\?\" version). "UNC" is also case-insensitive.
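
A small sketch of that rule as a purely lexical check: has_unc_device_prefix() is a hypothetical helper that treats "/" and "\" alike anywhere in the prefix and compares "UNC" case-insensitively.

```python
def has_unc_device_prefix(path):
    """Return True if path starts with a "UNC" device prefix such as //?/UNC/.

    Hypothetical helper: separators may be '/' or '\\' in any mix, the
    prefix may use '?' or '.', and "UNC" is matched case-insensitively.
    """
    norm = path.replace('/', '\\')
    return (norm.startswith('\\\\')
            and norm[2:4] in ('?\\', '.\\')
            and norm[4:8].upper() == 'UNC\\')
```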
msg352355 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-09-13 16:05
Please consult the attached file "splitdrive.py". I redesigned splitdrive() to support "UNC" and "GLOBAL" junctions in device paths. I relaxed the design to allow repeated separators everywhere except for the UNC root. IIRC, Windows has supported this since XP. For example:

    >>> print(nt._getfullpathname('//server///share'))
    \\server\share
    >>> print(nt._getfullpathname(r'\\server\\\share'))
    \\server\share
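
One way to express the relaxed rule for a plain UNC drive, sketched with a regular expression: exactly two leading separators (the UNC root), a server name, one or more separators, then a share name. _UNC_RELAXED and split_unc_relaxed() are hypothetical names for illustration, not taken from the attached splitdrive.py.

```python
import re

# Exactly two leading separators (the UNC root), a server name, one or
# more separators, then a share name. Hypothetical pattern for illustration.
_UNC_RELAXED = re.compile(r'[\\/]{2}([^\\/]+)[\\/]+([^\\/]+)')


def split_unc_relaxed(p):
    """Split a plain UNC path into (drive, rest).

    Repeated separators are allowed between server and share, but not in
    the root, so "///server/share" has no drive.
    """
    m = _UNC_RELAXED.match(p)
    if m is None:
        return '', p
    return p[:m.end()], p[m.end():]
```

The pattern also rejects "//server/" because the share name is required.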

There are also a couple of minor behavior changes in the new implementation.

The old implementation would split "//server/" as ('//server/', ''). Since there's no share, this should not count as a drive. The new implementation splits it as ('', '//server/'). Similarly, it splits '//?/UNC/server/' as ('', '//?/UNC/server/').

The old implementation also allowed any character as a drive 'letter'. For example, it would split '/:/spam' as ('/:', '/spam'). The new implementation ensures that the drive letter in a DOS drive is alphabetic.
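
A minimal sketch of the stricter DOS-drive check, assuming "alphabetic" means an ASCII letter A-Z (str.isalpha() would be a looser alternative that also accepts non-ASCII letters). split_dos_drive() is a hypothetical helper.

```python
import string


def split_dos_drive(p):
    """Return (drive, rest) for a DOS drive such as "C:", else ('', p).

    Hypothetical helper: the drive letter must be an ASCII letter, so
    "/:/spam" no longer yields the drive "/:".
    """
    if len(p) >= 2 and p[1] == ':' and p[0] in string.ascii_letters:
        return p[:2], p[2:]
    return '', p
```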

I also extended test_splitdrive to use a list of test cases in order to avoid having to define each case twice. It calls tester() a second time for each case, with slash and backslash swapped.
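
A sketch of that table-driven pattern, assuming each case is a (path, expected_result) pair: check_cases() and swap_seps() are hypothetical helpers, not the tester() function from CPython's test suite. Each case is checked as written and again with "/" and "\" swapped in both the input and the expected result.

```python
def swap_seps(s):
    """Swap '/' and '\\' in s."""
    return s.translate(str.maketrans('/\\', '\\/'))


def check_cases(func, cases):
    """Assert func(path) == expected for each case, in both separator styles.

    Returns the number of checks performed.
    """
    checks = 0
    for path, expected in cases:
        swapped = (swap_seps(path), tuple(swap_seps(part) for part in expected))
        for arg, want in ((path, expected), swapped):
            got = func(arg)
            assert got == want, f'{func.__name__}({arg!r}) returned {got!r}, expected {want!r}'
            checks += 1
    return checks
```

For example, check_cases(ntpath.splitdrive, [('c:/spam', ('c:', '/spam'))]) checks both 'c:/spam' and 'c:\spam'.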
History
Date User Action Args
2019-09-13 16:05:15 eryksun set files: + splitdrive.py; messages: + msg352355
2019-09-12 11:12:24 steve.dower set messages: + msg352110
2019-07-20 01:23:47 nsiregar set nosy: + nsiregar; messages: + msg348206
2019-07-18 14:37:15 nsiregar set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request14632
2019-07-18 06:26:06 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg348099
2019-07-17 11:26:34 eryksun create