Title: support "UNC" device paths in ntpath.splitdrive
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.9, Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, nsiregar, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2019-07-17 11:26 by eryksun, last changed 2020-10-27 17:15 by eryksun.

File name Uploaded Description Edit eryksun, 2019-09-13 16:05 eryksun, 2020-10-27 17:15
Pull Requests
URL Status Linked Edit
PR 14841 open nsiregar, 2019-07-18 14:37
Messages (6)
msg348055 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-17 11:26
Windows Python includes UNC shares such as "//server/spam" in its definition of a drive. This is natural because Windows supports setting a UNC path as the working directory and handles the share component as the working drive when resolving rooted paths such as "/eggs". For the sake of generality when working with \\?\ extended paths, Python should expand its definition of a UNC drive to include "UNC" device paths.

A practical example is calling glob.glob with a "//?/UNC" device path.

    >>> import os, sys, glob
    >>> sys.addaudithook(lambda s,a: print('#', a[0]) if s == 'glob.glob' else None)

regular UNC path:

    >>> glob.glob('//localhost/C$/Sys*')
    # //localhost/C$/Sys*
    ['//localhost/C$/System Volume Information']

"UNC" device path:

    >>> glob.glob('//?/UNC/localhost/C$/Sys*')
    # //?/UNC/localhost/C$/Sys*
    # //?/UNC/localhost/C$
    # //?/UNC/localhost
    # //?/UNC/

Since the magic character "?" is in the path (IMO, the drive should be excluded from this check, but that's a separate issue), the internal function glob._iglob calls itself recursively until it reaches the base case of dirname == pathname, where dirname is from os.path.split(pathname). The problem here is that ntpath.split doesn't stop at the proper base case of "//?/UNC/localhost/C$". This is due to ntpath.splitdrive. For example:

    >>> os.path.splitdrive('//?/UNC/localhost/C$/Sys*')
    ('//?/UNC', '/localhost/C$/Sys*')

    >>> os.path.splitdrive('//./UNC/localhost/C$/Sys*')
    ('//./UNC', '/localhost/C$/Sys*')

The results should be "//?/UNC/localhost/C$" and "//./UNC/localhost/C$". 

In other cases, returning a device as the drive is fine, if not exactly meaningful (e.g. "//./NUL"). I don't think this needs to change.
msg348099 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-18 06:26
Do you want to create a PR Eryk?
msg348206 - (view) Author: Ngalim Siregar (nsiregar) * Date: 2019-07-20 01:23
I was unsure about implementation in the patch, do you have UNC format specification?
msg352110 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-12 11:12
For clarity, given Eryk's examples above, both "\\?\UNC\" and "//?/UNC/" are okay (as are any combination of forward and backslashes in the prefix, as normalization will be applied for any except the "\\?\" version). "UNC" is also case-insensitive.
msg352355 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-09-13 16:05
Please consult the attached file "". I redesigned splitdrive() to support "UNC" and "GLOBAL" junctions in device paths. I relaxed the design to allow repeated separators everywhere except for the UNC root. IIRC, Windows has supported this since XP. For example:

    >>> print(nt._getfullpathname('//server///share'))
    >>> print(nt._getfullpathname(r'\\server\\\share'))

There are also a couple of minor behavior changes in the new implementation.

The old implementation would split "//server/" as ('//server/', ''). Since there's no share, this should not count as a drive. The new implementation splits it as ('', '//server/'). Similarly it splits '//?/UNC/server/' as ('', '//?/UNC/server/'). 

The old implementation also allowed any character as a drive 'letter'. For example, it would split '/:/spam' as ('/:', '/spam'). The new implementation ensures that the drive letter in a DOS drive is alphabetic.

I also extended test_splitdrive to use a list of test cases in order to avoid having to define each case twice. It calls tester() a second time for each case, with slash and backslash swapped.
msg379780 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-10-27 17:15
I'm attaching a rewrite of splitdrive() from msg352355. This version uses an internal _next() function to get the indices of the next path component, ignoring repeated separators. It also flattens the nested structure of the previous implementation by adding multiple return statements.
Date User Action Args
2020-10-27 17:15:45eryksunsetfiles: +

messages: + msg379780
2020-10-27 14:18:18eryksunlinkissue42170 superseder
2019-09-13 16:05:15eryksunsetfiles: +

messages: + msg352355
2019-09-12 11:12:24steve.dowersetmessages: + msg352110
2019-07-20 01:23:47nsiregarsetnosy: + nsiregar
messages: + msg348206
2019-07-18 14:37:15nsiregarsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request14632
2019-07-18 06:26:06serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg348099
2019-07-17 11:26:34eryksuncreate