This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: support "UNC" device paths in ntpath.splitdrive
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barneygale, eryksun, nsiregar, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2019-07-17 11:26 by eryksun, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
splitdrive.py eryksun, 2020-10-27 17:15
Pull Requests
URL Status Linked
PR 14841 open nsiregar, 2019-07-18 14:37
PR 25261 closed steve.dower, 2021-04-07 19:18
PR 31702 open barneygale, 2022-03-06 07:09
Messages (9)
msg348055 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-17 11:26
Windows Python includes UNC shares such as "//server/spam" in its definition of a drive. This is natural because Windows supports setting a UNC path as the working directory and handles the share component as the working drive when resolving rooted paths such as "/eggs". For the sake of generality when working with \\?\ extended paths, Python should expand its definition of a UNC drive to include "UNC" device paths.
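A quick way to see the drive semantics described here, without a real network share, is `ntpath.join`: joining a rooted path such as "/eggs" keeps the UNC share as the drive, just as Windows resolves a rooted path against the working drive. This sketch only uses the documented `ntpath.splitdrive` and `ntpath.join`; the `ntpath` module can be imported on any platform.

```python
import ntpath

# A UNC share counts as the drive, so a rooted path is resolved against it.
print(ntpath.splitdrive("//server/spam/eggs"))   # ('//server/spam', '/eggs')
print(ntpath.join("//server/spam", "/eggs"))     # //server/spam/eggs

# A DOS drive behaves the same way.
print(ntpath.join("C:/spam", "/eggs"))           # C:/eggs
```

With a "UNC" device path such as "//?/UNC/server/spam", the result of the same join depends on whether `splitdrive` treats the share as part of the drive, which is what this issue proposes.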

A practical example is calling glob.glob with a "//?/UNC" device path.

    >>> import os, sys, glob
    >>> sys.addaudithook(lambda s,a: print('#', a[0]) if s == 'glob.glob' else None)

Regular UNC path:

    >>> glob.glob('//localhost/C$/Sys*')
    # //localhost/C$/Sys*
    ['//localhost/C$/System Volume Information']

"UNC" device path:

    >>> glob.glob('//?/UNC/localhost/C$/Sys*')
    # //?/UNC/localhost/C$/Sys*
    # //?/UNC/localhost/C$
    # //?/UNC/localhost
    # //?/UNC/
    []

Since the magic character "?" is in the path (IMO, the drive should be excluded from this check, but that's a separate issue), the internal function glob._iglob calls itself recursively until it reaches the base case of dirname == pathname, where dirname is from os.path.split(pathname). The problem here is that ntpath.split doesn't stop at the proper base case of "//?/UNC/localhost/C$". This is due to ntpath.splitdrive. For example:

    >>> os.path.splitdrive('//?/UNC/localhost/C$/Sys*')
    ('//?/UNC', '/localhost/C$/Sys*')

    >>> os.path.splitdrive('//./UNC/localhost/C$/Sys*')
    ('//./UNC', '/localhost/C$/Sys*')

The drives should be "//?/UNC/localhost/C$" and "//./UNC/localhost/C$", respectively.

In other cases, returning a device as the drive is fine, if not exactly meaningful (e.g. "//./NUL"). I don't think this needs to change.
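The recursion in glob can be imitated without touching the file system: call `ntpath.split` repeatedly and stop when the dirname equals its input, which is the base case `glob._iglob` relies on. `dirname_chain` is a hypothetical helper and a simplification of `_iglob` (it skips the magic-character checks); the chain it returns for a "UNC" device path depends on how the running interpreter's `ntpath.splitdrive` handles that path.

```python
import ntpath


def dirname_chain(pathname):
    """Return pathname followed by each dirname produced by repeated
    ntpath.split() calls, stopping when the dirname stops changing."""
    chain = [pathname]
    while True:
        dirname, _ = ntpath.split(pathname)
        if dirname == pathname:  # base case: split() made no progress
            return chain
        chain.append(dirname)
        pathname = dirname


print(dirname_chain("//localhost/C$/Sys*"))  # ['//localhost/C$/Sys*', '//localhost/C$/']
print(dirname_chain("//?/UNC/localhost/C$/Sys*"))
```

With the `splitdrive` behavior reported above, the second call walks all the way down to '//?/UNC/', mirroring the audit-hook trace; with a UNC-aware `splitdrive` it should stop at the share.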
msg348099 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-18 06:26
Do you want to create a PR Eryk?
msg348206 - Author: Ngalim Siregar (nsiregar) * Date: 2019-07-20 01:23
I was unsure about the implementation in the patch. Do you have a specification for the UNC format?
msg352110 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-12 11:12
For clarity, given Eryk's examples above, both "\\?\UNC\" and "//?/UNC/" are okay (as is any combination of forward slashes and backslashes in the prefix, since normalization is applied to every form except "\\?\"). "UNC" is also case-insensitive.
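A minimal sketch of the prefix check this implies, assuming both separators are treated alike and "UNC" is compared case-insensitively. `has_unc_device_prefix` is a hypothetical helper, not part of `ntpath`; it only recognizes the prefix, and splitting off the server and share is a separate step.

```python
def has_unc_device_prefix(path: str) -> bool:
    """Return True for paths such as //?/UNC/... or //./unc/... written
    with any mix of forward slashes and backslashes."""
    # Treat "/" and "\" alike by normalizing to backslashes.
    norm = path.replace("/", "\\")
    # Device paths start with \\?\ or \\.\ after normalization.
    if norm[:4] not in ("\\\\?\\", "\\\\.\\"):
        return False
    # The device component must be "UNC" (any case) followed by a separator.
    return norm[4:8].upper() == "UNC\\"
```

For example, `has_unc_device_prefix(r"\\?\unc\server")` is true, while "//server/share" and "//?/C:/spam" are rejected.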
msg352355 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-09-13 16:05
Please consult the attached file "splitdrive.py". I redesigned splitdrive() to support "UNC" and "GLOBAL" junctions in device paths. I relaxed the design to allow repeated separators everywhere except for the UNC root. IIRC, Windows has supported this since XP. For example:

    >>> print(nt._getfullpathname('//server///share'))
    \\server\share
    >>> print(nt._getfullpathname(r'\\server\\\share'))
    \\server\share

There are also a couple of minor behavior changes in the new implementation.

The old implementation would split "//server/" as ('//server/', ''). Since there's no share, this should not count as a drive. The new implementation splits it as ('', '//server/'). Similarly it splits '//?/UNC/server/' as ('', '//?/UNC/server/'). 

The old implementation also allowed any character as a drive 'letter'. For example, it would split '/:/spam' as ('/:', '/spam'). The new implementation ensures that the drive letter in a DOS drive is alphabetic.

I also extended test_splitdrive to use a list of test cases in order to avoid having to define each case twice. It calls tester() a second time for each case, with slash and backslash swapped.
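The table-driven style described above can be sketched as a small helper that runs each (path, expected) case twice: once as written, and once with "/" and "\" swapped in both the input and the expected result. `check_cases` and `CASES` are hypothetical names (the real test calls the test suite's own `tester()`), and the cases below are ones whose results do not depend on this issue's change.

```python
import ntpath

# Swap forward slashes and backslashes in a string.
SWAP = str.maketrans("/\\", "\\/")

CASES = [
    ("c:/spam/eggs", ("c:", "/spam/eggs")),
    ("//server/share/spam", ("//server/share", "/spam")),
    ("/spam", ("", "/spam")),
    ("spam/eggs", ("", "spam/eggs")),
]


def check_cases(func, cases):
    """Run each case as written and with slashes swapped; return failures."""
    failures = []
    for path, expected in cases:
        swapped = (path.translate(SWAP),
                   tuple(s.translate(SWAP) for s in expected))
        for p, exp in ((path, expected), swapped):
            got = func(p)
            if got != exp:
                failures.append(f"{p!r}: got {got!r}, expected {exp!r}")
    return failures


print(check_cases(ntpath.splitdrive, CASES))  # []
```

Writing each case once and deriving the swapped variant keeps the two spellings from drifting apart.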
msg379780 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-10-27 17:15
I'm attaching a rewrite of splitdrive() from msg352355. This version uses an internal _next() function to get the indices of the next path component, ignoring repeated separators. It also flattens the nested structure of the previous implementation by adding multiple return statements.
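A minimal pure-Python sketch of the approach described in msg352355 and here, written from those descriptions rather than from the attached splitdrive.py. A helper (`next_component`, a hypothetical stand-in for `_next`) returns the indices of the next path component while skipping repeated separators, and the main function (`splitdrive_sketch`, also hypothetical) uses early returns instead of nested branches. It handles DOS drives with an alphabetic letter, UNC shares, and "?"/"." device paths including the "UNC" junction; the "GLOBAL" junction is left out. As an assumption, a third leading separator disqualifies a path from being UNC.

```python
SEPS = "\\/"


def next_component(p, i):
    """Skip any run of separators at index i, then return (start, end)
    of the following component. start == end means there is none."""
    while i < len(p) and p[i] in SEPS:
        i += 1
    j = i
    while j < len(p) and p[j] not in SEPS:
        j += 1
    return i, j


def splitdrive_sketch(p):
    """Split p into (drive, rest), treating "UNC" device paths such as
    //?/UNC/server/share as a drive, like //server/share."""
    # DOS drive such as "C:"; the drive letter must be an ASCII letter.
    if len(p) >= 2 and p[1] == ":" and p[0].isascii() and p[0].isalpha():
        return p[:2], p[2:]
    # UNC and device paths start with exactly two separators
    # (assumption: a third leading separator disqualifies the path).
    if len(p) < 3 or p[0] not in SEPS or p[1] not in SEPS or p[2] in SEPS:
        return "", p
    _, end = next_component(p, 2)
    if p[2:end] in ("?", "."):
        # Device path: the next component is the device name, e.g. "C:",
        # "NUL", or the "UNC" junction (compared case-insensitively).
        start, end = next_component(p, end)
        if start == end:
            return "", p
        if p[start:end].upper() != "UNC":
            return p[:end], p[end:]
        start, end = next_component(p, end)  # server after "UNC"
        if start == end:
            return "", p
    # end now marks the end of the server; a share must follow it.
    start, end = next_component(p, end)
    if start == end:
        return "", p  # e.g. "//server/" has no share, so no drive
    return p[:end], p[end:]


print(splitdrive_sketch("//?/UNC/localhost/C$/Sys*"))  # ('//?/UNC/localhost/C$', '/Sys*')
print(splitdrive_sketch("//server/"))                  # ('', '//server/')
print(splitdrive_sketch("/:/spam"))                    # ('', '/:/spam')
```

Repeated separators inside the drive are kept as written, so "//server///share" splits as ('//server///share', ''), and the drive keeps the caller's original slashes.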
msg390375 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-04-06 20:58
Once issue43105 is merged, I've got a fairly simple implementation for this using the (new) nt._path_splitroot native method, as well as improved tests that cover both the native and emulated calculations.
msg414604 - Author: Barney Gale (barneygale) * Date: 2022-03-06 02:36
I'd like to pick this up, as it would allow us to remove a duplicate implementation in pathlib with its own shortcomings.

If using the native functionality is difficult to get right, could I put @eryksun's splitdrive.py implementation up for review?
msg414621 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2022-03-06 18:47
If you can build this on top of nt._path_splitroot then it could save a decent amount of work, though at the same time I think it's worthwhile having a pure Python implementation which is cross-platform.

Haven't looked at the PR yet (or Eryk's implementation recently), but at the very least it would be nice to have tests that verify consistency with nt._path_splitroot. That way at least we'll discover if the native version changes.
History
Date                 User              Action  Args
2022-04-11 14:59:18  admin             set     github: 81790
2022-03-06 18:47:25  steve.dower       set     messages: + msg414621
2022-03-06 11:02:32  eryksun           set     messages: - msg390391
2022-03-06 07:09:23  barneygale        set     pull_requests: + pull_request29822
2022-03-06 02:36:38  barneygale        set     nosy: + barneygale
                                               messages: + msg414604
2021-04-09 17:58:00  steve.dower       set     assignee: steve.dower ->
2021-04-07 19:18:46  steve.dower       set     pull_requests: + pull_request23997
2021-04-07 00:26:39  eryksun           set     messages: + msg390391
2021-04-06 20:58:54  steve.dower       set     assignee: steve.dower
                                               messages: + msg390375
                                               versions: + Python 3.10, - Python 3.8, Python 3.9
2021-03-28 02:09:36  eryksun           link    issue38948 dependencies
2021-02-25 15:56:06  eryksun           set     files: - splitdrive.py
2020-10-27 17:15:45  eryksun           set     files: + splitdrive.py
                                               messages: + msg379780
2020-10-27 14:18:18  eryksun           link    issue42170 superseder
2019-09-13 16:05:15  eryksun           set     files: + splitdrive.py
                                               messages: + msg352355
2019-09-12 11:12:24  steve.dower       set     messages: + msg352110
2019-07-20 01:23:47  nsiregar          set     nosy: + nsiregar
                                               messages: + msg348206
2019-07-18 14:37:15  nsiregar          set     keywords: + patch
                                               stage: needs patch -> patch review
                                               pull_requests: + pull_request14632
2019-07-18 06:26:06  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg348099
2019-07-17 11:26:34  eryksun           create