This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: os.path.dirname doesn't handle Windows' URNs correctly
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.5, Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Dustin.Oprea, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2016-06-27 21:03 by Dustin.Oprea, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg269404 - Author: Dustin Oprea (Dustin.Oprea) Date: 2016-06-27 21:03
Notice that os.path.dirname() returns whatever it is given if it is given a URN, regardless of slash-type. Oddly, you have to double-up the forward-slashes (like you're escaping them) in order to get the correct result (if you're using forward-slashes). Back-slashes appear to be broken no matter what.

C:\Python35-32>python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.dirname("\\\\a\\b")
'\\\\a\\b'
>>> os.path.dirname("//a/b")
'//a/b'
>>> os.path.dirname("////a//b")
'////a'

Any ideas?
msg269408 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-06-27 22:15
dirname() is implemented via split(), which begins by calling splitdrive(). The 'drive' for a UNC path is the r"\\server\share" component. For example:

    >>> path = r'\\server\share\folder\file'
    >>> os.path.splitdrive(path)
    ('\\\\server\\share', '\\folder\\file')
    >>> os.path.split(path)
    ('\\\\server\\share\\folder', 'file')
    >>> os.path.dirname(path)
    '\\\\server\\share\\folder'
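
A practical consequence is that dirname() reaches a fixed point at the share root and never climbs above it. The following is a minimal sketch, not taken from the original report, that walks up a path until dirname() stops changing. It imports ntpath directly so that Windows path rules apply on any platform; the helper name parents is hypothetical.

```python
import ntpath  # Windows flavour of os.path, importable on any platform


def parents(path):
    """Yield successive parents of path until dirname() stops changing."""
    current = path
    while True:
        parent = ntpath.dirname(current)
        if parent == current:
            # Fixed point: we are at the drive or share root.
            return
        yield parent
        current = parent


chain = list(parents(r'\\server\share\folder\file'))
bare_share = list(parents(r'\\a\b'))
```

Here chain ends at the share root, and bare_share is empty because r'\\a\b' is nothing but a drive component, which is why dirname() returned it unchanged in the report.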

If you double the initial slashes, it's no longer a valid UNC path:

    >>> path = r'\\\\server\\share\\folder\\file'
    >>> os.path.splitdrive(path)
    ('', '\\\\\\\\server\\\\share\\\\folder\\\\file')
    >>> os.path.split(path)
    ('\\\\\\\\server\\\\share\\\\folder', 'file')
    >>> os.path.dirname(path)
    '\\\\\\\\server\\\\share\\\\folder'

Windows itself will attempt to handle it as a UNC path, but the path is invalid. Specifically, before passing the path to the kernel, Windows collapses runs of extra slashes, with one exception: if the path begins with more than two slashes, one extra slash always remains in the path. For example:

    >>> open(r'\\\\server\\share\\folder\\file')
    Breakpoint 0 hit
    ntdll!NtCreateFile:
    00007ffb`a1f25b70 4c8bd1          mov     r10,rcx
    0:000> !obja @r8
    Obja +00000049781ef160 at 00000049781ef160:
            Name is \??\UNC\\server\share\folder\file
            OBJ_CASE_INSENSITIVE

Notice the extra backslash in "UNC\\server". Thus a valid UNC path must start with exactly two slashes.
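
The "exactly two slashes" rule is easy to check before handing a path to the OS. This is only a sketch with hypothetical helper names; it counts leading separators of either kind and is a necessary condition, not a sufficient one, since device paths such as "\\.\" also start with two slashes.

```python
def leading_slash_count(path):
    """Count the run of '\\' or '/' characters at the start of path."""
    stripped = path.lstrip('\\/')
    return len(path) - len(stripped)


def has_two_leading_slashes(path):
    """True if path starts with exactly two separators (either kind)."""
    return leading_slash_count(path) == 2


good = has_two_leading_slashes(r'\\server\share\folder\file')
doubled = has_two_leading_slashes(r'\\\\server\\share\\folder\\file')
```

The doubled form from the original report fails this check because it starts with four backslashes.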

Using forward slash is generally fine. The Windows API substitutes backslash for slash before passing a path to the kernel. For example:

    >>> open(r'//server/share/folder/file')
    Breakpoint 0 hit
    ntdll!NtCreateFile:
    00007ffb`a1f25b70 4c8bd1          mov     r10,rcx
    0:000> !obja @r8
    Obja +00000049781ef160 at 00000049781ef160:
            Name is \??\UNC\server\share\folder\file
            OBJ_CASE_INSENSITIVE

Except you can't use forward slash with a "\\?\" path, which bypasses normal path processing. For example:

    >>> open(r'\\?\UNC/server/share/folder/file')
    Breakpoint 0 hit
    ntdll!NtCreateFile:
    00007ffb`a1f25b70 4c8bd1          mov     r10,rcx
    0:000> !obja @r8
    Obja +00000049781ef160 at 00000049781ef160:
            Name is \??\UNC/server/share/folder/file
            OBJ_CASE_INSENSITIVE

In the kernel '/' isn't a path separator. It's just another name character, so this looks for a DOS device named "UNC/server/share/folder/file". Microsoft file systems forbid using slash in names (for POSIX compatibility and to avoid needless confusion), but you can use slash in the name of kernel objects such as Event objects, or even in the name of DOS devices defined via DefineDosDevice.
msg269410 - Author: Dustin Oprea (Dustin.Oprea) Date: 2016-06-27 22:21
Thank you for your elaborate response. I appreciate knowing that "\\server\share" could be considered as the "drive" portion of the path.

I'm having trouble determining if "\\?\" is literally some type of valid UNC prefix or you're just using it to represent some format/idea. Just curious.
msg269412 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-06-28 00:36
Paths starting with "\\.\" (or  "//./") and "\\?\" are not UNC paths. I've provided some explanations and examples below, and I also encourage you to read "Naming Files, Paths, and Namespaces":

https://msdn.microsoft.com/en-us/library/aa365247
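
To keep the three prefixes apart, a rough classifier can look at the first four characters. This is a sketch under simplifying assumptions, with a hypothetical function name and labels: it accepts forward slashes for the "\\.\" and UNC forms, recognizes "\\?\" only when written with backslashes, and does not validate the rest of the path.

```python
def classify_prefix(path):
    """Return 'extended', 'device', 'unc' or 'other' for a Windows path."""
    if path.startswith('\\\\?\\'):
        return 'extended'  # "\\?\" form, backslashes only
    head = path[:4].replace('/', '\\')
    if head == '\\\\.\\':
        return 'device'  # "\\.\" or "//./"
    if head == '\\\\?\\':
        return 'other'  # "?" prefix written with forward slashes: not classified here
    if head.startswith('\\\\') and head[2:3] not in ('', '\\'):
        return 'unc'  # exactly two leading slashes, then a server name
    return 'other'


kinds = [classify_prefix(p) for p in (
    r'\\?\UNC\server\share',
    r'\\.\pipe\name',
    '//./C:/Windows',
    r'\\server\share\folder',
    r'C:\Windows',
)]
```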

"\\.\" is the general way to access DOS devices, but with some path processing still enabled. For example:

    >>> files = os.listdir(r'//./C:/Windows/System32/..')
    >>> [x for x in files if x[:2] == 'py']
    ['py.exe', 'pyw.exe']

Notice that using slash and ".." is allowed. This form doesn't allow relative paths that depend on per-drive current directories. It's actually not recommended to use "\\.\" to access files on drive letters. Normally it's used with drive letters only when directly opening a volume. For example:

    >>> fd = os.open(r'\\.\C:', os.O_RDONLY | os.O_BINARY)
    >>> os.read(fd, 512)[:7]
    b'\xebR\x90NTFS'

The "\\?\" prefix allows the most access to the NT kernel namespace from within the Windows API (e.g. file paths can be up to 32K characters instead of the DOS limit of 260 characters). It does so by disabling all path processing, which means the onus is on the programmer to provide a fully-qualified, Unicode path that only uses backslash as the path separator.
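
Because nothing is normalized for you with this prefix, it helps to normalize first and add the prefix afterwards. The sketch below is one way to do that with ntpath; to_extended_path is a hypothetical helper, and it assumes the input is already fully qualified (a drive-letter path with a root, or a UNC path). It only manipulates strings and never touches the filesystem.

```python
import ntpath  # Windows flavour of os.path, importable on any platform


def to_extended_path(path):
    """Return path rewritten in "\\\\?\\" form, using backslashes only."""
    # normpath converts '/' to '\\' and resolves '.' and '..' components.
    normalized = ntpath.normpath(path)
    if normalized.startswith(('\\\\?\\', '\\\\.\\')):
        return normalized  # already a device-style path; leave it alone
    drive, rest = ntpath.splitdrive(normalized)
    if not drive or not rest.startswith('\\'):
        raise ValueError('path must be fully qualified: %r' % (path,))
    if drive.startswith('\\\\'):
        # UNC: \\server\share\... becomes \\?\UNC\server\share\...
        return '\\\\?\\UNC\\' + drive[2:] + rest
    return '\\\\?\\' + drive + rest


from_drive = to_extended_path('C:/Windows/System32/..')
from_unc = to_extended_path('//server/share/folder/file')
```

Relative input such as 'folder\\file' raises ValueError here, since this sketch does not consult the current directory.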

So why does "\\.\" exist? Some DOS devices are made implicitly available in the Windows API, such as DOS drive letters and "CON". However, the Windows API greatly extends the number of 'DOS' devices (e.g. the "PhysicalDrive0" device for low-level access to the first disk). Accessing these devices unambiguously requires the "\\.\" prefix. A common example is using "\\.\pipe\[pipe name]" to open a NamedPipe. You can even list the NamedPipe filesystem in Python. For example:

    >>> p1, p2 = multiprocessing.Pipe()
    >>> [x for x in os.listdir(r'\\.\pipe') if x[:2] == 'py']
    ['pyc-719-1-hoirbkzb']

Global DOS device names are defined in the kernel's "\Global??" directory. Some DOS devices, such as mapped network drives, are usually defined local to a logon session in the kernel's "\Sessions\0\DosDevices\[Logon Session ID]" directory. In the examples I gave, you may have noticed that each native kernel path starts with "\??\". This is a virtual directory in the kernel (and only the kernel). It instructs the object manager to first search the local session DOS devices and then the global DOS devices.

A DOS device is almost always implemented as an object symbolic link to the real NT device name in the kernel's "\Device" directory. For example, "\Global??\PIPE" links to "\Device\NamedPipe" and the "C:" drive may be a link to "\Device\HarddiskVolume2". This device is what the kernel actually opened in the previous example that read from "\\.\C:". Note that this accesses the volume itself, not the root directory of the filesystem that resides on the volume. The latter is "\\.\C:\". The trailing backslash makes all the difference. (Opening a directory such as the latter requires backup semantics, as described in the CreateFile docs.)
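
Since the two spellings differ by a single character, it can be safer to build them with helpers than to type them by hand. This is a small sketch with hypothetical function names; it only constructs the strings and opens nothing.

```python
def _drive_letter(drive):
    """Normalize 'c', 'C:' or 'C:\\' to a single upper-case letter."""
    letter = drive.rstrip(':\\/')
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError('not a drive letter: %r' % (drive,))
    return letter.upper()


def volume_device_path(drive):
    # No trailing backslash: names the volume device itself.
    return '\\\\.\\' + _drive_letter(drive) + ':'


def volume_root_path(drive):
    # Trailing backslash: names the root directory of the filesystem.
    return volume_device_path(drive) + '\\'


device = volume_device_path('c')
root = volume_root_path('C:')
```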

If a DOS drive letter is assigned to a volume, the assignment is stored in the registry by the volume's ID. (Dynamic volumes that span multiple disks also contain a drive letter hint.) For volume devices, the kernel also creates a GUID name that's always available and allows mounting a volume in a directory using an NTFS reparse point (e.g. see the output of mountvol.exe). You can also use GUID volume names in the Windows API. For example:

    >>> path = r'\\?\Volume{1693b540-0000-0000-0000-612e00000000}\Windows'
    >>> files = os.listdir(path)
    >>> [x for x in files if x[:2] == 'py']
    ['py.exe', 'pyw.exe']

But normally you'd just mount the volume, which can even be recursively mounted within itself. For example:

    >>> os.mkdir('C:\\SystemVolume')
    >>> subprocess.call(r'mountvol C:\SystemVolume \\?\Volume{1693b540-0000-0000-0000-612e00000000}')
    0
    >>> files = os.listdir(r'C:\SystemVolume\Windows')
    >>> [x for x in files if x[:2] == 'py']
    ['py.exe', 'pyw.exe']
History
Date | User | Action | Args
2022-04-11 14:58:33 | admin | set | github: 71590
2016-06-28 00:36:05 | eryksun | set | messages: + msg269412
2016-06-27 22:21:52 | Dustin.Oprea | set | messages: + msg269410
2016-06-27 22:15:33 | eryksun | set | messages: + msg269408
2016-06-27 22:14:31 | eryksun | set | messages: - msg269406
2016-06-27 22:08:52 | eryksun | set | status: open -> closed; nosy: + eryksun; messages: + msg269406; resolution: not a bug; stage: resolved
2016-06-27 21:03:41 | Dustin.Oprea | create |