
classification
Title: pathlib won't strip "\n" in path
Type: behavior
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 3.9, Python 3.8
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, paul.moore, steve.dower, tim.golden, zach.ware, 徐彻
Priority: normal
Keywords:

Created on 2020-02-01 03:45 by 徐彻, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg361149 - Author: 徐彻 (徐彻) Date: 2020-02-01 03:45
pathlib doesn't strip "\n" from a path, even though "\n" should not appear in a legal path.
For example:

>>> import pathlib
>>> a = pathlib.Path(pathlib.Path("C:/Program Files/\n"), "./JetBrains/\n")
>>> a
WindowsPath('C:/Program Files/\n/JetBrains/\n')
msg361180 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-02-01 17:47
A Windows path reserves the following characters:

* null, as the string terminator
* slash and backslash, as path separators
* colon as the second character in the first component of
  a non-UNC path, since it's a drive path
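
As a rough illustration of the list above, here is a minimal sketch that checks a single name component against the generically reserved characters only. The helper name is_generically_allowed is hypothetical and not part of pathlib; the colon rule is omitted because it applies to a position in the whole path rather than to one component. It uses PureWindowsPath so that it also runs on non-Windows systems.

```python
from pathlib import PureWindowsPath

# Characters reserved in every name component, per the list above:
# null (string terminator), slash and backslash (path separators).
GENERIC_RESERVED = frozenset("\0/\\")


def is_generically_allowed(name: str) -> bool:
    """Hypothetical helper: True if `name` is non-empty and contains
    none of the generically reserved characters."""
    return bool(name) and GENERIC_RESERVED.isdisjoint(name)


# The path from the original report, built without touching the filesystem.
p = PureWindowsPath(PureWindowsPath("C:/Program Files/\n"), "./JetBrains/\n")

# "\n" survives as its own component because it is not generically reserved.
newline_components = [part for part in p.parts if part == "\n"]
```

Under this check a newline passes while a null or a separator does not, which matches pathlib keeping "\n" in the path.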

Additionally, a normalized path reserves trailing dots and spaces on names, since they get stripped from the final component (e.g. "C:\Temp\spam. . ." -> "C:\Temp\spam"). WindowsPath could automatically strip trailing dots and spaces from normalized paths. This would need to exclude extended paths that begin with the "\\?\" prefix.
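
The stripping described above might look roughly like the following sketch. It is not existing pathlib behavior; the function name strip_trailing_dots_and_spaces is hypothetical, and it assumes that only the final component is affected and that "." and ".." are left alone.

```python
from pathlib import PureWindowsPath

# Extended paths with this prefix bypass normalization, so they are left as-is.
EXTENDED_PREFIX = "\\\\?\\"


def strip_trailing_dots_and_spaces(path: str) -> str:
    """Hypothetical helper: drop trailing dots and spaces from the final
    component of a normalized (non-extended) Windows path."""
    if path.startswith(EXTENDED_PREFIX):
        return path
    p = PureWindowsPath(path)
    name = p.name
    # Leave paths without a final name, and the special names, untouched.
    if name in ("", ".", ".."):
        return path
    stripped = name.rstrip(". ")
    # Nothing to strip, or the name consists only of dots and spaces.
    if not stripped or stripped == name:
        return path
    return str(p.with_name(stripped))


example = strip_trailing_dots_and_spaces("C:\\Temp\\spam. . .")
extended = strip_trailing_dots_and_spaces("\\\\?\\C:\\Temp\\spam. . .")
```

Here example is the stripped form of the path from the paragraph above, while extended is returned unchanged because of the "\\?\" prefix.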

Otherwise the set of reserved characters is a function of device and filesystem namespaces, regardless of the recommendations in "Naming Files, Paths, and Namespaces" [1], which are meant to constrain applications to what is generally allowed. I would prefer for WindowsPath to remain generic enough to support all device and filesystem namespaces. 

For example, the VirtualBox shared-folder filesystem (a mini-redirector to the host system) allows colon, pipe, and control characters in file and directory names:

    >>> import os
    >>> control = '\a\b\t\n\v\f\r'
    >>> special = ':|'
    >>> dirname = f'//vboxsvr/work/nametest/{control}{special}'
    >>> os.makedirs(dirname, exist_ok=True)
    >>> os.listdir('//vboxsvr/work/nametest')[0]
    '\x07\x08\t\n\x0b\x0c\r:|'

Like most filesystems, it reserves the five wildcard characters in base filenames: '*', '?', '<' (DOS_STAR), '>' (DOS_QM), and '"' (DOS_DOT). A filesystem that fails to reserve these wildcard characters cannot properly support WINAPI FindFirstFile[Ex]. The only filesystem I can think of that allows wildcard characters in base names is the named-pipe filesystem. NPFS actually allows any character in a pipe name -- even slash and backslash, since it only supports a single directory, the root directory "//./PIPE/".

That said, a path may specify a stream name instead of a base filename. As is documented in [1], an NTFS stream name reserves colon as a delimiter, i.e. "filename:streamname:streamtype", and stream names can include wildcards, pipe, and control characters. For example:

    >>> control = '\a\b\t\n\v\f\r'
    >>> special = '*?<>"|'
    >>> dirname = 'C:\\Temp\\nametest'
    >>> filename = f'{dirname}\\spam'
    >>> streamname = f'{filename}:{control}{special}'
    >>> os.makedirs(dirname, exist_ok=True)
    >>> streamname
    'C:\\Temp\\nametest\\spam:\x07\x08\t\n\x0b\x0c\r*?<>"|'
    >>> open(streamname, 'w').close()

We can use PowerShell (pwsh) to verify the existence of the stream:

    >>> import subprocess
    >>> cmd = f'pwsh -c (gi "{filename}" -stream *)[1].Stream'
    >>> subprocess.check_output(cmd, text=True).rstrip()
    '\x07\x08\t\n\x0b\x0c\n*?<>"|'
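
The "filename:streamname:streamtype" syntax can be sketched as a small parser for the final path component. This is only an illustration: the function name split_stream_component is hypothetical, it assumes the drive and directories have already been removed, and it assumes a missing stream type means the default "$DATA" type.

```python
from typing import Tuple


def split_stream_component(component: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split "filename:streamname:streamtype".

    `component` must be the final path component only, so a drive colon
    such as the one in "C:\\Temp" never reaches this function.
    """
    parts = component.split(":")
    if len(parts) > 3:
        # Colon is the delimiter, so it cannot appear inside any field.
        raise ValueError(f"too many colons in {component!r}")
    filename = parts[0]
    streamname = parts[1] if len(parts) > 1 else ""
    # Assumption: an omitted or empty stream type means the default data stream.
    streamtype = parts[2] if len(parts) > 2 and parts[2] else "$DATA"
    return filename, streamname, streamtype


control = "\a\b\t\n\v\f\r"
special = '*?<>"|'
fields = split_stream_component(f"spam:{control}{special}")
```

With the names from the example above, fields[1] is the stream name containing the control, wildcard, and pipe characters, none of which act as delimiters.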

In terms of device namespaces, a device that is not mounted by a filesystem can implement practically whatever namespace it wants. But considering "//./" device paths are normalized Windows paths, device namespaces should reserve slash, since the system translates slash to backslash.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
msg361264 - Author: 徐彻 (徐彻) Date: 2020-02-03 04:09
Thank you for your explanation.
History
Date                 User     Action  Args
2022-04-11 14:59:26  admin    set     github: 83696
2020-02-03 04:09:34  徐彻     set     status: open -> closed; resolution: not a bug; messages: + msg361264; stage: resolved
2020-02-01 17:53:52  eryksun  set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower; components: + Windows; versions: + Python 3.9
2020-02-01 17:47:50  eryksun  set     nosy: + eryksun; messages: + msg361180
2020-02-01 03:45:08  徐彻     create