classification
Title: urllib's request.pathname2url not compatible with extended-length Windows file paths
Type: behavior Stage: resolved
Components: Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: eryksun, levineds, miss-islington, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2021-03-23 22:56 by levineds, last changed 2021-04-23 18:21 by steve.dower. This issue is now closed.

Pull Requests
URL Status Linked
PR 25539 merged steve.dower, 2021-04-22 21:40
PR 25558 merged miss-islington, 2021-04-23 17:02
PR 25559 merged miss-islington, 2021-04-23 17:03
Messages (8)
msg389415 - Author: D Levine (levineds) Date: 2021-03-23 22:56
Windows file paths are limited to 256 characters, and one of Windows's prescribed methods to address this is to prepend "\\?\" before a Windows absolute path (see: https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation)

urllib.request.pathname2url raises an error on such paths. It calls nturl2path.py's pathname2url function, which explicitly checks that exactly one character precedes the ":" in a Windows path. That is, of course, not the case if you are using an extended-length path (e.g. "\\?\C:\Python39").

As a result, urllib cannot handle pathname2url conversion for some valid Windows paths.
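The check described above can be illustrated with a small sketch. This is a simplified stand-in, not the nturl2path source: the helper name check_drive_spec is hypothetical, and it only models the rule that exactly one character may precede the ":".

```python
def check_drive_spec(path):
    """Simplified, hypothetical model of the drive-letter check described above."""
    if ":" not in path:
        return None  # no drive specifier to validate
    parts = path.split(":")
    # Require exactly one colon, preceded by exactly one character (the drive letter).
    if len(parts) != 2 or len(parts[0]) != 1:
        raise OSError("Bad path: " + path)
    return parts[0].upper()


if __name__ == "__main__":
    print(check_drive_spec(r"C:\Python39"))  # a plain drive-letter path passes
    try:
        check_drive_spec(r"\\?\C:\Python39")  # five characters precede the colon
    except OSError as exc:
        print("rejected:", exc)
```

Under this model the "\\?\" prefix is counted as part of the text before the colon, which is why an otherwise valid path is rejected.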
msg389430 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-03-24 02:54
RFC8089 doesn't specify "a mechanism for translating namespaced paths ["\\?\" and "\\.\"] to or from file URIs", and the Windows shell doesn't support them. So what's the practical benefit of supporting them in nturl2path?

> Windows file paths are limited to 256 characters,

Classically, normal filepaths are limited to MAX_PATH - 1 (259) characters in most cases; in a few cases the limit is even smaller.

For a normal filepath, the API replaces slashes with backslashes; resolves relative paths; resolves "." and ".." components; strips trailing dots and spaces from the final path component; and, for relative paths and DOS drive-letter paths, reserves DOS device names in the final path component (e.g. CON, NUL).
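Part of this normalization can be approximated lexically with the standard library's ntpath module, which is importable on any platform. This is only a rough illustration: ntpath.normpath replaces slashes and collapses "." and ".." components, but it does not strip trailing dots and spaces or reserve DOS device names, which the Windows API does itself. The example path is made up.

```python
import ntpath

# Forward slashes become backslashes; "." and ".." are resolved lexically.
raw = "C:/Users/demo/./docs/../notes.txt"
normalized = ntpath.normpath(raw)
print(normalized)
```

A verbatim \\?\ path skips the API-level normalization entirely, which is what distinguishes it from the normal paths handled here.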

The kernel supports filepaths with up to 32,767 characters, but classically this was only accessible by using a verbatim \\?\ filepath, or by using workarounds based on substitute drives or filesystem mountpoints and symlinks.

With Python 3.6+ in Windows 10, if long paths are enabled in the system, normal filepaths support up to the full 32,767 characters in most cases. The need for the \\?\ prefix is thus limited to the rare case when a verbatim path is required, or when a filepath has to be passed to a legacy application that doesn't support long paths.
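The limits mentioned above can be summarized in a small sketch. The function name and the category labels are hypothetical, the check ignores the special cases with smaller limits, and it counts Python string characters rather than what the Windows API actually measures, so treat it as a rough guide only.

```python
MAX_PATH = 260             # classic Windows constant; usable length is MAX_PATH - 1
KERNEL_PATH_LIMIT = 32767  # upper bound supported by the kernel


def classify_path_length(path):
    """Rough classification of a path by length (hypothetical helper)."""
    n = len(path)
    if n <= MAX_PATH - 1:
        return "classic"   # fits the classic limit in most cases
    if n <= KERNEL_PATH_LIMIT:
        return "long"      # needs long-path support enabled, or a \\?\ prefix
    return "too long"      # exceeds what the kernel supports


if __name__ == "__main__":
    print(classify_path_length("C:\\" + "a" * 256))  # 259 characters
    print(classify_path_length("C:\\" + "a" * 257))  # 260 characters
```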
msg389432 - Author: D Levine (levineds) Date: 2021-03-24 03:12
I really meant 255 characters, not 256, because I was leaving three for "<drive name>:/". I suppose the most reasonable behavior is to strip out the "\\?\" before attempting the conversion as the path is sensible and parsable without that, as opposed to the current behavior, which is to crash. The practical benefit is to permit the function to work on a wider range of inputs than is currently possible, for essentially no cost.
msg389433 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-03-24 03:26
> I suppose the most reasonable behavior is to strip out the "\\?\" before 
> attempting the conversion as the path is sensible and parsable without
> that

Okay, so you're not looking to preserve the fact that it's a \\?\ verbatim path in the URI. You just want to automatically convert from verbatim \\?\X: or \\?\UNC\server\share to normal form. Devices other than drive letters and "UNC" wouldn't be supported.
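The conversion described here could look roughly like the following. This is a sketch of the idea only, not the merged patch: the helper name strip_verbatim_prefix is hypothetical, and the choice to raise OSError for other devices is an assumption.

```python
def strip_verbatim_prefix(path):
    """Convert a verbatim \\\\?\\ path to normal form (hypothetical helper).

    Only drive-letter and UNC forms are handled; other devices are rejected.
    """
    prefix = "\\\\?\\"
    if not path.startswith(prefix):
        return path  # already a normal path
    rest = path[len(prefix):]
    if rest[:4].upper() == "UNC\\":
        # \\?\UNC\server\share -> \\server\share
        return "\\\\" + rest[4:]
    if rest[:1].isascii() and rest[:1].isalpha() and rest[1:2] == ":":
        # \\?\X:\dir -> X:\dir
        return rest
    raise OSError("Bad path: " + path)


if __name__ == "__main__":
    print(strip_verbatim_prefix(r"\\?\C:\Python39"))
    print(strip_verbatim_prefix(r"\\?\UNC\server\share\dir"))
```

The result can then be passed through the existing drive-letter or UNC handling unchanged, and the URI carries no trace of the verbatim prefix.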
msg389434 - Author: D Levine (levineds) Date: 2021-03-24 03:30
I think that would make the most sense, yes.
msg391708 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-04-23 17:02
New changeset 3513d55a617012002c3f82dbf3cec7ec1abd7090 by Steve Dower in branch 'master':
bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
https://github.com/python/cpython/commit/3513d55a617012002c3f82dbf3cec7ec1abd7090
msg391714 - Author: miss-islington (miss-islington) Date: 2021-04-23 17:28
New changeset 04bcfe001cdf6290cb78fa4884002e5301e14c93 by Miss Islington (bot) in branch '3.9':
bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
https://github.com/python/cpython/commit/04bcfe001cdf6290cb78fa4884002e5301e14c93
msg391722 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-04-23 18:21
New changeset e92d1106291e5a7d4970372478f2882056b7eb3a by Miss Islington (bot) in branch '3.8':
bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539)
https://github.com/python/cpython/commit/e92d1106291e5a7d4970372478f2882056b7eb3a
History
Date User Action Args
2021-04-23 18:21:54 steve.dower set messages: + msg391722
2021-04-23 17:28:09 miss-islington set messages: + msg391714
2021-04-23 17:04:46 steve.dower set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-04-23 17:03:06 miss-islington set pull_requests: + pull_request24278
2021-04-23 17:02:59 miss-islington set nosy: + miss-islington
pull_requests: + pull_request24277
2021-04-23 17:02:53 steve.dower set messages: + msg391708
2021-04-22 21:43:33 steve.dower set assignee: steve.dower
versions: + Python 3.8, Python 3.10
2021-04-22 21:40:51 steve.dower set keywords: + patch
stage: patch review
pull_requests: + pull_request24259
2021-03-24 03:30:39 levineds set messages: + msg389434
2021-03-24 03:26:21 eryksun set messages: + msg389433
2021-03-24 03:12:58 levineds set messages: + msg389432
2021-03-24 02:54:50 eryksun set nosy: + eryksun
messages: + msg389430
2021-03-23 22:57:28 levineds set type: crash -> behavior
2021-03-23 22:56:49 levineds create