classification
Title: urllib.parse.urlsplit parses schemes that do not begin with letters
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: sgg
Priority: normal
Keywords: patch

Created on 2020-04-27 19:59 by sgg, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 19741 open sgg, 2020-04-27 20:21
Messages (1)
msg367452 - Author: Samani Gikandi (sgg) Date: 2020-04-27 19:59
RFC 3986 (STD 66) says that a URL scheme must begin with a letter; however, urllib.parse.urlsplit (and urlparse) accept strings that violate this rule and return them as valid schemes.

Example from Python 3.8 using "+git+ssh://git@github.com/user/project.git":

>>> from urllib.parse import urlsplit, urlparse
>>> urlparse("+git+ssh://git@github.com/user/project.git")
ParseResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', params='', query='', fragment='')
>>> urlsplit("+git+ssh://git@github.com/user/project.git")
SplitResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', query='', fragment='')

I double-checked this behavior against a number of other languages (Rust, Go, JavaScript, Ruby), and all of them complain if you try to parse this URL.

For reference, RFC 3986 section 3.1:

   Scheme names consist of a sequence of characters beginning with a
   letter and followed by any combination of letters, digits, plus
   ("+"), period ("."), or hyphen ("-").

   [...]

   scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
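
The grammar above translates fairly directly into a regular expression. What follows is only a minimal sketch of the check the RFC implies, not the change proposed in PR 19741; the names `is_valid_scheme` and `leading_scheme` are hypothetical helpers made up for illustration and are not part of urllib.parse.

```python
import re

# RFC 3986 section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA and DIGIT are ASCII-only in ABNF, so the classes are spelled out.
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Return True if `scheme` matches the RFC 3986 scheme production.

    Hypothetical helper for illustration; not a urllib.parse function.
    """
    return _SCHEME_RE.fullmatch(scheme) is not None


def leading_scheme(url):
    """Return the text before the first ':' if it is a valid scheme, else None.

    Hypothetical helper; it only inspects the scheme and does not parse
    the rest of the URL.
    """
    head, sep, _rest = url.partition(":")
    if sep and is_valid_scheme(head):
        return head
    return None
```

With this sketch, `leading_scheme("+git+ssh://git@github.com/user/project.git")` returns `None` because the candidate scheme starts with "+", whereas `leading_scheme("git+ssh://git@github.com/user/project.git")` returns `'git+ssh'`.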
History
Date User Action Args
2022-04-11 14:59:30 admin set github: 84589
2020-04-27 20:21:34 sgg set keywords: + patch
stage: patch review
pull_requests: + pull_request19063
2020-04-27 19:59:49 sgg create