classification
Title: urllib.parse doesn't fully comply to RFC 3986
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Howie Benefiel, Ivan.Pozdeev, The Compiler, orsenthil, xtreak
Priority: normal Keywords:

Created on 2018-08-08 16:02 by The Compiler, last changed 2018-08-10 09:28 by Ivan.Pozdeev.

Messages (2)
msg323292 - Author: Florian Bruhin (The Compiler) Date: 2018-08-08 16:02
Since bpo-29651, the urllib.parse docs say:

> Unmatched square brackets in the netloc attribute will raise a ValueError.

However, when the netloc contains at least one [ and one ], but they don't match, the behavior is somewhat inconsistent and no ValueError is raised:

>>> urllib.parse.urlparse('http://[::1]]').hostname
'::1'
>>> urllib.parse.urlparse('http://[[::1]').hostname
'[::1'
msg323362 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-08-10 09:28
I confirm that this violates RFC 3986, section 3.2.2 (https://tools.ietf.org/html/rfc3986#section-3.2.2).

URLs are now covered by RFC 3986, which obsoletes RFC 1808, the RFC that urllib's documentation refers to.

This newer RFC adds [ and ] to the 'reserved' characters, so an unquoted bracket anywhere that reserved characters are not allowed should be a parsing error.
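
To make the expected behavior concrete, here is a minimal sketch of a stricter bracket check. It is not urllib's implementation: validate_host_brackets is a hypothetical helper, and it assumes that only IPv6 literals appear inside brackets (RFC 3986 also allows an IPvFuture form, which this sketch rejects).

```python
import ipaddress


def validate_host_brackets(netloc: str) -> None:
    """Hypothetical helper: raise ValueError unless any brackets in
    netloc form exactly one well-placed IPv6 literal."""
    if "[" not in netloc and "]" not in netloc:
        return  # no brackets at all, nothing to check

    # Brackets may only delimit a single IP literal in the host.
    if netloc.count("[") != 1 or netloc.count("]") != 1:
        raise ValueError(f"unmatched square brackets in netloc: {netloc!r}")

    # Drop optional userinfo ("user:pass@"); the host must open with "[".
    hostport = netloc.rpartition("@")[2]
    if not hostport.startswith("["):
        raise ValueError(f"'[' must start the host: {netloc!r}")

    literal, _, rest = hostport[1:].partition("]")
    # After "]" only an optional ":port" may follow.
    if rest and not rest.startswith(":"):
        raise ValueError(f"unexpected text after ']': {netloc!r}")

    # Assumption: IPv6 only. AddressValueError is a ValueError subclass.
    ipaddress.IPv6Address(literal)
```

With this check, both netlocs from the report, '[::1]]' and '[[::1]', raise ValueError, while '[::1]' and '[::1]:8080' pass.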
History
Date User Action Args
2018-08-10 09:28:00  Ivan.Pozdeev  set  versions: + Python 3.6, Python 3.7, Python 3.8
                                        nosy: + Ivan.Pozdeev
                                        title: urllib.parse doesn't fail with multiple unmatching square brackets -> urllib.parse doesn't fully comply to RFC 3986
                                        messages: + msg323362
2018-08-09 07:45:55  xtreak  set  nosy: + xtreak
2018-08-08 16:02:35  The Compiler  create