Author hokousya
Recipients ezio.melotti, hokousya, vstinner
Date 2019-04-27.12:30:16
Message-id <1556368216.92.0.317568776121.issue36742@roundup.psfhosted.org>
Content
urllib.parse.urlsplit raises a ValueError for a URL that contains both a non-ASCII hostname in NFKD form and a port number.

Example:
>>> import unicodedata
>>> from urllib.parse import urlsplit
>>> urlsplit('http://\u30d5\u309a:80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ito/.maltybrew/deen/lib/python3.7/urllib/parse.py", line 437, in urlsplit
    _checknetloc(netloc)
  File "/Users/ito/.maltybrew/deen/lib/python3.7/urllib/parse.py", line 407, in _checknetloc
    "characters under NFKC normalization")
ValueError: netloc 'プ:80' contains invalid characters under NFKC normalization
>>> urlsplit('http://\u30d5\u309a')
SplitResult(scheme='http', netloc='プ', path='', query='', fragment='')
>>> urlsplit(unicodedata.normalize('NFKC', 'http://\u30d5\u309a:80'))
SplitResult(scheme='http', netloc='プ:80', path='', query='', fragment='')

I believe this behavior was introduced in Python 3.7.3; Python 3.7.2 does not raise an exception for any of these calls.
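
For reference, here is a minimal sketch of what NFKC normalization does to the failing netloc. It uses only unicodedata and does not reproduce the code in _checknetloc. The delimiter set '/?#@:' and the variable names are my own assumptions for illustration. My reading (not confirmed) is that the check reacts to the ':' that was already present as the port separator, because normalization changes the hostname but does not add any new delimiter.

```python
import unicodedata

# Netloc from the report: KATAKANA LETTER HU (U+30D5) followed by
# COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (U+309A), then the port.
netloc = '\u30d5\u309a:80'

# NFKC composes the two code points into KATAKANA LETTER PU (U+30D7).
normalized = unicodedata.normalize('NFKC', netloc)

# The string changes under normalization ...
changed = netloc != normalized

# ... but the set of URL delimiters it contains does not.
# DELIMS is an assumed set chosen for this illustration.
DELIMS = '/?#@:'
delims_before = [c for c in DELIMS if c in netloc]
delims_after = [c for c in DELIMS if c in normalized]

print(changed, delims_before, delims_after)
```

If that reading is right, a check that only rejects delimiters newly introduced by normalization would accept this URL.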
History
Date User Action Args
2019-04-27 12:30:16 hokousya set recipients: + hokousya, vstinner, ezio.melotti
2019-04-27 12:30:16 hokousya set messageid: <1556368216.92.0.317568776121.issue36742@roundup.psfhosted.org>
2019-04-27 12:30:16 hokousya link issue36742 messages
2019-04-27 12:30:16 hokousya create