Message 296442 - Python tracker



Author	vstinner
Recipients	Nam.Nguyen, martin.panter, serhiy.storchaka, vstinner
Date	2017-06-20.14:39:54
Message-id	<1497969594.32.0.527271568779.issue30500@psf.upfronthosting.co.za>
In-reply-to

Content

I tested my system python2 (Python 2.7.13 on Fedora 25):

haypo@selma$ python2
Python 2.7.13 (default, May 10 2017, 20:04:28) 
>>> import urllib
>>> urllib.splithost('//hostname/url')
('hostname', '/url')
>>> urllib.splithost('//host\nname/url')  # newline in hostname, accepted
('host\nname', '/url')
>>> print(urllib.splithost('//host\nname/url')[0])  # newline in hostname, accepted
host
name
>>> urllib.splithost('//hostname/ur\nl')  # newline in URL, rejected
(None, '//hostname/ur\nl')

=> Newline is accepted in the hostname, but not in the URL path.
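The asymmetry above can be reproduced outside urllib with a plain regular expression. This is only a sketch: the pattern is assumed to have the same shape as the one splithost() uses in 2.7 (a negated character class for the host, then `.*` for the rest, no DOTALL), and `split_host` is a hypothetical helper, not the urllib function. It is written for Python 3.

```python
import re

# Assumed pattern, modelled on the splithost() regex in Python 2.7
# (no DOTALL flag). An illustration, not a copy of the urllib source.
HOST_PATTERN = re.compile(r'^//([^/?]*)(.*)$')


def split_host(url):
    """Return (host, rest), or (None, url) when the pattern does not match."""
    match = HOST_PATTERN.match(url)
    if match:
        return match.group(1), match.group(2)
    return None, url


# [^/?]* is a negated character class, so it matches a newline.
print(split_host('//host\nname/url'))   # ('host\nname', '/url')

# .* stops at a newline without DOTALL, so the whole match fails.
print(split_host('//hostname/ur\nl'))   # (None, '//hostname/ur\nl')
```

So the hostname part accepts a newline because of the character class, while the path part rejects it because of `.`.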


With my change (adding DOTALL), newlines are accepted in the hostname and in the URL:

haypo@selma$ ./python
Python 2.7.13+ (heads/2.7:b39a748, Jun 19 2017, 18:07:19) 
>>> import urllib
>>> urllib.splithost("//hostname/url")
('hostname', '/url')
>>> urllib.splithost("//host\nname/url")  # newline in hostname, accepted
('host\nname', '/url')
>>> urllib.splithost("//hostname/ur\nl")  # newline in URL, accepted
('hostname', '/ur\nl')
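The effect of DOTALL can be sketched with a plain regular expression. The pattern below is an assumption (same shape as the splithost() regex in 2.7, with re.DOTALL added), and `split_host_dotall` is a hypothetical helper written for Python 3, not the patched urllib code itself.

```python
import re

# Assumed pattern: negated character class for the host, then .* for the
# rest, compiled with DOTALL so that "." also matches a newline.
HOST_PATTERN_DOTALL = re.compile(r'^//([^/?]*)(.*)$', re.DOTALL)


def split_host_dotall(url):
    """Return (host, rest), or (None, url) when the pattern does not match."""
    match = HOST_PATTERN_DOTALL.match(url)
    if match:
        return match.group(1), match.group(2)
    return None, url


# Newline in the hostname: matched by the character class, as before.
print(split_host_dotall('//host\nname/url'))   # ('host\nname', '/url')

# Newline in the path: now matched by .* thanks to DOTALL.
print(split_host_dotall('//hostname/ur\nl'))   # ('hostname', '/ur\nl')
```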


More generally, it seems like the urllib module doesn't try to validate URL content. It just tries to "split" URLs into their parts.

Who is responsible for validating URLs? Is it the responsibility of the application developer to implement a whitelist?
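If validation is left to the application, a whitelist check on the host could look like the sketch below. Everything here is an assumption for illustration: the allowed character set, the optional port, and the `is_safe_host` name are hypothetical, and a real application may need stricter or looser rules (IPv6 literals, for example, are not handled). It is written for Python 3.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, dots and hyphens,
# optionally followed by ":port".
ALLOWED_HOST = re.compile(r'[A-Za-z0-9.-]+(:[0-9]+)?')


def is_safe_host(host):
    """Return True only if the whole host string matches the whitelist."""
    # fullmatch() is used instead of match() with "$", because "$" would
    # also accept a single trailing newline.
    return host is not None and ALLOWED_HOST.fullmatch(host) is not None


print(is_safe_host('hostname'))      # True
print(is_safe_host('host\nname'))    # False
print(is_safe_host('hostname\n'))    # False
```

The application would call such a check on the host returned by the split function before using it, for instance before building a request.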

History
Date	User	Action	Args
2017-06-20 14:39:54	vstinner	set	recipients: + vstinner, martin.panter, Nam.Nguyen, serhiy.storchaka
2017-06-20 14:39:54	vstinner	set	messageid: <1497969594.32.0.527271568779.issue30500@psf.upfronthosting.co.za>
2017-06-20 14:39:54	vstinner	link	issue30500 messages
2017-06-20 14:39:54	vstinner	create