Issue 19094: urljoin should raise a TypeError if URL is not a string

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63293

classification

Title:	urljoin should raise a TypeError if URL is not a string
Type:	behavior	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	berker.peksag, iritkatriel, jacobtylerwalls, jaraco, orsenthil, serhiy.storchaka, vajrasky
Priority:	normal	Keywords:	patch

Created on 2013-09-25 20:55 by jaraco, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
urljoin_throws_type_error.patch	vajrasky, 2013-09-26 10:12		review
urljoin_throws_type_error_v2.patch	vajrasky, 2013-09-28 09:02		review
urljoin_throws_type_error_v3.patch	vajrasky, 2013-10-01 14:11		review

Pull Requests
URL	Status	Linked	Edit
PR 26687	open	jacobtylerwalls, 2021-06-12 01:47

Messages (12)
msg198418 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2013-09-25 20:55
>>> import urllib.parse >>> urllib.parse.urljoin('foo', []) 'foo' That should raise a TypeError, not succeed quietly as if an empty string were passed.
msg198435 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-09-26 10:12
This is the preliminary patch to raise TypeError in that situation.
msg198455 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2013-09-26 22:59
Thanks Vajraski for the patch (especially the tests). A colleague reminded me of an aphorism by Raymond Hettinger from the recent PyCon paraphrased: duck typing is superior to isinstance. Maybe instead consider something like this: diff -r 7f13d5ecf71f Lib/urllib/parse.py --- a/Lib/urllib/parse.py Sun Sep 22 09:33:45 2013 -0400 +++ b/Lib/urllib/parse.py Thu Sep 26 18:52:18 2013 -0400 @@ -405,10 +405,9 @@ def urljoin(base, url, allow_fragments=True): """Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.""" - if not base: - return url - if not url: - return base + if not base or not url: + # one of the inputs is empty, so simply concatenate + return base + url base, url, _coerce_result = _coerce_args(base, url) bscheme, bnetloc, bpath, bparams, bquery, bfragment = \ urlparse(base, '', allow_fragments) This implementation is not only shorter and raises TypeErrors if the types aren't compatible, but those errors are triggered by the underlying implementation. Furthermore, if either the url or base were to be a duck-type that met the necessary interface, it need not be a string subclass. I don't feel strongly either way, but I'm slightly more inclined to accept the simpler implementation. I'm interested in the everyone's thoughts on either approach.
msg198462 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-09-27 05:21
What about this one? >>> urljoin(['a'], []) ['a'] >>> urljoin(['a'], ['b']) ..... omitted ...... AttributeError: 'list' object has no attribute 'decode' Is this desirable? Both patches missed this case. Should we only accept str and bytes?
msg198493 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2013-09-27 17:40
I don't mind the AttributeError - that's a reasonable exception when passing invalid types in, and that's in fact the current behavior. The example of (['a'], []) does bother me though. Those inputs are also seemingly invalid, though somewhat more compatible from a 'urljoin' perspective than two different types. You've given me reason to re-consider your patch.
msg198510 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-09-28 09:02
I modified the patch to handle the last case using your way as well. Anyway, I found out that urlsplit and urlparse got the same issue as well. >>> urlparse('python.org', b'http://') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 292, in urlparse url, scheme, _coerce_result = _coerce_args(url, scheme) File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 109, in _coerce_args raise TypeError("Cannot mix str and non-str arguments") TypeError: Cannot mix str and non-str arguments >>> urlparse('python.org', b'') ParseResult(scheme=b'', netloc='', path='python.org', params='', query='', fragment='') >>> urlparse('python.org', 0) ParseResult(scheme=0, netloc='', path='python.org', params='', query='', fragment='') Same thing happens in urlsplit. Fortunately, urlunsplit and urlunparse don't have this issue.
msg198750 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2013-10-01 05:07
I have provided my comments in the review tool. Please check them out.
msg198784 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-10-01 14:11
Okay, attached the patch based on your comment, Senthil Kumaran. Thanks. The reason I use isinstance to check the type is because I want to support the inheritance as much as possible. >>> class new_str(str): ... pass >>> urljoin(new_str('http://python.org') + new_str('hehe')) will not work with the newest patch. Also, in Lib/urllib/parse.py, most of the time, we use isinstance(x, str) not type(x) == str. But I don't know. Maybe it does not matter.
msg198808 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2013-10-01 21:14
Vajrasky, you're right. Comparing against type(obj) is an anti-pattern. isinstance is better. Duck typing is even better (in many cases).
msg230742 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-11-06 13:50
About acceptable behavior with wrong arguments types see discussions in issue22766 and http://comments.gmane.org/gmane.comp.python.devel/150073 .
msg386875 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-02-12 20:01
Still broken in 3.10: Running Release\|x64 interpreter... Python 3.10.0a5+ (heads/master:bf2e7e55d7, Feb 11 2021, 23:09:25) [MSC v.1928 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import urllib.parse >>> urllib.parse.urljoin('foo', []) 'foo' >>>
msg395598 - (view)	Author: Jacob Walls (jacobtylerwalls) *	Date: 2021-06-11 03:57
Hi vajrasky, do you have any interest in converting your patch to a GitHub PR? If not I can see about doing so myself. Cheers.

History
Date	User	Action	Args
2022-04-11 14:57:51	admin	set	github: 63293
2021-06-12 01:47:56	jacobtylerwalls	set	pull_requests: + pull_request25276
2021-06-11 10:49:11	iritkatriel	set	versions: + Python 3.11, - Python 3.8
2021-06-11 03:57:18	jacobtylerwalls	set	nosy: + jacobtylerwalls messages: + msg395598
2021-02-12 20:01:14	iritkatriel	set	nosy: + iritkatriel messages: + msg386875 versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.4, Python 3.5
2014-11-06 13:50:51	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg230742 versions: + Python 3.5, - Python 3.3
2013-10-01 21:14:06	jaraco	set	messages: + msg198808
2013-10-01 14:11:39	vajrasky	set	files: + urljoin_throws_type_error_v3.patch messages: + msg198784
2013-10-01 05:07:45	orsenthil	set	messages: + msg198750
2013-09-28 21:28:10	terry.reedy	set	nosy: + orsenthil type: behavior stage: patch review
2013-09-28 09:02:46	vajrasky	set	files: + urljoin_throws_type_error_v2.patch messages: + msg198510
2013-09-27 17:40:02	jaraco	set	messages: + msg198493
2013-09-27 05:21:49	vajrasky	set	messages: + msg198462
2013-09-26 22:59:51	jaraco	set	messages: + msg198455
2013-09-26 10:12:03	vajrasky	set	files: + urljoin_throws_type_error.patch nosy: + vajrasky messages: + msg198435 keywords: + patch
2013-09-26 08:36:24	berker.peksag	set	nosy: + berker.peksag
2013-09-25 20:55:41	jaraco	create