This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urljoin should raise a TypeError if URL is not a string
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, iritkatriel, jacobtylerwalls, jaraco, orsenthil, serhiy.storchaka, vajrasky
Priority: normal Keywords: patch

Created on 2013-09-25 20:55 by jaraco, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
urljoin_throws_type_error.patch vajrasky, 2013-09-26 10:12 review
urljoin_throws_type_error_v2.patch vajrasky, 2013-09-28 09:02 review
urljoin_throws_type_error_v3.patch vajrasky, 2013-10-01 14:11 review
Pull Requests
URL Status Linked Edit
PR 26687 open jacobtylerwalls, 2021-06-12 01:47
Messages (12)
msg198418 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013-09-25 20:55
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'

That should raise a TypeError, not succeed quietly as if an empty string were passed.
msg198435 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-26 10:12
This is the preliminary patch to raise TypeError in that situation.
msg198455 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013-09-26 22:59
Thanks Vajraski for the patch (especially the tests).

A colleague reminded me of an aphorism by Raymond Hettinger from the recent PyCon paraphrased: duck typing is superior to isinstance.

Maybe instead consider something like this:

diff -r 7f13d5ecf71f Lib/urllib/parse.py
--- a/Lib/urllib/parse.py	Sun Sep 22 09:33:45 2013 -0400
+++ b/Lib/urllib/parse.py	Thu Sep 26 18:52:18 2013 -0400
@@ -405,10 +405,9 @@
 def urljoin(base, url, allow_fragments=True):
     """Join a base URL and a possibly relative URL to form an absolute
     interpretation of the latter."""
-    if not base:
-        return url
-    if not url:
-        return base
+    if not base or not url:
+        # one of the inputs is empty, so simply concatenate
+        return base + url
     base, url, _coerce_result = _coerce_args(base, url)
     bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
             urlparse(base, '', allow_fragments)

This implementation is not only shorter and raises TypeErrors if the types aren't compatible, but those errors are triggered by the underlying implementation. Furthermore, if either the url or base were to be a duck-type that met the necessary interface, it need not be a string subclass.

I don't feel strongly either way, but I'm slightly more inclined to accept the simpler implementation. I'm interested in the everyone's thoughts on either approach.
msg198462 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-27 05:21
What about this one?

>>> urljoin(['a'], [])
['a']
>>> urljoin(['a'], ['b'])
..... omitted ......
AttributeError: 'list' object has no attribute 'decode'

Is this desirable? Both patches missed this case. Should we only accept str and bytes?
msg198493 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013-09-27 17:40
I don't mind the AttributeError - that's a reasonable exception when passing invalid types in, and that's in fact the current behavior. The example of (['a'], []) does bother me though. Those inputs are also seemingly invalid, though somewhat more compatible from a 'urljoin' perspective than two different types.

You've given me reason to re-consider your patch.
msg198510 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-28 09:02
I modified the patch to handle the last case using your way as well.

Anyway, I found out that urlsplit and urlparse got the same issue as well.

>>> urlparse('python.org', b'http://')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 292, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 109, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
>>> urlparse('python.org', b'')
ParseResult(scheme=b'', netloc='', path='python.org', params='', query='', fragment='')
>>> urlparse('python.org', 0)
ParseResult(scheme=0, netloc='', path='python.org', params='', query='', fragment='')

Same thing happens in urlsplit. Fortunately, urlunsplit and urlunparse don't have this issue.
msg198750 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-10-01 05:07
I have provided my comments in the review tool. Please check them out.
msg198784 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 14:11
Okay, attached the patch based on your comment, Senthil Kumaran. Thanks.

The reason I use isinstance to check the type is because I want to support the inheritance as much as possible.

>>> class new_str(str):
...   pass
>>> urljoin(new_str('http://python.org') + new_str('hehe'))
will not work with the newest patch.

Also, in Lib/urllib/parse.py, most of the time, we use isinstance(x, str) not type(x) == str.

But I don't know. Maybe it does not matter.
msg198808 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013-10-01 21:14
Vajrasky, you're right. Comparing against type(obj) is an anti-pattern. isinstance is better. Duck typing is even better (in many cases).
msg230742 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-06 13:50
About acceptable behavior with wrong arguments types see discussions in issue22766 and http://comments.gmane.org/gmane.comp.python.devel/150073 .
msg386875 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-02-12 20:01
Still broken in 3.10:

Running Release|x64 interpreter...
Python 3.10.0a5+ (heads/master:bf2e7e55d7, Feb 11 2021, 23:09:25) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'
>>>
msg395598 - (view) Author: Jacob Walls (jacobtylerwalls) * Date: 2021-06-11 03:57
Hi vajrasky, do you have any interest in converting your patch to a GitHub PR? If not I can see about doing so myself. Cheers.
History
Date User Action Args
2022-04-11 14:57:51adminsetgithub: 63293
2021-06-12 01:47:56jacobtylerwallssetpull_requests: + pull_request25276
2021-06-11 10:49:11iritkatrielsetversions: + Python 3.11, - Python 3.8
2021-06-11 03:57:18jacobtylerwallssetnosy: + jacobtylerwalls
messages: + msg395598
2021-02-12 20:01:14iritkatrielsetnosy: + iritkatriel

messages: + msg386875
versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.4, Python 3.5
2014-11-06 13:50:51serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg230742
versions: + Python 3.5, - Python 3.3
2013-10-01 21:14:06jaracosetmessages: + msg198808
2013-10-01 14:11:39vajraskysetfiles: + urljoin_throws_type_error_v3.patch

messages: + msg198784
2013-10-01 05:07:45orsenthilsetmessages: + msg198750
2013-09-28 21:28:10terry.reedysetnosy: + orsenthil

type: behavior
stage: patch review
2013-09-28 09:02:46vajraskysetfiles: + urljoin_throws_type_error_v2.patch

messages: + msg198510
2013-09-27 17:40:02jaracosetmessages: + msg198493
2013-09-27 05:21:49vajraskysetmessages: + msg198462
2013-09-26 22:59:51jaracosetmessages: + msg198455
2013-09-26 10:12:03vajraskysetfiles: + urljoin_throws_type_error.patch

nosy: + vajrasky
messages: + msg198435

keywords: + patch
2013-09-26 08:36:24berker.peksagsetnosy: + berker.peksag
2013-09-25 20:55:41jaracocreate