This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urlsplit can't round-trip relative-host urls.
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc (issue 22852)
Assigned To: orsenthil Nosy List: Buck.Golemon, ankitoshniwal, bukzor, ezio.melotti, martin.panter, orsenthil
Priority: normal Keywords:

Created on 2012-06-05 22:28 by Buck.Golemon, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
parse.py ankitoshniwal, 2012-06-07 23:18
Messages (7)
msg162378 - Author: Buck Golemon (Buck.Golemon) Date: 2012-06-05 22:28
1) As long as x is valid, I expect that urlunsplit(urlsplit(x)) == x
2) yelp:///foo is a well-formed (albeit odd) url. It is similar to file:///tmp: it specifies the /foo resource on the "current" host, using the yelp protocol (defined on mobile devices).

>>> from urlparse import urlsplit, urlunsplit
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:/foo'

urlparse/urlunparse have the same bug:

>>> from urlparse import urlparse, urlunparse
>>> urlunparse(urlparse('yelp:///foo'))
'yelp:/foo'

The file: protocol seems to be special-cased, in an inappropriate manner:

>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'
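The report can be restated as a small Python 3 check, a sketch assuming `urllib.parse` (the Python 3 home of the `urlparse` functions); the helper name `round_trips` is hypothetical. It also shows why both yelp forms cannot round-trip: they split to the same tuple, and `file` behaves differently only because it is listed in `uses_netloc`.

```python
from urllib.parse import urlsplit, urlunsplit, uses_netloc


def round_trips(url: str) -> bool:
    """Hypothetical helper: True if urlunsplit(urlsplit(url)) reproduces url exactly."""
    return urlunsplit(urlsplit(url)) == url


# 'yelp:///foo' (empty authority) and 'yelp:/foo' (no authority) split identically,
# so urlunsplit has no information about which of the two it should rebuild.
print(urlsplit('yelp:///foo') == urlsplit('yelp:/foo'))  # True
print(urlsplit('yelp:///foo'))
# SplitResult(scheme='yelp', netloc='', path='/foo', query='', fragment='')

# 'file' gets its '//' back only because it appears in the uses_netloc list.
print('file' in uses_netloc, 'yelp' in uses_netloc)  # True False
print(round_trips('file:///tmp'))  # True
print(round_trips('yelp:///foo'))
```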
msg162507 - Author: Ankit Toshniwal (ankitoshniwal) Date: 2012-06-07 23:18
Hello,

I did some initial investigation. Looking at the code in parse.py, the function urlunsplit takes the 5-tuple returned by urlsplit. In the case of yelp:///foo we get:
SplitResult(scheme='yelp', netloc='', path='/foo', query='', fragment='')

This tuple is then passed to urlunsplit, which contains the following if statement:

if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):

It first checks whether the url has a netloc (in our case it does not), and then whether the scheme is in the uses_netloc list (a predefined list in parse.py of common schemes such as http, ftp, file, rsync, etc.). Since yelp is not in that list, the condition is false, so no '//' is prepended and the path is used unmodified. What we need is an else branch for when this condition fails, which does something like this:

    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
    else:
        # proposed else branch: identical to the if branch above,
        # so '//' is now prepended in every case
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url

With that change, we get the right url back.

After changing the code, here is what I get on my local dev machine:
>>> urlunparse(urlparse('yelp:///foo'))
'yelp:///foo'
>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:///foo'

Thanks,
Ankit.

P.S.: I am new to Python and am trying to learn it by working on small projects. Please let me know whether you think this is the right approach.
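One consequence of the proposed change is easy to check: because the new else branch repeats the if branch, '//' is prepended to every url, including ones that never had an authority. The function `patched_urlunsplit` below is a hypothetical re-implementation of urlunsplit with that patch applied (not the library function), compared against the real `urllib.parse.urlunsplit`.

```python
from urllib.parse import urlunsplit


def patched_urlunsplit(components):
    """Hypothetical urlunsplit with the proposed patch applied: both branches
    prepend '//', so the uses_netloc check no longer has any effect."""
    scheme, netloc, url, query, fragment = components
    if url and url[:1] != '/':
        url = '/' + url
    url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url


# The case from the report is fixed...
print(patched_urlunsplit(('yelp', '', '/foo', '', '')))  # yelp:///foo
# ...but urls that have no authority now gain an empty one.
print(patched_urlunsplit(('mailto', '', 'user@example.com', '', '')))  # mailto:///user@example.com
print(urlunsplit(('mailto', '', 'user@example.com', '', '')))          # mailto:user@example.com
print(patched_urlunsplit(('', '', 'a/b', '', '')))  # ///a/b
print(urlunsplit(('', '', 'a/b', '', '')))          # a/b
```

A relative reference such as a/b turning into ///a/b (a network-path reference with an empty host) changes its meaning, which is why the condition cannot simply be dropped.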
msg162509 - Author: Buck Golemon (Buck.Golemon) Date: 2012-06-07 23:55
Well, I think the real issue is that you can't enumerate the protocols that "use netloc". All protocols are allowed to have a netloc. The smb: protocol certainly does, but it's not in the list.

The core issue is that smb:/foo and smb:///foo are different urls and should be represented differently when split. The /// form has a netloc; it's just the empty string. The single-slash form has no netloc, so I propose that urlsplit('smb:/foo') return SplitResult(scheme='smb', netloc=None, path='/foo', query='', fragment='')
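The proposal can be sketched as a pair of wrappers around the Python 3 `urllib.parse.urlsplit`. The names `split_keep_netloc` and `unsplit_keep_netloc` are hypothetical; they return and accept plain 5-tuples rather than SplitResult, and the '//' detection assumes the input has no leading whitespace or embedded tab/newline characters (which urlsplit may strip before parsing).

```python
from urllib.parse import urlsplit


def split_keep_netloc(url):
    """Hypothetical variant of urlsplit: netloc is None when the url has no
    '//' authority at all, and '' when the authority is present but empty."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if not netloc:
        # Text after 'scheme:' (or the whole url when there is no scheme).
        rest = url[len(parts.scheme) + 1:] if parts.scheme else url
        netloc = '' if rest.startswith('//') else None
    return (parts.scheme, netloc, parts.path, parts.query, parts.fragment)


def unsplit_keep_netloc(components):
    """Hypothetical inverse: emit '//' whenever netloc is not None, with no
    per-scheme list such as uses_netloc involved."""
    scheme, netloc, path, query, fragment = components
    url = path
    if netloc is not None:
        if url and not url.startswith('/'):
            url = '/' + url
        url = '//' + netloc + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url += '?' + query
    if fragment:
        url += '#' + fragment
    return url


for u in ['smb:///foo', 'smb:/foo', 'file:///tmp', 'mailto:user@example.com']:
    parts = split_keep_netloc(u)
    print(u, parts, unsplit_keep_netloc(parts) == u)
```

With netloc=None carrying the "no authority" case, smb:/foo and smb:///foo both round-trip without any scheme list. An empty '?' or '#' is still dropped, which is the separate problem tracked by the superseding issue.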
msg164320 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-06-29 09:36
Let me address this one thing at a time. The point about smb really confused me, and I started wondering why, with smb being more common, this issue had not been raised before. It looks like smb urls always start with smb:// (the // is required to identify a netloc, empty or otherwise), so the smb cases are fine: http://tools.ietf.org/html/draft-crhertel-smb-url-00

That said, the dependency on uses_netloc has come up many times, and I am still looking for a way to remove it without affecting the existing parsing behavior and, of course, the tests.
msg164322 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-06-29 11:07
Look at the following two bugs, which dwelt on similar issues: Issue8339 and Issue7904. In one message in particular, msg102737, I seem to have concluded that "I don't see that 'x://' and 'x:///y' qualifies as valid URLS as per RFC 3986", and that applies to this bug too, where the url in question is yelp:///x.

Would yelp://localhost/x be a way to access it in your case? That would be consistent with the specification. Alternatively, you can add 'yelp' to the uses_netloc list in your code and get the desired behavior:

from urlparse import uses_netloc
uses_netloc.append('yelp')
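A Python 3 version of this workaround (`urllib.parse` also has a module-level `uses_netloc` list) fixes the yelp:///foo round trip. Since the split result still cannot distinguish the two forms, though, yelp:/foo is then rebuilt with an empty authority instead. Because `uses_netloc` is shared by the whole process, this sketch restores it afterwards; the variable names are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit, uses_netloc

uses_netloc.append('yelp')  # process-wide change to a module-level list
try:
    triple_slash = urlunsplit(urlsplit('yelp:///foo'))
    single_slash = urlunsplit(urlsplit('yelp:/foo'))
finally:
    uses_netloc.remove('yelp')  # restore the list for other code in the process

print(triple_slash)  # yelp:///foo
print(single_slash)  # yelp:///foo: the single-slash form no longer round-trips
```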

I understand that relying on uses_netloc is a limitation, but given the requirements of both absolute and relative parsing, that list has served a useful purpose.

I would like to close this one for the points mentioned above and open a feature request (or convert this one into a feature request) asking to remove the dependency on uses_netloc in urlparse. Does this resolution sound okay?
msg164712 - Author: Buck Evan (bukzor) * Date: 2012-07-06 02:44
Let's examine x://

absolute-URI  = scheme ":" hier-part [ "?" query ]
hier-part     = "//" authority path-abempty

So this is okay if authority and path-abempty can both be empty strings.

authority     = [ userinfo "@" ] host [ ":" port ]
host          = IP-literal / IPv4address / reg-name
reg-name      = *( unreserved / pct-encoded / sub-delims )
path-abempty  = *( "/" segment )

Yep.

And the same applies to x:///y, except that path-abempty matches /y
instead of nothing.

This means these are in fact valid urls per RFC3986, counter to your claim.
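The grammar argument can also be checked mechanically with the component-splitting regular expression from RFC 3986, Appendix B. That expression parses rather than validates, but it makes the distinction in this thread visible: for x:/// urls the authority group matches an empty string, while for x:/y it does not match at all (None). The names `RFC3986_RE` and `components` are hypothetical.

```python
import re

# Regular expression from RFC 3986, Appendix B. Group 2 is the scheme,
# group 3 is '//' plus the authority, group 4 is the authority, group 5 the path.
RFC3986_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def components(uri):
    """Hypothetical helper returning (scheme, authority, path); authority is
    None when the '//' authority component is absent."""
    m = RFC3986_RE.match(uri)
    return m.group(2), m.group(4), m.group(5)


print(components('x://'))    # ('x', '', '')
print(components('x:///y'))  # ('x', '', '/y')
print(components('x:/y'))    # ('x', None, '/y')
```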
msg235585 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 04:13
Fixing Issue 22852 or Issue 5843 should help fix this.
History
Date                 User           Action  Args
2022-04-11 14:57:31  admin          set     github: 59214
2015-05-31 04:28:09  martin.panter  set     status: open -> closed; superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc; resolution: duplicate; stage: needs patch -> resolved
2015-02-09 04:13:13  martin.panter  set     messages: + msg235585
2013-11-24 03:28:14  martin.panter  set     nosy: + martin.panter
2012-07-06 02:44:36  bukzor         set     nosy: + bukzor; messages: + msg164712
2012-06-29 11:07:32  orsenthil      set     messages: + msg164322
2012-06-29 09:36:43  orsenthil      set     messages: + msg164320
2012-06-15 08:07:34  ezio.melotti   set     nosy: + ezio.melotti; stage: needs patch; type: behavior; versions: + Python 3.3, - Python 2.6
2012-06-08 03:27:20  orsenthil      set     assignee: orsenthil; nosy: + orsenthil
2012-06-07 23:55:12  Buck.Golemon   set     messages: + msg162509
2012-06-07 23:18:35  ankitoshniwal  set     files: + parse.py; nosy: + ankitoshniwal; messages: + msg162507
2012-06-05 22:28:27  Buck.Golemon   create