This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urlunparse(urlparse('x://')) now returns 'x:' instead of 'x://'
Type: behavior Stage: test needed
Components: Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Michael Glassford, eric.araujo, orsenthil, r.david.murray
Priority: normal Keywords: easy

Created on 2010-04-07 21:42 by Michael Glassford, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (9)
msg102569 - (view) Author: Michael Glassford (Michael Glassford) Date: 2010-04-07 21:42
An unfortunate side-effect of this change: 

http://svn.python.org/view/python/branches/release26-maint/Lib/urlparse.py?r1=66717&r2=78235

which was made to fix this issue: 

http://bugs.python.org/issue7904

is that urlparse.urlunparse(urlparse.urlparse('x://')) now returns 'x:' instead of 'x://', and urlparse.urlunparse(urlparse.urlparse('x:///y')) now returns 'x:/y' instead of 'x:///y'.

This behavior exists in at least Python 2.6 and 3.1, but not in 2.5.
msg102580 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-08 00:19
To fix this, urlparse would have to differentiate between a null netloc and no netloc characters specified at all.  This could be done by using None for one an '' for the other.  I'm not sure that behavior change could be backported to 2.6, though.

Did this issue actually cause a program failure for you in 2.6?  If so the original patch might wind up getting classed as a regression for the 2.6 line.
msg102589 - (view) Author: Michael Glassford (Michael Glassford) Date: 2010-04-08 02:35
It caused a minor issue with the Schemes extension for Mercurial: the output changed, which caused a unit test to fail. I'm pretty sure I have a Mercurial patch to fix that issue, however.
msg102737 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-09 18:00
Hello Michael, 

Looking a bit deeper into this issue, I don't see that 'x://' and 'x:///y' qualifies as valid URLS as per RFC 3986. (Well, urlparse has been not strictly conforming to it, but that is a different issue)

If you look at the section 3. it states the following for validity.

 hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty
 
In those cases, I would assume that 'x://y', x:/y','x:/','/' as valid URLS, but not the two examples you mentioned.

For the issue7904, we had just gone by the definition of RFC to make that minor change and it has resulted in this issue. I looked at the code to see if this can be addressed, but I see that your examples did not fit in as valid urls.

Do you have any opinions on this?

We can just the test_urlparse.py a little like below, and you might fix the break your code.

     def test_unparse_parse(self):
-        for u in ['Python', './Python','x-newscheme://foo.com/stuff']:
+        for u in ['Python', './Python','x-newscheme://foo.com/stuff','x://y','x:/y','x:/','/',]:
             self.assertEqual(urlparse.urlunsplit(urlparse.urlsplit(u)), u)
             self.assertEqual(urlparse.urlunparse(urlparse.urlparse(u)), u)
msg102922 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-12 07:08
Added additional examples of valid urls in r79988 and branches under the roundtrip test cases.

Michael, RDM: If you have any comments on msg102737, let me know. Otherwise we can close this issue as wont-fix.
msg102932 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-04-12 09:28
By the way, I’ve been meaning to file a bug against the Mercurial schemes extension since I saw how it misused URIs (i.e. “py://trunk/” should be “py:trunk/”); I hope it’s not too late now that this extension is shipped with the program.

Regards
msg102939 - (view) Author: Michael Glassford (Michael Glassford) Date: 2010-04-12 11:50
> In those cases, I would assume that 'x://y', x:/y','x:/','/' 
> as valid URLS, but not the two examples you mentioned.

Only 2 comments about this:
1) Although the urlparse documentation does mention the relevant RFCs, on a quick read-through I don't see that it actually requires its input to be a valid URL.
2) Obviously, some code is using it for invalid URLs.

> Michael, RDM: If you have any comments on msg102737, 
> let me know. Otherwise we can close this issue as wont-fix.

I don't have a problem with this. I reported the issue to bring attention to the fact that the change affected real code and to see if anyone had suggestions for dealing with the problem in a better way (at the moment, I don't).

> By the way, I’ve been meaning to file a bug against the 
> Mercurial schemes extension...

Please do. I don't have anything to do with the Schemes extension except that I found this issue while investigating a unit test failure, but if you file an issue probably I will look into it--or if I don't, then someone else probably will.
msg102940 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-12 12:25
On Mon, Apr 12, 2010 at 11:50:57AM +0000, Michael Glassford wrote:
> 1) Although the urlparse documentation does mention the relevant RFCs, on a quick read-through I don't see that it actually requires its input to be a valid URL.
> 2) Obviously, some code is using it for invalid URLs.

Which is true, because in some cases like urlparse.urljoin(base, rel),
the rel is a relative url it undergoes the same parsing mechanism
applicable to any url and thus there is no way a 'rigorous' check for
authorized url is happening. 

Checks are done for valid chars in scheme and other parsing
behaviours.

> I don't have a problem with this. I reported the issue to bring attention to the fact that the change affected real code and to see if anyone had suggestions for dealing with the problem in a better way (at the moment, I don't).
>

Yeah, I understand the situation with 'changes which break the
existing code', which we try to avoid many times. I shall if this can
be addressed without reverting any recent changes and still being
complaint.
msg103141 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-14 19:09
It looks like any change for this specific case would make tests like handling anyscheme fail.  I am less inclined to do that as opposed to handling this case for invalid url,

I suggest this be handled at user end to fix the proper parsing and constructing correct url. I am closing this as wont-fix.
History
Date User Action Args
2022-04-11 14:56:59adminsetgithub: 52586
2010-04-14 19:09:23orsenthilsetstatus: open -> closed
resolution: wont fix
messages: + msg103141
2010-04-12 12:25:17orsenthilsetmessages: + msg102940
2010-04-12 11:50:55Michael Glassfordsetmessages: + msg102939
2010-04-12 09:28:55eric.araujosetnosy: + eric.araujo
messages: + msg102932
2010-04-12 07:08:08orsenthilsetmessages: + msg102922
2010-04-09 18:00:03orsenthilsetassignee: orsenthil
messages: + msg102737
2010-04-08 02:35:32Michael Glassfordsetmessages: + msg102589
2010-04-08 00:19:46r.david.murraysetpriority: normal

versions: + Python 2.7, Python 3.2
keywords: + easy
nosy: + r.david.murray, orsenthil

messages: + msg102580
stage: test needed
2010-04-07 21:42:04Michael Glassfordcreate