msg99181 - (view) |
Author: mARK (mbloore) |
Date: 2010-02-10 23:24 |
urlparse.urlsplit('s3://example/files/photos/161565.jpg')
returns
('s3', '', '//example/files/photos/161565.jpg', '', '')
instead of
('s3', 'example', '/files/photos/161565.jpg', '', '')
According to RFC 3986, 's3' is a valid scheme name, so the '://' indicates a URL with netloc and path parts.
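
The expected split can be checked with a short script. This is only a sketch and it assumes Python 3, where the module lives at urllib.parse (the report itself concerns the Python 2 urlparse module). On an interpreter that includes the fix for this issue, 'example' comes back as the netloc.

```python
# Assumption: Python 3, where urlparse is available as urllib.parse.
from urllib.parse import urlsplit

url = 's3://example/files/photos/161565.jpg'
parts = urlsplit(url)

# With the fix, 'example' is the netloc instead of being left
# at the front of the path.
print(tuple(parts))
```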
|
msg99183 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2010-02-10 23:28 |
Thanks for the report, could you provide a patch with unit tests?
|
msg99196 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2010-02-11 03:48 |
Does s3 stand for the Amazon S3 service? urlparse does not have it in its list of known schemes yet. Does s3 have a specification of its own, or is it aligned with any of the known schemes (like http or ftp)?
s3 is a valid scheme name according to RFC 3986, but the urlparse module does not recognize it. If we consider s3 to be similar to http, then it can be added to the list of known schemes. Does Amazon say anything about it?
|
msg99198 - (view) |
Author: mARK (mbloore) |
Date: 2010-02-11 04:53 |
It's not actually necessary to have a list of known schemes. Any URL that has a double slash after the colon is expected to follow that with an authority section (what urlparse calls "netloc"), optionally followed by a path, which starts with a slash.
There are various defined schemes with their own syntax within the URL framework, but one is free to invent new ones with the general form
scheme://netloc/path
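
A scheme-independent split of this general form can be sketched with the regular expression given in RFC 3986, Appendix B. The function name generic_split is hypothetical and is not part of urlparse; the sketch only illustrates that no list of known schemes is needed to find the authority.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def generic_split(uri):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Components that are absent from the URI come back as None.
    """
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))

print(generic_split('s3://example/files/photos/161565.jpg'))
print(generic_split('mailto:user@example.com'))
```

The second call shows a URI without '//': the authority is None and everything after the colon is the path.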
|
msg99229 - (view) |
Author: mARK (mbloore) |
Date: 2010-02-11 18:20 |
I have attached an svn diff of my (very simple!) fix, with an added unit test, for Python 2.7.
|
msg99256 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2010-02-12 02:58 |
Hello Mark,
Thanks for the patch.
However there are reasons why the check is:
"if scheme in uses_netloc and url[:2] == '//':"
It cannot be replaced by just url[:2] == '//' as in your patch.
Different protocols have different parsing requirements (e.g., some treat everything after the scheme as the path).
A better way is to add 's3' to the uses_netloc list, which should also work. I shall add it and include your tests. Thanks.
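
To illustrate the difference, here is a simplified sketch of a list-based check. It is not the urlparse source: split_rest and USES_NETLOC are hypothetical names, the list is a small illustrative subset, and the sketch ends the netloc only at the next '/', ignoring '?' and '#'.

```python
# Illustrative subset only; the real uses_netloc list is longer.
USES_NETLOC = ['ftp', 'http', 'https']

def split_rest(scheme, rest, known=USES_NETLOC):
    """Split the text after 'scheme:' into (netloc, path), list-based."""
    if scheme in known and rest[:2] == '//':
        end = rest.find('/', 2)
        if end < 0:
            end = len(rest)
        return rest[2:end], rest[end:]
    # Unknown scheme: everything is treated as the path.
    return '', rest

rest = '//example/files/photos/161565.jpg'
print(split_rest('http', rest))   # known scheme: netloc is split off
print(split_rest('s3', rest))     # unknown scheme: '//' stays in the path
print(split_rest('s3', rest, known=USES_NETLOC + ['s3']))  # after adding 's3'
```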
|
msg99265 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-02-12 13:41 |
I think Mark is correct. RFC 3986 says:
When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//").
I think it would make sense to have urlparse fall back to doing a generic RFC 3986 parse when it does not recognize the scheme.
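
The quoted rule can be written as a small check. This is a sketch with a hypothetical function name; it uses None for an absent authority, whereas urlsplit reports a missing netloc as an empty string.

```python
def path_is_consistent(authority, path):
    """Check the RFC 3986 rule quoted above.

    authority is None when the URI has no authority component.
    """
    if authority is not None:
        # With an authority, the path is empty or starts with '/'.
        return path == '' or path.startswith('/')
    # Without an authority, the path cannot start with '//'.
    return not path.startswith('//')

# The result reported in this issue has no authority but a '//' path.
print(path_is_consistent(None, '//example/files/photos/161565.jpg'))
# The expected result satisfies the rule.
print(path_is_consistent('example', '/files/photos/161565.jpg'))
```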
|
msg99290 - (view) |
Author: mARK (mbloore) |
Date: 2010-02-12 21:12 |
The case which prompted this issue was a purely private set of URLs, sent to me by a client but never sent to Amazon or anywhere else outside our systems (though I'm sure many others have invented this particular scheme for their own use). It would have been convenient if urlparse had handled it properly. That is true for any scheme one may invent at need.
On second thought it does make sense to enforce the use of :// for the schemes in uses_netloc, but still not to ignore its meaning for other schemes. It also makes sense to add s3 to uses_netloc despite the fact that it is not (afaik) registered, since it is an obvious invention.
I'll make another patch, but I don't have time to do it just now.
|
msg99480 - (view) |
Author: mARK (mbloore) |
Date: 2010-02-17 21:09 |
Doing a fallback test for // would look like
if scheme in uses_netloc and url[:2] == '//' or url[:2] == '//':
but this is equivalent to
if url[:2] == '//':
i.e., an authority appears if and only if there is a // after the scheme.
This still allows a uses_netloc scheme to appear without //.
I have attached a patch against Python 2.7, svn revision 78212. It also adds s3 to uses_netloc.
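
The equivalence claimed above can be confirmed by enumerating the four truth-value combinations. In this sketch, a stands for `scheme in uses_netloc` and b for `url[:2] == '//'`.

```python
from itertools import product

# a: scheme in uses_netloc, b: url[:2] == '//'
rows = []
for a, b in product([False, True], repeat=2):
    combined = (a and b) or b  # `and` binds tighter than `or`
    rows.append((a, b, combined))

for row in rows:
    print(row)

# The combined condition always equals b alone.
print(all(combined == b for _, b, combined in rows))
```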
|
msg99560 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2010-02-19 07:47 |
Fixed in r78234 and merged back to the other branches.
I fell back to the RFC's definition of a scheme, as anything before the ://.
I did not see the need to add s3 specifically as a valid scheme, because s3 itself is not a registered scheme.
So, the fix should work for s3 and other undefined schemes as per RFC.
Thanks for the patch.
|
msg104261 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2010-04-26 17:38 |
The fix for this bug breaks any code which worked with non-standard
schemes in 2.6.4 (by working around the issue). This kind of backward
incompatibility should be called out prominently in NEWS.txt (assuming
that such a fix is considered appropriate in a third-dot release).
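
As an illustration of the kind of code affected, here is a sketch of a helper that accepts either style of result. The name normalize_split is hypothetical, and the sketch ends the netloc only at the next '/'.

```python
def normalize_split(parts):
    """Hypothetical helper: return a 5-tuple with the netloc split off.

    Accepts a urlsplit-style 5-tuple from before or after the fix.
    """
    scheme, netloc, path, query, fragment = parts
    if not netloc and path.startswith('//'):
        # Pre-fix style: the authority is still at the front of the path.
        end = path.find('/', 2)
        if end < 0:
            end = len(path)
        netloc, path = path[2:end], path[end:]
    return (scheme, netloc, path, query, fragment)

old_style = ('s3', '', '//example/files/photos/161565.jpg', '', '')
new_style = ('s3', 'example', '/files/photos/161565.jpg', '', '')
print(normalize_split(old_style) == normalize_split(new_style))
```

A workaround that instead assumes the netloc is always at the front of the path would mis-parse the post-fix result, which is the kind of breakage described here.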
|
msg105078 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-05-05 19:14 |
I remember seeing a discussion in the python-dev archives about that months or years ago. Someone pointed out to Guido that the new RFC removed the need for uses_netloc thanks to the generic syntax. Isn’t there already a bug about that?
|
msg123300 - (view) |
Author: Fred Drake (fdrake) |
Date: 2010-12-03 22:33 |
Though msg104261 suggests this change be documented in NEWS.txt, it doesn't appear to have made it.
Sure enough, we just found application code that this broke.
|
msg123327 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2010-12-04 10:02 |
On Fri, Dec 03, 2010 at 10:33:50PM +0000, Fred L. Drake, Jr. wrote:
> Though msg104261 suggests this change be documented in NEWS.txt, it
> doesn't appear to have made it.
Better late than never. I just added the NEWS entry in r87014 (py3k), r87015 (release31-maint), and r87016 (release27-maint).
|
Date | User | Action | Args
2022-04-11 14:56:57 | admin | set | github: 52152
2010-12-04 10:02:42 | orsenthil | set | messages: + msg123327
2010-12-03 22:33:47 | fdrake | set | nosy: + fdrake; messages: + msg123300
2010-05-05 19:14:32 | eric.araujo | set | nosy: + eric.araujo; messages: + msg105078
2010-04-26 17:38:54 | tseaver | set | nosy: + tseaver; messages: + msg104261
2010-02-19 07:47:30 | orsenthil | set | status: open -> closed; resolution: fixed; messages: + msg99560
2010-02-17 21:09:37 | mbloore | set | files: + fix7904-2.txt; messages: + msg99480
2010-02-12 21:13:58 | mbloore | set | nosy: orsenthil, ezio.melotti, mbloore, r.david.murray; components: + Library (Lib), - Extension Modules; versions: + Python 3.1, Python 3.2
2010-02-12 21:12:06 | mbloore | set | nosy: orsenthil, ezio.melotti, mbloore, r.david.murray; messages: + msg99290; components: + Extension Modules, - Library (Lib); versions: - Python 3.1, Python 3.2
2010-02-12 13:41:48 | r.david.murray | set | nosy: + r.david.murray; messages: + msg99265; versions: + Python 3.1, Python 3.2
2010-02-12 02:58:48 | orsenthil | set | nosy: orsenthil, ezio.melotti, mbloore; messages: + msg99256; components: + Library (Lib), - Extension Modules
2010-02-11 18:20:37 | mbloore | set | files: + fix7904.txt; messages: + msg99229; title: urllib.urlparse mishandles novel schemes -> urlparse.urlsplit mishandles novel schemes
2010-02-11 04:53:11 | mbloore | set | messages: + msg99198
2010-02-11 03:48:18 | orsenthil | set | assignee: orsenthil; messages: + msg99196; nosy: + orsenthil
2010-02-10 23:28:06 | ezio.melotti | set | priority: normal; versions: + Python 2.6, Python 2.7, - Python 2.5; nosy: + ezio.melotti; messages: + msg99183; stage: test needed
2010-02-10 23:24:49 | mbloore | create |