Issue 2583: urlparse normalize URL path

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46835

classification

Title:	urlparse normalize URL path
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 2.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	facundobatista, monk.e.boy, orsenthil
Priority:	normal	Keywords:

Created on 2008-04-08 13:56 by monk.e.boy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg65162 - (view)	Author: monk.e.boy (monk.e.boy)	Date: 2008-04-08 13:56
Hi,

This is my first problem with anything Python :-) and my first issue.

Doing the following:

>>> urlparse.urljoin('http://site.com/', '../../../../path/')
'http://site.com/../../../../path/'
>>> urlparse.urljoin('http://site.com/', '/path/../path/.././path/./')
'http://site.com/path/../path/.././path/./'

These URLs are normalized to http://site.com/path/ in both Firefox and Google (the Google spider would follow these OK).

I think the documentation could be improved to point at the normpath function in posixpath.py and how it solves the above.

I blogged a how-to: http://teethgrinder.co.uk/blog/Normalize-URL-path-python/

I hope my bug report is OK. Thanks for all the code :-)

johng@neutralize.com
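A minimal sketch of the posixpath.normpath approach mentioned in this message, not the code from the linked blog post. The helper name normalize_url_path is hypothetical, and the import shown is the Python 3 spelling (in Python 2 the same functions live in the urlparse module). Note that normpath also collapses repeated slashes inside the path, which may not be wanted for every URL.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit  # Python 2: from urlparse import ...


def normalize_url_path(url):
    """Collapse '.' and '..' segments in the path of an absolute URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path
    if not path:
        return url

    normalized = posixpath.normpath(path)

    # posixpath.normpath keeps exactly two leading slashes (a POSIX rule);
    # reduce them to one so the result is a plain absolute path.
    if normalized.startswith('//'):
        normalized = '/' + normalized.lstrip('/')

    # normpath drops the trailing slash; restore it when the original path
    # ended in a directory reference.
    if path.endswith(('/', '/.', '/..')) and not normalized.endswith('/'):
        normalized += '/'

    return urlunsplit(
        (parts.scheme, parts.netloc, normalized, parts.query, parts.fragment)
    )


first = normalize_url_path('http://site.com/../../../../path/')
second = normalize_url_path('http://site.com/path/../path/.././path/./')
print(first)
print(second)
```

Both calls should yield http://site.com/path/, matching the browser behaviour described above.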
msg66890 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2008-05-16 03:48
Just try it this way.

>>> print urlparse.urljoin('http://site.com/', 'path/../path/.././path/./')
http://site.com/path/
>>>

The difference is the initial '/' in the second argument.

Human interpretation is: go to http://site.com/ and
1) go to the path directory
2) go one level up (/../), which results in site.com again
3) go to the path directory
4) go one level up (..), which results in site.com
5) stay in the same directory (.)
6) go to path
7) stay there (.)

The final result is http://site.com/path/

When you start the path with a '/':

>>> print urlparse.urljoin('http://site.com/', '/path/../path/.././path/./')
http://site.com/path/../path/.././path/./

RFC 1808 suggests the following:

urlparse.urljoin('http://a/b/c/d', '/./g') = <URL:http://a/./g>

The argument is taken as a complete path on the server. The way to use this would be:

>>> print urlparse.urljoin('http://site.com/', 'path/../path/.././path/./')
http://site.com/path/
>>>

This is not a bug and can be closed.
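The comparison above can be reproduced with a short script. This is a sketch, assuming either Python 2 (urlparse module, as in this issue) or Python 3 (urllib.parse). The result for the reference with a leading '/' depends on the interpreter version: the Python 2.5 behaviour reported here keeps the dot segments, while Python 3.5 and later follow RFC 3986 and remove them.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used in this issue

base = 'http://site.com/'

# Relative reference: merged with the base path, dot segments collapsed.
relative_result = urljoin(base, 'path/../path/.././path/./')

# Absolute-path reference (leading '/'): replaces the base path outright.
# Whether its dot segments are removed depends on the Python version.
absolute_result = urljoin(base, '/path/../path/.././path/./')

print(relative_result)
print(absolute_result)
```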
msg66892 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2008-05-16 03:51
Btw, thank you for the exciting report, monk.e.boy. :-) There are many hidden bugs in urlparse and urllib*. I hope you will have a fun time finding them (and fixing them too :)

And one general comment: even if the bug is valid, the official Python documentation cannot be made to reference a blog site. Instead, a patch to fix the Python docs would itself be welcome.
msg67141 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-05-21 00:25
Not a bug...

History
Date	User	Action	Args
2022-04-11 14:56:33	admin	set	github: 46835
2008-05-21 00:25:35	facundobatista	set	status: open -> closed resolution: not a bug messages: + msg67141 nosy: + facundobatista
2008-05-16 03:51:06	orsenthil	set	messages: + msg66892
2008-05-16 03:48:38	orsenthil	set	nosy: + orsenthil messages: + msg66890
2008-04-08 13:56:58	monk.e.boy	create