This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: urlparse normalize URL path
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: facundobatista, monk.e.boy, orsenthil
Priority: normal Keywords:

Created on 2008-04-08 13:56 by monk.e.boy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg65162 - Author: monk.e.boy (monk.e.boy) Date: 2008-04-08 13:56
Hi,
  This is my first problem with anything Python :-) and my first issue.

  Doing the following:

  >>> urlparse.urljoin('http://site.com/', '../../../../path/')
  'http://site.com/../../../../path/'

  >>> urlparse.urljoin('http://site.com/', '/path/../path/.././path/./')
  'http://site.com/path/../path/.././path/./'

Both Firefox and Google normalize these URLs to http://site.com/path/
(the Google spider would follow them correctly).

  I think the documentation could be improved by pointing to the
normpath function in posixpath.py and explaining how it solves the
above. I blogged a how-to:

http://teethgrinder.co.uk/blog/Normalize-URL-path-python/
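A minimal sketch of the normpath approach, written for Python 3, where the urlparse module became urllib.parse. Note that since Python 3.5 urljoin itself follows RFC 3986 and removes dot segments, so on modern versions this mostly matters for URLs that did not come out of urljoin. The helper name normalize_url_path is hypothetical and not part of the standard library; posixpath.normpath, urlsplit and urlunsplit are the real functions it relies on.

```python
import posixpath
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_url_path(url):
    """Hypothetical helper: collapse '.' and '..' segments in a URL path.

    posixpath.normpath does the segment work; the trailing slash it strips
    is restored so that '/path/' still names a "directory".
    """
    parts = urlsplit(url)
    path = parts.path
    if not path:
        # normpath('') returns '.', which is not a valid URL path
        return url
    normalized = posixpath.normpath(path)
    # normpath drops the trailing slash; put it back when the original path
    # ended in '/', '/.' or '/..' (all of which refer to a directory)
    if path.endswith(('/', '/.', '/..')) and not normalized.endswith('/'):
        normalized += '/'
    # keep scheme, netloc, query and fragment exactly as they were
    return urlunsplit(parts._replace(path=normalized))


joined = urljoin('http://site.com/', '/path/../path/.././path/./')
print(normalize_url_path(joined))  # http://site.com/path/
```

Caveat: normpath treats the path as a filesystem path, so besides dropping '..' segments that climb above the root (the behavior described above for Firefox), it also collapses repeated slashes ('/a//b' becomes '/a/b'). That second effect is a side effect of this sketch, not something any URL RFC asks for.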

I hope my bug report is OK. Thanks for all the code :-)

johng@neutralize.com
msg66890 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-05-16 03:48
Just try it this way.
>>> print urlparse.urljoin('http://site.com/', 'path/../path/.././path/./')
http://site.com/path/
>>>

The difference is the initial '/' in the second argument.
Human interpretation is: go to http://site.com/ and then
1) go to the path directory
2) go one level up (..), which brings you back to site.com
3) go to the path directory
4) go one level up (..), back to site.com again
5) stay in the same directory (.)
6) go to path
7) stay there (.)
Final result is http://site.com/path/

When you start the path with a '/':
>>> print urlparse.urljoin('http://site.com/', '/path/../path/.././path/./')
http://site.com/path/../path/.././path/./

RFC 1808 gives the following example:
urlparse.urljoin('http://a/b/c/d','/./g') = <URL:http://a/./g>
A path that begins with '/' is taken as the complete path on the server,
so its '.' and '..' segments are not resolved.
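To make the distinction concrete, here is a sketch of the path part of RFC 1808's resolution algorithm (section 4, steps 5 and 6). resolve_path_rfc1808 is a hypothetical helper, not the code inside urlparse, and it ignores params, query and fragment. A reference path that begins with '/' is returned untouched; a relative one is appended to the base directory and then has its '.' and '..' segments removed, while a '..' that would climb above the root is kept, as in RFC 1808's own '../../../g' example.

```python
def _remove_dot_segments_1808(path):
    """RFC 1808 step 6: remove '.' and '<segment>/..' from a merged path."""
    segs = path.split('/')
    # steps 6a-6b: drop complete '.' segments; a final '.' becomes '' so the
    # trailing slash survives
    out = [s for s in segs[:-1] if s != '.']
    out.append('' if segs[-1] == '.' else segs[-1])
    # steps 6c-6d: repeatedly remove the leftmost '<segment>/..' pair, where
    # <segment> is neither '..' nor empty (an empty entry is the root before
    # the leading '/'; this sketch also leaves other empty segments alone)
    i = 1
    while i < len(out):
        if out[i] == '..' and out[i - 1] not in ('', '..'):
            at_end = i == len(out) - 1
            del out[i - 1:i + 1]
            if at_end:
                out.append('')  # 'b/c/..' -> 'b/' keeps the trailing slash
            i = max(i - 1, 1)  # the removal may expose a new pair to the left
        else:
            i += 1
    return '/'.join(out)


def resolve_path_rfc1808(base_path, ref_path):
    """Hypothetical helper: resolve only the <path> of a reference."""
    if not ref_path:
        return base_path  # step 5: an empty path inherits the base path
    if ref_path.startswith('/'):
        return ref_path  # step 5: an absolute path is used verbatim
    # step 6: drop the last segment of the base path, append the reference
    merged = base_path[:base_path.rfind('/') + 1] + ref_path
    return _remove_dot_segments_1808(merged)


print(resolve_path_rfc1808('/', 'path/../path/.././path/./'))   # /path/
print(resolve_path_rfc1808('/', '/path/../path/.././path/./'))  # /path/../path/.././path/./
```

Against the examples in this thread, the relative form resolves to '/path/' and the absolute form comes back unchanged, which matches the urljoin outputs quoted above.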


So the way to use it is:

>>> print urlparse.urljoin('http://site.com/', 'path/../path/.././path/./')
http://site.com/path/
>>>

This is not a bug and can be closed.
msg66892 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-05-16 03:51
Btw, thank you for the exciting report, monk.e.boy. :-)
There are many bugs hidden in urlparse and urllib*. I hope you will have
a fun time finding them (and fixing them too :)

And one general comment: even when a bug is valid, the official Python
documentation cannot be made to reference a blog site. Instead, a patch
to fix the Python docs would itself be welcome.
msg67141 - Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-05-21 00:25
Not a bug...
History
Date                 User            Action  Args
2022-04-11 14:56:33  admin           set     github: 46835
2008-05-21 00:25:35  facundobatista  set     status: open -> closed
                                             resolution: not a bug
                                             messages: + msg67141
                                             nosy: + facundobatista
2008-05-16 03:51:06  orsenthil       set     messages: + msg66892
2008-05-16 03:48:38  orsenthil       set     nosy: + orsenthil
                                             messages: + msg66890
2008-04-08 13:56:58  monk.e.boy      create