classification
Title: http.server._url_collapse_path should live elsewhere
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: chin, gregory.p.smith, martin.panter, orsenthil, r.david.murray, ta1hia
Priority: normal Keywords: easy, patch

Created on 2009-04-07 00:24 by gregory.p.smith, last changed 2019-09-10 21:01 by ta1hia.

Files
File name Uploaded Description Edit
issue5714_withdoc.diff chin, 2009-04-07 13:50 Moved CGIHTTPServer._url_collapse_path_split -> urlparse.url_collapse_path_split. Changed docs for urlparse. review
Messages (8)
msg85679 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-04-07 00:23
CGIHTTPServer._url_collapse_path_split should probably live in a more 
general library and become a documented API.  Perhaps in urlparse?
msg85699 - (view) Author: Leonid Vasilev (chin) Date: 2009-04-07 13:50
I'd Wrote a patch for this.

Perhaps some unittests are needed.
msg85701 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-04-07 14:44
Yes, unit tests are always needed for any change.

Gregory, why is a function that does two things (collapse and split) a
good candidate for a public API?  It results in a pretty awkward name :)

I'm not saying it's necessarily a bad idea, just looking for motivation.
msg85715 - (view) Author: Leonid Vasilev (chin) Date: 2009-04-07 15:21
Actually urlparse.urljoin implements RFC 2396

Is it true that 'CGIHTTPServer._url_collapse_path_split' is just a
inverted 'urlparse.urljoin' ?
"""
>>> urlparse.urljoin('http://a/b/c/','g')
'http://a/b/c/g'
>>> urlparse.url_collapse_path_split('http://a/b/c/g')
('/http:/a/b/c', 'g')
>>> urlparse.urlsplit('http://a/b/c/g')
SplitResult(scheme='http', netloc='a', path='/b/c/g', query='', fragment='')
"""

And there is existing function 'urlparse.urlsplit'.
I think "CGIHTTPServer._url_collapse_path_split" is just a customized
version of latter, and should be rewritten using 'urlparse.urlsplit'.
msg85729 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-04-07 17:09
urlparse.urljoin and urlparse.urlsplit do not do what is required for 
this function.  urljoin does not collapse paths.  urlsplit has nothing 
to do with paths.

I agree r.david.murray that it is odd that it does two functions at once 
(the collapse and the split).  I wrote it specifically for its current 
use case when checking paths to cgi scripts.

The unittests for it describe the exact behavior it needs to implement.  
Trying to implement a separate collapse function is approximately the 
same amount of code because the edge cases such as
'/a/b/' and '/a/b/c/..' which both need to result in it returning 
('/a/b', '') instead of ('/a', 'b') are why it made sense to keep as a 
single function for its current use.

Unittests for the function already exist in Lib/test/test_httpservers.py  
to describe its full behavior.
msg86872 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-05-01 09:12
Yes, the parsing best be done in urlparse.

As this function claims to do step 6 of RFC2396, I am surprised why it
cannot be done by using existing urlparse functions itself.(Because, the
tests for RFC2396 compliance covers those cases too).
msg221756 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-28 00:02
I've tried investigating this but I've got lost so I'll have to pass the buck.  I've got confused because of the code moving around in Python 2 and the library changes going to Python 3 :-(
msg252079 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-02 03:05
Some ideas for this sort of function (or perhaps a few related functions):

* Allow OS-independent handling of the path. E.g. the CGI server still has to check if each segment is a CGI script, the rest becomes PATH_INFO. Perhaps return a sequence of path segments rather than a string.
* Allow easy conversion to a valid OS path (or perhaps a pathlib object)
* Ensure that invalid and special OS filenames trigger an error (e.g. "NUL" or unsupported characters, including backslashes \, on Windows). Like a stricter version of pathlib’s is_reserved().
* Handle percent decoding

Also, see the SimpleHTTPRequestHandler.translate_path() method, and Issue 14567. There seems to be a lot of redundancy.
History
Date User Action Args
2019-09-10 21:01:24ta1hiasetnosy: + ta1hia
2019-02-24 22:35:12BreamoreBoysetnosy: - BreamoreBoy
2015-10-02 03:05:55martin.pantersetmessages: + msg252079
2015-03-31 02:36:23martin.pantersetnosy: + martin.panter
2014-06-30 05:35:00berker.peksagsettitle: CGIHTTPServer._url_collapse_path_split should live elsewhere -> http.server._url_collapse_path should live elsewhere
stage: test needed -> needs patch
2014-06-28 00:02:49BreamoreBoysetnosy: + BreamoreBoy

messages: + msg221756
versions: + Python 3.5, - Python 3.3
2011-03-09 02:04:08terry.reedysetnosy: gregory.p.smith, orsenthil, r.david.murray, chin
versions: + Python 3.3, - Python 3.1, Python 2.7
2009-05-01 09:12:04orsenthilsetnosy: + orsenthil
messages: + msg86872
2009-04-07 17:09:53gregory.p.smithsetmessages: + msg85729
2009-04-07 15:21:56chinsetmessages: + msg85715
2009-04-07 14:44:44r.david.murraysetnosy: + r.david.murray
messages: + msg85701

components: + Library (Lib)
stage: test needed
2009-04-07 13:50:12chinsetfiles: + issue5714_withdoc.diff

nosy: + chin
messages: + msg85699

keywords: + patch
2009-04-07 00:24:03gregory.p.smithcreate