Issue 5714: http.server._url_collapse_path should live elsewhere

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49964

classification

Title:	http.server._url_collapse_path should live elsewhere
Type:	enhancement	Stage:	needs patch
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	chin, gregory.p.smith, martin.panter, orsenthil, r.david.murray, ta1hia
Priority:	normal	Keywords:	easy, patch

Created on 2009-04-07 00:24 by gregory.p.smith, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description	Edit
issue5714_withdoc.diff	chin, 2009-04-07 13:50	Moved CGIHTTPServer._url_collapse_path_split -> urlparse.url_collapse_path_split. Changed docs for urlparse.	review

Messages (8)
msg85679 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-04-07 00:23
CGIHTTPServer._url_collapse_path_split should probably live in a more general library and become a documented API. Perhaps in urlparse?
msg85699 - (view)	Author: Leonid Vasilev (chin)	Date: 2009-04-07 13:50
I'd Wrote a patch for this. Perhaps some unittests are needed.
msg85701 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2009-04-07 14:44
Yes, unit tests are always needed for any change. Gregory, why is a function that does two things (collapse and split) a good candidate for a public API? It results in a pretty awkward name :) I'm not saying it's necessarily a bad idea, just looking for motivation.
msg85715 - (view)	Author: Leonid Vasilev (chin)	Date: 2009-04-07 15:21
Actually urlparse.urljoin implements RFC 2396 Is it true that 'CGIHTTPServer._url_collapse_path_split' is just a inverted 'urlparse.urljoin' ? """ >>> urlparse.urljoin('http://a/b/c/','g') 'http://a/b/c/g' >>> urlparse.url_collapse_path_split('http://a/b/c/g') ('/http:/a/b/c', 'g') >>> urlparse.urlsplit('http://a/b/c/g') SplitResult(scheme='http', netloc='a', path='/b/c/g', query='', fragment='') """ And there is existing function 'urlparse.urlsplit'. I think "CGIHTTPServer._url_collapse_path_split" is just a customized version of latter, and should be rewritten using 'urlparse.urlsplit'.
msg85729 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-04-07 17:09
urlparse.urljoin and urlparse.urlsplit do not do what is required for this function. urljoin does not collapse paths. urlsplit has nothing to do with paths. I agree r.david.murray that it is odd that it does two functions at once (the collapse and the split). I wrote it specifically for its current use case when checking paths to cgi scripts. The unittests for it describe the exact behavior it needs to implement. Trying to implement a separate collapse function is approximately the same amount of code because the edge cases such as '/a/b/' and '/a/b/c/..' which both need to result in it returning ('/a/b', '') instead of ('/a', 'b') are why it made sense to keep as a single function for its current use. Unittests for the function already exist in Lib/test/test_httpservers.py to describe its full behavior.
msg86872 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2009-05-01 09:12
Yes, the parsing best be done in urlparse. As this function claims to do step 6 of RFC2396, I am surprised why it cannot be done by using existing urlparse functions itself.(Because, the tests for RFC2396 compliance covers those cases too).
msg221756 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014-06-28 00:02
I've tried investigating this but I've got lost so I'll have to pass the buck. I've got confused because of the code moving around in Python 2 and the library changes going to Python 3 :-(
msg252079 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-10-02 03:05
Some ideas for this sort of function (or perhaps a few related functions): * Allow OS-independent handling of the path. E.g. the CGI server still has to check if each segment is a CGI script, the rest becomes PATH_INFO. Perhaps return a sequence of path segments rather than a string. * Allow easy conversion to a valid OS path (or perhaps a pathlib object) * Ensure that invalid and special OS filenames trigger an error (e.g. "NUL" or unsupported characters, including backslashes \, on Windows). Like a stricter version of pathlib’s is_reserved(). * Handle percent decoding Also, see the SimpleHTTPRequestHandler.translate_path() method, and Issue 14567. There seems to be a lot of redundancy.

History
Date	User	Action	Args
2022-04-11 14:56:47	admin	set	github: 49964
2019-09-10 21:01:24	ta1hia	set	nosy: + ta1hia
2019-02-24 22:35:12	BreamoreBoy	set	nosy: - BreamoreBoy
2015-10-02 03:05:55	martin.panter	set	messages: + msg252079
2015-03-31 02:36:23	martin.panter	set	nosy: + martin.panter
2014-06-30 05:35:00	berker.peksag	set	title: CGIHTTPServer._url_collapse_path_split should live elsewhere -> http.server._url_collapse_path should live elsewhere stage: test needed -> needs patch
2014-06-28 00:02:49	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg221756 versions: + Python 3.5, - Python 3.3
2011-03-09 02:04:08	terry.reedy	set	nosy: gregory.p.smith, orsenthil, r.david.murray, chin versions: + Python 3.3, - Python 3.1, Python 2.7
2009-05-01 09:12:04	orsenthil	set	nosy: + orsenthil messages: + msg86872
2009-04-07 17:09:53	gregory.p.smith	set	messages: + msg85729
2009-04-07 15:21:56	chin	set	messages: + msg85715
2009-04-07 14:44:44	r.david.murray	set	nosy: + r.david.murray messages: + msg85701 components: + Library (Lib) stage: test needed
2009-04-07 13:50:12	chin	set	files: + issue5714_withdoc.diff nosy: + chin messages: + msg85699 keywords: + patch
2009-04-07 00:24:03	gregory.p.smith	create