Issue 23255: SimpleHTTPRequestHandler refactor for more extensible usage.

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67444

classification

Title:	SimpleHTTPRequestHandler refactor for more extensible usage.
Type:	enhancement	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.6

process

Status:	languishing	Resolution:
Dependencies:		Superseder:
Assigned To:	berker.peksag	Nosy List:	berker.peksag, demian.brecht, last-ent, martin.panter, ned.deily
Priority:	normal	Keywords:	needs review, patch

Created on 2015-01-17 13:51 by last-ent, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description	Edit
simplehttp.patch	last-ent, 2015-01-17 13:51	Patch for http.server.SimpleHTTPRequestHandler.send_head	review
simplehttp1.patch	last-ent, 2015-01-17 14:11	base_files = ['index.htm', 'index.html']	review
simplehttp-fcn-rnm-doc.patch	last-ent, 2015-01-18 06:43	Doc String update + functions renamed	review
helpers-unittests-docs.patch	last-ent, 2015-01-20 09:26	Helper Functions, Unit Tests, Documentation
jan24th.patch	last-ent, 2015-01-24 19:43	All the above + stuff in msg234562	review
jan25.patch	last-ent, 2015-01-25 07:26	Patch after serious critique
jan27th.patch	last-ent, 2015-01-27 17:47	Deeper look at patch for public use.	review
index-test.patch	martin.panter, 2015-02-01 00:28	Enhance index.html test	review
Feb1st.patch	last-ent, 2015-02-01 06:37	index_files served for all directories & then some.	review
index-test.2.patch	martin.panter, 2015-02-01 09:08	Moved to Issue 23440	review
Feb1stb.patch	last-ent, 2015-02-01 13:28	Removed duplicate apply_headers. Used assertIn, removed "//" in redirect testcase	review

Messages (29)
msg234165 - (view)	Author: Ent (last-ent) *	Date: 2015-01-17 13:51
Use of http.server.BaseHTTPRequestHandler for exploratory or simple usage, SimpleHTTPRequestHandler provides a good platform to start on. However, the current code in SimpleHTTPRequestHandler's send_head is tightly coupled together as a single unit. This patch aims to break send_head down into composable parts so that any developer can subclass SimpleHTTPRequestHandler to create one's own HTTPRequestHandler class with their own custom behaviour without having to rewrite a lot of repeated code. For example consider a developer wanting to address two usecases * Use SimpleHTTPRequestHandler, but use index.html from a different directory. * For certain GET urls, call on a specific function to response. * Use custom html page instead of index.html class CustomHTTPRequestHandler(SimpleHTTPRequestHandler): def do_GET(self): if self.path =='/': f = self.home_head() elif self.path in self.custom_paths: f = self.special_head() else: f = self.send_head() # ... # Standard Code As a result of the patch, in above scenario instead of copy-pasting logic for browser redirection, getting object for file or directory and, applying headers upon success; Developer can use methods redirect_browser, get_dir_or_html_dir_path and, apply_headers to reduce the code. Basically, applying DRY principle. Note: Since this is but refactoring of existing code without any new functionality being added, no test cases are provided.
msg234166 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-01-17 13:55
> + base_files = ['index.html', 'index.html'] I guess this should be ['index.htm', 'index.html'].
msg234167 - (view)	Author: Ent (last-ent) *	Date: 2015-01-17 14:11
Changing base_files to point @ ['index.htm', 'index.html']
msg234224 - (view)	Author: Ent (last-ent) *	Date: 2015-01-18 06:43
@vadmium: My Mistake. It should read "file path" not "file object". (500 error when submitting to review page.) Renaming get_html_or_dir_path to get_path_or_dir for accurate description. Also renaming copyfile to more pythonic copy_file.
msg234305 - (view)	Author: Ent (last-ent) *	Date: 2015-01-19 07:16
I am having some issues with replying to comments on review page. It is giving 500 error hence posting replies here. @berker.peksag: Thanks! I will add the methods' documentation & examples into the Doc/library/http.server.rst. As well as include a few unit tests. I searched through github and there doesn't seem to be many examples of "BaseHTTPRequestHandler copyfile". Still if you feel it is better not to make this change, I will revert it back. I have submitted the contribuer-form on the same day as I submitted this patch. Waiting for it to be accepted. Thanks again.
msg234362 - (view)	Author: Ent (last-ent) *	Date: 2015-01-20 09:26
Following is updated patch with * Refactored code with helper functions * Unit Tests * Documentation - Explanation + Examples SimpleHTTPRequestHandler's copyfile has been renamed to copy_file but not shutils'.
msg234562 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2015-01-23 17:10
Thanks for the work! I'm not sure why the last patch doesn't appear on Rietveld, so (unfortunately) here's the result of my review. I've only covered functional aspects in this run at it: + base_files = ['index.html', 'index.htm'] Can you use "index_files"? That's the commonly used term to refer to these files. - def copyfile(self, source, outputfile): + def copy_file(self, source, outputfile): As Berker mentioned, we can't just rename this as it's not backwards compatible. Standard library modules aren't the only dependency. Such a change would cause breakage in, say, any 3rd party code that subclasses SimpleHTTPRequestHandler. This should remain as copyfile. + def redirect_browser(self, path, parts): Can "_browser" be dropped here? This applies to all clients, not only browsers. Additionally, I think that the name is a little misleading. I think it would be better to have a generic public redirect(<url>, status=http.FOUND) method and then an internal _resolve_path() that calls into the redirect method using the Apache-like logic. It also seems like the path parameter is unused so should be dropped. + def get_path_or_dir(self, path): I think the content of this method should be changed to result in consistent output. Right now, you're either returning a file path /or/ a BytesIO object containing the full output of the directory listing. It might make sense to have a single method that takes the path and produces consistent BytesIO object, regardless of whether it's a directory or a file path. + self.send_response(301) Please use the http.HTTPStatus enum for status codes (i.e. http.HTTPStatus.MOVED_PERMANENTLY) + def apply_headers(self, f, path) I don't think that this makes sense as a public API as it is as it only accounts for a 200 response. What if any error conditions are raised with the given file? As this is functionality specific to the single case in which it's used, I think this should either be left as-is, made more generic to handle header values based on any potential state of the file object, or made into a private helper method (indicated by a single underscore prefix to the method name). Also, you should be able to derive the path from the file parameter (f.name), so I'm not sure that the extra path parameter is necessary. + def get_response(self, tail=False) tail should default to None here, otherwise it's a little confusing as to why a parameter that /looks/ like it should be a bool actually expects a string value.
msg234564 - (view)	Author: Ent (last-ent) *	Date: 2015-01-23 17:34
@demian: That's a tall order! :) I would love to use HTTPStatus but for some reason http/__init__.py is devoid of code related to it - https://hg.python.org/cpython/file/31982d70a52a/Lib/http/__init__.py I wasn't sure why this change was made because it like feels a step backwards. Will someone else be reverting it back or should I just include it in this patch or create another one? As for rest of your comments, I will make the necessary changes and put up next patch.
msg234565 - (view)	Author: Ent (last-ent) *	Date: 2015-01-23 17:43
@demian: If you don't mind, could you please elaborate a bit more on `_resolve_path()` you mentioned in the review/comment? Or maybe link me to the type of behaviour you mentioned? I will accordingly make the changes. As for self.apply_headers, I will see if I can make it more generic. As it stands, I have tried not to make any radical changes in existing logic; This being my first patch and all.
msg234566 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-01-23 17:44
> I would love to use HTTPStatus but for some reason http/__init__.py is devoid of code related to it - https://hg.python.org/cpython/file/31982d70a52a/Lib/http/__init__.py See the default branch: https://hg.python.org/cpython/file/default/Lib/http/__init__.py
msg234571 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2015-01-23 18:41
> @demian: If you don't mind, could you please elaborate a bit more on `_resolve_path()` you mentioned in the review/comment? Sure. In your patch, you have redirect_browser (or redirect if you renamed it), which sounds like it's allowing for a very generic event to happen: executing a redirect based on a passed in path. What I would expect from calling into this is that the input path would be used as the Location header in the resulting response. For example (using the code as-is): self.redirect_browser('http://example.com/foo/bar/') The above would result in the response Location header being set to input URL. What it's actually doing is a very specific thing: Redirection if the final char in the input URL is '/', which is something that I don't think needs to be exposed as part of the public API. So, my recommendation is to do something like this: def redirect(self, url, status=http.HTTPStatus.FOUND): self.send_response(status) self.send_header('Location', url) self.end_headers() Add a private helper method: def _resolve_path(self, url): parts = urllib.parse.urlsplit(url) [...snip...] return new_path_with_appended_slash And in the body of send_head, do something like: resolved_path = self._resolve_path(path) if path != resolved_path: self.redirect(resolved_path) > As it stands, I have tried not to make any radical changes in existing logic; This being my first patch and all. The tough thing with adding public API in general is that you have to consider both API readability as well as various use cases, not only the one that you're specifically writing code for. To make your first patch a little easier, you /could/ change the public API methods that you've added to private helper methods and rename them as it makes sense. At least in that way, you don't need to worry about making the methods generic and writing tests for each where functionality is enhanced. In reality, it seems that that's exactly what you're trying to achieve: Helper methods where you're grouping logical bodies of code rather than a full blown public API change (but I could be misunderstanding your intentions there). Hope that all makes sense.
msg234635 - (view)	Author: Ent (last-ent) *	Date: 2015-01-24 19:43
New patch for review! Let me know if anything is missing.
msg234657 - (view)	Author: Ent (last-ent) *	Date: 2015-01-25 07:26
Based on the comments of many good netizens, I have further updated the patch.
msg234753 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2015-01-26 16:40
Thanks for the extra effort on this to satisfy multiple people's opinions. It's never an easy thing, especially when dealing with a decentralized group. My following comments are largely based on the public API. I haven't done a full review of the changes made to the tests. All extraneous spaces should be removed. For example the first line here: + + redirect_path = self._redirect_path(path) + if redirect_path: > Lists the files we want the server to check and render to client. We can add/remove file names as required. Order of files denotes priority. Defaults to ['index.htm', 'index.html'] I think this might read a little better as: "Overridable index files. If any of these files are present in a request pointing to a directory on the server's file system, the contents of the file will be returned. Otherwise, a directory listing will be returned. Defaults to ['index.htm', 'index.html']. Order of files denotes priority. That said, I'm not a wordsmith so there may be a (much) better way of structuring this. > +class HTTPStatusType(Enum): This doesn't sit quite well with me and I think it really should be removed. There are very few cases (if any) that I can think of where this would be useful. Cases where I've encountered the need for a range check generally is around subsets (i.e. 200 and 201 mean entirely different things and /shouldn't/ be globbed together). Additionally (this is subjective), I find it a little strange that the value of an enum is a range. If for some reason it stays, I think the values should at least be sets allowing for O(1) lookup. > + def redirect(self, url, status=HTTPStatus.MOVED_PERMANENTLY): There should be an assertion in this block that the status is one of the redirect codes (301, 302, 308). Calling redirect('/foo', status=200) doesn't make sense. >+ if os.path.isfile(path): + return self.get_file(path) + + # `path` has file with filename in self.index_files + for index_file in self.index_files: + index = os.path.join(path, index_file) Before the directory-related logic executes, you could add a check in for isdir() and short circuit with a 404 rather than relying on an index file not being found and then running into list_directory(), which would result in a 404 with the message "No permission to list directory", which isn't actually correct in the case where the directory doesn't exist. > get_status_type, apply_success_headers, apply_headers I think that these three methods should be removed as they're superfluous, make the code less readable and is inconsistent with how responses and headers are sent throughout the rest of server.py. The general practice is: self.send_response() self.send_header() self.send_header() self.end_headers() The code in this patch should match that. get_status_type is not needed as it /should/, if anything, be a single line passthrough to a reverse lookup mapping. Iterating over each enum followed by an iteration over each value is not optimal behaviour (O(N) rather than O(1)).
msg234830 - (view)	Author: Ent (last-ent) *	Date: 2015-01-27 17:47
Latest patch with simpler(?) logic? @Demian: This is a tough task! And I appreciate your kind words. I have gone through your comments and I have made a few changes as per your suggestion but I have refrained from a few as well. > get_status_type, apply_success_headers, apply_headers The reason I wrote these three methods is that when a new dev wants to try out the library, he shouldn't have to handcode all the boilerplate. It makes no sense to to re-write this part of code for each new implementation. It can always be overwritten when required and it follows existing functionality. But I understand your idea of what should be part of Public API and I have changed the access levels of most of such methods to class methods. I have updated `get_status_type` to be of O(1) (hopefully) and removed HTTPStatusType. I think since it is quite easy to overwrite `apply_headers` & `_get_status_type`, the default implementation can be straight up used by anyone with no modification and still yield satisfactory behaviour. As for Doc, I really have no idea how to make it better. Maybe someone else can submit a patch later on? Or if you are fine, I will include your own Doc. Thanks again.
msg234833 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-01-27 18:22
Thanks for the updated patch. I will tweak the docs before I commit the patch.
msg234948 - (view)	Author: Ent (last-ent) *	Date: 2015-01-29 05:32
Thanks for the update! I wasn't expecting this to be such a friendly & positive experience. Glad to be proven wrong :)
msg235137 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-01 00:28
Here is an enhancement to the existing SimpleHTTPServerTestCase.test_get() test case, that demonstrates the current implementation breaks serving index.html files by default.
msg235144 - (view)	Author: Ent (last-ent) *	Date: 2015-02-01 06:37
I have updated the patch such that for any directory, if they have a file with name in index_files, it will be served by default. Also a few tweaks.
msg235149 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-01 09:08
Here is another addition to the existing test suite to detect the bug with the duplicate 404 Not Found responses. It relies on running the non-Windows, non-root test that says # chmod() doesn't work as expected on Windows, and filesystem # permissions are ignored by root on Unix In other words, you trigger it by requesting an unreadable file or directory. Thanks for your replies to my comments, I think I am understanding your point of view better now. You want functions to either return a file object, or to mutate the connection and send part of the response, but not both (except for sending an error response): * get_file() returns a file object only * redirect() sends response headers only I am trying find a neater way to avoiding calling apply_headers() when list_directory() has already sent the headers. How do you feel about calling apply_headers() from inside get_file_or_dir() and get_index_file()? Or at least put a big fat warning saying get_file_or_dir() may or may not have already sent the headers, indicated by the type of file returned. Another option might be to split list_directory() into a get-file stage and an apply-headers stage. But in my experience the headers to send are usually tightly coupled with the content body. In this case both vary depending on the redirect, generated directory, or static file cases.
msg235167 - (view)	Author: Ent (last-ent) *	Date: 2015-02-01 13:28
@vadmium: Thanks for the suggestion - I opted for the first one. While I wasn't happy with it being called twice, wasn't getting an idea of how to address it. Am I supposed to include your patch as part of my patch? I am not much knowledgeable about the protocol to follow in such cases, so I haven't included it.
msg235243 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-02 11:50
You’re welcome to merge my test patch into yours if you want to. Or I could open a separate issue for it, I don’t mind.
msg235387 - (view)	Author: Ent (last-ent) *	Date: 2015-02-04 15:17
No I think it's better if you put up a separate patch. That way any questions other reviewers will have, you will be better suited to answer them. Cheers!
msg235731 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-11 07:45
Opened Issue 23440 for my test changes
msg245444 - (view)	Author: Ent (last-ent) *	Date: 2015-06-17 19:47
Hello, Since this patch is in acceptable state, should the Status or Resolution be changed so that it is flagged to be merged into Python 3.5? Thanks.
msg245446 - (view)	Author: Ned Deily (ned.deily) *	Date: 2015-06-17 20:37
Ent, thanks for all your work on this, and thanks to Demian and Martin for their reviews. In the meantime, the Python 3.5 release cycle has reached feature code cutoff so, at this point, generally only bug fixes are being accepted into 3.5, which means this refactoring will likely land in 3.6. The next step is for a core developer to do a final review and then decide whether to commit it. Berker has been commenting on this issue so perhaps he will have time to handle it as 3.6 gets underway. (http is one of the modules/packages in the standard library where currently no core developer has volunteered to be its "expert" so we all share responsibility for it as time and interest permits. https://docs.python.org/devguide/experts.html)
msg245458 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-06-18 10:01
Ned is correct. I started to review the patch, but couldn't find time to do a complete review. I'll take a look at it in a week or two. Thanks!
msg245472 - (view)	Author: Ent (last-ent) *	Date: 2015-06-18 16:19
Thanks Ned & Berker, I can only imagine the amount of work the core devs have to deal with. Hope my patch makes it through in next version. Regards, Ent
msg253416 - (view)	Author: Ent (last-ent) *	Date: 2015-10-24 20:08
Hi, Is it possible for this patch to be reviewed now? Regards, Ent

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67444
2016-02-02 21:37:44	last-ent	set	status: open -> languishing
2015-10-24 20:08:57	last-ent	set	messages: + msg253416
2015-06-18 16:19:16	last-ent	set	messages: + msg245472
2015-06-18 10:01:48	berker.peksag	set	assignee: berker.peksag messages: + msg245458
2015-06-17 20:37:12	ned.deily	set	versions: + Python 3.6, - Python 3.5 nosy: + ned.deily messages: + msg245446 keywords: + needs review
2015-06-17 19:47:39	last-ent	set	messages: + msg245444
2015-02-11 07:45:28	martin.panter	set	messages: + msg235731
2015-02-04 15:17:16	last-ent	set	messages: + msg235387
2015-02-02 11:50:38	martin.panter	set	messages: + msg235243
2015-02-01 13:28:05	last-ent	set	files: + Feb1stb.patch messages: + msg235167
2015-02-01 09:08:22	martin.panter	set	files: + index-test.2.patch messages: + msg235149
2015-02-01 06:37:54	last-ent	set	files: + Feb1st.patch messages: + msg235144
2015-02-01 00:28:25	martin.panter	set	files: + index-test.patch messages: + msg235137
2015-01-29 05:32:45	last-ent	set	messages: + msg234948
2015-01-27 18:22:11	berker.peksag	set	messages: + msg234833
2015-01-27 17:47:25	last-ent	set	files: + jan27th.patch messages: + msg234830
2015-01-26 16:40:48	demian.brecht	set	messages: + msg234753
2015-01-25 07:26:11	last-ent	set	files: + jan25.patch messages: + msg234657
2015-01-24 19:43:49	last-ent	set	files: + jan24th.patch messages: + msg234635
2015-01-23 18:41:14	demian.brecht	set	messages: + msg234571
2015-01-23 17:44:15	berker.peksag	set	messages: + msg234566
2015-01-23 17:43:32	last-ent	set	messages: + msg234565
2015-01-23 17:34:50	last-ent	set	messages: + msg234564
2015-01-23 17:10:39	demian.brecht	set	messages: + msg234562
2015-01-20 09:26:07	last-ent	set	files: + helpers-unittests-docs.patch messages: + msg234362
2015-01-20 02:00:45	demian.brecht	set	nosy: + demian.brecht
2015-01-19 07:16:38	last-ent	set	messages: + msg234305
2015-01-18 06:43:54	last-ent	set	files: + simplehttp-fcn-rnm-doc.patch messages: + msg234224
2015-01-17 21:12:40	martin.panter	set	nosy: + martin.panter
2015-01-17 14:11:59	last-ent	set	files: + simplehttp1.patch messages: + msg234167
2015-01-17 13:55:34	berker.peksag	set	messages: + msg234166
2015-01-17 13:54:32	berker.peksag	set	nosy: + berker.peksag stage: patch review versions: + Python 3.5, - Python 3.4
2015-01-17 13:51:58	last-ent	create