classification
Title: SimpleHTTPRequestHandler refactor for more extensible usage.
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: languishing Resolution:
Dependencies: Superseder:
Assigned To: berker.peksag Nosy List: berker.peksag, demian.brecht, last-ent, martin.panter, ned.deily
Priority: normal Keywords: needs review, patch

Created on 2015-01-17 13:51 by last-ent, last changed 2016-02-02 21:37 by last-ent.

Files
File name Uploaded Description Edit
simplehttp.patch last-ent, 2015-01-17 13:51 Patch for http.server.SimpleHTTPRequestHandler.send_head review
simplehttp1.patch last-ent, 2015-01-17 14:11 base_files = ['index.htm', 'index.html'] review
simplehttp-fcn-rnm-doc.patch last-ent, 2015-01-18 06:43 Doc String update + functions renamed review
helpers-unittests-docs.patch last-ent, 2015-01-20 09:26 Helper Functions, Unit Tests, Documentation
jan24th.patch last-ent, 2015-01-24 19:43 All the above + stuff in msg234562 review
jan25.patch last-ent, 2015-01-25 07:26 Patch after serious critique
jan27th.patch last-ent, 2015-01-27 17:47 Deeper look at patch for public use. review
index-test.patch martin.panter, 2015-02-01 00:28 Enhance index.html test review
Feb1st.patch last-ent, 2015-02-01 06:37 index_files served for all directories & then some. review
index-test.2.patch martin.panter, 2015-02-01 09:08 Moved to Issue 23440 review
Feb1stb.patch last-ent, 2015-02-01 13:28 Removed duplicate apply_headers. Used assertIn, removed "//" in redirect testcase review
Messages (29)
msg234165 - (view) Author: Ent (last-ent) * Date: 2015-01-17 13:51
Use of http.server.BaseHTTPRequestHandler for exploratory or simple usage, SimpleHTTPRequestHandler provides a good platform to start on. However, the current code in SimpleHTTPRequestHandler's send_head is tightly coupled together as a single unit.

This patch aims to break send_head down into composable parts so that any developer can subclass SimpleHTTPRequestHandler to create one's own HTTPRequestHandler class with their own custom behaviour without having to rewrite a lot of repeated code.

For example consider a developer wanting to address two usecases
* Use SimpleHTTPRequestHandler, but use index.html from a different directory.
* For certain GET urls, call on a specific function to response.
* Use custom html page instead of index.html

class CustomHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path =='/':
            f = self.home_head()
        elif self.path in self.custom_paths:
            f = self.special_head()
        else:
            f = self.send_head()
        # ...
        # Standard Code

As a result of the patch, in above scenario instead of copy-pasting logic for browser redirection, getting object for file or directory and, applying headers upon success; Developer can use methods redirect_browser, get_dir_or_html_dir_path and, apply_headers to reduce the code.

Basically, applying DRY principle.

Note: Since this is but refactoring of existing code without any new functionality being added, no test cases are provided.
msg234166 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-01-17 13:55
> +    base_files = ['index.html', 'index.html']

I guess this should be ['index.htm', 'index.html'].
msg234167 - (view) Author: Ent (last-ent) * Date: 2015-01-17 14:11
Changing base_files to point @  ['index.htm', 'index.html']
msg234224 - (view) Author: Ent (last-ent) * Date: 2015-01-18 06:43
@vadmium: My Mistake. It should read "file path" not "file object". (500 error when submitting to review page.)

Renaming get_html_or_dir_path to get_path_or_dir for accurate description.

Also renaming copyfile to more pythonic copy_file.
msg234305 - (view) Author: Ent (last-ent) * Date: 2015-01-19 07:16
I am having some issues with replying to comments on review page. It is giving 500 error hence posting replies here.

@berker.peksag: Thanks! I will add the methods' documentation & examples into the Doc/library/http.server.rst. As well as include a few unit tests.

I searched through github and there doesn't seem to be many examples of "BaseHTTPRequestHandler copyfile". Still if you feel it is better not to make this change, I will revert it back.

I have submitted the contribuer-form on the same day as I submitted this patch. Waiting for it to be accepted.

Thanks again.
msg234362 - (view) Author: Ent (last-ent) * Date: 2015-01-20 09:26
Following is updated patch with
* Refactored code with helper functions
* Unit Tests
* Documentation - Explanation + Examples


SimpleHTTPRequestHandler's copyfile has been renamed to copy_file but not shutils'.
msg234562 - (view) Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-01-23 17:10
Thanks for the work! I'm not sure why the last patch doesn't appear on Rietveld, so (unfortunately) here's the result of my review. I've only covered functional aspects in this run at it:


+    base_files = ['index.html', 'index.htm']

Can you use "index_files"? That's the commonly used term to refer to these files.

-    def copyfile(self, source, outputfile):
+    def copy_file(self, source, outputfile):

As Berker mentioned, we can't just rename this as it's not backwards compatible. Standard library modules aren't the only dependency. Such a change would cause breakage in, say, any 3rd party code that subclasses SimpleHTTPRequestHandler. This should remain as copyfile.

+    def redirect_browser(self, path, parts):

Can "_browser" be dropped here? This applies to all clients, not only browsers. Additionally, I think that the name is a little misleading. I think it would be better to have a generic public redirect(<url>, status=http.FOUND) method and then an internal _resolve_path() that calls into the redirect method using the Apache-like logic. It also seems like the path parameter is unused so should be dropped.

+    def get_path_or_dir(self, path):

I think the content of this method should be changed to result in consistent output. Right now, you're either returning a file path /or/ a BytesIO object containing the full output of the directory listing. It might make sense to have a single method that takes the path and produces consistent BytesIO object, regardless of whether it's a directory or a file path.

+            self.send_response(301)

Please use the http.HTTPStatus enum for status codes (i.e. http.HTTPStatus.MOVED_PERMANENTLY)

+    def apply_headers(self, f, path)

I don't think that this makes sense as a public API as it is as it only accounts for a 200 response. What if any error conditions are raised with the given file? As this is functionality specific to the single case in which it's used, I think this should either be left as-is, made more generic to handle header values based on any potential state of the file object, or made into a private helper method (indicated by a single underscore prefix to the method name).

Also, you should be able to derive the path from the file parameter (f.name), so I'm not sure that the extra path parameter is necessary.

+    def get_response(self, tail=False)

tail should default to None here, otherwise it's a little confusing as to why a parameter that /looks/ like it should be a bool actually expects a string value.
msg234564 - (view) Author: Ent (last-ent) * Date: 2015-01-23 17:34
@demian: That's a tall order! :)

I would love to use HTTPStatus but for some reason http/__init__.py is devoid of code related to it - https://hg.python.org/cpython/file/31982d70a52a/Lib/http/__init__.py

I wasn't sure why this change was made because it like feels a step backwards. Will someone else be reverting it back or should I just include it in this patch or create another one?

As for rest of your comments, I will make the necessary changes and put up next patch.
msg234565 - (view) Author: Ent (last-ent) * Date: 2015-01-23 17:43
@demian: If you don't mind, could you please elaborate a bit more on `_resolve_path()` you mentioned in the review/comment? Or maybe link me to the type of behaviour you mentioned? I will accordingly make the changes. As for self.apply_headers, I will see if I can make it more generic. As it stands, I have tried not to make any radical changes in existing logic; This being my first patch and all.
msg234566 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-01-23 17:44
> I would love to use HTTPStatus but for some reason http/__init__.py is devoid of code related to it - https://hg.python.org/cpython/file/31982d70a52a/Lib/http/__init__.py

See the default branch: https://hg.python.org/cpython/file/default/Lib/http/__init__.py
msg234571 - (view) Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-01-23 18:41
> @demian: If you don't mind, could you please elaborate a bit more on `_resolve_path()` you mentioned in the review/comment?

Sure. In your patch, you have redirect_browser (or redirect if you
renamed it), which sounds like it's allowing for a very generic event to
happen: executing a redirect based on a passed in path. What I would
expect from calling into this is that the input path would be used as
the Location header in the resulting response. For example (using the
code as-is):

self.redirect_browser('http://example.com/foo/bar/')

The above would result in the response Location header being set to
input URL. What it's actually doing is a very specific thing:
Redirection if the final char in the input URL is '/', which is
something that I don't think needs to be exposed as part of the public API.

So, my recommendation is to do something like this:

def redirect(self, url, status=http.HTTPStatus.FOUND):
    self.send_response(status)
    self.send_header('Location', url)
    self.end_headers()

Add a private helper method:

def _resolve_path(self, url):
    parts = urllib.parse.urlsplit(url)
    [...snip...]
    return new_path_with_appended_slash

And in the body of send_head, do something like:

resolved_path = self._resolve_path(path)
if path != resolved_path:
    self.redirect(resolved_path)

> As it stands, I have tried not to make any radical changes in existing
logic; This being my first patch and all.

The tough thing with adding public API in general is that you have to
consider both API readability as well as various use cases, not only the
one that you're specifically writing code for. To make your first patch
a little easier, you /could/ change the public API methods that you've
added to private helper methods and rename them as it makes sense. At
least in that way, you don't need to worry about making the methods
generic and writing tests for each where functionality is enhanced. In
reality, it seems that that's exactly what you're trying to achieve:
Helper methods where you're grouping logical bodies of code rather than
a full blown public API change (but I could be misunderstanding your
intentions there).

Hope that all makes sense.
msg234635 - (view) Author: Ent (last-ent) * Date: 2015-01-24 19:43
New patch for review! Let me know if anything is missing.
msg234657 - (view) Author: Ent (last-ent) * Date: 2015-01-25 07:26
Based on the comments of many good netizens,
I have further updated the patch.
msg234753 - (view) Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-01-26 16:40
Thanks for the extra effort on this to satisfy multiple people's opinions. It's never an easy thing, especially when dealing with a decentralized group. My following comments are largely based on the public API. I haven't done a full review of the changes made to the tests.


All extraneous spaces should be removed. For example the first line here:

+        
+        redirect_path = self._redirect_path(path)
+        if redirect_path:


> Lists the files we want the server to check and render to client. We can add/remove file names as required. Order of files denotes priority. Defaults to ['index.htm', 'index.html']

I think this might read a little better as:

"Overridable index files. If any of these files are present in a request pointing to a directory on the server's file system, the contents of the file will be returned. Otherwise, a directory listing will be returned. Defaults to ['index.htm', 'index.html']. Order of files denotes priority.

That said, I'm not a wordsmith so there may be a (much) better way of structuring this.

> +class HTTPStatusType(Enum):

This doesn't sit quite well with me and I think it really should be removed. There are very few cases (if any) that I can think of where this would be useful. Cases where I've encountered the need for a range check generally is around subsets (i.e. 200 and 201 mean entirely different things and /shouldn't/ be globbed together). Additionally (this is subjective), I find it a little strange that the value of an enum is a range.

If for some reason it stays, I think the values should at least be sets allowing for O(1) lookup.

> +    def redirect(self, url, status=HTTPStatus.MOVED_PERMANENTLY):

There should be an assertion in this block that the status is one of the redirect codes (301, 302, 308). Calling redirect('/foo', status=200) doesn't make sense.

>+         if os.path.isfile(path):
 +            return self.get_file(path)
 +
 +        # `path` has file with filename in self.index_files
 +        for index_file in self.index_files:
 +            index = os.path.join(path, index_file)

Before the directory-related logic executes, you could add a check in for isdir() and short circuit with a 404 rather than relying on an index file not being found and then running into list_directory(), which would result in a 404 with the message "No permission to list directory", which isn't actually correct in the case where the directory doesn't exist.

> get_status_type, apply_success_headers, apply_headers

I think that these three methods should be removed as they're superfluous, make the code less readable and is inconsistent with how responses and headers are sent throughout the rest of server.py. The general practice is:

self.send_response()
self.send_header()
self.send_header()
self.end_headers()

The code in this patch should match that.

get_status_type is not needed as it /should/, if anything, be a single line passthrough to a reverse lookup mapping. Iterating over each enum followed by an iteration over each value is not optimal behaviour (O(N) rather than O(1)).
msg234830 - (view) Author: Ent (last-ent) * Date: 2015-01-27 17:47
Latest patch with simpler(?) logic?

@Demian: This is a tough task! And I appreciate your kind words.

I have gone through your comments and I have made a few changes as per your suggestion but I have refrained from a few as well.

> get_status_type, apply_success_headers, apply_headers
The reason I wrote these three methods is that when a new dev wants to try out the library, he shouldn't have to handcode all the boilerplate. It makes no sense to to re-write this part of code for each new implementation. It can always be overwritten when required and it follows existing functionality. 

But I understand your idea of what should be part of Public API and I have changed the access levels of most of such methods to class methods.

I have updated `get_status_type` to be of O(1) (hopefully) and removed HTTPStatusType. 

I think since it is quite easy to overwrite `apply_headers` & `_get_status_type`, the default implementation can be straight up used by anyone with no modification and still yield satisfactory behaviour.

As for Doc, I really have no idea how to make it better. Maybe someone else can submit a patch later on? Or if you are fine, I will include your own Doc.

Thanks again.
msg234833 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-01-27 18:22
Thanks for the updated patch. I will tweak the docs before I commit the patch.
msg234948 - (view) Author: Ent (last-ent) * Date: 2015-01-29 05:32
Thanks for the update! I wasn't expecting this to be such a friendly & positive experience. Glad to be proven wrong :)
msg235137 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-01 00:28
Here is an enhancement to the existing SimpleHTTPServerTestCase.test_get() test case, that demonstrates the current implementation breaks serving index.html files by default.
msg235144 - (view) Author: Ent (last-ent) * Date: 2015-02-01 06:37
I have updated the patch such that for any directory, if they have a file with name in index_files, it will be served by default. Also a few tweaks.
msg235149 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-01 09:08
Here is another addition to the existing test suite to detect the bug with the duplicate 404 Not Found responses. It relies on running the non-Windows, non-root test that says

# chmod() doesn't work as expected on Windows, and filesystem
# permissions are ignored by root on Unix

In other words, you trigger it by requesting an unreadable file or directory.

Thanks for your replies to my comments, I think I am understanding your point of view better now. You want functions to either return a file object, or to mutate the connection and send part of the response, but not both (except for sending an error response):

* get_file() returns a file object only
* redirect() sends response headers only

I am trying find a neater way to avoiding calling apply_headers() when list_directory() has already sent the headers. How do you feel about calling apply_headers() from inside get_file_or_dir() and get_index_file()? Or at least put a big fat warning saying get_file_or_dir() may or may not have already sent the headers, indicated by the type of file returned.

Another option might be to split list_directory() into a get-file stage and an apply-headers stage. But in my experience the headers to send are usually tightly coupled with the content body. In this case both vary depending on the redirect, generated directory, or static file cases.
msg235167 - (view) Author: Ent (last-ent) * Date: 2015-02-01 13:28
@vadmium: Thanks for the suggestion - I opted for the first one. While I wasn't happy with it being called twice, wasn't getting an idea of how to address it. Am I supposed to include your patch as part of my patch? I am not much knowledgeable about the protocol to follow in such cases, so I haven't included it.
msg235243 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-02 11:50
You’re welcome to merge my test patch into yours if you want to. Or I could open a separate issue for it, I don’t mind.
msg235387 - (view) Author: Ent (last-ent) * Date: 2015-02-04 15:17
No I think it's better if you put up a separate patch. That way any questions other reviewers will have, you will be better suited to answer them.

Cheers!
msg235731 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-11 07:45
Opened Issue 23440 for my test changes
msg245444 - (view) Author: Ent (last-ent) * Date: 2015-06-17 19:47
Hello,

Since this patch is in acceptable state, should the Status or Resolution be changed so that it is flagged to be merged into Python 3.5?

Thanks.
msg245446 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-06-17 20:37
Ent, thanks for all your work on this, and thanks to Demian and Martin for their reviews. In the meantime, the Python 3.5 release cycle has reached feature code cutoff so, at this point, generally only bug fixes are being accepted into 3.5, which means this refactoring will likely land in 3.6.  The next step is for a core developer to do a final review and then decide whether to commit it.  Berker has been commenting on this issue so perhaps he will have time to handle it as 3.6 gets underway.  (http is one of the modules/packages in the standard library where currently no core developer has volunteered to be its "expert" so we all share responsibility for it as time and interest permits.

https://docs.python.org/devguide/experts.html)
msg245458 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-06-18 10:01
Ned is correct. I started to review the patch, but couldn't find time to do a complete review. I'll take a look at it in a week or two. Thanks!
msg245472 - (view) Author: Ent (last-ent) * Date: 2015-06-18 16:19
Thanks Ned & Berker,

I can only imagine the amount of work the core devs have to deal with.
Hope my patch makes it through in next version.

Regards,
Ent
msg253416 - (view) Author: Ent (last-ent) * Date: 2015-10-24 20:08
Hi,

Is it possible for this patch to be reviewed now?

Regards,
Ent
History
Date User Action Args
2016-02-02 21:37:44last-entsetstatus: open -> languishing
2015-10-24 20:08:57last-entsetmessages: + msg253416
2015-06-18 16:19:16last-entsetmessages: + msg245472
2015-06-18 10:01:48berker.peksagsetassignee: berker.peksag
messages: + msg245458
2015-06-17 20:37:12ned.deilysetversions: + Python 3.6, - Python 3.5
nosy: + ned.deily

messages: + msg245446

keywords: + needs review
2015-06-17 19:47:39last-entsetmessages: + msg245444
2015-02-11 07:45:28martin.pantersetmessages: + msg235731
2015-02-04 15:17:16last-entsetmessages: + msg235387
2015-02-02 11:50:38martin.pantersetmessages: + msg235243
2015-02-01 13:28:05last-entsetfiles: + Feb1stb.patch

messages: + msg235167
2015-02-01 09:08:22martin.pantersetfiles: + index-test.2.patch

messages: + msg235149
2015-02-01 06:37:54last-entsetfiles: + Feb1st.patch

messages: + msg235144
2015-02-01 00:28:25martin.pantersetfiles: + index-test.patch

messages: + msg235137
2015-01-29 05:32:45last-entsetmessages: + msg234948
2015-01-27 18:22:11berker.peksagsetmessages: + msg234833
2015-01-27 17:47:25last-entsetfiles: + jan27th.patch

messages: + msg234830
2015-01-26 16:40:48demian.brechtsetmessages: + msg234753
2015-01-25 07:26:11last-entsetfiles: + jan25.patch

messages: + msg234657
2015-01-24 19:43:49last-entsetfiles: + jan24th.patch

messages: + msg234635
2015-01-23 18:41:14demian.brechtsetmessages: + msg234571
2015-01-23 17:44:15berker.peksagsetmessages: + msg234566
2015-01-23 17:43:32last-entsetmessages: + msg234565
2015-01-23 17:34:50last-entsetmessages: + msg234564
2015-01-23 17:10:39demian.brechtsetmessages: + msg234562
2015-01-20 09:26:07last-entsetfiles: + helpers-unittests-docs.patch

messages: + msg234362
2015-01-20 02:00:45demian.brechtsetnosy: + demian.brecht
2015-01-19 07:16:38last-entsetmessages: + msg234305
2015-01-18 06:43:54last-entsetfiles: + simplehttp-fcn-rnm-doc.patch

messages: + msg234224
2015-01-17 21:12:40martin.pantersetnosy: + martin.panter
2015-01-17 14:11:59last-entsetfiles: + simplehttp1.patch

messages: + msg234167
2015-01-17 13:55:34berker.peksagsetmessages: + msg234166
2015-01-17 13:54:32berker.peksagsetnosy: + berker.peksag
stage: patch review

versions: + Python 3.5, - Python 3.4
2015-01-17 13:51:58last-entcreate