classification
Title: SimpleHTTPRequestHandler shouldn't redirect to directories with code 301
Type: Stage:
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: agnosticdev, berker.peksag, eric.smith, oulenz
Priority: normal Keywords:

Created on 2018-03-29 16:08 by oulenz, last changed 2018-04-03 11:02 by agnosticdev.

Messages (9)
msg314663 - (view) Author: Oliver Urs Lenz (oulenz) Date: 2018-03-29 16:08
SimpleHTTPRequestHandler.send_head() has this bit:

        if os.path.isdir(path):
            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)

https://github.com/python/cpython/blob/521995205a2cb6b504fe0e39af22a81f785350a3/Lib/http/server.py#L676

I think there are two issues here:
1) why should the server return a redirect code here, and not (in the code that immediately follows) when it serves an index file?
2) code 301 (permanent redirect) is really unforgiving, browsers like Firefox and Chrome will permanently cache the redirect, making it essentially impossible to undo if you do not control the client, and not trivial even if you do. This will probably not change on the browser-side, general opinion seems to be that limited caching should either be specified in the response header or that a different redirect code should be sent back.
https://lists.w3.org/Archives/Public/ietf-http-wg/2017OctDec/thread.html#msg363

Therefore I would like to propose that preferably,
- no redirect code be sent back, or else that
- a different redirect code be sent back, or else that
- no-caching or a time limit be added to the header

(This may require that send_head check for index files instead)
msg314687 - (view) Author: Matt Eaton (agnosticdev) * Date: 2018-03-30 15:54
Oliver,

A possible option that would work for both the client side caching and the server would be to pass back a Cache-Control header with a max-age attached when the 301 is returned.

I am thinking something like this:

if os.path.isdir(path):
    parts = urllib.parse.urlsplit(self.path)
    if not parts.path.endswith('/'):
        # redirect browser - doing basically what apache does
        self.send_response(HTTPStatus.MOVED_PERMANENTLY)
        new_parts = (parts[0], parts[1], parts[2] + '/',
                     parts[3], parts[4])
        new_url = urllib.parse.urlunsplit(new_parts)
        self.send_header("Location", new_url)
        self.send_header("Cache-Control", "max-age=600")
        self.end_headers()
        return None
    for index in "index.html", "index.htm":
        index = os.path.join(path, index)
        if os.path.exists(index):
            path = index
            break


Possibly we could even provide some way for the max age to be variable.
Thoughts?
msg314697 - (view) Author: Oliver Urs Lenz (oulenz) Date: 2018-03-30 17:22
That would definitely take the sting out of that permanent redirect. But I am still wondering why we redirect at all. Why not leave redirects to the user and do:

if os.path.isdir(path):
    if not path.endswith('/'):
        path = path + '/'
    for index in "index.html", "index.htm":
        index = os.path.join(path, index)
        if os.path.exists(index):
            path = index
            break
msg314698 - (view) Author: Oliver Urs Lenz (oulenz) Date: 2018-03-30 17:26
In fact, since we use os.path.join, we could remove that condition altogether:

if os.path.isdir(path):
    for index in "index.html", "index.htm":
        index = os.path.join(path, index)
        if os.path.exists(index):
            path = index
            break
msg314701 - (view) Author: Matt Eaton (agnosticdev) * Date: 2018-03-30 17:33
It looks like the 301 redirect is in place to emulate the behavior that Apache provides when it runs into a trailing slash.  This is observed when this condition is met:

parts = urllib.parse.urlsplit(self.path)
if not parts.path.endswith('/'):

Apache Directory Index Redirect:
http://httpd.apache.org/docs/current/mod/mod_dir.html#DirectoryIndexRedirect
msg314702 - (view) Author: Oliver Urs Lenz (oulenz) Date: 2018-03-30 17:48
Yes, but Apache's redirect can be turned off, whereas this can't, so it seems unnecessarily limiting to force this on users. Note that most of the motivation given by Apache doesn't apply here: with my proposed simplification, both types of indices would still be served whether the path contains a trailing slash or not.
msg314705 - (view) Author: Matt Eaton (agnosticdev) * Date: 2018-03-30 18:22
I agree with you in regards to the statement, "Apache's redirect can be turned off, whereas this can't."  That is why my first thought would be to try and control this response to the client a bit better by providing a variable max-age.

self.send_header("Cache-Control", "max-age=xxx")

Removing the condition completely would be good idea as well, but would be my second course of action due to this functionality being in the Python code base for awhile now.

if os.path.isdir(path):
    for index in "index.html", "index.htm":
        index = os.path.join(path, index)
        if os.path.exists(index):
            path = index
            break


Those are my thoughts on the matter at least, possibly someone else in the community could weigh in as well?
msg314737 - (view) Author: Matt Eaton (agnosticdev) * Date: 2018-03-31 12:05
Adding some core team members to the nosy list to get their take on this as well.
msg314876 - (view) Author: Matt Eaton (agnosticdev) * Date: 2018-04-03 11:02
Wanted to check in on this to see if there was any feedback by any community or core members?
History
Date User Action Args
2018-04-03 11:02:24agnosticdevsetmessages: + msg314876
2018-03-31 12:05:11agnosticdevsetnosy: + eric.smith, berker.peksag
messages: + msg314737
2018-03-30 18:22:15agnosticdevsetmessages: + msg314705
2018-03-30 17:48:35oulenzsetmessages: + msg314702
2018-03-30 17:33:20agnosticdevsetmessages: + msg314701
2018-03-30 17:26:26oulenzsetmessages: + msg314698
2018-03-30 17:22:57oulenzsetmessages: + msg314697
2018-03-30 15:54:26agnosticdevsetnosy: + agnosticdev
messages: + msg314687
2018-03-29 16:08:41oulenzcreate