
Empty response from http.server when directory listing contains invalid unicode #66361

Closed
jleedev mannequin opened this issue Aug 7, 2014 · 16 comments
Labels
OS-mac stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@jleedev
Mannequin

jleedev mannequin commented Aug 7, 2014

BPO 22165
Nosy @ronaldoussoren, @orsenthil, @vstinner, @ned-deily, @bitdancer, @hynek, @vadmium, @serhiy-storchaka, @jleedev, @demianbrecht
Files
  • issue22165.patch
  • issue22165_2.patch
  • test_undecodable_filename.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-08-19.03:19:01.155>
    created_at = <Date 2014-08-07.15:06:33.716>
    labels = ['OS-mac', 'type-bug', 'library']
    title = 'Empty response from http.server when directory listing contains invalid unicode'
    updated_at = <Date 2015-01-05.09:06:09.116>
    user = 'https://github.com/jleedev'

    bugs.python.org fields:

    activity = <Date 2015-01-05.09:06:09.116>
    actor = 'python-dev'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-08-19.03:19:01.155>
    closer = 'orsenthil'
    components = ['Library (Lib)', 'macOS']
    creation = <Date 2014-08-07.15:06:33.716>
    creator = 'jleedev'
    dependencies = []
    files = ['36304', '36388', '36390']
    hgrepos = []
    issue_num = 22165
    keywords = ['patch']
    message_count = 16.0
    messages = ['225016', '225030', '225031', '225035', '225036', '225407', '225409', '225426', '225427', '225428', '225430', '225431', '225433', '225435', '225444', '233449']
    nosy_count = 12.0
    nosy_names = ['ronaldoussoren', 'orsenthil', 'vstinner', 'ned.deily', 'durin42', 'r.david.murray', 'python-dev', 'hynek', 'martin.panter', 'serhiy.storchaka', 'jleedev', 'demian.brecht']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22165'
    versions = ['Python 3.4', 'Python 3.5']

    @jleedev
    Mannequin Author

    jleedev mannequin commented Aug 7, 2014

    While SimpleHTTPServer from Python 2 would happily spit out whatever bytes were in the directory listing, Python 3's http.server logs an error and closes the connection without responding to the HTTP request.

    $ mkdir $'\xff'
    $ ls
    \377/
    $ python3 -m http.server 
    Serving HTTP on 0.0.0.0 port 8000 ...

    Exception happened during processing of request from ('74.125.59.145', 19648)
    Traceback (most recent call last):
      File "/home/josh/local/lib/python3.5/socketserver.py", line 321, in _handle_request_noblock
        self.process_request(request, client_address)
      File "/home/josh/local/lib/python3.5/socketserver.py", line 347, in process_request
        self.finish_request(request, client_address)
      File "/home/josh/local/lib/python3.5/socketserver.py", line 360, in finish_request
        self.RequestHandlerClass(request, client_address, self)
      File "/home/josh/local/lib/python3.5/socketserver.py", line 681, in __init__
        self.handle()
      File "/home/josh/local/lib/python3.5/http/server.py", line 398, in handle
        self.handle_one_request()
      File "/home/josh/local/lib/python3.5/http/server.py", line 386, in handle_one_request
        method()
      File "/home/josh/local/lib/python3.5/http/server.py", line 677, in do_GET
        f = self.send_head()
      File "/home/josh/local/lib/python3.5/http/server.py", line 716, in send_head
        return self.list_directory(path)
      File "/home/josh/local/lib/python3.5/http/server.py", line 772, in list_directory
        % (urllib.parse.quote(linkname), html.escape(displayname)))
      File "/home/josh/local/lib/python3.5/urllib/parse.py", line 688, in quote
        string = string.encode(encoding, errors)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 0: surrogates not allowed
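The failure can be reproduced without a server. This sketch decodes the directory name's byte explicitly with the "surrogateescape" handler, which is what os.fsdecode() does on a typical Linux system with a UTF-8 filesystem encoding (an assumption about the reporter's setup). The last line, passing errors="surrogatepass" to urllib.parse.quote(), shows one possible way around the error for illustration; it is not presented as the patch attached to this issue.

```python
import urllib.parse

# b"\xff" is the directory name from the report. It is not valid UTF-8, so
# the "surrogateescape" handler turns it into the lone surrogate U+DCFF,
# the character named in the traceback.
name = b"\xff".decode("utf-8", "surrogateescape")

# quote() encodes str input as UTF-8 with errors="strict" by default, and
# strict UTF-8 refuses lone surrogates: this is the crash in list_directory.
failed = False
reason = None
try:
    urllib.parse.quote(name)
except UnicodeEncodeError as exc:
    failed = True
    reason = exc.reason  # "surrogates not allowed"

# errors="surrogatepass" encodes U+DCFF as the bytes ED B3 BF, which quote()
# then percent-encodes, so a link can still be emitted.
href = urllib.parse.quote(name, errors="surrogatepass")  # "%ED%B3%BF"
```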

    @jleedev jleedev mannequin added the stdlib Python modules in the Lib dir label Aug 7, 2014
    @bitdancer
    Member

    It should return a server error, I think.

    @bitdancer bitdancer added the type-bug An unexpected behavior, bug, or error label Aug 7, 2014
    @durin42
    Mannequin

    durin42 mannequin commented Aug 7, 2014

    Why not treat the filename as opaque bytes, and let the client fetch it anyway?
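One way to read this suggestion, sketched under the assumption that the server handles names as bytes throughout: os.listdir() returns bytes names when given a bytes path, and urllib.parse.quote_from_bytes() / urllib.parse.unquote_to_bytes() percent-encode and decode them without any text codec. The helper names href_for_name and href_to_name are hypothetical.

```python
import urllib.parse

def href_for_name(raw_name: bytes) -> str:
    # Percent-encode every byte outside the URL-safe set; no text codec is
    # involved, so invalid UTF-8 cannot raise here.
    return urllib.parse.quote_from_bytes(raw_name)

def href_to_name(href: str) -> bytes:
    # Inverse operation: recover the exact bytes the client asked for.
    return urllib.parse.unquote_to_bytes(href)

# In a server, raw names would come from os.listdir(b"some/dir"), which
# returns bytes when given a bytes path. Both names survive a round trip.
for raw in (b"\xff", b"caf\xc3\xa9 1.txt"):
    assert href_to_name(href_for_name(raw)) == raw
```

For example, href_for_name(b"\xff") gives "%FF". The catch is on the request side: urllib.parse.unquote(), which decodes as UTF-8 with errors="replace" by default, would turn "%FF" into U+FFFD, so request paths would have to be decoded with unquote_to_bytes() as well.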

    @bitdancer
    Member

    Because HTTP traffic is supposed to be either Latin-1 or whatever charset is specified (at least, that is my understanding), sending incorrectly encoded data seems wrong.

    On the other hand, we support Unix file systems that don't have well-defined charsets, so extending this to directory listings in HTTP isn't crazy. That does raise the question, though, of how to pass the bytes through Python 3's string model without breaking anything, so some careful thought may be required. I haven't looked at the details, though, so it might well be pretty simple.
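Python 3 already has a mechanism for carrying undecodable bytes through str: the "surrogateescape" error handler maps each invalid byte 0x80..0xFF to a lone surrogate U+DC80..U+DCFF on decoding and maps it back on encoding. A minimal sketch, assuming UTF-8 as the filesystem encoding and using hypothetical helper names, of building a listing as text and encoding it once at the end:

```python
import html
from typing import List

FS_ENCODING = "utf-8"  # assumption: a typical Linux filesystem encoding

def decode_names(raw_names: List[bytes]) -> List[str]:
    # Valid UTF-8 decodes normally; each invalid byte becomes U+DC80..U+DCFF.
    return [n.decode(FS_ENCODING, "surrogateescape") for n in raw_names]

def listing_body(names: List[str]) -> bytes:
    # Escape markup characters, join, then encode once. "surrogateescape"
    # turns each lone surrogate back into the original byte instead of
    # raising UnicodeEncodeError.
    page = "\n".join(html.escape(n, quote=False) for n in names)
    return page.encode(FS_ENCODING, "surrogateescape")

raw = [b"caf\xc3\xa9.txt", b"a<b", b"\xff"]
names = decode_names(raw)   # ["café.txt", "a<b", "\udcff"]
body = listing_body(names)  # b"caf\xc3\xa9.txt\na&lt;b\n\xff"
```

The concern about charsets does not go away: a page that declares charset=utf-8 would then contain a byte that is not valid UTF-8. The mechanism only guarantees that nothing is lost or raises on the way through str.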

    @serhiy-storchaka
    Member

    Here is a patch which fixes handling of undecodable paths in SimpleHTTPRequestHandler.
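Handling undecodable paths involves both directions: emitting a link for the name, and mapping the percent-encoded path a client sends back to the original filesystem bytes. A sketch of that round trip, assuming links are produced with quote(..., errors="surrogatepass") and a UTF-8 filesystem encoding; name_to_href and href_to_fs_bytes are hypothetical names, and this is not the attached patch:

```python
import urllib.parse

FS_ENCODING = "utf-8"  # assumption: typical Linux filesystem encoding

def name_to_href(name: str) -> str:
    # Lone surrogates (from undecodable bytes) are encoded as UTF-8
    # instead of raising, then percent-encoded.
    return urllib.parse.quote(name, errors="surrogatepass")

def href_to_fs_bytes(href: str) -> bytes:
    try:
        # Reverse of name_to_href: "%ED%B3%BF" decodes back to U+DCFF.
        name = urllib.parse.unquote(href, errors="surrogatepass")
    except UnicodeDecodeError:
        # Paths that are invalid UTF-8 even with surrogates allowed
        # (e.g. a hand-typed "%FF") fall back to the default "replace".
        name = urllib.parse.unquote(href)
    # "surrogateescape" maps U+DC80..U+DCFF back to raw bytes 0x80..0xFF.
    return name.encode(FS_ENCODING, "surrogateescape")

original = b"\xff"
href = name_to_href(original.decode(FS_ENCODING, "surrogateescape"))
# href == "%ED%B3%BF" and href_to_fs_bytes(href) == b"\xff"
```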

    @orsenthil
    Member

    The attached patch looks good to me. If a unit test for this situation can be added to test_httpservers.py, it will be comprehensive and good to go.

    @serhiy-storchaka
    Member

    Here is a patch with a test.

    @serhiy-storchaka serhiy-storchaka self-assigned this Aug 16, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented Aug 17, 2014

    New changeset f180a9156cc8 by Serhiy Storchaka in branch '3.4':
    Issue bpo-22165: SimpleHTTPRequestHandler now supports undecodable file names.
    http://hg.python.org/cpython/rev/f180a9156cc8

    New changeset 3153a400b739 by Serhiy Storchaka in branch 'default':
    Issue bpo-22165: SimpleHTTPRequestHandler now supports undecodable file names.
    http://hg.python.org/cpython/rev/3153a400b739

    @serhiy-storchaka
    Member

    Thank you for the review, Senthil.

    @orsenthil
    Member

    Looks like we hit an encoding issue, which is due to the way os.fsdecode() and os.listdir() decode the filenames.

    >>> support.TESTFN_UNDECODABLE
    b'@test_99678_tmp\xe7w\xf0'
    >>> dir_list = os.listdir(self.tempdir)
    >>> dir_list
    ['@test_99678_tmp%E7w%F0.txt', 'test']
    >>> filename = os.fsdecode(support.TESTFN_UNDECODABLE) + '.txt'
    >>> filename
    '@test_99678_tmp\udce7w\udcf0.txt'

    ======================================================================
    FAIL: test_undecodable_filename (test.test_httpservers.SimpleHTTPServerTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/buildbot/buildarea/3.4.murray-snowleopard/build/Lib/test/test_httpservers.py", line 282, in test_undecodable_filename
        .encode('utf-8', 'surrogateescape'), body)
    AssertionError: b'href="%40test_62069_tmp%ED%B3%A7w%ED%B3%B0.txt"' not found in b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n<title>Directory listing for tmp0asrs9ei/</title>\n</head>\n<body>\n<h1>Directory listing for tmp0asrs9ei/</h1>\n<hr>\n<ul>\n<li><a href="%40test_62069_tmp%25E7w%25F0.txt">@test_62069_tmp%E7w%F0.txt</a></li>\n<li><a href="test">test</a></li>\n</ul>\n<hr>\n</body>\n</html>\n'
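The two names in the interactive session account for this failure. The sketch below assumes the listing's links are built with quote(..., errors="surrogatepass"), which is consistent with the %ED%B3%A7 sequence the test looked for; that is an inference from the output, not a quote of the patch.

```python
import urllib.parse

# Both names come from the interactive session above.
listed = "@test_99678_tmp%E7w%F0.txt"          # what os.listdir() returned on OS X
expected = "@test_99678_tmp\udce7w\udcf0.txt"  # os.fsdecode(TESTFN_UNDECODABLE) + ".txt"

def href(name: str) -> str:
    # Same quoting for both names; surrogatepass only matters for `expected`.
    return urllib.parse.quote(name, errors="surrogatepass")

looked_for = href(expected)  # "%40test_99678_tmp%ED%B3%A7w%ED%B3%B0.txt"
served = href(listed)        # "%40test_99678_tmp%25E7w%25F0.txt"
```

The random part differs between the session (99678) and the buildbot run (62069), but the shapes match the assertion: the test searched for the first form while the page contained the second. Deriving the expected link from the name os.listdir() actually reports is one way a test could avoid depending on the filesystem's renaming; this is an illustration, not necessarily the workaround that was committed.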

    @orsenthil orsenthil reopened this Aug 17, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented Aug 17, 2014

    New changeset a894b629bbea by Serhiy Storchaka in branch '3.4':
    Issue bpo-22165: Fixed test_undecodable_filename on non-UTF-8 locales.
    http://hg.python.org/cpython/rev/a894b629bbea

    New changeset 7cdc941d5180 by Serhiy Storchaka in branch 'default':
    Issue bpo-22165: Fixed test_undecodable_filename on non-UTF-8 locales.
    http://hg.python.org/cpython/rev/7cdc941d5180

    @serhiy-storchaka
    Member

    Oh, I missed that os.listdir() on Mac returns a really strange result. Thank you, Senthil.

    Here is a patch which tries to work around this. I'm not sure that it is enough. Maybe we should fix os.listdir(), or conclude that this issue can't be fixed on Mac OS.

    @ronaldoussoren
    Contributor

    OS X returns a strange value from os.listdir because the HFS+ filesystem itself has Unicode filenames and transforms byte strings that are assumed to contain UTF-8 into something the filesystem can handle (it seems to replace bytes that aren't valid UTF-8 with a percent-encoded value).
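A rough model of that renaming can be written with a custom codec error handler registered via codecs.register_error(). It is inferred only from the description here and from the single listing in the interactive session (\xe7 became %E7, \xf0 became %F0), not from HFS+ documentation, and it models only the percent-encoding; the handler name hfs_percent_sketch and the function hfs_like_name are hypothetical.

```python
import codecs

def _percent_escape(exc: UnicodeDecodeError):
    # Replace each undecodable byte with "%XX" and resume decoding after it.
    bad = exc.object[exc.start:exc.end]
    return "".join("%%%02X" % b for b in bad), exc.end

# Hypothetical handler name, registered only for this illustration.
codecs.register_error("hfs_percent_sketch", _percent_escape)

def hfs_like_name(raw: bytes) -> str:
    # Valid UTF-8 is kept as-is; invalid bytes become percent escapes.
    return raw.decode("utf-8", "hfs_percent_sketch")

# hfs_like_name(b"@test_99678_tmp\xe7w\xf0.txt") == "@test_99678_tmp%E7w%F0.txt"
```

Because the renamed file no longer contains the original bytes, no choice of decoding in http.server can recover them, which is why a test on such a filesystem has to work from the name the directory listing returns.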

    @serhiy-storchaka
    Member

    Well, then the workaround should work.

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 17, 2014

    New changeset b05d4f3ee190 by Serhiy Storchaka in branch '3.4':
    Issue bpo-22165: Fixed test_undecodable_filename on Mac OS.
    http://hg.python.org/cpython/rev/b05d4f3ee190

    New changeset 58e0d2c3ead8 by Serhiy Storchaka in branch 'default':
    Issue bpo-22165: Fixed test_undecodable_filename on Mac OS.
    http://hg.python.org/cpython/rev/58e0d2c3ead8

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 5, 2015

    New changeset 1bc41bbbe02d by Ned Deily in branch '3.4':
    Issue bpo-22165: Skip test_undecodable_filename on OS X prior to 10.5.
    https://hg.python.org/cpython/rev/1bc41bbbe02d

    New changeset 85258e08b69b by Ned Deily in branch 'default':
    Issue bpo-22165: merge from 3.4
    https://hg.python.org/cpython/rev/85258e08b69b

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022