Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use html.escape to replace _quote_html in http.server #70772

Closed
zhangyangyu opened this issue Mar 18, 2016 · 17 comments
Closed

Use html.escape to replace _quote_html in http.server #70772

zhangyangyu opened this issue Mar 18, 2016 · 17 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@zhangyangyu
Copy link
Member

BPO 26585
Nosy @vadmium, @zhangyangyu
Dependencies
  • bpo-26609: Wrong request target in test_httpservers.py
  • Files
  • _quote_html_to_html_escape.patch: Use html.escape to replace _quote_html for BaseHTTPRequestHandler
  • _quote_html_to_html_escape_v2.patch
  • _quote_html_to_html_escape_v3.patch
  • _quote_html_to_html_escape_v4.patch
  • _quote_html_to_html_escape_v5.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-04-11.04:04:32.670>
    created_at = <Date 2016-03-18.03:18:06.756>
    labels = ['type-feature', 'library']
    title = 'Use html.escape to replace _quote_html in http.server'
    updated_at = <Date 2016-04-11.07:15:56.590>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2016-04-11.07:15:56.590>
    actor = 'xiang.zhang'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-04-11.04:04:32.670>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2016-03-18.03:18:06.756>
    creator = 'xiang.zhang'
    dependencies = ['26609']
    files = ['42195', '42204', '42205', '42218', '42220']
    hgrepos = []
    issue_num = 26585
    keywords = ['patch']
    message_count = 17.0
    messages = ['261943', '261944', '261950', '261951', '261975', '261978', '262001', '262018', '262021', '262040', '262041', '262057', '262062', '262814', '263159', '263164', '263170']
    nosy_count = 3.0
    nosy_names = ['python-dev', 'martin.panter', 'xiang.zhang']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26585'
    versions = ['Python 3.6']

    @zhangyangyu
    Copy link
    Member Author

    In http.server, _quote_html is used to escape content for BaseHTTPServer. The function has already been implemented by html.escape.

    @zhangyangyu zhangyangyu added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 18, 2016
    @zhangyangyu
    Copy link
    Member Author

    It's BaseHTTPRequestHandler, not BaseHTTPServer.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 18, 2016

    Removing the redundant _quote_html() function seems like a good idea to me. I wonder if the code should be using the html.escape(..., quote=False) option though, because it does not need to encode quote signs.

    It might be good to add a test. It looks like there may not be anything testing the HTML encoding.

    @zhangyangyu
    Copy link
    Member Author

    At first I also want to use html.escape(..., quote=False) since the spec only asks to escape quote signs in attribute. But after some search on Google, there are articles recommends escaping quote in content too: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

    @zhangyangyu
    Copy link
    Member Author

    I add two tests for html escaping. One for the error message and the other for display name in SimpleHTTPRequesthandler.

    @zhangyangyu
    Copy link
    Member Author

    make some change to test for html escape in SimpleHTTPRequestHandler

    @vadmium
    Copy link
    Member

    vadmium commented Mar 18, 2016

    Thanks for the tests. I left a couple comments.

    About encoding quotes: Personally I don’t see much value unless you are encoding an attribute value, in which case I would prefer to use xml.sax.saxutils.quoteattr(). Encoded quotes would only become useful if the “error_message_format” attribute was modified.

    A more practical downside is that if “error_content_type” is set to say text/plain, we are adding two somewhat common characters that will get messed up. E.g. the “explain” string for 429 Too Many Requests will include the double-quoted "rate limiting". And an apostrophe could easily be given in a custom error message, e.g. “Can't write a clean error message”.

    @zhangyangyu
    Copy link
    Member Author

    Thanks for your review. Set quote to False when html.escape is OK to me. But except for the usage in BaseHTTPRequestHandler, I think we should also set quote to False for the usage in SimpleHTTPRequestHandler since they do not appear in attribute either. And I left a reply to your comment. How do you think about these?

    @vadmium
    Copy link
    Member

    vadmium commented Mar 19, 2016

    Yeah I would be happy if you want to change to html.escape(quote=False) in list_directory() as well.

    BTW I am getting Undelivered Mail replies from the review comments. I guess they might be getting rejected due to looking like spam (they tend to be classified as spam by default for me, something about how the server sends them).

    @zhangyangyu
    Copy link
    Member Author

    OK. You left the comment preferring using ascii or utf-8 to encode the filename in test. But actually the response SimpeHTTPRequestHandler send is explicitly encoded in filesystemdefaultencoding. So in such a case, am I doing right?

    @zhangyangyu
    Copy link
    Member Author

    Oops. You have already left new comment. No notify either. :(

    I like the idea that extract the actual encoding from response header. I will update my patch then.

    @zhangyangyu
    Copy link
    Member Author

    The uploaded patch is updated. Hope you have time to review.

    @zhangyangyu
    Copy link
    Member Author

    Thanks for the reviews. I have updated the patch.

    @vadmium
    Copy link
    Member

    vadmium commented Apr 3, 2016

    I left a couple notes on minor style nits (which I can fix when I commit this). But perhaps we should fix the business with tempdir_name and absolute URL paths (bpo-26609) first.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 11, 2016

    New changeset bf44913588b7 by Martin Panter in branch 'default':
    Issue bpo-26585: Eliminate _quote_html() and use html.escape(quote=False)
    https://hg.python.org/cpython/rev/bf44913588b7

    @vadmium
    Copy link
    Member

    vadmium commented Apr 11, 2016

    Thanks for the patch Xiang

    @vadmium vadmium closed this as completed Apr 11, 2016
    @zhangyangyu
    Copy link
    Member Author

    Happy to see it works. Thanks for your reviews too. :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants