classification
Title: Use html.escape to replace _quote_html in http.server
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: 26609 Superseder:
Assigned To: Nosy List: martin.panter, python-dev, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-03-18 03:18 by xiang.zhang, last changed 2016-04-11 07:15 by xiang.zhang. This issue is now closed.

Files
File name Uploaded Description Edit
_quote_html_to_html_escape.patch xiang.zhang, 2016-03-18 03:18 Use html.escape to replace _quote_html for BaseHTTPRequestHandler review
_quote_html_to_html_escape_v2.patch xiang.zhang, 2016-03-18 16:55 review
_quote_html_to_html_escape_v3.patch xiang.zhang, 2016-03-18 17:14 review
_quote_html_to_html_escape_v4.patch xiang.zhang, 2016-03-19 18:13 review
_quote_html_to_html_escape_v5.patch xiang.zhang, 2016-03-20 07:50 review
Messages (17)
msg261943 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-18 03:18
In http.server, _quote_html is used to escape content for BaseHTTPServer. The function has already been implemented by html.escape.
msg261944 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-18 03:47
It's BaseHTTPRequestHandler, not BaseHTTPServer.
msg261950 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-18 07:31
Removing the redundant _quote_html() function seems like a good idea to me. I wonder if the code should be using the html.escape(..., quote=False) option though, because it does not need to encode quote signs.

It might be good to add a test. It looks like there may not be anything testing the HTML encoding.
msg261951 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-18 07:37
At first I also want to use html.escape(..., quote=False) since the spec only asks to escape quote signs in attribute. But after some search on Google, there are articles recommends escaping quote in content too: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
msg261975 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-18 16:55
I add two tests for html escaping. One for the error message and the other for display name in SimpleHTTPRequesthandler.
msg261978 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-18 17:14
make some change to test for html escape in SimpleHTTPRequestHandler
msg262001 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-18 22:32
Thanks for the tests. I left a couple comments.

About encoding quotes: Personally I don’t see much value unless you are encoding an attribute value, in which case I would prefer to use xml.sax.saxutils.quoteattr(). Encoded quotes would only become useful if the “error_message_format” attribute was modified.

A more practical downside is that if “error_content_type” is set to say text/plain, we are adding two somewhat common characters that will get messed up. E.g. the “explain” string for 429 Too Many Requests will include the double-quoted "rate limiting". And an apostrophe could easily be given in a custom error message, e.g. “Can't write a clean error message”.
msg262018 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-19 05:07
Thanks for your review. Set quote to False when html.escape is OK to me. But except for the usage in BaseHTTPRequestHandler, I think we should also set quote to False for the usage in SimpleHTTPRequestHandler since they do not appear in attribute either. And I left a reply to your comment. How do you think about these?
msg262021 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-19 06:59
Yeah I would be happy if you want to change to html.escape(quote=False) in list_directory() as well.

BTW I am getting Undelivered Mail replies from the review comments. I guess they might be getting rejected due to looking like spam (they tend to be classified as spam by default for me, something about how the server sends them).
msg262040 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-19 11:33
OK. You left the comment preferring using ascii or utf-8 to encode the filename in test. But actually the response SimpeHTTPRequestHandler send is explicitly encoded in filesystemdefaultencoding. So in such a case, am I doing right?
msg262041 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-19 11:40
Oops. You have already left new comment. No notify either. :(

I like the idea that extract the actual encoding from response header. I will update my patch then.
msg262057 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-19 18:13
The uploaded patch is updated. Hope you have time to review.
msg262062 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-20 07:50
Thanks for the reviews. I have updated the patch.
msg262814 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-03 04:40
I left a couple notes on minor style nits (which I can fix when I commit this). But perhaps we should fix the business with tempdir_name and absolute URL paths (Issue 26609) first.
msg263159 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-11 01:20
New changeset bf44913588b7 by Martin Panter in branch 'default':
Issue #26585: Eliminate _quote_html() and use html.escape(quote=False)
https://hg.python.org/cpython/rev/bf44913588b7
msg263164 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-11 04:04
Thanks for the patch Xiang
msg263170 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-04-11 07:15
Happy to see it works. Thanks for your reviews too. :)
History
Date User Action Args
2016-04-11 07:15:56xiang.zhangsetmessages: + msg263170
2016-04-11 04:04:32martin.pantersetstatus: open -> closed
resolution: fixed
messages: + msg263164

stage: patch review -> resolved
2016-04-11 01:20:28python-devsetnosy: + python-dev
messages: + msg263159
2016-04-03 04:40:49martin.pantersetdependencies: + Wrong request target in test_httpservers.py
messages: + msg262814
2016-03-20 07:50:41xiang.zhangsetfiles: + _quote_html_to_html_escape_v5.patch

messages: + msg262062
2016-03-19 18:13:41xiang.zhangsetfiles: + _quote_html_to_html_escape_v4.patch

messages: + msg262057
2016-03-19 11:40:03xiang.zhangsetmessages: + msg262041
2016-03-19 11:33:01xiang.zhangsetmessages: + msg262040
2016-03-19 06:59:19martin.pantersetmessages: + msg262021
2016-03-19 05:07:31xiang.zhangsetmessages: + msg262018
2016-03-18 22:32:17martin.pantersetmessages: + msg262001
2016-03-18 17:14:47xiang.zhangsetfiles: + _quote_html_to_html_escape_v3.patch

messages: + msg261978
2016-03-18 16:55:56xiang.zhangsetfiles: + _quote_html_to_html_escape_v2.patch

messages: + msg261975
2016-03-18 07:37:20xiang.zhangsetmessages: + msg261951
2016-03-18 07:33:57martin.pantersetstage: patch review
2016-03-18 07:31:06martin.pantersetnosy: + martin.panter
messages: + msg261950
2016-03-18 03:47:46xiang.zhangsetmessages: + msg261944
2016-03-18 03:18:06xiang.zhangcreate