Message 359356 - Python tracker


Author	ezio.melotti
Recipients	berker.peksag, ezio.melotti, jkamdjou, kodial, xtreak
Date	2020-01-05.17:07:32
Message-id	<1578244053.7.0.603942575705.issue34480@roundup.psfhosted.org>
In-reply-to

Content
HTMLParser is supposed to follow the HTML5 standard, and never raise an error.

For the example in the first comment ("<![hi world]>"), the steps should be:

* https://html.spec.whatwg.org/multipage/parsing.html#data-state:tag-open-state
* https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state:markup-declaration-open-state
* https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state:bogus-comment-state
* https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state
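
To make the path through those four states concrete, here is a minimal sketch. It is not the html.parser implementation: `trace_markup_declaration` is a hypothetical helper written for this comment, and it only covers the "anything else" branch of the markup declaration open state (real comments, DOCTYPEs and CDATA sections are deliberately out of scope).

```python
def trace_markup_declaration(text):
    """Trace the tokenizer states for input starting with '<!'.

    Returns (visited_states, comment_data). Hypothetical helper; it only
    models the bogus-comment path, not the full HTML5 tokenizer.
    """
    visited = ["data"]
    if not text.startswith("<"):
        raise ValueError("sketch only handles input starting with '<'")
    visited.append("tag open")
    if not text.startswith("<!"):
        raise ValueError("sketch only handles '<!' constructs")
    visited.append("markup declaration open")

    rest = text[2:]
    if (rest.startswith("--")
            or rest[:7].lower() == "doctype"
            or rest.startswith("[CDATA[")):
        raise ValueError("comments, DOCTYPEs and CDATA are out of scope")

    # Anything else: switch to the bogus comment state without raising.
    visited.append("bogus comment")
    end = rest.find(">")
    # Consume up to the next '>' (or to end of input if there is none).
    data = rest if end == -1 else rest[:end]
    # The bogus comment state replaces NULL with U+FFFD.
    return visited, data.replace("\x00", "\ufffd")


states, comment = trace_markup_declaration("<![hi world]>")
print(states)   # ['data', 'tag open', 'markup declaration open', 'bogus comment']
print(comment)  # [hi world]
```

In other words, under the spec the input ends up as a comment whose data is `[hi world]`, and no error is raised along the way.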

I agree that the error should be fixed by setting `match` to None, and that a test case that triggers the UnboundLocalError (before the fix) should be added as well (the one provided by Karthikeyan looks good).

However, it also seems wrong that HTMLParser ends up calling self.error() through ParserBase in Lib/_markupbase.py, even though HTMLParser.error() and all the calls to it have been removed. _markupbase.py is internal, so it should be safe to remove ParserBase.error() and the code that calls it, as suggested in #31844 (and possibly to merge _markupbase into html.parser too). Even if this is done and the call to self.error() is removed from ParserBase.parse_marked_section(), `match` still needs to be set to None (either in the `else` branch or before the `if/elif` block).
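
A rough, self-contained illustration of why `match` has to be bound on every path. The two functions and both regex names below are hypothetical stand-ins that only mimic the shape of the `if/elif` block in parse_marked_section(); they are not the stdlib code, and the `start` offset of 3 is an assumption that skips the leading `<![`.

```python
import re

# Hypothetical close patterns for this sketch (not copied from _markupbase).
STANDARD_CLOSE = re.compile(r"\]\s*\]\s*>")  # matches "]]>"
MS_OFFICE_CLOSE = re.compile(r"\]\s*>")      # matches "]>"


def find_section_end_buggy(rawdata, name, start=3):
    if name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = STANDARD_CLOSE.search(rawdata, start)
    elif name in {"if", "else", "endif"}:
        match = MS_OFFICE_CLOSE.search(rawdata, start)
    # No assignment for unknown names: `match` is unbound below.
    if not match:
        return -1
    return match.end(0)


def find_section_end_fixed(rawdata, name, start=3):
    match = None  # bound before the if/elif block, so every path is safe
    if name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = STANDARD_CLOSE.search(rawdata, start)
    elif name in {"if", "else", "endif"}:
        match = MS_OFFICE_CLOSE.search(rawdata, start)
    if not match:
        return -1
    return match.end(0)


try:
    find_section_end_buggy("<![hi world]>", "hi")
except UnboundLocalError as exc:
    print("buggy version:", type(exc).__name__)

print(find_section_end_fixed("<![hi world]>", "hi"))     # -1
print(find_section_end_fixed("<![cdata[x]]>", "cdata"))  # 13
```

Binding `match = None` before the block (rather than only in an `else` branch) has the small advantage that it stays correct even if another branch is added later.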

History
Date	User	Action	Args
2020-01-05 17:07:33	ezio.melotti	set	recipients: + ezio.melotti, berker.peksag, xtreak, kodial, jkamdjou
2020-01-05 17:07:33	ezio.melotti	set	messageid: <1578244053.7.0.603942575705.issue34480@roundup.psfhosted.org>
2020-01-05 17:07:33	ezio.melotti	link	issue34480 messages
2020-01-05 17:07:32	ezio.melotti	create