
Author ezio.melotti
Recipients berker.peksag, ezio.melotti, jkamdjou, kodial, xtreak
Date 2020-01-05.17:07:32
Message-id <1578244053.7.0.603942575705.issue34480@roundup.psfhosted.org>
Content
HTMLParser is supposed to follow the HTML5 standard, and never raise an error.

For the example in the first comment ("<![hi world]>"), the steps should be:

* https://html.spec.whatwg.org/multipage/parsing.html#data-state:tag-open-state
* https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state:markup-declaration-open-state
* https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state:bogus-comment-state
* https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state
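The path through those states can be sketched roughly as follows. This is only an illustration of the spec's tokenizer steps, not code from html.parser; the function name `classify_markup_declaration` and its return shape are made up for this example, and CDATA handling in foreign content is deliberately left out.

```python
def classify_markup_declaration(text, i=0):
    """Hypothetical helper: classify the "<!" construct starting at text[i].

    Returns (kind, data, next_index). Mirrors the HTML5 "markup declaration
    open" and "bogus comment" states in simplified form.
    """
    assert text.startswith("<!", i), "expected '<!' at position i"
    j = i + 2
    if text.startswith("--", j):
        # "<!--" opens a real comment (comment start state).
        return ("comment-start", None, j + 2)
    if text[j:j + 7].lower() == "doctype":
        # ASCII case-insensitive match for "DOCTYPE".
        return ("doctype", None, j + 7)
    # Anything else is a parse error that switches to the bogus comment
    # state: consume everything up to the next ">" (or end of input) and
    # emit it as a comment. No exception is raised.
    end = text.find(">", j)
    if end == -1:
        return ("bogus-comment", text[j:], len(text))
    return ("bogus-comment", text[j:end], end + 1)


print(classify_markup_declaration("<![hi world]>"))
```

Under these assumptions the example input ends up as a comment whose data is "[hi world]", which is what a spec-following parser would be expected to report instead of raising.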

I agree that the error should be fixed by setting `match` to None, and a test case that triggers the UnboundLocalError (before the fix) should be added as well (the one provided by Karthikeyan looks good).

However, it also seems wrong that HTMLParser ends up calling self.error() through ParserBase in Lib/_markupbase.py, after HTMLParser.error() and all the calls to it have been removed. _markupbase.py is internal, so it should be safe to remove ParserBase.error() and the code that calls it, as suggested in #31844 (and possibly to merge _markupbase into html.parser too). Even if this is done and the call to self.error() is removed from ParserBase.parse_marked_section(), `match` still needs to be set to None (either in the `else` branch or before the `if/elif` block), since the code after that block still reads it.
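
A minimal standalone sketch of the "set `match` before the `if/elif` block" option follows. It is not the actual _markupbase.py source: the function name `marked_section_end`, the two regex constants and the keyword sets are assumptions chosen to resemble the control flow discussed here.

```python
import re

# Assumed closing patterns, for illustration only.
STANDARD_CLOSE = re.compile(r"]\s*]\s*>")  # "]]>" style ending
MS_CLOSE = re.compile(r"]\s*>")            # "]>" style ending


def marked_section_end(rawdata, i, keyword):
    """Hypothetical helper: return the index just past the marked section
    that starts at rawdata[i], or -1 if no closing sequence is found."""
    # Bind match up front so every path below leaves it defined.
    match = None
    if keyword in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = STANDARD_CLOSE.search(rawdata, i + 3)
    elif keyword in {"if", "else", "endif"}:
        match = MS_CLOSE.search(rawdata, i + 3)
    # Unknown keyword: match stays None, so the check below returns -1
    # instead of raising UnboundLocalError.
    if not match:
        return -1
    return match.end(0)
```

With this shape, an unknown keyword such as the "hi" in "<![hi world]>" simply yields -1, and the caller can decide how to treat the text (for example as a bogus comment).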
History
Date User Action Args
2020-01-05 17:07:33 ezio.melotti set recipients: + ezio.melotti, berker.peksag, xtreak, kodial, jkamdjou
2020-01-05 17:07:33 ezio.melotti set messageid: <1578244053.7.0.603942575705.issue34480@roundup.psfhosted.org>
2020-01-05 17:07:33 ezio.melotti link issue34480 messages
2020-01-05 17:07:32 ezio.melotti create