Title: Regression in HTMLParser on malformed tags
Components: Library (Lib) Versions: Python 3.8
Author: Dan (dan) Date: 2020-10-06 09:02
The attached HTML document (pulled from a Samsung printer web interface) contains the following invalid HTML tag:
<img style="vertical-align:bottom;" ,="" src="images/sws/icon_alert_warning_16.gif" title="Warning">
(invalid because of ,="")
In Python 3.x, this tag completely stops the HTML parser, preventing any further tags from being parsed. This does not happen in Python 2.x.
See the attached Python script, which counts the number of "input" tags. When executed using Python 2.7, it correctly counts 4 such tags. When executed using Python 3.8 it only finds 1.
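For readers without the attachment, here is a minimal sketch of what such a counting script might look like on Python 3. It is not the attached script: the names `InputCounter`, `count_inputs`, and `SAMPLE` are hypothetical, and `SAMPLE` is a made-up stand-in for the printer page that places one "input" tag before the malformed tag and three after it.

```python
from html.parser import HTMLParser


class InputCounter(HTMLParser):
    """Counts <input> start tags seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this handler.
        if tag == "input":
            self.count += 1


def count_inputs(html):
    """Feed a complete document to the parser and return the <input> count."""
    parser = InputCounter()
    parser.feed(html)
    parser.close()  # flush any data still buffered by the parser
    return parser.count


# The malformed tag quoted in this report.
MALFORMED_IMG = ('<img style="vertical-align:bottom;" ,="" '
                 'src="images/sws/icon_alert_warning_16.gif" title="Warning">')

# Assumed stand-in document: one <input> before the bad tag, three after it.
SAMPLE = '<input name="a">' + MALFORMED_IMG + '<input name="b">' * 3

if __name__ == "__main__":
    print(count_inputs(SAMPLE))
```

Running this under different Python versions and comparing the printed count against the 4 tags present in `SAMPLE` should show whether parsing stops at the malformed tag on a given interpreter.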
