classification
Title: Regression in HTMLParser on malformed tags
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: dan
Priority: normal Keywords:

Created on 2020-10-06 09:02 by dan, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
testhtmlparse.zip dan, 2020-10-06 09:02 Script and data which reproduces the issue.
Messages (1)
msg378101 - Author: Dan (dan) Date: 2020-10-06 09:02
The attached HTML document (pulled from a Samsung printer web interface) contains the following invalid HTML tag:
<img style="vertical-align:bottom;" ,="" src="images/sws/icon_alert_warning_16.gif" title="Warning">
(invalid because of ,="")
In Python 3.x, this tag completely stops the HTML parser, preventing any further tags from being parsed. This does not happen in Python 2.x.
See the attached Python script, which counts the number of "input" tags. When executed using Python 2.7, it correctly counts 4 such tags. When executed using Python 3.8, it finds only 1.
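For readers without the attachment, here is a minimal sketch of the kind of tag-counting script described above. It is not the attached script: the names TagCounter, count_tags and SAMPLE are hypothetical, and SAMPLE is a made-up stand-in for the printer page that only reuses the malformed img tag quoted in this report. Whether it reproduces the regression may depend on the Python version and on other content in the real document, so no expected count is given.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TagCounter(HTMLParser):
    """Count start tags with a given name (hypothetical helper)."""

    def __init__(self, tag_name):
        # Explicit base-class call so the same code runs on Python 2,
        # where HTMLParser is an old-style class and super() is unavailable.
        HTMLParser.__init__(self)
        self.tag_name = tag_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Tag names are passed in lower case by the parser.
        if tag == self.tag_name:
            self.count += 1


def count_tags(html_text, tag_name):
    """Feed html_text to a fresh parser and return the number of matching start tags."""
    parser = TagCounter(tag_name)
    parser.feed(html_text)
    parser.close()
    return parser.count


# Assumed stand-in for the attached document: four input tags, with the
# malformed img tag from this report placed after the first one.
SAMPLE = (
    '<form><input name="a">'
    '<img style="vertical-align:bottom;" ,="" '
    'src="images/sws/icon_alert_warning_16.gif" title="Warning">'
    '<input name="b"><input name="c"><input name="d"></form>'
)

if __name__ == "__main__":
    print(count_tags(SAMPLE, "input"))
```

Running the same file under both interpreters and comparing the printed counts should show whether the parser stops after the malformed tag on a given version.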
History
Date User Action Args
2022-04-11 14:59:36 admin set github: 86122
2020-10-06 09:02:38 dan create