
Author waylan
Recipients waylan
Date 2020-10-10.01:08:28
Message-id <1602292109.75.0.598084394689.issue41989@roundup.psfhosted.org>
In-reply-to
Content
When the `close` method of `HTMLParser` is called, any buffered text data is normally flushed and passed to `handle_data`, except when the parser is in CDATA mode (that is, when `self.cdata_elem` is set). Specifically, if an unclosed `script` or `style` tag has been encountered, a call to `close` does not flush the data.

A simple test which demonstrates the issue is attached.
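
The attached file is not reproduced here. The following is only a minimal sketch of the kind of check described above, not the attachment itself; `DataCollector` and `collect` are hypothetical names introduced for illustration. It compares an unclosed `p` tag with an unclosed `script` tag. What the second call prints depends on the Python version in use, so no output is claimed for it.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect(markup):
    """Feed markup, call close(), and return the data chunks that were seen."""
    parser = DataCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# Unclosed ordinary tag: the trailing text reaches handle_data.
p_chunks = collect("<p>some text")

# Unclosed script tag: on affected versions the trailing text is never delivered.
script_chunks = collect("<script>some text")

print("p:", p_chunks)
print("script:", script_chunks)
```

On an affected interpreter the `script` list would be expected to come back empty while the `p` list contains the text, which is the data loss reported here.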

In Lib/html/parser.py#L244-L249 there are two nested `if` statements that both check `not self.cdata_elem`. If execution gets past the outer check, `self.cdata_elem` is already known to be unset, so the inner check can never fail. This block of code needs a branch for the case where `self.cdata_elem` is set.
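
To make the redundancy concrete, here is a simplified stand-alone model of the logic as described above. It is not the code in Lib/html/parser.py, and the function names and parameters are hypothetical. The second function shows one possible shape for the missing branch, under the assumption that pending `script`/`style` text would be passed through without unescaping; it is an illustration, not a proposed patch.

```python
from html import unescape


def flush_current(rawdata, end, cdata_elem, convert_charrefs):
    """Toy model of the flush-on-close check described in this report.

    Returns the list of data chunks that would be emitted.
    """
    emitted = []
    if end and rawdata and not cdata_elem:
        # cdata_elem is known to be unset here, so the inner
        # `not cdata_elem` test below can never fail.
        if convert_charrefs and not cdata_elem:
            emitted.append(unescape(rawdata))
        else:
            emitted.append(rawdata)
    return emitted


def flush_with_cdata_branch(rawdata, end, cdata_elem, convert_charrefs):
    """Same toy model with a branch that also covers pending CDATA text.

    Assumption: raw script/style text is emitted as-is (no unescaping).
    """
    emitted = []
    if end and rawdata:
        if convert_charrefs and not cdata_elem:
            emitted.append(unescape(rawdata))
        else:
            # Reached when convert_charrefs is off or a script/style is open.
            emitted.append(rawdata)
    return emitted


print(flush_current("a &amp; b", True, None, True))                # ['a & b']
print(flush_current("a &amp; b", True, "script", True))            # []
print(flush_with_cdata_branch("a &amp; b", True, "script", True))  # ['a &amp; b']
```

In the first model the text pending inside an open `script` element is silently dropped; in the second it is delivered unchanged.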

I should note that the input is invalid HTML. However, the existing behavior results in data loss. Within any other unclosed tag (anything except `script` or `style`), pending data is still flushed and passed to a `data` event, and I would expect the same behavior here. That said, the data should perhaps receive the same escaping treatment as data within properly closed tags.
History
Date                 User    Action  Args
2020-10-10 01:08:29  waylan  set     recipients: + waylan
2020-10-10 01:08:29  waylan  set     messageid: <1602292109.75.0.598084394689.issue41989@roundup.psfhosted.org>
2020-10-10 01:08:29  waylan  link    issue41989 messages
2020-10-10 01:08:29  waylan  create