
Author Matt.Basta
Recipients Hunanyan, Matt.Basta, cpalmer, eric.araujo, ezio.melotti, fantoozler, fdrake, friday, georg.brandl, gsf, momat, orsenthil, r.david.murray, yotam
Date 2011-07-27.16:53:52
> So I think the example is invalid (should escape the <), and that HTMLParser is not buggy.

On the other hand, the HTML5 spec clearly dictates otherwise:
The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
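The rule quoted above can be sketched as a regular expression. This is only an illustration of the spec wording, not code from HTMLParser; the helper name forbidden_end_tag and the sample strings are assumptions made up for this example.

```python
import re


def forbidden_end_tag(text, tag_name):
    """Return a match for the first '</' + tag_name + delimiter in text, or None.

    Hypothetical helper: it mirrors the quoted rule, where the delimiter is
    one of tab, LF, FF, CR, space, '>' or '/'.
    """
    pattern = re.compile(
        r"</" + re.escape(tag_name) + r"(?=[\t\n\f\r />])",
        re.IGNORECASE,  # the tag name is matched case-insensitively
    )
    return pattern.search(text)


# JavaScript source held in Python strings (sample data for illustration).
split_trick = 'document.write("</scr" + "ipt>");'   # the end tag is split in two
markup = "<span></span>This should not be visible."  # '</' followed by another tag name
closing = "var x = 1;</SCRIPT >"                     # a real end tag, upper case

for sample in (split_trick, markup, closing):
    print(repr(sample), "->", bool(forbidden_end_tag(sample, "script")))
```

Under this reading, only the last sample would terminate a script element; "</span>" inside the script is ordinary text.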

Additionally, no current browser (except possibly in quirks mode) obeys the HTML4 variant of the rule. This is largely because of the need to include strings such as "</scr" + "ipt>" within a script tag itself. The behavior can be observed firsthand by loading this snippet in a browser:

<script><span></span>This should not be visible.</script>
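
The same snippet can be fed to a parser to see which events it reports. The following is a minimal sketch assuming a recent Python 3, where the module is html.parser; the EventRecorder class and record_events function are hypothetical names for this example, and older releases (including those current when this message was written) may report different events.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper that records the events the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

    def handle_data(self, data):
        # Data may arrive in several chunks, so collect the pieces.
        self.data.append(data)


def record_events(html):
    """Parse html and return the recorder holding the observed events."""
    parser = EventRecorder()
    parser.feed(html)
    parser.close()
    return parser


result = record_events("<script><span></span>This should not be visible.</script>")
print("start tags:", result.start_tags)
print("end tags:  ", result.end_tags)
print("data:      ", "".join(result.data))
```

A parser following the HTML5 rule would report only the script start and end tags, with the span markup delivered as data.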
Date | User | Action | Args
2011-07-27 16:53:53 | Matt.Basta | set | recipients: + Matt.Basta, fdrake, georg.brandl, yotam, orsenthil, fantoozler, gsf, cpalmer, ezio.melotti, eric.araujo, r.david.murray, momat, Hunanyan, friday
2011-07-27 16:53:53 | Matt.Basta | set | messageid: <>
2011-07-27 16:53:52 | Matt.Basta | link | issue670664 messages
2011-07-27 16:53:52 | Matt.Basta | create |