Author Matt.Basta
Recipients Hunanyan, Matt.Basta, cpalmer, eric.araujo, ezio.melotti, fantoozler, fdrake, friday, georg.brandl, gsf, momat, orsenthil, r.david.murray, yotam
Date 2011-07-27.16:53:52
Message-id <1311785633.2.0.721784130542.issue670664@psf.upfronthosting.co.za>
Content
> So I think the example is invalid (should escape the <), and that HTMLParser is not buggy.

On the other hand, the HTML5 spec clearly dictates otherwise:

http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions
The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
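As a rough illustration of the rule quoted above, the end of a raw text element can be located with a case-insensitive regular expression. This is only a sketch: the helper names raw_text_end_pattern and find_raw_text_end are made up for this example and are not part of HTMLParser.

```python
import re


def raw_text_end_pattern(tag_name):
    """Build a regex for the HTML5 terminator of a raw text element.

    Hypothetical helper: matches "</" + tag name (case-insensitive)
    followed by tab, LF, FF, CR, space, ">" or "/".
    """
    return re.compile(
        r"</" + re.escape(tag_name) + r"[\t\n\f\r >/]",
        re.IGNORECASE,
    )


def find_raw_text_end(text, tag_name):
    """Return the index where the raw text ends, or -1 if it never ends."""
    match = raw_text_end_pattern(tag_name).search(text)
    return match.start() if match else -1


if __name__ == "__main__":
    body = "<span></span>This should not be visible.</script>"
    end = find_raw_text_end(body, "script")
    # Everything before `end` is raw text, including the <span> markup.
    print(repr(body[:end]))
```

Under this rule, "</span>" inside a script element does not terminate anything; only "</script" followed by one of the listed characters does.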


Additionally, no browser (except perhaps in quirks mode) currently obeys the HTML4 variant of the rule. This is largely due to the need to include strings such as "</scr" + "ipt>" within a script tag itself. The behavior can be observed firsthand by loading this snippet in a browser:

<script><span></span>This should not be visible.</script>
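
The same snippet can be fed to the parser to see which events it reports. The following is a minimal sketch assuming a current Python 3, where the module is html.parser; EventCollector and parse_events are names invented for this example. A parser that follows the HTML5 rule should report the span markup as data rather than as tags.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record the start tags, end tags and text data seen by the parser."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

    def handle_data(self, data):
        # Data may arrive in several chunks, so keep them all.
        self.data.append(data)


def parse_events(markup):
    """Parse markup and return (start tags, end tags, joined data)."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags, collector.end_tags, "".join(collector.data)


if __name__ == "__main__":
    snippet = "<script><span></span>This should not be visible.</script>"
    starts, ends, data = parse_events(snippet)
    print(starts, ends, repr(data))
```

If "span" shows up in the reported tags, the parser is applying the HTML4 variant of the rule instead.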