
Author ezio.melotti
Recipients eric.araujo, ezio.melotti
Date 2011-12-11.01:58:32
Message-id <1323568714.45.0.610853679425.issue13576@psf.upfronthosting.co.za>
Content
The attached patch adds a few tests about the handling of broken conditional comments (condcoms).
A valid condcom looks like <!--[if ie 6]>...<![endif]-->.
An invalid one looks like <![if ie 6]>...<![endif]>.
This seems to be a common mistake; it is found even on popular sites such as Adobe, LinkedIn, and deviantART.

Currently, HTMLParser calls unknown_decl() with the text between '<![' and ']>' (e.g. 'if ie 6').  With strict=True an error is raised; with strict=False no error is raised and the unknown declaration is ignored.
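
To see which callback actually receives a broken condcom, something like the following can be used. It is only a sketch: CondcomRecorder and record() are names made up for illustration, the strict argument is left out (later Python releases removed it), and whether the markers arrive via unknown_decl() or handle_comment() may depend on the Python version.

```python
from html.parser import HTMLParser


class CondcomRecorder(HTMLParser):
    """Hypothetical helper: record which callback receives each piece."""

    def __init__(self):
        super().__init__()
        self.events = []

    def unknown_decl(self, data):
        # Broken condcoms such as <![if ie 6]> may end up here.
        self.events.append(("unknown_decl", data))

    def handle_comment(self, data):
        # Valid condcoms such as <!--[if ie 6]>...<![endif]--> end up here.
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = CondcomRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(record("<![if ie 6]>...<![endif]>"))
    print(record("<!--[if ie 6]>...<![endif]-->"))
```

For the valid condcom the whole '[if ie 6]>...<![endif]' text is reported as a single comment; for the broken one, the two markers are reported separately with the '...' as ordinary data in between.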

The HTML5 specs say:
"""
[After '<!',] If the next two characters are both U+002D HYPHEN-MINUS characters (-), consume those two characters, [...]
Otherwise, this is a parse error. Switch to the bogus comment state.[0]

[Once in the bogus comment state,] Consume every character up to and including the first U+003E GREATER-THAN SIGN character (>) or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character), but with any U+0000 NULL characters replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment was started by the end of the file (EOF), the token is empty.)[1]
"""

So, IIUC, '<![if ie 6]>...<![endif]>' should emit a '[if ie 6]' comment, parse the '...' normally, and emit a '[endif]' comment.
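
A toy version of the quoted rule, to check that reading. This is not how HTMLParser is implemented; bogus_comment() and tokenize() are hypothetical names, and the tokenizer only knows about character data and bogus comments (it rejects '<!--').

```python
def bogus_comment(text, start):
    """Apply the bogus comment rule to a '<!' opener at text[start].

    Returns (comment_data, position_after_the_comment).
    """
    if not text.startswith("<!", start) or text.startswith("<!--", start):
        raise ValueError("not a bogus comment opener")
    data_start = start + 2          # first character after '<!'
    end = text.find(">", data_start)
    if end == -1:                   # EOF comes first: consume everything
        data, next_pos = text[data_start:], len(text)
    else:                           # stop just before the first '>'
        data, next_pos = text[data_start:end], end + 1
    # NULL characters are replaced by U+FFFD REPLACEMENT CHARACTER.
    return data.replace("\x00", "\ufffd"), next_pos


def tokenize(text):
    """Toy tokenizer: only character data and bogus comments."""
    tokens, pos = [], 0
    while pos < len(text):
        opener = text.find("<!", pos)
        if opener == -1:
            tokens.append(("data", text[pos:]))
            break
        if opener > pos:
            tokens.append(("data", text[pos:opener]))
        data, pos = bogus_comment(text, opener)
        tokens.append(("comment", data))
    return tokens


if __name__ == "__main__":
    print(tokenize("<![if ie 6]>...<![endif]>"))
```

Under these assumptions the broken condcom yields a '[if ie 6]' comment, the '...' data, and a '[endif]' comment.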

However, I think it's fine to leave the current behavior for the following reasons:
  1) backward compatibility;
  2) handling broken condcoms in unknown_decl is easier than doing it in handle_comment, where all the other comments are sent;
  3) probably no one cares about them anyway.
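
To illustrate reason 2: if the markers were sent to handle_comment, user code would need some check like the one below to tell them apart from regular comments. The pattern and the name is_condcom_marker() are assumptions made up for this sketch, based only on the '[if ie 6]' and '[endif]' examples above.

```python
import re

# Hypothetical pattern: a bracketed "if ..." or "endif" and nothing else.
_CONDCOM_MARKER = re.compile(r"\[\s*(if\b[^\]]*|endif)\s*\]", re.IGNORECASE)


def is_condcom_marker(comment_data):
    """Return True if comment data looks like a broken condcom marker."""
    return _CONDCOM_MARKER.fullmatch(comment_data.strip()) is not None


if __name__ == "__main__":
    for data in ("[if ie 6]", "[endif]", " a regular comment "):
        print(repr(data), is_condcom_marker(data))
```

With unknown_decl no such filtering is needed, since regular comments never reach it.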

[0]: http://www.w3.org/TR/html5/tokenization.html#markup-declaration-open-state
[1]: http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
History
Date                 User          Action  Args
2011-12-11 01:58:35  ezio.melotti  set     recipients: + ezio.melotti, eric.araujo
2011-12-11 01:58:34  ezio.melotti  set     messageid: <1323568714.45.0.610853679425.issue13576@psf.upfronthosting.co.za>
2011-12-11 01:58:33  ezio.melotti  link    issue13576 messages
2011-12-11 01:58:33  ezio.melotti  create