Author ezio.melotti
Recipients berker.peksag, ezio.melotti, hanno, steven.daprano
Date 2018-09-14.07:28:22
There are at least a couple of issues here.

The first one is the way the parser handles '<![...'. The linked page contains markup like '<![STAT]-[USER-ACTIVE]!>', and since the parser currently checks only for '<![', the wrong handler gets called and an error is incorrectly raised.
However, the "Markup declaration open state"[0] states that after consuming '<!' there are only 4 valid paths forward:
1) if we have '<!--', it's a comment;
2) if we have '<!doctype', it's a doctype declaration;
3) if we have '<![CDATA[', it's a CDATA section;
4) if it's something else, it's a bogus comment.

The above example should therefore fall into 4), and be treated like a bogus comment.
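
As a rough illustration of the four paths listed above, here is a minimal standalone sketch. The function name classify_markup_declaration() is hypothetical and is not part of html.parser or of the PR; it only mirrors the decision order described in this message (doctype matched case-insensitively, '<![CDATA[' matched exactly).

```python
def classify_markup_declaration(rawdata, i=0):
    """Classify the construct that starts with '<!' at rawdata[i].

    Hypothetical helper for illustration only; it is not the code
    used by html.parser.
    """
    if rawdata[i:i + 2] != '<!':
        raise ValueError("expected '<!' at position %d" % i)
    if rawdata[i:i + 4] == '<!--':
        return 'comment'            # path 1
    if rawdata[i:i + 9].lower() == '<!doctype':
        return 'doctype'            # path 2, case-insensitive match
    if rawdata[i:i + 9] == '<![CDATA[':
        return 'cdata'              # path 3, exact match
    return 'bogus comment'          # path 4, everything else


for sample in ('<!-- hi -->', '<!DOCTYPE html>', '<![CDATA[x]]>',
               '<![STAT]-[USER-ACTIVE]!>'):
    print(repr(sample), '->', classify_markup_declaration(sample))
```

With this ordering, checking for the full '<![CDATA[' prefix rather than just '<![' is what lets '<![STAT]-[USER-ACTIVE]!>' fall through to the bogus comment case.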

PR-9295 changes parse_html_declaration() to align with the spec and implement path 3), so the webpage is now parsed without errors (the invalid markup is treated as a bogus comment).

The second issue is about an EOF in the middle of a bogus markup declaration, as in the minimal example provided by the OP ("<![\n"). In this case the comment should still be emitted ('[\n'), but currently nothing gets emitted. I'll look into it further either tomorrow or later this month and update the PR accordingly (or perhaps open a separate issue).
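
To check what a given Python version does with such input, something like the following sketch can be used. CommentCollector and probe() are hypothetical names introduced here; only HTMLParser, feed(), close() and handle_comment() come from html.parser. The result for "<![\n" depends on the Python version (it may be an exception, an empty list, or the expected comment), so no output is shown.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper that records every comment the parser emits."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def probe(markup):
    """Feed markup, signal EOF with close(), and report what happened.

    Returns (comments, error); error is None if parsing did not raise.
    """
    parser = CommentCollector()
    error = None
    try:
        parser.feed(markup)
        parser.close()  # tells the parser that EOF has been reached
    except Exception as exc:  # affected versions raise on this input
        error = exc
    return parser.comments, error


# The OP's example: EOF inside a bogus markup declaration.
print(probe('<![\n'))
```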

Date                 User          Action  Args
2018-09-14 07:28:22  ezio.melotti  set     recipients: + ezio.melotti, steven.daprano, berker.peksag, hanno
2018-09-14 07:28:22  ezio.melotti  set     messageid: <>
2018-09-14 07:28:22  ezio.melotti  link    issue32876 messages
2018-09-14 07:28:22  ezio.melotti  create