Author ezio.melotti
Recipients benjamin.peterson, eric.araujo, ezio.melotti, r.david.murray
Date 2012-02-10.13:45:57
Message-id <1328881558.78.0.634398241399.issue13987@psf.upfronthosting.co.za>
Content
The attached patch fixes a few problems with HTMLParser on 2.7.
Instead of raising an error when invalid markup is detected, the parser now consumes the invalid input and continues parsing.  This patch is a partial backport of #1486713.
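
As a rough illustration of the "consume and proceed" behaviour, here is a minimal sketch. It assumes a parser that already has the lenient behaviour (Python 3's html.parser, or a 2.7 build with the follow-up patches applied); the EventCollector class, the collect() helper and the sample markup are made up for this example and are not part of the patch.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class EventCollector(HTMLParser):
    """Hypothetical helper: records start tags and text as they are parsed."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


def collect(markup):
    """Feed markup to a fresh parser and return (start tags, text chunks)."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.data


# "<!bogus stuff>" is not a valid declaration; a lenient parser is expected
# to skip over it and still report the paragraph that follows.
tags, data = collect("<p>before</p><!bogus stuff><p>after</p>")
print(tags, data)
```

With a strict parser the same input would be expected to stop with an HTMLParseError at the bogus declaration instead of reaching the second paragraph.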

After this, two more patches will follow.
The first will get rid of errors raised while parsing declarations and should also solve #13576:
     def unknown_decl(self, data):
-        self.error("unknown declaration: %r" % (data,))
+        pass
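
Once unknown_decl() is a no-op, code that still wants to notice unknown declarations could override the hook in a subclass. The following is only a sketch: the DeclRecorder class is hypothetical, and the hook is called directly here because which inputs reach unknown_decl() depends on the parser version.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class DeclRecorder(HTMLParser):
    """Hypothetical subclass: remembers unknown declarations instead of ignoring them."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.unknown = []

    def unknown_decl(self, data):
        # The base class hook would do nothing after the change above;
        # here the declaration text is stored for later inspection.
        self.unknown.append(data)


recorder = DeclRecorder()
# Direct call for illustration only; normally the parser invokes the hook.
recorder.unknown_decl("if !IE")
print(recorder.unknown)
```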

The second will take care of "bogus comments" (see #13960).

Once this is done, HTMLParser should be able to parse (almost) everything.  I'm planning to commit this before the release of 2.7.3.
History
Date                 User          Action  Args
2012-02-10 13:45:58  ezio.melotti  set     recipients: + ezio.melotti, benjamin.peterson, eric.araujo, r.david.murray
2012-02-10 13:45:58  ezio.melotti  set     messageid: <1328881558.78.0.634398241399.issue13987@psf.upfronthosting.co.za>
2012-02-10 13:45:58  ezio.melotti  link    issue13987 messages
2012-02-10 13:45:57  ezio.melotti  create