
Author smroid
Date 2003-05-12.11:37:44
Content
I'm using Python 2.3a2.

HTMLParser correctly raises a "malformed start tag"
error on:

<meta NAME=DESCRIPTION Content=Lands' End quality...
outerwear and more.> 

Because my application is imprecise by nature (web
scraping), I want to be able to continue after such errors.

I can override the error() method to not raise an
exception. To make this work, I also needed to alter
HTMLParser.py, near line 316, to read as:

            self.updatepos(i, j)
            self.error("malformed start tag")
            return j                    #  ADDED THIS LINE
        raise AssertionError("we should not get here!")

My enhancement request is to ensure, at every place where
self.error() is called, that the "override error() to not
raise an exception" continuation strategy works as well as
can be hoped.
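
To illustrate the strategy, here is a minimal sketch that uses a toy scanner rather than HTMLParser.py itself. The names TinyTagScanner and LenientScanner and the simplified start-tag grammar are hypothetical and exist only for this example. The point is that every call to self.error() is followed by a return of a position to resume from, so a subclass whose error() does not raise can keep going.

```python
import re

# Hypothetical, simplified grammar for the text between "<" and ">":
# a tag name followed by attributes with optional values.
_BODY = re.compile(
    r'[A-Za-z][A-Za-z0-9]*(\s+[A-Za-z]+(=("[^"]*"|[^\s"\'<>]+))?)*\s*$'
)


class TinyTagScanner:
    """Toy start-tag scanner; NOT the stdlib HTMLParser."""

    def __init__(self):
        self.tags = []

    def error(self, message):
        # Default policy: abort on the first problem.
        raise ValueError(message)

    def feed(self, data):
        i = data.find("<")
        while i >= 0:
            # parse_starttag always returns an offset past position i.
            i = data.find("<", self.parse_starttag(data, i))

    def parse_starttag(self, data, i):
        close = data.find(">", i)
        end = len(data) if close < 0 else close + 1
        if close >= 0 and _BODY.match(data[i + 1:close]):
            self.tags.append(data[i + 1:close].split()[0].lower())
            return end
        self.error("malformed start tag")
        # Reached only when error() was overridden to not raise:
        # skip the bad tag and resume scanning after it.
        return end


class LenientScanner(TinyTagScanner):
    """Records problems instead of raising."""

    def __init__(self):
        TinyTagScanner.__init__(self)
        self.problems = []

    def error(self, message):
        self.problems.append(message)


page = ("<p><meta NAME=DESCRIPTION Content=Lands' End quality "
        "outerwear and more.><b class=x>")
scanner = LenientScanner()
scanner.feed(page)
print(scanner.tags)
print(scanner.problems)
```

With the default error(), the same input stops at the meta tag with a ValueError. With the override, the scanner skips that tag and still reports the tags around it.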

Thanks,

Steve
History
Date User Action Args
2008-01-20 09:56:06 admin link issue736428 messages
2008-01-20 09:56:06 admin create