
Author calvin
Date 2003-03-31.10:44:31
Content
HTMLParser (like many other parsers I have tried) has
definite limits when it comes to error recovery. I don't
know whether it's worth putting further development effort
into HTMLParser, as IMHO it will never be able to cope
with all the crappy HTML out there.
If you really want an HTML parser in Python, I
suggest you look at my htmlsax module, which is packaged with
linkchecker (linkchecker.sf.net) and webcleaner
(webcleaner.sf.net). The parser has been tested with lots of
real-world examples.
The parser packaged with linkchecker has line counting; the
one packaged with webcleaner does not.

Cheers, Bastian
History
Date User Action Args
2008-01-20 09:55:57 admin link issue683938 messages
2008-01-20 09:55:57 admin create