Message 60306 - Python tracker


Author	calvin
Date	2003-03-31.10:44:31

Content
Logged In: YES 
user_id=9205

HTMLParser (and lots of other parsers I tried) definitely has
limits when it comes to error recovery. I don't know if it's
worth putting further development effort into HTMLParser, as
it will IMHO never be able to cope with all the crappy HTML
out there.
If you really want an HTML parser in Python, I suggest you
look at my htmlsax module, packaged with linkchecker
(linkchecker.sf.net) and webcleaner (webcleaner.sf.net). The
parser is tested with lots of real-world examples.
The parser packaged with linkchecker has line counting; the
one packaged with webcleaner does not.

Cheers, Bastian

History
Date	User	Action	Args
2008-01-20 09:55:57	admin	link	issue683938 messages
2008-01-20 09:55:57	admin	create