Message60306
Logged In: YES
user_id=9205
HTMLParser (and lots of other parsers I tried) definitely has
limits when it comes to error recovery. I don't know whether
it's worth putting further development effort into HTMLParser,
as IMHO it will never be able to cope with all the crappy HTML
out there.
If you really want an HTML parser in Python, I suggest you
look at my htmlsax module, which is packaged with
linkchecker (linkchecker.sf.net) and webcleaner
(webcleaner.sf.net). The parser has been tested with lots of
real-world examples.
The version packaged with linkchecker has line counting; the
one packaged with webcleaner does not.
Cheers, Bastian
Date | User | Action | Args
2008-01-20 09:55:57 | admin | link | issue683938 messages
2008-01-20 09:55:57 | admin | create |