Message50232
Changes:
* Now allows missing spaces between attributes, as is
often seen on the web, like this:
<script type="text/javascript"language="JavaScript1.1">
That line broke the whole parsing before.
* A fully auto-tolerant mode (HTMLParser.tolerant=1)
was added. It should hopefully NEVER break HTML parsing
on the level of HTMLParser, but recover and continue
the parsing smartly. The mode was tested extensively
with complex pages. Tolerant mode is guaranteed to
finish all pending HTML only during HTMLParser.close() /
goahead(end=True) - yet that is the same policy (holding
back incomplete input) as before.
Maybe steep: I have switched ON the tolerant mode by
default, as this is what one wants in 99.9% of cases.
(I have maybe 20 applications for HTMLParser - none of
them likes the unrecoverable breaks with exceptions.)
In tolerant mode the virtual .warning(message,i,k)
is called instead of error - by default this just
increments .warning_count. This framework should even
make it possible to write HTML checkers.
* The patch was generated against py2.3 (still the
"good/base" Python for me) and also fixes a regexp bug
(which was already fixed in py2.4.2). The patch also
applies against py2.4/2.5 - the 2 locations where py24
trivially changed to %r/repr may grumble.
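The warning/error dispatch described for tolerant mode can be sketched roughly as follows. This is not the code from the patch: only the names tolerant, warning(message,i,k) and warning_count come from the text above. ParseError, TolerantReporter, report() and Checker are hypothetical, and treating i and k as start/end offsets is an assumption.

```python
class ParseError(Exception):
    """Hypothetical stand-in for the exception raised in strict mode."""


class TolerantReporter:
    """Sketch of the dispatch only; it does not parse any HTML."""

    tolerant = 1  # the message says tolerant mode is ON by default

    def __init__(self):
        self.warning_count = 0

    def error(self, message):
        # Strict behaviour: an unrecoverable break with an exception.
        raise ParseError(message)

    def warning(self, message, i, k):
        # Default behaviour described in the message: just count.
        self.warning_count += 1

    def report(self, message, i, k):
        # Hypothetical helper: route a problem found between offsets i and k
        # (assumed meaning) to warning() or error() depending on the mode.
        if self.tolerant:
            self.warning(message, i, k)
        else:
            self.error(message)


class Checker(TolerantReporter):
    """Hypothetical HTML checker that overrides the virtual warning()."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def warning(self, message, i, k):
        super().warning(message, i, k)
        self.problems.append((i, k, message))


checker = Checker()
checker.report("missing space between attributes", 3, 7)
```

Overriding warning() in a subclass is what would turn the counter into a checker that records where each problem was found.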
-robert
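
For comparison, a small check of the missing-space case from the first item. It uses the html.parser module of a recent Python 3, not the patched py2.3 HTMLParser, and assumes that module's lenient attribute parsing. AttrCollector and collect_attrs are hypothetical helper names.

```python
from html.parser import HTMLParser  # Python 3 stdlib, not the patched module


class AttrCollector(HTMLParser):
    """Hypothetical helper: records tag name and attributes of start tags."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def collect_attrs(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()  # flush whatever the parser was still holding back
    return parser.seen


# Same tag as in the message, with no space before the second attribute.
result = collect_attrs(
    '<script type="text/javascript"language="JavaScript1.1"></script>'
)
```

If the parser tolerates the missing space, result holds a single script entry with both attributes.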
Date | User | Action | Args
2007-08-23 15:48:49 | admin | link | issue1486713 messages
2007-08-23 15:48:49 | admin | create |