Message312363
I noticed that the HTMLParser will raise an exception on some inputs.
I'm not sure what the expectations are here, but given that real-world HTML often contains all kinds of broken content, I would expect HTMLParser to always try to parse a document rather than abort with an exception when it encounters an error.
Here's a minified example:
#!/usr/bin/env python3
import html.parser
html.parser.HTMLParser().feed("<![\n")
However, I actually stumbled upon HTMLParser failing on a real web page:
https://kafanews.com/
Exception raised by the minified example:
Traceback (most recent call last):
  File "./foo.py", line 5, in <module>
    html.parser.HTMLParser().feed("<![\n")
  File "/usr/lib64/python3.6/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib64/python3.6/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib64/python3.6/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib64/python3.6/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
  File "/usr/lib64/python3.6/_markupbase.py", line 391, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "/usr/lib64/python3.6/_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()
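Until the parser itself tolerates such input, a caller could guard the feed() call. The following is only a rough sketch of that idea, not a fix for the parser: CollectingParser and feed_tolerantly are hypothetical names that are not part of html.parser, and catching Exception is an assumption, since the exception type looks like an implementation detail that may differ between Python versions.

```python
import html.parser


class CollectingParser(html.parser.HTMLParser):
    """Hypothetical helper: records start tags so partial progress is visible."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def feed_tolerantly(parser, data):
    """Feed data to the parser; return the exception it raised, or None.

    Assumption: any exception escaping feed() is treated as a parse failure.
    The traceback above shows NotImplementedError, but the exact type is not
    relied upon here.
    """
    try:
        parser.feed(data)
    except Exception as exc:
        return exc
    return None


parser = CollectingParser()
# Valid markup followed by the problematic "<![" sequence from the example.
error = feed_tolerantly(parser, "<p>before</p><![\n")
print(parser.tags, type(error).__name__ if error else "no error")
```

Handlers that ran before the failure have already fired, so the content seen so far is not lost. Whatever follows the broken declaration is still not parsed on affected versions, which is why this is a stopgap only.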
Date | User | Action | Args
2018-02-19 19:52:16 | hanno | set | recipients: + hanno
2018-02-19 19:52:16 | hanno | set | messageid: <1519069936.36.0.467229070634.issue32876@psf.upfronthosting.co.za>
2018-02-19 19:52:16 | hanno | link | issue32876 messages
2018-02-19 19:52:16 | hanno | create |