Hello,
I'm not sure whether this is a bug, but I noticed that the parse_marked_section function of the _markupbase module in Python 3.7.4 raises a TypeError when it tries to scan the name token of the marked section in the input '<![\r�N&=\x00%\x1a\x1e��;u�dWf'.
See:
- https://github.com/python/cpython/blob/3.7/Lib/_markupbase.py#L149
Steps to reproduce:
$ pip3 freeze | grep beautifulsoup4
beautifulsoup4==4.6.3
$ python3
>>> a='<![\r�N&=\x00%\x1a\x1e��;u�dWf'
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup(a, 'html.parser')
/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: expected name token at '<![\r�N&=\x00%\x1a\x1e��;u�dWf'
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 303, in __init__
    self._feed()
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 364, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 250, in feed
    parser.feed(markup)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
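As far as I can tell from the traceback, the mechanism looks like this: in 3.7, _scan_name reports a bad name token by calling self.error(...) and relies on that call to raise. bs4's parser appears to override error() with the warnings.warn(msg) call at _htmlparser.py:78, so it only warns and returns; _scan_name then falls off its end and implicitly returns None, which parse_marked_section tries to unpack. Below is a simplified, self-contained mimic of that logic (not the actual _markupbase code). The names scan_name and warn_only are hypothetical, and the input is a cut-down stand-in for the original string.

```python
import re
import warnings

# Same declaration-name pattern _markupbase uses, copied here so the sketch
# is self-contained (the real module is private and may change).
_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match


def scan_name(rawdata, i, declstartpos, error):
    """Hypothetical mimic of the 3.7 _scan_name logic.

    Returns (name, end) on success. On a bad token it calls error() and,
    like the original, assumes error() will raise.
    """
    n = len(rawdata)
    if i == n:
        return None, -1
    m = _declname_match(rawdata, i)
    if m:
        s = m.group()
        if i + len(s) == n:
            return None, -1  # end of buffer
        return s.strip().lower(), m.end()
    error("expected name token at %r" % rawdata[declstartpos:declstartpos + 20])
    # If error() only warns and returns, execution falls off the end here
    # and the function implicitly returns None.


def warn_only(msg):
    """Hypothetical stand-in for an error() override that warns instead of raising."""
    warnings.warn(msg)


data = "<![\r\x00%\x1a"  # the character after "<![" is not a letter

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = scan_name(data, 3, 0, warn_only)  # 3 == i + 3 for i = 0
print(result)  # None

unpack_error = None
try:
    sect_name, j = result  # same unpacking as parse_marked_section
except TypeError as exc:
    unpack_error = exc
    print("TypeError:", exc)
```

A parser that has to tolerate error() returning would need to raise explicitly after calling it instead of falling through.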
If this isn't a bug, apologies for the noise.