
Author bp256r1
Recipients berker.peksag, bp256r1, ezio.melotti, kodial
Date 2019-08-02.18:38:26
Message-id <1564771107.73.0.664658859343.issue37747@roundup.psfhosted.org>
Content
Hello,

I'm not sure if this is a bug, but I noticed that the parse_marked_section function of the _markupbase module in Python 3.7.4 raises a TypeError when it fails to find a name token in the input '<![\r�N&=\x00%\x1a\x1e��;u�dWf'.

See:
- https://github.com/python/cpython/blob/3.7/Lib/_markupbase.py#L149

Steps to reproduce:

$ pip3 freeze | grep beautifulsoup4
beautifulsoup4==4.6.3

$ python3
>>> a='<![\r�N&=\x00%\x1a\x1e��;u�dWf'
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup(a, 'html.parser')
/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: expected name token at '<![\r�N&=\x00%\x1a\x1e��;u�dWf'
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 303, in __init__
    self._feed()
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 364, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 250, in feed
    parser.feed(markup)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
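
The traceback suggests the failure does not need BeautifulSoup itself: the UserWarning comes from bs4's parser reporting the error as a warning, after which _scan_name appears to return None and the tuple unpacking in parse_marked_section fails. Below is a minimal sketch of a standard-library-only check under that assumption. WarningParser and try_parse are hypothetical names, the input is a simplified ASCII stand-in for the original string (whose exact bytes are not recoverable here), and the result may differ between Python versions.

```python
import warnings
from html.parser import HTMLParser


class WarningParser(HTMLParser):
    """Hypothetical stand-in for a subclass whose error() warns instead of raising."""

    def error(self, message):
        # Assumption: mirrors the warnings.warn(msg) call seen in the traceback.
        warnings.warn(message)


def try_parse(markup):
    """Feed markup to the parser; return the exception type name, or None on success."""
    parser = WarningParser()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence the "expected name token" warning
            parser.feed(markup)
            parser.close()
    except Exception as exc:
        return type(exc).__name__
    return None


# Simplified input: "<![" followed by a character that cannot start a name token.
print(try_parse("<![\rN&=\x00%\x1a\x1e;udWf"))
```

On an interpreter that behaves like the 3.7.4 build above, this should report the same TypeError; a result of None would indicate that the installed version handles the input without raising.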

If this turns out not to be a bug, my apologies.
History
Date                 User     Action  Args
2019-08-02 18:38:27  bp256r1  set     recipients: + bp256r1, ezio.melotti, berker.peksag, kodial
2019-08-02 18:38:27  bp256r1  set     messageid: <1564771107.73.0.664658859343.issue37747@roundup.psfhosted.org>
2019-08-02 18:38:27  bp256r1  link    issue37747 messages
2019-08-02 18:38:26  bp256r1  create