classification
Title: _markupbase.py fails with TypeError on invalid keyword in marked section
Type:
Stage:
Components: Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: berker.peksag, bp256r1, ezio.melotti, kodial, leonardr
Priority: normal
Keywords:

Created on 2019-08-02 18:38 by bp256r1, last changed 2020-06-11 20:23 by leonardr.

Files
File name | Uploaded | Description
test_issue37747.py | leonardr, 2020-06-11 20:23 | Reproduce issue 37747 without using external packages
Messages (2)
msg348910 - Author: bp256r1 (bp256r1) Date: 2019-08-02 18:38
Hello,

I'm not sure if this is a bug, but I noticed that the parse_marked_section function of the _markupbase module raises a TypeError in Python 3.7.4 when it tries to parse a name token from the input '<![\r�N&=\x00%\x1a\x1e��;u�dWf'.

See:
- https://github.com/python/cpython/blob/3.7/Lib/_markupbase.py#L149

Steps to reproduce:

$ pip3 freeze | grep beautifulsoup4
beautifulsoup4==4.6.3

$ python3
>>> a='<![\r�N&=\x00%\x1a\x1e��;u�dWf'
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup(a, 'html.parser')
/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: expected name token at '<![\r�N&=\x00%\x1a\x1e��;u�dWf'
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 303, in __init__
    self._feed()
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 364, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 250, in feed
    parser.feed(markup)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object

Sorry if this turns out not to be a bug; I'm not sure.
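
The last frame of the traceback can be modeled without html.parser at all. The sketch below is a toy, not the real _markupbase code: ToyScanner, scan_name, and parse_marked_section are hypothetical stand-ins. It assumes, as the UserWarning above suggests, that the parser's error() hook warns and returns instead of raising, so the name scanner falls off the end of the function and returns None, and the caller's tuple unpacking then fails.

```python
import warnings


class ToyScanner:
    """Toy stand-in for a parser whose error() hook does not raise."""

    def error(self, message):
        # Assumption: the hook only warns, then returns None.
        warnings.warn(message)

    def scan_name(self, text, i):
        # Return (name, end_index) when a name starts at position i.
        if i < len(text) and text[i].isalpha():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            return text[i:j], j
        self.error("expected name token")
        # No return statement on this path, so the result is None.


def parse_marked_section(scanner, text):
    # Mirrors the shape of the failing line: unpack a (name, index) pair.
    name, j = scanner.scan_name(text, 3)
    return name, j


if __name__ == "__main__":
    scanner = ToyScanner()
    print(parse_marked_section(scanner, "<![CDATA["))
    try:
        parse_marked_section(scanner, "<![\rN&=")
    except TypeError as exc:
        print("TypeError:", exc)
```

If error() raised instead of returning, the caller would never reach the unpacking step, which is why the failure only shows up with a lenient error() hook.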
msg371323 - Author: Leonard Richardson (leonardr) * Date: 2020-06-11 20:23
This was also recently filed as a bug against Beautiful Soup, a package I maintain, using Python 3.8. (https://bugs.launchpad.net/beautifulsoup/+bug/1883104)

The attached script reproduces the problem without using external packages.
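
The attached script is not shown on this page, so the following is only a sketch of what a standard-library-only reproduction might look like, not the script itself. LenientParser and try_parse are hypothetical names, and the input is a simplified stand-in that keeps the `<![` prefix followed by a character that cannot start a name. The sketch assumes that on Python 3.7 and 3.8 _markupbase calls self.error() when no name token is found; overriding error() so that it returns mimics what the traceback above shows Beautiful Soup doing. Newer Python releases changed this code path, so the outcome may differ there.

```python
from html.parser import HTMLParser


class LenientParser(HTMLParser):
    """Hypothetical subclass whose error() returns instead of raising."""

    def error(self, message):
        # Assumption: swallowing the error lets the name scanner return None
        # on the affected versions, which the caller then tries to unpack.
        pass


def try_parse(markup):
    """Feed markup to the parser; return the exception class name, or None."""
    parser = LenientParser()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:  # broad on purpose: the outcome is version-dependent
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # Simplified stand-in for the fuzzed input from the original report.
    print(try_parse("<![\rN&="))
```

Returning the exception name rather than letting it propagate makes it easy to compare the result across interpreter versions.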
History
Date User Action Args
2020-06-11 20:23:39 leonardr set files: + test_issue37747.py
versions: + Python 3.8
nosy: + leonardr
messages: + msg371323
2019-08-02 18:38:27 bp256r1 create