classification
Title: _markupbase.py fails with UnboundLocalError on invalid keyword in marked section
Type: behavior
Stage: test needed
Components: Library (Lib)
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: ezio.melotti
Nosy List: berker.peksag, ezio.melotti, kodial
Priority: normal
Keywords:

Created on 2018-08-23 17:30 by kodial, last changed 2018-09-13 16:39 by ezio.melotti.

Messages (2)
msg323962 - Author: Conrad (kodial) Date: 2018-08-23 17:30
$ pip freeze | grep beautifulsoup4
beautifulsoup4==4.6.3

$ python
Python 3.7.0 (default, Jul 23 2018, 20:24:19)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> text = "<![hi world]>"
>>> BeautifulSoup(text, 'html.parser')
/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: unknown status keyword 'hi ' in marked section
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/__init__.py", line 282, in __init__
    self._feed()
  File "/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/__init__.py", line 343, in _feed
    self.builder.feed(self.markup)
  File "/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 247, in feed
    parser.feed(markup)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
>>>
msg323966 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2018-08-23 18:12
Thanks for the report.

HTMLParser.error() was supposed to raise an exception, but the BeautifulSoup project just prints a warning here:

    def error(self, msg):
        warnings.warn(msg)

https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/builder/_htmlparser.py#L69

As a result, execution does not stop in the following else branch: error() returns, match is never assigned, and the "if not match" check that follows raises UnboundLocalError:

    else:
        self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
    if not match:
        return -1

https://github.com/python/cpython/blob/3.7/Lib/_markupbase.py#L159
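
The failure mode can be reproduced without bs4 or html.parser. The following is only a minimal sketch: SketchParser, WarningParser, parse_section and outcome are hypothetical stand-ins that mimic the control flow quoted above, not the real _markupbase code, and the "match" values are placeholder strings rather than regex match objects.

```python
import warnings


class SketchParser:
    """Hypothetical stand-in mimicking the branch structure quoted above."""

    def error(self, msg):
        # Assumed original contract: error() raises and never returns.
        raise ValueError(msg)

    def parse_section(self, keyword):
        if keyword in {"temp", "cdata", "ignore", "include", "rcdata"}:
            match = "]]>"  # placeholder for the standard terminator search
        elif keyword in {"if", "else", "endif"}:
            match = "]>"   # placeholder for the MS Office terminator search
        else:
            # If error() returns instead of raising, 'match' stays unbound.
            self.error("unknown status keyword %r in marked section" % keyword)
        if not match:
            return -1
        return match


class WarningParser(SketchParser):
    def error(self, msg):
        # Same style of override as the BeautifulSoup snippet: warn and return.
        warnings.warn(msg)


def outcome(parser, keyword):
    """Return parse_section()'s result, or the name of the exception raised."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # keep the demo output quiet
            return parser.parse_section(keyword)
    except Exception as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(outcome(SketchParser(), "hi "))   # error() raises, parsing stops
    print(outcome(WarningParser(), "hi "))  # error() returns, 'match' is unbound
```

With the raising error() the unknown keyword stops parsing immediately; with the warning-only override the same input reaches the "if not match" line with no local binding for match.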

Note that HTMLParser.error() was removed in Python 3.5 (https://github.com/python/cpython/commit/73a4359eb0eb624c588c5d52083ea4944f9787ea#diff-1a7486df8279dbac7f20abd487947845L171) and there is an open issue about the status of _markupbase.ParserBase.error(): Issue 31844.

I also think that https://github.com/python/cpython/commit/73a4359eb0eb624c588c5d52083ea4944f9787ea#diff-1a7486df8279dbac7f20abd487947845L171 may have caused a minor regression when it removed the error() method and its uses from the HTMLParser class. HTMLParser still calls the parse_marked_section() method of _markupbase.ParserBase, which in turn calls the error() method of _markupbase.ParserBase:

    elif rawdata[i:i+3] == '<![':
        return self.parse_marked_section(i)

https://github.com/python/cpython/blob/3.7/Lib/html/parser.py#L264
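
One possible way to make the parse_marked_section() branch tolerate an error() that returns is to bind match before branching. This is only a sketch of the idea, not a proposed patch: parse_section_guarded and on_error are hypothetical names, the terminator strings are placeholders for the real regex searches, and treating an unknown keyword as "no terminator found" (returning -1) is an assumption.

```python
def parse_section_guarded(keyword, on_error):
    """Hypothetical variant of the branch with 'match' pre-initialised."""
    match = None  # assumption: unknown keyword behaves like "nothing matched"
    if keyword in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = "]]>"  # placeholder for the standard terminator search
    elif keyword in {"if", "else", "endif"}:
        match = "]>"   # placeholder for the MS Office terminator search
    else:
        # on_error may raise or return; either way 'match' is already bound.
        on_error("unknown status keyword %r in marked section" % keyword)
    if not match:
        return -1
    return match


if __name__ == "__main__":
    # A handler that returns (like a warning-only error()) now yields -1.
    print(parse_section_guarded("hi ", lambda msg: None))
```

Whether -1 is the right result for an unknown keyword is a separate question; the sketch only shows that the UnboundLocalError goes away once match has a default.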
History
Date User Action Args
2018-09-13 16:39:29 ezio.melotti set assignee: ezio.melotti
2018-08-23 18:13:11 berker.peksag set messages: - msg323967
2018-08-23 18:12:56 berker.peksag set status: open
2018-08-23 18:12:27 berker.peksag set nosy: ezio.melotti, berker.peksag, kodial; messages: + msg323967
2018-08-23 18:12:19 berker.peksag set status: open -> (no value); nosy: + ezio.melotti, berker.peksag; messages: + msg323966; stage: test needed
2018-08-23 17:30:37 kodial create