This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: HTMLParser : HTMLParser.error creating multiple errors.
Type: behavior
Stage: resolved
Components:
Versions: Python 3.7

process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: AbcSxyZ, Jeffrey.Kintscher, berker.peksag, iritkatriel
Priority: normal
Keywords:

Created on 2020-08-05 20:04 by AbcSxyZ, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name | Uploaded | Description
error.txt | AbcSxyZ, 2020-08-05 20:03 | pdf file
Messages (3)
msg374899 - Author: AbcSxyZ (AbcSxyZ) Date: 2020-08-05 20:03
This comes from a deprecated feature. I'm using Python 3.7.3.

It is related to, and probably fixed by, https://bugs.python.org/issue31844.
I'm reporting it just in case.

I've got two related problems, the first one causing the second.

Using the attached file and this class:
```
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """ DOM parser to retrieve href of all <a> elements """

    def parse_links(self, html_content):
        self.links = []
        self.feed(html_content)
        return self.links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; names are already lowercased
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    # Uncomment to override error() with a no-op:
    # def error(self, *args, **kwargs):
    #     pass

if __name__ == "__main__":
    with open("error.txt") as f:
        LinkParser().parse_links(f.read())
```

With the error method commented out, parsing raises:
```
  File "scanner/link.py", line 8, in parse_links
    self.feed(html_content)
  File "/usr/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.7/_markupbase.py", line 159, in parse_marked_section
    self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
  File "/usr/lib/python3.7/_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()
```
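
A minimal sketch of an override that raises its own exception instead of leaving the stub in place. The names `LinkParseError` and `StrictLinkParser` are hypothetical and not part of `html.parser`; the sketch assumes the parser calls `self.error(message)` with a single message argument, as the traceback above shows for 3.7.

```python
from html.parser import HTMLParser


class LinkParseError(Exception):
    """Hypothetical exception type; not part of html.parser."""


class StrictLinkParser(HTMLParser):
    """Collects href values of <a> tags; error() raises instead of passing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def error(self, message):
        # Only invoked by the parser on Python versions that still call
        # self.error(); raising here stops parsing at the failure point.
        raise LinkParseError(message)


parser = StrictLinkParser()
parser.feed('<p><a href="https://example.org/">x</a></p>')
print(parser.links)
```

Because the override always raises, the parser never continues past a call to `error()`, so the caller sees a single exception that carries the original message.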

If the error method is overridden but does not raise anything (its body is only `pass`), parsing raises:
```
  File "/home/simon/Documents/radio-parser/scanner/link.py", line 8, in parse_links
    self.feed(html_content)
  File "/usr/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.7/_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
```

Here the `match` variable is never assigned when the `else` branch calls `self.error`.
Because the overridden `error` does not raise, execution continues to `if not match:` and triggers the UnboundLocalError:

```
    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectName in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)

```
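
The same pattern can be reproduced without the parser. This is a simplified stand-in, not the library code: `lookup`, `silent` and `strict` are hypothetical names, and the string values only mark which branch ran.

```python
def lookup(keyword, on_error):
    """Hypothetical stand-in for the branch structure of parse_marked_section."""
    if keyword in {"cdata", "ignore"}:
        match = "standard"
    elif keyword in {"if", "endif"}:
        match = "msoffice"
    else:
        # If on_error returns normally, 'match' was never assigned.
        on_error("unknown status keyword %r" % keyword)
    return match


def silent(message):
    """Error handler that swallows the problem, like an error() that only passes."""
    pass


def strict(message):
    """Error handler that raises, so control never reaches 'return match'."""
    raise ValueError(message)


for handler in (silent, strict):
    try:
        lookup("bogus", handler)
    except Exception as exc:
        print(handler.__name__, "->", type(exc).__name__)
```

With the silent handler the failure surfaces later as an UnboundLocalError; with the raising handler the caller gets the original error instead.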
msg401295 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-09-07 16:04
Changing type because crash typically refers to segfault rather than an exception.
msg401296 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-09-07 16:06
The code was changed and now, instead of calling self.error(), it raises an exception:

raise AssertionError(
   'unknown status keyword %r in marked section' % rawdata[i+3:j])


So match not being initialised is no longer a problem.
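
For callers on a version with this change, a hedged sketch of how the exception could be handled around `feed()`. `LinkCollector` and `extract_links` are hypothetical names, and the sketch assumes only what the message above states, namely that the unknown-keyword case raises AssertionError; other Python versions may treat such input differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that gathers href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_content):
    """Return (links, error); error is None when parsing completed."""
    parser = LinkCollector()
    try:
        parser.feed(html_content)
        parser.close()
    except AssertionError as exc:
        # Assumption: the parser signals an unknown marked section this way.
        return parser.links, str(exc)
    return parser.links, None


links, error = extract_links('<a href="/a">a</a><a>no href</a>')
print(links, error)
```

Returning the links collected so far together with the error message lets the caller decide whether a partial result is acceptable.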
History
Date | User | Action | Args
2022-04-11 14:59:34 | admin | set | github: 85661
2021-09-07 16:06:37 | iritkatriel | set | status: open -> closed; resolution: out of date; messages: + msg401296; stage: resolved
2021-09-07 16:04:51 | iritkatriel | set | type: crash -> behavior; messages: + msg401295; nosy: + iritkatriel
2020-08-16 03:40:23 | xtreak | set | nosy: + berker.peksag
2020-08-05 23:14:22 | Jeffrey.Kintscher | set | nosy: + Jeffrey.Kintscher
2020-08-05 20:04:25 | AbcSxyZ | create |