Message 346210
Thank you for the report.
Looking at the BeautifulSoup source, there is a comment about this scenario:
# Unlike other parsers, html.parser doesn't send separate end tag
# events for empty-element tags. (It's handled in
# handle_startendtag, but only if the original markup looked like
# <tag/>.)
#
# So we need to call handle_endtag() ourselves. Since we
# know the start event is identical to the end event, we
# don't want handle_endtag() to cross off any previous end
# events for tags of this name.
HTMLParser itself produces output such as:
>>> from html.parser import HTMLParser
>>> class MyParser(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print(f'start: {tag}')
...     def handle_endtag(self, tag):
...         print(f'end: {tag}')
...     def handle_data(self, data):
...         print(f'data: {data}')
...
>>> parser = MyParser()
>>> parser.feed('<p><test></p>')
start: p
start: test
end: p
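For comparison, here is a minimal sketch of how the self-closing form differs. It relies on the default handle_startendtag(), which calls handle_starttag() followed by handle_endtag(), so an end event for the tag is only produced when the markup is written as <test/>. The EventRecorder class and events_for() helper are hypothetical names used for illustration; they are not part of html.parser or BeautifulSoup.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records start/end events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))


def events_for(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# Unclosed <test>: no end event is generated for it.
print(events_for('<p><test></p>'))
# [('start', 'p'), ('start', 'test'), ('end', 'p')]

# Self-closing <test/>: the default handle_startendtag() emits both events.
print(events_for('<p><test/></p>'))
# [('start', 'p'), ('start', 'test'), ('end', 'test'), ('end', 'p')]
```

This is the case the BeautifulSoup comment describes: for markup that does not look like <tag/>, the caller has to decide on its own where the element ends.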
My suggestion would be to try a different parser in BeautifulSoup [1] to handle this. Even if we wanted to modify HTMLParser, any such change would probably be backwards incompatible.
[1] https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
Date | User | Action | Args
2019-06-21 13:14:58 | cheryl.sabella | set | recipients: + cheryl.sabella, terry.reedy, ezio.melotti, htran
2019-06-21 13:14:58 | cheryl.sabella | set | messageid: <1561122898.03.0.0623370000836.issue37071@roundup.psfhosted.org>
2019-06-21 13:14:57 | cheryl.sabella | link | issue37071 messages
2019-06-21 13:14:57 | cheryl.sabella | create |