classification
Title: HTMLParser: undocumented not implemented method
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: William Ayd, berker.peksag, cheryl.sabella, ezio.melotti, srittau
Priority: normal Keywords: patch

Created on 2017-10-23 08:27 by srittau, last changed 2020-07-16 06:39 by berker.peksag. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8562 merged berker.peksag, 2018-07-30 09:51
PR 21504 merged berker.peksag, 2020-07-16 06:31
Messages (10)
msg304782 - (view) Author: Sebastian Rittau (srittau) * Date: 2017-10-23 08:27
HTMLParser derives from _markupbase.ParserBase, which has the following method:

class HTMLParser:

    ...

    def error(self, message):
        raise NotImplementedError(
            "subclasses of ParserBase must override error()")

HTMLParser does not implement this method and the documentation for HTMLParser (https://docs.python.org/3.6/library/html.parser.html) does not mention that its sub-classes need to override it.

I am not sure whether this is a documentation omission, whether HTMLParser should provide an (empty?) implementation, or whether ParserBase should not raise a NotImplementedError (to make linters happy).
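The behavior described in this report can be reproduced without importing the private module. The following is a minimal sketch, assuming stand-in classes (`BaseWithError` and `MyParser` are hypothetical names, not part of the standard library) that mirror the quoted error() method. It shows that a subclass which does not override error() can be instantiated without complaint and only fails once error() is actually called.

```python
class BaseWithError:
    """Hypothetical stand-in for the base class quoted in this report."""

    def error(self, message):
        raise NotImplementedError(
            "subclasses of ParserBase must override error()")


class MyParser(BaseWithError):
    """Hypothetical subclass that does not override error()."""


# Instantiation succeeds; nothing signals the missing override yet.
parser = MyParser()

# The failure only appears when error() is invoked.
try:
    parser.error("malformed markup")
except NotImplementedError as exc:
    failure = str(exc)
else:
    failure = None
```

This late failure is why neither the documentation nor a simple smoke test would reveal the requirement.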
msg304783 - (view) Author: Sebastian Rittau (srittau) * Date: 2017-10-23 08:29
The quoted code above should have used ParserBase:

class ParserBase:

    ...

    def error(self, message):
        raise NotImplementedError(
            "subclasses of ParserBase must override error()")
msg305303 - (view) Author: William Ayd (William Ayd) Date: 2017-10-31 14:24
Would we be open to setting the metaclass of ParserBase to ABCMeta and marking error() as an abstract method? That would at the very least make the expectation clearer for subclasses.

I haven’t contributed to Python before but am open to this as a first attempt if the direction makes sense.
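A rough sketch of the ABCMeta idea, under the assumption that the base class is rewritten as an abstract base class. `AbstractParserBase`, `IncompleteParser` and `CompleteParser` are hypothetical names used only for illustration; this is not how the standard library is implemented.

```python
from abc import ABC, abstractmethod


class AbstractParserBase(ABC):
    """Hypothetical abstract variant of the base class (not the stdlib one)."""

    @abstractmethod
    def error(self, message):
        """Report a parsing error."""


class IncompleteParser(AbstractParserBase):
    """Hypothetical subclass that forgets to override error()."""


class CompleteParser(AbstractParserBase):
    """Hypothetical subclass that provides error()."""

    def error(self, message):
        raise ValueError(message)


# With an abstract method, the missing override is reported at
# instantiation time instead of when error() is first called.
try:
    IncompleteParser()
except TypeError:
    instantiation_failed = True
else:
    instantiation_failed = False

complete = CompleteParser()
```

Note that under this approach HTMLParser itself would also have to provide error(), since an HTMLParser without an implementation could no longer be instantiated.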
msg305306 - (view) Author: William Ayd (William Ayd) Date: 2017-10-31 14:38
Assuming the subclass requirement is intentional, we could also add an optional keyword argument to HTMLParser that indicates what to do with errors, much like how encoding errors are handled in codecs. For backwards compatibility it could default to ignore, with fail and warn as two alternative policies that the error method could account for.
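The keyword-argument proposal could look roughly like the sketch below. Everything here is an assumption made for illustration: `PolicyParser` is a hypothetical class, the `errors` parameter does not exist on HTMLParser, and the policy names simply follow the ones mentioned in this message.

```python
import warnings


class PolicyParser:
    """Hypothetical parser illustrating an `errors` keyword (not HTMLParser)."""

    _POLICIES = ("ignore", "warn", "fail")

    def __init__(self, errors="ignore"):
        if errors not in self._POLICIES:
            raise ValueError(f"unknown errors policy: {errors!r}")
        self.errors = errors

    def error(self, message):
        if self.errors == "fail":
            raise ValueError(message)
        if self.errors == "warn":
            warnings.warn(message, RuntimeWarning, stacklevel=2)
        # "ignore": deliberately do nothing


# Default policy: the error is silently ignored.
PolicyParser().error("bad markup")

# "warn" policy: the error surfaces as a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    PolicyParser(errors="warn").error("bad markup")

# "fail" policy: the error is raised as an exception.
try:
    PolicyParser(errors="fail").error("bad markup")
except ValueError as exc:
    raised = str(exc)
else:
    raised = None
```

Defaulting to "ignore" keeps existing callers unaffected, which is the backwards-compatibility argument made above.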
msg322662 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2018-07-30 09:40
The HTMLParser.error() method was deprecated in Python 3.4 (https://github.com/python/cpython/commit/88ebfb129b59dc8a2b855fc93fcf32457128d64d#diff-1a7486df8279dbac7f20abd487947845R157) and removed in Python 3.5 (https://github.com/python/cpython/commit/73a4359eb0eb624c588c5d52083ea4944f9787ea#diff-1a7486df8279dbac7f20abd487947845L171).

_markupbase is a private and undocumented module, and its only user is HTMLParser (sgmllib was removed from the stdlib in 2008). Since we have already removed HTMLParser.error(), I think we can just remove _markupbase.ParserBase.error() without a deprecation period.
msg322672 - (view) Author: Sebastian Rittau (srittau) * Date: 2018-07-30 13:01
Good call. Maybe it's actually time to retire _markupbase and merge ParserBase into HTMLParser.
msg323968 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2018-08-23 18:16
After triaging issue 34480, I realized that we can't simply remove the error() method because the _markupbase.ParserBase() class still uses it. I've just closed PR 8562.
msg371500 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2020-06-14 12:06
@berker.peksag's last comment said that he closed the PR on 23 August 2018. However, he reopened it on 6 January 2020 after @ezio.melotti mentioned that they are both needed.

The PR for this issue is waiting to be re-reviewed by Ezio.
msg373745 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-07-16 06:13
New changeset e34bbfd61f405eef89e8aa50672b0b25022de320 by Berker Peksag in branch 'master':
bpo-31844: Remove _markupbase.ParserBase.error() (GH-8562)
https://github.com/python/cpython/commit/e34bbfd61f405eef89e8aa50672b0b25022de320
msg373746 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-07-16 06:39
New changeset d4d127f1c6e586036104e4101f5af239fe7dc156 by Berker Peksag in branch 'master':
bpo-31844: Move whatsnew note to 3.10.rst (GH-21504)
https://github.com/python/cpython/commit/d4d127f1c6e586036104e4101f5af239fe7dc156
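To see whether a given interpreter still carries the method, a quick check along the following lines may help. This is only a sketch: _markupbase is a private, undocumented module, so the import is guarded, and `has_error` is a name introduced here for illustration.

```python
import sys

try:
    import _markupbase  # private, undocumented module; may change or disappear
except ImportError:
    _markupbase = None

# True only if the module is importable and ParserBase still defines error().
has_error = (
    _markupbase is not None
    and hasattr(_markupbase.ParserBase, "error")
)

print(sys.version_info[:2], "ParserBase.error present:", has_error)
```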
History
Date User Action Args
2020-07-16 06:39:25 berker.peksag set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-07-16 06:39:05 berker.peksag set messages: + msg373746
2020-07-16 06:31:01 berker.peksag set stage: test needed -> patch review; pull_requests: + pull_request20644
2020-07-16 06:13:12 berker.peksag set messages: + msg373745
2020-06-14 12:06:35 cheryl.sabella set nosy: + cheryl.sabella; messages: + msg371500; versions: + Python 3.10, - Python 3.8
2018-08-25 00:57:50 ezio.melotti set assignee: ezio.melotti
2018-08-23 18:18:40 berker.peksag set stage: patch review -> test needed
2018-08-23 18:16:08 berker.peksag set messages: + msg323968
2018-07-30 13:01:26 srittau set messages: + msg322672
2018-07-30 09:51:05 berker.peksag set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request8076
2018-07-30 09:40:29 berker.peksag set versions: + Python 3.8, - Python 3.6; nosy: + berker.peksag; messages: + msg322662; stage: needs patch
2017-10-31 20:02:42 serhiy.storchaka set nosy: + ezio.melotti; type: behavior; components: + Library (Lib)
2017-10-31 14:38:03 William Ayd set messages: + msg305306
2017-10-31 14:24:39 William Ayd set nosy: + William Ayd; messages: + msg305303
2017-10-23 08:29:10 srittau set messages: + msg304783
2017-10-23 08:27:23 srittau create