This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: documentation bug: HTMLParser needs to document unknown_decl
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, freyley, georg.brandl, terry.reedy, vbnmzx1
Priority: normal Keywords: patch

Created on 2008-09-15 21:52 by freyley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18805 closed vbnmzx1, 2020-03-06 10:23
Messages (7)
msg73282 - (view) Author: jeff (freyley) Date: 2008-09-15 21:52
the unknown_decl function is critical to dealing with MS Office
generated HTML files. There's no documentation of that. The default
behavior of the function is to error, which is reasonable, but it should
be stated in the documentation.
msg107972 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-17 00:55
Documentation issues should be component: documentation rather than library. When submitting one, please at least indicate the module or class concerned. I have never heard of 'unknown_decl' function.

Preferably, indicate the specific section you want modified, by version, number and name. Best is to submit a suggested text to be inserted.  You may know better than most issue reviewers what should be said. Someone else will add markup and possibly edit.

If you respond, this will be reopened. Please indicate whether this issue applies to 3.x and add 3.1 and 3.2 if it does.
msg108052 - (view) Author: jeff (freyley) Date: 2010-06-17 19:33
On Wed, Jun 16, 2010 at 5:55 PM, Terry J. Reedy <report@bugs.python.org> wrote:
>
> Terry J. Reedy <tjreedy@udel.edu> added the comment:
>
> Documentation issues should be component: documentation rather than library. When submitting one, please at least indicate the module or class concerned. I have never heard of 'unknown_decl' function.

It's your bug tracker. This sort of statement that says that I should
know exactly how you want bugs reported only serves to tell people
like me not to even try. In addition, it's inaccurate in this case, as
the title of the bug is that HTMLParser, which is a module in the
standard library, needs a function documented.

HTMLParser runs over HTML and calls internal functions when certain
events occur. unknown_decl is called when an unknown declaration is
found, and by default, it throws an exception. Thus, to correctly use
HTMLParser, when subclassing it, you need to override unknown_decl if
there are any unknown declarations in your HTML (or if you think there
might be).
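
A minimal sketch of the override described above, written against Python 3's html.parser (the module was named HTMLParser in Python 2). The class name TolerantTextExtractor, the helper extract_text, and the sample markup are illustrative assumptions, not part of the library. Whether the base-class unknown_decl raises, and which declarations get routed to it, may differ between Python versions, so the sketch relies only on the override being harmless when called.

```python
from html.parser import HTMLParser  # Python 3 module name


class TolerantTextExtractor(HTMLParser):
    """Collect text and record unknown declarations instead of failing on them."""

    def __init__(self):
        super().__init__()
        self.text = []          # chunks passed to handle_data
        self.declarations = []  # payloads passed to unknown_decl

    def handle_data(self, data):
        self.text.append(data)

    def unknown_decl(self, data):
        # Overridden so the base-class behavior is never reached for
        # declarations the parser does not recognize.
        self.declarations.append(data)


def extract_text(html):
    """Hypothetical helper: parse html and return (text, recorded declarations)."""
    parser = TolerantTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.text), parser.declarations


# Sample input imitating MS Office conditional markup (an assumption for illustration).
text, declarations = extract_text("<p><![if !supportLists]>Hello<![endif]></p>")
print(text)
print(declarations)
```

Recording the declarations rather than discarding them keeps the option of inspecting them later.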

> Preferably, indicate the specific section you want modified, by version, number and name. Best is to submit a suggested text to be inserted.  You may know better than most issue reviewers what should be said. Someone else will add markup and possibly edit.

It's been almost 2 years since I submitted this bug. I don't know if
it applies to Python 3, and at this point I find it difficult to care.

Thanks,

Jeff
msg108068 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-17 22:30
I understand that getting no response to a submission is not pleasant. I do not like it either. That is partly why I have started reviewing old issues. In the past couple of weeks, I have gotten two old orphaned patches applied by updating the headers, reading the patch, and adding a first-response approval message that got the attention of someone with code-commit privileges. I hope you agree that late is better than never.

I just discovered the nosy-count box on the search page. 351 open issues with a nosy count of 1 (which means no response unless someone responded and then removed themself) is too many. We need more issue reviewers.

As to your message: this is *our* tracker, not my tracker. My participation is as much voluntary as yours. I hope you do not really give up on improving Python and its documentation.

I did not expect that you *should* have known submission details. That is why I tried to inform you. In particular, when an issue is marked as 'documentation', it is automatically assigned to 'docs@python', a pseudo-user standing in for people who handle doc revisions. Now they will see this issue, whereas they would not have before.

Please excuse me for not remembering the title as I responded to the message. It is best if message text stands alone. Again, I hope you would agree that a somewhat ignorant response may be better than none.

In order for the doc maintainers to add an entry, someone knowledgeable must write it. Your paragraph of explanation is a start, but more editing is needed.

Looking at dir(html.parser.HTMLParser) and help(...), I see that there are several public internal methods. Some have doc strings that show up with help(), some do not. I think all should. Some are defined on HTMLParser and some inherited from the undocumented (I believe) _markupbase.ParserBase.

I see that there are also several (completely undocumented except for dir()) private ('_xyz') internal methods. This implies to me that the public internal methods were made public rather than private because there might be reason to override them. If so, perhaps there should be a new subsection on public internal methods to explain what is what with them. What do you think? Document just one, some, or all?
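
One rough way to take the inventory described above is to walk dir() and check each public callable for a docstring. This is only a sketch: the helper name public_methods is made up for illustration, and the result will depend on the Python version in use.

```python
from html.parser import HTMLParser


def public_methods(cls):
    """Map each public callable attribute of cls to True if it has a docstring."""
    result = {}
    for name in dir(cls):
        if name.startswith("_"):
            continue  # skip private and dunder names
        attr = getattr(cls, name)
        if callable(attr):
            # __doc__ is read directly so that docstrings are not
            # borrowed from a parent class implementation.
            result[name] = bool(attr.__doc__)
    return result


methods = public_methods(HTMLParser)
for name, documented in sorted(methods.items()):
    print(f"{name:30} {'docstring' if documented else 'no docstring'}")
```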
msg111707 - (view) Author: jeff (freyley) Date: 2010-07-27 17:29
On Thu, Jun 17, 2010 at 3:30 PM, Terry J. Reedy <report@bugs.python.org> wrote:
> In order for the doc maintainers to add an entry, someone knowledgeable must write it. Your paragraph of explanation is a start, but more editing is needed.
>
> Looking at dir(html.parser.HTMLParser) and help(...), I see that there are several public internal methods. Some have doc strings that show up with help(), some do not. I think all should. Some are defined on HTMLParser and some inherited from the undocumented (I believe) _markupbase.ParserBase.
>
> I see that there are also several (completely undocumented except for dir()) private ('_xyz') internal methods. This implies to me that the public internal methods were made public rather than private because there might be reason to override them. If so, perhaps there should be a new subsection on public internal methods to explain what is what with them. What do you think? Document just one, some, or all?

Terry,

I'm looking at the HTMLParser code, and I only see unknown_decl as a
method in there that is: a) not marked as internal or doing a lot, b)
not documented. There are a number of methods which should probably be
refactored to be _methodname rather than methodname, but that's beyond
the scope of this report.

HTMLParser.unknown_decl(data)¶
Method called when an unrecognized SGML declaration is read by the
parser. The data parameter will be the entire contents of the
declaration inside the <!...> markup. It is sometimes useful to be
overridden by a derived class; the base class implementation throws an
HTMLParseError.
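
As a hedged illustration of how a derived class might interpret the data parameter, the sketch below labels each payload by its leading keyword. The helper classify_declaration, the class DeclarationLogger, the label names, and the sample strings are all assumptions made for this example; the exact text a given parser version passes in may differ.

```python
from html.parser import HTMLParser


def classify_declaration(data):
    """Return a coarse label for a declaration payload (hypothetical helper)."""
    stripped = data.strip()
    if not stripped:
        return "other"
    keyword = stripped.split(None, 1)[0].lower()
    if keyword in ("if", "else"):
        return "conditional-start"
    if keyword == "endif":
        return "conditional-end"
    if keyword.startswith("cdata["):
        return "cdata"
    return "other"


class DeclarationLogger(HTMLParser):
    """Record every unknown declaration together with its label."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def unknown_decl(self, data):
        self.seen.append((classify_declaration(data), data))


# The method is called directly here so the example does not depend on
# how a particular parser version dispatches declarations.
logger = DeclarationLogger()
for sample in ("if !supportLists", "endif", "CDATA[raw text"):
    logger.unknown_decl(sample)
print(logger.seen)
```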

There may be other undocumented methods showing up, but if so they're
part of a parent class.

Thanks,

Jeff
msg111714 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-27 18:44
OK, your recommendation is to add one entry with the text suggested in the message. Given the name, the text seems reasonable. I will leave it to a doc person to format and apply.
msg111922 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-07-29 13:38
Applied with some tweaks in r83223.  Thanks Jeff and Terry!
History
Date                 User          Action  Args
2022-04-11 14:56:39  admin         set     github: 48124
2020-03-06 10:23:14  vbnmzx1       set     nosy: + vbnmzx1
                                           pull_requests: + pull_request18162
2010-07-29 13:41:29  terry.reedy   set     stage: needs patch -> resolved
2010-07-29 13:38:55  georg.brandl  set     status: open -> closed
                                           nosy: + georg.brandl
                                           messages: + msg111922
                                           resolution: fixed
2010-07-27 18:44:35  terry.reedy   set     keywords: + patch
                                           messages: + msg111714
                                           stage: needs patch
2010-07-27 17:29:34  freyley       set     messages: + msg111707
2010-06-17 22:30:30  terry.reedy   set     messages: + msg108068
2010-06-17 19:43:55  fdrake        set     versions: + Python 3.1, Python 3.2
2010-06-17 19:33:39  freyley       set     status: pending -> open
                                           messages: + msg108052
2010-06-17 00:55:31  terry.reedy   set     status: open -> pending
                                           assignee: docs@python
                                           components: + Documentation, - Library (Lib)
                                           versions: + Python 2.6, Python 2.7, - Python 2.5
                                           nosy: + terry.reedy, docs@python
                                           messages: + msg107972
2008-09-15 21:52:28  freyley       create