Title: htmllib deprecated: Which library to use? Missing sane default in docs
Type: Stage: resolved
Components: Documentation Versions: Python 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Nan Wu, berker.peksag, docs@python, ezio.melotti, guettli, martin.panter, python-dev, r.david.murray
Priority: normal Keywords: easy, patch

Created on 2015-09-07 09:01 by guettli, last changed 2015-11-14 00:48 by martin.panter. This issue is now closed.

File name Uploaded Description Edit
htmllib_deprecation_warning.patch Nan Wu, 2015-10-16 20:52
htmllib_deprecation_warning_2.patch Nan Wu, 2015-10-21 12:56
htmllib_deprecation_warning_3.patch martin.panter, 2015-11-13 02:44 review
Messages (17)
msg250088 - (view) Author: Thomas Guettler (guettli) Date: 2015-09-07 09:01
At the top of the htmllib module:

> Deprecated since version 2.6: The htmllib module has been removed in
> Python 3.


Newcomers need more advice: Which library should be used?

I know there are many HTML parsing libraries.

But there should be a sane default for newcomers.

Is there already agreement on a sane default HTML parsing library?
msg250092 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-07 09:50
PEP 3108 says “Superseded by HTMLParser”. I presume this means Python 3’s “html.parser” module (called “HTMLParser” in Python 2). I guess a lot of work would be involved in changing existing code over, but it shouldn’t be much of a problem for someone writing new code.
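
For someone writing new code, a minimal sketch of the superseding API might look like the following. It uses the Python 3 module name "html.parser"; the class name LinkCollector and the sample markup are made up for illustration and do not come from the documentation or the patch.

```python
from html.parser import HTMLParser  # Python 3 name; the Python 2 module is "HTMLParser"


class LinkCollector(HTMLParser):
    """Hypothetical example class: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names and passes attributes
        # as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p><a href="https://www.python.org/">Python</a> and <a href="/docs">docs</a></p>')
parser.close()
print(parser.links)
```

Unlike htmllib, there is no formatter object to set up; the parser is subclassed and its handle_* methods are overridden.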
msg250123 - (view) Author: Thomas Guettler (guettli) Date: 2015-09-07 19:54
This issue is just about documentation. No code change is required for it.

How should the docs be updated to point to html.parser?
msg250125 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2015-09-07 20:07
If you want to create a patch, you have to edit the file Doc/library/htmllib.rst in the 2.7 branch.  You can find information about cloning the CPython repository and switching branches in the devguide.
The warning should suggest :mod:`HTMLParser` for Python 2 and the equivalent :mod:`html.parser` for Python 3.
msg253098 - (view) Author: Nan Wu (Nan Wu) * Date: 2015-10-16 20:52
Added a small patch for this change.
msg253274 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-10-21 03:17
Thanks for the patch.

I think we can move the Python 3 part of the patch to a new note directive (similar to the example in the httplib documentation).

For example:

.. deprecated:: 2.6
   Use :mode:`HTMLParser` instead.

.. note::
   The :mod:`htmllib` module has been removed in Python 3.  Use :mod:`html.parser` (equivalent of :mode:`HTMLParser`) instead.
msg253279 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-21 08:02
Also beware it should be :mod: not :mode: :)
msg253285 - (view) Author: Nan Wu (Nan Wu) * Date: 2015-10-21 12:56
Updated the patch. The typo was fixed too. Thanks for catching it.
msg253533 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-27 12:35
This looks good enough to me. I would have probably avoided littering the page with too many Deprecated and Note boxes, but I can respect your and Berker’s preference to add the separate box.
msg253541 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-27 14:24
The note should actually be parallel to the http one (assuming 2to3 does do the translation), rather than say "use instead", which would be incorrect advice for a python2 user :)
msg253562 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-27 20:41
Not quite. This is a two-step deprecation:

1. “htmllib” is removed in favour of HTMLParser. The API is different, so no automatic 2to3 change would be practical.

2. HTMLParser is renamed to “html.parser”, and 2to3 handles this. This is already documented at <>.
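
As a rough illustration of step 2, code that has to run on both Python 2 and Python 3 without 2to3 can cover the rename with an import fallback. This is only a sketch: the TitleGrabber class and the sample document are hypothetical. Step 1 has no such shortcut, since htmllib code must be rewritten against the different HTMLParser API.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module name


class TitleGrabber(HTMLParser):
    """Hypothetical example class: records the text inside <title>."""

    def __init__(self):
        # Explicit base call instead of super(): in Python 2,
        # HTMLParser is an old-style class.
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


grabber = TitleGrabber()
grabber.feed("<html><head><title>Issue 25017</title></head><body></body></html>")
grabber.close()
print(grabber.title)
```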
msg253565 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-27 21:40
OK, then the note should be dropped.
msg254256 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-07 05:59
David: are you saying you like the first patch better (ignoring the markup mistakes)?
msg254313 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-07 23:21
Yes, though I hadn't looked at it before this :)
msg254582 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-13 02:44
Here is a cleaned-up version of Nan’s first patch.
msg254586 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-11-13 03:11
htmllib_deprecation_warning_3.patch looks good to me.
msg254639 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-14 00:45
New changeset 7bc8f56ef1f3 by Martin Panter in branch '2.7':
Issue #25017: Document that htmllib is superseded by module HTMLParser
Date | User | Action | Args
2015-11-14 00:48:43 | martin.panter | set | status: open -> closed; resolution: fixed; stage: commit review -> resolved
2015-11-14 00:45:13 | python-dev | set | nosy: + python-dev; messages: + msg254639
2015-11-13 03:11:07 | berker.peksag | set | messages: + msg254586; stage: patch review -> commit review
2015-11-13 02:44:24 | martin.panter | set | files: + htmllib_deprecation_warning_3.patch; messages: + msg254582
2015-11-07 23:21:51 | r.david.murray | set | messages: + msg254313
2015-11-07 05:59:03 | martin.panter | set | messages: + msg254256
2015-10-27 21:40:28 | r.david.murray | set | messages: + msg253565
2015-10-27 20:41:58 | martin.panter | set | messages: + msg253562
2015-10-27 14:24:32 | r.david.murray | set | nosy: + r.david.murray; messages: + msg253541
2015-10-27 12:35:01 | martin.panter | set | messages: + msg253533
2015-10-21 12:56:55 | Nan Wu | set | files: + htmllib_deprecation_warning_2.patch; messages: + msg253285
2015-10-21 08:02:37 | martin.panter | set | messages: + msg253279
2015-10-21 03:17:45 | berker.peksag | set | nosy: + berker.peksag; messages: + msg253274; stage: needs patch -> patch review
2015-10-16 20:52:11 | Nan Wu | set | files: + htmllib_deprecation_warning.patch; nosy: + Nan Wu; messages: + msg253098; keywords: + patch
2015-09-08 11:50:25 | berker.peksag | set | keywords: + easy; stage: needs patch
2015-09-07 20:07:55 | ezio.melotti | set | messages: + msg250125
2015-09-07 19:58:32 | berker.peksag | set | nosy: + ezio.melotti
2015-09-07 19:54:19 | guettli | set | messages: + msg250123
2015-09-07 09:50:08 | martin.panter | set | nosy: + martin.panter; messages: + msg250092; versions: + Python 2.7
2015-09-07 09:01:37 | guettli | create |