Author Jim.Jewett
Recipients Jim.Jewett, Michel.Leunen, ezio.melotti, georg.brandl, r.david.murray, serhiy.storchaka
Date 2012-04-13.17:34:02
Message-id <1334338442.85.0.128304632835.issue14538@psf.upfronthosting.co.za>
Content
It sounds like this is a case where the docs should mention an external library, perhaps by changing the intro of http://docs.python.org/dev/library/html.parser.html from:

"""
19.2. html.parser — Simple HTML and XHTML parser
Source code: Lib/html/parser.py

This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
"""

to:


"""
19.2. html.parser — Simple HTML and XHTML parser
Source code: Lib/html/parser.py

This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.  

Note that mainstream web browsers also attempt to repair invalid markup; the algorithms for this can be quite complex, and are evolving too quickly for the Python release cycle. Applications handling arbitrary web pages should consider using third-party modules. The Python version of html5lib ( http://code.google.com/p/html5lib/ ) is being developed in parallel with the HTML standard itself, and serves as a reference implementation.
"""
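
To illustrate the distinction the proposed note draws, here is a minimal sketch using only the standard-library html.parser. The class name TagLogger and the helper tag_events are made up for this example and are not part of any proposed patch. It records the tag events HTMLParser reports, so you can see that the parser passes along the markup as written and does not insert the closing tags a browser would infer.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: collects start/end tag events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag found in the input.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called only for closing tags that literally appear in the input.
        self.events.append(("end", tag))


def tag_events(markup):
    """Feed markup to a fresh TagLogger and return the recorded events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Two <p> tags with no closing tags: no end events are synthesized.
print(tag_events("<p>one<p>two"))
```

A browser would treat the second <p> as implicitly closing the first; here the application only sees two start events and has to decide for itself how to handle the missing end tags, which is the kind of repair work the note suggests leaving to a third-party library.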