Rietveld Code Review Tool

Side by Side Diff: Doc/library/html.parser.rst

Issue 21047: html.parser.HTMLParser: convert_charrefs should become True by default
Patch Set: Created 5 years, 9 months ago
OLD | NEW
1 :mod:`html.parser` --- Simple HTML and XHTML parser 1 :mod:`html.parser` --- Simple HTML and XHTML parser
2 =================================================== 2 ===================================================
3 3
4 .. module:: html.parser 4 .. module:: html.parser
5 :synopsis: A simple parser that can handle HTML and XHTML. 5 :synopsis: A simple parser that can handle HTML and XHTML.
6 6
7 7
8 .. index:: 8 .. index::
9 single: HTML 9 single: HTML
10 single: XHTML 10 single: XHTML
11 11
12 **Source code:** :source:`Lib/html/parser.py` 12 **Source code:** :source:`Lib/html/parser.py`
13 13
14 -------------- 14 --------------
15 15
16 This module defines a class :class:`HTMLParser` which serves as the basis for 16 This module defines a class :class:`HTMLParser` which serves as the basis for
17 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. 17 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
18 18
19 .. class:: HTMLParser(strict=False, *, convert_charrefs=False) 19 .. class:: HTMLParser(strict=False, *, convert_charrefs=True)
20 20
21 Create a parser instance. 21 Create a parser instance.
22 22
23 If *convert_charrefs* is ``True`` (default: ``False``), all character 23 If *convert_charrefs* is ``True`` (the default), all character references
24 references (except the ones in ``script``/``style`` elements) are 24 (except the ones in ``script``/``style`` elements) are automatically
25 automatically converted to the corresponding Unicode characters. 25 converted to the corresponding Unicode characters.
26 The use of ``convert_charrefs=True`` is encouraged and will become
27 the default in Python 3.5.
28 26
29 If *strict* is ``False`` (the default), the parser will accept and parse 27 If *strict* is ``False`` (the default), the parser will accept and parse
30 invalid markup. If *strict* is ``True`` the parser will raise an 28 invalid markup. If *strict* is ``True`` the parser will raise an
31 :exc:`~html.parser.HTMLParseError` exception instead [#]_ when it's not 29 :exc:`~html.parser.HTMLParseError` exception instead [#]_ when it's not
32 able to parse the markup. The use of ``strict=True`` is discouraged and 30 able to parse the markup. The use of ``strict=True`` is discouraged and
33 the *strict* argument is deprecated. 31 the *strict* argument is deprecated.
34 32
35 An :class:`.HTMLParser` instance is fed HTML data and calls handler methods 33 An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
36 when start tags, end tags, text, comments, and other markup elements are 34 when start tags, end tags, text, comments, and other markup elements are
37 encountered. The user should subclass :class:`.HTMLParser` and override its 35 encountered. The user should subclass :class:`.HTMLParser` and override its
38 methods to implement the desired behavior. 36 methods to implement the desired behavior.
39 37
40 This parser does not check that end tags match start tags or call the end-tag 38 This parser does not check that end tags match start tags or call the end-tag
41 handler for elements which are closed implicitly by closing an outer element. 39 handler for elements which are closed implicitly by closing an outer element.
42 40
43 .. versionchanged:: 3.2 41 .. versionchanged:: 3.2
44 *strict* argument added. 42 *strict* argument added.
45 43
46 .. deprecated-removed:: 3.3 3.5 44 .. deprecated-removed:: 3.3 3.5
47 The *strict* argument and the strict mode have been deprecated. 45 The *strict* argument and the strict mode have been deprecated.
48 The parser is now able to accept and parse invalid markup too. 46 The parser is now able to accept and parse invalid markup too.
49 47
50 .. versionchanged:: 3.4 48 .. versionchanged:: 3.4
51 *convert_charrefs* keyword argument added. 49 *convert_charrefs* keyword argument added.
50
51 .. versionchanged:: 3.5
52 The default value for argument *convert_charrefs* is now ``True``.
52 53
53 An exception is defined as well: 54 An exception is defined as well:
54 55
55 56
56 .. exception:: HTMLParseError 57 .. exception:: HTMLParseError
57 58
58 Exception raised by the :class:`HTMLParser` class when it encounters an error 59 Exception raised by the :class:`HTMLParser` class when it encounters an error
59 while parsing and *strict* is ``True``. This exception provides three 60 while parsing and *strict* is ``True``. This exception provides three
60 attributes: :attr:`msg` is a brief message explaining the error, 61 attributes: :attr:`msg` is a brief message explaining the error,
61 :attr:`lineno` is the number of the line on which the broken construct was 62 :attr:`lineno` is the number of the line on which the broken construct was
(...skipping 295 matching lines...)
357 attr: ('href', '#main') 358 attr: ('href', '#main')
358 Data : tag soup 359 Data : tag soup
359 End tag : p 360 End tag : p
360 End tag : a 361 End tag : a
361 362
362 .. rubric:: Footnotes 363 .. rubric:: Footnotes
363 364
364 .. [#] For backward compatibility reasons *strict* mode does not raise 365 .. [#] For backward compatibility reasons *strict* mode does not raise
365 exceptions for all non-compliant HTML. That is, some invalid HTML 366 exceptions for all non-compliant HTML. That is, some invalid HTML
366 is tolerated even in *strict* mode. 367 is tolerated even in *strict* mode.
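
The behaviour change documented by this patch can be illustrated with a minimal sketch. It assumes a small subclass, here called ``TextCollector``, and a helper ``parse()``; both names are hypothetical and are not part of the patch or of ``html.parser``. The sketch passes *convert_charrefs* explicitly, so it shows both the new default (``True``) and the old one (``False``) regardless of which Python version runs it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records which handler received each piece of text."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def parse(markup, **kwargs):
    """Feed *markup* to a fresh TextCollector and return the recorded events."""
    parser = TextCollector(**kwargs)
    parser.feed(markup)
    parser.close()
    return parser.events


markup = "<p>caf&eacute; &#62; tea</p>"

# New default: references are converted and delivered through handle_data().
converted = parse(markup, convert_charrefs=True)

# Old default: references are reported through handle_entityref() and
# handle_charref(), and the surrounding text arrives in separate pieces.
raw = parse(markup, convert_charrefs=False)

print(converted)
print(raw)
```

Code that overrides ``handle_entityref()`` or ``handle_charref()`` and relies on the old default would stop receiving those calls once the default changes, so such code would likely need to pass ``convert_charrefs=False`` explicitly or move its logic into ``handle_data()``.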