Rietveld Code Review Tool
Delta Between Two Patch Sets: Doc/library/json.rst

Issue 19361: Specialize exceptions thrown by JSON parser
Left Patch Set: Created 5 years, 1 month ago
Right Patch Set: Created 4 years, 11 months ago
1 :mod:`json` --- JSON encoder and decoder 1 :mod:`json` --- JSON encoder and decoder
2 ======================================== 2 ========================================
3 3
4 .. module:: json 4 .. module:: json
5 :synopsis: Encode and decode the JSON format. 5 :synopsis: Encode and decode the JSON format.
6 .. moduleauthor:: Bob Ippolito <bob@redivi.com> 6 .. moduleauthor:: Bob Ippolito <bob@redivi.com>
7 .. sectionauthor:: Bob Ippolito <bob@redivi.com> 7 .. sectionauthor:: Bob Ippolito <bob@redivi.com>
8 8
9 `JSON (JavaScript Object Notation) <http://json.org>`_, specified by 9 `JSON (JavaScript Object Notation) <http://json.org>`_, specified by
10 :rfc:`4627`, is a lightweight data interchange format based on a subset of 10 :rfc:`7159` (which obsoletes :rfc:`4627`) and by
11 `JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd 11 `ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_,
12 edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_). 12 is a lightweight data interchange format inspired by
13 `JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ object literal syntax
14 (although it is not a strict subset of JavaScript [#rfc-errata]_ ).
13 15
14 :mod:`json` exposes an API familiar to users of the standard library 16 :mod:`json` exposes an API familiar to users of the standard library
15 :mod:`marshal` and :mod:`pickle` modules. 17 :mod:`marshal` and :mod:`pickle` modules.
16 18
17 Encoding basic Python object hierarchies:: 19 Encoding basic Python object hierarchies::
18 20
19 >>> import json 21 >>> import json
20 >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]) 22 >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
21 '["foo", {"bar": ["baz", null, 1.0, 2]}]' 23 '["foo", {"bar": ["baz", null, 1.0, 2]}]'
22 >>> print(json.dumps("\"foo\bar")) 24 >>> print(json.dumps("\"foo\bar"))
(...skipping 67 matching lines...)
90 >>> ComplexEncoder().encode(2 + 1j) 92 >>> ComplexEncoder().encode(2 + 1j)
91 '[2.0, 1.0]' 93 '[2.0, 1.0]'
92 >>> list(ComplexEncoder().iterencode(2 + 1j)) 94 >>> list(ComplexEncoder().iterencode(2 + 1j))
93 ['[2.0', ', 1.0', ']'] 95 ['[2.0', ', 1.0', ']']
94 96
95 97
96 .. highlight:: bash 98 .. highlight:: bash
97 99
98 Using json.tool from the shell to validate and pretty-print:: 100 Using json.tool from the shell to validate and pretty-print::
99 101
100 $ echo '{"json":"obj"}' | python -mjson.tool 102 $ echo '{"json":"obj"}' | python -m json.tool
101 { 103 {
102 "json": "obj" 104 "json": "obj"
103 } 105 }
104 $ echo '{1.2:3.4}' | python -mjson.tool 106 $ echo '{1.2:3.4}' | python -m json.tool
105 Expecting property name enclosed in double quotes: line 1 column 2 (char 1) 107 Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
106 108
107 See :ref:`json-commandline` for detailed documentation. 109 See :ref:`json-commandline` for detailed documentation.
108 110
109 .. highlight:: python3 111 .. highlight:: python3
110 112
111 .. note:: 113 .. note::
112 114
113 JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by 115 JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by
114 this module's default settings (in particular, the default *separators* 116 this module's default settings (in particular, the default *separators*
(...skipping 357 matching lines...)
472 474
473 Exceptions 475 Exceptions
474 ---------- 476 ----------
475 477
476 .. exception:: JSONDecodeError(msg, doc, pos, end=None) 478 .. exception:: JSONDecodeError(msg, doc, pos, end=None)
477 479
478 Subclass of :exc:`ValueError` with the following additional attributes: 480 Subclass of :exc:`ValueError` with the following additional attributes:
479 481
480 .. attribute:: msg 482 .. attribute:: msg
481 483
482 The unformatted error message 484 The unformatted error message.
483 485
484 .. attribute:: doc 486 .. attribute:: doc
485 487
486 The JSON document being parsed 488 The JSON document being parsed.
487 489
488 .. attribute:: pos 490 .. attribute:: pos
489 491
490 The start index of doc where parsing failed 492 The start index of *doc* where parsing failed.
491
492 .. attribute:: end
493
494 The end index of doc where parsing failed (may be ``None``)
495 493
496 .. attribute:: lineno 494 .. attribute:: lineno
497 495
498 The line corresponding to pos 496 The line corresponding to *pos*.
499 497
500 .. attribute:: colno 498 .. attribute:: colno
501 499
502 The column corresponding to pos 500 The column corresponding to *pos*.
503
504 .. attribute:: endlineno
505
506 The line corresponding to end (may be ``None``)
507
508 .. attribute:: endcolno
509
510 The column corresponding to end (may be ``None``)
511 501
512 .. versionadded:: 3.5 502 .. versionadded:: 3.5
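
As an illustration of the attributes listed above, here is a minimal sketch that catches the exception and reports where parsing failed. The helper name ``describe_error`` and the sample input are hypothetical; only attributes present in the right-hand patch set (``msg``, ``pos``, ``lineno``, ``colno``) are used.

```python
import json


def describe_error(text):
    """Return (msg, pos, lineno, colno) if *text* is not valid JSON, else None.

    Hypothetical helper for illustration; it is not part of the json module.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError subclasses ValueError, so existing code that
        # catches ValueError keeps working unchanged.
        return exc.msg, exc.pos, exc.lineno, exc.colno
    return None


if __name__ == "__main__":
    # The second line of this input is missing a value after "b":
    print(describe_error('{"a": 1,\n "b": }'))
```

Because the location is exposed as attributes, callers no longer need to parse it out of the formatted message string.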
513 503
514 504
515 Standard Compliance 505 Standard Compliance and Interoperability
516 ------------------- 506 ----------------------------------------
517 507
518 The JSON format is specified by :rfc:`4627`. This section details this 508 The JSON format is specified by :rfc:`7159` and by
519 module's level of compliance with the RFC. For simplicity, 509 `ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_.
520 :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other 510 This section details this module's level of compliance with the RFC.
521 than those explicitly mentioned, are not considered. 511 For simplicity, :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and
512 parameters other than those explicitly mentioned, are not considered.
522 513
523 This module does not comply with the RFC in a strict fashion, implementing some 514 This module does not comply with the RFC in a strict fashion, implementing some
524 extensions that are valid JavaScript but not valid JSON. In particular: 515 extensions that are valid JavaScript but not valid JSON. In particular:
525 516
526 - Top-level non-object, non-array values are accepted and output;
527 - Infinite and NaN number values are accepted and output; 517 - Infinite and NaN number values are accepted and output;
528 - Repeated names within an object are accepted, and only the value of the last 518 - Repeated names within an object are accepted, and only the value of the last
529 name-value pair is used. 519 name-value pair is used.
530 520
531 Since the RFC permits RFC-compliant parsers to accept input texts that are not 521 Since the RFC permits RFC-compliant parsers to accept input texts that are not
532 RFC-compliant, this module's deserializer is technically RFC-compliant under 522 RFC-compliant, this module's deserializer is technically RFC-compliant under
533 default settings. 523 default settings.
534 524
535 Character Encodings 525 Character Encodings
536 ^^^^^^^^^^^^^^^^^^^ 526 ^^^^^^^^^^^^^^^^^^^
537 527
538 The RFC recommends that JSON be represented using either UTF-8, UTF-16, or 528 The RFC requires that JSON be represented using either UTF-8, UTF-16, or
539 UTF-32, with UTF-8 being the default. 529 UTF-32, with UTF-8 being the recommended default for maximum interoperability.
540 530
541 As permitted, though not required, by the RFC, this module's serializer sets 531 As permitted, though not required, by the RFC, this module's serializer sets
542 *ensure_ascii=True* by default, thus escaping the output so that the resulting 532 *ensure_ascii=True* by default, thus escaping the output so that the resulting
543 strings only contain ASCII characters. 533 strings only contain ASCII characters.
544 534
545 Other than the *ensure_ascii* parameter, this module is defined strictly in 535 Other than the *ensure_ascii* parameter, this module is defined strictly in
546 terms of conversion between Python objects and 536 terms of conversion between Python objects and
547 :class:`Unicode strings <str>`, and thus does not otherwise address the issue 537 :class:`Unicode strings <str>`, and thus does not otherwise directly address
548 of character encodings. 538 the issue of character encodings.
549 539
550 540 The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text,
551 Top-level Non-Object, Non-Array Values 541 and this module's serializer does not add a BOM to its output.
552 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 542 The RFC permits, but does not require, JSON deserializers to ignore an initial
553 543 BOM in their input. This module's deserializer raises a :exc:`ValueError`
554 The RFC specifies that the top-level value of a JSON text must be either a 544 when an initial BOM is present.
555 JSON object or array (Python :class:`dict` or :class:`list`). This module's 545
556 deserializer also accepts input texts consisting solely of a 546 The RFC does not explicitly forbid JSON strings which contain byte sequences
557 JSON null, boolean, number, or string value:: 547 that don't correspond to valid Unicode characters (e.g. unpaired UTF-16
558 548 surrogates), but it does note that they may cause interoperability problems.
559 >>> just_a_json_string = '"spam and eggs"' # Not by itself a valid JSON text 549 By default, this module accepts and outputs (when present in the original
560 >>> json.loads(just_a_json_string) 550 :class:`str`) codepoints for such sequences.
561 'spam and eggs'
562
563 This module itself does not include a way to request that such input texts be
564 regarded as illegal. Likewise, this module's serializer also accepts single
565 Python :data:`None`, :class:`bool`, numeric, and :class:`str`
566 values as input and will generate output texts consisting solely of a top-level
567 JSON null, boolean, number, or string value without raising an exception::
568
569 >>> neither_a_list_nor_a_dict = "spam and eggs"
570 >>> json.dumps(neither_a_list_nor_a_dict) # The result is not a valid JSON text
571 '"spam and eggs"'
572
573 This module's serializer does not itself include a way to enforce the
574 aforementioned constraint.
575 551
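
The BOM behavior described in the right-hand text can be sketched as follows. This is an illustrative example, not part of the patch: the helper names ``loads_ignoring_bom`` and ``rejects_bom`` are hypothetical, and the input is assumed to be UTF-8 encoded bytes that the caller decodes before parsing.

```python
import codecs
import json

# A UTF-8 encoded JSON text that (contrary to the RFC) starts with a BOM.
raw = codecs.BOM_UTF8 + b'{"json": "obj"}'


def loads_ignoring_bom(data):
    """Decode UTF-8 bytes, dropping one initial BOM if present, then parse."""
    # The "utf-8-sig" codec strips a leading BOM and otherwise acts like UTF-8.
    return json.loads(data.decode("utf-8-sig"))


def rejects_bom(data):
    """Return True if parsing the plainly decoded text raises ValueError."""
    try:
        # Plain "utf-8" keeps the BOM as U+FEFF at the start of the str.
        json.loads(data.decode("utf-8"))
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(rejects_bom(raw))
    print(loads_ignoring_bom(raw))
```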
576 552
577 Infinite and NaN Number Values 553 Infinite and NaN Number Values
578 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 554 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
579 555
580 The RFC does not permit the representation of infinite or NaN number values. 556 The RFC does not permit the representation of infinite or NaN number values.
581 Despite that, by default, this module accepts and outputs ``Infinity``, 557 Despite that, by default, this module accepts and outputs ``Infinity``,
582 ``-Infinity``, and ``NaN`` as if they were valid JSON number literal values:: 558 ``-Infinity``, and ``NaN`` as if they were valid JSON number literal values::
583 559
584 >>> # Neither of these calls raises an exception, but the results are not valid JSON 560 >>> # Neither of these calls raises an exception, but the results are not valid JSON
585 >>> json.dumps(float('-inf')) 561 >>> json.dumps(float('-inf'))
586 '-Infinity' 562 '-Infinity'
587 >>> json.dumps(float('nan')) 563 >>> json.dumps(float('nan'))
588 'NaN' 564 'NaN'
589 >>> # Same when deserializing 565 >>> # Same when deserializing
590 >>> json.loads('-Infinity') 566 >>> json.loads('-Infinity')
591 -inf 567 -inf
592 >>> json.loads('NaN') 568 >>> json.loads('NaN')
593 nan 569 nan
594 570
595 In the serializer, the *allow_nan* parameter can be used to alter this 571 In the serializer, the *allow_nan* parameter can be used to alter this
596 behavior. In the deserializer, the *parse_constant* parameter can be used to 572 behavior. In the deserializer, the *parse_constant* parameter can be used to
597 alter this behavior. 573 alter this behavior.
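
A minimal sketch of how the two parameters mentioned above might be used to reject these values in both directions. The wrapper names ``strict_dumps``, ``strict_loads`` and ``reject_constant`` are hypothetical; ``allow_nan`` and ``parse_constant`` are the parameters named in the text.

```python
import json


def reject_constant(name):
    """parse_constant hook; receives 'NaN', 'Infinity' or '-Infinity'."""
    raise ValueError("non-standard JSON constant: " + name)


def strict_dumps(obj):
    # With allow_nan=False the serializer raises ValueError instead of
    # emitting NaN, Infinity or -Infinity.
    return json.dumps(obj, allow_nan=False)


def strict_loads(text):
    # The hook is invoked only when one of the three constants is found.
    return json.loads(text, parse_constant=reject_constant)


if __name__ == "__main__":
    print(strict_loads('[1.5, null]'))
    for call, arg in ((strict_dumps, float('nan')), (strict_loads, 'NaN')):
        try:
            call(arg)
        except ValueError as exc:
            print("rejected:", exc)
```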
598 574
599 575
600 Repeated Names Within an Object 576 Repeated Names Within an Object
601 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 577 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
602 578
603 The RFC specifies that the names within a JSON object should be unique, but 579 The RFC specifies that the names within a JSON object should be unique, but
604 does not specify how repeated names in JSON objects should be handled. By 580 does not mandate how repeated names in JSON objects should be handled. By
605 default, this module does not raise an exception; instead, it ignores all but 581 default, this module does not raise an exception; instead, it ignores all but
606 the last name-value pair for a given name:: 582 the last name-value pair for a given name::
607 583
608 >>> weird_json = '{"x": 1, "x": 2, "x": 3}' 584 >>> weird_json = '{"x": 1, "x": 2, "x": 3}'
609 >>> json.loads(weird_json) 585 >>> json.loads(weird_json)
610 {'x': 3} 586 {'x': 3}
611 587
612 The *object_pairs_hook* parameter can be used to alter this behavior. 588 The *object_pairs_hook* parameter can be used to alter this behavior.
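
For instance, a hook along the following lines could turn repeated names into an error instead of silently keeping the last value. This is a sketch only; the function name ``reject_duplicates`` is hypothetical.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated names within one object.

    *pairs* is the list of (name, value) tuples for a single JSON object,
    in document order.
    """
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("duplicate name in JSON object: %r" % (name,))
        result[name] = value
    return result


if __name__ == "__main__":
    weird_json = '{"x": 1, "x": 2, "x": 3}'
    # Passing list as the hook keeps every pair instead of collapsing them.
    print(json.loads(weird_json, object_pairs_hook=list))
    try:
        json.loads(weird_json, object_pairs_hook=reject_duplicates)
    except ValueError as exc:
        print("rejected:", exc)
```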
613 589
590
591 Top-level Non-Object, Non-Array Values
592 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
593
594 The old version of JSON specified by the obsolete :rfc:`4627` required that
595 the top-level value of a JSON text be either a JSON object or array
596 (Python :class:`dict` or :class:`list`), and could not be a JSON null,
597 boolean, number, or string value. :rfc:`7159` removed that restriction, and
598 this module does not implement, and has never implemented, that restriction in
599 either its serializer or its deserializer.
600
601 Regardless, for maximum interoperability, you may wish to voluntarily adhere
602 to the restriction yourself.
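
One way to adhere to the restriction voluntarily is a thin wrapper around the serializer, sketched below. The name ``dumps_container_only`` is hypothetical and the choice of :exc:`TypeError` is an assumption made for this example, not something the module prescribes.

```python
import json


def dumps_container_only(obj, **kwargs):
    """Serialize *obj*, refusing top-level values other than object/array.

    Hypothetical wrapper: dict maps to a JSON object; list and tuple map
    to a JSON array.
    """
    if not isinstance(obj, (dict, list, tuple)):
        raise TypeError("top-level value must be a dict, list or tuple, "
                        "not %s" % type(obj).__name__)
    return json.dumps(obj, **kwargs)


if __name__ == "__main__":
    print(dumps_container_only(["spam and eggs"]))
    try:
        dumps_container_only("spam and eggs")
    except TypeError as exc:
        print("rejected:", exc)
```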
603
604
605 Implementation Limitations
606 ^^^^^^^^^^^^^^^^^^^^^^^^^^
607
608 Some JSON deserializer implementations may set limits on:
609
610 * the size of accepted JSON texts
611 * the maximum level of nesting of JSON objects and arrays
612 * the range and precision of JSON numbers
613 * the content and maximum length of JSON strings
614
615 This module does not impose any such limits beyond those of the relevant
616 Python datatypes themselves or the Python interpreter itself.
617
618 When serializing to JSON, beware of any such limitations in applications that may
619 consume your JSON. In particular, it is common for JSON numbers to be
620 deserialized into IEEE 754 double precision numbers and thus subject to that
621 representation's range and precision limitations. This is especially relevant
622 when serializing Python :class:`int` values of extremely large magnitude, or
623 when serializing instances of "exotic" numerical types such as
624 :class:`decimal.Decimal`.
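
The precision concern above can be illustrated with a short sketch. Here ``float()`` stands in for a hypothetical consumer that stores every JSON number as an IEEE 754 double; the variable names are illustrative only.

```python
import json
from decimal import Decimal

# 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
big = 2 ** 53 + 1
text = json.dumps(big)

# This module round-trips the integer exactly, since Python ints are unbounded.
round_tripped = json.loads(text)

# A consumer that converts the same digits to a double silently loses the
# final unit.
seen_by_double_consumer = int(float(text))

# When reading, parse_float=Decimal keeps all the digits of a non-integer
# number instead of rounding it to the nearest double.
precise = json.loads('1.10000000000000000000001', parse_float=Decimal)

if __name__ == "__main__":
    print(text, round_tripped == big, seen_by_double_consumer == big)
    print(precise)
```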
625
614 .. highlight:: bash 626 .. highlight:: bash
627 .. module:: json.tool
615 628
616 .. _json-commandline: 629 .. _json-commandline:
617 630
618 Command Line Interface 631 Command Line Interface
619 ---------------------- 632 ----------------------
620 633
621 The :mod:`json.tool` module provides a simple command line interface to validate 634 The :mod:`json.tool` module provides a simple command line interface to validate
622 and pretty-print JSON objects. 635 and pretty-print JSON objects.
623 636
624 If the optional ``infile`` and ``outfile`` arguments are not 637 If the optional ``infile`` and ``outfile`` arguments are not
625 specified, :attr:`sys.stdin` and :attr:`sys.stdout` will be used respectively:: 638 specified, :attr:`sys.stdin` and :attr:`sys.stdout` will be used respectively::
626 639
627 $ echo '{"json": "obj"}' | python -m json.tool 640 $ echo '{"json": "obj"}' | python -m json.tool
628 { 641 {
629 "json": "obj" 642 "json": "obj"
630 } 643 }
631 $ echo '{1.2:3.4}' | python -m json.tool 644 $ echo '{1.2:3.4}' | python -m json.tool
632 Expecting property name enclosed in double quotes: line 1 column 2 (char 1) 645 Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
633 646
647 .. versionchanged:: 3.5
648 The output is now in the same order as the input. Use the
649 :option:`--sort-keys` option to sort the output of dictionaries
650 alphabetically by key.
634 651
635 Command line options 652 Command line options
636 ^^^^^^^^^^^^^^^^^^^^ 653 ^^^^^^^^^^^^^^^^^^^^
637 654
638 .. cmdoption:: infile 655 .. cmdoption:: infile
639 656
640 The JSON file to be validated or pretty-printed:: 657 The JSON file to be validated or pretty-printed::
641 658
642 $ python -m json.tool mp_films.json 659 $ python -m json.tool mp_films.json
643 [ 660 [
644 { 661 {
645 "title": "And Now for Something Completely Different", 662 "title": "And Now for Something Completely Different",
646 "year": 1971 663 "year": 1971
647 }, 664 },
648 { 665 {
649 "title": "Monty Python and the Holy Grail", 666 "title": "Monty Python and the Holy Grail",
650 "year": 1975 667 "year": 1975
651 } 668 }
652 ] 669 ]
653 670
654 If *infile* is not specified, read from :attr:`sys.stdin`. 671 If *infile* is not specified, read from :attr:`sys.stdin`.
655 672
656 .. cmdoption:: outfile 673 .. cmdoption:: outfile
657 674
658 Write the output of the *infile* to the given *outfile*. Otherwise, write it 675 Write the output of the *infile* to the given *outfile*. Otherwise, write it
659 to :attr:`sys.stdout`. 676 to :attr:`sys.stdout`.
660 677
678 .. cmdoption:: --sort-keys
679
680 Sort the output of dictionaries alphabetically by key.
681
682 .. versionadded:: 3.5
683
661 .. cmdoption:: -h, --help 684 .. cmdoption:: -h, --help
662 685
663 Show the help message. 686 Show the help message.
687
688
689 .. rubric:: Footnotes
690
691 .. [#rfc-errata] As noted in `the errata for RFC 7159
692 <http://www.rfc-editor.org/errata_search.php?rfc=7159>`_,
693 JSON permits literal U+2028 (LINE SEPARATOR) and
694 U+2029 (PARAGRAPH SEPARATOR) characters in strings, whereas JavaScript
695 (as of ECMAScript Edition 5.1) does not.