Side by Side Diff: Doc/howto/unicode.rst

Issue 18758: Fix internal references in the documentation
Patch Set: Created 6 years, 6 months ago
OLD | NEW
1 .. _unicode-howto: 1 .. _unicode-howto:
2 2
3 ***************** 3 *****************
4 Unicode HOWTO 4 Unicode HOWTO
5 ***************** 5 *****************
6 6
7 :Release: 1.12 7 :Release: 1.12
8 8
9 This HOWTO discusses Python support for Unicode, and explains 9 This HOWTO discusses Python support for Unicode, and explains
10 various problems that people commonly encounter when trying to work 10 various problems that people commonly encounter when trying to work
(...skipping 513 matching lines...)
524 end of a chunk. One solution would be to read the entire file into memory and 524 end of a chunk. One solution would be to read the entire file into memory and
525 then perform the decoding, but that prevents you from working with files that 525 then perform the decoding, but that prevents you from working with files that
526 are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM. 526 are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM.
527 (More, really, since for at least a moment you'd need to have both the encoded 527 (More, really, since for at least a moment you'd need to have both the encoded
528 string and its Unicode version in memory.) 528 string and its Unicode version in memory.)
529 529
530 The solution would be to use the low-level decoding interface to catch the case 530 The solution would be to use the low-level decoding interface to catch the case
531 of partial coding sequences. The work of implementing this has already been 531 of partial coding sequences. The work of implementing this has already been
532 done for you: the built-in :func:`open` function can return a file-like object 532 done for you: the built-in :func:`open` function can return a file-like object
533 that assumes the file's contents are in a specified encoding and accepts Unicode 533 that assumes the file's contents are in a specified encoding and accepts Unicode
534 parameters for methods such as :meth:`read` and :meth:`write`. This works through 534 parameters for methods such as :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.write`. This works through
535 :func:`open`\'s *encoding* and *errors* parameters which are interpreted just 535 :func:`open`\'s *encoding* and *errors* parameters which are interpreted just
536 like those in :meth:`str.encode` and :meth:`bytes.decode`. 536 like those in :meth:`str.encode` and :meth:`bytes.decode`.
537 537
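The partial-sequence problem described above can be sketched with an incremental decoder from the :mod:`codecs` module, which is one low-level interface that buffers an incomplete multi-byte sequence until the rest arrives. This is a minimal illustration, not part of the patch; the sample string and the chunk boundary are assumptions chosen so that a two-byte UTF-8 character is split across chunks.

```python
import codecs

# Sample data (an assumption for illustration): 'ï' encodes to the two
# bytes 0xC3 0xAF, and the split below lands between them.
encoded = 'naïve café'.encode('utf-8')
chunks = [encoded[:3], encoded[3:]]

# Decoding a chunk on its own fails when it ends mid-character.
try:
    chunks[0].decode('utf-8')
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

# An incremental decoder keeps the dangling byte and completes the
# character once the next chunk is supplied.
decoder = codecs.getincrementaldecoder('utf-8')()
parts = [decoder.decode(chunk) for chunk in chunks]
parts.append(decoder.decode(b'', final=True))  # flush; errors if bytes remain
text = ''.join(parts)
```

The file object returned by :func:`open` with an *encoding* argument handles this buffering internally, so the explicit decoder is only needed when working with raw byte chunks directly.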
538 Reading Unicode from a file is therefore simple:: 538 Reading Unicode from a file is therefore simple::
539 539
540 with open('unicode.txt', encoding='utf-8') as f: 540 with open('unicode.txt', encoding='utf-8') as f:
541 for line in f: 541 for line in f:
542 print(repr(line)) 542 print(repr(line))
543 543
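As a hedged sketch of how the *errors* parameter affects reading, the snippet below writes a small file containing one invalid UTF-8 byte and reads it back with ``errors='replace'``. The temporary directory, file contents, and variable names are assumptions made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'unicode.txt')

    # Write raw bytes: a valid UTF-8 line, then a line starting with the
    # byte 0xFF, which can never appear in valid UTF-8.
    with open(path, 'wb') as f:
        f.write(b'caf\xc3\xa9\n\xff broken\n')

    # errors='replace' substitutes U+FFFD for undecodable bytes instead
    # of raising UnicodeDecodeError.
    with open(path, encoding='utf-8', errors='replace') as f:
        lines = list(f)
```

With the default ``errors='strict'``, the same read would raise :exc:`UnicodeDecodeError` when it reaches the second line.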
544 It's also possible to open files in update mode, allowing both reading and 544 It's also possible to open files in update mode, allowing both reading and
(...skipping 104 matching lines...)
649 649
650 650
651 Converting Between File Encodings 651 Converting Between File Encodings
652 ''''''''''''''''''''''''''''''''' 652 '''''''''''''''''''''''''''''''''
653 653
654 The :class:`~codecs.StreamRecoder` class can transparently convert between 654 The :class:`~codecs.StreamRecoder` class can transparently convert between
655 encodings, taking a stream that returns data in encoding #1 655 encodings, taking a stream that returns data in encoding #1
656 and behaving like a stream returning data in encoding #2. 656 and behaving like a stream returning data in encoding #2.
657 657
658 For example, if you have an input file *f* that's in Latin-1, you 658 For example, if you have an input file *f* that's in Latin-1, you
659 can wrap it with a :class:`StreamRecoder` to return bytes encoded in UTF-8:: 659 can wrap it with a :class:`~codecs.StreamRecoder` to return bytes encoded in UTF-8::
660 660
661 new_f = codecs.StreamRecoder(f, 661 new_f = codecs.StreamRecoder(f,
662 # en/decoder: used by read() to encode its results and 662 # en/decoder: used by read() to encode its results and
663 # by write() to decode its input. 663 # by write() to decode its input.
664 codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'), 664 codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
665 665
666 # reader/writer: used to read and write to the stream. 666 # reader/writer: used to read and write to the stream.
667 codecs.getreader('latin-1'), codecs.getwriter('latin-1') ) 667 codecs.getreader('latin-1'), codecs.getwriter('latin-1') )
668 668
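For a self-contained variant of the example above, the following sketch uses :class:`io.BytesIO` objects as stand-ins for real binary files. The sample strings and the names ``latin1_input``, ``recoder``, ``out`` and ``writer`` are assumptions introduced for illustration only.

```python
import codecs
import io

# Stand-in for a Latin-1 file opened in binary mode.
latin1_input = io.BytesIO('café\n'.encode('latin-1'))

recoder = codecs.StreamRecoder(
    latin1_input,
    # en/decoder: what callers of read()/write() see (UTF-8 bytes).
    codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
    # reader/writer: what the underlying stream holds (Latin-1 bytes).
    codecs.getreader('latin-1'), codecs.getwriter('latin-1'))

utf8_data = recoder.read()  # bytes, re-encoded from Latin-1 to UTF-8

# The write direction: UTF-8 bytes passed in are stored as Latin-1.
out = io.BytesIO()
writer = codecs.StreamRecoder(
    out,
    codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
    codecs.getreader('latin-1'), codecs.getwriter('latin-1'))
writer.write('naïve'.encode('utf-8'))
stored = out.getvalue()
```

Note that both ``read()`` and ``write()`` on the recoder deal in bytes, not :class:`str`; the text form exists only as an intermediate step inside the wrapper.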
669 669
(...skipping 37 matching lines...)
707 ================ 707 ================
708 708
709 The initial draft of this document was written by Andrew Kuchling. 709 The initial draft of this document was written by Andrew Kuchling.
710 It has since been revised further by Alexander Belopolsky, Georg Brandl, 710 It has since been revised further by Alexander Belopolsky, Georg Brandl,
711 Andrew Kuchling, and Ezio Melotti. 711 Andrew Kuchling, and Ezio Melotti.
712 712
713 Thanks to the following people who have noted errors or offered 713 Thanks to the following people who have noted errors or offered
714 suggestions on this article: Éric Araujo, Nicholas Bastin, Nick 714 suggestions on this article: Éric Araujo, Nicholas Bastin, Nick
715 Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André 715 Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André
716 Lemburg, Martin von Löwis, Terry J. Reedy, Chad Whitacre. 716 Lemburg, Martin von Löwis, Terry J. Reedy, Chad Whitacre.