# HG changeset patch
# User Frank van Dijk
# Date 1407064865 -7200
#      Sun Aug 03 13:21:05 2014 +0200
# Branch 2.7
# Node ID 425f3144941f9fe213f2d61ed9e9e766d0cd87da
# Parent  133ee2b48e52cb891b1898a16e3c187ab18f4310
steer folks away from using codecs.open because it handles text files incorrectly

diff -r 133ee2b48e52 -r 425f3144941f Doc/howto/unicode.rst
--- a/Doc/howto/unicode.rst	Fri Aug 01 23:51:51 2014 -0700
+++ b/Doc/howto/unicode.rst	Sun Aug 03 13:21:05 2014 +0200
@@ -365,10 +365,6 @@
 interfaces, but implementing encodings is a specialized task that also won't be
 covered here. Consult the Python documentation to learn more about this module.
 
-The most commonly used part of the :mod:`codecs` module is the
-:func:`codecs.open` function which will be discussed in the section on input and
-output.
-
 
 Unicode Literals in Python Source Code
 --------------------------------------
@@ -534,33 +530,31 @@
 
 The solution would be to use the low-level decoding interface to catch the case
 of partial coding sequences. The work of implementing this has already been
-done for you: the :mod:`codecs` module includes a version of the :func:`open`
-function that returns a file-like object that assumes the file's contents are in
-a specified encoding and accepts Unicode parameters for methods such as
-``.read()`` and ``.write()``.
+done for you: the :func:`io.open` function returns a file-like object that
+assumes the file's contents are in a specified encoding and accepts Unicode
+parameters for methods such as ``.read()`` and ``.write()``.
 
-The function's parameters are ``open(filename, mode='rb', encoding=None,
-errors='strict', buffering=1)``. ``mode`` can be ``'r'``, ``'w'``, or ``'a'``,
-just like the corresponding parameter to the regular built-in ``open()``
-function; add a ``'+'`` to update the file. ``buffering`` is similarly parallel
-to the standard function's parameter. ``encoding`` is a string giving the
-encoding to use; if it's left as ``None``, a regular Python file object that
-accepts 8-bit strings is returned. Otherwise, a wrapper object is returned, and
-data written to or read from the wrapper object will be converted as needed.
-``errors`` specifies the action for encoding errors and can be one of the usual
-values of 'strict', 'ignore', and 'replace'.
+The function's parameters are ``io.open(file, mode='r', buffering=-1,
+encoding=None, errors=None, newline=None, closefd=True)``. ``mode`` can be
+``'r'``, ``'w'``, or ``'a'``, just like the corresponding parameter to the
+regular built-in ``open()`` function; add a ``'+'`` to update the file.
+``buffering`` is similarly parallel to the standard function's parameter.
+``encoding`` is a string giving the encoding to use; if it's left as ``None``,
+a platform-dependent default is used. Data written to or read from the stream
+will be converted as needed. ``errors`` specifies the action for encoding
+errors and can be one of the usual values of 'strict', 'ignore', and 'replace'.
 
 Reading Unicode from a file is therefore simple::
 
-    import codecs
-    f = codecs.open('unicode.rst', encoding='utf-8')
+    import io
+    f = io.open('unicode.rst', encoding='utf-8')
     for line in f:
         print repr(line)
 
 It's also possible to open files in update mode, allowing both reading and
 writing::
 
-    f = codecs.open('test', encoding='utf-8', mode='w+')
+    f = io.open('test', encoding='utf-8', mode='w+')
     f.write(u'\u4500 blah blah blah\n')
     f.seek(0)
     print repr(f.readline()[:1])
diff -r 133ee2b48e52 -r 425f3144941f Doc/library/codecs.rst
--- a/Doc/library/codecs.rst	Fri Aug 01 23:51:51 2014 -0700
+++ b/Doc/library/codecs.rst	Sun Aug 03 13:21:05 2014 +0200
@@ -246,17 +246,17 @@
 
    .. note::
 
+      Files are always opened in binary mode, even if no binary mode was
+      specified. This means that no automatic conversion of ``b'\n'`` is done
+      on reading and writing. To open text files with transparent
+      encoding/decoding, use the :func:`io.open` function instead.
+
+   .. note::
+
       The wrapped version will only accept the object format defined by the codecs,
       i.e. Unicode objects for most built-in codecs. Output is also codec-dependent
       and will usually be Unicode as well.
 
-   .. note::
-
-      Files are always opened in binary mode, even if no binary mode was
-      specified. This is done to avoid data loss due to encodings using 8-bit
-      values. This means that no automatic conversion of ``'\n'`` is done
-      on reading and writing.
-
    *encoding* specifies the encoding which is to be used for the file.
 
    *errors* may be given to define the error handling. It defaults to ``'strict'``
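
The newline behaviour this changeset documents can be checked with a small standalone script. This is a sketch, not part of the patch: the temporary file, its sample text and the helper name read_both_ways are made up for illustration, and the script is written so it should run on both Python 2.7 and Python 3. It writes a file with Windows-style \r\n line endings as raw bytes, then reads it back once with codecs.open() and once with io.open().

```python
import codecs
import io
import os
import tempfile
import warnings


def read_both_ways(path, encoding='utf-8'):
    """Hypothetical helper: return the file's text as decoded by
    codecs.open() and by io.open(), in that order."""
    with warnings.catch_warnings():
        # Recent Python 3 releases may emit a DeprecationWarning for
        # codecs.open(); silence it so the comparison stays readable.
        warnings.simplefilter('ignore', DeprecationWarning)
        # codecs.open() forces binary mode, so '\r\n' is returned untouched.
        with codecs.open(path, 'r', encoding=encoding) as f:
            via_codecs = f.read()
    # io.open() in text mode defaults to newline=None (universal newlines),
    # so '\r\n' is translated to '\n' while reading.
    with io.open(path, 'r', encoding=encoding) as f:
        via_io = f.read()
    return via_codecs, via_io


# Write sample text with '\r\n' line endings as raw UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as raw:
    raw.write(u'caf\u00e9\r\nna\u00efve\r\n'.encode('utf-8'))

try:
    via_codecs, via_io = read_both_ways(path)
    print(repr(via_codecs))  # still contains the '\r\n' pairs
    print(repr(via_io))      # line endings normalized to '\n'
finally:
    os.remove(path)
```

The codecs.open() result keeps the carriage returns because that function always opens the underlying file in binary mode, which is exactly what the new note in codecs.rst warns about; io.open() returns both lines ending in a plain \n.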