# HG changeset patch
# User Frank van Dijk
# Date 1407266167 -7200
#      Tue Aug 05 21:16:07 2014 +0200
# Branch 2.7
# Node ID 698951e2a998d99299d451a2aee73539daed37f9
# Parent  133ee2b48e52cb891b1898a16e3c187ab18f4310
steer folks away from using codecs.open because it handles text files incorrectly

diff -r 133ee2b48e52 -r 698951e2a998 Doc/howto/unicode.rst
--- a/Doc/howto/unicode.rst	Fri Aug 01 23:51:51 2014 -0700
+++ b/Doc/howto/unicode.rst	Tue Aug 05 21:16:07 2014 +0200
@@ -566,6 +566,12 @@
     print repr(f.readline()[:1])
     f.close()
 
+The :func:`codecs.open` function does not support the automatic newline
+translation features that the built-in :func:`open` function provides to make
+reading and writing text platform independent.  If you need automatic newline
+translation for the Unicode data that you read or write, consider using
+:func:`io.open` instead.
+
 Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
 written as the first character of a file in order to assist with autodetection
 of the file's byte ordering.  Some encodings, such as UTF-16, expect a BOM to be
diff -r 133ee2b48e52 -r 698951e2a998 Doc/library/codecs.rst
--- a/Doc/library/codecs.rst	Fri Aug 01 23:51:51 2014 -0700
+++ b/Doc/library/codecs.rst	Tue Aug 05 21:16:07 2014 +0200
@@ -246,17 +246,17 @@
 
    .. note::
 
+      Files are always opened in binary mode, even if no binary mode was
+      specified.  This means that no automatic conversion of ``'\n'`` is done
+      on reading and writing.  To open text files with automatic newline
+      conversion, use the :func:`io.open` function instead.
+
+   .. note::
+
       The wrapped version will only accept the object format defined by the codecs,
       i.e. Unicode objects for most built-in codecs.  Output is also codec-dependent
       and will usually be Unicode as well.
 
-   .. note::
-
-      Files are always opened in binary mode, even if no binary mode was
-      specified.  This is done to avoid data loss due to encodings using 8-bit
-      values.  This means that no automatic conversion of ``'\n'`` is done
-      on reading and writing.
-
    *encoding* specifies the encoding which is to be used for the file.
 
    *errors* may be given to define the error handling. It defaults to ``'strict'``
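
A minimal sketch of the newline behavior the patch documents, assuming Python 2.7's io module or Python 3 (where io.open is the same object as the built-in open). The temporary file and the names raw, via_io and via_codecs are illustrative only. The sketch passes newline="\r\n" when writing so the result does not depend on the platform's default line separator.

```python
import codecs
import io
import os
import tempfile
import warnings

# Create an empty temporary file and remember its path (illustrative only).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write through io.open in text mode, translating each '\n' to '\r\n'
    # on disk regardless of the platform we run on.
    with io.open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(u"first\nsecond\n")

    # The raw bytes show the translated '\r\n' line endings.
    with io.open(path, "rb") as f:
        raw = f.read()

    # io.open in text mode uses universal newlines by default (newline=None),
    # so '\r\n' is translated back to '\n' when reading.
    with io.open(path, "r", encoding="utf-8") as f:
        via_io = f.read()

    # codecs.open opens the underlying file in binary mode when an encoding
    # is given, so the '\r\n' sequences come back untranslated.
    with warnings.catch_warnings():
        # Recent Python 3 releases deprecate codecs.open; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", encoding="utf-8") as f:
            via_codecs = f.read()
finally:
    os.remove(path)

print(repr(raw))
print(repr(via_io))
print(repr(via_codecs))
```

Under either interpreter, the first printed value shows the CRLF bytes on disk, the second shows the text after io.open translated them back to '\n', and the third shows the text from codecs.open with the '\r\n' sequences left in place.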