Index: unicode.rst
===================================================================
--- unicode.rst (revision 74875)
+++ unicode.rst (working copy)
@@ -480,20 +480,21 @@
    corresponding Unicode object. *errors* (if non-*NULL*) defines the error
    handling. It defaults to "strict".
 
-   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
-   order::
+   If *byteorder* is *NULL*, or if *\*byteorder* is 0, then the byte order
+   is autodetected. The decoder checks to see if the buffer starts with a
+   UTF16 byte order mark (BOM). If so, the BOM is used to determine the
+   byte order, and *\*byteorder* is set to report the byte order to the
+   caller. This BOM is not copied into the resulting Unicode string. If no
+   BOM is present then native byte order is assumed, and *\*byteorder* is not
+   updated.
 
+   If *byteorder* is non-*NULL* and *\*byteorder* is 1 or -1, then the byte
+   order is fixed. Any BOM will not change the byte order, and will be
+   copied into the resulting Unicode string. The values are::
+
       *byteorder == -1: little endian
-      *byteorder == 0: native order
       *byteorder == 1: big endian
 
-   and then switches if the first two bytes of the input data are a byte order mark
-   (BOM) and the specified byte order is native order. This BOM is not copied into
-   the resulting Unicode string. After completion, *\*byteorder* is set to the
-   current byte order at the.
-
-   If *byteorder* is *NULL*, the codec starts in native order mode.
-
    Return *NULL* if an exception was raised by the codec.
 
    .. versionchanged:: 2.5
@@ -519,13 +520,13 @@
 
 .. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
 
-   Return a Python string object holding the UTF-16 encoded value of the Unicode
-   data in *s*. If *byteorder* is not ``0``, output is written according to the
-   following byte order::
+   Return a Python string object holding the UTF-16 encoded value of the
+   Unicode data in *s*. Output is written according to the following
+   byte order::
 
-      byteorder == -1: little endian
+      byteorder == -1: little endian (no BOM)
       byteorder == 0: native byte order (writes a BOM mark)
-      byteorder == 1: big endian
+      byteorder == 1: big endian (no BOM)
 
    If byteorder is ``0``, the output string will always start with the Unicode BOM
    mark (U+FEFF). In the other two modes, no BOM mark is prepended.
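
The BOM handling described in the two hunks above can be checked from the Python level. The following is a minimal sketch, not part of the patch, and it rests on an assumption: that the ``utf-16``, ``utf-16-le`` and ``utf-16-be`` codecs correspond to *byteorder* values 0, -1 and 1 respectively. The variable names are illustrative only.

```python
import codecs

text = u"hi"

# Encoding: the generic codec (assumed to match byteorder == 0) writes a BOM
# in native byte order; the fixed-order codecs (assumed -1 / 1) write none.
native = text.encode("utf-16")
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

starts_with_bom = native[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding a big-endian buffer that starts with a BOM.
data = codecs.BOM_UTF16_BE + big

# Autodetection: the BOM selects the byte order and is not copied
# into the result.
auto = data.decode("utf-16")

# Fixed byte order: the BOM does not change the byte order and is
# copied into the result as U+FEFF.
fixed = data.decode("utf-16-be")
```

With autodetection the decoded text is the original two characters, while the fixed-order decode returns three characters, the first being U+FEFF.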