Index: Include/unicodeobject.h
===================================================================
--- Include/unicodeobject.h (revision 81162)
+++ Include/unicodeobject.h (working copy)
@@ -1245,20 +1245,24 @@
 
 PyAPI_FUNC(int) PyUnicode_FSConverter(PyObject*, void*);
 
-/* Decode a null-terminated string using Py_FileSystemDefaultEncoding.
+/* Decodes a null-terminated string using Py_FileSystemDefaultEncoding
+   and "surrogateescape" error handler.
 
-   If the encoding is supported by one of the built-in codecs (i.e., UTF-8,
-   UTF-16, UTF-32, Latin-1 or MBCS), otherwise fallback to UTF-8 and replace
-   invalid characters with '?'.
+   If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
 
-   The function is intended to be used for paths and file names only
-   during bootstrapping process where the codecs are not set up.
+   Use PyUnicode_DecodeFSDefaultAndSize() if you have the string length.
 */
 
 PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefault(
     const char *s               /* encoded string */
     );
 
+/* Decodes a string using Py_FileSystemDefaultEncoding
+   and "surrogateescape" error handler.
+
+   If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
+*/
+
 PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefaultAndSize(
     const char *s,               /* encoded string */
     Py_ssize_t size              /* size */
Index: Doc/c-api/unicode.rst
===================================================================
--- Doc/c-api/unicode.rst (revision 81162)
+++ Doc/c-api/unicode.rst (working copy)
@@ -10,12 +10,13 @@
 Unicode Objects
 ^^^^^^^^^^^^^^^
 
+Unicode Type
+""""""""""""
+
 These are the basic Unicode object types used for the Unicode implementation in
 Python:
 
-.. % --- Unicode Type -------------------------------------------------------
-
 .. ctype:: Py_UNICODE
 
    This type represents the storage type which is used by Python internally as
@@ -89,13 +90,14 @@
    Clear the free list. Return the total number of freed items.
+Unicode character properties
+""""""""""""""""""""""""""""
+
 Unicode provides many different character properties. The most often needed ones
 are available through these macros which are mapped to C functions depending on
 the Python configuration.
 
-.. % --- Unicode character properties ---------------------------------------
-
 .. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
 
    Return 1 or 0 depending on whether *ch* is a whitespace character.
@@ -192,12 +194,14 @@
    Return the character *ch* converted to a double. Return ``-1.0`` if this is not
    possible. This macro does not raise exceptions.
 
+
+Plain Py_UNICODE
+""""""""""""""""
+
 To create Unicode objects and access their basic sequence properties, use these
 APIs:
 
-.. % --- Plain Py_UNICODE ---------------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
 
    Create a Unicode Object from the Py_UNICODE buffer *u* of the given size. *u*
@@ -364,9 +368,46 @@
 Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
 the system's :ctype:`wchar_t`.
 
-.. % --- wchar_t support for platforms which support it ---------------------
+File System Encoding
+""""""""""""""""""""
 
+For encoding and decoding file names and other environment strings,
+:cdata:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
+``'surrogateescape'`` should be used as the error handler (:pep:`383`). For
+encoding file names during argument parsing, the ``O&`` converter should be
+used, passing PyUnicode_FSConverter as the conversion function:
+
+.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
+
+   Convert *obj* into *result*, using :cdata:`Py_FileSystemDefaultEncoding`,
+   and the ``'surrogateescape'`` error handler. *result* must be a
+   ``PyObject*``, yielding a bytes object which must be released if it is no
+   longer used.
+
+   .. versionadded:: 3.1
+
+.. cfunction:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
+
+   Decodes a null-terminated string using :cdata:`Py_FileSystemDefaultEncoding`
+   and ``'surrogateescape'`` error handler.
+
+   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
+
+   Use :func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
+
+.. cfunction:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
+
+   Decodes a string using :cdata:`Py_FileSystemDefaultEncoding` and
+   ``'surrogateescape'`` error handler.
+
+   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
+
+
+wchar_t support for platforms which support it
+""""""""""""""""""""""""""""""""""""""""""""""
+
+
 .. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
 
    Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
@@ -413,11 +454,13 @@
 The codecs all use a similar interface. Only deviation from the following
 generic ones are documented for simplicity.
 
+
+Generic Codecs
+""""""""""""""
+
 These are the generic codec APIs:
 
-.. % --- Generic Codecs -----------------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the encoded string *s*.
@@ -444,11 +487,13 @@
    using the Python codec registry. Return *NULL* if an exception was raised by
    the codec.
 
+
+UTF-8 Codecs
+""""""""""""
+
 These are the UTF-8 codec APIs:
 
-.. % --- UTF-8 Codecs -------------------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
@@ -476,11 +521,13 @@
    object. Error handling is "strict". Return *NULL* if an exception was raised
    by the codec.
 
+
+UTF-32 Codecs
+"""""""""""""
+
 These are the UTF-32 codec APIs:
 
-.. % --- UTF-32 Codecs ------------------------------------------------------ */
-
 .. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
 
    Decode *length* bytes from a UTF-32 encoded buffer string and return the
@@ -543,11 +590,12 @@
    Return *NULL* if an exception was raised by the codec.
 
 
+UTF-16 Codecs
+"""""""""""""
+
 These are the UTF-16 codec APIs:
 
-.. % --- UTF-16 Codecs ------------------------------------------------------ */
-
 .. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
 
    Decode *length* bytes from a UTF-16 encoded buffer string and return the
@@ -609,11 +657,13 @@
    order. The string always starts with a BOM mark. Error handling is "strict".
    Return *NULL* if an exception was raised by the codec.
 
+
+Unicode-Escape Codecs
+"""""""""""""""""""""
+
 These are the "Unicode Escape" codec APIs:
 
-.. % --- Unicode-Escape Codecs ----------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
@@ -633,11 +683,13 @@
    string object. Error handling is "strict". Return *NULL* if an exception was
    raised by the codec.
 
+
+Raw-Unicode-Escape Codecs
+"""""""""""""""""""""""""
+
 These are the "Raw Unicode Escape" codec APIs:
 
-.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
@@ -657,12 +709,14 @@
    Python string object. Error handling is "strict". Return *NULL* if an exception
    was raised by the codec.
 
+
+Latin-1 Codecs
+""""""""""""""
+
 These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
 ordinals and only these are accepted by the codecs during encoding.
-.. % --- Latin-1 Codecs -----------------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
@@ -682,12 +736,14 @@
    object. Error handling is "strict". Return *NULL* if an exception was raised
    by the codec.
 
+
+ASCII Codecs
+""""""""""""
+
 These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
 codes generate errors.
 
-.. % --- ASCII Codecs -------------------------------------------------------
-
 .. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the ASCII encoded string
@@ -707,10 +763,12 @@
    object. Error handling is "strict". Return *NULL* if an exception was raised
    by the codec.
 
+
+Character Map Codecs
+""""""""""""""""""""
+
 These are the mapping codec APIs:
 
-.. % --- Character Map Codecs -----------------------------------------------
-
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
 included in the :mod:`encodings` package). The codec uses mapping to encode and
@@ -778,9 +836,11 @@
 DBCS) is a class of encodings, not just one. The target encoding is defined by
 the user settings on the machine running the codec.
 
-.. % --- MBCS codecs for Windows --------------------------------------------
 
+MBCS codecs for Windows
+"""""""""""""""""""""""
+
 .. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
 
    Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
@@ -808,22 +868,11 @@
    object. Error handling is "strict". Return *NULL* if an exception was raised
    by the codec.
-For decoding file names and other environment strings, :cdata:`Py_FileSystemEncoding`
-should be used as the encoding, and ``"surrogateescape"`` should be used as the error
-handler. For encoding file names during argument parsing, the ``O&`` converter should
-be used, passsing PyUnicode_FSConverter as the conversion function:
 
-.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
+Methods & Slots
+"""""""""""""""
 
-   Convert *obj* into *result*, using the file system encoding, and the ``surrogateescape``
-   error handler. *result* must be a ``PyObject*``, yielding a bytes or bytearray object
-   which must be released if it is no longer used.
 
-   .. versionadded:: 3.1
-
-.. % --- Methods & Slots ----------------------------------------------------
-
-
 .. _unicodemethodsandslots:
 
 Methods and Slot Functions