# HG changeset patch
# Parent 3de678cd184d943f53e9bc0e74feefaa07cc7f55
Document that the UTF-8 representation is null-terminated

diff -r 3de678cd184d Doc/c-api/bytearray.rst
--- a/Doc/c-api/bytearray.rst Thu Dec 18 23:47:55 2014 +0100
+++ b/Doc/c-api/bytearray.rst Thu Mar 12 00:39:46 2015 +0000
@@ -64,7 +64,8 @@
 .. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
 
    Return the contents of *bytearray* as a char array after checking for a
-   *NULL* pointer.
+   *NULL* pointer. The returned array always has an extra
+   null byte appended, even when the array already contains null bytes.
 
 
 .. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
diff -r 3de678cd184d Doc/c-api/bytes.rst
--- a/Doc/c-api/bytes.rst Thu Dec 18 23:47:55 2014 +0100
+++ b/Doc/c-api/bytes.rst Thu Mar 12 00:39:46 2015 +0000
@@ -136,8 +136,9 @@
 
 .. c:function:: char* PyBytes_AsString(PyObject *o)
 
-   Return a NUL-terminated representation of the contents of *o*. The pointer
-   refers to the internal buffer of *o*, not a copy. The data must not be
+   Return the contents of *o*. The pointer refers to the internal
+   buffer of *o*, which is always terminated with an extra null byte,
+   even when the string already contains null bytes. The data must not be
    modified in any way, unless the string was just created using
    ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
    *o* is not a string object at all, :c:func:`PyBytes_AsString` returns *NULL*
@@ -151,10 +152,10 @@
 
 .. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
 
-   Return a NUL-terminated representation of the contents of the object *obj*
+   Return a null-terminated representation of the contents of the object *obj*
    through the output variables *buffer* and *length*.
 
-   If *length* is *NULL*, the resulting buffer may not contain NUL characters;
+   If *length* is *NULL*, the string may not contain embedded null characters;
    if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
 
    The buffer refers to an internal string buffer of *obj*, not a copy. The data
diff -r 3de678cd184d Doc/c-api/unicode.rst
--- a/Doc/c-api/unicode.rst Thu Dec 18 23:47:55 2014 +0100
+++ b/Doc/c-api/unicode.rst Thu Mar 12 00:39:46 2015 +0000
@@ -226,9 +226,11 @@
 .. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                 const char* PyUnicode_AS_DATA(PyObject *o)
 
-   Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-   ``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
-   a Unicode object (not checked).
+   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.
+   The returned buffer is always terminated with an extra null character,
+   even when the string already contains null characters.
+   The ``AS_DATA`` form casts the pointer to :c:type:`const char *`.
+   The *o* argument has to be a Unicode object (not checked).
 
    .. versionchanged:: 3.3
       This macro is now inefficient -- because in many cases the
@@ -650,7 +652,9 @@
 
    Copy the string *u* into a new UCS4 buffer that is allocated using
    :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
-   :exc:`MemoryError` set.
+   :exc:`MemoryError` set. The returned buffer always has an extra
+   null character appended, even if the string already contains
+   null characters.
 
    .. versionadded:: 3.3
 
@@ -689,7 +693,8 @@
    Return a read-only pointer to the Unicode object's internal
    :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
    :c:type:`Py_UNICODE*` representation of the object if it is not yet
-   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
+   available. The buffer is always terminated with an extra null character.
+   Note that the resulting :c:type:`Py_UNICODE` string may also contain
    embedded null characters, which would cause the string to be truncated when
    used in most C functions.
 
@@ -708,7 +713,8 @@
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
 
    Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-   array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
+   array length (excluding the extra null terminator) in *size*.
+   Note that the resulting :c:type:`Py_UNICODE*` string
    may contain embedded null characters, which would cause the string to be
    truncated when used in most C functions.
 
@@ -717,7 +723,7 @@
 
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
 
-   Create a copy of a Unicode string ending with a nul character. Return *NULL*
+   Create a copy of a Unicode string ending with a null character. Return *NULL*
    and raise a :exc:`MemoryError` exception on memory allocation failure,
    otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
    the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
@@ -902,10 +908,10 @@
 
    Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
    *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
-   0-termination character). Return the number of :c:type:`wchar_t` characters
+   null termination character). Return the number of :c:type:`wchar_t` characters
    copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-   string may or may not be 0-terminated. It is the responsibility of the caller
-   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+   string may or may not be null-terminated. It is the responsibility of the caller
+   to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
    required by the application. Also, note that the :c:type:`wchar_t*` string
    might contain null characters, which would cause the string to be truncated
    when used with most C functions.
@@ -914,9 +920,9 @@
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
 
    Convert the Unicode object to a wide character string. The output string
-   always ends with a nul character. If *size* is not *NULL*, write the number
-   of wide characters (excluding the trailing 0-termination character) into
-   *\*size*.
+   always ends with a null character. If *size* is not *NULL*, write the number
+   of wide characters (excluding the trailing null termination character)
+   into *\*size*.
 
    Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
    :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL*,
@@ -1045,9 +1051,11 @@
 
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
 
-   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-   store the size of the encoded representation (in bytes) in *size*. *size*
-   can be *NULL*, in this case no size will be stored.
+   Return a pointer to the UTF-8 encoding of the Unicode object, and
+   store the size of the encoded representation (in bytes) in *size*. The
+   *size* argument can be *NULL*; in this case no size will be stored. The
+   returned buffer always has an extra null byte appended (not included in
+   *size*), even if the string already contains null characters.
 
    In the case of an error, *NULL* is returned with an exception set and no
    *size* is stored.