
classification
Title: Document that PyUnicode_AsUTF8() returns a null-terminated string
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, martin.panter, pitrou, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2014-12-19 04:42 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
utf8-null.patch (uploaded by martin.panter, 2014-12-19 04:42)
utf8-null.v2.patch (uploaded by martin.panter, 2015-03-10 10:47)
utf8-null.v3.patch (uploaded by martin.panter, 2015-03-12 00:51)
utf8-null.v4.patch (uploaded by martin.panter, 2015-03-31 04:11)
utf8-null.v5.patch (uploaded by martin.panter, 2015-05-13 12:08)
Messages (15)
msg232925 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-19 04:42
As discussed in msg232863, and later confirmed in the code, PyUnicode_AsUTF8() returns a null-terminated string.
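A minimal sketch for observing the terminator from Python, assuming CPython, where ctypes.pythonapi exposes the C API. It calls PyUnicode_AsUTF8AndSize(), the sized sibling of PyUnicode_AsUTF8(), and deliberately reads one byte past the reported size; that read is only valid because of the null termination discussed in this issue. The helper name utf8_with_terminator is hypothetical.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi is a PyDLL over the C API.
api = ctypes.pythonapi
# Use c_void_p (not c_char_p) so ctypes hands back a raw address instead of
# converting the result to bytes and stopping at the first null byte.
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p
api.PyUnicode_AsUTF8AndSize.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]


def utf8_with_terminator(s):
    """Hypothetical helper: return (size + 1 raw bytes, size) for str s."""
    size = ctypes.c_ssize_t()
    # PyDLL raises the pending Python exception if the call fails,
    # e.g. UnicodeEncodeError for a lone surrogate.
    addr = api.PyUnicode_AsUTF8AndSize(s, ctypes.byref(size))
    # The UTF-8 buffer is cached inside s, which the caller keeps alive.
    raw = ctypes.string_at(addr, size.value + 1)
    return raw, size.value


raw, n = utf8_with_terminator("héllo")
print(n, raw)  # 6 b'h\xc3\xa9llo\x00'
```

The reported size (6) excludes the terminator; the extra byte read at index 6 is the null byte.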
msg233028 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-22 23:00
This looks good to me.
msg236878 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-28 11:33
Maybe mention that the result of PyUnicode_AsUTF8() can contain embedded null bytes? And the same for PyBytes_AS_STRING()/PyBytes_AsString()?
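To illustrate why embedded nulls matter, here is a sketch under the same assumption (CPython, ctypes.pythonapi). It uses PyUnicode_AsUTF8AndSize() rather than PyUnicode_AsUTF8() so the true size is available, then reads the same buffer twice: once honouring that size, and once with ctypes.string_at() and no size, which stops at the first null byte the way strlen() would in C.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API.
api = ctypes.pythonapi
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p
api.PyUnicode_AsUTF8AndSize.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]

s = "ab\x00cd"  # a str with an embedded null character
size = ctypes.c_ssize_t()
addr = api.PyUnicode_AsUTF8AndSize(s, ctypes.byref(size))

full = ctypes.string_at(addr, size.value)  # uses the reported size
c_style = ctypes.string_at(addr)           # stops at the first null, like strlen()
print(size.value, full, c_style)  # 5 b'ab\x00cd' b'ab'
```

C code that treats the result as an ordinary C string silently loses everything after the embedded null, which is the case the suggested note would warn about.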
msg237743 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-10 10:47
Posting a new patch that says that the NUL is always appended for both Unicode and Bytes, and explicitly says that internal NULs are allowed.
msg237747 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-10 11:16
There are other functions that return null-terminated data: PyByteArray_AsString(), PyBytes_AsStringAndSize(), PyUnicode_AS_UNICODE(), PyUnicode_AsUCS4Copy(), PyUnicode_AsUnicode(), PyUnicode_AsUnicodeAndSize(), PyUnicode_AsWideCharString(), and maybe more. See also the existing notes about embedded null characters for examples.

And for consistency with all other documentation this should be written as "null byte/character", not NUL.
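As a sketch of one of the functions listed above, assuming CPython and ctypes.pythonapi: PyBytes_AsStringAndSize() yields a null-terminated buffer, and when its length argument is NULL it refuses bytes objects with embedded null bytes (ValueError in current versions; TypeError before 3.5). Declaring char ** as POINTER(c_void_p) is an illustration choice, and the helper names are hypothetical.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API.
api = ctypes.pythonapi
# int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
api.PyBytes_AsStringAndSize.restype = ctypes.c_int
api.PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),   # char **buffer
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *length, may be NULL
]


def bytes_with_terminator(b):
    """Hypothetical helper: read length + 1 bytes to expose the terminator."""
    buf = ctypes.c_void_p()
    length = ctypes.c_ssize_t()
    api.PyBytes_AsStringAndSize(b, ctypes.byref(buf), ctypes.byref(length))
    return ctypes.string_at(buf.value, length.value + 1)


def bytes_without_length(b):
    """Hypothetical helper: pass NULL for length, so embedded nulls are rejected."""
    buf = ctypes.c_void_p()
    # On failure the function returns -1 and PyDLL raises the pending exception.
    api.PyBytes_AsStringAndSize(b, ctypes.byref(buf), None)
    return ctypes.string_at(buf.value)


print(bytes_with_terminator(b"a\x00b"))  # b'a\x00b\x00'
print(bytes_without_length(b"abc"))      # b'abc'
try:
    bytes_without_length(b"a\x00b")
except (ValueError, TypeError) as exc:
    print("rejected:", type(exc).__name__)
```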
msg237750 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-10 11:27
Serhiy Storchaka added the comment:
> And for consistency with all other documentation this should be written as "null byte/character", not NUL.

Agreed!
msg237751 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-10 11:29
Yes, and for agreement with Victor. ;-)
msg237909 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-12 00:51
Posting a new patch updating the documentation for some of the extra functions Serhiy mentioned. Also changed references of “NUL”, “nul” and “0” characters to “null”. I’m not very familiar with Python’s C API, so I am mainly relying on what you guys say without much of my own verification. But if there are other related doc fixes you can think of, I’m happy to include them.

The PyUnicode_AsWideCharString() function already seems to document null termination well enough, so I did not change it. Let me know if you had a specific change in mind.
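For comparison, a sketch of the PyUnicode_AsWideCharString() behaviour mentioned here, again assuming CPython and ctypes.pythonapi: the returned buffer is a new allocation that the caller frees with PyMem_Free(), and the size it reports excludes the trailing null wchar_t. The helper name is hypothetical.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API.
api = ctypes.pythonapi
# wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p
api.PyUnicode_AsWideCharString.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
api.PyMem_Free.restype = None
api.PyMem_Free.argtypes = [ctypes.c_void_p]


def wide_with_terminator(s):
    """Hypothetical helper: return (size + 1 wchar_t units as str, size)."""
    size = ctypes.c_ssize_t()
    addr = api.PyUnicode_AsWideCharString(s, ctypes.byref(size))
    try:
        # Read one unit past the reported size to include the null terminator.
        return ctypes.wstring_at(addr, size.value + 1), size.value
    finally:
        api.PyMem_Free(addr)  # the caller owns this buffer


print(wide_with_terminator("a\x00b"))  # ('a\x00b\x00', 3)
```

Because a size pointer is passed, the embedded null in the example is accepted and counted in the size.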
msg239661 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-31 04:11
utf8-null.v4.patch:

* Clarified some mentions of “string” and “character” as bytes or code points
* Copied the warning about embedded nulls to PyUnicode_AS_UNICODE()
msg239670 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-31 05:49
The patch LGTM, but someone else should look at it. David, could you please take a look?
msg242418 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-02 17:53
Added some review comments.
msg243076 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-13 12:08
Thanks for looking at this David. I am posting utf8-null.v5.patch, which tweaks some of the wording.
msg243132 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-14 00:32
New changeset 99d2f83290c0 by R David Murray in branch '3.4':
#23088: Clarify null termination of bytes and strings in C API.
https://hg.python.org/cpython/rev/99d2f83290c0

New changeset 863f7c57081b by R David Murray in branch 'default':
Merge: #23088: Clarify null termination of bytes and strings in C API.
https://hg.python.org/cpython/rev/863f7c57081b
msg243134 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-14 00:35
Oh, I just realized I committed this without checking how it rendered...
msg243135 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-14 00:41
OK, I didn't see anything obvious at least :)

Thanks, Martin.
History
Date  User  Action  Args
2022-04-11 14:58:11  admin  set  github: 67277
2015-05-14 00:41:12  r.david.murray  set  status: open -> closed
    resolution: fixed
    messages: + msg243135
    stage: commit review -> resolved
2015-05-14 00:35:21  r.david.murray  set  messages: + msg243134
2015-05-14 00:32:33  python-dev  set  nosy: + python-dev
    messages: + msg243132
2015-05-13 12:08:49  martin.panter  set  files: + utf8-null.v5.patch
    messages: + msg243076
2015-05-02 17:53:42  r.david.murray  set  messages: + msg242418
2015-03-31 05:49:03  serhiy.storchaka  set  nosy: + r.david.murray
    messages: + msg239670
2015-03-31 04:11:11  martin.panter  set  files: + utf8-null.v4.patch
    messages: + msg239661
2015-03-12 00:51:22  martin.panter  set  files: + utf8-null.v3.patch
2015-03-12 00:51:09  martin.panter  set  messages: + msg237909
2015-03-10 11:29:31  serhiy.storchaka  set  messages: + msg237751
2015-03-10 11:27:57  vstinner  set  messages: + msg237750
2015-03-10 11:16:40  serhiy.storchaka  set  messages: + msg237747
2015-03-10 10:47:52  martin.panter  set  files: + utf8-null.v2.patch
    messages: + msg237743
2015-02-28 11:33:06  serhiy.storchaka  set  nosy: + serhiy.storchaka
    messages: + msg236878
2014-12-22 23:00:53  pitrou  set  versions: + Python 3.5
    nosy: + pitrou
    messages: + msg233028
    type: behavior
    stage: commit review
2014-12-20 22:33:25  vstinner  set  nosy: + vstinner
2014-12-19 04:42:34  martin.panter  create