classification
Title: Document PyUnicode_* API
Type: enhancement
Stage: patch review
Components: Documentation
Versions: Python 3.6, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: alexandre.vassalotti, berker.peksag, docs@python, donlorenzo, furkanonder, georg.brandl, lemburg, martin.panter, serhiy.storchaka, shihai1991, vstinner
Priority: normal
Keywords: patch

Created on 2008-01-27 06:26 by alexandre.vassalotti, last changed 2022-04-11 14:56 by admin.

Files
File name: unicode.patch
Uploaded: donlorenzo, 2009-04-18 15:18
Description: docs for PyUnicodes C-API functions: FromFormat, FromFormatV, FromString, FromStringAndSize, Partition, RPartition and RSplit
Pull Requests
PR 20011 (closed), linked by furkanonder, 2020-05-08 22:35
Messages (9)
msg61734 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-01-27 06:26
I was wondering whether the pointer returned by PyUnicode_AsString needs
to be freed after use (it turns out it doesn't, since the result is
cached). However, I found that there isn't any documentation on
docs.python.org for the PyUnicode_AsString and
PyUnicode_AsStringAndSize functions, although both are documented in
the public unicodeobject.h header.
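
The caching behaviour described above can be observed from Python on a current CPython. The following is only a minimal sketch: it assumes that PyUnicode_AsUTF8 is the present-day counterpart of the PyUnicode_AsString function discussed here, and it calls the C API through ctypes.pythonapi. That mapping, the ctypes approach and the variable names are assumptions for illustration, not part of the original report.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
# Keep the raw address rather than letting ctypes copy the data into bytes.
as_utf8.restype = ctypes.c_void_p

# A non-ASCII string, so a separate UTF-8 buffer has to be built and cached.
text = "caf\u00e9"

first = as_utf8(text)
second = as_utf8(text)

# Both calls hand back the same address: the buffer is owned by the string
# object, so the caller must not free it.
same_pointer = first == second

# Read the NUL-terminated buffer while `text` is still alive.
encoded = ctypes.string_at(first)
```

Because the buffer belongs to the string object, the pointer should only be used while a reference to the string is held.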

I also noticed that the documentation for several other Unicode functions
is missing. At a quick glance:

  PyUnicode_Resize
  PyUnicode_InternImmortal
  PyUnicode_GetDefaultEncoding
  PyUnicode_SetDefaultEncoding
  PyUnicode_BuildEncodingMap
  PyUnicode_FromFormatV
  PyUnicode_*UTF7*
  PyUnicode_AsEncodedObject
  PyUnicode_FromOrdinal
  PyUnicode_DecodeFSDefault
  PyUnicode_DecodeFSDefaultAndSize

It would probably be a good idea to polish up the documentation for
PyUnicode as much as possible for Python 3000, since extension
developers will certainly need to refer to it a lot during the
transition from 2.x.
msg86116 - (view) Author: Lorenz Quack (donlorenzo) * Date: 2009-04-18 12:22
In addition to the above-mentioned functions, I found these to be
undocumented:

  PyUnicode_DecodeUTF7
  PyUnicode_DecodeUTF7Stateful
  PyUnicode_EncodeDecimal
  PyUnicode_EncodeUTF7
  PyUnicode_FromFormat
  PyUnicode_FromString
  PyUnicode_FromStringAndSize
  PyUnicode_GetMax
  PyUnicode_Partition
  PyUnicode_RPartition
  PyUnicode_RSplit

From the original list the following functions seem to have been removed:

  PyUnicode_InternImmortal
  PyUnicode_DecodeFSDefault
  PyUnicode_DecodeFSDefaultAndSize


I will try to put together a patch for some of these over the weekend.
msg86121 - (view) Author: Lorenz Quack (donlorenzo) * Date: 2009-04-18 15:18
OK, here is my attempt at a patch for at least some of the undocumented
functions. The patch documents the following functions:

  PyUnicode_FromFormat
  PyUnicode_FromFormatV
  PyUnicode_FromString
  PyUnicode_FromStringAndSize
  PyUnicode_Partition
  PyUnicode_RPartition
  PyUnicode_RSplit

Please review this patch thoroughly, since I didn't really dig into the
source to find out what the functions do; I just copied the old
PyString documentation or derived it from the docs for the Python API.
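
One way to check documentation derived from the Python-level API is to compare the C functions against the corresponding str methods. The sketch below does this for PyUnicode_Partition, PyUnicode_RPartition and PyUnicode_RSplit through ctypes.pythonapi. The sample string and variable names are made up for illustration, and the check assumes a CPython build that exports these symbols.

```python
import ctypes

api = ctypes.pythonapi

# Declare the C signatures. Using py_object as restype makes ctypes take
# ownership of the returned new reference and raise the pending exception
# if the function returns NULL.
api.PyUnicode_Partition.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyUnicode_Partition.restype = ctypes.py_object
api.PyUnicode_RPartition.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyUnicode_RPartition.restype = ctypes.py_object
api.PyUnicode_RSplit.argtypes = [
    ctypes.py_object,   # string to split
    ctypes.py_object,   # separator
    ctypes.c_ssize_t,   # maxsplit
]
api.PyUnicode_RSplit.restype = ctypes.py_object

text = "key=value=more"

head = api.PyUnicode_Partition(text, "=")     # split at the first "="
tail = api.PyUnicode_RPartition(text, "=")    # split at the last "="
pieces = api.PyUnicode_RSplit(text, "=", 1)   # at most one split, from the right

# The C functions are expected to agree with the str methods.
matches = (
    head == text.partition("=")
    and tail == text.rpartition("=")
    and pieces == text.rsplit("=", 1)
)
```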
msg89100 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-06-08 19:06
The patch looks alright. I don't like the documentation for
PyUnicode_FromFormatV, however. Here's my attempt to document it:

.. cfunction:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Equivalent to the function :cfunc:`PyUnicode_FromFormat`, except that
   it takes a va_list instead of a variable number of arguments.
msg185552 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-03-30 12:09
Is it worth applying the patch given the complete rewrite of unicode for 3.3 via PEP 393?
msg185725 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-01 10:40
On 30.03.2013 13:09, Mark Lawrence wrote:
> 
> Is it worth applying the patch given the complete rewrite of unicode for 3.3 via PEP393?

PEP 393 only changed the way Unicode is internally stored.
The Unicode API is mostly unaffected by this change.
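
The distinction between storage and API can be illustrated with a small sketch. It assumes a CPython 3.3 or later interpreter, where the per-character storage width depends on the largest code point in the string; the helper name and sample characters are hypothetical. The same C API call, PyUnicode_GetLength, reports the length in code points regardless of how the string is stored.

```python
import ctypes
import sys

get_length = ctypes.pythonapi.PyUnicode_GetLength
get_length.argtypes = [ctypes.py_object]
get_length.restype = ctypes.c_ssize_t


def bytes_per_char(ch):
    """Storage cost of one additional character, measured with sys.getsizeof."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


# One sample character per storage width used by the PEP 393 layout.
samples = {
    "1 byte": "a",
    "2 bytes": "\u0101",
    "4 bytes": "\U00010000",
}

# The internal width differs between the three strings...
widths = {name: bytes_per_char(ch) for name, ch in samples.items()}

# ...but the API call behaves identically for all of them.
lengths = {name: get_length(ch * 100) for name, ch in samples.items()}
```

The measured widths are an implementation detail of CPython and may not hold on other interpreters.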
msg264571 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-30 18:39
Remaining undocumented functions:

From this issue:

PyUnicode_RSplit
PyUnicode_Partition
PyUnicode_RPartition

From issue 10435:

PyUnicode_IsIdentifier
PyUnicode_Append
PyUnicode_AppendAndDel
PyUnicode_GetDefaultEncoding
PyUnicode_FromOrdinal
PyUnicode_Resize
PyUnicode_GetMax
PyUnicode_InternImmortal
PyUnicode_CHECK_INTERNED

From issue 18688:

Py_UNICODE_REPLACEMENT_CHARACTER
PyUnicodeIter_Type
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
msg264575 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-30 19:38
PyUnicode_DecodeCodePageStateful

The following functions should probably be wrapped with "#ifndef Py_LIMITED_API":

_PyUnicode_ClearStaticStrings
_PyUnicode_EQ
_PyUnicode_FromId
msg368582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-10 10:10
"""
Following functions likely should be wrapped with "#ifndef Py_LIMITED_API":

_PyUnicode_ClearStaticStrings
_PyUnicode_EQ
_PyUnicode_FromId
"""

This has already been the case since at least Python 3.7. Excerpt from Python 3.7 Include/unicodeobject.h:

#ifndef Py_LIMITED_API
/* Return an interned Unicode object for an Identifier; may fail if there is no memory.*/
PyAPI_FUNC(PyObject*) _PyUnicode_FromId(_Py_Identifier*);
/* Clear all static strings. */
PyAPI_FUNC(void) _PyUnicode_ClearStaticStrings(void);

/* Fast equality check when the inputs are known to be exact unicode types
   and where the hash values are equal (i.e. a very probable match) */
PyAPI_FUNC(int) _PyUnicode_EQ(PyObject *, PyObject *);
#endif /* !Py_LIMITED_API */
History
Date User Action Args
2022-04-11 14:56:30 admin set github: 46236
2020-06-21 12:39:22 shihai1991 set nosy: + shihai1991
2020-05-10 10:10:52 vstinner set nosy: + vstinner
    messages: + msg368582
2020-05-08 22:35:10 furkanonder set nosy: + furkanonder
    pull_requests: + pull_request19322
    stage: needs patch -> patch review
2016-04-30 19:38:01 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg264575
2016-04-30 18:41:21 berker.peksag link issue18688 superseder
2016-04-30 18:40:27 berker.peksag link issue10435 superseder
2016-04-30 18:39:35 berker.peksag set title: Documentation for PyUnicode_AsString (et al.) missing. -> Document PyUnicode_* API
    keywords: - easy
    nosy: + berker.peksag
    versions: + Python 3.5, Python 3.6, - Python 3.1, Python 2.7, Python 3.2
    messages: + msg264571
    stage: needs patch
2014-02-03 17:02:32 BreamoreBoy set nosy: - BreamoreBoy
2013-08-26 14:16:22 martin.panter set nosy: + martin.panter
2013-04-01 10:40:56 lemburg set messages: + msg185725
2013-03-30 12:09:37 BreamoreBoy set nosy: + BreamoreBoy
    messages: + msg185552
2010-09-20 13:59:36 BreamoreBoy set assignee: georg.brandl -> docs@python
    nosy: + docs@python
2010-08-07 18:08:24 terry.reedy set versions: + Python 3.2, - Python 2.6, Python 3.0
2010-04-03 07:25:34 georg.brandl set assignee: georg.brandl
2009-06-08 19:06:43 alexandre.vassalotti set messages: + msg89100
2009-04-20 11:05:39 pitrou set nosy: + lemburg, georg.brandl
2009-04-18 15:18:46 donlorenzo set files: + unicode.patch
    keywords: + patch
    messages: + msg86121
2009-04-18 12:22:55 donlorenzo set nosy: + donlorenzo
    messages: + msg86116
    versions: + Python 2.6, Python 3.1, Python 2.7
2008-01-27 14:55:27 christian.heimes set priority: normal
    type: enhancement
2008-01-27 06:26:42 alexandre.vassalotti create