This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Document unicode C-API in reST
Type:
Stage: resolved
Components: Documentation
Versions: Python 3.2

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Document PyUnicode_* API (issue 1944)
Assigned To: belopolsky
Nosy List: BreamoreBoy, belopolsky, berker.peksag, ezio.melotti, hodgestar, lemburg, loewis, vstinner
Priority: normal
Keywords: patch

Created on 2010-11-16 16:16 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name         Uploaded
issue10435.diff   belopolsky, 2010-11-16 23:58
issue10435a.diff  belopolsky, 2010-11-17 02:58
Messages (21)
msg121302 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-16 16:16
The following C-APIs are only documented in comments inside unicode.h:

PyUnicode_GetMax
PyUnicode_Resize
PyUnicode_InternImmortal
PyUnicode_FromOrdinal
PyUnicode_GetDefaultEncoding
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
PyUnicode_EncodeDecimal
PyUnicode_Append
PyUnicode_AppendAndDel
PyUnicode_Partition
PyUnicode_RPartition
PyUnicode_RSplit
PyUnicode_IsIdentifier
Py_UNICODE_strlen
Py_UNICODE_strcpy
Py_UNICODE_strcat
Py_UNICODE_strncpy
Py_UNICODE_strcmp
Py_UNICODE_strncmp
Py_UNICODE_strchr
Py_UNICODE_strrchr
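
For orientation, several of the functions in this list have close Python-level counterparts, which makes their intended behaviour easy to check interactively. The pairing named in each comment below is an assumption based on the function names and on how the built-in str methods behave; it is not stated in unicode.h. A minimal sketch:

```python
import sys

# The C-API function named in each comment is the assumed counterpart.
head, sep, tail = "a.b.c".partition(".")          # PyUnicode_Partition
r_head, r_sep, r_tail = "a.b.c".rpartition(".")   # PyUnicode_RPartition
pieces = "a.b.c".rsplit(".", 1)                   # PyUnicode_RSplit
euro = chr(0x20AC)                                # PyUnicode_FromOrdinal
ok_name = "spam_1".isidentifier()                 # PyUnicode_IsIdentifier
bad_name = "1spam".isidentifier()
default_encoding = sys.getdefaultencoding()       # PyUnicode_GetDefaultEncoding
```

The partition functions return a 3-tuple of (head, separator, tail), while the split function returns a list.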
msg121321 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-16 22:17
On Tue, Nov 16, 2010 at 10:38 AM, M.-A. Lemburg <mal@egenix.com> wrote:
> Alexander Belopolsky wrote:
..
>> I also have a similar question about C API.  Here, in absence of
>> __all__, the answer should be clear: all symbols in public header
>> files should start with either _Py_ or Py_ and those that start with
>> Py_ are public.   The question is what should be done with names that
>> start with Py_, but are not documented?  Can we add an underscore to
>> those names?  If so, should a (deprecated) alias be made available?
>> Should they be documented as deprecated?
>>
>> I think these questions can only be answered on a case-by-case basis,
>> with the choices being:
>>
>> 1. Document.
>> 2. Document as deprecated.
>> 3. Document as deprecated, add underscore prefix and retain a deprecated alias.
>> 4. Add an underscore prefix.
>>
>> The specific set of names that I would like to consider is the
>> following from unicode.h.  I am marking with (*) the names that I
>> think should be documented and with (D) those that should be
>> deprecated:
>>
>> PyUnicode_GetMax
>> PyUnicode_Resize (*)
>> PyUnicode_InternImmortal
>> PyUnicode_FromOrdinal (*)
>> PyUnicode_GetDefaultEncoding (D)
>> PyUnicode_AsDecodedObject
>> PyUnicode_AsDecodedUnicode
>> PyUnicode_AsEncodedObject
>> PyUnicode_AsEncodedUnicode
>> PyUnicode_BuildEncodingMap
>> PyUnicode_EncodeDecimal (*)
>> PyUnicode_Append (*)
>> PyUnicode_AppendAndDel (*)
>> PyUnicode_Partition (*)
>> PyUnicode_RPartition (*)
>> PyUnicode_RSplit (*)
>> PyUnicode_IsIdentifier (*)
>> Py_UNICODE_strlen
>> Py_UNICODE_strcpy
>> Py_UNICODE_strcat
>> Py_UNICODE_strncpy
>> Py_UNICODE_strcmp
>> Py_UNICODE_strncmp
>> Py_UNICODE_strchr
>> Py_UNICODE_strrchr
>
> For Unicode, unicodeobject.h defines which APIs are private or not.
> APIs which don't appear in the header file are either private or
> need to be added to the header file (but I don't think there are
> any in this category).
>
> All APIs in the header that do not appear in the documentation,
> should be added there as well. unicodeobject.h already provides
> documentation for most of the APIs you've listed above (except some
> new ones that were added later on).
>
> One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat
> obscure and given that we already have PyUnicode_Concat(), I think
> it should be made private and eventually dropped.
>

I would also like to nominate PyUnicode_AsEncodedObject and PyUnicode_AsEncodedUnicode.  The latter is a particularly attractive candidate for removal because it appears to be broken:

    v = PyCodec_Encode(unicode, encoding, errors);
    if (v == NULL)
        goto onError;
    if (!PyUnicode_Check(v)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder did not return an str object (type=%.400s)",
                     Py_TYPE(v)->tp_name);

Since PyCodec_Encode() returns bytes in 3.x, the code above will always raise an error.
msg121323 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-16 22:45
PyUnicode_AsDecodedObject() and PyUnicode_AsDecodedUnicode() appear to be broken as well: both start with a PyUnicode_Check(unicode) and then pass unicode to PyCodec_Decode() which expects bytes.
msg121325 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-16 22:54
Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the codec returns for these operations.

The codec system is not limited to converting between Unicode and bytes only.

A typical example is a same-type codec such as rot13 that only transforms Unicode data.
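
The claim that a codec may return something other than bytes can be checked from Python with codecs.encode(). The sketch below assumes a current CPython 3 release in which the rot_13 codec is registered, and assumes that codecs.encode() returns whatever the codec's encode function returns, as described for PyCodec_Encode():

```python
import codecs

# A character-set encoding turns text into bytes.
as_bytes = codecs.encode("abc", "utf-8")

# A same-type codec such as rot_13 turns text into text.
as_text = codecs.encode("abc", "rot_13")

result_types = (type(as_bytes).__name__, type(as_text).__name__)
```

A PyUnicode_Check() on the result therefore succeeds for the second call and fails for the first.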
msg121326 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-16 23:14
On Tue, Nov 16, 2010 at 5:54 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
>
> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>
> Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the codec returns for these operations.
>
> The codec system is not limited to converting between Unicode and bytes only.

Not according to the latest reST documentation:

"""
* Encoding converts a string object to a bytes object using a
particular character set encoding (e.g., cp1252 or iso-8859-1).

* Decoding converts a bytes object encoded using a particular
character set encoding to a string object.
""" http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode

> A typical example is a same-type codec such as rot13 that only transforms Unicode data.

I thought rot13 would only transform English (or Latin) alphabet.
msg121328 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-16 23:58
Attached patch documents all previously undocumented unicode C API functions.  Note that for the PyUnicode_As{En,De}codedObject() and PyUnicode_As{En,De}codedUnicode() functions I attempted to capture what they are supposed to do rather than what the current implementation does.
msg121330 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-17 00:19
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Tue, Nov 16, 2010 at 5:54 PM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>>
>> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>>
>> Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the codec returns for these operations.
>>
>> The codec system is not limited to converting between Unicode and bytes only.
> 
> Not according to the latest reST documentation:
> 
> """
> * Encoding converts a string object to a bytes object using a
> particular character set encoding (e.g., cp1252 or iso-8859-1).
> 
> * Decoding converts a bytes object encoded using a particular
> character set encoding to a string object.
> """ http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode

That's another documentation bug, then. The codec system has always
supported other type combinations for encoding/decoding as well.

Only certain methods on str and bytes objects in 3.x limit the possible
types to either str or bytes - which probably results in the
idea that Python codecs don't support anything else.

The text from the 2.7 documentation is correct, also for 3.x:

http://docs.python.org/library/codecs.html#codec-objects

>> A typical example is a same-type codec such as rot13 that only transforms Unicode data.
> 
> I thought rot13 would only transform English (or Latin) alphabet.

Right, everything else passes through as-is.

Other examples are codecs that escape certain code points using e.g.
XML entity sequences, backslash notations or other such techniques.

For bytes, you have the zip, base64 and hex codecs which work in
a similar way.
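
The bytes-to-bytes codecs mentioned here, and the pass-through behaviour of rot13, can be exercised through codecs.encode() and codecs.decode(). The sketch below assumes CPython 3.4 or later, where these codecs are registered under the names base64_codec, hex_codec, zlib_codec and rot_13, and where str.encode() refuses codecs that are not text encodings:

```python
import codecs

data = b"unicode C-API"

# Bytes-to-bytes codecs round-trip through the generic codec functions.
round_trips = {
    name: codecs.decode(codecs.encode(data, name), name)
    for name in ("base64_codec", "hex_codec", "zlib_codec")
}

# rot_13 only rotates ASCII letters; other characters pass through unchanged.
rotated = codecs.encode("a\u00e9", "rot_13")

# The str method is limited to text encodings (str -> bytes).
try:
    "abc".encode("rot_13")
    method_rejected = False
except LookupError:
    method_rejected = True
```

The restriction sits in the str and bytes methods, not in the codec registry itself.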
msg121331 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-17 00:25
On Tue, Nov 16, 2010 at 7:19 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
>> * Decoding converts a bytes object encoded using a particular
>> character set encoding to a string object.
>> """ http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode
>
> That's another documentation bug, then. The codec system has always
> supported other type combinations for encoding/decoding as well.
>
> Only certain methods on str and bytes objects in 3.x limit the possible
> types to either str or bytes - which probably results in the
> idea that Python codecs don't support anything else.
>
> The text from the 2.7 documentation is correct, also for 3.x:
>
> http://docs.python.org/library/codecs.html#codec-objects
>

I agree and will handle this in #10435 because codecs.h
(unsurprisingly) supports your POV and we don't want C-API docs to be
in conflict with Py-API docs.

If you have time, please take a look at
PyUnicode_As{En,De}codedObject() and
PyUnicode_As{En,De}codedUnicode() documentation in the attached
patch.
msg121332 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-17 01:11
> I agree and will handle this in #10435 because codecs.h

s/#10435/#10439/
msg121335 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-17 02:58
It looks like I misunderstood what PyUnicode_As{En,De}codedObject() and
PyUnicode_As{En,De}codedUnicode() functions are designed to do.  Attaching a corrected patch, issue10435a.diff.
msg121371 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-17 18:04
Alexander Belopolsky wrote:
> 
> If you have time, please take a look at
> PyUnicode_As{En,De}codedObject() and
> PyUnicode_As{En,De}codedUnicode() documentation in the attached
> patch.

Thanks. I'll try to have a look later tonight.
msg121386 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-17 22:20
Thanks for your work on this.

Please see my comments below:

--- Include/unicodeobject.h	(revision 86478)
+++ Include/unicodeobject.h	(working copy)
@@ -737,7 +737,7 @@
     const char *errors          /* error handling */
     );
 
-/* Encodes a Unicode object and returns the result as Python string
+/* Encodes a Unicode object and returns the result as Python bytes
    object. */
 

PyUnicode_AsEncodedObject() encodes the Unicode object to
whatever the codec returns, so the "bytes" is wrong in the
above line.


--- Doc/c-api/unicode.rst	(revision 86477)
+++ Doc/c-api/unicode.rst	(working copy)
@@ -528,7 +567,22 @@
    using the Python codec registry.  Return *NULL* if an exception was raised by
    the codec.
 
+.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const char *encoding, const char *errors)
 
+   Create a Unicode object by decoding the encoded Unicode object
+   *unicode*.

The function does not guarantee that a Unicode object will be
returned. It merely passes a Unicode object to a codec's
decode function and returns whatever the codec returns.

+   *encoding* and *errors* have the same meaning as the
+   parameters of the same name in the :func:`unicode` built-in
+   function.  The codec to be used is looked up using the Python codec
+   registry.  Return *NULL* if an exception was raised by the codec.
+   Note that Python codecs do not accept Unicode objects for decoding,
+   so this method is only useful with user or 3rd party codecs.

Please strike the last sentence. The codecs that were wrongly removed
from Python3 will get added back and provide such functionality.

+.. c:function:: PyObject* PyUnicode_AsEncodedObject(PyObject *unicode, const char *encoding, const char *errors)
+
+   Use c:func:`PyUnicode_AsEncodedString` instead.

That's not a useful hint as PyUnicode_AsEncodedString() does something
different than PyUnicode_AsEncodedObject().

+   Same as c:func:`PyUnicode_AsEncodedString`, but without shortcuts
+   for common built-in encodings and without checking the type of the
+   object returned by encoding via the codec registry.  This method is
+   only useful with user or 3rd party codec that encodes string into
+   something other than bytes.

This should read:

   Decodes a Unicode object by passing the given Unicode object
   *unicode* to the codec for *encoding*.
   *encoding* and *errors* have the same meaning as the
   parameters of the same name in the :func:`unicode` built-in
   function.  The codec to be used is looked up using the Python codec
   registry.  Return *NULL* if an exception was raised by the codec.

+.. c:function:: PyObject* PyUnicode_AsEncodedUnicode(PyObject *unicode, const char *encoding, const char *errors)
+   
+   Use c:func:`PyUnicode_AsEncodedString` instead.

Please remove this as well.

+   Same as c:func:`PyUnicode_AsEncodedObject`, but raises
+   :exc:`TypeError` is encoding via the codec registry returns an
+   object other than string.  This method is only useful with user or
+   3rd party codec that encodes string into string.

Please remove the last sentence.

+.. c:function: int PyUnicode_EncodeDecimal(Py_UNICODE *s, Py_ssize_t length,
+                                           char *output,  const char *errors)
+
+   Takes a Unicode string holding a decimal value and writes it into
+   an output buffer using standard ASCII digit codes.
+
+   The output buffer has to provide at least length+1 bytes of storage
+   area. The output string is 0-terminated.
+
+   The encoder converts whitespace to ' ', decimal characters to their
+   corresponding ASCII digit and all other Latin-1 characters except
+   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
+   are treated as errors. This includes embedded NULL bytes.
+
+   Error handling is defined by the errors argument:
+
+      NULL or "strict": raise a ValueError
+      "ignore": ignore the wrong characters (these are not copied to the
+                output buffer)
+      "replace": replaces illegal characters with '?'
+
+   Returns 0 on success, -1 on failure.
+   
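
The encoder behaviour described in the patch text above can be modelled in pure Python. This is only a sketch of the documented rules, not CPython's implementation: encode_decimal is a hypothetical name, it returns a bytes object where the C function fills a caller-supplied buffer, and the use of str.isspace() and unicodedata.decimal() to classify characters is an assumption.

```python
import unicodedata


def encode_decimal(text, errors=None):
    """Hypothetical model of the PyUnicode_EncodeDecimal() rules quoted above."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(" ")                # any whitespace becomes a space
            continue
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(str(digit))         # any decimal digit becomes ASCII
        elif 0 < ord(ch) < 256:
            out.append(ch)                 # other Latin-1 characters as-is
        elif errors in (None, "strict"):
            raise ValueError("invalid decimal character %r" % ch)
        elif errors == "ignore":
            continue                       # drop the offending character
        elif errors == "replace":
            out.append("?")
        else:
            raise ValueError("unknown error handler %r" % errors)
    return "".join(out).encode("latin-1")
```

For example, encode_decimal("\u0661\u0662\u0663") converts three Arabic-Indic digits to b"123", while a character outside Latin-1 such as "\u20ac" raises ValueError unless errors is "ignore" or "replace".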

+.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft. Sets *pleft to
+   NULL on error.
+
+.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft and drop the right
+   object. Sets *pleft to NULL on error.
+
+

Please don't document these two obscure APIs. Instead we should
make them private functions by prepending them with an underscore.
If you look at the implementations of those two APIs, they
are little more than macros around PyUnicode_Concat().

3rd party extensions should use PyUnicode_Concat() to achieve
the same effect.


+.. c:function:: void PyUnicode_InternImmortal(PyObject **string)
+ 
+   Use :c:func:`PyUnicode_InternInPlace` instead.
+
+   Same as :c:func:`PyUnicode_InternInPlace`, but the interned string
+   will never be released.
+

I don't think it's a good idea to make this a public API.
3rd party extensions should not need to make use of such
APIs.

Instead, we should make this a private API.
msg122155 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-22 19:00
On Wed, Nov 17, 2010 at 5:20 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> -/* Encodes a Unicode object and returns the result as Python string
> +/* Encodes a Unicode object and returns the result as Python bytes
>    object. */
>
>
> PyUnicode_AsEncodedObject() encodes the Unicode object to
> whatever the codec returns, so the "bytes" is wrong in the
> above line.
>

The above line describes PyUnicode_AsEncodedString(), not
PyUnicode_AsEncodedObject().  The former has PyBytes_Check(v) after
calling  v = PyCodec_Encode(..).  As far as I can tell this is the
only difference that makes PyUnicode_AsEncodedObject() not redundant.
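
The distinction drawn here can be modelled in Python. The two helper names below are hypothetical stand-ins for the C functions, and the sketch assumes that codecs.encode() behaves like PyCodec_Encode(); it models only the type check, not the shortcuts for common built-in encodings.

```python
import codecs


def as_encoded_object(text, encoding, errors="strict"):
    """Hypothetical model of PyUnicode_AsEncodedObject(): no type check."""
    return codecs.encode(text, encoding, errors)


def as_encoded_string(text, encoding, errors="strict"):
    """Hypothetical model of PyUnicode_AsEncodedString(): result must be bytes."""
    result = codecs.encode(text, encoding, errors)
    if not isinstance(result, bytes):
        raise TypeError(
            "encoder did not return a bytes object (type=%s)"
            % type(result).__name__
        )
    return result
```

With a same-type codec such as rot_13, as_encoded_object("abc", "rot_13") returns a str, whereas as_encoded_string("abc", "rot_13") raises TypeError.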

..
> +.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const char *encoding, const char *errors)
>
> +   Create a Unicode object by decoding the encoded Unicode object
> +   *unicode*.
>
> The function does not guarantee that a Unicode object will be
> returned. It merely passes a Unicode object to a codec's
> decode function and returns whatever the codec returns.
>

Good point.  I am changing "Unicode object" to "Python object".

..
> +   Note that Python codecs do not accept Unicode objects for decoding,
> +   so this method is only useful with user or 3rd party codecs.
>
> Please strike the last sentence. The codecs that were wrongly removed
> from Python3 will get added back and provide such functionality.
>

Would it be acceptable to keep this note, but add "as of version 3.2"
or something like that?   I don't think there is a chance that these
codecs will be added in 3.2 given the current schedule.

..
> This should read:
>
>   Decodes a Unicode object by passing the given Unicode object
>   *unicode* to the codec for *encoding*.
>   *encoding* and *errors* have the same meaning as the
>   parameters of the same name in the :func:`unicode` built-in
>   function.  The codec to be used is looked up using the Python codec
>   registry.  Return *NULL* if an exception was raised by the codec.
>

Is the following better?

"""
    Decodes a Unicode object by passing the given Unicode object
    *unicode* to the codec for *encoding*.  *encoding* and *errors*
    have the same meaning as the parameters of the same name in the
    :func:`unicode` built-in  function. The codec to be used is
    looked up using the Python codec registry. Return *NULL* if an
    exception was raised by the codec.

    As of Python 3.2, this method is only useful with user or 3rd
    party codec that encodes string into something other than bytes.
    For encoding to bytes, use c:func:`PyUnicode_AsEncodedString`
    instead.
"""
..
>
> +.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
..
> +
> +.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
..
>
> Please don't document these two obscure APIs. Instead we should
> make them private functions by prepending them with an underscore.
> If you look at the implementations of those two APIs, they
> are little more than macros around PyUnicode_Concat().
>

I don't agree that they are obscure.  Python uses them in multiple
places and developers seem to know about them.  See patches submitted
to issue4113 and issue7584.

> 3rd party extensions should use PyUnicode_Concat() to achieve
> the same effect.
>

Hmm.  I would not be surprised if current 3rd party extensions used
PyUnicode_AppendAndDel() more often than PyUnicode_Concat().  (I know
that I learned about PyUnicode_AppendAndDel()  before
PyUnicode_Concat().)

Is there anything that makes PyUnicode_AppendAndDel() undesirable?   I
don't mind adding a recommendation to use PyUnicode_Concat() if there
is a practical reason for it or even a warning that
PyUnicode_AppendAndDel() may be deprecated in the future, but renaming
it to _PyUnicode_AppendAndDel() seems premature.

..
>
> I don't think it's a good idea to make this a public API.
> 3rd party extensions should not need to make use of such
> APIs.
>
> Instead, we should make this a private API.

I agree, but isn't it prudent to document it as deprecated for 3rd
party use first?
msg122215 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-11-23 13:46
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Wed, Nov 17, 2010 at 5:20 PM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
> ..
>> -/* Encodes a Unicode object and returns the result as Python string
>> +/* Encodes a Unicode object and returns the result as Python bytes
>>    object. */
>>
>>
>> PyUnicode_AsEncodedObject() encodes the Unicode object to
>> whatever the codec returns, so the "bytes" is wrong in the
>> above line.
>>
> 
> The above line describes PyUnicode_AsEncodedString(), not
> PyUnicode_AsEncodedObject().  The former has PyBytes_Check(v) after
> calling  v = PyCodec_Encode(..).  As far as I can tell this is the
> only difference that makes PyUnicode_AsEncodedObject() not redundant.

In that case, the change is fine.

> ..
>> +.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const char *encoding, const char *errors)
>>
>> +   Create a Unicode object by decoding the encoded Unicode object
>> +   *unicode*.
>>
>> The function does not guarantee that a Unicode object will be
>> returned. It merely passes a Unicode object to a codec's
>> decode function and returns whatever the codec returns.
>>
> 
> Good point.  I am changing "Unicode object" to "Python object".
> 
> ..
>> +   Note that Python codecs do not accept Unicode objects for decoding,
>> +   so this method is only useful with user or 3rd party codecs.
>>
>> Please strike the last sentence. The codecs that were wrongly removed
>> from Python3 will get added back and provide such functionality.
>>
> 
> Would it be acceptable to keep this note, but add "as of version 3.2"
> or something like that?   I don't think there is a chance that these
> codecs will be added in 3.2 given the current schedule.

Please remove the sentence or change it to:

 Note that most Python codecs only accept Unicode objects for
 decoding.

> ..
>> This should read:
>>
>>   Decodes a Unicode object by passing the given Unicode object
>>   *unicode* to the codec for *encoding*.
>>   *encoding* and *errors* have the same meaning as the
>>   parameters of the same name in the :func:`unicode` built-in
>>   function.  The codec to be used is looked up using the Python codec
>>   registry.  Return *NULL* if an exception was raised by the codec.
>>
> 
> Is the following better?
> 
> """
>     Decodes a Unicode object by passing the given Unicode object
>     *unicode* to the codec for *encoding*.  *encoding* and *errors*
>     have the same meaning as the parameters of the same name in the
>     :func:`unicode` built-in  function. The codec to be used is
>     looked up using the Python codec registry. Return *NULL* if an
>     exception was raised by the codec.
> 
>     As of Python 3.2, this method is only useful with user or 3rd
>     party codec that encodes string into something other than bytes.

Same as above.

>     For encoding to bytes, use c:func:`PyUnicode_AsEncodedString`
>     instead.
> """
> ..
>>
>> +.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
> ..
>> +
>> +.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
> ..
>>
>> Please don't document these two obscure APIs. Instead we should
>> make them private functions by prepending them with an underscore.
>> If you look at the implementations of those two APIs, they
>> are little more than macros around PyUnicode_Concat().
>>
> 
> I don't agree that they are obscure.  Python uses them in multiple
> places and developers seem to know about them.  See patches submitted
> to issue4113 and issue7584.

I found these references:

http://osdir.com/ml/python.python-3000.cvs/2007-11/msg00270.html

and

http://riverbankcomputing.co.uk/hg/sip/annotate/91a545605044/siplib/siplib.c

so you're right: they are already in use in the wild. Too bad...

Please add these porting notes to the documentation:

PyUnicode_Append() works like the PyString_Concat(), while
PyUnicode_AppendAndDel() works like PyString_ConcatAndDel().

>> 3rd party extensions should use PyUnicode_Concat() to achieve
>> the same effect.
>>
> 
> Hmm.  I would not be surprised if current 3rd party extensions used
> PyUnicode_AppendAndDel() more often than PyUnicode_Concat().  (I know
> that I learned about PyUnicode_AppendAndDel()  before
> PyUnicode_Concat().)

Certainly not more often. PyUnicode_Concat() has been around much
longer than the other two APIs which are only available in Python3.

> Is there anything that makes PyUnicode_AppendAndDel() undesirable?   I
> don't mind adding a recommendation to use PyUnicode_Concat() if there
> is a practical reason for it or even a warning that
> PyUnicode_AppendAndDel() may be deprecated in the future, but renaming
> it to _PyUnicode_AppendAndDel() seems premature.

Both APIs are just slight variants of the PyUnicode_Concat()
API. They change parameters in-place which is rather uncommon
for the Unicode API and don't return their result - in fact the
error reporting is somewhat broken: APIs which do in-place
modifications usually return an integer for error reporting.
These APIs set the *pleft to NULL instead.

Finally, the naming of PyUnicode_AppendAndDel() is not ideal.
"Del" would suggest that an object is deleted, but in reality
it is only decrefed. It is also not clear that the second argument
is affected, but not the first one.

> ..
>> [PyUnicode_InternImmortal(PyObject **p)]
>> I don't think it's a good idea to make this a public API.
>> 3rd party extensions should not need to make use of such
>> APIs.
>>
>> Instead, we should make this a private API.
> 
> I agree, but isn't it prudent to document it as deprecated for 3rd
> party use first?

I don't think that's needed in this case. The API is not used
outside Python3, it seems. If people complain in beta phase,
we can always add a deprecation function wrapper instead.
msg241743 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-04-21 20:42
I've looked at c-api/unicode.rst and I can't see any correlation between it and the names listed here in msg121302.  Either this was never completed or everything has changed in the meantime, so could somebody take a look, please?
msg241746 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-04-21 21:38
Mark,

Unicode C-APIs have changed a lot since this issue was opened, but I think many of the listed functions are still present but not properly documented.

You can help by checking the Include/unicode.h file and compiling a list of functions that are there, do not start with _, and are not documented in the reference manual.
msg241747 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-04-21 21:40
Sorry for the broken link, the correct header file is Include/unicodeobject.h
msg241748 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-04-21 21:43
Okay Alexander I'll give it a go, but not tonight :)
msg242295 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-04-30 23:45
Here is a list of just about everything that's in the header file but not in the rst file; I'm not sure which bits you normally wouldn't bother with.

Py_USING_UNICODE
Py_UNICODE_SIZE
Py_UNICODE_WIDE
Py_UNICODE_COPY
Py_UNICODE_FILL
Py_UNICODE_HIGH_SURROGATE
Py_UNICODE_LOW_SURROGATE
Py_UNICODE_MATCH
PyUnicode_WSTR_LENGTH
PyUnicode_AS_DATA
PyUnicode_IS_ASCII
PyUnicode_IS_COMPACT
PyUnicode_IS_COMPACT_ASCII
PyUnicode_IS_READY
Py_UNICODE_REPLACEMENT_CHARACTER 
PyUnicode_FromString
PyUnicode_GetMax
PyUnicode_Resize
PyUnicode_InternImmortal
PyUnicode_CHECK_INTERNED
PyUnicode_FromOrdinal
PyUnicode_GetDefaultEncoding
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
PyUnicode_DecodeCodePageStateful
PyUnicode_EncodeDecimal
PyUnicode_Append
PyUnicode_AppendAndDel
PyUnicode_Partition
PyUnicode_RPartition
PyUnicode_RSplit
PyUnicode_IsIdentifier
Py_UNICODE_strlen
Py_UNICODE_strcpy
Py_UNICODE_strcat
Py_UNICODE_strncpy
Py_UNICODE_strcmp
Py_UNICODE_strncmp
Py_UNICODE_strchr
Py_UNICODE_strrchr
msg242498 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-05-03 18:31
Py_UNICODE_TOLOWER, Py_UNICODE_TOUPPER and Py_UNICODE_TOTITLE are all labelled deprecated in 3.3 and presumably can be removed completely.  Alternatively, should these, like many others, be scheduled for removal in 4.0?
msg264573 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-30 18:40
This is a duplicate of issue 1944.
History
Date  User  Action  Args
2022-04-11 14:57:08  admin  set  github: 54644
2016-04-30 18:40:27  berker.peksag  set  status: open -> closed; superseder: Document PyUnicode_* API; nosy: + berker.peksag; messages: + msg264573; resolution: duplicate; stage: patch review -> resolved
2015-05-03 18:31:58  BreamoreBoy  set  messages: + msg242498
2015-04-30 23:45:26  BreamoreBoy  set  messages: + msg242295
2015-04-21 21:43:59  BreamoreBoy  set  messages: + msg241748
2015-04-21 21:40:28  belopolsky  set  messages: + msg241747
2015-04-21 21:38:09  belopolsky  set  messages: + msg241746
2015-04-21 20:42:11  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg241743
2010-11-23 13:46:24  lemburg  set  messages: + msg122215
2010-11-22 19:00:55  belopolsky  set  messages: + msg122155
2010-11-20 23:20:47  belopolsky  link  issue8647 superseder
2010-11-20 23:20:12  belopolsky  link  issue8646 superseder
2010-11-20 23:19:42  belopolsky  link  issue8645 superseder
2010-11-20 16:25:41  hodgestar  set  nosy: + hodgestar
2010-11-17 22:20:13  lemburg  set  messages: + msg121386
2010-11-17 18:04:55  lemburg  set  messages: + msg121371
2010-11-17 02:58:13  belopolsky  set  files: + issue10435a.diff; messages: + msg121335
2010-11-17 01:11:54  belopolsky  set  messages: + msg121332
2010-11-17 01:04:03  ezio.melotti  set  nosy: + ezio.melotti
2010-11-17 00:25:11  belopolsky  set  messages: + msg121331
2010-11-17 00:19:07  lemburg  set  messages: + msg121330
2010-11-16 23:58:06  belopolsky  set  files: + issue10435.diff; keywords: + patch; messages: + msg121328; stage: needs patch -> patch review
2010-11-16 23:14:55  belopolsky  set  messages: + msg121326
2010-11-16 22:54:42  lemburg  set  messages: + msg121325
2010-11-16 22:45:32  belopolsky  set  messages: + msg121323
2010-11-16 22:21:43  belopolsky  set  nosy: + lemburg, loewis, vstinner
2010-11-16 22:17:57  belopolsky  set  messages: + msg121321
2010-11-16 16:31:41  georg.brandl  link  issue9076 superseder
2010-11-16 16:23:15  belopolsky  link  issue8649 superseder
2010-11-16 16:16:45  belopolsky  create