Message 121386 - Python tracker


Author	lemburg
Recipients	belopolsky, ezio.melotti, lemburg, loewis, vstinner
Date	2010-11-17.22:20:13
Message-id	<1290032414.97.0.560643505576.issue10435@psf.upfronthosting.co.za>

Content

Thanks for your work on this.

Please see my comments below:

--- Include/unicodeobject.h	(revision 86478)
+++ Include/unicodeobject.h	(working copy)
@@ -737,7 +737,7 @@
     const char *errors          /* error handling */
     );
 
-/* Encodes a Unicode object and returns the result as Python string
+/* Encodes a Unicode object and returns the result as Python bytes
    object. */
 

PyUnicode_AsEncodedObject() encodes the Unicode object to
whatever the codec returns, so "bytes" is wrong in the
line above.


--- Doc/c-api/unicode.rst	(revision 86477)
+++ Doc/c-api/unicode.rst	(working copy)
@@ -528,7 +567,22 @@
    using the Python codec registry.  Return *NULL* if an exception was raised by
    the codec.
 
+.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const char *encoding, const char *errors)
 
+   Create a Unicode object by decoding the encoded Unicode object
+   *unicode*.

The function does not guarantee that a Unicode object will be
returned. It merely passes a Unicode object to a codec's
decode function and returns whatever the codec returns.

+   *encoding* and *errors* have the same meaning as the
+   parameters of the same name in the :func:`unicode` built-in
+   function.  The codec to be used is looked up using the Python codec
+   registry.  Return *NULL* if an exception was raised by the codec.
+   Note that Python codecs do not accept Unicode objects for decoding,
+   so this method is only useful with user or 3rd party codecs.

Please strike the last sentence. The codecs that were wrongly removed
from Python 3 will get added back and provide such functionality.
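
As a Python-level illustration of such a codec (a sketch, assuming Python 3.2 or later, where rot_13 is available again as a str-to-str codec): codecs.decode() looks the codec up in the registry and hands back whatever its decode function returns. It shows the behaviour from Python code, not the C function itself.

```python
import codecs

# rot_13 is a str-to-str codec: its decode function accepts a str
# and returns a str, not bytes.
decoded = codecs.decode("uryyb", "rot_13")
print(type(decoded).__name__, decoded)
```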

+.. c:function:: PyObject* PyUnicode_AsEncodedObject(PyObject *unicode, const char *encoding, const char *errors)
+
+   Use c:func:`PyUnicode_AsEncodedString` instead.

That's not a useful hint as PyUnicode_AsEncodedString() does something
different than PyUnicode_AsEncodedObject().
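
The difference can be seen from Python code. The sketch below assumes Python 3.2 or later and treats codecs.encode() and str.encode() as rough Python-level counterparts of the two C functions; that mapping is an assumption made for illustration.

```python
import codecs

# codecs.encode() returns whatever the codec returns; for rot_13
# that is a str.
result = codecs.encode("hello", "rot_13")

# str.encode() insists on a bytes result and rejects rot_13.
# Depending on the Python version, this surfaces as LookupError
# or TypeError, so both are caught here.
try:
    "hello".encode("rot_13")
    rejected = False
except (LookupError, TypeError):
    rejected = True

print(result, rejected)
```

Pointing users of one function at the other would therefore change what they get back for codecs like this one.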

+   Same as c:func:`PyUnicode_AsEncodedString`, but without shortcuts
+   for common built-in encodings and without checking the type of the
+   object returned by encoding via the codec registry.  This method is
+   only useful with user or 3rd party codec that encodes string into
+   something other than bytes.

This should read:

   Encodes a Unicode object by passing the given Unicode object
   *unicode* to the codec for *encoding*.
   *encoding* and *errors* have the same meaning as the
   parameters of the same name in the :func:`unicode` built-in
   function.  The codec to be used is looked up using the Python codec
   registry.  Return *NULL* if an exception was raised by the codec.

+.. c:function:: PyObject* PyUnicode_AsEncodedUnicode(PyObject *unicode, const char *encoding, const char *errors)
+   
+   Use c:func:`PyUnicode_AsEncodedString` instead.

Please remove this as well.

+   Same as c:func:`PyUnicode_AsEncodedObject`, but raises
+   :exc:`TypeError` is encoding via the codec registry returns an
+   object other than string.  This method is only useful with user or
+   3rd party codec that encodes string into string.

Please remove the last sentence.

+.. c:function: int PyUnicode_EncodeDecimal(Py_UNICODE *s, Py_ssize_t length,
+                                           char *output,  const char *errors)
+
+   Takes a Unicode string holding a decimal value and writes it into
+   an output buffer using standard ASCII digit codes.
+
+   The output buffer has to provide at least length+1 bytes of storage
+   area. The output string is 0-terminated.
+
+   The encoder converts whitespace to ' ', decimal characters to their
+   corresponding ASCII digit and all other Latin-1 characters except
+   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
+   are treated as errors. This includes embedded NULL bytes.
+
+   Error handling is defined by the errors argument:
+
+      NULL or "strict": raise a ValueError
+      "ignore": ignore the wrong characters (these are not copied to the
+                output buffer)
+      "replace": replaces illegal characters with '?'
+
+   Returns 0 on success, -1 on failure.
+   
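
The conversion rules described in this hunk can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the C implementation: encode_decimal is a hypothetical helper, and it assumes that "this range" means ordinals 1 through 255 (Latin-1 without \0).

```python
import unicodedata


def encode_decimal(text, errors="strict"):
    """Hypothetical Python model of the documented conversion rules."""
    out = []
    for ch in text:
        if ch.isspace():
            # Any whitespace becomes a plain ASCII space.
            out.append(" ")
            continue
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            # Any Unicode decimal digit becomes its ASCII digit.
            out.append(str(digit))
        elif 0 < ord(ch) < 256:
            # Other Latin-1 characters except \0 are copied as-is.
            out.append(ch)
        elif errors == "ignore":
            continue
        elif errors == "replace":
            out.append("?")
        else:
            # None, "strict" and (in this sketch) anything else: fail.
            raise ValueError("invalid decimal character %r" % ch)
    return "".join(out).encode("latin-1")


print(encode_decimal("\u0661\u0662\u0663"))     # Arabic-Indic digits
print(encode_decimal("1\u20ac2", "replace"))    # euro sign is out of range
```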

+.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft. Sets *pleft to
+   NULL on error.
+
+.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft and drop the right
+   object. Sets *pleft to NULL on error.
+
+

Please don't document these two obscure APIs. Instead, we should
make them private functions by prefixing their names with an underscore.
If you look at the implementations of those two APIs, they
are little more than macros around PyUnicode_Concat().

3rd party extensions should use PyUnicode_Concat() to achieve
the same effect.


+.. c:function:: void PyUnicode_InternImmortal(PyObject **string)
+ 
+   Use :c:func:`PyUnicode_InternInPlace` instead.
+
+   Same as :c:func:`PyUnicode_InternInPlace`, but the interned string
+   will never be released.
+

I don't think it's a good idea to make this a public API.
3rd party extensions should not need to make use of such
APIs.

Instead, we should make this a private API.
