Message 67726 - Python tracker

Author	lemburg
Recipients	alexandre.vassalotti, bhy, lemburg, loewis
Date	2008-06-05.20:45:32
Message-id	<48485068.3030700@egenix.com>
In-reply-to	<1212693280.68.0.603039245244.issue2799@psf.upfronthosting.co.za>

Content
On 2008-06-05 21:14, Alexandre Vassalotti wrote:
> Alexandre Vassalotti <alexandre@peadrop.com> added the comment:
> 
> I now think the proposed changes wouldn't be a bad thing, after all. I
> have been bitten myself by the confusing naming of the Unicode API. So,
> there is definitely a potential for errors. 
> 
> The main problem with PyUnicode_AsString(), as Marc-André pointed out,
> is it doesn't follow the API signature of the rest of the Unicode API:
> 
> char *PyUnicode_AsString(PyObject *unicode);
> PyObject *PyUnicode_AsUTF8String(PyObject *unicode);
> PyObject *PyUnicode_AsASCIIString(PyObject *unicode);
> 
> On the other hand, I do like the simple API of PyUnicode_AsString. Also,
> I have to admit that the apparent similarity between the PyString and
> the PyUnicode API helped me to port my code to Py3K when I first started
> working on Python core. So, pragmatism might beat purity here.

There are a few cases in the interpreter where it is indeed useful
to have direct access to the char* buffer holding the default-encoded
(= UTF-8 in Py3k) data.

However, the naming of the API is poorly chosen, since the other
PyUnicode_AsXYZ() APIs either return a PyObject* or copy the
data to an output variable.

How about PyUnicode_GetUTF8Buffer() or just PyUnicode_UTF8() ?!

Note that the function *must* check the UTF-8 buffer for embedded
NUL bytes and then raise an exception if it finds one. Otherwise,
the API would silently cause truncations.
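
To illustrate the truncation problem, here is a minimal sketch in plain C.
It does not use the Python C API: checked_utf8_buffer() and
visible_length() are hypothetical names made up for this example, and the
sketch signals the error by returning NULL where a real API function
would raise an exception.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Python C API).
   Returns buf if the len-byte UTF-8 buffer contains no embedded NUL
   byte, and NULL otherwise. A real implementation would set a Python
   exception before returning NULL. */
const char *
checked_utf8_buffer(const char *buf, size_t len)
{
    if (buf == NULL)
        return NULL;
    /* memchr scans all len bytes, so it also finds a NUL in the middle. */
    if (memchr(buf, '\0', len) != NULL)
        return NULL;
    return buf;
}

/* What a caller sees when it treats the buffer as a plain C string:
   strlen stops at the first NUL, so anything after it is lost. */
size_t
visible_length(const char *buf)
{
    return strlen(buf);
}
```

For a 5-byte buffer such as "ab\0cd", visible_length() reports only 2
bytes, which is the silent truncation described above; the checked
variant rejects the buffer instead of handing it out.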
