Message67726
On 2008-06-05 21:14, Alexandre Vassalotti wrote:
> Alexandre Vassalotti <alexandre@peadrop.com> added the comment:
>
> I now think the proposed changes wouldn't be bad thing, after all. I
> have been bitten myself by the confusing naming of the Unicode API. So,
> there is definitely a potential for errors.
>
> The main problem with PyUnicode_AsString(), as Marc-André pointed out,
> is it doesn't follow the API signature of the rest of the Unicode API:
>
> char *PyUnicode_AsString(PyObject *unicode);
> PyObject *PyUnicode_AsUTF8String(PyObject *unicode);
> PyObject *PyUnicode_AsASCIIString(PyObject *unicode);
>
> On the other hand, I do like the simple API of PyUnicode_AsString. Also,
> I have to admit that the apparent similarity between the PyString and
> the PyUnicode API helped me to port my code to Py3K when I first started
> working on Python core. So, pragmatism might beat purity here.
There are a few cases in the interpreter where it is indeed useful
to have direct access to the buffer with the default encoded (= UTF-8
in Py3k) char* buffer.
However, the naming of the API is poorly chosen, since the other
PyUnicode_AsXYZ() APIs either return a PyObject* or copy the
data to an output variable.
How about PyUnicode_GetUTF8Buffer() or just PyUnicode_UTF8() ?!
Note that the function *must* check the UTF-8 buffer for embedded
NUL bytes and then raise an exception if it finds one. Otherwise,
the API would silently cause truncations. |
|
Date |
User |
Action |
Args |
2008-06-05 20:45:35 | lemburg | set | spambayes_score: 0.00735441 -> 0.007354414 recipients:
+ lemburg, loewis, alexandre.vassalotti, bhy |
2008-06-05 20:45:34 | lemburg | link | issue2799 messages |
2008-06-05 20:45:32 | lemburg | create | |
|