classification
Title: Remove PyUnicode_AsString(), rework PyUnicode_AsStringAndSize(), add PyUnicode_AsChar()
Type:
Components: Unicode
Versions: Python 3.0
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, lemburg
Priority:
Keywords:

Created on 2008-05-09 10:31 by lemburg, last changed 2008-05-10 14:11 by lemburg.

Messages
msg66463 Author: Marc-Andre Lemburg (lemburg) Date: 2008-05-09 10:31
The API PyUnicode_AsString() is pretty useless by itself - there's
no way to access the size information of the returned string without
again going to the Unicode object.

I'd suggest removing the API altogether rather than only deprecating it.

Furthermore, the API PyUnicode_AsStringAndSize() does not follow the API
signature of PyString_AsStringAndSize(), which passes back the pointer to
the string as an output parameter. That should be changed as well. Note
that PyString_AsStringAndSize() already does this for both 8-bit strings
and Unicode, so the special Unicode API is not really needed at all.
Alternatively, you may want to rename PyString_AsStringAndSize() to
PyUnicode_AsStringAndSize().
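
To make the signature mismatch concrete, here is a minimal standalone C sketch of the two calling conventions. It deliberately avoids the Python headers: text_obj, as_string_and_size_ret() and as_string_and_size_out() are hypothetical stand-ins invented for illustration, not the real API functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string object: a buffer plus its length. */
typedef struct {
    const char *data;
    size_t len;
} text_obj;

/* Convention A: the buffer pointer is the return value and only the
   size is an output parameter. NULL signals an error. */
const char *as_string_and_size_ret(const text_obj *obj, size_t *size)
{
    if (obj == NULL || obj->data == NULL)
        return NULL;
    if (size != NULL)
        *size = obj->len;
    return obj->data;
}

/* Convention B: buffer and length are both output parameters and the
   return value is a status code (0 on success, -1 on error). */
int as_string_and_size_out(const text_obj *obj, const char **buffer,
                           size_t *length)
{
    assert(buffer != NULL);
    if (obj == NULL || obj->data == NULL)
        return -1;
    *buffer = obj->data;
    if (length != NULL)
        *length = obj->len;
    return 0;
}
```

With convention B, callers can test a plain status code and the function looks the same for both string types, which seems to be the consistency argument made above.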

Finally, since there are many cases where the string buffer contents are
copied to a new buffer, it's probably worthwhile to add a new API which
does the copying straight away and also deals with the overflow cases in
a central place. I'd suggest PyUnicode_AsChar() (with an API like
PyUnicode_AsWideChar()).
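
As a rough idea of what such a copying API could look like, here is a standalone C sketch. The name copy_as_char() and its exact semantics are assumptions made for illustration (loosely modelled on a "copy at most size items, return the count" contract); it is not a proposed implementation of PyUnicode_AsChar().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy at most dst_size bytes of src into dst.
   Returns the number of bytes copied, or -1 on invalid arguments.
   A trailing NUL is written only if there is room for it, so a
   truncated copy is NOT NUL-terminated. */
long copy_as_char(const char *src, size_t src_len,
                  char *dst, size_t dst_size)
{
    size_t n;

    if (src == NULL || dst == NULL)
        return -1;

    /* Clamp the copy length to the destination size: this is the
       overflow handling that would live in one central place. */
    n = src_len < dst_size ? src_len : dst_size;
    assert(n <= dst_size);

    memcpy(dst, src, n);
    if (n < dst_size)
        dst[n] = '\0';
    return (long)n;
}
```

A caller can detect truncation by comparing the return value with the source length, instead of repeating the bounds check at every call site.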

(this was taken from a comment on #1950)
msg66498 Author: Alexandre Vassalotti (alexandre.vassalotti) Date: 2008-05-09 22:45
Honestly, I am not sure if removing PyUnicode_AsString() is a good idea.
There are many cases where the size of the returned string is not needed.
Furthermore, this would be a rather major backward-incompatible change
to be included in a beta release.

[copied from duplicate issue #2807]
msg66526 Author: Marc-Andre Lemburg (lemburg) Date: 2008-05-10 14:11
IMO, it's better to correct API design errors early, rather than going
through a deprecation process.

Note that PyUnicode_AsString() is also different from its cousin
PyString_AsString(). 

PyString_AsString() is mostly used to access the char* buffer used by
the string object in order to change it, e.g. by first constructing a
new PyString object and then filling it in by accessing the internal
char* buffer directly.

Doing the same with PyUnicode_AsString() will not work. What's worse:
direct changes would go undetected, since the UTF8 PyString object is
held by the PyUnicode object internally.

Even if you just use PyUnicode_AsString() for reading and get the size
information from somewhere else, the API doesn't make sure that the
PyUnicode object doesn't have embedded 0 code points (which
PyString_AsString() does). PyUnicode_AsString() would have to use
PyString_AsString() for this instead of the PyString_AS_STRING() macro.
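
The embedded-0 problem can be illustrated without the Python headers. The following is a minimal sketch under stated assumptions: has_embedded_nul() and as_c_string_checked() are hypothetical helpers, and the buffer is assumed to be NUL-terminated at index len, as C string buffers handed out by such APIs usually are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first len bytes of buf contain a 0 byte, else 0. */
int has_embedded_nul(const char *buf, size_t len)
{
    assert(buf != NULL);
    return memchr(buf, '\0', len) != NULL;
}

/* Hypothetical checked accessor: only hand out the buffer as a C
   string if strlen() on it would report the full length. Returns
   NULL when the data would be silently truncated. */
const char *as_c_string_checked(const char *buf, size_t len)
{
    if (buf == NULL || has_embedded_nul(buf, len))
        return NULL;
    return buf;
}
```

Without such a check, code that treats the returned pointer as a C string would see only the part before the first 0 byte and never notice the rest.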
History
Date                 User                  Action  Args
2008-05-10 14:11:13  lemburg               set     messages: + msg66526
2008-05-09 22:45:14  alexandre.vassalotti  set     nosy: + alexandre.vassalotti
                                                   messages: + msg66498
2008-05-09 22:43:17  alexandre.vassalotti  link    issue2807 superseder
2008-05-09 10:31:51  lemburg               create