
Author dabeaz
Recipients dabeaz, ezio.melotti
Date 2012-10-16.20:14:08
Message-id <1350418448.6.0.265249549125.issue16254@psf.upfronthosting.co.za>
In-reply-to
Content
The PyUnicode_AsWideCharString() function is described as returning a new wchar_t buffer allocated by PyMem_Alloc() (which the caller must free with PyMem_Free()).  However, calling this function also causes the size of the original string object to increase permanently.  For example, suppose you had some extension code like this:

static PyObject *py_receive_wchar(PyObject *self, PyObject *args) {
  PyObject *obj;
  wchar_t *s;
  Py_ssize_t len;

  if (!PyArg_ParseTuple(args, "U", &obj)) {
    return NULL;
  }
  if ((s = PyUnicode_AsWideCharString(obj, &len)) == NULL) {
    return NULL;
  }
  /* Do nothing */
  PyMem_Free(s);
  Py_RETURN_NONE;
}

Now, try an experiment (assume that the above extension function is available as 'receive_wchar'). 

>>> s = "Hell"*1000
>>> len(s)
4000
>>> import sys
>>> sys.getsizeof(s)
4049
>>> receive_wchar(s)
>>> sys.getsizeof(s)
20053
>>>

It seems that PyUnicode_AsWideCharString() may be filling in the wstr field of the associated PyASCIIObject structure from PEP 393 (I haven't verified).  Once filled, that cached copy never seems to be discarded.
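
If the growth really is a cached wstr copy, the numbers above add up exactly on a platform where wchar_t is 4 bytes (a quick sanity check, not verified against the implementation):

>>> 4049 + (4000 + 1) * 4    # original footprint + (len(s) + 1) * sizeof(wchar_t)
20053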

Background:  I am trying to figure out how to convert a Unicode object to a (wchar_t *, length) pair in a way that doesn't cause a permanent increase in the memory footprint of the original Unicode object.  Also, I'm trying to stay away from deprecated Unicode APIs.
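
One possible approach (a rough sketch on my part, not something from the docs; copy_as_wchar is just an illustrative name) is to build the wchar_t buffer by hand with the PEP 393 accessor macros, so the object's wstr cache is never populated.  This naive version doesn't generate surrogate pairs, so it assumes a platform where wchar_t can hold any code point (e.g. 4-byte wchar_t):

static wchar_t *copy_as_wchar(PyObject *obj, Py_ssize_t *len) {
  Py_ssize_t i, n;
  wchar_t *buf;

  if (PyUnicode_READY(obj) < 0) {
    return NULL;
  }
  n = PyUnicode_GET_LENGTH(obj);
  buf = PyMem_Malloc((n + 1) * sizeof(wchar_t));
  if (buf == NULL) {
    PyErr_NoMemory();
    return NULL;
  }
  for (i = 0; i < n; i++) {
    /* Read code points directly; nothing is cached on the string object */
    buf[i] = (wchar_t) PyUnicode_READ_CHAR(obj, i);
  }
  buf[n] = L'\0';
  if (len != NULL) {
    *len = n;
  }
  return buf;
}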