Message 173101 - Python tracker



Author	loewis
Recipients	dabeaz, ezio.melotti, loewis
Date	2012-10-16.21:35:23
Message-id	<1350423323.27.0.367484867962.issue16254@psf.upfronthosting.co.za>

Content
As stated, this is not a bug: there is no memory leak, nor any deviation from documented behavior.

You are right that it fills the wstr pointer, by calling PyUnicode_AsUnicodeAndSize in unicode_aswidechar, and then copying the data to a fresh buffer.

This is merely the simplest implementation; it's certainly possible to improve it. Contributions are welcome.

A number of things need to be considered:
- Computing the wstr size is somewhat expensive on a system with a 16-bit wchar_t, since the result may need surrogate pairs.
- I would suggest that, if possible, the wstr representation should be handed out of the unicode object to the caller (resetting wstr to NULL). This should give the greatest code reuse while avoiding unnecessary copying.
- It's not possible to do so for strings where wstr is shared with the canonical representation (i.e. a UCS-2 string on 16-bit wchar_t, and a UCS-4 string on 32-bit wchar_t).
- I don't think wstr should be cleared if it was already filled when the function got called. Instead, wstr should only be returned if it was originally NULL.
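
To make the considerations above concrete, here is a minimal sketch in C. It is not CPython's code: toy_str, utf16_units, toy_fill_wstr and toy_as_wide are hypothetical names, the struct is a toy stand-in for the real object layout, and a 16-bit wchar_t is modelled with uint16_t. Since the toy canonical form is always UCS-4, the cache is never shared with it, so the shared case from the third point does not arise here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string object; NOT CPython's real layout. */
typedef struct {
    uint32_t *data;        /* canonical representation, always UCS-4 here */
    size_t    length;      /* number of code points */
    uint16_t *wstr;        /* cached 16-bit wide form, or NULL */
    size_t    wstr_length; /* number of 16-bit units in wstr */
} toy_str;

/* Size of the 16-bit form: needs a full scan, because every code
   point above U+FFFF takes two units (a surrogate pair). */
static size_t utf16_units(const uint32_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t units = n;
    for (size_t i = 0; i < n; i++)
        if (s[i] > 0xFFFF)
            units++;
    return units;
}

/* Fill the cache if it is empty. Returns 0 on success, -1 on failure. */
static int toy_fill_wstr(toy_str *u)
{
    if (u->wstr != NULL)
        return 0;
    size_t units = utf16_units(u->data, u->length);
    uint16_t *buf = malloc((units + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;
    size_t j = 0;
    for (size_t i = 0; i < u->length; i++) {
        uint32_t c = u->data[i];
        if (c > 0xFFFF) {
            c -= 0x10000;
            buf[j++] = (uint16_t)(0xD800 | (c >> 10));   /* high surrogate */
            buf[j++] = (uint16_t)(0xDC00 | (c & 0x3FF)); /* low surrogate */
        } else {
            buf[j++] = (uint16_t)c;
        }
    }
    buf[j] = 0;
    u->wstr = buf;
    u->wstr_length = units;
    return 0;
}

/* Return a NUL-terminated buffer that the caller must free(). */
static uint16_t *toy_as_wide(toy_str *u, size_t *size)
{
    int was_cached = (u->wstr != NULL);
    if (toy_fill_wstr(u) < 0)
        return NULL;
    if (size != NULL)
        *size = u->wstr_length;
    if (!was_cached) {
        /* The cache was created only for this call: hand it out
           and reset wstr to NULL, so nothing is copied. */
        uint16_t *out = u->wstr;
        u->wstr = NULL;
        u->wstr_length = 0;
        return out;
    }
    /* The cache was already filled: leave it alone and copy. */
    size_t n = u->wstr_length + 1;
    uint16_t *copy = malloc(n * sizeof *copy);
    if (copy != NULL)
        memcpy(copy, u->wstr, n * sizeof *copy);
    return copy;
}
```

In this sketch only the already-cached path pays for a second allocation and a copy. A real patch would also have to detect the shared case and fall back to copying there.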
