
Author dabeaz
Recipients Arfrever, dabeaz, ezio.melotti, loewis, vstinner
Date 2012-10-16.22:07:23
Message-id <1350425243.79.0.472032673968.issue16254@psf.upfronthosting.co.za>
Content
Maybe it's not a bug, but I still think it's undesirable. Basically, you have a function (PyUnicode_AsWideCharString()) that allocates a buffer, fills it with data, and lets the caller destroy that buffer. Yet, as a side effect, it allocates a second buffer, fills it, and permanently attaches it to the original string object. The string object therefore becomes substantially larger than it was before, and the only way to reclaim that memory is to delete the whole string.

Maybe this is a rare event that doesn't matter. But there may well be C extension code that is trying to pass a wchar_t array off to some external library. The extension writer uses PyUnicode_AsWideCharString() on the understanding that it creates a new array and that the caller has to free it. They accept that making a copy isn't especially fast, but it's better than nothing. Unfortunately, all this attention to memory management goes unrewarded, because a copy gets left behind on the string object anyway.
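The side effect described above can be illustrated with a small self-contained C model. Everything in it is hypothetical. `toy_str`, `toy_as_wide_string()` and the other helpers are illustrative names, not CPython APIs. The struct only loosely mimics a PEP 393 string: compact ASCII data plus a lazily cached `wstr` member. The function hands the caller a fresh copy to free, yet the object's own footprint still grows.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical toy model, NOT CPython's PyUnicodeObject: a string with
 * compact 1-byte-per-char ASCII storage plus a lazily cached wide copy. */
typedef struct {
    size_t length;  /* number of characters */
    char *data;     /* compact ASCII storage, NUL-terminated */
    wchar_t *wstr;  /* cached wide copy; NULL until first requested */
} toy_str;

toy_str *toy_str_new(const char *text) {
    toy_str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = strlen(text);
    s->data = malloc(s->length + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, text, s->length + 1);
    s->wstr = NULL;
    return s;
}

/* Bytes held by the object's buffers (struct header ignored). */
size_t toy_str_footprint(const toy_str *s) {
    size_t total = s->length + 1;
    if (s->wstr != NULL) total += (s->length + 1) * sizeof(wchar_t);
    return total;
}

/* Mimics the reported behavior: returns a fresh copy that the caller must
 * free(), but also fills the cache and leaves it attached to the string. */
wchar_t *toy_as_wide_string(toy_str *s) {
    size_t nbytes = (s->length + 1) * sizeof(wchar_t);
    if (s->wstr == NULL) {
        s->wstr = malloc(nbytes);
        if (s->wstr == NULL) return NULL;
        for (size_t i = 0; i <= s->length; i++)
            s->wstr[i] = (wchar_t)(unsigned char)s->data[i];
    }
    wchar_t *copy = malloc(nbytes);
    if (copy != NULL) memcpy(copy, s->wstr, nbytes);
    return copy;
}

void toy_str_free(toy_str *s) {
    free(s->wstr);
    free(s->data);
    free(s);
}
```

In this model, take a 5-character string, call `toy_as_wide_string()`, and immediately `free()` the result. The footprint still ends up at `6 + 6 * sizeof(wchar_t)` bytes instead of the original 6, because the cached copy stays attached.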

For instance, I start with a 10-megabyte string, pass it through a C extension function, and now the string is mysteriously using 50 megabytes of memory.
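The jump from 10 to 50 megabytes fits simple arithmetic, under assumptions the message does not state. First, the string is pure ASCII, stored at one byte per character. Second, `wchar_t` is 4 bytes, as on typical Linux builds, so a cached wide copy adds four bytes per character. The helper names below are hypothetical. The accounting ignores object headers and NUL terminators.

```c
#include <stddef.h>

/* Hypothetical helpers for rough accounting of an ASCII-only string. */

/* Compact storage only: 1 byte per ASCII character (assumption). */
size_t footprint_without_cache(size_t nchars) {
    return nchars;
}

/* Compact storage plus a cached wide copy of wchar_size bytes per char. */
size_t footprint_with_cache(size_t nchars, size_t wchar_size) {
    return nchars + nchars * wchar_size;
}
```

With 10,000,000 characters and a 4-byte `wchar_t`, `footprint_with_cache()` gives 50,000,000 bytes. With a 2-byte `wchar_t` (as on Windows), the same accounting gives about 30,000,000 bytes. Passing `sizeof(wchar_t)` as the second argument uses whichever applies to the build.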

I think the idea of filling wstr, returning it, and clearing it (if it was originally NULL) would definitely work here. In fact, that's exactly what I want: don't fill in the wstr member if it isn't already set. That way, C extensions can temporarily get the wstr buffer, do something with it, and then throw it away without affecting the original string.
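Here is one possible reading of the "fill, return, clear if originally NULL" idea, sketched on the same kind of hypothetical toy model. It is not CPython code, and every name is illustrative. If the cache was empty before the call, the freshly filled buffer is handed to the caller and the cache pointer is reset to NULL. If a cache already existed, it is left alone and the caller gets a copy.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical toy model, NOT CPython's PyUnicodeObject. */
typedef struct {
    size_t length;  /* number of characters */
    char *data;     /* compact ASCII storage, NUL-terminated */
    wchar_t *wstr;  /* cached wide copy; NULL unless already present */
} toy_str;

toy_str *toy_str_new(const char *text) {
    toy_str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = strlen(text);
    s->data = malloc(s->length + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, text, s->length + 1);
    s->wstr = NULL;
    return s;
}

size_t toy_str_footprint(const toy_str *s) {
    size_t total = s->length + 1;
    if (s->wstr != NULL) total += (s->length + 1) * sizeof(wchar_t);
    return total;
}

/* Sketch of the proposal: never leave a new cache behind on the string. */
wchar_t *toy_as_wide_string_nocache(toy_str *s) {
    size_t nbytes = (s->length + 1) * sizeof(wchar_t);
    if (s->wstr != NULL) {
        /* Cache existed before the call: keep it, return a copy. */
        wchar_t *copy = malloc(nbytes);
        if (copy != NULL) memcpy(copy, s->wstr, nbytes);
        return copy;
    }
    /* Cache was NULL: fill it, hand the buffer over, then clear it. */
    s->wstr = malloc(nbytes);
    if (s->wstr == NULL) return NULL;
    for (size_t i = 0; i <= s->length; i++)
        s->wstr[i] = (wchar_t)(unsigned char)s->data[i];
    wchar_t *result = s->wstr;
    s->wstr = NULL;  /* the string's footprint is unchanged */
    return result;
}

void toy_str_free(toy_str *s) {
    free(s->wstr);
    free(s->data);
    free(s);
}
```

This variant also avoids the extra copy in the common case, since ownership of the buffer simply moves to the caller.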

Another suggestion: an API function that simply clears wstr and the UTF-8 representation could also work. Again, this is for extension writers who want to pull data out of strings without leaving these memory side effects behind.
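Here is a minimal sketch of such a "clear the cached representations" call, again on a hypothetical toy model. `toy_str_clear_caches()` is an illustrative name, not an existing CPython function. The model assumes an ASCII-only string, so a cached UTF-8 copy takes one byte per character.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical toy model, NOT CPython's PyUnicodeObject. */
typedef struct {
    size_t length;  /* number of characters */
    char *data;     /* compact ASCII storage, NUL-terminated */
    wchar_t *wstr;  /* optional cached wide copy */
    char *utf8;     /* optional cached UTF-8 copy */
} toy_str;

toy_str *toy_str_new(const char *text) {
    toy_str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = strlen(text);
    s->data = malloc(s->length + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, text, s->length + 1);
    s->wstr = NULL;
    s->utf8 = NULL;
    return s;
}

/* Bytes held by the buffers; the UTF-8 cache is 1 byte/char (ASCII). */
size_t toy_str_footprint(const toy_str *s) {
    size_t n = s->length + 1;
    size_t total = n;
    if (s->wstr != NULL) total += n * sizeof(wchar_t);
    if (s->utf8 != NULL) total += n;
    return total;
}

/* Hypothetical API: drop both cached representations. */
void toy_str_clear_caches(toy_str *s) {
    free(s->wstr);
    s->wstr = NULL;
    free(s->utf8);
    s->utf8 = NULL;
}

void toy_str_free(toy_str *s) {
    toy_str_clear_caches(s);
    free(s->data);
    free(s);
}
```

One design caveat applies. Clearing caches like this is only safe if no caller still holds a borrowed pointer into them, so a real API would presumably have to document that constraint.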
History
Date | User | Action | Args
2012-10-16 22:07:23 | dabeaz | set | recipients: + dabeaz, loewis, vstinner, ezio.melotti, Arfrever
2012-10-16 22:07:23 | dabeaz | set | messageid: <1350425243.79.0.472032673968.issue16254@psf.upfronthosting.co.za>
2012-10-16 22:07:23 | dabeaz | link | issue16254 messages
2012-10-16 22:07:23 | dabeaz | create |