Message 88310 - Python tracker


Author	lemburg
Recipients	ajaksu2, amaury.forgeotdarc, collinwinter, ezio.melotti, jafo, lemburg, orivej, pitrou, vstinner
Date	2009-05-25.09:21:02
Message-id	<4A1A62FD.7030701@egenix.com>
In-reply-to	<1243241145.99.0.0906306322763.issue1943@psf.upfronthosting.co.za>

Content

Amaury Forgeot d'Arc wrote:
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
> Looking at the comments, it seems that the performance gain comes from
> the removal of the double allocation which is needed by the current design.
> 
> Was the following implementation considered:
> - keep the current PyUnicodeObject structure
> - for small strings, allocate one chunk of memory:
> sizeof(PyUnicodeObject)+2*length. Then set self->str=(Py_UNICODE*)(self+1);
> - for large strings, self->str may be allocated separately.
> - unicode_dealloc() must be careful and not free self->str if it is
> contiguous to the object (it's probably a good idea to reuse the
> self->state field for this purpose).

AFAIK, this has not yet been investigated.

Note that in real-life applications, you hardly ever have to
call malloc() for small strings - these are managed by pymalloc as
pieces of larger chunks and allocation/deallocation is generally
fast. You have the same situation for PyUnicodeObject itself
(which, as noted earlier, could be optimized in pymalloc even further,
since the size of PyUnicodeObject is fixed).

The OS malloc() is only called for longer strings and then only
for the string buffer itself - the PyUnicodeObject is again completely
managed by pymalloc, even in this case.
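
For reference, the layout proposed in the quoted comment could be sketched
roughly as below. This is only an illustration, not CPython code:
SketchUnicode, sketch_char, SKETCH_SMALL_MAX, the inline_buffer flag
(standing in for the reused state field) and the extra slot for a
terminator are all assumptions made for the example, and plain
malloc()/free() are used in place of pymalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a 2-byte Py_UNICODE (assumption for this sketch). */
typedef uint16_t sketch_char;

/* Hypothetical, simplified stand-in for PyUnicodeObject. */
typedef struct {
    size_t length;
    sketch_char *str;
    int inline_buffer;  /* 1 if str is contiguous to the object itself */
} SketchUnicode;

/* Arbitrary "small string" threshold, chosen only for illustration. */
#define SKETCH_SMALL_MAX 64

static SketchUnicode *sketch_new(size_t length)
{
    /* One extra slot for a terminator; this goes beyond the quoted
       "sizeof(PyUnicodeObject)+2*length" and is an assumption here. */
    size_t nbytes = (length + 1) * sizeof(sketch_char);
    SketchUnicode *self;

    if (length <= SKETCH_SMALL_MAX) {
        /* Small string: header and character data in one chunk. */
        self = malloc(sizeof(SketchUnicode) + nbytes);
        if (self == NULL)
            return NULL;
        self->str = (sketch_char *)(self + 1);
        self->inline_buffer = 1;
    } else {
        /* Large string: the buffer is allocated separately. */
        self = malloc(sizeof(SketchUnicode));
        if (self == NULL)
            return NULL;
        self->str = malloc(nbytes);
        if (self->str == NULL) {
            free(self);
            return NULL;
        }
        self->inline_buffer = 0;
    }
    self->length = length;
    memset(self->str, 0, nbytes);
    return self;
}

static void sketch_dealloc(SketchUnicode *self)
{
    if (self == NULL)
        return;
    if (self->inline_buffer) {
        /* The buffer is part of the object's chunk: do not free it. */
        assert((void *)self->str == (void *)(self + 1));
    } else {
        free(self->str);
    }
    free(self);
}
```

The dealloc function is where the care asked for in the quoted comment
shows up: the flag decides whether str is released on its own or together
with the object.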

History
Date	User	Action	Args
2009-05-25 09:21:05	lemburg	set	recipients: + lemburg, collinwinter, jafo, amaury.forgeotdarc, pitrou, vstinner, ajaksu2, orivej, ezio.melotti
2009-05-25 09:21:03	lemburg	link	issue1943 messages
2009-05-25 09:21:02	lemburg	create