Author: lemburg
Recipients: lemburg, orivej, pitrou
Date: 2008-01-27 13:52
Message-id: <1201441925.84.0.708568372808.issue1943@psf.upfronthosting.co.za>
Your microbenchmark is biased towards your patched version.
KEEPALIVE_SIZE_LIMIT only kicks in when you deallocate and then
reallocate Unicode objects. The free list used for Unicode objects is
also limited to 1024 objects - which isn't all that much. You could tune
MAX_UNICODE_FREELIST_SIZE as well.
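
For reference, here is a simplified C sketch of how those two limits
interact in the deallocation path. It is adapted from memory of the 2.x
unicodeobject.c, so take the exact field names and macros with a grain
of salt:

    #include <Python.h>

    #define KEEPALIVE_SIZE_LIMIT        9
    #define MAX_UNICODE_FREELIST_SIZE   1024

    static PyUnicodeObject *unicode_freelist;
    static int unicode_freelist_size;

    static void
    unicode_dealloc(PyUnicodeObject *unicode)
    {
        if (PyUnicode_CheckExact(unicode) &&
            unicode_freelist_size < MAX_UNICODE_FREELIST_SIZE) {
            /* Keep-alive optimization: cache the object header, but
               only retain the character buffer for small strings. */
            if (unicode->length >= KEEPALIVE_SIZE_LIMIT) {
                PyMem_DEL(unicode->str);
                unicode->str = NULL;
                unicode->length = 0;
            }
            Py_CLEAR(unicode->defenc);
            /* Push the header onto the free list; the next allocation
               pops it instead of calling the allocator. */
            *(PyUnicodeObject **)unicode = unicode_freelist;
            unicode_freelist = unicode;
            unicode_freelist_size++;
        }
        else {
            PyMem_DEL(unicode->str);
            Py_XDECREF(unicode->defenc);
            Py_TYPE(unicode)->tp_free((PyObject *)unicode);
        }
    }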

Regarding memory usage: this is difficult to measure in Python, since
pymalloc keeps memory chunks allocated even when they are no longer in
use by Python. However, that is a feature of pymalloc, not something
specific to the Unicode implementation, and it can be tuned in
pymalloc. To get more realistic memory measurements, you'd have to
switch off pymalloc altogether and then run a separate process that
consumes lots of memory, forcing the OS to grant the process you're
measuring only the memory it really needs. Of course, keeping objects
alive on a free list will always use more memory than freeing them
altogether and returning the memory to the OS. It's a speed/space
tradeoff. The RAM/CPU cost ratio has shifted a lot towards RAM
nowadays, so using more RAM is usually more efficient than using more
CPU time.
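
If you do switch off pymalloc for such a test, a pymalloc-independent
number to compare is the peak resident set size as reported by the OS.
A minimal POSIX-only sketch (not part of CPython, and note that the
ru_maxrss unit varies by platform):

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage usage;

        if (getrusage(RUSAGE_SELF, &usage) != 0) {
            perror("getrusage");
            return 1;
        }
        /* ru_maxrss is in kilobytes on Linux. */
        printf("peak RSS: %ld kB\n", usage.ru_maxrss);
        return 0;
    }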

Regarding resize: you're right - the string object is a PyVarObject as
well and couldn't be changed at the time for backwards compatibility
reasons. You should also note that when I added Unicode to Python 1.6,
it was a new and not commonly used type. Codecs were not used much
either, so there was no incentive to make resizing strings work better.
Later on, other optimizations were added to the Unicode implementation
that required the PyUnicode_Resize() API to be able to change the
object address as well. Still, in the common case, it doesn't change
the object address.
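
That requirement is why PyUnicode_Resize() takes a PyObject ** rather
than a plain pointer. A minimal usage sketch - shrink_to() is a
hypothetical helper, not a CPython API:

    #include <Python.h>

    /* Shrink a unicode object to new_len characters. Only the updated
       pointer may be used afterwards, since the resize may have moved
       the object. An in-place resize also requires that the caller
       holds the only reference. */
    static PyObject *
    shrink_to(PyObject *str, Py_ssize_t new_len)
    {
        if (PyUnicode_Resize(&str, new_len) < 0) {
            Py_DECREF(str);
            return NULL;
        }
        return str;  /* may differ from the pointer passed in */
    }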

The reason for using an external buffer for the Unicode object was to
enable further optimizations, such as sharing buffers between Unicode
objects. We never ended up using that, but the design still leaves a
lot of room for speedups and better memory efficiency.
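
To make the design difference concrete, here is a rough sketch of the
two layouts, with field names simplified from the 2.x headers (not the
literal struct definitions):

    #include <Python.h>

    /* 8-bit string: a PyVarObject whose bytes live in the variable-
       size tail, so the buffer can never move without moving the
       whole object. */
    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;
        int  ob_sstate;
        char ob_sval[1];        /* data is inlined after the header */
    } StringSketch;

    /* Unicode: a fixed-size PyObject pointing to an external buffer,
       which can be reallocated independently or, in principle, shared
       between objects. */
    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;      /* number of characters */
        Py_UNICODE *str;        /* external character buffer */
        long hash;
        PyObject *defenc;       /* cached default-encoded version */
    } UnicodeSketch;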

As I already mentioned, PyObjects are also easier to extend at the C
level: adding new variables at the end of the object is easy for
PyObjects. It's difficult for PyVarObjects, since you always have to
take the current size of the object into account, and you always need
an indirection to reach the extra variables, because their offset
within the object is not fixed.
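
A small illustration of that point, using invented struct and field
names:

    #include <Python.h>

    /* Fixed-size object: a field appended to the struct has a fixed
       compile-time offset, so extending the type is trivial. */
    typedef struct {
        PyObject_HEAD
        Py_UNICODE *str;
        long hash;
        void *extra_state;      /* hypothetical new field - just add it */
    } FixedSketch;

    /* Variable-size object: the tail grows with ob_size, so there is
       no fixed spot after it for new fields; extra state needs an
       indirection (or offset arithmetic based on the object size). */
    typedef struct {
        PyObject_VAR_HEAD
        char ob_sval[1];        /* tail occupies the end of the object */
    } VarSketch;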

How much speedup do you get when you compare the pybench test with
KEEPALIVE_SIZE_LIMIT = 200 to your patched version?