This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: String allocations waste 3 bytes of memory on average.
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.0, Python 2.7, Python 2.6
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: christian.heimes, mark.dickinson, rhettinger
Priority: normal Keywords: patch

Created on 2008-11-27 12:14 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
string_alloc.patch mark.dickinson, 2008-11-28 21:44
Messages (12)
msg76495 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-11-27 12:14
There are a number of places in Objects/stringobject.c where memory is 
allocated for a string of length n using:

PyObject_MALLOC(sizeof(PyStringObject) + n)

On my computer (OS X 10.5.5/Intel), and, I suspect, on most common 
platforms, the PyStringObject struct is going to contain some number of 
bytes (probably 3) of trailing padding;  the result is that the 
PyObject_MALLOC call above asks for 3 more bytes than are necessary, and 
on average the Python interpreter will waste 3 bytes of memory per string 
allocation.  Is there any reason not to replace these calls with:

PyObject_MALLOC(offsetof(PyStringObject, ob_sval) + n + 1)

instead?

Patch attached.
msg76498 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-11-27 12:54
Updated patch: fix overflow checks to use offsetof instead of sizeof as 
well.
msg76503 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-27 16:57
+1 on the idea.  -1 on the current patch.  Can you encapsulate this in a
simpler macro?  I find the proposed replacement to be cryptic and
error-prone (i.e. hard to mentally verify its correctness and somewhat
likely to be screwed-up by a future maintainer).
msg76512 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-27 18:42
Suggestion:

#define PyStringObject_SIZE (offsetof(PyStringObject, ob_sval) + 1)


 	/* Inline PyObject_NewVar */
-	op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size);
 	/* Inline PyObject_NewVar */
+	op = (PyStringObject *)PyObject_MALLOC(PyStringObject_SIZE + size);
msg76552 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-11-28 20:16
Yep.  That works nicely.

Here's a revised patch.
msg76553 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-28 20:27
Why is +1 required here? If I understand offsetof() correctly than it
returns the position of the ob_sval element. Shouldn't PyStringObject +
offsetof(PyStringObject, ob_sval) point to the first element of the
ob_svall array?
msg76554 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-11-28 20:34
The +1 is there for the trailing null byte on the string:  if s is a 
Python string with len(s) == n, then the ob_sval array needs space for n+1 
characters.
msg76555 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-28 20:46
Ah! I forgot the trailing \0 byte ... Thanks Mark!
msg76558 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-11-28 21:44
Hmmm.  test_sys fails on 64-bit build.
Patch updated to fix this.

All tests now pass on 32-bit and 64-bit, debug and non-debug builds.
msg76577 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-29 01:48
Looks much cleaner.
msg77078 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-12-05 21:56
Applied to the trunk in r67601.  Will merge to other branches if the 
buildbots look okay.
msg77137 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-12-06 15:34
Applied to py3k in r67610.
History
Date User Action Args
2022-04-11 14:56:41adminsetgithub: 48695
2008-12-06 15:34:04mark.dickinsonsetstatus: open -> closed
messages: + msg77137
2008-12-05 21:56:59mark.dickinsonsetmessages: + msg77078
2008-11-29 01:48:45rhettingersetassignee: mark.dickinson
messages: + msg76577
resolution: accepted
versions: + Python 3.0
2008-11-28 21:44:55mark.dickinsonsetfiles: - string_alloc.patch
2008-11-28 21:44:48mark.dickinsonsetfiles: + string_alloc.patch
messages: + msg76558
2008-11-28 20:46:42christian.heimessetmessages: + msg76555
2008-11-28 20:34:09mark.dickinsonsetmessages: + msg76554
2008-11-28 20:27:46christian.heimessetnosy: + christian.heimes
messages: + msg76553
2008-11-28 20:16:11mark.dickinsonsetfiles: - string_alloc.patch
2008-11-28 20:16:04mark.dickinsonsetfiles: + string_alloc.patch
messages: + msg76552
2008-11-27 18:42:24rhettingersetmessages: + msg76512
2008-11-27 16:57:32rhettingersetnosy: + rhettinger
messages: + msg76503
2008-11-27 12:54:47mark.dickinsonsetfiles: - string_alloc.patch
2008-11-27 12:54:43mark.dickinsonsetfiles: + string_alloc.patch
messages: + msg76498
2008-11-27 12:14:44mark.dickinsoncreate