Message 156905 - Python tracker

Author	vstinner
Recipients	loewis, pitrou, serhiy.storchaka, vstinner
Date	2012-03-27.11:14:15
Message-id	<1332846858.96.0.396992270129.issue14422@psf.upfronthosting.co.za>
In-reply-to

Content

PyASCIIObject.state can be reduced from 32 bits to 8 bits and moved to the end of the structure (swapping wstr and state), and the structure can then be packed. As a result, the structure shrinks by 3 bytes (the type of state changes from int to char).

I expect little or no performance overhead, because only the PyASCIIObject.state field is affected and this field is 8 bits wide.

See also issue #14419, which relies on memory alignment (of the ASCII string data) to optimize the ASCII decoder. If I understand correctly, my patch rules out that optimization.

--

Example on 32-bit Linux:

$ cat x.c 
#include <Python.h>

int main(void)
{
    /* sizeof yields a size_t, which is printed with %zu rather than %u */
    printf("sizeof(PyASCIIObject)=%zu bytes\n", sizeof(PyASCIIObject));
    printf("sizeof(PyCompactUnicodeObject)=%zu bytes\n", sizeof(PyCompactUnicodeObject));
    printf("sizeof(PyUnicodeObject)=%zu bytes\n", sizeof(PyUnicodeObject));
    return 0;
}

# unpatched
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=24 bytes
sizeof(PyCompactUnicodeObject)=36 bytes
sizeof(PyUnicodeObject)=40 bytes

# pack the 3 structures
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=21 bytes
sizeof(PyCompactUnicodeObject)=33 bytes
sizeof(PyUnicodeObject)=37 bytes

--

We could also pack PyCompactUnicodeObject and PyUnicodeObject, but that would hurt performance because utf8_length, utf8, wstr_length and data would no longer be aligned.
