Message 156901 - Python tracker



Author	serhiy.storchaka
Recipients	pitrou, serhiy.storchaka, vstinner
Date	2012-03-27.10:34:20
Message-id	<201203271333.43861.storchaka@gmail.com>
In-reply-to	<CAMpsgwZoGoj7tb4qACLmgdMpsjt0Cd_gfNa8yBK=DAu2idzUtg@mail.gmail.com>

Content

> q is not the address of the Unicode string, but the address of the
> data following the Unicode structure in memory. Strings created by
> PyUnicode_New() are composed on one unique memory block: {structure,
> data}.

I know all that.

#define _PyUnicode_COMPACT_DATA(op)                     \
    (PyUnicode_IS_ASCII(op) ?                   \
     ((void*)((PyASCIIObject*)(op) + 1)) :              \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

q is ((void*)((PyASCIIObject*)(op) + 1)). (PyASCIIObject*)(op) + 1 is a pointer to PyASCIIObject and has the same alignment as PyASCIIObject. PyASCIIObject is aligned to sizeof(void *) because it starts with a void * field. Consequently, q is aligned to sizeof(void *). This does not depend on the number or the size of the fields in PyASCIIObject, except for the first one.
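
The alignment argument above can be checked with a small standalone sketch. It does not use the real CPython headers: fake_header, fake_compact_data, is_pointer_aligned and data_is_aligned_for are hypothetical names, and the struct only imitates the property that matters here (a leading pointer field). It also assumes that the alignment of void * equals sizeof(void *), which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyASCIIObject: only the leading pointer
   field matters for the alignment argument. */
typedef struct {
    void *first;          /* gives the struct at least pointer alignment */
    size_t length;
    unsigned char state;  /* small trailing field; padding follows it */
} fake_header;

/* sizeof of a struct is always a multiple of its alignment, so the
   address one past the header is aligned like the header itself. */
static_assert(sizeof(fake_header) % _Alignof(void *) == 0,
              "header size must be a multiple of pointer alignment");

/* Same shape as _PyUnicode_COMPACT_DATA: data starts right after the header. */
static void *fake_compact_data(void *op)
{
    return (void *)((fake_header *)op + 1);
}

/* Returns 1 if p is a multiple of the alignment of void *. */
static int is_pointer_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(void *)) == 0;
}

/* Allocates one {header, data} block and reports whether the data
   pointer is pointer-aligned (1), not aligned (0), or malloc failed (-1). */
static int data_is_aligned_for(size_t data_size)
{
    void *op = malloc(sizeof(fake_header) + data_size);
    if (op == NULL)
        return -1;
    int ok = is_pointer_aligned(fake_compact_data(op));
    free(op);
    return ok;
}
```

Adding or removing fields after the first one changes sizeof(fake_header) but not the result, since the compiler pads the struct to a multiple of its alignment.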

Of course, if the definition of _PyUnicode_COMPACT_DATA is changed, this will cease to be true. In that case, apply my first patch, which may be a bit less effective for short strings (performance for short strings is hard to measure from Python). However, for short strings, we can add a size limit:

if (size >= 2 * SIZEOF_LONG && ((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {
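
To make the condition above easier to read in isolation, here is a minimal sketch that wraps it in a function. SIZEOF_LONG and LONG_PTR_MASK are defined locally as assumptions modelled on the names used in the condition, and can_use_word_path is a hypothetical helper, not something from the patch. The check passes only when the input is at least two longs in size and p and q have the same offset within a long-sized word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local assumptions standing in for the names used in the condition. */
#define SIZEOF_LONG   sizeof(long)
#define LONG_PTR_MASK ((size_t)(SIZEOF_LONG - 1))

/* The mask trick only works when the word size is a power of two. */
static_assert((sizeof(long) & (sizeof(long) - 1)) == 0,
              "sizeof(long) must be a power of two");

/* Returns 1 when the input is long enough and p and q share the same
   offset modulo SIZEOF_LONG; returns 0 otherwise. Neither pointer is
   dereferenced. */
static int can_use_word_path(const void *p, const void *q, size_t size)
{
    return size >= 2 * SIZEOF_LONG
        && ((size_t)(uintptr_t)p & LONG_PTR_MASK)
           == ((size_t)(uintptr_t)q & LONG_PTR_MASK);
}
```

With this form, short inputs and inputs whose source and destination are misaligned relative to each other both fall through to whatever byte-by-byte path the caller provides.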

History
Date	User	Action	Args
2012-03-27 10:34:21	serhiy.storchaka	set	recipients: + serhiy.storchaka, pitrou, vstinner
2012-03-27 10:34:21	serhiy.storchaka	link	issue14419 messages
2012-03-27 10:34:20	serhiy.storchaka	create