Message156901
> q is not the address of the Unicode string, but the address of the
> data following the Unicode structure in memory. Strings created by
> PyUnicode_New() are composed of a single memory block: {structure,
> data}.
I know all that.
#define _PyUnicode_COMPACT_DATA(op) \
(PyUnicode_IS_ASCII(op) ? \
((void*)((PyASCIIObject*)(op) + 1)) : \
((void*)((PyCompactUnicodeObject*)(op) + 1)))
q is ((void*)((PyASCIIObject*)(op) + 1)). (PyASCIIObject*)(op) + 1 is a pointer to PyASCIIObject and has the same alignment as a PyASCIIObject, because op itself is aligned and the size of a struct is always a multiple of its alignment. PyASCIIObject is aligned to sizeof(void *)
because it starts with a void * field. Consequently, q is aligned to sizeof(void *). This does not depend on the number and the size of the fields in PyASCIIObject, except for the
first one.
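The alignment argument can be checked with a small C sketch. The struct header_t and the helpers new_block, data_after_header and is_aligned are hypothetical stand-ins, not CPython code: header_t only mimics a header that starts with a pointer-sized field, and new_block mimics the single {structure, data} allocation. The sketch assumes C11 (alignof from <stdalign.h>) and that the header comes from malloc, which returns memory aligned for any object type.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a header struct that starts with a
   pointer-sized field; this is NOT the real PyASCIIObject layout. */
typedef struct {
    void *first;          /* pointer-sized first field */
    int length;           /* trailing fields of assorted sizes */
    unsigned char state;
} header_t;

/* Allocate header and data as one block, {structure, data}, in the
   spirit of PyUnicode_New(); malloc returns memory aligned for any
   object type, so the header itself is aligned. */
static header_t *new_block(size_t data_size)
{
    return malloc(sizeof(header_t) + data_size);
}

/* Address of the data right after the header, mirroring
   ((void*)((PyASCIIObject*)(op) + 1)): op + 1 advances by
   sizeof(header_t), which is always a multiple of alignof(header_t). */
static void *data_after_header(header_t *op)
{
    return (void *)(op + 1);
}

/* Nonzero if ptr is a multiple of align. */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr % align) == 0;
}
```

With op = new_block(16), data_after_header(op) is aligned to alignof(void *) whatever fields follow `first`, because a struct's alignment is at least that of each of its members. On mainstream platforms alignof(void *) equals sizeof(void *), which is the figure used above.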
Of course, if the _PyUnicode_COMPACT_DATA definition is changed, this will cease to be true. In that case, apply my first patch, which may be slightly less effective for short strings
(performance for short strings is hard to measure through Python). However, for short strings we can add a size limit:
if (size >= 2 * SIZEOF_LONG && ((size_t) p & LONG_PTR_MASK) == ((size_t) q & LONG_PTR_MASK)) {
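A condition like the one above typically guards a word-at-a-time loop. The C sketch below shows one way such a guarded copy might look for a loop that copies bytes from p to q and stops at the first non-ASCII byte; it is an illustration under that assumption, not the patch itself. The names ascii_copy_sketch, SKETCH_LONG_PTR_MASK and SKETCH_ASCII_MASK are hypothetical (SKETCH_LONG_PTR_MASK plays the role of LONG_PTR_MASK). Requiring p and q to share the same low bits matters because, once the head loop has aligned p, q is then aligned too. Word loads and stores go through memcpy to stay clear of strict-aliasing problems; compilers usually turn these into single aligned moves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mask playing the role of LONG_PTR_MASK: the low address
   bits that must be zero for a pointer to be long-aligned. */
#define SKETCH_LONG_PTR_MASK ((uintptr_t)(sizeof(unsigned long) - 1))

/* High bit of every byte of an unsigned long set (0x8080...80). */
#define SKETCH_ASCII_MASK ((unsigned long)-1 / 0xFF * 0x80)

/* Copy bytes from p to q until the first non-ASCII byte or until size
   bytes are copied.  Returns the number of bytes copied. */
static size_t ascii_copy_sketch(const unsigned char *p, unsigned char *q,
                                size_t size)
{
    size_t i = 0;
    /* Word path only for long enough input whose source and destination
       have the same misalignment. */
    if (size >= 2 * sizeof(unsigned long) &&
        ((uintptr_t)p & SKETCH_LONG_PTR_MASK) ==
            ((uintptr_t)q & SKETCH_LONG_PTR_MASK)) {
        /* Head: copy byte-wise until p (and therefore q) is long-aligned.
           At most sizeof(long) - 1 bytes, so i stays below size. */
        while (((uintptr_t)(p + i) & SKETCH_LONG_PTR_MASK) != 0) {
            if (p[i] & 0x80)
                return i;
            q[i] = p[i];
            i++;
        }
        /* Body: copy whole words while they contain only ASCII bytes. */
        while (i + sizeof(unsigned long) <= size) {
            unsigned long w;
            memcpy(&w, p + i, sizeof w);
            if (w & SKETCH_ASCII_MASK)
                break;  /* locate the non-ASCII byte in the tail loop */
            memcpy(q + i, &w, sizeof w);
            i += sizeof w;
        }
    }
    /* Tail: byte-wise; this is the whole loop for short or
       mismatched input. */
    while (i < size) {
        if (p[i] & 0x80)
            break;
        q[i] = p[i];
        i++;
    }
    return i;
}
```

The size limit keeps very short strings on the plain byte loop, where the head and tail handling would cost more than the word copies save.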

Date                | User             | Action | Args
2012-03-27 10:34:21 | serhiy.storchaka | set    | recipients: + serhiy.storchaka, pitrou, vstinner
2012-03-27 10:34:21 | serhiy.storchaka | link   | issue14419 messages
2012-03-27 10:34:20 | serhiy.storchaka | create |