classification
Title: Fix misalignment in fastsearch_memchr_1char
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, haypo, larry, loewis, pitrou, python-dev, schwab, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2013-11-09 16:32 by schwab, last changed 2014-03-24 22:08 by BreamoreBoy. This issue is now closed.

Files
File name Uploaded Description Edit
pyasciiobject-align.patch schwab, 2014-03-23 16:44 Add explicit padding to PyASCIIObject
Messages (28)
msg202487 - (view) Author: Andreas Schwab (schwab) * Date: 2013-11-09 16:32
sizeof(foo) is never a good approximation for the alignment of foo
msg202537 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-10 17:11
Current code assumes that PyUnicode_DATA() is aligned to PyUnicode_KIND() bytes. If this is not true on some platform it will be easer to add a padding than rewrite a code. A lot of code depends on this assumption. See also issue14422.

I afraid that proposed patch may slow down a search.
msg202541 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-10 17:59
I don't understand in which concrete situation the current code could be wrong. The start of the unicode string should always be aligned, due to how unicode objects are allocated.
msg202551 - (view) Author: Andreas Schwab (schwab) * Date: 2013-11-10 19:03
(gdb) p sizeof(PyASCIIObject)
$1 = 22
msg202554 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-11-10 19:11
Andreas Schwab <report@bugs.python.org> wrote:
> (gdb) p sizeof(PyASCIIObject)
> $1 = 22

m68k again? ;)
msg202573 - (view) Author: Andreas Schwab (schwab) * Date: 2013-11-10 20:50
There is nothing wrong with that.
msg204358 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-11-25 16:41
> (gdb) p sizeof(PyASCIIObject)
> $1 = 22

If the only issue is that the size should be 24 instead of 22,
could we perhaps add an unused uint16_t to the struct just
for this architecture?
msg204367 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-25 17:14
What is sizeof(PyUnicodeObject)?
msg204403 - (view) Author: Andreas Schwab (schwab) * Date: 2013-11-25 20:40
(gdb) p sizeof(PyUnicodeObject)
$1 = 38
msg206249 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-15 19:54
Antoine, if sizeof(PyUnicodeObject) is 38, the start of UCS4 unicode string is not aligned. This should be fixed.
msg206250 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-15 19:59
> Antoine, if sizeof(PyUnicodeObject) is 38, the start of UCS4 unicode
> string is not aligned. This should be fixed.

Probably. Does anyone want to propose a patch?
msg206253 - (view) Author: Andreas Schwab (schwab) * Date: 2013-12-15 20:20
How about adding explicit padding to the bitfield in PyASCIIObject?
msg206254 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-15 20:21
> How about adding explicit padding to the bitfield in PyASCIIObject?

That sounds ok to me, but it must be explicitly for 68k (or use a
sufficiently smart scheme not to affect already aligned architectures).
msg206285 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-16 09:21
If you compile Python with GCC, we can maybe try something with __attribute__ ((aligned (sizeof(void *)))) attribute. The attribute can be used on a structure field. The problem is that we don't care of the alignment of header attributes, only of data, but data is not a field but the data just after the structure.

Or is it possible to GCC to get a structure size aligned on 4 bytes?

For the explicit padding: how do you compute the size of the padding?

How about disabling the fast-path in FASTSEARCH if data is not aligned?

(You might get non-aligned if even if structure is correctly aligned, it may happen if the memory allocator does not align memory blocks. I don't know if Python may get "unaligned" memory blocks.)
msg206311 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-16 13:52
I think that adding __attribute__ ((aligned (sizeof(void *)))) to the PyObject type (or to the ob_base field) is enough.

If first byte of structure is aligned, then first byte past the structure should be aligned too.
msg214603 - (view) Author: Andreas Schwab (schwab) * Date: 2014-03-23 16:44
The attached patch has been tested on {i586,x86_64,ppc,ppc64,ppc64le,armv6hl,armv7hl,aarch64,m68k}-suse-linux.
msg214634 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-23 20:53
Are unnamed struct fields actually valid C?
msg214635 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-03-23 21:11
Yes, ยง6.7.2.1:

A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field.

With a footnote:

105) An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts.


Unnamed struct members are valid, too:

There may be unnamed padding at the end of a structure or union.
msg214639 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-23 21:55
New changeset 004ae1472a43 by Antoine Pitrou in branch '3.4':
Issue #19537: Fix PyUnicode_DATA() alignment under m68k.  Patch by Andreas Schwab.
http://hg.python.org/cpython/rev/004ae1472a43

New changeset 0128b25068de by Antoine Pitrou in branch 'default':
Issue #19537: Fix PyUnicode_DATA() alignment under m68k.  Patch by Andreas Schwab.
http://hg.python.org/cpython/rev/0128b25068de
msg214640 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-23 21:57
Ok, I've committed the patch, then. Thank you, Andreas.
msg214651 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-23 23:24
What is the size of the PyASCIIObject on x86/x64 with and without the
patch? GCC and Visual Studio provide attributes to align structures.
msg214652 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-23 23:30
> What is the size of the PyASCIIObject on x86/x64 with and without the
> patch?

Haven't tried on x86, but on x86-64 it's the same. If it changes it will
probably get detected by the sys.getsizeof() tests.
msg214673 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-24 08:16
2014-03-24 0:30 GMT+01:00 Antoine Pitrou <report@bugs.python.org>:
>> What is the size of the PyASCIIObject on x86/x64 with and without the
>> patch?
>
> Haven't tried on x86, but on x86-64 it's the same. If it changes it will
> probably get detected by the sys.getsizeof() tests.

test_sys has been modified:

-        asciifields = "nnbP"
+        asciifields = "nn4bP"

It would like to know if objects are bigger with the change or not.
msg214674 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-24 08:30
Look at the commits, not the patch.
msg214675 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-24 08:34
> Look at the commits, not the patch.

Ah! You wrote "Ok, I've committed the patch". Ok, your commit is fine. I would prefered a named field (called "padding"), but it's fine ;-)
msg214728 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-03-24 21:19
Martin, Larry msg214639 states "Fix PyUnicode_DATA() alignment under m68k.", your comments please.
msg214735 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-03-24 21:34
Mark Lawrence, stop playing off committers against each other. This is
outright trolling.
msg214745 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-03-24 22:08
Please do not threaten me, I've been told that I'm to be ignored, and I now observe that committers seem to be sneaking patches into the source control system when the platform is unsupported.  I'll stop placing comments on appropriate issues if and only if I get satisfactory answers to my questions, which so far are conspicuous by their absence.
History
Date User Action Args
2014-03-24 22:08:05BreamoreBoysetmessages: + msg214745
2014-03-24 21:34:53skrahsetmessages: + msg214735
2014-03-24 21:19:42BreamoreBoysetnosy: + larry, BreamoreBoy, loewis
messages: + msg214728
2014-03-24 08:34:24hayposetmessages: + msg214675
2014-03-24 08:30:20pitrousetmessages: + msg214674
2014-03-24 08:16:32hayposetmessages: + msg214673
2014-03-23 23:30:04pitrousetmessages: + msg214652
2014-03-23 23:24:20hayposetmessages: + msg214651
2014-03-23 21:57:20pitrousetstatus: open -> closed
resolution: fixed
messages: + msg214640

stage: needs patch -> resolved
2014-03-23 21:55:59python-devsetnosy: + python-dev
messages: + msg214639
2014-03-23 21:11:39skrahsetmessages: + msg214635
2014-03-23 20:53:41pitrousetmessages: + msg214634
2014-03-23 16:45:03schwabsetfiles: - xx
2014-03-23 16:44:09schwabsetfiles: + pyasciiobject-align.patch
keywords: + patch
messages: + msg214603
2013-12-16 13:52:23serhiy.storchakasetmessages: + msg206311
2013-12-16 09:21:19hayposetmessages: + msg206285
2013-12-15 20:21:14pitrousetmessages: + msg206254
2013-12-15 20:20:03schwabsetmessages: + msg206253
2013-12-15 19:59:35pitrousetmessages: + msg206250
2013-12-15 19:54:24serhiy.storchakasetmessages: + msg206249
stage: patch review -> needs patch
2013-11-25 20:40:27schwabsetmessages: + msg204403
2013-11-25 17:14:45serhiy.storchakasetmessages: + msg204367
2013-11-25 16:41:40skrahsetnosy: + skrah
messages: + msg204358
2013-11-12 22:12:03hayposetnosy: + haypo
2013-11-11 12:19:29skrahsetnosy: - skrah
2013-11-10 20:50:45schwabsetmessages: + msg202573
2013-11-10 19:11:34skrahsetmessages: + msg202554
2013-11-10 19:03:04schwabsetmessages: + msg202551
2013-11-10 17:59:50pitrousetmessages: + msg202541
2013-11-10 17:11:49serhiy.storchakasetnosy: + pitrou
messages: + msg202537
2013-11-10 12:58:38skrahsetnosy: + skrah
2013-11-09 17:13:59serhiy.storchakasetnosy: + serhiy.storchaka
stage: patch review

versions: + Python 3.4
2013-11-09 16:32:08schwabcreate