Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix misalignment in fastsearch_memchr_1char #63736

Closed
andreas-schwab mannequin opened this issue Nov 9, 2013 · 28 comments
Closed

Fix misalignment in fastsearch_memchr_1char #63736

andreas-schwab mannequin opened this issue Nov 9, 2013 · 28 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@andreas-schwab
Copy link
Mannequin

andreas-schwab mannequin commented Nov 9, 2013

BPO 19537
Nosy @loewis, @pitrou, @vstinner, @larryhastings, @skrah, @serhiy-storchaka, @andreas-schwab
Files
  • pyasciiobject-align.patch: Add explicit padding to PyASCIIObject
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-03-23.21:57:20.154>
    created_at = <Date 2013-11-09.16:32:08.481>
    labels = ['interpreter-core', 'type-bug']
    title = 'Fix misalignment in fastsearch_memchr_1char'
    updated_at = <Date 2014-03-24.22:08:05.478>
    user = 'https://github.com/andreas-schwab'

    bugs.python.org fields:

    activity = <Date 2014-03-24.22:08:05.478>
    actor = 'BreamoreBoy'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-03-23.21:57:20.154>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2013-11-09.16:32:08.481>
    creator = 'schwab'
    dependencies = []
    files = ['34589']
    hgrepos = []
    issue_num = 19537
    keywords = ['patch']
    message_count = 28.0
    messages = ['202487', '202537', '202541', '202551', '202554', '202573', '204358', '204367', '204403', '206249', '206250', '206253', '206254', '206285', '206311', '214603', '214634', '214635', '214639', '214640', '214651', '214652', '214673', '214674', '214675', '214728', '214735', '214745']
    nosy_count = 9.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'larry', 'skrah', 'BreamoreBoy', 'python-dev', 'serhiy.storchaka', 'schwab']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19537'
    versions = ['Python 3.3', 'Python 3.4']

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Nov 9, 2013

    sizeof(foo) is never a good approximation for the alignment of foo

    @andreas-schwab andreas-schwab mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 9, 2013
    @serhiy-storchaka
    Copy link
    Member

    Current code assumes that PyUnicode_DATA() is aligned to PyUnicode_KIND() bytes. If this is not true on some platform it will be easer to add a padding than rewrite a code. A lot of code depends on this assumption. See also bpo-14422.

    I afraid that proposed patch may slow down a search.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 10, 2013

    I don't understand in which concrete situation the current code could be wrong. The start of the unicode string should always be aligned, due to how unicode objects are allocated.

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Nov 10, 2013

    (gdb) p sizeof(PyASCIIObject)
    $1 = 22

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 10, 2013

    Andreas Schwab <report@bugs.python.org> wrote:

    (gdb) p sizeof(PyASCIIObject)
    $1 = 22

    m68k again? ;)

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Nov 10, 2013

    There is nothing wrong with that.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 25, 2013

    (gdb) p sizeof(PyASCIIObject)
    $1 = 22

    If the only issue is that the size should be 24 instead of 22,
    could we perhaps add an unused uint16_t to the struct just
    for this architecture?

    @serhiy-storchaka
    Copy link
    Member

    What is sizeof(PyUnicodeObject)?

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Nov 25, 2013

    (gdb) p sizeof(PyUnicodeObject)
    $1 = 38

    @serhiy-storchaka
    Copy link
    Member

    Antoine, if sizeof(PyUnicodeObject) is 38, the start of UCS4 unicode string is not aligned. This should be fixed.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 15, 2013

    Antoine, if sizeof(PyUnicodeObject) is 38, the start of UCS4 unicode
    string is not aligned. This should be fixed.

    Probably. Does anyone want to propose a patch?

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Dec 15, 2013

    How about adding explicit padding to the bitfield in PyASCIIObject?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 15, 2013

    How about adding explicit padding to the bitfield in PyASCIIObject?

    That sounds ok to me, but it must be explicitly for 68k (or use a
    sufficiently smart scheme not to affect already aligned architectures).

    @vstinner
    Copy link
    Member

    If you compile Python with GCC, we can maybe try something with __attribute__ ((aligned (sizeof(void *)))) attribute. The attribute can be used on a structure field. The problem is that we don't care of the alignment of header attributes, only of data, but data is not a field but the data just after the structure.

    Or is it possible to GCC to get a structure size aligned on 4 bytes?

    For the explicit padding: how do you compute the size of the padding?

    How about disabling the fast-path in FASTSEARCH if data is not aligned?

    (You might get non-aligned if even if structure is correctly aligned, it may happen if the memory allocator does not align memory blocks. I don't know if Python may get "unaligned" memory blocks.)

    @serhiy-storchaka
    Copy link
    Member

    I think that adding __attribute__ ((aligned (sizeof(void *)))) to the PyObject type (or to the ob_base field) is enough.

    If first byte of structure is aligned, then first byte past the structure should be aligned too.

    @andreas-schwab
    Copy link
    Mannequin Author

    andreas-schwab mannequin commented Mar 23, 2014

    The attached patch has been tested on {i586,x86_64,ppc,ppc64,ppc64le,armv6hl,armv7hl,aarch64,m68k}-suse-linux.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 23, 2014

    Are unnamed struct fields actually valid C?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Mar 23, 2014

    Yes, §6.7.2.1:

    A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field.

    With a footnote:

    1. An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts.

    Unnamed struct members are valid, too:

    There may be unnamed padding at the end of a structure or union.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 23, 2014

    New changeset 004ae1472a43 by Antoine Pitrou in branch '3.4':
    Issue bpo-19537: Fix PyUnicode_DATA() alignment under m68k. Patch by Andreas Schwab.
    http://hg.python.org/cpython/rev/004ae1472a43

    New changeset 0128b25068de by Antoine Pitrou in branch 'default':
    Issue bpo-19537: Fix PyUnicode_DATA() alignment under m68k. Patch by Andreas Schwab.
    http://hg.python.org/cpython/rev/0128b25068de

    @pitrou
    Copy link
    Member

    pitrou commented Mar 23, 2014

    Ok, I've committed the patch, then. Thank you, Andreas.

    @pitrou pitrou closed this as completed Mar 23, 2014
    @vstinner
    Copy link
    Member

    What is the size of the PyASCIIObject on x86/x64 with and without the
    patch? GCC and Visual Studio provide attributes to align structures.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 23, 2014

    What is the size of the PyASCIIObject on x86/x64 with and without the
    patch?

    Haven't tried on x86, but on x86-64 it's the same. If it changes it will
    probably get detected by the sys.getsizeof() tests.

    @vstinner
    Copy link
    Member

    2014-03-24 0:30 GMT+01:00 Antoine Pitrou <report@bugs.python.org>:

    > What is the size of the PyASCIIObject on x86/x64 with and without the
    > patch?

    Haven't tried on x86, but on x86-64 it's the same. If it changes it will
    probably get detected by the sys.getsizeof() tests.

    test_sys has been modified:

    •    asciifields = "nnbP"
      

    + asciifields = "nn4bP"

    It would like to know if objects are bigger with the change or not.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 24, 2014

    Look at the commits, not the patch.

    @vstinner
    Copy link
    Member

    Look at the commits, not the patch.

    Ah! You wrote "Ok, I've committed the patch". Ok, your commit is fine. I would prefered a named field (called "padding"), but it's fine ;-)

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Mar 24, 2014

    Martin, Larry msg214639 states "Fix PyUnicode_DATA() alignment under m68k.", your comments please.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Mar 24, 2014

    Mark Lawrence, stop playing off committers against each other. This is
    outright trolling.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Mar 24, 2014

    Please do not threaten me, I've been told that I'm to be ignored, and I now observe that committers seem to be sneaking patches into the source control system when the platform is unsupported. I'll stop placing comments on appropriate issues if and only if I get satisfactory answers to my questions, which so far are conspicuous by their absence.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants