Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceval: Wordcode follow up, explicit unsigned short read #71284

Closed
serprex mannequin opened this issue May 24, 2016 · 26 comments
Closed

ceval: Wordcode follow up, explicit unsigned short read #71284

serprex mannequin opened this issue May 24, 2016 · 26 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@serprex
Copy link
Mannequin

serprex mannequin commented May 24, 2016

BPO 27097
Nosy @mdickinson, @vstinner, @berkerpeksag, @serhiy-storchaka, @ztane, @serprex, @adrian17
Files
  • ushort.patch
  • ushort.patch: Regenerated for review
  • pybench-diff.txt
  • ushort2.patch
  • ushort3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-05-25.17:09:06.739>
    created_at = <Date 2016-05-24.00:08:19.059>
    labels = ['interpreter-core', 'performance']
    title = 'ceval: Wordcode follow up, explicit unsigned short read'
    updated_at = <Date 2016-05-26.13:23:35.003>
    user = 'https://github.com/serprex'

    bugs.python.org fields:

    activity = <Date 2016-05-26.13:23:35.003>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-05-25.17:09:06.739>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-05-24.00:08:19.059>
    creator = 'Demur Rumed'
    dependencies = []
    files = ['42961', '42964', '42966', '42980', '42990']
    hgrepos = []
    issue_num = 27097
    keywords = ['patch']
    message_count = 26.0
    messages = ['266209', '266256', '266257', '266261', '266262', '266263', '266264', '266278', '266282', '266283', '266290', '266296', '266305', '266306', '266313', '266314', '266315', '266343', '266364', '266375', '266378', '266380', '266381', '266389', '266428', '266434']
    nosy_count = 8.0
    nosy_names = ['mark.dickinson', 'vstinner', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'ztane', 'Demur Rumed', 'Adrian Wielgosik']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue27097'
    versions = ['Python 3.6']

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented May 24, 2016

    Attached is a patch based off bpo-26647 which converts ceval to read opcode/oparg by casting next_instr to an unsigned short. I don't believe we need to complicate the code with checking the pointer alignment of the instruction stream; some effort must be gone through to generate bytecode off alignment, & if the issue comes up it's because someone's using the C API on a CPU that doesn't support unaligned reads to generate their own custom bytecode. Anyways pybench went from 16.5 on wordcode to 16.0 for me

    The change works through replacing NEXTOP/NEXTARG with a NEXTOPARG macro

    @serprex serprex mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels May 24, 2016
    @mdickinson
    Copy link
    Member

    This assignment:

    oparg = *(unsigned short*)next_instr
    

    has undefined behaviour, according to the C standards. I'm not sure it's a good idea to introduce this into the core.

    @mdickinson
    Copy link
    Member

    has undefined behaviour, according to the C standards.

    I retract this; I'm no longer convinced this is true. Sorry for the noise.

    @serhiy-storchaka
    Copy link
    Member

    What if change the type of next_instr to unsigned short*? This would guarantee that the read is aligned since the start of code data is aligned.

    I have small doubt about PREDICT(). I doubt that the compiler is smart enough to infer that *next_instr and *(unsigned short*)next_instr & 255 is the same. It can do unneeded work. I don't know whether this have negative effect.

    @serhiy-storchaka
    Copy link
    Member

    Using Raymond's microbenchmark from bpo-25823:

    $ ./python.exe exercise_oparg.py
    Unpatched:  0.8331978429996525
    Patched:    0.7600522060020012

    Pybench exposes 7% speed up (detailed report is attached).

    @adrian17
    Copy link
    Mannequin

    adrian17 mannequin commented May 24, 2016

    I'm pretty sure this is undefined behavior. Here's how similar code was fixed in Chromium:

    https://codereview.chromium.org/9117014/patch/1/2

    @serhiy-storchaka
    Copy link
    Member

    I hope following patch is free from undefined behavior.

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented May 24, 2016

    There is no strict aliasing issues because aliasing is explicitly allowed for char buffers. The only issue is unaligned memory reads, but allocations are well aligned, so there'd have to be explicit work to allocate & then put code at an uneven offset. CPython never does this. This leaves the only issue being if a jump has an uneven jump offset. But jumping into the middle of an instruction is already undefined behavior because if I jump into a byte that happens to be BINARY_ADD on an empty stack it's undefined behavior. Jumping into the middle of an instruction thus falls under "Undefined behavior due to invalid bytecode"

    tl;dr There is no undefined behavior given correct bytecode layout, & we already have undefined behavior given incorrect bytecode layout. PREDICT concerns may be valid; the solution there may be to have a PEEKOPARG which we can then check the short value before splitting it if the opcode portion matches

    @vstinner
    Copy link
    Member

    I'm pretty sure this is undefined behavior.

    Are you talking about the exact C implementation, or the risk of unaligned memory accesses?

    I suggest to add an assertion to ensure that the pointer is aligned to 16-bits.

    @vstinner
    Copy link
    Member

    +#define NEXTOPARG() (oparg = *(unsigned short*)next_instr, opcode = OPOF(oparg), oparg = ARGOF(oparg), next_instr += 2)

    I dislike this approach. I would prefer to use the unsigned short* type for next_instr, and put an assertion when next_instr is first assigned.

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented May 24, 2016

    There's some difficulty in changing next_instr to an unsigned short* unless pointer arithmetic converts it back to unsigned char*. Code relies on f_lasti to be a byte index more than it relies on f_lasti to be -1 at first. Should this change include converting jump offsets to word increments? This patch seeks to be a minimal change to achieve the desired aligned/batched memory read behavior of NEXTOP/NEXTARG. Perhaps a clear description of how deep the changes should be is in order

    It's also a bit hard to assert pointer alignment without failling back to stdint's uintptr_t. It may be sufficient to assert f_lasti is an even number

    @serhiy-storchaka
    Copy link
    Member

    Hmm, don't know why my patch was not uploaded first time. It contained exactly what Victor said. Yes, we need to multiply or divide by 2 when convert from pointer difference to byte offset, but this doesn't affect performance. I think the next step is the change of the meaning of jump offsets and instruction numbers.

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented May 25, 2016

    Casting the pointer is OK, explicitly typing the pointer is even better; then the arithmetic is even clearer and it is easy to add offsets in words!

    However what I'd prefer here as the size of the type is important, to use uint16_t or typedef instead of just unsigned short, which is a statemeent that the value must be at least "2 bytes wide".

    @serhiy-storchaka
    Copy link
    Member

    This is separate issue (see bpo-17884).

    @vstinner
    Copy link
    Member

    ushort2.patch LGTM, but I added many small comments on the review ;-)

    @vstinner
    Copy link
    Member

    Antti Haapala:

    However what I'd prefer here as the size of the type is important, to use uint16_t or typedef instead of just unsigned short, which is a statemeent that the value must be at least "2 bytes wide".

    I'm not aware of a platform where sizeof(unsigned short) is not *exactly* 16-bits. At least, no platform supported by CPython.

    Include/unicodeobject.h is quite explicit:

    #if SIZEOF_SHORT == 2
    typedef unsigned short Py_UCS2;
    #else
    #error "Could not find a proper typedef for Py_UCS2"
    #endif

    Maybe we should use a similar code in ceval.c?

    #if SIZEOF_SHORT != 2
    #  error "Could not find a proper type for 16-bit pointer"
    #endif

    @vstinner
    Copy link
    Member

    Demur: "This patch [ ushort.patch ] seeks to be a minimal change to achieve the desired aligned/batched memory read behavior of NEXTOP/NEXTARG."

    We are concerned by undefined behaviours in CPython for portability reasons. Using a "unsigned short*" pointer avoids the undefined behaviour.

    I'm talking about unsigned char* => unsigned short* conversion followed by a dereference:

    +#define NEXTOPARG() (oparg = *(unsigned short*)next_instr, opcode = OPOF(oparg), oparg = ARGOF(oparg), next_instr += 2)

    With ushort2.patch, the risk is reduced at once place which is now "protected" by an assertion:

    + assert(_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code), unsigned short));
    + first_instr = (unsigned short*) PyBytes_AS_STRING(co->co_code);

    @serhiy-storchaka
    Copy link
    Member

    Updated patch addresses Victor's comments.

    @vstinner
    Copy link
    Member

    ushort3.patch LGTM.

    You might also add (copy) the added assertions in PyCode_New*().

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 25, 2016

    New changeset eda3716d6425 by Serhiy Storchaka in branch 'default':
    Issue bpo-27097: Python interpreter is now about 7% faster due to optimized
    https://hg.python.org/cpython/rev/eda3716d6425

    @berkerpeksag
    Copy link
    Member

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 25, 2016

    New changeset b26d07812a8e by Serhiy Storchaka in branch 'default':
    Fixed the use of _Py_IS_ALIGNED (issue bpo-27097).
    https://hg.python.org/cpython/rev/b26d07812a8e

    @serhiy-storchaka
    Copy link
    Member

    Opps, sorry. Fixed now. Thank you Berker for fast response.

    @vstinner
    Copy link
    Member

    Why not using sizeof(unsigned short) in the macro, rtbaer than 2?

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented May 26, 2016

    PREDICT could be optimized by having a HAS_ARG check on the constant op to decide whether we assign oparg = OPARG(word)

    @vstinner
    Copy link
    Member

    Please focus this issue on documenting changes, and open new issues
    for further optimizations.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants