Issue 27097: ceval: Wordcode follow up, explicit unsigned short read

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71284

classification

Title:	ceval: Wordcode follow up, explicit unsigned short read
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	Adrian Wielgosik, Demur Rumed, berker.peksag, mark.dickinson, python-dev, serhiy.storchaka, vstinner, ztane
Priority:	normal	Keywords:	patch

Created on 2016-05-24 00:08 by Demur Rumed, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
ushort.patch	Demur Rumed, 2016-05-24 00:08
ushort.patch	serhiy.storchaka, 2016-05-24 06:18	Regenerated for review	review
pybench-diff.txt	serhiy.storchaka, 2016-05-24 19:19
ushort2.patch	serhiy.storchaka, 2016-05-25 03:50		review
ushort3.patch	serhiy.storchaka, 2016-05-25 13:10		review

Messages (26)
msg266209 - (view)	Author: Philip Dubé (Demur Rumed) *	Date: 2016-05-24 00:08
Attached is a patch based off #26647 which converts ceval to read opcode/oparg by casting next_instr to an unsigned short. I don't believe we need to complicate the code with checking the pointer alignment of the instruction stream; some effort must be gone through to generate bytecode off alignment, & if the issue comes up it's because someone's using the C API on a CPU that doesn't support unaligned reads to generate their own custom bytecode. Anyways pybench went from 16.5 on wordcode to 16.0 for me The change works through replacing NEXTOP/NEXTARG with a NEXTOPARG macro
msg266256 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-05-24 17:37
This assignment: oparg = (unsigned short)next_instr has undefined behaviour, according to the C standards. I'm not sure it's a good idea to introduce this into the core.
msg266257 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-05-24 18:00
> has undefined behaviour, according to the C standards. I retract this; I'm no longer convinced this is true. Sorry for the noise.
msg266261 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-24 19:05
What if change the type of next_instr to unsigned short? This would guarantee that the read is aligned since the start of code data is aligned. I have small doubt about PREDICT(). I doubt that the compiler is smart enough to infer that next_instr and (unsigned short)next_instr & 255 is the same. It can do unneeded work. I don't know whether this have negative effect.
msg266262 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-24 19:19
Using Raymond's microbenchmark from issue25823: $ ./python.exe exercise_oparg.py Unpatched: 0.8331978429996525 Patched: 0.7600522060020012 Pybench exposes 7% speed up (detailed report is attached).
msg266263 - (view)	Author: Adrian Wielgosik (Adrian Wielgosik) *	Date: 2016-05-24 19:29
I'm pretty sure this is undefined behavior. Here's how similar code was fixed in Chromium: https://codereview.chromium.org/9117014/patch/1/2
msg266264 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-24 20:03
I hope following patch is free from undefined behavior.
msg266278 - (view)	Author: Philip Dubé (Demur Rumed) *	Date: 2016-05-24 21:19
There is no strict aliasing issues because aliasing is explicitly allowed for char buffers. The only issue is unaligned memory reads, but allocations are well aligned, so there'd have to be explicit work to allocate & then put code at an uneven offset. CPython never does this. This leaves the only issue being if a jump has an uneven jump offset. But jumping into the middle of an instruction is already undefined behavior because if I jump into a byte that happens to be BINARY_ADD on an empty stack it's undefined behavior. Jumping into the middle of an instruction thus falls under "Undefined behavior due to invalid bytecode" tl;dr There is no undefined behavior given correct bytecode layout, & we already have undefined behavior given incorrect bytecode layout. PREDICT concerns may be valid; the solution there may be to have a PEEKOPARG which we can then check the short value before splitting it if the opcode portion matches
msg266282 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-24 21:37
> I'm pretty sure this is undefined behavior. Are you talking about the exact C implementation, or the risk of unaligned memory accesses? I suggest to add an assertion to ensure that the pointer is aligned to 16-bits.
msg266283 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-24 21:38
> +#define NEXTOPARG() (oparg = (unsigned short)next_instr, opcode = OPOF(oparg), oparg = ARGOF(oparg), next_instr += 2) I dislike this approach. I would prefer to use the unsigned short* type for next_instr, and put an assertion when next_instr is first assigned.
msg266290 - (view)	Author: Philip Dubé (Demur Rumed) *	Date: 2016-05-24 23:44
There's some difficulty in changing next_instr to an unsigned short* unless pointer arithmetic converts it back to unsigned char*. Code relies on f_lasti to be a byte index more than it relies on f_lasti to be -1 at first. Should this change include converting jump offsets to word increments? This patch seeks to be a minimal change to achieve the desired aligned/batched memory read behavior of NEXTOP/NEXTARG. Perhaps a clear description of how deep the changes should be is in order It's also a bit hard to assert pointer alignment without failling back to stdint's uintptr_t. It may be sufficient to assert f_lasti is an even number
msg266296 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-25 03:50
Hmm, don't know why my patch was not uploaded first time. It contained exactly what Victor said. Yes, we need to multiply or divide by 2 when convert from pointer difference to byte offset, but this doesn't affect performance. I think the next step is the change of the meaning of jump offsets and instruction numbers.
msg266305 - (view)	Author: Antti Haapala (ztane) *	Date: 2016-05-25 06:01
Casting the pointer is OK, explicitly typing the pointer is even better; then the arithmetic is even clearer and it is easy to add offsets in words! However what I'd prefer here as the size of the type is important, to use `uint16_t` or typedef instead of just `unsigned short`, which is a statemeent that the value must be at least "2 bytes wide".
msg266306 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-25 06:10
This is separate issue (see issue17884).
msg266313 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-25 06:56
ushort2.patch LGTM, but I added many small comments on the review ;-)
msg266314 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-25 06:58
Antti Haapala: > However what I'd prefer here as the size of the type is important, to use `uint16_t` or typedef instead of just `unsigned short`, which is a statemeent that the value must be at least "2 bytes wide". I'm not aware of a platform where sizeof(unsigned short) is not exactly 16-bits. At least, no platform supported by CPython. Include/unicodeobject.h is quite explicit: #if SIZEOF_SHORT == 2 typedef unsigned short Py_UCS2; #else #error "Could not find a proper typedef for Py_UCS2" #endif Maybe we should use a similar code in ceval.c? #if SIZEOF_SHORT != 2 # error "Could not find a proper type for 16-bit pointer" #endif
msg266315 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-25 07:01
Demur: "This patch [ ushort.patch ] seeks to be a minimal change to achieve the desired aligned/batched memory read behavior of NEXTOP/NEXTARG." We are concerned by undefined behaviours in CPython for portability reasons. Using a "unsigned short" pointer avoids the undefined behaviour. I'm talking about unsigned char => unsigned short* conversion followed by a dereference: +#define NEXTOPARG() (oparg = (unsigned short)next_instr, opcode = OPOF(oparg), oparg = ARGOF(oparg), next_instr += 2) With ushort2.patch, the risk is reduced at once place which is now "protected" by an assertion: + assert(_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code), unsigned short)); + first_instr = (unsigned short*) PyBytes_AS_STRING(co->co_code);
msg266343 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-25 13:10
Updated patch addresses Victor's comments.
msg266364 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-25 14:40
ushort3.patch LGTM. You might also add (copy) the added assertions in PyCode_New*().
msg266375 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-05-25 17:03
New changeset eda3716d6425 by Serhiy Storchaka in branch 'default': Issue #27097: Python interpreter is now about 7% faster due to optimized https://hg.python.org/cpython/rev/eda3716d6425
msg266378 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2016-05-25 17:15
Debug builds are broken after eda3716d6425: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/4332/steps/compile/logs/stdio
msg266380 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-05-25 17:36
New changeset b26d07812a8e by Serhiy Storchaka in branch 'default': Fixed the use of _Py_IS_ALIGNED (issue #27097). https://hg.python.org/cpython/rev/b26d07812a8e
msg266381 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-25 17:37
Opps, sorry. Fixed now. Thank you Berker for fast response.
msg266389 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-25 19:01
Why not using sizeof(unsigned short) in the macro, rtbaer than 2?
msg266428 - (view)	Author: Philip Dubé (Demur Rumed) *	Date: 2016-05-26 12:13
PREDICT could be optimized by having a HAS_ARG check on the constant op to decide whether we assign oparg = OPARG(word)
msg266434 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-05-26 13:23
Please focus this issue on documenting changes, and open new issues for further optimizations.

History
Date	User	Action	Args
2022-04-11 14:58:31	admin	set	github: 71284
2016-05-26 13:23:35	vstinner	set	messages: + msg266434
2016-05-26 12:13:40	Demur Rumed	set	messages: + msg266428
2016-05-25 19:01:17	vstinner	set	messages: + msg266389
2016-05-25 17:37:49	serhiy.storchaka	set	messages: + msg266381
2016-05-25 17:36:19	python-dev	set	messages: + msg266380
2016-05-25 17:15:10	berker.peksag	set	nosy: + berker.peksag messages: + msg266378
2016-05-25 17:09:06	serhiy.storchaka	set	status: open -> closed
2016-05-25 17:07:00	serhiy.storchaka	link	issue25823 superseder
2016-05-25 17:04:35	serhiy.storchaka	set	assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved
2016-05-25 17:03:20	python-dev	set	nosy: + python-dev messages: + msg266375
2016-05-25 14:40:08	vstinner	set	messages: + msg266364
2016-05-25 13:10:39	serhiy.storchaka	set	files: + ushort3.patch messages: + msg266343
2016-05-25 07:01:32	vstinner	set	messages: + msg266315
2016-05-25 06:58:51	vstinner	set	messages: + msg266314
2016-05-25 06:56:10	vstinner	set	messages: + msg266313
2016-05-25 06:10:40	serhiy.storchaka	set	messages: + msg266306
2016-05-25 06:01:24	ztane	set	nosy: + ztane messages: + msg266305
2016-05-25 03:51:01	serhiy.storchaka	set	files: + ushort2.patch messages: + msg266296
2016-05-24 23:44:15	Demur Rumed	set	messages: + msg266290
2016-05-24 21:38:40	vstinner	set	messages: + msg266283
2016-05-24 21:37:38	vstinner	set	nosy: + vstinner messages: + msg266282
2016-05-24 21:19:48	Demur Rumed	set	messages: + msg266278
2016-05-24 20:03:57	serhiy.storchaka	set	messages: + msg266264
2016-05-24 19:29:51	Adrian Wielgosik	set	nosy: + Adrian Wielgosik messages: + msg266263
2016-05-24 19:19:23	serhiy.storchaka	set	files: + pybench-diff.txt messages: + msg266262
2016-05-24 19:05:10	serhiy.storchaka	set	messages: + msg266261
2016-05-24 18:00:21	mark.dickinson	set	messages: + msg266257
2016-05-24 17:37:24	mark.dickinson	set	nosy: + mark.dickinson messages: + msg266256
2016-05-24 06:18:33	serhiy.storchaka	set	nosy: + serhiy.storchaka
2016-05-24 06:18:07	serhiy.storchaka	set	files: + ushort.patch stage: patch review
2016-05-24 00:08:19	Demur Rumed	create