Message 351052 - Python tracker


Author	malin
Recipients	Greg Price, aeros, malin, mark.dickinson, rhettinger, sir-sigurd
Date	2019-09-03.04:02:44
Message-id	<1567483364.75.0.314519224234.issue38015@roundup.psfhosted.org>

Content

Commit 5e63ab0 replaces the macro with this inline function:

    static inline int
    is_small_int(long long ival)
    {
        return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
    }

(By default, NSMALLNEGINTS is 5 and NSMALLPOSINTS is 257.)


However, when this function is called with an argument for which `sizeof(value) < sizeof(long long)`, the argument goes through an unnecessary type conversion.

For example, on a 32-bit platform, if `value` is a `Py_ssize_t`, it has to be converted to the 8-byte `long long` type.
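
To make the conversion concrete, here is a minimal sketch. Only `is_small_int` and the two constants come from the commit; `ssize32_t` and `check_via_function` are hypothetical names I made up to stand in for a 32-bit `Py_ssize_t` and a call site.

```c
#include <stdint.h>

#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

/* Hypothetical stand-in for Py_ssize_t on a 32-bit build. */
typedef int32_t ssize32_t;

/* The inline function from commit 5e63ab0. */
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* Hypothetical call site: the 4-byte argument is implicitly
   sign-extended to an 8-byte long long before the comparison. */
static inline int
check_via_function(ssize32_t v)
{
    return is_small_int(v);   /* same as is_small_int((long long)v) */
}
```

The result is the same as with the macro; only the width at which the comparison is carried out differs.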

The following assembly code is the beginning of the `PyLong_FromSsize_t(Py_ssize_t v)` function
(32-bit x86 build generated by GCC 9.2 with the `-m32 -O2` options).

Using the macro, before commit 5e63ab0:
        mov     eax, DWORD PTR [esp+4]
        add     eax, 5
        cmp     eax, 261
        ja      .L2
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints
        add     DWORD PTR [eax], 1
        ret
.L2:    jmp     PyLong_FromSsize_t_rest(int)
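
For readers not used to this idiom: the macro version boils down to a single unsigned comparison. `add eax, 5` shifts the range [-5, 256] to [0, 261], and `ja` rejects everything above 261, including negative inputs that wrap around to large unsigned values. A rough C equivalent, assuming a 32-bit value (the function names are hypothetical, not from CPython):

```c
#include <stdint.h>

#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

/* Straightforward two-sided check, as the macro is written. */
static inline int
in_range_two_compares(int32_t v)
{
    return -NSMALLNEGINTS <= v && v < NSMALLPOSINTS;
}

/* What the compiler output amounts to: bias by NSMALLNEGINTS, then do
   one unsigned compare. Values below -5 wrap to huge unsigned numbers
   and fail the test, just like values above 256. */
static inline int
in_range_one_compare(int32_t v)
{
    uint32_t biased = (uint32_t)v + (uint32_t)NSMALLNEGINTS;      /* add eax, 5   */
    return biased <= (uint32_t)(NSMALLNEGINTS + NSMALLPOSINTS - 1); /* cmp eax, 261 */
}
```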

Using the inline function:
        push    ebx
        mov     eax, DWORD PTR [esp+8]
        mov     edx, 261
        mov     ecx, eax
        mov     ebx, eax
        sar     ebx, 31
        add     ecx, 5
        adc     ebx, 0
        cmp     edx, ecx
        mov     edx, 0
        sbb     edx, ebx
        jc      .L7
        cwde
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints+80
        add     DWORD PTR [eax], 1
        pop     ebx
        ret
.L7:    pop     ebx
        jmp     PyLong_FromSsize_t_rest(int)

On the 32-bit x86 platform, an 8-byte `long long` is implemented using two registers, so the machine code is much longer than in the macro version.
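
As an illustration of where the extra instructions come from, the two-register arithmetic can be modelled in C roughly like this. The `pair64` type and all function names are hypothetical; this is only a sketch of what the `sar`/`add`/`adc` sequence computes, not code from CPython or from the compiler.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair64;

/* Sign-extend a 32-bit value into the pair. */
static inline pair64
widen(int32_t v)
{
    pair64 p;
    p.lo = (uint32_t)v;
    p.hi = v < 0 ? 0xFFFFFFFFu : 0u;   /* sar ebx, 31 */
    return p;
}

/* Add a small constant, carrying from the low half into the high half. */
static inline pair64
add_small(pair64 p, uint32_t k)
{
    pair64 r;
    r.lo = p.lo + k;                   /* add ecx, 5 */
    r.hi = p.hi + (r.lo < p.lo);       /* adc ebx, 0 */
    return r;
}

/* The 64-bit unsigned test "biased <= 261" needs both halves. */
static inline int
is_small_two_regs(int32_t v)
{
    pair64 b = add_small(widen(v), 5u);
    return b.hi == 0 && b.lo <= 261u;
}
```

Every step that is a single instruction at 32-bit width becomes a low-half/high-half pair here, which matches the longer listing above.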

At least these hot functions suffer from this:
  PyObject* PyLong_FromSsize_t(Py_ssize_t v)
  PyObject* PyLong_FromLong(long v)

Replacing the inline function with a macro will fix this:
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)
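
A small sketch of how the macro behaves at a call site: the comparison is done at the width of the argument's own type, so a `long` argument is not widened on a 32-bit build. The helpers `small_long` and `small_long_long` are hypothetical names used only for this example.

```c
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

/* The inline function from commit 5e63ab0, kept for comparison. */
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* Hypothetical call sites: the macro compares at the argument's width. */
static inline int
small_long(long v)
{
    return IS_SMALL_INT(v);        /* compared as long */
}

static inline int
small_long_long(long long v)
{
    return IS_SMALL_INT(v);        /* compared as long long */
}
```

Note that the macro evaluates its argument twice, so the argument should not have side effects.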

If you want to see the assembly code generated by major compilers, you can paste the attached file demo.c into https://godbolt.org/
- demo.c was originally written by Greg Price.
- use `-m32 -O2` to generate a 32-bit build.
