classification
Title: str.is* implementation seems suboptimal for single character strings
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, ezio.melotti, gdementen, georg.brandl, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2013-03-27 14:41 by gdementen, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg185338 - (view) Author: Gaëtan de Menten (gdementen) Date: 2013-03-27 14:41
In isspace, isalpha, isalnum and isdigit, I see code like:

/* Shortcut for single character strings */
if (PyString_GET_SIZE(self) == 1 &&
    isspace(*p))
    return PyBool_FromLong(1);

Is it intentional to not use:

if (PyString_GET_SIZE(self) == 1)
    return PyBool_FromLong(isspace(*p) != 0);

which would be faster when the result is False (but a tad slower when it is True because of the extra comparison).
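To make the difference between the two shortcuts concrete, here is a minimal standalone sketch. It is not the CPython source: it works on plain byte buffers with the C library's isspace(), and the names all_space_loop, isspace_current and isspace_proposed are hypothetical, chosen only for illustration. The general loop is assumed to behave like the rest of the function quoted above (empty string gives False, any non-whitespace character gives False).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed general path: true only for a non-empty, all-whitespace buffer. */
static bool all_space_loop(const unsigned char *s, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (!isspace(s[i]))
            return false;
    }
    return true;
}

/* Shortcut as quoted: returns early only when the single character IS
 * whitespace; a non-whitespace single character falls through to the
 * general path and is tested a second time there. */
static bool isspace_current(const unsigned char *s, size_t len)
{
    if (len == 1 && isspace(s[0]))
        return true;
    return all_space_loop(s, len);
}

/* Proposed shortcut: every length-1 buffer returns early, whatever the
 * result, at the cost of normalizing isspace()'s int result to 0 or 1. */
static bool isspace_proposed(const unsigned char *s, size_t len)
{
    if (len == 1)
        return isspace(s[0]) != 0;
    return all_space_loop(s, len);
}
```

Both variants should return the same answers for every input; they differ only in which path a length-1, non-whitespace buffer takes.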

Also, is there a reason (other than historical) why the macros Py_RETURN_TRUE and Py_RETURN_FALSE are not used instead of their equivalent functions PyBool_FromLong(1) and PyBool_FromLong(0)?

See:
http://hg.python.org/cpython/file/e87364449954/Objects/stringobject.c#l3324
msg185392 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-03-28 00:35
The shortcut seems fairly pointless to me.
msg185393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-03-28 00:45
If you would like to improve Python, you have to focus on the development version, which is Python 3.4. In this version, the code is different:

    if (length == 1)
        return PyBool_FromLong(
            Py_UNICODE_ISSPACE(PyUnicode_READ(kind, data, 0)));

I'm not sure that having a special case for strings of 1 character provides any speedup...
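Whether the special case pays off is something to measure rather than assume. The following is a rough sketch of how one might compare the two shapes outside the interpreter. It uses byte buffers and the C library's isspace() instead of Py_UNICODE_ISSPACE and PyUnicode_READ, so it only approximates the real code, and the names isspace_general, isspace_special_cased and time_calls are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <time.h>

typedef int (*check_fn)(const unsigned char *, size_t);

/* No special case: the loop already handles length-1 buffers. */
static int isspace_general(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (!isspace(s[i]))
            return 0;
    }
    return 1;
}

/* Same result, with an early return for length-1 buffers. */
static int isspace_special_cased(const unsigned char *s, size_t len)
{
    if (len == 1)
        return isspace(s[0]) != 0;
    return isspace_general(s, len);
}

/* CPU time in seconds spent calling fn(s, len) `iterations` times.
 * The volatile sink keeps the compiler from discarding the calls. */
static double time_calls(check_fn fn, const unsigned char *s,
                         size_t len, long iterations)
{
    volatile int sink = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink += fn(s, len);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls(isspace_general, buf, 1, n) and time_calls(isspace_special_cased, buf, 1, n) with a large n gives a first impression. Any difference seen this way may not carry over to the interpreter, where method call overhead is likely to dominate.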
msg185394 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-03-28 00:49
There's still stuff in bytes_methods.c which looks like the old string code.
msg185407 - (view) Author: Gaëtan de Menten (gdementen) Date: 2013-03-28 09:14
Argl. I know I should have used 3.4... but that is what I thought I did.
I used http://hg.python.org/cpython then "browse", and assumed it was the default branch... I now realize that since the last commit at that time was on the 2.7 branch, that is what I got. And I didn't even notice my mistake from the string vs unicode differences. Now, I will just go hide somewhere in shame...
msg185410 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-03-28 09:56
Benjamin, if that comment means there's still something to be done, please reopen.
History
Date                 User               Action  Args
2022-04-11 14:57:43  admin              set     github: 61759
2013-03-28 09:56:32  georg.brandl       set     status: open -> closed; nosy: + georg.brandl; messages: + msg185410; resolution: works for me
2013-03-28 09:14:56  gdementen          set     messages: + msg185407
2013-03-28 00:49:37  benjamin.peterson  set     messages: + msg185394
2013-03-28 00:45:43  vstinner           set     messages: + msg185393
2013-03-28 00:35:50  benjamin.peterson  set     messages: + msg185392
2013-03-28 00:31:22  ezio.melotti       set     nosy: + vstinner, benjamin.peterson, ezio.melotti, serhiy.storchaka; versions: + Python 3.4
2013-03-27 14:41:09  gdementen          create