This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: improve 27 percent performance on stringpbject.c (by prefetch and loop optimization)
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: abael, jcea, pitrou, r.david.murray, serhiy.storchaka
Priority: normal Keywords:

Created on 2012-08-01 09:33 by abael, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
stringjoin.c abael, 2012-08-04 08:00
Messages (7)
msg167107 - (view) Author: abael (abael) Date: 2012-08-01 09:33
Python-2.7.3/Objects/stringobject.c( SHA256SUM ad7795c75e2a25247e4dea4cc5327c225c4da03b7c7d57226c817ba6d12a316c)
static PyObject *string_join(PyStringObject *self, PyObject *orig);

OLD IMPLEMENT LOGIC(Pseudo code):
        char *sep = PyString_AS_STRING(self);
        const Py_ssize_t seplen = PyString_GET_SIZE(self);

        seq = PySequence_Fast(orig, "");
        seqlen = PySequence_Size(seq);

        if (seqlen == 0)
            return PyString_FromString("");
        else if (seqlen == 1)return the exactly first one item;
        else{
            for (i = 0; i < seqlen; i++) {
                const size_t old_sz = sz;
                item = PySequence_Fast_GET_ITEM(seq, i);
                if (!PyString_Check(item)){
                   if ( Py_USING_UNICODE and PyUnicode_Check(item))
                        return PyUnicode_Join((PyObject *)self, seq);
                   else  PyErr_Format(...);
                }
                sz += PyString_GET_SIZE(item);

                if (i != 0)
                    sz += seplen;
            }
        }

        /* Allocate result space. */
        res = PyString_FromStringAndSize((char*)NULL, sz);

        /* Catenate everything. */
        p = PyString_AS_STRING(res);
        for (i = 0; i < seqlen; ++i) {
            size_t n;
            item = PySequence_Fast_GET_ITEM(seq, i);
            n = PyString_GET_SIZE(item);
            Py_MEMCPY(p, PyString_AS_STRING(item), n);
            p += n;
            if (i < seqlen - 1) {
                Py_MEMCPY(p, sep, seplen);
                p += seplen;
            }
        }




Abael's IMPLEMENT LOGIC:
        char *sep = PyString_AS_STRING(self);
        const Py_ssize_t seplen = PyString_GET_SIZE(self);

        seq = PySequence_Fast(orig, "");
        seqlen = PySequence_Size(seq);

        if (seqlen == 0)
            return PyString_FromString("");
        if (seqlen == 1)
            return the exactly first one item;

        if (seqlen <0)return NULL

         /**** PREFETCH start, get the first item size, since here we can assume seqleng >= 2 ****/
        register size_t sz=0;
        register size_t old_sz=0;
        PyObject *res = NULL;

        item = PySequence_Fast_GET_ITEM(seq, 0);
        if (!PyString_Check(item)){
           if ( Py_USING_UNICODE and PyUnicode_Check(item))
                return PyUnicode_Join((PyObject *)self, seq);
           else  PyErr_Format(...);
        }

        sz += PyString_GET_SIZE(item);
        if (sz < old_sz || sz > PY_SSIZE_T_MAX) PyErr_SetString(PyExc_OverflowError,"join() result is too long for a Python string");
         /**** PREFETCH end, get the first item size, since here we can assume seqleng >= 2 ****/

        register Py_ssize_t i;
        for (i=1; i < seqlen; i++) { /**** then here we can loop start from 1 ****/
            const size_t old_sz = sz;
            item = PySequence_Fast_GET_ITEM(seq, i);
            if (!PyString_Check(item)){
               if ( Py_USING_UNICODE and PyUnicode_Check(item))
                    return PyUnicode_Join((PyObject *)self, seq);
               else  PyErr_Format(...);
            }
            sz += PyString_GET_SIZE(item);
            sz += seplen; /**** now we don't need to test (i != 0) every loop ****/
        }

        /* Allocate result space. */
        res = PyString_FromStringAndSize((char*)NULL, sz);

        /* Catenate everything. */
        /**** PREFETCH start, memcpy the first item first, since here we can assume seqleng >= 2 ****/
        register char *p = PyString_AS_STRING(res);
        item = PySequence_Fast_GET_ITEM(seq, 0);
        sz = PyString_GET_SIZE(item);
        Py_MEMCPY(p, PyString_AS_STRING(item),sz);
        p += sz;
        /**** PREFETCH end, memcpy the first item first, since here we can assume seqleng >= 2 ****/

        for (i=1; i<seqlen; ++i){ /**** here we also loop start from 1 ****/
            item = PySequence_Fast_GET_ITEM(seq, i);
            sz = PyString_GET_SIZE(item);
            Py_MEMCPY(p, sep, seplen); /**** avoid test (i < seqlen - 1) each loop in old implement ****/ 
            p += seplen;
            Py_MEMCPY(p, PyString_AS_STRING(item),sz);
            p += sz;
        }
        return res;
msg167123 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-01 13:04
Hi, several points:

- Python 2.7 is in bugfix mode; you need to work from the default Mercurial branch as explained in http://docs.python.org/devguide/ . In practice, this means your patch will target the future Python 3.4, and therefore either PyUnicode_Join or _PyBytes_Join.

- please provide an actual patch (a Mercurial diff, probably)

- please provide benchmarks (using e.g. timeit) which demonstrate the performance improvement you are proposing
msg167398 - (view) Author: abael (abael) Date: 2012-08-04 08:00
added my implement( with some enhancement, got better performance, at less
for my apps. ).

test result:
with small chunk of str, got double performanc,
and 111% for big chunks;  it

## Util funcion for text definition:
def pf(f,n):
    a=time()
    for i in xrange(n):
        f()
    b=time()
    print  n/(b-a)

#############################################################################
>>> s=['abcd',u'\u548c\u6613\u541b'] * 1000
>>> pf(lambda:''.join(s), 10000)
2289.28293164
>>> pf(lambda:text.join('',s), 10000)
4457.27947763
>>> s=['abcd'*1000,u'\u548c\u6613\u541b'*1000] * 1000
>>> pf(lambda:''.join(s), 100)
15.2374868484
>>> pf(lambda:text.join('',s), 100)
16.939913023

##############################################################################

2012/8/1 Antoine Pitrou <report@bugs.python.org>

>
> Antoine Pitrou added the comment:
>
> Hi, several points:
>
> - Python 2.7 is in bugfix mode; you need to work from the default
> Mercurial branch as explained in http://docs.python.org/devguide/ . In
> practice, this means your patch will target the future Python 3.4, and
> therefore either PyUnicode_Join or _PyBytes_Join.
>
> - please provide an actual patch (a Mercurial diff, probably)
>
> - please provide benchmarks (using e.g. timeit) which demonstrate the
> performance improvement you are proposing
>
> ----------
> nosy: +pitrou
> versions: +Python 3.4 -Python 2.7
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue15522>
> _______________________________________
>
msg167399 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-08-04 08:13
Please port your code to Python 3.3 and compare with it. Python 3.3 implementation of str.join() already more than twice faster then Python 2.7. Maybe your optimization will have no effect.
msg167528 - (view) Author: abael (abael) Date: 2012-08-06 01:28
do i need to fully maintain Python ?

2012/8/5 Serhiy Storchaka <report@bugs.python.org>

>
> Changes by Serhiy Storchaka <storchaka@gmail.com>:
>
>
> ----------
> status: open -> pending
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue15522>
> _______________________________________
>
msg167529 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-06 01:40
We are happy that you wish to contribute; however, our policy is that performance enhancements only go in to the newest version of Python, which means any submitted patch must be against the 'default' branch of the Python mercurial repository.  If you wish to propose a patch against default, we will be happy to review it.  If you do not wish to do so, we will close the issue, and your patch will remain available here on the tracker to anyone who wishes to apply it by hand to their own Python 2.7 installation.
msg173439 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-21 10:31
I'm closing this issue, since the OP since to have lost interest in his/her proposal. Abael, if you want to propose an actual patch, please open a new issue.
History
Date User Action Args
2022-04-11 14:57:33adminsetgithub: 59727
2012-10-21 10:31:44pitrousetstatus: open -> closed
resolution: rejected
messages: + msg173439
2012-10-21 05:21:41Ramchandra Aptesetstatus: pending -> open
title: impove 27 percent performance on stringpbject.c( by prefetch and loop optimization) -> improve 27 percent performance on stringpbject.c (by prefetch and loop optimization)
2012-10-19 22:13:54serhiy.storchakasetstatus: open -> pending
2012-08-06 01:40:35r.david.murraysetnosy: + r.david.murray
messages: + msg167529
2012-08-06 01:28:06abaelsetstatus: pending -> open

messages: + msg167528
2012-08-05 11:20:39serhiy.storchakasetstatus: open -> pending
2012-08-04 08:13:45serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg167399
2012-08-04 08:00:21abaelsetfiles: + stringjoin.c

messages: + msg167398
2012-08-01 13:42:29jceasetnosy: + jcea
2012-08-01 13:04:55pitrousetnosy: + pitrou

messages: + msg167123
versions: + Python 3.4, - Python 2.7
2012-08-01 09:35:06abaelsetcomponents: + Interpreter Core, - Library (Lib)
2012-08-01 09:33:52abaelcreate