Message 346202 - Python tracker

Author	methane
Recipients	methane, serhiy.storchaka, vstinner
Date	2019-06-21.10:30:07
Message-id	<1561113007.49.0.331540252694.issue37348@roundup.psfhosted.org>

Content

> I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management.

I misunderstood _PyUnicodeWriter: I thought it caused one extra allocation, but it doesn't.

But _PyUnicodeWriter is still slow, because gcc and clang are not smart enough to optimize the _PyUnicodeWriter_Init() and _PyUnicodeWriter_Prepare() calls.
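To make the init/prepare steps concrete, here is a minimal standalone sketch of a writer-style string builder. It is hypothetical: `writer`, `writer_init`, `writer_prepare`, `writer_write`, `writer_finish` and `copy_via_writer` are illustrative names, not CPython's `_PyUnicodeWriter`, and a plain `char` buffer stands in for a `str` object. With overallocation off by default, building a string of known length still performs exactly one allocation; what a caller pays for is the extra init/prepare/finish bookkeeping on every call, which this sketch shows but does not measure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer (NOT CPython's _PyUnicodeWriter): a byte buffer plus a
   write position, grown on demand by a "prepare" step. */
typedef struct {
    char *buf;
    size_t pos;        /* bytes written so far */
    size_t cap;        /* usable bytes allocated (excluding the NUL slot) */
    int overallocate;  /* 0 by default: grow to exactly the requested size */
} writer;

static size_t writer_allocs = 0;   /* counts successful (re)allocations */

static void writer_init(writer *w)
{
    w->buf = NULL;
    w->pos = 0;
    w->cap = 0;
    w->overallocate = 0;
}

/* Make room for n more bytes. Returns 0 on success, -1 on allocation failure. */
static int writer_prepare(writer *w, size_t n)
{
    size_t need = w->pos + n;
    if (need <= w->cap)
        return 0;                              /* already fits: no allocation */
    size_t newcap = w->overallocate ? need + need / 4 : need;
    char *p = realloc(w->buf, newcap + 1);     /* +1 for the NUL terminator */
    if (p == NULL)
        return -1;
    w->buf = p;
    w->cap = newcap;
    writer_allocs++;
    return 0;
}

static int writer_write(writer *w, const char *s, size_t n)
{
    if (n == 0)
        return 0;                              /* nothing to copy */
    if (writer_prepare(w, n) < 0)
        return -1;
    assert(w->pos + n <= w->cap);
    memcpy(w->buf + w->pos, s, n);
    w->pos += n;
    return 0;
}

/* Hand the NUL-terminated buffer to the caller and reset the writer. */
static char *writer_finish(writer *w)
{
    if (w->buf == NULL) {
        w->buf = malloc(1);                    /* empty result: only the NUL */
        if (w->buf == NULL)
            return NULL;
    }
    w->buf[w->pos] = '\0';
    char *out = w->buf;
    writer_init(w);
    return out;
}

/* Copy s through the writer: init, prepare (inside write), write, finish. */
static char *copy_via_writer(const char *s, size_t n)
{
    writer w;
    writer_init(&w);
    if (writer_write(&w, s, n) < 0) {
        free(w.buf);
        return NULL;
    }
    return writer_finish(&w);
}
```

Calling `copy_via_writer("hello", 5)` with the default `overallocate == 0` does a single `realloc`, which matches the point that the writer adds no extra allocation; the remaining overhead is the sequence of helper calls and branches around it.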

See this example:

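```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <string.h>   /* strlen */

/* Expand a string literal into the (buffer, length) argument pair. */
#define S(s) (s),strlen(s)

int
main(int argc, char *argv[])
{
    Py_Initialize();

    /* Create and destroy a short ASCII string 10^8 times.
       _PyUnicode_FromASCII() is a private CPython function; swap the two
       lines below to benchmark PyUnicode_FromString() instead. */
    for (int i=0; i<100000000; i++) {
        //PyObject *s = PyUnicode_FromString("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
        PyObject *s = _PyUnicode_FromASCII(S("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
        Py_DECREF(s);
    }
    return 0;
}
```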

With this loop, PyUnicode_FromString() takes about 4 s on my machine and _PyUnicode_FromASCII() about 2 s.
Skipping _PyUnicodeWriter for ASCII strings (GH-14283) brings PyUnicode_FromString() down to about 3 s.
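The GH-14283 idea, as described here, is to bypass _PyUnicodeWriter when the input is pure ASCII. Below is a minimal standalone sketch of that kind of fast path, assuming a plain `char` buffer stands in for a Python `str` object. `is_ascii` and `decode_ascii_fast_path` are hypothetical names, the general UTF-8 fallback is not shown, and the actual patch may be organized differently (for example, checking and copying in a single pass).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if all n bytes of s are 7-bit ASCII, else 0. */
static int is_ascii(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((unsigned char)s[i] >= 0x80)
            return 0;
    }
    return 1;
}

/* Hypothetical ASCII fast path: for pure-ASCII input, allocate the result
   once at its exact size and copy it, with no writer init/prepare steps.
   For non-ASCII input, sets *handled to 0 and returns NULL so the caller
   can fall back to a general decoder (not shown). */
static char *decode_ascii_fast_path(const char *s, size_t n, int *handled)
{
    assert(s != NULL && handled != NULL);
    if (!is_ascii(s, n)) {
        *handled = 0;
        return NULL;
    }
    *handled = 1;
    char *out = malloc(n + 1);       /* exact size, +1 for the NUL */
    if (out == NULL)
        return NULL;                 /* allocation failure, not a fallback */
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}
```

For ASCII input the work is one scan, one allocation and one copy; non-ASCII input pays for scanning up to its first non-ASCII byte and then takes the general path.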

```
$ time ./x  # PyUnicode_FromString

real    0m4.085s
user    0m4.081s
sys     0m0.004s

$ time ./y  # PyUnicode_FromString (skip _PyUnicodeWriter, GH-14283)

real    0m2.988s
user    0m2.988s
sys     0m0.000s

$ time ./z  # _PyUnicode_FromASCII

real    0m1.975s
user    0m1.975s
sys     0m0.000s
```
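Dividing each `real` time by the 10^8 loop iterations gives an approximate per-iteration cost. Each iteration also includes Py_DECREF() and loop overhead, so these are rough figures:

```latex
\begin{aligned}
\texttt{PyUnicode\_FromString()}:&\quad 4.085\,\mathrm{s} / 10^{8} \approx 41\,\mathrm{ns}\\
\texttt{PyUnicode\_FromString()}\ \text{(GH-14283)}:&\quad 2.988\,\mathrm{s} / 10^{8} \approx 30\,\mathrm{ns}\\
\texttt{\_PyUnicode\_FromASCII()}:&\quad 1.975\,\mathrm{s} / 10^{8} \approx 20\,\mathrm{ns}
\end{aligned}
```

So skipping the writer saves roughly 11 ns per iteration, and roughly 10 ns per iteration still separates the patched PyUnicode_FromString() from _PyUnicode_FromASCII().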
