classification
Title: Expose faster unicode<->ascii functions in the C-API
Type: performance
Stage: resolved
Components: Unicode
Versions: Python 3.3
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, jcea, loewis, pitrou, skrah, vstinner
Priority: normal
Keywords:

Created on 2011-12-09 21:12 by skrah, last changed 2011-12-11 23:44 by pitrou. This issue is now closed.

Messages (6)
msg149124 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-12-09 21:12
I just ran the telco benchmark ...

  http://www.bytereef.org/mpdecimal/quickstart.html#telco-benchmark

... on _decimal to see how the PEP-393 changes affect the module.
The benchmark reads numbers from a binary file, does some calculations
and prints the result strings to a file.


Average results (10 iterations each):

Python 2.7:            5.87s
Revision 1726fa560112: 6.07s
Revision 7ffe3d304487: 6.56s


The bottleneck in telco.py is the line that writes a Decimal to the
output file:

  outfil.write("%s\n" % t)

The bottleneck in _decimal is (res is ascii):

   PyUnicode_FromString(res);

PyUnicode_DecodeASCII(res) has the same performance.


With this function ...

  static PyObject *
  unicode_fromascii(const char *s, Py_ssize_t size)
  {
      PyObject *res;
      res = PyUnicode_New(size, 127);
      if (!res)
          return NULL;
      memcpy(PyUnicode_1BYTE_DATA(res), s, size);
      return res;
  }

... I get the same performance as with Python 2.7 (5.85s)!


I think it would be really beneficial for C-API users to have
more ascii low level functions that don't do error checking and
are simply as fast as possible.
msg149151 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-12-10 13:13
On 09/12/2011 22:12, Stefan Krah wrote:
> The bottleneck in _decimal is (res is ascii):
>
>     PyUnicode_FromString(res);
>
> PyUnicode_DecodeASCII(res) has the same performance.
>
>
> With this function ...
>
>   static PyObject *
>   unicode_fromascii(const char *s, Py_ssize_t size)
>   {
>       PyObject *res;
>       res = PyUnicode_New(size, 127);
>       if (!res)
>           return NULL;
>       memcpy(PyUnicode_1BYTE_DATA(res), s, size);
>       return res;
>   }
>
> ... I get the same performance as with Python 2.7 (5.85s)!

The problem is that unicode_fromascii() is unsafe: it doesn't check
that the string is pure ASCII. That's why this function is private.

Because of PEP 393, the ASCII and UTF-8 decoders (PyUnicode_DecodeASCII
and PyUnicode_FromString) have to first scan the input to check for
errors, and then do a fast memcpy. The scanner of these two decoders is
already optimized to process the input string word by word (word = the C
long type), instead of byte by byte, using a bit mask.
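
To make the word-by-word scan concrete, here is a rough sketch in Python rather than C. It is only an illustration of the bit-mask idea, not the code CPython uses: the 8-byte word size (a C long on many 64-bit platforms) and the names is_ascii_wordwise and decode_ascii_checked are assumptions made for this example.

```python
WORD = 8                      # assumed word size: 8 bytes
MASK = 0x8080808080808080     # high bit of every byte in an 8-byte word


def is_ascii_wordwise(data: bytes) -> bool:
    """Return True if every byte in data is below 0x80."""
    full = len(data) - len(data) % WORD
    # Check WORD bytes at a time: any set high bit means a non-ASCII byte.
    for i in range(0, full, WORD):
        word = int.from_bytes(data[i:i + WORD], "little")
        if word & MASK:
            return False
    # Check the remaining tail byte by byte.
    for byte in data[full:]:
        if byte & 0x80:
            return False
    return True


def decode_ascii_checked(data: bytes) -> str:
    """Scan first, then decode, mirroring the scan-then-copy order."""
    if not is_ascii_wordwise(data):
        raise ValueError("input contains bytes outside the ASCII range")
    return data.decode("ascii")
```

In C the same test is a single AND per word, which is why the scan is cheap compared with a byte-by-byte loop, but it is still a full pass over the input that the unchecked memcpy version skips.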

--

You can write your own super fast ASCII decoder using two lines:

     res = PyUnicode_New(size, 127);
     memcpy(PyUnicode_1BYTE_DATA(res), s, size);

(this is exactly what unicode_fromascii does)

> I think it would be really beneficial for C-API users to have
> more ascii low level functions that don't do error checking and
> are simply as fast as possible.

It is really important to ensure that an ASCII string doesn't contain
characters outside [U+0000; U+007F] because many operations on ASCII
strings are optimized (e.g. the UTF-8 pointer is shared with the ASCII pointer).
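
A small Python sketch of why the shared pointer matters. It only demonstrates the encoding property involved and does not touch CPython internals; the helper name is_valid_utf8 and the sample values are made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# For pure ASCII text, the ASCII bytes and the UTF-8 bytes are identical,
# so one buffer can serve both purposes.
ascii_text = "1234.56"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A buffer containing a byte above 0x7F is a different story: 0xE9 followed
# by an ASCII digit is not a valid UTF-8 sequence.
corrupted = b"12\xe94.56"
corrupted_is_utf8 = is_valid_utf8(corrupted)
```

Here same_bytes is True and corrupted_is_utf8 is False. If an unchecked copy put such bytes into a string flagged as ASCII, any code that treats the same buffer as UTF-8 would be handed invalid data.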

I prefer not to expose such a function, or someone will use it without
understanding exactly how dangerous it is.

Martin and others may disagree with me.

Do you know Murphy's Law? :-)
http://en.wikipedia.org/wiki/Murphy%27s_law
msg149159 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-12-10 14:16
> I prefer to not expose such function or someone will use it without
> understanding exactly how dangerous it is.

OK. - I'm afraid that I made an error in the benchmarks, since I
accidentally used a changed version of telco.py, namely:

    # t is a Decimal
    outfil.write("%s\n" % t) # original version
    ...

    outfil.write(str(t))     # changed version runs 
    outfil.write('\n')       # faster since PEP-393
    ...

Since PEP-393 the changed version with two calls to write()
runs quite a bit faster than the original. For Python-3.2
and 2.7 the original runs faster.
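
A minimal sketch for timing the two variants in isolation. It assumes an in-memory io.StringIO target and a single fixed Decimal value instead of the benchmark's input and output files, so the numbers will not match telco.py; the function names and the iteration count are chosen for this example only.

```python
import io
import timeit
from decimal import Decimal

t = Decimal("1234.56")  # stand-in for one result value


def write_formatted(out, value):
    # Original version: one write() of a %-formatted string.
    out.write("%s\n" % value)


def write_split(out, value):
    # Changed version: two write() calls, no string formatting.
    out.write(str(value))
    out.write("\n")


def bench(func, number=50_000):
    """Return the total time in seconds for `number` calls of func."""
    out = io.StringIO()
    return timeit.timeit(lambda: func(out, t), number=number)


if __name__ == "__main__":
    print("formatted: %.4fs" % bench(write_formatted))
    print("split:     %.4fs" % bench(write_split))
```

Both functions produce the same output, so any difference measured is down to how the string is built and written, not to what ends up in the file.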

Do you have an idea what could cause this?
msg149253 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-12-11 22:22
It's reasonable that string % formatting might have become slower...

I wonder what the issue is at this point. Unless you can state a clear issue that you want to see resolved, I propose to close this report as invalid.
msg149260 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-12-11 23:36
Sorry, the title of the issue isn't correct any more. The revised
issue is that in 3.3

a) outfil.write("%s\n" % t)

is about 11% slower than in Python 2.7 and 8% slower than in Python 3.2.


On the other hand in 3.3 the hack

b) outfil.write(str(t)); outfil.write('\n') 

runs about as fast as a) in 3.2.


This doesn't necessarily show up in microbenchmarks with timeit, so
I thought I'd leave this open for others to see (and comment).

But if I understand correctly, the slowdown in string formatting is
expected, so we can indeed close this.
msg149261 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-12-11 23:44
> But if I understand correctly, the slowdown in string formatting is
> expected, so we can indeed close this.

Well, expected doesn't mean it shouldn't be improved, so finding a way to speed it up would be nice ;)
(probably difficult, though)
History
Date                 User      Action  Args
2011-12-11 23:44:24  pitrou    set     nosy: + pitrou; messages: + msg149261
2011-12-11 23:36:34  skrah     set     status: open -> closed; resolution: not a bug; messages: + msg149260; stage: resolved
2011-12-11 22:22:27  loewis    set     messages: + msg149253
2011-12-10 14:16:46  skrah     set     messages: + msg149159
2011-12-10 13:13:49  vstinner  set     messages: + msg149151
2011-12-09 22:30:54  jcea      set     nosy: + jcea
2011-12-09 21:12:31  skrah     create