classification
Title: See if PyToken_OneChar would be faster as a lookup table
Type: enhancement Stage: resolved
Components: Interpreter Core Versions:
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: pablogsal, petdance, serhiy.storchaka
Priority: normal Keywords:

Created on 2019-12-29 01:45 by petdance, last changed 2020-02-08 21:52 by petdance. This issue is now closed.

Messages (9)
msg358975 - (view) Author: Andy Lester (petdance) * Date: 2019-12-29 01:45
PyToken_OneChar in Parser/token.c is autogenerated.  I suspect it may be faster and smaller if it were a lookup into a static table of ops rather than a switch statement.  Check to see if it is.
msg358980 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-29 05:19
> PyToken_OneChar in Parser/token.c is autogenerated.  I suspect it may be faster and smaller if it were a lookup into a static table of ops rather than a switch statement.  Check to see if it is.

I suspect that it will be marginally faster (because it won't need to do bound checks to go to the default case) but under PGO it will make almost no difference.

You can try to see if there is any significant improvement using the pyperformance suite (https://pyperformance.readthedocs.io/) with and without your change (with PGO/LTO activated).
msg358982 - (view) Author: Andy Lester (petdance) * Date: 2019-12-29 06:19
Thank you. I appreciate the pointer.
msg359650 - (view) Author: Andy Lester (petdance) * Date: 2020-01-09 02:59
I tried out some experimenting with the lookup table vs. the switch
statement.

The relevant diff (not including the patches to the code generator) is:


--- Parser/token.c
+++ Parser/token.c
@@ -77,31 +77,36 @@
 int
 PyToken_OneChar(int c1)
 {
-    switch (c1) {
-    case '%': return PERCENT;
-    case '&': return AMPER;
-    case '(': return LPAR;
-    case ')': return RPAR;
-    case '*': return STAR;
-    case '+': return PLUS;
-    case ',': return COMMA;
-    case '-': return MINUS;
-    case '.': return DOT;
-    case '/': return SLASH;
-    case ':': return COLON;
-    case ';': return SEMI;
-    case '<': return LESS;
-    case '=': return EQUAL;
-    case '>': return GREATER;
-    case '@': return AT;
-    case '[': return LSQB;
-    case ']': return RSQB;
-    case '^': return CIRCUMFLEX;
-    case '{': return LBRACE;
-    case '|': return VBAR;
-    case '}': return RBRACE;
-    case '~': return TILDE;
-    }
+    static char op_lookup[] = {
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        PERCENT,   AMPER,     OP,
+        LPAR,      RPAR,      STAR,      PLUS,      COMMA,
+        MINUS,     DOT,       SLASH,     OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        COLON,     SEMI,
+        LESS,      EQUAL,     GREATER,   OP,        AT,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        LSQB,      OP,        RSQB,      CIRCUMFLEX,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        LBRACE,    VBAR,
+        RBRACE,    TILDE
+    };
+    if (c1>=37 && c1<=126)
+        return op_lookup[c1];
     return OP;
 }

To test the speed change, I couldn't use pyperformance, because the only
thing I wanted to time was the In my testing, I didn't use pyperformance
because the only part of the code I wanted to test was the actual
compilation of the code.  My solution for this was to find the 100 largest
*.py files in the cpython repo and compile them like so:

    python -m py_compile $(List-of-big-*.py-files)

The speedup was significant: My table-driven lookup ran the compile tests
about 10% than the existing switch approach.  That was without
--enable-optimizations in my configure.

However, as pablogsal suspected, with PGO enabled, the two approaches ran
the code in pretty much the same speed.

I do think that there may be merit in using a table-driven approach that
generates less code and doesn't rely on PGO speeding things up.

If anyone's interested, all my work is on branch Issue39150 in my fork
petdance/cpython.
msg359663 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-01-09 10:31
I will take a look and try to reproduce the benchmarks result you mentioned, but if under PGO there is no substantial performance increase is much less appealing as the lookup table code is less clear and maintainable than the simple switch. Also you have a branch that is a good candidate to trigger the same branch mispredictions as the default of the switch (c1>=37 && c1<=126).
msg359664 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-09 10:36
The difference around 10% looks very strange. Tokenizing is just one of parts of the compiler, and unlikely it is a narrow way. There are input/output, many memory allocations, encoding and decoding, many iterations and recursions. Are you sure that you run benchmarks in the same environment, multiple times, and got stable results? Could you provide a script that reproduces this?
msg359684 - (view) Author: Andy Lester (petdance) * Date: 2020-01-09 15:09
Yes, I ran it multiple times on my 2013 Macbook Pro and got ~10% speedup.  I also ran it on my Linux VM (that I only use for development) and got a speedup but less so.

The code I used to run the tests is at: https://github.com/petdance/cpython/blob/Issue39150/timetests

LOG=timelog.txt
NRUNS=100
NFILES=100

if [ -x ./python.exe ] ; then
    PYTHON=./python.exe
else
    PYTHON=./python
fi

# Build a file list, but throw out some that have syntax errors.
# Use the 100 biggest files.
FILES=$(
    find . -name '*.py' -type f -ls \
    | grep -v Tools/test2to3/ \
    | grep -v Lib/lib2to3/tests/ \
    | grep -v Lib/test/badsyntax \
    | grep -v Lib/test/bad_coding \
    | sort -r -n -k7 \
    | awk '{print $11}' \
    | head -n $NFILES
)

echo "Compiling $NFILES files for $NRUNS iterations"

rm -f $LOG

for (( i=1; i<=$NRUNS; i++ )) do
    echo "Run $i"
    { time $PYTHON -m py_compile $FILES; } >> $LOG 2>&1
done

perl -ne'if(/real\s+0m([\d.]+)s/){$t+=$1;$n++} END { printf "%d runs, average %0.4f\n", $n, $t/$n}' $LOG
msg359685 - (view) Author: Andy Lester (petdance) * Date: 2020-01-09 15:16
Re: branch prediction.

The branch

    if (c1>=37 && c1<=126)

could just as easily be

    if (c1>=0 && c1<=126)

with no other changes to the code.  It could be just

    if (c1<=126)

if c1 wasn't signed.

As far as branch prediction, we could make it something like

    if (c1>=0 && c1<=255)

and expand the table lookup, depending on what the likely inputs are.  I could play around with that some if anyone was interested.

I'm not trying to push on this.  Just sharing some thoughts and research.
msg361637 - (view) Author: Andy Lester (petdance) * Date: 2020-02-08 21:52
I'm closing this as it seems there's not much interest in this.
History
Date User Action Args
2020-02-08 21:52:56petdancesetstatus: open -> closed

messages: + msg361637
stage: resolved
2020-01-09 15:16:43petdancesetmessages: + msg359685
2020-01-09 15:09:29petdancesetmessages: + msg359684
2020-01-09 10:36:03serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg359664
2020-01-09 10:31:22pablogsalsetmessages: + msg359663
2020-01-09 02:59:51petdancesetmessages: + msg359650
2019-12-29 06:19:06petdancesetmessages: + msg358982
2019-12-29 05:19:11pablogsalsetnosy: + pablogsal
messages: + msg358980
2019-12-29 01:45:21petdancecreate