
Author petdance
Recipients pablogsal, petdance
Date 2020-01-09.02:59:50
Content
I did some experimenting with the lookup table vs. the switch statement.

The relevant diff (not including the patches to the code generator) is:


--- Parser/token.c
+++ Parser/token.c
@@ -77,31 +77,36 @@
 int
 PyToken_OneChar(int c1)
 {
-    switch (c1) {
-    case '%': return PERCENT;
-    case '&': return AMPER;
-    case '(': return LPAR;
-    case ')': return RPAR;
-    case '*': return STAR;
-    case '+': return PLUS;
-    case ',': return COMMA;
-    case '-': return MINUS;
-    case '.': return DOT;
-    case '/': return SLASH;
-    case ':': return COLON;
-    case ';': return SEMI;
-    case '<': return LESS;
-    case '=': return EQUAL;
-    case '>': return GREATER;
-    case '@': return AT;
-    case '[': return LSQB;
-    case ']': return RSQB;
-    case '^': return CIRCUMFLEX;
-    case '{': return LBRACE;
-    case '|': return VBAR;
-    case '}': return RBRACE;
-    case '~': return TILDE;
-    }
+    static char op_lookup[] = {
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        PERCENT,   AMPER,     OP,
+        LPAR,      RPAR,      STAR,      PLUS,      COMMA,
+        MINUS,     DOT,       SLASH,     OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        COLON,     SEMI,
+        LESS,      EQUAL,     GREATER,   OP,        AT,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        LSQB,      OP,        RSQB,      CIRCUMFLEX,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        OP,        OP,
+        OP,        OP,        OP,        LBRACE,    VBAR,
+        RBRACE,    TILDE
+    };
+    if (c1>=37 && c1<=126)
+        return op_lookup[c1];
     return OP;
 }
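
As an aside, here is a minimal standalone sketch for anyone who wants to
poke at the two dispatch strategies outside of CPython.  It uses stand-in
token values rather than the real ones from token.h, and it is not the
py_compile benchmark described below; it just measures the same
switch-vs-table idea in isolation:

/* Standalone sketch only: stand-in token values, not CPython's, and only
 * a handful of the one-character operators. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

enum { OP, PERCENT, AMPER, LPAR, RPAR, STAR, PLUS };

static int one_char_switch(int c1)
{
    switch (c1) {
    case '%': return PERCENT;
    case '&': return AMPER;
    case '(': return LPAR;
    case ')': return RPAR;
    case '*': return STAR;
    case '+': return PLUS;
    }
    return OP;
}

static int one_char_table(int c1)
{
    /* OP is 0, so every slot not listed here defaults to OP. */
    static const char op_lookup[128] = {
        ['%'] = PERCENT, ['&'] = AMPER, ['('] = LPAR,
        [')'] = RPAR,    ['*'] = STAR,  ['+'] = PLUS,
    };
    if (c1 >= 37 && c1 <= 126)
        return op_lookup[c1];
    return OP;
}

int main(void)
{
    /* Pre-fill a buffer so rand() stays out of the timed loops. */
    static unsigned char buf[1 << 20];
    long sum_sw = 0, sum_tab = 0;
    clock_t t0, t1;

    srand(12345);
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(rand() & 0x7f);

    t0 = clock();
    for (int pass = 0; pass < 200; pass++)
        for (size_t i = 0; i < sizeof buf; i++)
            sum_sw += one_char_switch(buf[i]);
    t1 = clock();
    printf("switch: %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (int pass = 0; pass < 200; pass++)
        for (size_t i = 0; i < sizeof buf; i++)
            sum_tab += one_char_table(buf[i]);
    t1 = clock();
    printf("table:  %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* The sums double as a correctness check and keep the loops from
     * being optimized away. */
    printf("sums match: %s\n", sum_sw == sum_tab ? "yes" : "no");
    return 0;
}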

To test the speed change, I couldn't use pyperformance, because the only
part of the code I wanted to time was the actual compilation of the code.
My solution was to find the 100 largest *.py files in the cpython repo and
compile them like so:

    python -m py_compile $(List-of-big-*.py-files)
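
One way to build that file list is a pipeline like this (GNU find; just a
sketch of the selection step, not necessarily the exact command I used):

    find . -name '*.py' -printf '%s %p\n' | sort -rn | head -100 | cut -d' ' -f2-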

The speedup was significant: my table-driven lookup ran the compile tests
about 10% faster than the existing switch approach.  That was without
--enable-optimizations in my configure.

However, as pablogsal suspected, with PGO enabled, the two approaches ran
the code at pretty much the same speed.
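
For reference, the two builds being compared here are just the standard
CPython configure invocations, roughly:

    ./configure && make                          # default build, no PGO
    ./configure --enable-optimizations && make   # PGO build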

I do think there may be merit in a table-driven approach that generates
less code and doesn't rely on PGO to speed things up.

If anyone's interested, all my work is on branch Issue39150 in my fork
petdance/cpython.