I did some experimenting with a lookup table vs. the switch
statement.
The relevant diff (not including the patches to the code generator) is:
--- Parser/token.c
+++ Parser/token.c
@@ -77,31 +77,36 @@
int
PyToken_OneChar(int c1)
{
- switch (c1) {
- case '%': return PERCENT;
- case '&': return AMPER;
- case '(': return LPAR;
- case ')': return RPAR;
- case '*': return STAR;
- case '+': return PLUS;
- case ',': return COMMA;
- case '-': return MINUS;
- case '.': return DOT;
- case '/': return SLASH;
- case ':': return COLON;
- case ';': return SEMI;
- case '<': return LESS;
- case '=': return EQUAL;
- case '>': return GREATER;
- case '@': return AT;
- case '[': return LSQB;
- case ']': return RSQB;
- case '^': return CIRCUMFLEX;
- case '{': return LBRACE;
- case '|': return VBAR;
- case '}': return RBRACE;
- case '~': return TILDE;
- }
+ static char op_lookup[] = {
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, PERCENT, AMPER, OP,
+ LPAR, RPAR, STAR, PLUS, COMMA,
+ MINUS, DOT, SLASH, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, COLON, SEMI,
+ LESS, EQUAL, GREATER, OP, AT,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, LSQB, OP, RSQB, CIRCUMFLEX,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, OP, OP,
+ OP, OP, OP, LBRACE, VBAR,
+ RBRACE, TILDE
+ };
+ if (c1>=37 && c1<=126)
+ return op_lookup[c1];
return OP;
}
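For readers who want to try the idea outside the CPython tree, here is a minimal standalone sketch of the same table-driven lookup. It is not the patch itself: the TOK_* names and values, op_table, and one_char_token are hypothetical stand-ins, only four operators are shown, and it uses C99 designated initializers instead of listing every slot by hand.

```c
/* Hypothetical token codes for illustration only; the real values
 * come from CPython's token definitions. */
enum { TOK_OP = 0, TOK_PERCENT, TOK_AMPER, TOK_LPAR, TOK_RPAR };

/* Slots without an initializer are zero-filled, so the default
 * token must be 0 for this layout to work. */
_Static_assert(TOK_OP == 0, "default table entry must be TOK_OP");

/* One entry per 7-bit ASCII code; only the operators listed here
 * get a non-default value. */
static const unsigned char op_table[128] = {
    ['%'] = TOK_PERCENT,
    ['&'] = TOK_AMPER,
    ['('] = TOK_LPAR,
    [')'] = TOK_RPAR,
};

/* Return the token for a single character, or TOK_OP if the
 * character is outside the table or has no dedicated token. */
static int
one_char_token(int c)
{
    if (c >= 0 && c < 128)
        return op_table[c];
    return TOK_OP;
}
```

With designated initializers the table is indexed by the character literal, so there is no need to count positions to place an entry. The bounds check plays the same role as the c1>=37 && c1<=126 test in the diff.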
To test the speed change, I couldn't use pyperformance, because the only
thing I wanted to time was the actual compilation of the code. My
solution was to find the 100 largest *.py files in the cpython repo and
compile them like so:
python -m py_compile $(List-of-big-*.py-files)
The speedup was significant: my table-driven lookup ran the compile tests
about 10% faster than the existing switch approach. That was without
--enable-optimizations in my configure.
However, as pablogsal suspected, with PGO enabled, the two approaches ran
the code at pretty much the same speed.
I do think that there may be merit in using a table-driven approach that
generates less code and doesn't rely on PGO speeding things up.
If anyone's interested, all my work is on branch Issue39150 in my fork
petdance/cpython.