classification
Title: Run parser twice; enable invalid_* rules only on the second run
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: lys.nikolaou
Nosy List: gvanrossum, lys.nikolaou, pablogsal, terry.reedy
Priority: normal
Keywords: patch

Created on 2020-10-22 23:35 by lys.nikolaou, last changed 2020-10-28 00:14 by lys.nikolaou. This issue is now closed.

Pull Requests
URL Status Linked
PR 22111 merged lys.nikolaou, 2020-10-22 23:37
PR 23011 merged lys.nikolaou, 2020-10-27 23:49
Messages (5)
msg379384 - Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-10-22 23:35
We can avoid going through all the invalid rules if we run the parser twice. This might be a significant performance boost, since these rules may call expensive rules like primary.

On the first run, all the invalid rules are disabled and do not get expanded. If a parse failure occurs anywhere, then we run the parser a second time with all these rules enabled, in order to get the correct error message.

Some benchmarking by Pablo shows a ~4% speedup in the stdlib benchmark and a ~10% speedup in the xxl benchmark.
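
To illustrate the two-run scheme described in this message, here is a minimal sketch using a toy grammar for sums such as `1 + 2 + 3`. It is not CPython's parser: `ToyParser`, `parse`, the `call_invalid_rules` flag and the error messages are all hypothetical names chosen for this example. The first run skips the error-diagnosing branches entirely; only when it fails is a second run made with those branches enabled.

```python
class ToyParser:
    """Hypothetical toy parser for sums such as ``1 + 2 + 3``."""

    def __init__(self, tokens, call_invalid_rules=False):
        self.tokens = tokens
        self.pos = 0
        # When False, the "invalid" branches below are never taken.
        self.call_invalid_rules = call_invalid_rules

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_sum(self):
        # sum: NUMBER ('+' NUMBER)*
        if self.peek() is None or not self.peek().isdigit():
            return None
        total = int(self.tokens[self.pos])
        self.pos += 1
        while self.peek() == "+":
            self.pos += 1
            if self.peek() is None or not self.peek().isdigit():
                # "Invalid rule": only consulted on the second run.
                if self.call_invalid_rules:
                    raise SyntaxError("expected a number after '+'")
                return None
            total += int(self.tokens[self.pos])
            self.pos += 1
        if self.peek() is not None:
            # "Invalid rule": two numbers with no operator between them.
            if self.call_invalid_rules and self.peek().isdigit():
                raise SyntaxError("missing '+' between numbers")
            return None
        return total


def parse(source):
    tokens = source.split()
    # First run: invalid rules disabled, so valid input pays nothing for them.
    result = ToyParser(tokens).parse_sum()
    if result is not None:
        return result
    # Second run: same tokens, invalid rules enabled to find a specific error.
    ToyParser(tokens, call_invalid_rules=True).parse_sum()
    # No invalid rule matched: fall back to a generic message.
    raise SyntaxError("invalid syntax")
```

In this sketch, valid input such as `parse("1 + 2 + 3")` never evaluates a diagnostic branch, while `parse("1 2")` fails the first run and gets its specific message from the second.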
msg379508 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-10-24 01:17
Since I do a lot of interactive compiling, I appreciate faster feedback.  How much will the slowdown be on errors?
msg379697 - Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-10-26 22:41
We do not have a big corpus of SyntaxErrors to test against, but manual testing shows no slowdown: we ran a file with a SyntaxError after a long, complex line 1000 times.

We keep the token stream for the second run, so we don't need to run the tokenizer all over again and the parsing is done much more quickly.
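
The token reuse mentioned above can be sketched as a lazy buffer that both runs share. This is an illustration only, not the CPython implementation: `TokenBuffer`, `scan` and the `fetched` counter are hypothetical, and the standard-library `tokenize` module stands in for the C tokenizer. The counter shows that the second run asks the tokenizer for nothing new.

```python
import io
import tokenize


class TokenBuffer:
    """Hypothetical lazy token cache shared by both parser runs."""

    def __init__(self, source):
        self._gen = tokenize.generate_tokens(io.StringIO(source).readline)
        self.tokens = []   # tokens fetched so far, kept across runs
        self.fetched = 0   # number of tokens pulled from the tokenizer

    def get(self, index):
        # Pull from the tokenizer only for positions not seen before.
        while index >= len(self.tokens):
            self.tokens.append(next(self._gen))
            self.fetched += 1
        return self.tokens[index]


def scan(buffer):
    """Stand-in for one parser run: walk the tokens up to ENDMARKER."""
    pos = 0
    while buffer.get(pos).type != tokenize.ENDMARKER:
        pos += 1
    return pos + 1  # number of tokens visited, ENDMARKER included


buf = TokenBuffer("x = 1 +\n")
first_run = scan(buf)            # tokenizes lazily while "parsing"
fetched_after_first = buf.fetched
second_run = scan(buf)           # restarts at position 0, served from the cache
```

After both calls, `buf.fetched` is unchanged from its value after the first run, which is the property that keeps the second run cheap in this sketch.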
msg379698 - Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-10-26 22:42
New changeset bca701403253379409dece03053dbd739c0bd059 by Lysandros Nikolaou in branch 'master':
bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
https://github.com/python/cpython/commit/bca701403253379409dece03053dbd739c0bd059
msg379811 - Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-10-28 00:14
New changeset 24a7c298d47658295673dc04d1b6d59f2b200a7c by Lysandros Nikolaou in branch '3.9':
[3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) (GH-23011)
https://github.com/python/cpython/commit/24a7c298d47658295673dc04d1b6d59f2b200a7c
History
Date User Action Args
2020-10-28 00:14:18 lys.nikolaou set messages: + msg379811
2020-10-27 23:49:54 lys.nikolaou set pull_requests: + pull_request21927
2020-10-26 22:42:38 lys.nikolaou set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2020-10-26 22:42:10 lys.nikolaou set messages: + msg379698
2020-10-26 22:41:15 lys.nikolaou set messages: + msg379697
2020-10-24 01:17:28 terry.reedy set nosy: + terry.reedy
    messages: + msg379508
    title: Run the two times, only enable invalid_* rules on the second run -> Run parser twice; enable invalid_* rules only on the second run
2020-10-22 23:37:42 lys.nikolaou set keywords: + patch
    stage: patch review
    pull_requests: + pull_request21835
2020-10-22 23:35:43 lys.nikolaou create