Python tokenizer rewriting #69829
Comments
Here is a preliminary patch that refactors the lowest level of the Python tokenizer: reading and decoding. It splits the code into smaller, simpler functions, decreases the source size by 37 lines, and fixes a number of bugs, including bpo-14811 and bpo-18961. Tests are added for most of the fixed bugs (except leaks and others that are hard to reproduce). Fixing some of the remaining bugs could be harder, especially the issues with the null byte (bpo-1105770, bpo-20115). Many bugs could easily be fixed if the whole Python file were read into memory instead of being read line by line; I don't know if that is acceptable.
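The reading-and-decoding layer discussed above is exposed in pure Python by the stdlib `tokenize` module. As a rough illustration (not part of the patch), `tokenize.detect_encoding()` peeks at the BOM or coding cookie before any tokenization happens, which is the same first step the C tokenizer performs:

```python
import io
import tokenize

# A source file declaring a non-default encoding via a coding cookie.
source = b"# -*- coding: latin-1 -*-\nx = 'caf\xe9'\n"

# detect_encoding() reads at most two lines and returns the normalized
# encoding name plus the raw lines it consumed.
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # -> 'iso-8859-1' (latin-1 is normalized to this name)
```

Note that the cookie name is normalized: `latin-1` comes back as `iso-8859-1`.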
Hi Serhiy, just for your information (though I think you already know): the tests pass ;-) [398/399] test_multiprocessing_spawn (138 sec) -- running: test_tools. I am interested in this part of CPython, although I am not an expert in it. Stephane
"especially for issues with null byte" -- I don't think that we should put too much energy into handling NUL bytes correctly. I see NUL bytes in code as bugs in the code, not in the Python parser. We *might* try to give warnings or better error messages to the user, that's all.
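For context on the behavior being discussed, current CPython rejects a NUL byte in a source string up front with a `ValueError` rather than letting the tokenizer see it; this quick check (not from the patch) shows that:

```python
# compile() refuses source strings containing a NUL byte outright,
# so the tokenizer never has to deal with one from this entry point.
try:
    compile("x = 1\x00", "<test>", "exec")
except ValueError as exc:
    print("rejected:", exc)
```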
New changeset 23a7481eafd4 by Serhiy Storchaka in branch 'default': |
@serhiy: did you still want to commit this? |
Oh, 6 years to fix this bug. Better late than never ;-) Thanks for reporting and for fixing it! |