classification
Title: Python tokenizer rewriting
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies: 26581
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Jim Fasarakis-Hilliard, brett.cannon, haypo, matrixise, python-dev, serhiy.storchaka, yselivanov
Priority: normal
Keywords:

Created on 2015-11-17 01:27 by serhiy.storchaka, last changed 2017-03-14 14:57 by serhiy.storchaka.

Files
File name              Uploaded                            Description
tokenize_input.patch   serhiy.storchaka, 2015-11-17 01:27
Messages (4)
msg254778 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-17 01:27
Here is a preliminary patch that refactors the lowest level of the Python tokenizer: reading and decoding. It splits the code into smaller, simpler functions, reduces the source size by 37 lines, and fixes several bugs: issue14811, issue18961, and a number of others. It adds tests for most of the fixed bugs (except leaks and others that are hard to reproduce). Fixing the remaining bugs may be harder, especially the issues with null bytes (issue1105770, issue20115).

Many bugs could be fixed easily if the whole Python file were read into memory instead of being read line by line. I don't know whether that is acceptable.
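
The trade-off between whole-file and line-by-line reading can be illustrated at the Python level. This is only a minimal sketch, not the C code touched by the patch: it assumes the source fits in memory, and the helper name `decode_source` is hypothetical. It relies on `tokenize.detect_encoding` from the standard library to find the encoding, then decodes the whole buffer in one step.

```python
import io
import tokenize


def decode_source(data: bytes) -> str:
    """Decode a complete Python source file held in memory.

    Hypothetical helper for illustration; not part of the patch.
    """
    # detect_encoding() reads at most two lines to look for a BOM
    # or a coding cookie, and falls back to UTF-8 otherwise.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # With the whole file in memory, decoding is a single call,
    # so no multi-byte sequence can be split across read buffers.
    return data.decode(encoding)


source = b"# -*- coding: latin-1 -*-\nx = '\xe9'\n"
print(decode_source(source))
```

Decoding the whole buffer at once avoids the bookkeeping needed when a line or a multi-byte character straddles a buffer boundary, at the cost of holding the entire file in memory.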
msg255082 - Author: Stéphane Wirtel (matrixise) * Date: 2015-11-22 06:29
Hi Serhiy,

Just for your information, though I think you already know: the tests pass ;-)

[398/399] test_multiprocessing_spawn (138 sec) -- running: test_tools (108 sec)
[399/399] test_tools (121 sec)
385 tests OK.
3 tests altered the execution environment:
    test___all__ test_site test_warnings
11 tests skipped:
    test_devpoll test_kqueue test_msilib test_ossaudiodev
    test_startfile test_tix test_tk test_ttk_guionly test_winreg
    test_winsound test_zipfile64

I am interested in this part of CPython. I am not an expert in
lexing and parsing, but how can I help you? I am a novice in this
domain.

Stephane
msg255355 - Author: STINNER Victor (haypo) * (Python committer) Date: 2015-11-25 14:17
"especially for issues with null byte"

I don't think that we should put too much energy into handling NUL bytes correctly. I see NUL bytes in code as bugs in the code, not in the Python parser. We *might* try to give warnings or better error messages to the user, that's all.
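
As a rough illustration of what a "better error message" could look like, here is a minimal Python sketch of a pre-check that locates the first NUL byte before the source reaches the parser. The function name `find_nul` and the 1-based line / 0-based column convention are assumptions made for this example; this is not how the tokenizer reports the problem.

```python
from typing import Optional, Tuple


def find_nul(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first NUL byte, or None if absent.

    Hypothetical helper: line is 1-based, column is a 0-based byte offset.
    """
    offset = data.find(b"\0")
    if offset < 0:
        return None
    # Count the newlines before the NUL byte to get its line number.
    line = data.count(b"\n", 0, offset) + 1
    # The column is the distance from the start of that line.
    line_start = data.rfind(b"\n", 0, offset) + 1
    return line, offset - line_start


print(find_nul(b"a = 1\nb\0 = 2\n"))  # (2, 1)
print(find_nul(b"a = 1\n"))           # None
```

A caller could use the returned position to point the user at the offending byte instead of failing with a generic message.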
msg262091 - Author: Roundup Robot (python-dev) Date: 2016-03-20 21:30
New changeset 23a7481eafd4 by Serhiy Storchaka in branch 'default':
Issues #25643, #26581: Added new tests for detecting Python source code encoding.
https://hg.python.org/cpython/rev/23a7481eafd4
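
For readers unfamiliar with source encoding detection, the cases such tests need to cover can be explored from Python with `tokenize.detect_encoding`. This is a sketch using the pure-Python `tokenize` module, not the C tokenizer or the tests added by the changeset; the wrapper name `sniff_encoding` is hypothetical.

```python
import io
import tokenize


def sniff_encoding(data: bytes) -> str:
    """Return the encoding that tokenize detects for the given source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


samples = [
    b"x = 1\n",                                           # no cookie, no BOM
    b"# coding: latin-1\n",                               # cookie on line 1
    b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n",  # cookie on line 2
    b"\xef\xbb\xbfx = 1\n",                               # UTF-8 BOM
    b"x = 1\n# coding: latin-1\n",                        # cookie after code is ignored
]
for sample in samples:
    print(repr(sample), "->", sniff_encoding(sample))
```

The cookie is honoured only on the first line, or on the second when the first is blank or a comment; `latin-1` is reported under its normalized name `iso-8859-1`, and a BOM yields `utf-8-sig`.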
History
Date                 User                    Action  Args
2017-03-14 14:57:52  serhiy.storchaka        set     keywords: - patch; versions: + Python 3.7, - Python 3.6
2017-03-14 14:29:12  Jim Fasarakis-Hilliard  set     nosy: + Jim Fasarakis-Hilliard
2017-03-14 13:52:27  serhiy.storchaka        link    issue3353 dependencies
2016-03-20 21:30:29  python-dev              set     nosy: + python-dev; messages: + msg262091
2016-03-17 12:04:22  serhiy.storchaka        set     dependencies: + Double coding cookie
2015-11-25 14:17:10  haypo                   set     nosy: + haypo; messages: + msg255355
2015-11-22 06:29:50  matrixise               set     messages: + msg255082
2015-11-22 04:47:25  matrixise               set     nosy: + matrixise
2015-11-17 17:42:47  brett.cannon            set     nosy: + brett.cannon
2015-11-17 17:22:56  yselivanov              set     nosy: + yselivanov
2015-11-17 01:27:33  serhiy.storchaka        create