
Author pablogsal
Recipients Andrew.C, Anthony Sottile, Jim Fasarakis-Hilliard, amaury.forgeotdarc, berker.peksag, djmitche, effbot, kirkshorts, meador.inge, pablogsal, serhiy.storchaka, superluser
Date 2021-01-27.21:14:19
Message-id <1611782060.01.0.467016921921.issue3353@roundup.psfhosted.org>
Content
Problems that you are going to find:

* The C tokenizer raises syntax errors in cases where the tokenize module does not. For example:

❯ python -c "1_"
  File "<string>", line 1
    1_
     ^
SyntaxError: invalid decimal literal

❯ python -m tokenize <<< "1_"
1,0-1,1:            NUMBER         '1'
1,1-1,2:            NAME           '_'
1,2-1,3:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

* The encoding cannot simply be specified up front; it has to be threaded through many places.

* readline() can now be any object and can return anything, so the C tokenizer needs to handle unexpected values (better than it does now) to avoid crashing.

* Handling of str vs. bytes input in the C tokenizer.

* In some cases the C tokenizer does not have the full source line available, and in others getting the full line is tricky.
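A small harness along the following lines can help explore the first few points from Python. It is only a sketch: the helper names (c_compiler_error, tokenize_str, tokenize_bytes) are hypothetical, and it relies solely on documented stdlib entry points: compile() for the built-in (C) tokenizer and parser, tokenize.generate_tokens() for str input, and tokenize.tokenize() plus tokenize.detect_encoding() for bytes input. The tokenize module's behaviour on invalid input such as "1_" may vary between Python versions, so the harness reports whatever it gets instead of assuming the output shown above.

```python
import io
import token
import tokenize


def c_compiler_error(source: str):
    """Return the SyntaxError message from the built-in compiler, or None."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


def tokenize_str(source: str):
    """Tokenize text through the str-based readline interface.

    Returns a list of (token name, token string) pairs, or the exception
    raised, since the result for invalid input depends on the Python version.
    """
    readline = io.StringIO(source).readline
    try:
        return [(token.tok_name[tok.type], tok.string)
                for tok in tokenize.generate_tokens(readline)]
    except (tokenize.TokenError, SyntaxError) as exc:
        return exc


def tokenize_bytes(source: bytes):
    """Tokenize bytes; the encoding is detected from the input itself.

    tokenize.tokenize() emits an ENCODING token first, so the encoding does
    not need to be passed in separately.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    return encoding, tokens


if __name__ == "__main__":
    print(c_compiler_error("1_\n"))  # the built-in compiler rejects this
    print(tokenize_str("1_\n"))      # tokenize output varies by version
```

Keeping the str and bytes paths as separate functions mirrors the split that the C tokenizer would have to reproduce.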
History
Date                 User       Action  Args
2021-01-27 21:14:20  pablogsal  set     recipients: + pablogsal, effbot, amaury.forgeotdarc, djmitche, kirkshorts, meador.inge, berker.peksag, serhiy.storchaka, superluser, Andrew.C, Anthony Sottile, Jim Fasarakis-Hilliard
2021-01-27 21:14:20  pablogsal  set     messageid: <1611782060.01.0.467016921921.issue3353@roundup.psfhosted.org>
2021-01-27 21:14:20  pablogsal  link    issue3353 messages
2021-01-27 21:14:19  pablogsal  create