
classification
Title: Tokenize module does not mirror "end-of-input" is newline behavior
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ammar2
Nosy List: Anthony Sottile, ammar2, asmeurer, benjamin.peterson, brechtm, gregory.p.smith, meador.inge, miss-islington, ned.deily, taleinat, terry.reedy
Priority: normal
Keywords: patch

Created on 2018-06-19 07:41 by ammar2, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 7891 merged ammar2, 2018-06-24 12:34
PR 8132 merged ammar2, 2018-07-06 07:58
PR 8133 merged ammar2, 2018-07-06 08:08
PR 8134 merged ammar2, 2018-07-06 08:20
PR 10072 merged taleinat, 2018-10-24 06:05
PR 10073 merged taleinat, 2018-10-24 06:56
PR 10074 merged taleinat, 2018-10-24 07:24
PR 10075 merged taleinat, 2018-10-24 07:27
Messages (27)
msg319934 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2018-06-19 07:41
As was pointed out in https://bugs.python.org/issue33766 there is an edge case in the tokenizer whereby it will implicitly treat the end of input as a newline. The tokenize module in stdlib does not mirror the C code's behavior in this case.

tokenizer.c:

  ~/cpython $ echo -n 'x' | ./python
  ----------
  NAME ("x")
  NEWLINE
  ENDMARKER

tokenize module:

  ~/cpython $ echo -n 'x' | ./python -m tokenize
  1,0-1,1:            NAME           'x'            
  2,0-2,0:            ENDMARKER      ''

The instrumentation that makes the C tokenizer dump its tokens is my own; I can provide a diff that produces this output if needed.
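
The tokenize side of this comparison can also be reproduced from Python without a shell pipeline. The following is a minimal sketch, not part of the original report; the helper name `token_names` is made up for illustration. On an interpreter that includes the fix discussed in this issue, both calls should report NAME, NEWLINE, ENDMARKER, whereas an unpatched interpreter would omit NEWLINE for the first input.

```python
import io
import tokenize


def token_names(source):
    """Return the names of the token types that tokenize yields for `source`.

    `token_names` is a hypothetical helper written for this illustration.
    """
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Input without a trailing newline (the edge case from this issue).
print(token_names("x"))

# Input with an explicit trailing newline, for comparison.
print(token_names("x\n"))
```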
msg321154 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 07:19
New changeset c4ef4896eac86a6759901c8546e26de4695a1389 by Tal Einat (Ammar Askar) in branch 'master':
bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891)
https://github.com/python/cpython/commit/c4ef4896eac86a6759901c8546e26de4695a1389
msg321162 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:21
New changeset ab75d9e4244ee24bc96ea9d52362899e3bf365a2 by Tal Einat (Ammar Askar) in branch '3.7':
[3.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8132)
https://github.com/python/cpython/commit/ab75d9e4244ee24bc96ea9d52362899e3bf365a2
msg321163 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:22
New changeset 11c36a3e16f7fd4e937466014e8393ede4b61a25 by Tal Einat (Ammar Askar) in branch '3.6':
[3.6] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8134)
https://github.com/python/cpython/commit/11c36a3e16f7fd4e937466014e8393ede4b61a25
msg321164 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:23
New changeset 7829bba45d0e2446f3a0ca240bfe46959f01071e by Tal Einat (Ammar Askar) in branch '2.7':
[2.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (#8133)
https://github.com/python/cpython/commit/7829bba45d0e2446f3a0ca240bfe46959f01071e
msg321165 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:24
Thanks for all of your work on this, Ammar!
msg328220 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2018-10-21 16:10
This change in behaviour is breaking pycodestyle: https://github.com/PyCQA/pycodestyle/issues/786

Perhaps it shouldn't have been backported (especially all the way to python2.7?)
msg328222 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-21 18:05
This was backported since it was considered a bug, but you are right that it broke backwards compatibility, and perhaps shouldn't have been backported.

Still, with 3.6.7 and 3.7.1 now released, that ship has sailed.

We could perhaps revert this on the 2.7 branch, but I feel that reverting this change only on 2.7 would just cause even more confusion.
msg328226 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2018-10-21 23:42
I'm surprised this was classified as a bug!  Though that's subjective so I get that it's difficult to decide what is and what isn't ¯\____(ツ)____/¯
msg328227 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-10-22 00:45
Apparently this change also affected IPython.  Perhaps we should add an entry to the whatsnew documents for 3.7.1 and 3.6.7:

https://docs.python.org/3/whatsnew/3.7.html#notable-changes-in-python-3-7-1

https://docs.python.org/3.6/whatsnew/3.6.html#notable-changes-in-python-3-6-7
msg328238 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-22 06:34
I'm sorry to have caused this mess, it was bad judgement on my part.

Adding a mention in What's New is a good idea, Ned; I'll do that.
msg328283 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-23 06:42
Ned, should this also be added to the 2.7 What's New? Or perhaps reverted on the 2.7 branch?
msg328318 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-10-23 14:14
I don't have a strong opinion about 2.7 here.  Ultimately, it's Benjamin's call.  But it might make sense to revert for 2.7 since it hasn't been released yet.
msg328324 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-10-23 15:59
Please revert in 2.7.
msg328353 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 06:33
See PR GH-10072 for reverting in 2.7.
msg328354 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 06:50
FYI, an example of other fallout from this change: patsy broke and needed this fix:

https://github.com/pydata/patsy/commit/4f53bbaf58c0bf1a9bed73fc67c7c6d0aa7f4e20#diff-53c70e68c6dfd4fe9b08427792cb2bd6
msg328355 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 06:57
See PR GH-10073, which adds a mention in "What's New".
msg328356 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 07:17
Some pylint fallout appears to be addressed in https://github.com/PyCQA/pylint/commit/2698cbe56b44df7974de1c3374db8700296c6fad
msg328357 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 07:20
New changeset dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0 by Gregory P. Smith (Tal Einat) in branch 'master':
bpo-33899: Mention tokenize behavior change in What's New (GH-10073)
https://github.com/python/cpython/commit/dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0
msg328358 - (view) Author: miss-islington (miss-islington) Date: 2018-10-24 07:32
New changeset 9a0476283393f9988d0946491052d7724a7f9d21 by Miss Islington (bot) (Tal Einat) in branch '3.6':
[3.6] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10075)
https://github.com/python/cpython/commit/9a0476283393f9988d0946491052d7724a7f9d21
msg328359 - (view) Author: miss-islington (miss-islington) Date: 2018-10-24 07:33
New changeset b4c9874f5c7f64e1d41cbc588e515b8851bbb90c by Miss Islington (bot) (Tal Einat) in branch '3.7':
[3.7] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10074)
https://github.com/python/cpython/commit/b4c9874f5c7f64e1d41cbc588e515b8851bbb90c
msg328360 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 07:40
Thanks for helping with the fallout from this, Gregory.
msg328369 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-10-24 14:53
#33766 was about documenting the C tokenizer change, made some years ago, that causes end-of-file (EOF) and end-of-string (EOS) to generate the NEWLINE token required to properly terminate statements.  "The end of input also serves as an implicit terminator for the final physical line."

Although the tokenize module intentionally does not exactly mirror the C tokenizer (it adds COMMENT tokens), it plausibly seems like a bug that it was not changed along with the C tokenizer, as it has since been tokenizing valid code as grammatically invalid.  But I agree that this fix is too disruptive for 2.7.
msg328383 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-10-24 17:32
New changeset a1f45ec73f0486b187633e7ebc0a4f559d29d7d9 by Benjamin Peterson (Tal Einat) in branch '2.7':
bpo-33899: Revert tokenize module adding an implicit final NEWLINE (GH-10072)
https://github.com/python/cpython/commit/a1f45ec73f0486b187633e7ebc0a4f559d29d7d9
msg328877 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-29 22:27
https://bugs.python.org/issue35107 filed to track further fallout from this API change.
msg330213 - (view) Author: Aaron Meurer (asmeurer) Date: 2018-11-21 19:21
Is it expected behavior that comments produce NEWLINE if they don't have a newline and don't produce NEWLINE if they do (that is, '# comment' produces NEWLINE but '# comment\n' does not)?
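
The two inputs in this question can be compared directly. This is a minimal sketch, with `describe` as a made-up helper name. No expected output is given for the input without a trailing newline, because the tokens emitted after the comment may differ between Python releases. The input with a trailing newline should produce COMMENT, NL, ENDMARKER.

```python
import io
import tokenize


def describe(source):
    """Return (type name, string) pairs for every token in `source`.

    `describe` is a hypothetical helper written for this illustration.
    """
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Print both variants side by side so the difference after the COMMENT
# token (NL versus NEWLINE, or neither) is visible on the running interpreter.
for src in ("# comment", "# comment\n"):
    print(repr(src), describe(src))
```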
msg338601 - (view) Author: Brecht Machiels (brechtm) Date: 2019-03-22 11:46
In order to adapt code to this change, can we assume that a NEWLINE token with an empty string only occurs right before the ENDMARKER?
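
One way to probe this assumption is to tokenize an indented block that lacks a trailing newline and look at what follows the empty NEWLINE. This is a minimal sketch; `tokens_of`, `is_implicit_newline` and `without_implicit_newline` are made-up names, and treating "NEWLINE with an empty string" as the implicit token is itself an assumption taken from the question above. With indented input, DEDENT tokens may appear between the empty NEWLINE and ENDMARKER, so code that adapts to the change may need to allow for them rather than expect the two to be strictly adjacent.

```python
import io
import tokenize


def tokens_of(source):
    """Tokenize `source` (a str) and return the tokens as a list."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def is_implicit_newline(tok):
    # Assumption: the implicit end-of-input NEWLINE is the one whose
    # string is empty; an explicit one carries "\n" or "\r\n".
    return tok.type == tokenize.NEWLINE and tok.string == ""


def without_implicit_newline(tokens):
    """Drop the implicit NEWLINE, e.g. for code written against the old behavior."""
    return [tok for tok in tokens if not is_implicit_newline(tok)]


# An indented block with no trailing newline: inspect what follows the
# empty NEWLINE token on the running interpreter.
for tok in tokens_of("if x:\n    y"):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Filtering with `without_implicit_newline` is only one possible adaptation; whether dropping the token is appropriate depends on how the consuming code uses NEWLINE.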