msg319934 - (view) |
Author: Ammar Askar (ammar2) * |
Date: 2018-06-19 07:41 |
As was pointed out in https://bugs.python.org/issue33766 there is an edge case in the tokenizer whereby it will implicitly treat the end of input as a newline. The tokenize module in stdlib does not mirror the C code's behavior in this case.
tokenizer.c:
~/cpython $ echo -n 'x' | ./python
----------
NAME ("x")
NEWLINE
ENDMARKER
tokenize module:
~/cpython $ echo -n 'x' | ./python -m tokenize
1,0-1,1: NAME 'x'
2,0-2,0: ENDMARKER ''
The instrumentation that makes the C tokenizer dump its tokens is my own; I can provide a diff that produces this output if needed.
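A minimal sketch for reproducing the tokenize-module half of this comparison from Python rather than from the shell. The helper name token_names is hypothetical, and whether a NEWLINE appears before ENDMARKER depends on whether the interpreter includes the change discussed in this issue.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names the tokenize module yields for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# No trailing newline, as with `echo -n 'x'` above.
print(token_names("x"))
```

Comparing the printed list against the C tokenizer dump above shows whether the two agree on the implicit NEWLINE.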
|
msg321154 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-07-06 07:19 |
New changeset c4ef4896eac86a6759901c8546e26de4695a1389 by Tal Einat (Ammar Askar) in branch 'master':
bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891)
https://github.com/python/cpython/commit/c4ef4896eac86a6759901c8546e26de4695a1389
|
msg321162 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-07-06 10:21 |
New changeset ab75d9e4244ee24bc96ea9d52362899e3bf365a2 by Tal Einat (Ammar Askar) in branch '3.7':
[3.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8132)
https://github.com/python/cpython/commit/ab75d9e4244ee24bc96ea9d52362899e3bf365a2
|
msg321163 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-07-06 10:22 |
New changeset 11c36a3e16f7fd4e937466014e8393ede4b61a25 by Tal Einat (Ammar Askar) in branch '3.6':
[3.6] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8134)
https://github.com/python/cpython/commit/11c36a3e16f7fd4e937466014e8393ede4b61a25
|
msg321164 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-07-06 10:23 |
New changeset 7829bba45d0e2446f3a0ca240bfe46959f01071e by Tal Einat (Ammar Askar) in branch '2.7':
[2.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (#8133)
https://github.com/python/cpython/commit/7829bba45d0e2446f3a0ca240bfe46959f01071e
|
msg321165 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-07-06 10:24 |
Thanks for all of your work on this, Ammar!
|
msg328220 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2018-10-21 16:10 |
This change in behaviour is breaking pycodestyle: https://github.com/PyCQA/pycodestyle/issues/786
Perhaps it shouldn't have been backported (especially all the way to python2.7?)
|
msg328222 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-21 18:05 |
This was backported since it was considered a bug, but you are right that it broke backwards compatibility, and perhaps shouldn't have been backported.
Still, with 3.6.7 and 3.7.1 now released, that ship has sailed.
We could perhaps revert this on the 2.7 branch, but I feel that reverting this change only on 2.7 would just cause even more confusion.
|
msg328226 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2018-10-21 23:42 |
I'm surprised this was classified as a bug! Though that's subjective so I get that it's difficult to decide what is and what isn't ¯\_(ツ)_/¯
|
msg328227 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2018-10-22 00:45 |
Apparently this change also affected IPython. Perhaps we should add an entry to the whatsnew documents for 3.7.1 and 3.6.7:
https://docs.python.org/3/whatsnew/3.7.html#notable-changes-in-python-3-7-1
https://docs.python.org/3.6/whatsnew/3.6.html#notable-changes-in-python-3-6-7
|
msg328238 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-22 06:34 |
I'm sorry to have caused this mess; it was bad judgement on my part.
Adding a mention in What's New is a good idea, Ned. I'll do that.
|
msg328283 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-23 06:42 |
Ned, should this also be added to the 2.7 What's New? Or perhaps reverted on the 2.7 branch?
|
msg328318 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2018-10-23 14:14 |
I don't have a strong opinion about 2.7 here. Ultimately, it's Benjamin's call. But it might make sense to revert for 2.7 since it hasn't been released yet.
|
msg328324 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2018-10-23 15:59 |
Please revert in 2.7.
|
msg328353 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-24 06:33 |
See PR GH-10072 for reverting in 2.7.
|
msg328354 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2018-10-24 06:50 |
FYI, an example of other fallout from this change - patsy broke and needed this fix:
https://github.com/pydata/patsy/commit/4f53bbaf58c0bf1a9bed73fc67c7c6d0aa7f4e20#diff-53c70e68c6dfd4fe9b08427792cb2bd6
|
msg328355 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-24 06:57 |
See PR GH-10073 adding mention in "What's New".
|
msg328356 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2018-10-24 07:17 |
Some pylint fallout appears to be addressed in https://github.com/PyCQA/pylint/commit/2698cbe56b44df7974de1c3374db8700296c6fad
|
msg328357 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2018-10-24 07:20 |
New changeset dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0 by Gregory P. Smith (Tal Einat) in branch 'master':
bpo-33899: Mention tokenize behavior change in What's New (GH-10073)
https://github.com/python/cpython/commit/dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0
|
msg328358 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-10-24 07:32 |
New changeset 9a0476283393f9988d0946491052d7724a7f9d21 by Miss Islington (bot) (Tal Einat) in branch '3.6':
[3.6] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10075)
https://github.com/python/cpython/commit/9a0476283393f9988d0946491052d7724a7f9d21
|
msg328359 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-10-24 07:33 |
New changeset b4c9874f5c7f64e1d41cbc588e515b8851bbb90c by Miss Islington (bot) (Tal Einat) in branch '3.7':
[3.7] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10074)
https://github.com/python/cpython/commit/b4c9874f5c7f64e1d41cbc588e515b8851bbb90c
|
msg328360 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2018-10-24 07:40 |
Thanks for helping with the fallout from this, Gregory.
|
msg328369 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2018-10-24 14:53 |
#33766 was about documenting the C tokenizer change, made some years ago, that causes end-of-file (EOF) and end-of-string (EOS) to generate the NEWLINE token required to properly terminate statements: "The end of input also serves as an implicit terminator for the final physical line."
Although the tokenize module intentionally does not exactly mirror the C tokenizer (it adds COMMENT tokens), it plausibly seems like a bug that it was not changed along with the C tokenizer, as it has since been tokenizing valid code as grammatically invalid. But I agree that this fix is too disruptive for 2.7.
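As a small illustration of the "valid code" side of this point, the sketch below compiles source text that lacks a trailing newline. It only exercises the built-in compile() and exec(); the file name "<example>" and the variable names are arbitrary choices for the example.

```python
# Source text with no trailing newline; the compiler still accepts it,
# because the end of input terminates the final physical line.
namespace = {}
exec(compile("x = 1", "<example>", "exec"), namespace)
print(namespace["x"])

# The same holds when the last line sits inside an indented block.
code = compile("if True:\n    y = 2", "<example>", "exec")
exec(code, namespace)
print(namespace["y"])
```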
|
msg328383 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2018-10-24 17:32 |
New changeset a1f45ec73f0486b187633e7ebc0a4f559d29d7d9 by Benjamin Peterson (Tal Einat) in branch '2.7':
bpo-33899: Revert tokenize module adding an implicit final NEWLINE (GH-10072)
https://github.com/python/cpython/commit/a1f45ec73f0486b187633e7ebc0a4f559d29d7d9
|
msg328877 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2018-10-29 22:27 |
https://bugs.python.org/issue35107 filed to track further fallout from this API change.
|
msg330213 - (view) |
Author: Aaron Meurer (asmeurer) |
Date: 2018-11-21 19:21 |
Is it expected behavior that comments produce NEWLINE if they don't have a newline and don't produce NEWLINE if they do (that is, '# comment' produces NEWLINE but '# comment\n' does not)?
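One way to inspect the two cases side by side is sketched below. The helper name dump_tokens is hypothetical, and no expected output is shown because the token stream for a trailing comment may differ between Python versions.

```python
import io
import tokenize


def dump_tokens(source):
    """Return (type name, string) pairs for every token in source."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


# Same comment, without and with a trailing newline.
for source in ("# comment", "# comment\n"):
    print(repr(source))
    for name, string in dump_tokens(source):
        print("   ", name, repr(string))
```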
|
msg338601 - (view) |
Author: Brecht Machiels (brechtm) |
Date: 2019-03-22 11:46 |
In order to adapt code to this change, can we assume that a NEWLINE token with an empty string only occurs right before the ENDMARKER?
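A possible adaptation, sketched under the assumption stated in the question (that the implicit NEWLINE is the one carrying an empty string): filter such tokens out before handing the stream to code written for the old behavior. The function name without_implicit_newline is hypothetical, and the sketch does not settle whether an empty-string NEWLINE can appear anywhere other than right before ENDMARKER.

```python
import io
import tokenize


def without_implicit_newline(tokens):
    """Yield tokens, skipping NEWLINE tokens whose string is empty."""
    for tok in tokens:
        if tok.type == tokenize.NEWLINE and tok.string == "":
            continue
        yield tok


# Input with no trailing newline, so any NEWLINE here is implicit.
readline = io.StringIO("x").readline
for tok in without_implicit_newline(tokenize.generate_tokens(readline)):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

An explicit newline in the source produces a NEWLINE token with a non-empty string, so the filter leaves it in place.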
|
|
Date | User | Action | Args |
2022-04-11 14:59:01 | admin | set | github: 78080 |
2019-03-22 11:46:47 | brechtm | set | nosy: + brechtm; messages: + msg338601 |
2018-11-21 19:21:45 | asmeurer | set | nosy: + asmeurer; messages: + msg330213 |
2018-10-29 22:27:02 | gregory.p.smith | set | messages: + msg328877 |
2018-10-26 11:09:03 | taleinat | set | versions: - Python 2.7 |
2018-10-25 15:57:03 | terry.reedy | set | pull_requests: - pull_request9424 |
2018-10-25 13:35:51 | Tim.Graham | set | pull_requests: + pull_request9424 |
2018-10-25 01:00:21 | ned.deily | set | pull_requests: - pull_request9411 |
2018-10-25 00:59:53 | ned.deily | set | pull_requests: - pull_request9415 |
2018-10-24 21:40:28 | Tim.Graham | set | pull_requests: + pull_request9415 |
2018-10-24 17:32:27 | benjamin.peterson | set | messages: + msg328383 |
2018-10-24 14:53:02 | terry.reedy | set | nosy: + terry.reedy; messages: + msg328369 |
2018-10-24 14:39:02 | corona10 | set | pull_requests: + pull_request9411 |
2018-10-24 07:40:28 | taleinat | set | messages: + msg328360 |
2018-10-24 07:33:04 | miss-islington | set | messages: + msg328359 |
2018-10-24 07:32:42 | miss-islington | set | nosy: + miss-islington; messages: + msg328358 |
2018-10-24 07:27:36 | taleinat | set | pull_requests: + pull_request9409 |
2018-10-24 07:24:56 | taleinat | set | pull_requests: + pull_request9408 |
2018-10-24 07:20:14 | gregory.p.smith | set | messages: + msg328357 |
2018-10-24 07:17:51 | gregory.p.smith | set | messages: + msg328356 |
2018-10-24 06:57:35 | taleinat | set | messages: + msg328355 |
2018-10-24 06:56:59 | taleinat | set | pull_requests: + pull_request9407 |
2018-10-24 06:50:53 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg328354 |
2018-10-24 06:33:29 | taleinat | set | messages: + msg328353 |
2018-10-24 06:05:04 | taleinat | set | pull_requests: + pull_request9406 |
2018-10-23 15:59:30 | benjamin.peterson | set | messages: + msg328324 |
2018-10-23 14:14:58 | ned.deily | set | nosy: + benjamin.peterson; messages: + msg328318 |
2018-10-23 06:42:04 | taleinat | set | messages: + msg328283 |
2018-10-22 06:34:52 | taleinat | set | messages: + msg328238 |
2018-10-22 00:45:22 | ned.deily | set | nosy: + ned.deily; messages: + msg328227 |
2018-10-21 23:42:03 | Anthony Sottile | set | messages: + msg328226 |
2018-10-21 18:05:38 | taleinat | set | messages: + msg328222 |
2018-10-21 16:10:26 | Anthony Sottile | set | nosy: + Anthony Sottile; messages: + msg328220 |
2018-07-06 10:24:50 | taleinat | set | status: open -> closed; versions: + Python 2.7, Python 3.6, Python 3.7; messages: + msg321165; resolution: fixed; stage: patch review -> resolved |
2018-07-06 10:23:15 | taleinat | set | messages: + msg321164 |
2018-07-06 10:22:28 | taleinat | set | messages: + msg321163 |
2018-07-06 10:21:08 | taleinat | set | messages: + msg321162 |
2018-07-06 08:20:34 | ammar2 | set | pull_requests: + pull_request7712 |
2018-07-06 08:08:06 | ammar2 | set | pull_requests: + pull_request7711 |
2018-07-06 07:58:00 | ammar2 | set | pull_requests: + pull_request7710 |
2018-07-06 07:19:11 | taleinat | set | nosy: + taleinat; messages: + msg321154 |
2018-06-24 12:34:12 | ammar2 | set | keywords: + patch; stage: patch review; pull_requests: + pull_request7501 |
2018-06-22 19:23:47 | ned.deily | set | nosy: + meador.inge |
2018-06-19 07:41:52 | ammar2 | create | |