Issue 17061: tokenize unconditionally emits NL after comment lines & blank lines

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/61263

classification

Title:	tokenize unconditionally emits NL after comment lines & blank lines
Type:	enhancement	Stage:
Components:		Versions:	Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	martin.panter, meador.inge, takluyver
Priority:	normal	Keywords:

Created on 2013-01-28 11:14 by takluyver, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg180846	Author: Thomas Kluyver (takluyver)	Date: 2013-01-28 11:14
The docs describe the NL token as "Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines."

However, after a comment or a blank line, tokenize emits NL, even when it's not inside a multi-line statement. For example:

In [15]: for tok in tokenize.generate_tokens(StringIO('#comment\n').readline): print(tok)
TokenInfo(type=54 (COMMENT), string='#comment', start=(1, 0), end=(1, 8), line='#comment\n')
TokenInfo(type=55 (NL), string='\n', start=(1, 8), end=(1, 9), line='#comment\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

This makes it difficult to use tokenize to detect multi-line statements, as we want to do in IPython.

In my tests so far, changing two instances of NL to NEWLINE in this block (lines 530 & 533) makes it behave as I expect:
http://hg.python.org/cpython/file/a375c3d88c7e/Lib/tokenize.py#l524
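
The behavior reported above can be reproduced with a short standalone script. This is only a minimal sketch: the helper name `token_names` is hypothetical, and it reports token names through `tokenize.tok_name` rather than numeric types, because the numbers (54 and 55 in the output above) differ between Python versions.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names produced for a source string.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# A comment-only line ends with NL, not NEWLINE, even though it is
# not part of a multi-line statement.
print(token_names("#comment\n"))  # ['COMMENT', 'NL', 'ENDMARKER']

# An ordinary statement ends with NEWLINE.
print(token_names("x = 1\n"))
```

Comparing the two lists shows why the final token alone cannot distinguish a comment line from an unfinished bracketed expression.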
msg181241	Author: Meador Inge (meador.inge)	Date: 2013-02-03 04:12
The current behavior seems consistent with the lexical definition for blank lines [1]:

"""
A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored (i.e., no NEWLINE token is generated).
"""

NL and COMMENT are used for items that the CPython tokenizer ignores (and are not really tokens). Also, the test suite explicitly tests for this case.

Perhaps the tokenize documentation should be updated to say something like:

"""
NL tokens are generated when a logical line of code is continued over multiple physical lines and for blank lines.
"""

[1] http://docs.python.org/3.4/reference/lexical_analysis.html#blank-lines
msg182034	Author: Thomas Kluyver (takluyver)	Date: 2013-02-13 14:11
Hmm, that's interesting. For our purposes, a blank line or a comment line shouldn't result in a continuation prompt. This is consistent with what the plain Python shell does.

As part of this, we're tokenizing the code, and if the final \n results in an NL token (instead of NEWLINE), we wait to build a 'Python line'. (Likewise if the final \n doesn't appear before EOFError, indicating that a string continues to the next line.) Since tokenize doesn't expose parenlev (parentheses level), my modification to tokenize makes this work as we need.

Maybe another way forward would be to make parenlev accessible in some way, so that we can use that rather than using NL == parenlev > 0?
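
One way to approximate parenlev without modifying tokenize is to count bracket tokens in the emitted stream. The following is a rough sketch under stated assumptions, not what IPython or tokenize actually does: the function name `nl_inside_brackets` is hypothetical, and the input is assumed to be complete source, since `generate_tokens` raises `tokenize.TokenError` when the input ends inside an open bracket or string.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def nl_inside_brackets(source):
    """Return (line_number, inside_brackets) for every NL token.

    Hypothetical helper: bracket depth is recomputed from OP tokens,
    standing in for the tokenizer's internal parenlev counter.
    """
    depth = 0
    results = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            if tok.string in OPENERS:
                depth += 1
            elif tok.string in CLOSERS:
                depth -= 1
        elif tok.type == tokenize.NL:
            # depth > 0 means this NL continues a bracketed expression;
            # depth == 0 means it follows a blank or comment-only line.
            results.append((tok.start[0], depth > 0))
    return results


print(nl_inside_brackets("#c\nx = (1,\n     2)\n"))
```

With this distinction, only NL tokens seen at depth greater than zero would need to trigger a continuation prompt.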
msg252297	Author: Martin Panter (martin.panter)	Date: 2015-10-05 02:10
The plain Python shell does respond to lines with only a comment and/or horizontal space with a continuation prompt. It only treats completely blank lines without any horizontal space specially:

>>>     
... # Indented blank line above; completely blank line below:
... 
>>> 

Meador: The documentation already says what you proposed: “NL tokens are generated when a logical line of code is continued over multiple physical lines” <https://docs.python.org/dev/library/tokenize.html#tokenize.NL>.

Thomas: It sounds like you actually want to differentiate newlines inside bracketed expressions from newlines outside of statements. I think this would require a new feature.

Also, I noticed that an escaped continued newline doesn’t seem to generate any token at all. Not sure if this is a bug or intended, but it does seem inconsistent with the other uses of the NL token.

$ ./python -btWall -m tokenize
1 + \
1,0-1,1:            NUMBER         '1'
1,2-1,3:            OP             '+'
1
2,0-2,1:            NUMBER         '1'
2,1-2,2:            NEWLINE        '\n'
3,0-3,0:            ENDMARKER      ''
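
The backslash case can also be checked from Python instead of the command line. This is a small sketch (the helper name `continuation_token_names` is hypothetical) that tokenizes the same two-line input and lists the token names; it makes no claim about whether the missing token is intended.

```python
import io
import tokenize


def continuation_token_names(source):
    """Return token type names for a source string (illustrative helper)."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# "1 + \" followed by "1": the logical line spans two physical lines,
# yet no NL token marks the escaped line break.
names = continuation_token_names("1 + \\\n1\n")
print(names)
print("NL" in names)
```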

History
Date	User	Action	Args
2022-04-11 14:57:41	admin	set	github: 61263
2015-10-05 02:10:31	martin.panter	set	versions: + Python 3.6, - Python 2.6, Python 2.7, Python 3.2, Python 3.3; nosy: + martin.panter; messages: + msg252297; type: behavior -> enhancement
2013-02-13 14:11:36	takluyver	set	messages: + msg182034
2013-02-03 04:12:32	meador.inge	set	type: behavior; messages: + msg181241
2013-02-02 09:55:36	terry.reedy	set	nosy: + meador.inge
2013-01-28 11:14:28	takluyver	create