Message 347025 - Python tracker


Author	ncoghlan
Recipients	Anthony Sottile, Julian, Terry Davis, barry, benjamin.peterson, eric.araujo, ezio.melotti, georg.brandl, ishimoto, lukasz.langa, ncoghlan, pablogsal, r.david.murray, serhiy.storchaka, steven.daprano, thautwarm, ulope
Date	2019-07-01.14:48:03
Message-id	<1561992483.67.0.116666697307.issue12782@roundup.psfhosted.org>

Content

Reviewing the thread, we never actually commented on thautwarm's proposal in https://bugs.python.org/issue12782#msg327875 that aims to strip out any INDENT, NEWLINE, and DEDENT tokens that appear between the opening "with" keyword and the statement header terminating ":".

The problem with that is that line continuations are actually handled by the tokenizer, *not* the compiler, and the tokenizer already switches off the INDENT/NEWLINE/DEDENT token generation based on the following rules:

* tracking opening & closing of triple-quoted strings 
* tracking opening & closing of parentheses ("()"), brackets ("[]"), and braces ("{}")
* detecting a backslash immediately followed by a newline
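As a rough illustration of those rules, the following sketch uses the standard library's `tokenize` module (which may differ in small details from the C tokenizer the compiler uses) to list only the NEWLINE/INDENT/DEDENT tokens produced for a few inputs. The helper name `structural_tokens` and the sample sources are made up for this example.

```python
import io
import tokenize

def structural_tokens(source):
    """Return the names of the NEWLINE/INDENT/DEDENT tokens found in source."""
    wanted = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]

# A line break inside parentheses: no INDENT, and only one logical NEWLINE.
parenthesised = "x = (1,\n        2)\n"
# A backslash immediately followed by a newline joins the two lines.
backslash = "x = 1 + \\\n        2\n"
# A line break inside a triple-quoted string is part of the STRING token.
triple_quoted = "s = '''a\n    b'''\n"
# An ordinary indented block, for comparison.
plain_block = "if x:\n    y = 1\n"

for sample in (parenthesised, backslash, triple_quoted, plain_block):
    print(structural_tokens(sample))
```

In the first three cases the continuation line is indented, yet no INDENT token is generated; only the ordinary block produces INDENT and DEDENT.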

By design, the tokenizer is generally unaware of which NAME tokens are actually keywords - it's only aware of async & await at the moment as part of the backwards compatibility dance that allowed those to be gradually converted to full keywords over the course of a couple of releases.
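A small sketch of what that means in practice, again using the standard library's `tokenize` module as a stand-in for the C tokenizer: the leading `with` comes out as a plain NAME token, and it is only the `keyword` module (or the parser) that knows it is reserved. The sample source line is invented for the example.

```python
import io
import keyword
import tokenize

source = "with open(path) as f:\n    pass\n"

# Take just the first token of the statement.
first = next(tokenize.generate_tokens(io.StringIO(source).readline))
token_kind = tokenize.tok_name[first.type]

# The tokenizer reports a generic NAME; keyword status is tracked elsewhere.
print(first.string, token_kind, keyword.iskeyword(first.string))
```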

This is why INDENT/NEWLINE/DEDENT never appear inside expressions in the Grammar: the tokenization rules mean that those tokens will never appear in those locations.

And it isn't simply a matter of making the tokenizer aware of the combination of "with" and ":" as a new pairing that ignores linebreaks between them, as ":" can appear in many subexpressions (e.g. lambda functions, slice notation, and the new assignment expressions), and it's only the full parser that has enough context to tell which colon is the one that actually ends the statement header.
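To make the ambiguity concrete, here is a sketch (with a made-up `with` header and a hypothetical helper, `colon_positions`) that lists every ":" token the `tokenize` module reports for one statement. At the token level the slice colon, the lambda colon, and the header-terminating colon are indistinguishable.

```python
import io
import token
import tokenize

def colon_positions(source):
    """Return the (row, column) start of every ':' token in source."""
    readline = io.StringIO(source).readline
    return [
        tok.start
        for tok in tokenize.generate_tokens(readline)
        if tok.exact_type == token.COLON
    ]

# One colon in the slice, one in the lambda, one ending the header.
header = "with d[a:b] as x, f(lambda: y) as z:\n    pass\n"
colons = colon_positions(header)
print(colons)
```

All three colons carry the same token type, so a tokenizer-only rule would have no way to pick out the last one as the end of the header.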

Thus the design requirement is to come up with a grammar rule that allows this existing code to continue to compile and run correctly:

```
>>> from contextlib import nullcontext
>>> with (nullcontext()) as example:
...     pass
... 
>>> 
```

While also enabling new code constructs like the following:


    with (nullcontext() as example):
        pass

    with (nullcontext(), nullcontext()):
        pass

    with (nullcontext() as example, nullcontext()):
        pass

    with (nullcontext(), nullcontext() as example):
        pass

    with (nullcontext() as example1, nullcontext() as example2):
        pass

If we can get the Grammar to allow those additional placements of parentheses, then the existing tokenizer will take care of the rest.
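One way to check a given interpreter against this requirement is to ask `compile()` whether it accepts each form. This is only a sketch: the helper `compiles` is made up for the example, and which of the proposed forms are accepted depends on the grammar of the Python version running it, so no expected output is given for them.

```python
def compiles(source):
    """Return True if this interpreter's grammar accepts source."""
    try:
        compile(source, "<with-example>", "exec")
    except SyntaxError:
        return False
    return True

# The existing form that has to keep working.
existing_form = "with (nullcontext()) as example:\n    pass\n"

# The additional parenthesised forms from the list above.
proposed_forms = [
    "with (nullcontext() as example):\n    pass\n",
    "with (nullcontext(), nullcontext()):\n    pass\n",
    "with (nullcontext() as example, nullcontext()):\n    pass\n",
    "with (nullcontext(), nullcontext() as example):\n    pass\n",
    "with (nullcontext() as example1, nullcontext() as example2):\n    pass\n",
]

print(compiles(existing_form), existing_form.splitlines()[0])
for form in proposed_forms:
    print(compiles(form), form.splitlines()[0])
```

Only compilation is checked here; nothing is executed, so `nullcontext` does not need to be imported.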
