Issue 43833: Unexpected Parsing of Numeric Literals Concatenated with Boolean Operators

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87999

classification

Title:	Unexpected Parsing of Numeric Literals Concatenated with Boolean Operators
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.10

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Anthony Sottile, Carl.Friedrich.Bolz, Guido.van.Rossum, Joshua.Landau, alimuldal, gvanrossum, miss-islington, nedbat, pablogsal, pxeger, rhettinger, rrauenza, sco1, serhiy.storchaka, shreyanavigyan, steve.dower
Priority:	normal	Keywords:	patch

Created on 2021-04-13 18:27 by sco1, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 25466	merged	serhiy.storchaka, 2021-04-18 15:12
PR 26614	merged	miss-islington, 2021-06-08 23:31

Messages (27)
msg390981 - (view)	Author: sco1 (sco1)	Date: 2021-04-13 18:27
Came across this riddle today:

>>> [0x_for x in (1, 2, 3)]
[15]

Initially I thought this was related to PEP 515 but the unexpected behavior extends to simpler examples as well, such as:

>>> x = 5
>>> 123or x
123
>>> 123and x
5

I'm not familiar enough with C to understand why this is being parsed/tokenized this way, but this seems like it should instead be a SyntaxError. This appears to be fairly old behavior, as the non-underscored version works back to at least 2.7.

And a bonus:

>>> 0x1decade or more
31378142
msg390984 - (view)	Author: sco1 (sco1)	Date: 2021-04-13 18:43
Sorry, the bonus, while fun, I don't think is related
msg390988 - (view)	Author: Alistair Muldal (alimuldal)	Date: 2021-04-13 19:07
Several other keywords seem to be affected, including `if`, `else`, `is`, and `in`
msg390991 - (view)	Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) *	Date: 2021-04-13 19:18
It's not just about keywords. Eg '1x' tokenizes too but then produces a syntax error in the parser. Keywords are only special in that they can be used to write syntactically meaningful things with these concatenated numbers.
msg390993 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-04-13 19:33
This is known behaviour, unfortunately, and cannot be changed because of backwards compatibility.
msg390995 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-04-13 19:43
Good example! A similar issue was discussed on the mailing list 3 years ago (https://mail.python.org/archives/list/python-dev@python.org/thread/D2WPCITHG2LBQAP7DBTC6CY26WQUBAKP/#D2WPCITHG2LBQAP7DBTC6CY26WQUBAKP). Now, with the new example, it perhaps should be reconsidered.
msg390996 - (view)	Author: Shreyan Avigyan (shreyanavigyan) *	Date: 2021-04-13 19:50
Hi. I'm totally confused about the other keywords, but I'm a little concerned about the "and" and "or" operators when used not only on "int" (also known as "long") but also on most Python objects other than the bool type. When used on Python built-in objects, the "and" and "or" keywords mostly return a very peculiar result. The "and" keyword returns the Python object on the left hand side while "or" returns the Python object on the right hand side. This applies to all Python objects, built-in or user-defined, unless they have a specific __and__ or __or__ method defined. What is actually going on?
msg390997 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-04-13 19:58
We tried changing this IIRC and it broke code in the stdlib (now reformatted) so it will break code in the wild. I am not sure the gains are worth it.
msg390998 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-04-13 19:58
Better example:

>>> [0x1for x in (1,2)]
[31]

The code is parsed as [0x1f or x in (1,2)] instead of [0x1 for x in (1,2)] as you may expect.
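The two readings can be compared side by side once the whitespace is made explicit. This is only a sketch of the evaluation, not of the tokenizer itself; the variable names `greedy_reading` and `intended_reading` are made up for illustration.

```python
# What the tokenizer sees: the longest valid hex literal (0x1f), then `or`.
# `x` is not defined at this point, but `or` short-circuits on the truthy
# value 31, so `x in (1, 2)` is never evaluated and no NameError is raised.
greedy_reading = [0x1f or x in (1, 2)]

# What a human reader probably expects: a list comprehension over (1, 2).
intended_reading = [0x1 for x in (1, 2)]

print(greedy_reading)    # [31]
print(intended_reading)  # [1, 1]
```

The same mechanism explains the original riddle: `0x_for` reads as `0x_f or`, and `0x_f` is 15.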
msg390999 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-04-13 20:00
Precisely because of examples like that, changing this is a breaking change. Don't get me wrong: I would love to change it, but I don't know if it is worth the risk.
msg391001 - (view)	Author: sco1 (sco1)	Date: 2021-04-13 20:09
Appreciate the additional historical context, I also was pointed to this in the documentation: https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens If a parsing change is undesired from a backwards compatibility standpoint, would it be something that could be included in PEP 8?
msg391002 - (view)	Author: Anthony Sottile (Anthony Sottile) *	Date: 2021-04-13 20:23
here's quite a few other cases as well -- I'd love for this to be clarified in PEP8 such that I can rationalize crafting a rule for it in `pycodestyle` -- https://github.com/PyCQA/pycodestyle/issues/371
msg391003 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-04-13 20:25
One thing we could consider, as Serhiy proposed on the mailing list, is to emit a Syntax Warning. The ambiguous cases are especially scary, so I think that makes sense.
msg391042 - (view)	Author: Shreyan Avigyan (shreyanavigyan) *	Date: 2021-04-14 08:09
Hi. I just want to know why the and/or operators behave like this. The behavior is described in https://bugs.python.org/issue43833#msg390996. Moreover, I researched a little more and found out that even if the __and__ and __or__ methods are defined, the and/or operators don't seem to work. As Serhiy described in https://bugs.python.org/issue43833#msg390998, the parser reads [0x1for x in (1,2)] as [0x1f or x in (1,2)], which is the parser's fault, but why is the or operator behaving like that?
msg391050 - (view)	Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) *	Date: 2021-04-14 09:14
@shreyanavigyan This is a bit off-topic, but it's called "short-circuiting", described here: https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not (or/and aren't really "operators", like +/- etc, they cannot be overridden, they evaluate their components lazily and are therefore almost control flow)
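A minimal sketch of the short-circuiting described above. `Flag` and `explode` are hypothetical names invented for this illustration; the point is that `__or__` and `__and__` back the `|` and `&` operators, while the `or` and `and` keywords simply return one of their operands.

```python
class Flag:
    """Hypothetical class, used only to show which syntax calls __or__."""

    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        # Invoked by `a | b`, never by `a or b`.
        return Flag(f"{self.name}|{other.name}")


def explode():
    # Hypothetical helper: would fail loudly if it were ever called.
    raise RuntimeError("the right-hand operand was evaluated")


x = 5
or_result = 123 or x     # left operand is truthy, so it is returned as-is
and_result = 123 and x   # left operand is truthy, so the right one is returned
lazy = 1 or explode()    # explode() is never called

a, b = Flag("a"), Flag("b")
bitwise = a | b          # calls Flag.__or__
boolean = a or b         # returns `a` itself; __or__ is not consulted

print(or_result, and_result, lazy)  # 123 5 1
print(bitwise.name)                 # a|b
print(boolean is a)                 # True
```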
msg391051 - (view)	Author: Shreyan Avigyan (shreyanavigyan) *	Date: 2021-04-14 09:20
@Carl.Friedrich.Bolz Thanks a lot for clarifying. For a second, I thought it was maybe a bug.
msg391335 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-04-18 15:25
PR 25466 makes the tokenizer emit a deprecation warning if a numeric literal is immediately followed by one of the keywords which are valid after numeric literals. In future releases it will be changed to a syntax warning, and finally to a syntax error. It is a breaking change, because it makes invalid currently allowed syntax like `0in x` or `1or x` (but `0or x` is already an error). See also issue21642, which allowed parsing "1else" as "1 else". Not everyone agreed with that fix. Perhaps we also need to rewrite some paragraphs in the language specification.
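One way to observe this on a given interpreter is to compile a snippet while recording warnings. The sketch below is an assumption-laden probe, not part of the PR: `probe` is a hypothetical helper, and which category it reports for `1or x` (if any) depends on the Python version it runs on, so the code reports the outcome rather than assuming it.

```python
import warnings


def probe(source):
    """Compile `source` as an expression and describe what happened.

    Returns "SyntaxError", the name of the first warning category
    recorded during compilation, or "no warning".
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of hiding it or raising it.
        warnings.simplefilter("always")
        try:
            compile(source, "<probe>", "eval")
        except SyntaxError:
            return "SyntaxError"
    if caught:
        return caught[0].category.__name__
    return "no warning"


for snippet in ("1 or x", "1or x", "0in x", "0or x"):
    print(f"{snippet!r}: {probe(snippet)}")
```

The spaced form `1 or x` compiles silently, and `0or x` is rejected because `0o` starts an octal literal and `r` is not an octal digit. The result for `1or x` and `0in x` varies by release.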
msg391336 - (view)	Author: sco1 (sco1)	Date: 2021-04-18 16:18
We can also see this kind of thing with other literals, would that be in scope here? e.g.

```
Python 3.9.4 (default, Apr 5 2021, 12:33:45)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "foo"in ["foo", "bar"]
True
>>> [1,]in [[1,]]
True
```
msg391340 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-04-18 18:17
There are no issues with lists and strings. "]" clearly ends the list display, and a quote ends a string literal. The problem with numeric literals is that they can contain letters, so it is not clear (for a human reader) where the numeric literal ends and the keyword starts. Adding new numeric prefixes or suffixes or new keywords can break existing code.
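For reference, a short sketch of numeric literals that legitimately contain letters, which is why the end of a literal can be hard to spot by eye. The dictionary name `literals` is chosen only for this illustration.

```python
# Each key is the literal as written in source; each value is what it evaluates to.
literals = {
    "0x1f": 0x1f,            # hexadecimal digits a-f
    "0x_f": 0x_f,            # PEP 515 underscore after the prefix
    "0x1decade": 0x1decade,  # the "bonus" from the original report
    "0b101": 0b101,          # binary prefix
    "0o17": 0o17,            # octal prefix
    "1e3": 1e3,              # exponent marker
    "1j": 1j,                # imaginary suffix
}

for text, value in literals.items():
    print(f"{text} -> {value!r}")
```

Letters such as `e`, `f`, `o`, `b`, and `j` therefore belong to the literal when they can extend it, and keywords such as `for`, `or`, and `else` happen to begin with some of them.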
msg391341 - (view)	Author: sco1 (sco1)	Date: 2021-04-18 18:28
Makes sense, thanks!
msg391351 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-04-19 03:13
I recommend just letting this be. Aside from it allowing for a cute riddle, in the real world it seems to be harmless and not worth breaking code. There are lots of other harmless oddities, such as the space-invader increment operator:

x -=- 1

FWIW, a code reformatter such as Black will remove any weirdness.
msg391354 - (view)	Author: Guido van Rossum (Guido.van.Rossum)	Date: 2021-04-19 03:43
Actually I believe a real case was reported on python-dev. I think it is not clean that the boundary between numbers and identifiers is so fluid.
msg395367 - (view)	Author: miss-islington (miss-islington)	Date: 2021-06-08 23:31
New changeset 2ea6d890281c415e0a2f00e63526e592da8ce3d9 by Serhiy Storchaka in branch 'main': bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466) https://github.com/python/cpython/commit/2ea6d890281c415e0a2f00e63526e592da8ce3d9
msg395368 - (view)	Author: miss-islington (miss-islington)	Date: 2021-06-08 23:52
New changeset eeefa7f6c0cc64bc74c3b96a0ebbff1a2b9d3199 by Miss Islington (bot) in branch '3.10': bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466) https://github.com/python/cpython/commit/eeefa7f6c0cc64bc74c3b96a0ebbff1a2b9d3199
msg396498 - (view)	Author: Patrick Reader (pxeger) *	Date: 2021-06-24 16:17
I would like to note that syntax like this is in heavy use in the Code Golf community (a sport in which the aim is to write the shortest code possible to complete a particular task). It will be disappointing if it becomes an error and breaks many past programs (you can search for phrases like `1and`, `0for` on https://codegolf.stackexchange.com/search?q=0for for examples). I could understand if this change remains, because code golf is not exactly an important thing with serious ramifications, but I think it should be taken into consideration as a use-case nonetheless.
msg405939 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2021-11-08 09:32
Do we have a plan for when this will be turned into a non-silent warning and when into an error?
msg405949 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-11-08 13:52
Unless I am missing something, it should be a non-silent warning in 3.11 and a syntax error in 3.12.

History
Date	User	Action	Args
2022-04-11 14:59:44	admin	set	github: 87999
2021-11-08 13:52:29	pablogsal	set	messages: + msg405949
2021-11-08 09:32:39	gvanrossum	set	messages: + msg405939
2021-06-24 16:17:36	pxeger	set	nosy: + pxeger messages: + msg396498
2021-06-09 00:25:00	pablogsal	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2021-06-08 23:52:31	miss-islington	set	messages: + msg395368
2021-06-08 23:31:20	miss-islington	set	pull_requests: + pull_request25198
2021-06-08 23:31:18	miss-islington	set	nosy: + miss-islington messages: + msg395367
2021-04-19 03:43:26	Guido.van.Rossum	set	nosy: + Guido.van.Rossum messages: + msg391354
2021-04-19 03:13:02	rhettinger	set	nosy: + rhettinger messages: + msg391351
2021-04-18 18:28:42	sco1	set	messages: + msg391341
2021-04-18 18:17:50	serhiy.storchaka	set	messages: + msg391340
2021-04-18 16:19:00	sco1	set	messages: + msg391336
2021-04-18 15:25:14	serhiy.storchaka	set	versions: - Python 3.7, Python 3.8, Python 3.9 nosy: + gvanrossum, Joshua.Landau, steve.dower messages: + msg391335 components: + Interpreter Core
2021-04-18 15:12:09	serhiy.storchaka	set	keywords: + patch stage: patch review pull_requests: + pull_request24190
2021-04-14 09:20:12	shreyanavigyan	set	messages: + msg391051
2021-04-14 09:14:48	Carl.Friedrich.Bolz	set	messages: + msg391050
2021-04-14 08:09:26	shreyanavigyan	set	messages: + msg391042
2021-04-13 20:25:20	pablogsal	set	messages: + msg391003
2021-04-13 20:23:38	Anthony Sottile	set	nosy: + Anthony Sottile messages: + msg391002
2021-04-13 20:09:00	sco1	set	messages: + msg391001
2021-04-13 20:00:57	pablogsal	set	messages: + msg390999
2021-04-13 19:58:46	serhiy.storchaka	set	messages: + msg390998
2021-04-13 19:58:09	pablogsal	set	messages: + msg390997
2021-04-13 19:50:18	shreyanavigyan	set	nosy: + shreyanavigyan messages: + msg390996
2021-04-13 19:43:02	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg390995
2021-04-13 19:33:50	pablogsal	set	nosy: + pablogsal messages: + msg390993
2021-04-13 19:18:45	Carl.Friedrich.Bolz	set	nosy: + Carl.Friedrich.Bolz messages: + msg390991
2021-04-13 19:07:29	alimuldal	set	nosy: + alimuldal messages: + msg390988
2021-04-13 18:43:51	sco1	set	messages: + msg390984
2021-04-13 18:43:04	rrauenza	set	nosy: + rrauenza
2021-04-13 18:29:44	nedbat	set	nosy: + nedbat
2021-04-13 18:27:19	sco1	create