Different error messages for same error - invalid string prefixes #84427
Comments
While testing pegen, we found this out: Python 3.9.0a5+ (heads/pegen:502dfb719e, Apr 10 2020, 20:57:05)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> fur''
File "<stdin>", line 1
fur''
^
SyntaxError: invalid syntax
>>> eval("fur''")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
fur''
^
SyntaxError: unexpected EOF while parsing |
After some investigating, I found out that this is happening because when we execute code interactively, a trailing newline is present; OTOH there is no newline when passing the code as a string to eval(). |
So a slightly shorter example uses ru''. This is an error because you can't combine the r prefix and the u prefix (in fact you can't combine anything with the r prefix). I declare that it is a bug to report EOF here, and the code should *first* check for a valid combination (e.g. fr'' or rf'') and only *then* try to distinguish between single and triple quotes. |
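The valid and invalid combinations can be checked quickly with compile(). A small sketch (standard CPython 3.x; the exact error messages vary by version, so only validity is shown):

```python
# Which two-letter prefix combos does the compiler accept?
# 'r' combines with 'f' and 'b' in either order; 'u' combines with nothing.
for src in ("fr''", "rf''", "rb''", "br''", "ur''", "ru''"):
    try:
        compile(src, "<test>", "eval")
        print(src, "-> valid")
    except SyntaxError:
        print(src, "-> SyntaxError")
```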
At the moment, if the prefix is invalid, it gets tokenized as a NAME token and the string following it is tokenized as a STRING token with no prefix:

╰─ echo "ur''" | ./python -m tokenize
╰─ echo "f''" | ./python -m tokenize |
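The NAME-then-STRING split described above can also be observed programmatically with the pure-Python tokenize module:

```python
# The invalid prefix "ur" comes out as a NAME token,
# followed by a bare STRING token with no prefix.
import io
import tokenize

toks = tokenize.generate_tokens(io.StringIO("ur''\n").readline)
pairs = [(tokenize.tok_name[t.type], t.string) for t in toks]
print(pairs[:2])  # [('NAME', 'ur'), ('STRING', "''")]
```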
But that is the tokenize.py module. It does not have 100% the same behavior as the builtin tokenizer.c module. In this case it probably doesn't. |
In my understanding of the C code, that's what the C tokenizer is doing as well. Here's the relevant snippet of the tokenizer (Parser/tokenizer.c, line 1358 at commit 4b222c9):
Am I missing something? |
I have working code that checks if there is a quotation mark right after an identifier. Here is an example:

╰─ ./python
Python 3.9.0a5+ (heads/pegen-dirty:502dfb719e, Apr 11 2020, 14:43:12)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ur''
File "<stdin>", line 1
ur''
^
SyntaxError: invalid string prefix

One observation about this is that it has precedence over an EOL error:

>>> ur'
File "<stdin>", line 1
ur'
^
SyntaxError: invalid string prefix

Would that be acceptable? |
yes |
I reopen the issue. The following code works on Python 3.8, but fails with SyntaxError on Python 3.9.0a6 with both the old and the new parser (see bpo-40431):

clear = "NORMAL"
print(dict(state=clear, bg="#d00" if clear else"#fca"))

Well, the code looks like a typo... but it worked previously. Not sure if we should keep the SyntaxError or not. Fixing the code is trivial: see PR 19777 attached to bpo-40431; replace >else"#fca"< with >else "#fca"<. Note: I first reported the issue at https://bugs.python.org/issue40334#msg367580 |
I personally think the new behaviour is more consistent and more likely to catch errors. |
Would it be possible to emit a SyntaxWarning in 3.9 and only emit a SyntaxError in 3.10? |
Personally I don't think that |
I think that a linter can be very pedantic on such code. My concern is about backward compatibility. The >else"#fca"< code was added 6 years ago (commit 8b95d5e) and nobody complained, it "just worked". If we keep the SyntaxError, random projects will be broken, but I don't see much benefits for the users. In short, I concur with Guido :-) |
A possible solution would be to only emit a SyntaxError if the NAME directly preceding a STRING token contains one of the valid string prefixes (either one of 'f', 'r', 'u', 'b'). This would still output a nicer error message, but would not break code like the one of the example. What do you think about this? |
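One plausible reading of the proposal above can be sketched as a predicate over the NAME token. The function name and the "consists only of prefix characters" interpretation are illustrative, not CPython's implementation:

```python
# Hypothetical heuristic: flag NAME+STRING only when the NAME could
# plausibly be a malformed string prefix, i.e. it is built solely
# from the valid prefix letters (in either case).
PREFIX_CHARS = set("fFrRuUbB")

def looks_like_string_prefix(name: str) -> bool:
    return bool(name) and all(ch in PREFIX_CHARS for ch in name)

print(looks_like_string_prefix("ur"))    # True: ur'' would get the nicer error
print(looks_like_string_prefix("else"))  # False: else"#fca" stays accepted
```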
For addressing the backwards compatibility concern, I think we should just convert it to a SyntaxWarning for cases like the above to indicate that it's not really correct syntax, but not harmful enough to justify code breakage. I think it fits the documented description of SyntaxWarning well, which is to address "dubious syntax". Lysandros Nikolaou wrote:
That would certainly help to minimize the breakage, so I'd be in favor of that over a SyntaxError for all invalid prefixes. But, I'm not certain that it's additionally harmful if an invalid string prefix proceeds a valid one. Is there any additional harm, other than from a visual perspective? |
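The softer behaviour being discussed can be sketched with the warnings machinery. The helper name and message here are illustrative, not what CPython emits:

```python
# Sketch: surface a dubious NAME+STRING juxtaposition as a
# SyntaxWarning instead of a hard SyntaxError.
import warnings

def warn_dubious_prefix(name_tok: str, line: int) -> None:
    warnings.warn(
        f"invalid string prefix {name_tok!r} on line {line}",
        SyntaxWarning,
        stacklevel=2,
    )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_dubious_prefix("ur", 1)
print(caught[0].category.__name__)  # SyntaxWarning
```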
The parser is not consistent here.
This would not help for ... We should either require whitespace between an identifier and a string literal (with a SyntaxWarning in the meantime) or allow omitting it. |
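For context on the second option: on released CPython versions, a keyword immediately followed by a string literal already compiles with no whitespace required, as a quick check shows:

```python
# A keyword directly abutting a string literal is accepted by
# released CPython 3.x (the 3.9 alphas briefly rejected it).
code = compile("x = '' if True else''", "<test>", "exec")
print("compiled OK")
```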
reportlab reported failures on code like:

norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')

Note that |
From earlier in this issue: https://bugs.python.org/msg366164
That's not true: 'rf' is a valid prefix: rf'{1}'. And so are all of the wacky combinations rF, Rf, RF, fr, fR, Fr, FR. |
I think it was a typo. You cannot combine anything with the u prefix. |
I think you're right, since rb obviously works, too. I just wanted to make sure we're covering all the bases. There's some code in either the stdlib or the test suite where I generate all of the valid prefixes (it's something like 80 prefixes). I can dig it up if anyone needs it. |
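The full set of valid spellings can be enumerated by case-permuting the legal letter combinations. This is a rough sketch, not the stdlib code mentioned above; it yields 24 distinct non-empty prefixes rather than the "something like 80" recalled from memory:

```python
# Enumerate every valid non-empty string-prefix spelling by
# case-permuting the legal combinations.
import itertools

def case_permutations(s):
    return {''.join(cs) for cs in itertools.product(*[(c.lower(), c.upper()) for c in s])}

valid = set()
for combo in ('b', 'r', 'f', 'u', 'br', 'rb', 'fr', 'rf'):
    valid |= case_permutations(combo)
print(len(valid))  # 24
```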
What's also possible is to handle keywords at tokenizer level and return a different token type for each one of them (like with async and await), which is currently handled at parser level. This would enable us to allow reserved keywords right before a STRING token, thus covering all the possible broken code cases, but continue disallowing arbitrary NAMEs, which would mean better error-reporting in the case of invalid prefixes. |
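The keyword-versus-NAME distinction that such a scheme needs already exists at the Python level in the keyword module (today it is consulted above the tokenizer, not inside it):

```python
# Distinguishing reserved keywords from arbitrary NAMEs: a
# tokenizer-level check could in principle consult the same list.
import keyword

print(keyword.iskeyword("else"))  # True: else"#fca" would stay legal
print(keyword.iskeyword("ur"))    # False: ur'' could get the nicer error
```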
Sadly, the tokenizer is unaware of keywords (maybe except async and await because their history as soft keywords) as that distinction usually is up to the parser as the parser is the one who knows about the grammar. Introducing keywords in the tokenizer would couple them too much IMHO apart from other possible problems. |
If it's all handled by the tokenizer, how come it's different in the new parser? Anyway, it seems we really have to support what the old parser supported. |
It is not different in the new parser: both parsers now behave analogously: ~/github/python/master master
❯ ./python
Python 3.9.0a6+ (heads/master:84724dd239, Apr 29 2020, 20:26:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
File "<stdin>", line 1
norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
^
SyntaxError: invalid string prefix
>>>
~/github/python/master master
❯ ./python -X oldparser
Python 3.9.0a6+ (heads/master:84724dd239, Apr 29 2020, 20:26:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
File "<stdin>", line 1
norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
^
SyntaxError: invalid string prefix

This issue is exclusively due to the changes in #19476, if I understand correctly. |
Indeed, reverting commit 41d5b94 makes it work with both parsers: ~/github/python/master master*
❯ ./python -X oldparser
Python 3.9.0a6+ (heads/master-dirty:84724dd239, Apr 29 2020, 20:29:53)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
>>>
~/github/python/master master*
❯ ./python
Python 3.9.0a6+ (heads/master-dirty:84724dd239, Apr 29 2020, 20:29:53)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
>>> |
The original issue was about different error messages in the REPL and in eval(). But it is not related to string prefixes. We have the same difference without involving strings:

>>> a b
File "<stdin>", line 1
a b
^
SyntaxError: invalid syntax
>>> eval("a b")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
a b
^
SyntaxError: unexpected EOF while parsing

I suggest reverting this change and closing the issue. |
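The shared underlying failure can be reproduced with compile(); only the message text differed between the interactive path and eval() on the versions discussed above, so the sketch below asserts just the exception type:

```python
# "a b" is a SyntaxError either way; the message wording is
# version- and parser-dependent, so it is only printed, not checked.
try:
    compile("a b", "<string>", "eval")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```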
We also get:

File "/usr/lib/python3.9/site-packages/dnf/comps.py", line 111 |
The revert is in. Now the question is whether we want to take additional action to address the original issue. |
I will soon come back with which Fedora packages were affected by the problem. That could give us some data about how to handle this. |
Not that many: cpython itself (fixed via PR) and waf. I gave up sending PRs at this point. |
Lysandros:
If someone cares about that, I suggest opening an issue in pylint, pyflakes and similar tools so that linters emit a warning. |
Ok, I'm closing this, after consulting with Guido. |
The following change broke test_fstring when using the old parser. I reopen the issue.

commit 846d8b2 (refs/bisect/bad)
$ ./python -X oldparser -m test -v test_fstring
(...)
======================================================================
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vstinner/python/master/Lib/test/test_fstring.py", line 29, in assertAllRaise
eval(str)
AssertionError: "unexpected EOF while parsing" does not match "invalid syntax (<string>, line 1)"
(...) |
The error can be seen on the new AMD64 Arch Linux VintageParser 3.x buildbot ("vintage is the new cool"): |
We should run the tests with the old parser in at least one build. |
That was my typo in turtledemo.__main__ (similar lines before had the space) and I wish that it had been caught then. Why did compileall in the test suite not catch this SyntaxError before 3.9.0a6 was released? Does it intentionally skip __main__ files? Should I add a test_turtledemo file (with checks for the presence of tkinter and turtle)? |
"make install" ignores compileall errors no? If I recall correctly, only an error is logged into stdlib, "make install" doesn't fail. |