Different error messages for same error - invalid string prefixes #84427

lysnikolaou · 2020-04-10T18:05:58Z

BPO	40246
Nosy	@gvanrossum, @terryjreedy, @vstinner, @ericvsmith, @encukou, @serhiy-storchaka, @hroncok, @lysnikolaou, @pablogsal, @aeros
PRs	bpo-40246: Report a different error message for invalid string prefixes #19476 bpo-40246: Revert reporting of invalid string prefixes #19888 [3.9] bpo-40246: Fix test_fstring when run with the old parser #20402

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-05-26.00:12:34.805>
created_at = <Date 2020-04-10.18:05:58.302>
labels = ['interpreter-core', 'type-bug', '3.9']
title = 'Different error messages for same error - invalid string prefixes'
updated_at = <Date 2020-05-26.00:12:34.804>
user = 'https://github.com/lysnikolaou'

bugs.python.org fields:

activity = <Date 2020-05-26.00:12:34.804>
actor = 'lys.nikolaou'
assignee = 'none'
closed = True
closed_date = <Date 2020-05-26.00:12:34.805>
closer = 'lys.nikolaou'
components = ['Interpreter Core']
creation = <Date 2020-04-10.18:05:58.302>
creator = 'lys.nikolaou'
dependencies = []
files = []
hgrepos = []
issue_num = 40246
keywords = ['patch']
message_count = 41.0
messages = ['366143', '366146', '366164', '366165', '366187', '366197', '366202', '366215', '366260', '367586', '367587', '367588', '367591', '367592', '367597', '367599', '367611', '367628', '367649', '367658', '367660', '367664', '367672', '367689', '367690', '367691', '367693', '367698', '367977', '368038', '368039', '368040', '368045', '368062', '368191', '368919', '368920', '368925', '369077', '369217', '369931']
nosy_count = 10.0
nosy_names = ['gvanrossum', 'terry.reedy', 'vstinner', 'eric.smith', 'petr.viktorin', 'serhiy.storchaka', 'hroncok', 'lys.nikolaou', 'pablogsal', 'aeros']
pr_nums = ['19476', '19888', '20402']
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40246'
versions = ['Python 3.9']

lysnikolaou · 2020-04-10T18:05:58Z

While testing pegen, we found this out:

Python 3.9.0a5+ (heads/pegen:502dfb719e, Apr 10 2020, 20:57:05) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> fur''
  File "<stdin>", line 1
    fur''
       ^
SyntaxError: invalid syntax
>>> eval("fur''")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    fur''
       ^
SyntaxError: unexpected EOF while parsing

lysnikolaou · 2020-04-10T18:44:30Z

After some investigating, I found out that this is happening because when we execute fur'' in the interactive interpreter there is an implicit newline, which means that EOF does not get reached.

OTOH there is no newline when passing it as a string inside an eval call. After the tokenizer encounters the first quote, it calls tok_nextc twice more, in order to differentiate between a single-quoted and a triple-quoted string, thus reaching EOF and proapgating an EOF error up through the perrdetail struct.

gvanrossum · 2020-04-10T22:56:14Z

So a slightly shorter example uses ru''. This is an error because you can't combine the r prefix and the u prefix (in fact you can't combine anything with the r prefix).

I declare that this is a bug to report EOF here and the code should *first* check for a valid combination (e.g. fr'' or rf'') and only *then* try to distinguish between single and triple quotes.

lysnikolaou · 2020-04-10T23:01:32Z

At the moment, if the prefix is invalid, it gets tokenized as a NAME node and the string following is tokenized as a STRING node with no prefix.

╰─ echo "ur''" | ./python -m tokenize
1,0-1,2: NAME 'ur'
1,2-1,4: STRING "''"
1,4-1,5: NEWLINE '\n'
2,0-2,0: ENDMARKER ''

╰─ echo "f''" | ./python -m tokenize
1,0-1,3: STRING "f''"
1,3-1,4: NEWLINE '\n'
2,0-2,0: ENDMARKER ''

gvanrossum · 2020-04-11T03:42:01Z

But that is the tokenize.py module. It does not have 100% the same behavior as the builtin tokenizer.c module. In this case it probably doesn't.

lysnikolaou · 2020-04-11T10:35:41Z

In my understanding of the C code that's what the C tokenizer is doing as well.

Here's the relevant snippet of the tokenizer (

cpython/Parser/tokenizer.c

Line 1358 in 4b222c9

if (is_potential_identifier_start(c)) {

): When the tokenizer sees a valid identifier start, it goes into a loop that checks for a valid combination of string prefixes. If the combination is valid and it sees a quote directly after that, it goto's to the STRING-handling code. If not, then it breaks out of the loop and returns a NAME node.

Am I missing something?

lysnikolaou · 2020-04-11T11:46:26Z

I have working code that checks if there is a quotation mark right after an identifier. Here is an example:

╰─ ./python
Python 3.9.0a5+ (heads/pegen-dirty:502dfb719e, Apr 11 2020, 14:43:12) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ur''
  File "<stdin>", line 1
    ur''
     ^
SyntaxError: invalid string prefix

One observation about this is that it has precedence over an EOL error:

>>> ur'
  File "<stdin>", line 1
    ur'
     ^
SyntaxError: invalid string prefix

Would that be acceptable?

gvanrossum · 2020-04-11T15:13:25Z

yes

On Sat, Apr 11, 2020 at 04:46 Lysandros Nikolaou <report@bugs.python.org>
wrote:

Lysandros Nikolaou <lisandrosnik@gmail.com> added the comment:

I have working code that checks if there is a quotation mark right after
an identifier. Here is an example:

╰─ ./python
Python 3.9.0a5+ (heads/pegen-dirty:502dfb719e, Apr 11 2020, 14:43:12)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ur''
File "<stdin>", line 1
ur''
^
SyntaxError: invalid string prefix

One observation about this is that it has precedence over an EOL error:

>>> ur'
File "<stdin>", line 1
ur'
^
SyntaxError: invalid string prefix

Would that be acceptable?

----------

Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40246\>

--
--Guido (mobile)

pablogsal · 2020-04-12T18:21:07Z

New changeset 41d5b94 by Lysandros Nikolaou in branch 'master':
bpo-40246: Report a better error message for invalid string prefixes (GH-19476)
41d5b94

lysnikolaou · 2020-04-29T00:10:25Z

On bpo-40334 @vstinner reported that this change broke some code in turtledemo. The code looks like this:

print(dict(state=clear, bg="#d00" if clear else"#fca"))

Due to the absence of a space between else and ", the else keyword is now interpreted as an invalid string prefix. Is that acceptable?

vstinner · 2020-04-29T00:17:06Z

I reopen the issue.

The following code works on Python 3.8, but fails with SyntaxError on Python 3.9.0a6 with the old and the new parser (see bpo-40431):

    clear = "NORMAL"
    print(dict(state=clear, bg="#d00" if clear else"#fca"))

Well, the code looks like a typo error... but it worked previously.

Not sure if we should keep the SyntaxError or not. Fixing the code is trivial: see PR 19777 attached to bpo-40431; replace >else"#fca"< with >else "#fca"<.

Note: I first reported the issue to https://bugs.python.org/issue40334#msg367580

pablogsal · 2020-04-29T00:18:39Z

I personally think the new behaviour is more consistent and more likely to catch errors.

vstinner · 2020-04-29T00:22:07Z

I personally think the new behaviour is more consistent and more likely to catch errors.

Would it be possible to emit a SyntaxWarning in 3.9 and only emit a SyntaxError in 3.10?

gvanrossum · 2020-04-29T00:26:22Z

Personally I don't think that else"stuff" should become a syntax error. It's no different from other edge cases where for some reason the space seems optional, like 1and 2.

vstinner · 2020-04-29T00:35:48Z

I think that a linter can be very pedantic on such code. My concern is about backward compatibility.

The >else"#fca"< code was added 6 years ago (commit 8b95d5e) and nobody complained, it "just worked". If we keep the SyntaxError, random projects will be broken, but I don't see much benefits for the users.

In short, I concur with Guido :-)

lysnikolaou · 2020-04-29T00:45:55Z

A possible solution would be to only emit a SyntaxError if the NAME directly preceding a STRING token contains one of the valid string prefixes (either one of 'f', 'r', 'u', 'b'). This would still output a nicer error message, but would not break code like the one of the example. What do you think about this?

aeros · 2020-04-29T01:47:03Z

For addressing the backwards compatibility concern, I think we should just convert it to a SyntaxWarning for cases like the above to indicate that it's not really correct syntax, but not harmful enough to justify code breakage. I think it fits the documented description of SyntaxWarning well, which is to address "dubious syntax".

Lysandros Nikolaou wrote:

A possible solution would be to only emit a SyntaxError if the NAME directly preceding a STRING token contains one of the valid string prefixes (either one of 'f', 'r', 'u', 'b'). This would still output a nicer error message, but would not break code like the one of the example. What do you think about this?

That would certainly help to minimize the breakage, so I'd be in favor of that over a SyntaxError for all invalid prefixes. But, I'm not certain that it's additionally harmful if an invalid string prefix proceeds a valid one. Is there any additional harm, other than from a visual perspective?

serhiy-storchaka · 2020-04-29T06:50:26Z

It's no different from other edge cases where for some reason the space seems optional, like 1and 2.

The parser is not consistent here. 0or[] is an error, while 0and[] and 1or[] are valid. See https://mail.python.org/archives/list/python-dev@python.org/message/D2WPCITHG2LBQAP7DBTC6CY26WQUBAKP/

A possible solution would be to only emit a SyntaxError if the NAME directly preceding a STRING token contains one of the valid string prefixes (either one of 'f', 'r', 'u', 'b').

This would not help for if'a'<=x<='z'. And we will get more breakage when add more string prefixes.

We should either require a whitespace between an identifier and a string literal (with SyntaxWarning in meantime) or allow to omit it.

encukou · 2020-04-29T14:52:35Z

reportlab reported failures on code like:

norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')

Note that or has a r in it.

ericvsmith · 2020-04-29T15:20:58Z

From earlier in this issue: https://bugs.python.org/msg366164

So a slightly shorter example uses ru''. This is an error because you can't combine the r prefix and the u prefix (in fact you can't combine anything with the r prefix).

That's not true: 'rf' is a valid prefix: rf'{1}'. And so are all of the wacky combinations rF, Rf, RF, fr, fR, Fr, FR.

serhiy-storchaka · 2020-04-29T15:25:55Z

I think it was a typo. You cannot combine anything with the u prefix.

ericvsmith · 2020-04-29T15:37:24Z

I think you're right, since rb obviously works, too. I just wanted to make sure we're covering all the bases. There's some code in either the stdlib or the test suite where I generate all of the valid prefixes (it's something like 80 prefixes). I can dig it up if anyone needs it.

lysnikolaou · 2020-04-29T16:27:02Z

What's also possible is to handle keywords at tokenizer level and return a different token type for each one of them (like with async and await), which is currently handled at parser level. This would enable us to allow reserved keywords right before a STRING token, thus covering all the possible broken code cases, but continue disallowing arbitrary NAMEs, which would mean better error-reporting in the case of invalid prefixes.

pablogsal · 2020-04-29T19:11:47Z

What's also possible is to handle keywords at tokenizer level

Sadly, the tokenizer is unaware of keywords (maybe except async and await because their history as soft keywords) as that distinction usually is up to the parser as the parser is the one who knows about the grammar. Introducing keywords in the tokenizer would couple them too much IMHO apart from other possible problems.

gvanrossum · 2020-04-29T19:18:31Z

If it's all handled by the tokenizer, how come it's different in the new
parser?

Anyway, it seems we really have to support what the old parser supported
here. I don't know exactly what rules it uses. Maybe only look for a string
quote following a name that can definitely be a string prefix?

pablogsal · 2020-04-29T19:28:54Z

If it's all handled by the tokenizer, how come it's different in the newparser?

Is not different in the new parser: both parsers have analogous behaviour now:

~/github/python/master master
❯ ./python
Python 3.9.0a6+ (heads/master:84724dd239, Apr 29 2020, 20:26:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
  File "<stdin>", line 1
    norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
                                         ^
SyntaxError: invalid string prefix
>>>

~/github/python/master master
❯ ./python -X oldparser
Python 3.9.0a6+ (heads/master:84724dd239, Apr 29 2020, 20:26:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
  File "<stdin>", line 1
    norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
                                       ^
SyntaxError: invalid string prefix

This issue is exclusively due to the changes in #19476 if I understand correctly.

pablogsal · 2020-04-29T19:30:33Z

Indeed, reverting commit 41d5b94 makes it work with both parsers:

~/github/python/master master*
❯ ./python -X oldparser
Python 3.9.0a6+ (heads/master-dirty:84724dd239, Apr 29 2020, 20:29:53)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
>>>

~/github/python/master master*
❯ ./python
Python 3.9.0a6+ (heads/master-dirty:84724dd239, Apr 29 2020, 20:29:53)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> norm=lambda m: m+(m and(m[-1]!='\n'and'\n'or'')or'\n')
>>>

serhiy-storchaka · 2020-04-29T20:14:06Z

The original issue was about different error messages in REPL and eval(). But it is not related to string prefixes. We have the same difference without involving strings:

>>> a b
  File "<stdin>", line 1
    a b
      ^
SyntaxError: invalid syntax
>>> eval("a b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    a b
      ^
SyntaxError: unexpected EOF while parsing

I suggest to revert this change and close the issue.

hroncok · 2020-05-03T15:47:35Z

We also get:

File "/usr/lib/python3.9/site-packages/dnf/comps.py", line 111
return'.'.join(lcl)
^
SyntaxError: invalid string prefix

pablogsal · 2020-05-04T11:32:41Z

New changeset 846d8b2 by Lysandros Nikolaou in branch 'master':
bpo-40246: Revert reporting of invalid string prefixes (GH-19888)
846d8b2

lysnikolaou · 2020-05-04T11:42:24Z

The revert is in. Now the question is if we want to take additional action to address the original issue of this.

hroncok · 2020-05-04T11:49:28Z

I will soon come back with what Fedora package were affected by the problem. That could give some data about how to handle this.

hroncok · 2020-05-04T12:51:27Z

Not that many:

cpython itself (fixed via PR)
demjson (fixed via PR)
asn1crypto (fixed via PR)
dnf (fixed via PR)
freeipa (fixed via PR)

I gave up sending PRs at this point.

waf
weasyprint
virt-who
thrift
salt
wxpython4
rosdistro
mne
pycairo
libstoragemgmt (possibly via bundled lsm/external/xmltodict)
dput-ng
ddupdate
aubio (via bundled waf)

vstinner · 2020-05-04T16:27:28Z

Lysandros:

The revert is in. Now the question is if we want to take additional action to address the original issue of this.

If someone cares of that, I suggest to open an issue in pylint, pyflakes and similar tools to emit a warning in linters.

lysnikolaou · 2020-05-05T18:17:42Z

Ok, I'm closing this, after consulting with Guido.

vstinner · 2020-05-15T02:54:19Z

The following change broke test_fstring when using the old parser. I reopen the issue.

commit 846d8b2 (refs/bisect/bad)
Author: Lysandros Nikolaou <lisandrosnik@gmail.com>
Date: Mon May 4 14:32:18 2020 +0300

bpo-40246: Revert reporting of invalid string prefixes (GH-19888)

Due to backwards compatibility concerns regarding keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) will fail to parse,
commit 41d5b94af44e34ac05d4cd57460ed104ccf96628 has to be reverted.

$ ./python -X oldparser -m test -v test_fstring 
(...)

======================================================================
FAIL: test_invalid_string_prefixes (test.test_fstring.TestCase) (str='BF""')
----------------------------------------------------------------------
File "<string>", line 1
BF""
^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vstinner/python/master/Lib/test/test_fstring.py", line 29, in assertAllRaise
    eval(str)
AssertionError: "unexpected EOF while parsing" does not match "invalid syntax (<string>, line 1)"
(...)

vstinner · 2020-05-15T02:54:51Z

The error can be seen on the new cool AMD64 Arch Linux VintageParser 3.x, vintage is the new cool:
https://buildbot.python.org/all/#builders/648/builds/185

gvanrossum · 2020-05-15T04:37:00Z

We should run the tests with the old parser in at least one build.

terryjreedy · 2020-05-16T23:04:06Z

The >else"#fca"< code was added 6 years ago (commit 8b95d5e)

That was my typo in turtledemo.__main__ (similar lines before had the space) and I wish that it had been caught then.

Why did compileall in the test suite not catch this SyntaxError before 3.6.0a6 was released? Does it intentionally skip .__main__ files? Should I add a test_turtledemo file (with checks for the presence of tkinter and turtle)?

vstinner · 2020-05-18T13:31:57Z

"make install" ignores compileall errors no? If I recall correctly, only an error is logged into stdlib, "make install" doesn't fail.

pablogsal · 2020-05-26T00:10:07Z

New changeset 6cb0ad2 by Lysandros Nikolaou in branch '3.9':
bpo-40246: Fix test_fstring when run with the old parser (GH-20402)
6cb0ad2

lysnikolaou added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Apr 10, 2020

pablogsal closed this as completed Apr 12, 2020

vstinner reopened this Apr 29, 2020

lysnikolaou closed this as completed May 5, 2020

vstinner reopened this May 15, 2020

lysnikolaou closed this as completed May 26, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Different error messages for same error - invalid string prefixes #84427

Different error messages for same error - invalid string prefixes #84427

Comments

lysnikolaou commented Apr 10, 2020

lysnikolaou commented Apr 10, 2020

lysnikolaou commented Apr 10, 2020

gvanrossum commented Apr 10, 2020

lysnikolaou commented Apr 10, 2020

gvanrossum commented Apr 11, 2020

lysnikolaou commented Apr 11, 2020

lysnikolaou commented Apr 11, 2020

gvanrossum commented Apr 11, 2020

pablogsal commented Apr 12, 2020

lysnikolaou commented Apr 29, 2020

vstinner commented Apr 29, 2020

pablogsal commented Apr 29, 2020

vstinner commented Apr 29, 2020

gvanrossum commented Apr 29, 2020

vstinner commented Apr 29, 2020

lysnikolaou commented Apr 29, 2020

aeros commented Apr 29, 2020

serhiy-storchaka commented Apr 29, 2020

encukou commented Apr 29, 2020

ericvsmith commented Apr 29, 2020

serhiy-storchaka commented Apr 29, 2020

ericvsmith commented Apr 29, 2020

lysnikolaou commented Apr 29, 2020

pablogsal commented Apr 29, 2020

gvanrossum commented Apr 29, 2020

pablogsal commented Apr 29, 2020

pablogsal commented Apr 29, 2020

serhiy-storchaka commented Apr 29, 2020

hroncok mannequin commented May 3, 2020

pablogsal commented May 4, 2020

lysnikolaou commented May 4, 2020

hroncok mannequin commented May 4, 2020

hroncok mannequin commented May 4, 2020

vstinner commented May 4, 2020

lysnikolaou commented May 5, 2020

vstinner commented May 15, 2020

vstinner commented May 15, 2020

gvanrossum commented May 15, 2020

terryjreedy commented May 16, 2020

vstinner commented May 18, 2020

pablogsal commented May 26, 2020