Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codeop._maybe_compile passes code with error + triple quotes #90679

Closed
tusharsadhwani mannequin opened this issue Jan 25, 2022 · 17 comments
Closed

codeop._maybe_compile passes code with error + triple quotes #90679

tusharsadhwani mannequin opened this issue Jan 25, 2022 · 17 comments
Labels
3.11 bug and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@tusharsadhwani
Copy link
Mannequin

tusharsadhwani mannequin commented Jan 25, 2022

BPO 46521
Nosy @terryjreedy, @lysnikolaou, @pablogsal, @isidentical, @tusharsadhwani
PRs
  • bpo-46521: Fix codeop to use a new partial-input mode of the parser #31010
  • [3.10] bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) #31213
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-02-08.12:26:02.331>
    created_at = <Date 2022-01-25.15:09:36.138>
    labels = ['interpreter-core', 'type-bug', '3.11']
    title = 'codeop._maybe_compile passes code with error + triple quotes'
    updated_at = <Date 2022-02-08.12:26:02.331>
    user = 'https://github.com/tusharsadhwani'

    bugs.python.org fields:

    activity = <Date 2022-02-08.12:26:02.331>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-02-08.12:26:02.331>
    closer = 'pablogsal'
    components = ['Parser']
    creation = <Date 2022-01-25.15:09:36.138>
    creator = 'tusharsadhwani'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46521
    keywords = ['patch']
    message_count = 17.0
    messages = ['411608', '411626', '411628', '411629', '411633', '411634', '411635', '412053', '412087', '412090', '412091', '412113', '412123', '412139', '412831', '412832', '412833']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'lys.nikolaou', 'pablogsal', 'BTaskaya', 'tusharsadhwani']
    pr_nums = ['31010', '31213']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46521'
    versions = ['Python 3.11']

    @tusharsadhwani
    Copy link
    Mannequin Author

    tusharsadhwani mannequin commented Jan 25, 2022

    compile_command used to raise error for this until Python 3.9:

    >>> import code
    >>> code.compile_command("abc def '''")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.9/codeop.py", line 132, in compile_command
        return _maybe_compile(_compile, source, filename, symbol)
      File "/usr/lib/python3.9/codeop.py", line 106, in _maybe_compile
        raise err1
      File "/usr/lib/python3.9/codeop.py", line 93, in _maybe_compile
        code1 = compiler(source + "\n", filename, symbol)
      File "/usr/lib/python3.9/codeop.py", line 111, in _compile
        return compile(source, filename, symbol, PyCF_DONT_IMPLY_DEDENT)
      File "<input>", line 1
        abc def '''
            ^
    SyntaxError: invalid syntax
    

    But in Python 3.10.0 it no longer is an error:

    >>> import code
    >>> code.compile_command("abc def '''")
    >>>
    

    @tusharsadhwani tusharsadhwani mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jan 25, 2022
    @pablogsal
    Copy link
    Member

    This is due to the fact that the new parser doesn't detect the syntax error of "abc def" after it has parsed the full text, but before that happens, the tokenizer has detected a problem (the ''' is not closed) and this is considered incomplete output for code (the same if you do code.compile_command('"""')).

    Unfortunately, this is not easy to fix once we have the tokenizer error in place so I am afraid we probably need to close this as "won't fix", unless someone has a good idea on how to accommodate for this case.

    @tusharsadhwani
    Copy link
    Mannequin Author

    tusharsadhwani mannequin commented Jan 25, 2022

    wontfix would really suck, because that would mean every REPL written with the code module will be broken, even IPython:

    $ ipython
    Python 3.10.0 (default, Oct 11 2021, 05:33:59) [GCC 11.2.0]
    Type 'copyright', 'credits' or 'license' for more information
    IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: abc def '''
       ...: 
       ...: 
    

    @pablogsal
    Copy link
    Member

    wontfix would really suck, because that would mean every REPL written with the code module will be broken, even IPython:

    I understand, but I don't see a way to fix this without reverting the change to detect unclosed triple quites or without committing a lot of code to handle special cases.

    Also notice that in your example with python, as soon as you close the quotes you get an error:

    In [1]: asdf dsfsd """
    ...:
    ...:
    ...: """
    Input In [1]
    asdf dsfsd """
    ^
    SyntaxError: invalid syntax

    So is not as dramatic as you mention when you say "every REPL written with the code module will be broken". Although I understand it depends on the optics.

    @tusharsadhwani
    Copy link
    Mannequin Author

    tusharsadhwani mannequin commented Jan 25, 2022

    You're right. There was another bug in my code that was causing the SyntaxError to not show up at all, sorry about that.

    Can you help me figure out why this bug doesn't show up in the normal Python REPL?

    >>> abc def '''
      File "<stdin>", line 1
        abc def '''
            ^^^
    SyntaxError: invalid syntax
    

    I could then use whatever logic the REPL itself uses instead of relying on code.compile_command(), because my requirement is to detect if a code can be incomplete Python code, without ever compiling it.

    @pablogsal
    Copy link
    Member

    Can you help me figure out why this bug doesn't show up in the normal Python REPL?

    That's because the normal Python REPL works very differently when in interactive mode. This is because the tokenizer in interactive mode is coupled with reading from standard input.

    We would need to somehow decouple this behaviour with interactive mode, allow a new option to the parser that allows activating this mode and then hook this to codeop module (and maybe other places).

    @pablogsal
    Copy link
    Member

    because my requirement is to detect if a code can be incomplete Python code, without ever compiling it.

    AS I mentioned in other issues, unfortunately the new parser doesn't allow to do this as the old one does, because how it works. The codeop module hacks around this by comparing the error messages if you add new lines, which is know to be fragile and quite bug-friendly.

    So I am afraid there is not going to be a reliable and supported way to do this with the new parser. You may need to use a 3rd party library that allows parsing incomplete code.

    @terryjreedy
    Copy link
    Member

    Tushar, I am the principle IDLE maintainer and have dealt with similar interactive compile problems on and off for a few years. Code module uses codeop._maybe_compile, which in turn uses batch-mode compile().

    To review: Code can be in 3 states: legal, illegal (there is an positive error that must be fixed), and incomplete (might become legal with more input). Batch mode compile has all the code it is going to get, so it raises an exception for both bad and incomplete code.

    In command-line languages, '\n' definitely signals end-of-command. Even in interactive mode, incomplete commands get an error message. The user must retype or recall and add more.

    Python is a multiline and compound statement language. Newline may instead signal the end of a compound statement header or the end of a nested statement, or even just be present for visual formatting. Being able to continue incomplete statements after newline is essential.

    In interactive mode, the interpreter looks at code after each newline and differentiates between unrecoverable and merely incomplete errors. In the latter case, it sends a prompt to enter more and then reads more.

    codeop._maybe compile attempts to simulate interactive mode and make a trinary decision (returning None for 'incomplete') using batch-mode binary compile(). A hack using repeated compiles, a warning filter, and a helper function classifies code that failed the initial compile. I suspect that a) it has never been perfect and b) it cannot be (see experiment below).

    The issue here is that _maybe_compile returns None instead of passing on the compile syntax error. Some debug prints would reveal exactly why.

    An alternate approach might be to compile just once and use the error message and marked error range to split. But that would require different message-range pairs for the different cases. Compile does not seem to give this to us. For the current case, 3.11.a04 compile (and ast.parse) give the same response to both bad and incomplete lines.

    >>> compile("a b '''", '', 'single')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "", line 1
        a b '''
            ^
    SyntaxError: unterminated triple-quoted string literal (detected at line 1)
    >>> compile("s='''", '', 'single')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "", line 1
        s='''
          ^
    SyntaxError: unterminated triple-quoted string literal (detected at line  1

    But the REPL somehow treats the two lines differently when directly entered.

    >>> s = '''
    ...
    ... '''
    >>> a b '''
      File "<stdin>", line 1
        a b '''
            ^
    SyntaxError: unterminated triple-quoted string literal (detected at line 1)

    Pablo, is there any possibility that the internal REPL parser could be wrapped, exposed to Python, and called with fake stdin/out/err objects?

    Or if one executes 'python -i -c "start code"', what is required of the standard streams for '-i' to be effective and actually shift into interactive mode, reading from and writing to those streams? Just claim to be a tty? It is any different than for reading responses to "input('prompt')"?

    @terryjreedy terryjreedy added 3.11 bug and security fixes and removed 3.10 only security fixes labels Jan 29, 2022
    @terryjreedy terryjreedy changed the title compile_command not raising syntax error when command ends with triple quotes codeop._maybe_compile passes code with error + triple quotes Jan 29, 2022
    @terryjreedy terryjreedy added 3.11 bug and security fixes and removed 3.10 only security fixes labels Jan 29, 2022
    @terryjreedy terryjreedy changed the title compile_command not raising syntax error when command ends with triple quotes codeop._maybe_compile passes code with error + triple quotes Jan 29, 2022
    @pablogsal
    Copy link
    Member

    > Pablo, is there any possibility that the internal REPL parser could be wrapped, exposed to Python, and called with fake stdin/out/err objects?

    I would really advise against this. Unfortunately, the state of affairs is that the REPL is somehow super entangled with the parser, to the point that is the tokenizer asking for more characters that triggers a read from stdin. Exposing this with some API would be super dangerous because we would be elevated to the "supported" level anything that works just because it happens to be close to what the interactive mode needs.

    On the other side, codeop is fundamentally flawed in the way is built because relies on comparing error messages from the parser, which is a very poor way of handling semantic information.

    What we need here is a mode of the parser that somehow raises a very specific error on incomplete input, but not an incorrect one. And that may be not immediate to implement given in how many places we can raise lexer and parser errors. I will give it a go, in any case

    @pablogsal
    Copy link
    Member

    Ugh, the approach to do that breaks super heavily test_idle :(

    @pablogsal
    Copy link
    Member

    If we want to go with this approach, I am going to need help to fix test_idle as I have no idea why is failing if test_codeop passes.

    @terryjreedy
    Copy link
    Member

    Thank you for the PR. As I wrote on my preliminary review, I see this likely 1 failure that might may be in the test fixture. Will test and debug later.

    @terryjreedy
    Copy link
    Member

    With my fix to the PR:

    >>> a b '''
    SyntaxError: unterminated triple-quoted string literal (detected at line 1)
    >>> a '''
    ...

    The message is off, and can be left for another issue (or not), but the behavior is correct.

    With the hack removed, all the tests in test_codeop that the hack works can be removed and replaced by a simple test that a code object, error, or None are returned in 3 different cases.

    @pablogsal
    Copy link
    Member

    The message is off

    That's because the tokenizer sees the error before the parser even has time to see the other one. Not sure if is technically anything to fix here other than the order of reporting two different errors, which may be a bit tricky to fix.

    @pablogsal
    Copy link
    Member

    New changeset 69e1097 by Pablo Galindo Salgado in branch 'main':
    bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010)
    69e1097

    @pablogsal
    Copy link
    Member

    New changeset 5b58db7 by Pablo Galindo Salgado in branch '3.10':
    [3.10] bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010). (GH-31213)
    5b58db7

    @pablogsal
    Copy link
    Member

    I am not backporting to 3.9 because the parser is different enough that introducing this would also introduce some unintended side effects.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 bug and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants