Grammar Incongruence #77947

Closed
IsaacElliott mannequin opened this issue Jun 4, 2018 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@IsaacElliott
Mannequin

IsaacElliott mannequin commented Jun 4, 2018

BPO 33766
Nosy @gvanrossum, @terryjreedy, @serhiy-storchaka, @ammaraskar, @miss-islington
PRs
  • bpo-33766: Document that end of file or string is a newline  #7383
  • [3.7] bpo-33766: Document that end of file or string is a newline (GH-7383) #7569
  • [3.6] bpo-33766: Document that end of file or string is a newline (GH-7383) #7571
  • [2.7] bpo-33766: Document that end of file or string is a newline (GH-7383) #7570
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-10.00:28:18.423>
    created_at = <Date 2018-06-04.04:05:20.070>
    labels = ['3.8', 'type-bug', '3.7', 'docs']
    title = 'Grammar Incongruence'
    updated_at = <Date 2018-06-10.00:28:18.423>
    user = 'https://bugs.python.org/IsaacElliott'

    bugs.python.org fields:

    activity = <Date 2018-06-10.00:28:18.423>
    actor = 'terry.reedy'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2018-06-10.00:28:18.423>
    closer = 'terry.reedy'
    components = ['Documentation']
    creation = <Date 2018-06-04.04:05:20.070>
    creator = 'Isaac Elliott'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33766
    keywords = ['patch']
    message_count = 14.0
    messages = ['318617', '318620', '318622', '318624', '318625', '318627', '318629', '318631', '318633', '318634', '318668', '319091', '319186', '319187']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'docs@python', 'serhiy.storchaka', 'ammar2', 'Isaac Elliott', 'miss-islington']
    pr_nums = ['7383', '7569', '7571', '7570']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue33766'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @IsaacElliott
    Mannequin Author

    IsaacElliott mannequin commented Jun 4, 2018

    echo 'print("a");print("b")' > test.py

    According to the specification (https://docs.python.org/3.8/reference/grammar.html), this program is grammatically incorrect, yet Python 3 runs it without issue.

    The problem is this production:

    simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

    which says that a simple_stmt must be terminated by a newline. However, the program I wrote doesn't contain any newlines.

    I think the grammar spec is missing some information, but I'm not quite sure what. Does anyone have an idea?

    @IsaacElliott IsaacElliott mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life 3.8 only security fixes type-bug An unexpected behavior, bug, or error labels Jun 4, 2018
    @serhiy-storchaka
    Member

    NEWLINE is not a newline character. It is the NEWLINE token, and the tokenizer generates it at the end of the file.

    $ echo 'print("a");print("b")' | ./python -m tokenize
    1,0-1,5:            NAME           'print'        
    1,5-1,6:            OP             '('            
    1,6-1,9:            STRING         '"a"'          
    1,9-1,10:           OP             ')'            
    1,10-1,11:          OP             ';'            
    1,11-1,16:          NAME           'print'        
    1,16-1,17:          OP             '('            
    1,17-1,20:          STRING         '"b"'          
    1,20-1,21:          OP             ')'            
    1,21-1,22:          NEWLINE        '\n'           
    2,0-2,0:            ENDMARKER      ''

    @IsaacElliott
    Mannequin Author

    IsaacElliott mannequin commented Jun 4, 2018

    Thanks for the clarification. Is there a reference to this in the documentation?

    @ammaraskar
    Member

    @IsaacElliott
    Mannequin Author

    IsaacElliott mannequin commented Jun 4, 2018

    I went through that document before I created this issue, but I can't find anything that describes this behavior. Could you be more specific, please?

    @ammaraskar
    Member

    Actually, echo implicitly puts a newline at the end. If you run with echo -n, this is the output:

    $ echo -n 'print("a");print("b")' | python3 -m tokenize
    1,0-1,5:            NAME           'print'
    1,5-1,6:            OP             '('
    1,6-1,9:            STRING         '"a"'
    1,9-1,10:           OP             ')'
    1,10-1,11:          OP             ';'
    1,11-1,16:          NAME           'print'
    1,16-1,17:          OP             '('
    1,17-1,20:          STRING         '"b"'
    1,20-1,21:          OP             ')'
    2,0-2,0:            ENDMARKER      ''

    No NEWLINE token is present.
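The tokenize output above reflects the module as it behaved in 2018. Later Python releases changed the pure-Python tokenize module to emit an implicit NEWLINE token at end of input, matching the C tokenizer. The sketch below (the helper name token_names is made up for illustration) compares the last token types for the same program with and without a trailing newline, on whichever interpreter runs it.

```python
import io
import tokenize


def token_names(source: str) -> list[str]:
    """Return the token type names that the pure-Python tokenize module produces."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Same program as above, with and without the newline that echo adds.
with_newline = token_names('print("a");print("b")\n')
without_newline = token_names('print("a");print("b")')

print(with_newline[-2:])     # last two token types when a newline is present
print(without_newline[-2:])  # the 2018-era tokenize shown above had no NEWLINE here
```

On a release that includes the change, both lines should end with NEWLINE followed by ENDMARKER.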

    @serhiy-storchaka
    Member

    Good point Ammar.

    Seems there is also a missing corner case in the definition of a physical line:

    https://docs.python.org/3.8/reference/lexical_analysis.html#physical-lines
    """
    A physical line is a sequence of characters terminated by an end-of-line sequence. In source files, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.
    """

    It misses the case where a physical line is terminated by the end of the file.
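A minimal sketch of the definition with the missing case added, using a hypothetical helper physical_lines (not part of CPython). It splits on exactly the three terminators the quoted paragraph lists and also lets end of file terminate a final line. str.splitlines is avoided because it additionally splits on characters such as form feed and U+2028, which the quoted definition does not list.

```python
import re

# Exactly the three end-of-line sequences named in the quoted paragraph.
# CR LF is listed first so it is matched as one terminator, not CR then LF.
_EOL = re.compile(r"\r\n|\r|\n")


def physical_lines(source: str) -> list[str]:
    """Split source into physical lines, dropping the terminators.

    Hypothetical helper for illustration. A final line with no terminator is
    still a physical line: end of file ends it, which is the missing case.
    """
    lines = []
    start = 0
    for match in _EOL.finditer(source):
        lines.append(source[start:match.start()])
        start = match.end()
    if start < len(source):  # trailing text ended only by end of file
        lines.append(source[start:])
    return lines


print(physical_lines('print("a");print("b")'))  # ['print("a");print("b")']
```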

    @serhiy-storchaka serhiy-storchaka added docs Documentation in the Doc dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 4, 2018
    @ammaraskar
    Member

    Here is the relevant bit of the tokenizer, which fakes a newline at the end of the file if one is not present: https://github.com/python/cpython/blob/master/Parser/tokenizer.c#L1059-L1069
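To make the mechanism concrete, here is a toy sketch (hypothetical names, not CPython code) of how a tokenizer that supplies a missing final NEWLINE lets input without a trailing newline satisfy simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE. As a simplification, a small_stmt is a single NAME token, and the implicit_newline flag exists only to show what happens without the fake newline.

```python
import re

Token = tuple[str, str]  # (token type, token text)


def toy_tokenize(source: str, implicit_newline: bool = True) -> list[Token]:
    """Hypothetical toy tokenizer: NAME, ';' and newline only (not CPython's).

    With implicit_newline=True it mimics the idea behind the linked C code:
    input that does not end in a newline still yields a final NEWLINE token.
    """
    tokens: list[Token] = []
    for match in re.finditer(r"\w+|;|\n", source):
        text = match.group()
        if text == "\n":
            tokens.append(("NEWLINE", text))
        elif text == ";":
            tokens.append(("OP", text))
        else:
            tokens.append(("NAME", text))
    if implicit_newline and tokens and tokens[-1][0] != "NEWLINE":
        tokens.append(("NEWLINE", ""))  # end of input stands in for NEWLINE
    tokens.append(("ENDMARKER", ""))
    return tokens


def is_simple_stmt(tokens: list[Token]) -> bool:
    """Match one simple_stmt followed by ENDMARKER (small_stmt = one NAME)."""
    if not tokens or tokens[0][0] != "NAME":
        return False
    i = 1
    while i < len(tokens) and tokens[i] == ("OP", ";"):
        i += 1
        if i < len(tokens) and tokens[i][0] == "NAME":
            i += 1
        else:
            break  # the optional trailing ';'
    return [kind for kind, _ in tokens[i:]] == ["NEWLINE", "ENDMARKER"]


print(is_simple_stmt(toy_tokenize("a;b\n")))                          # True
print(is_simple_stmt(toy_tokenize("a;b")))                            # True
print(is_simple_stmt(toy_tokenize("a;b", implicit_newline=False)))    # False
```

The grammar itself stays unchanged; the tokenizer guarantees that the NEWLINE it requires is always there.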

    @IsaacElliott
    Mannequin Author

    IsaacElliott mannequin commented Jun 4, 2018

    Cool, thanks for the help. Should I submit a PR with the updated documentation?

    @ammaraskar
    Member

    Sorry, I was already working on the patch by the time you posted your comment. As shown above, the tokenize module doesn't seem to mirror the behavior of the C tokenizer correctly. Do you want to try fixing that as a bug? That would involve opening a new bpo ticket and submitting a PR there.

    @gvanrossum
    Member

    I am fine with adding this to the docs. But the irony of the case is that the echo command adds a newline, so the original premise (that test.py contains an invalid program) is incorrect. ;-)

    @terryjreedy
    Member

    A few years ago, there was a particular case in which compile failed without a trailing newline. We fixed it so that it would work anyway.
    Unless we are willing to let a conforming Python interpreter fail on this, which currently works:
    >>> exec('print("hello")')
    hello

    the Reference Manual should be clear that EOF and EOS (end of string) are treated as NEWLINE.
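A quick way to check the rule described here, assuming a reasonably recent CPython: parse the same source with and without a trailing newline and compare the resulting trees. ast.dump omits line and column attributes by default, so only the structure is compared.

```python
import ast

program = 'print("a");print("b")'

# Parse the program with and without the trailing newline that echo would add.
with_newline = ast.parse(program + "\n")
without_newline = ast.parse(program)

# Same structure either way: end of string is treated like the final NEWLINE.
print(ast.dump(with_newline) == ast.dump(without_newline))  # True
print(len(without_newline.body))  # 2: one statement node per small_stmt

# Executing the newline-less string also works, printing a and then b.
exec(compile(program, "<string>", "exec"))
```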

    @terryjreedy
    Member

    New changeset 0aa17ee by Terry Jan Reedy (Ammar Askar) in branch 'master':
    bpo-33766: Document that end of file or string is a newline (GH-7383)
    0aa17ee

    @miss-islington
    Contributor

    New changeset f01b951 by Miss Islington (bot) in branch '2.7':
    bpo-33766: Document that end of file or string is a newline (GH-7383)
    f01b951

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    5 participants