tokenize: mishandles line joining #46433

Closed
jaredgrubb mannequin opened this issue Feb 25, 2008 · 8 comments

@jaredgrubb
Mannequin

jaredgrubb mannequin commented Feb 25, 2008

BPO 2180
Nosy @rhettinger, @gpshead, @meadori, @asottile, @miss-islington
PRs
  • bpo-2180: Treat line continuation at EOF as a SyntaxError #13401
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2019-05-18.21:02:56.115>
    created_at = <Date 2008-02-25.01:55:51.416>
    labels = ['extension-modules', 'type-bug', '3.8', '3.9']
    title = 'tokenize: mishandles line joining'
    updated_at = <Date 2019-05-18.21:02:56.114>
    user = 'https://bugs.python.org/jaredgrubb'

    bugs.python.org fields:

    activity = <Date 2019-05-18.21:02:56.114>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2019-05-18.21:02:56.115>
    closer = 'gregory.p.smith'
    components = ['Extension Modules']
    creation = <Date 2008-02-25.01:55:51.416>
    creator = 'jaredgrubb'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 2180
    keywords = ['patch']
    message_count = 8.0
    messages = ['62956', '62960', '116977', '116985', '143716', '339576', '342807', '342817']
    nosy_count = 7.0
    nosy_names = ['jhylton', 'rhettinger', 'gregory.p.smith', 'jaredgrubb', 'meador.inge', 'Anthony Sottile', 'miss-islington']
    pr_nums = ['13401']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2180'
    versions = ['Python 3.8', 'Python 3.9']

    @jaredgrubb jaredgrubb mannequin added the extension-modules and type-bug labels Feb 25, 2008
    @jaredgrubb
    Mannequin Author

    jaredgrubb mannequin commented Feb 25, 2008

    tokenize does not handle line joining properly, as the following string
    fails the CPython tokenizer but passes the tokenize module.

    Example 1:
    >>> s = "if 1:\n  \\\n  #hey\n  print 1"
    >>> exec s
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 3
        #hey
           ^
    SyntaxError: invalid syntax
    
    >>> tokenize.tokenize(StringIO(s).readline)
    1,0-1,2:	NAME	'if'
    1,3-1,4:	NUMBER	'1'
    1,4-1,5:	OP	':'
    1,5-1,6:	NEWLINE	'\n'
    2,0-2,2:	INDENT	'  '
    3,2-3,6:	COMMENT	'#hey'
    3,6-3,7:	NEWLINE	'\n'
    4,2-4,7:	NAME	'print'
    4,8-4,9:	NUMBER	'1'
    5,0-5,0:	DEDENT	''
    5,0-5,0:	ENDMARKER	''
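A minimal Python 3 sketch of the comparison in Example 1, for readers without a Python 2 interpreter. The helper names `compiles` and `tokenizes` are hypothetical, `print 1` is respelled as `print(1)`, and the results depend on the interpreter version, so no output is claimed here.

```python
import io
import tokenize


def compiles(source):
    """Return True if the compiler accepts `source`."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


def tokenizes(source):
    """Return True if the tokenize module consumes `source` without error."""
    try:
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


# Python 3 spelling of Example 1: a backslash continuation followed by a comment.
EXAMPLE_1 = "if 1:\n  \\\n  #hey\n  print(1)"

print("compile :", compiles(EXAMPLE_1))
print("tokenize:", tokenizes(EXAMPLE_1))
```

If the two lines printed disagree, the interpreter in use still shows the mismatch described in this report.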

    @jaredgrubb
    Mannequin Author

    jaredgrubb mannequin commented Feb 25, 2008

    CPython allows \ at EOF, but tokenize does not.

    >>> s = 'print 1\\\n'
    >>> exec s
    1
    >>> tokenize.tokenize(StringIO(s).readline)
    1,0-1,5:	NAME	'print'
    1,6-1,7:	NUMBER	'1'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 153, in tokenize
        tokenize_loop(readline, tokeneater)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 159, in tokenize_loop
        for token_info in generate_tokens(readline):
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py", line 283, in generate_tokens
        raise TokenError, ("EOF in multi-line statement", (lnum, 0))
    tokenize.TokenError: ('EOF in multi-line statement', (2, 0))
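The transcript above prints two tokens before the error is raised. A rough Python 3 sketch of the same observation follows: it collects whatever tokens are produced before `tokenize` gives up on a trailing backslash. The helper `tokens_until_error` is a hypothetical name, and the exact tokens and message vary between Python versions.

```python
import io
import tokenize


def tokens_until_error(source):
    """Collect tokens from `source`; return (tokens, error or None)."""
    collected = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            collected.append(tok)
    except (tokenize.TokenError, SyntaxError) as exc:
        # Keep the tokens seen so far so they can be inspected next to the error.
        return collected, exc
    return collected, None


# Python 3 spelling of the second example: a backslash continuation at EOF.
tokens, error = tokens_until_error("print(1)\\\n")
for tok in tokens:
    print(tok.start, tok.end, tokenize.tok_name[tok.type], repr(tok.string))
print("error:", error)
```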

    @jafo jafo mannequin assigned jhylton Mar 20, 2008
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Sep 20, 2010

    Nobody appears to be interested, so I'll close this in a couple of weeks unless someone objects or provides a patch.

    @rhettinger
    Contributor

    Mark, please stop closing these based on age.
    There needs to be a determination whether this
    is a valid bug. If so, then a patch is needed.
    If not, it can be closed.

    @meadori
    Member

    meadori commented Sep 8, 2011

    That syntax error is coming from the CPython parser and *not* the tokenizer. Both CPython and the 'tokenize' module produce the same tokenization:

    [meadori@motherbrain cpython]$ cat repro.py
    if 1:
      \

      pass
    [meadori@motherbrain cpython]$ ./python tokenize.py repro.py
    0,0-0,0: ENCODING 'utf-8'
    1,0-1,2: NAME 'if'
    1,3-1,4: NUMBER '1'
    1,4-1,5: OP ':'
    1,5-1,6: NEWLINE '\n'
    2,0-2,2: INDENT '  '
    3,0-3,1: NEWLINE '\n'
    4,2-4,6: NAME 'pass'
    4,6-4,7: NEWLINE '\n'
    5,0-5,0: DEDENT ''
    5,0-5,0: ENDMARKER ''
    [44319 refs]
    [meadori@motherbrain cpython]$ ./python -d repro.py | grep Token | tail -10
    File "repro.py", line 3

    ^
    

    SyntaxError: invalid syntax
    [44305 refs]
    Token NEWLINE/'' ... It's a token we know
    Token DEDENT/'' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know
    Token ENDMARKER/'' ... It's a token we know
    Token NAME/'if' ... It's a keyword
    Token NUMBER/'1' ... It's a token we know
    Token COLON/':' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know
    Token INDENT/'' ... It's a token we know
    Token NEWLINE/'' ... It's a token we know

    The NEWLINE INDENT NEWLINE tokenization causes the parser to choke because 'suite' nonterminals [1]:

    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

    are defined as NEWLINE INDENT followed by one or more statements, and no statement can begin with a NEWLINE token.
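The pattern that trips the parser can be spotted mechanically. The sketch below is illustrative only: `newline_directly_after_indent` is a hypothetical helper, and the two token-type sequences are written out by hand from the listing above rather than produced by a tokenizer.

```python
from tokenize import DEDENT, ENDMARKER, INDENT, NAME, NEWLINE, NUMBER, OP


def newline_directly_after_indent(token_types):
    """Return True if any INDENT token is immediately followed by NEWLINE."""
    return any(
        first == INDENT and second == NEWLINE
        for first, second in zip(token_types, token_types[1:])
    )


# Sequence reported for repro.py: INDENT is followed by a stray NEWLINE.
reported = [NAME, NUMBER, OP, NEWLINE, INDENT, NEWLINE,
            NAME, NEWLINE, DEDENT, ENDMARKER]

# Sequence for an ordinary "if 1:" block containing a single "pass".
ordinary = [NAME, NUMBER, OP, NEWLINE, INDENT,
            NAME, NEWLINE, DEDENT, ENDMARKER]

print(newline_directly_after_indent(reported))  # True
print(newline_directly_after_indent(ordinary))  # False
```

Note that the `tokenize` module reports blank lines as NL rather than NEWLINE, so an ordinary blank line inside a block does not match this pattern.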

    It seems appropriate that the NEWLINE after INDENT should be dropped by both tokenizers. In other words, I think:
    """
    if 1:
      \

      pass
    """

    should produce the same tokenization as:

    """
    if 1:

      pass
    """

    This seems consistent with how explicit line joining is defined [2].

    [1] http://hg.python.org/cpython/file/92842e347d98/Grammar/Grammar
    [2] http://docs.python.org/reference/lexical_analysis.html#explicit-line-joining
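The proposal in this comment, that the two snippets should tokenize identically, can be checked with a small sketch like the one below. `token_shapes` is a hypothetical helper that ignores positions and keeps only the token type name and string. Two-space indentation is an assumption, chosen to match the token positions reported earlier in this comment. The answer depends on the Python version, so none is stated.

```python
import io
import tokenize


def token_shapes(source):
    """Return [(type name, string)] for `source`, or None if tokenizing fails."""
    try:
        return [
            (tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        ]
    except (tokenize.TokenError, SyntaxError):
        return None


# Assumed spellings of the two snippets, with two-space indentation.
WITH_CONTINUATION = "if 1:\n  \\\n\n  pass\n"
WITHOUT_CONTINUATION = "if 1:\n\n  pass\n"

same = token_shapes(WITH_CONTINUATION) == token_shapes(WITHOUT_CONTINUATION)
print("identical tokenization:", same)
```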

    @asottile
    Mannequin

    asottile mannequin commented Apr 7, 2019

    Here's an example in the wild which still reproduces with python3.8a3:

    https://github.com/SecureAuthCorp/impacket/blob/194b22ed2fc85c4f241375fb7ebe4e0d89626c8c/impacket/examples/remcomsvc.py#L1669

    This was reported as a bug on flake8:

    https://gitlab.com/pycqa/flake8/issues/532

    Here's the reproduction with python3.8:

    $ python3.8 --version --version
    Python 3.8.0a3 (default, Mar 27 2019, 03:46:44) 
    [GCC 7.3.0]
    $ python3.8 impacket/examples/remcomsvc.py 
    $ python3.8 -mtokenize impacket/examples/remcomsvc.py 
    impacket/examples/remcomsvc.py:1670:0: error: EOF in multi-line statement
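For tools that embed `tokenize` instead of shelling out to `python -mtokenize`, roughly the same check can be sketched in Python. `tokenize_error` is a hypothetical helper, and the message format only imitates the command-line output above. It assumes `TokenError` carries a message and a `(line, column)` pair, which is how the standard library raises it.

```python
import tokenize


def tokenize_error(path):
    """Return a 'path:line:col: error: message' string, or None on success."""
    try:
        with open(path, "rb") as handle:
            # tokenize.tokenize() takes a bytes readline and detects the encoding.
            for _ in tokenize.tokenize(handle.readline):
                pass
    except tokenize.TokenError as exc:
        message, (line, column) = exc.args
        return f"{path}:{line}:{column}: error: {message}"
    except SyntaxError as exc:
        return f"{path}:{exc.lineno}:{exc.offset}: error: {exc.msg}"
    return None
```

Calling `tokenize_error("impacket/examples/remcomsvc.py")` on a checkout of the file linked above would be the equivalent of the last command in the transcript.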

    @asottile asottile mannequin added the 3.8 and 3.9 labels Apr 7, 2019
    @gpshead gpshead self-assigned this May 18, 2019
    @miss-islington
    Contributor

    New changeset abea73b by Miss Islington (bot) (Anthony Sottile) in branch 'master':
    bpo-2180: Treat line continuation at EOF as a SyntaxError (GH-13401)
    abea73b

    @gpshead
    Member

    gpshead commented May 18, 2019

    Thanks for figuring this one out Anthony! :)

    @gpshead gpshead closed this as completed May 18, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022