Message143716
That syntax error is coming from the CPython parser and *not* the tokenizer. Both CPython's internal tokenizer and the 'tokenize' module produce the same tokenization:
[meadori@motherbrain cpython]$ cat repro.py
if 1:
  \
  pass
[meadori@motherbrain cpython]$ ./python tokenize.py repro.py
0,0-0,0: ENCODING 'utf-8'
1,0-1,2: NAME 'if'
1,3-1,4: NUMBER '1'
1,4-1,5: OP ':'
1,5-1,6: NEWLINE '\n'
2,0-2,2: INDENT '  '
3,0-3,1: NEWLINE '\n'
4,2-4,6: NAME 'pass'
4,6-4,7: NEWLINE '\n'
5,0-5,0: DEDENT ''
5,0-5,0: ENDMARKER ''
[44319 refs]
[meadori@motherbrain cpython]$ ./python -d repro.py | grep Token | tail -10
File "repro.py", line 3
^
SyntaxError: invalid syntax
[44305 refs]
Token NEWLINE/'' ... It's a token we know
Token DEDENT/'' ... It's a token we know
Token NEWLINE/'' ... It's a token we know
Token ENDMARKER/'' ... It's a token we know
Token NAME/'if' ... It's a keyword
Token NUMBER/'1' ... It's a token we know
Token COLON/':' ... It's a token we know
Token NEWLINE/'' ... It's a token we know
Token INDENT/'' ... It's a token we know
Token NEWLINE/'' ... It's a token we know
The NEWLINE INDENT NEWLINE tokenization causes the parser to choke because 'suite' nonterminals [1]:
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
are defined as NEWLINE INDENT followed by at least one statement, and a statement cannot begin with a NEWLINE token.
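To make the mismatch concrete, here is a toy check of the block form of the 'suite' rule against token-name sequences. It is only a sketch: `starts_block_suite` is a hypothetical helper written for illustration, not part of CPython, and it looks only at the first three tokens rather than implementing the real parser.

```python
def starts_block_suite(token_names):
    """Toy check for the prefix of: NEWLINE INDENT stmt+ DEDENT.

    Hypothetical helper for illustration only; the real parser is
    table-driven and far more involved.
    """
    # The block form of 'suite' must open with NEWLINE INDENT.
    if token_names[:2] != ["NEWLINE", "INDENT"]:
        return False
    # stmt+ needs at least one statement, and no statement in the
    # grammar begins with a NEWLINE (or DEDENT) token.
    return len(token_names) > 2 and token_names[2] not in ("NEWLINE", "DEDENT")


# Token sequence after the colon, as reported for repro.py.
reported = ["NEWLINE", "INDENT", "NEWLINE", "NAME", "NEWLINE", "DEDENT"]
# Token sequence with the extra NEWLINE after INDENT dropped.
expected = ["NEWLINE", "INDENT", "NAME", "NEWLINE", "DEDENT"]

print(starts_block_suite(reported))  # False
print(starts_block_suite(expected))  # True
```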
It seems appropriate that the NEWLINE after INDENT should be dropped by both tokenizers. In other words, I think:
"""
if 1:
  \
  pass
"""
should produce the same tokenization as:
"""
if 1:
  pass
"""
This seems consistent with how explicit line joining is defined [2].
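For anyone who wants to compare the two inputs on their own build, a minimal sketch using the 'tokenize' module follows. The helper name `token_names` is mine, and the result for the backslash variant depends on the Python version (some versions may reject it outright), so no output is claimed for it here.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that the tokenize module emits for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


plain = "if 1:\n  pass\n"
joined = "if 1:\n  \\\n  pass\n"  # same block with a backslash-only line

print(token_names(plain))
try:
    # Version-dependent: compare this list with the one printed above.
    print(token_names(joined))
except (tokenize.TokenError, SyntaxError) as exc:
    print("tokenize rejected the input:", exc)
```

If the proposal above were implemented, the two printed lists would be identical.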
[1] http://hg.python.org/cpython/file/92842e347d98/Grammar/Grammar
[2] http://docs.python.org/reference/lexical_analysis.html#explicit-line-joining

Date | User | Action | Args
2011-09-08 01:39:12 | meador.inge | set | recipients: + meador.inge, jhylton, rhettinger, jaredgrubb, BreamoreBoy
2011-09-08 01:39:11 | meador.inge | set | messageid: <1315445951.89.0.378249045427.issue2180@psf.upfronthosting.co.za>
2011-09-08 01:39:11 | meador.inge | link | issue2180 messages
2011-09-08 01:39:10 | meador.inge | create |