
classification
Title: tokenize: does not allow CR for a newline
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.5
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: gvanrossum
Nosy List: gvanrossum, jaredgrubb
Priority: normal
Keywords:

Created on 2008-02-25 02:06 by jaredgrubb, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg62959 - Author: Jared Grubb (jaredgrubb) Date: 2008-02-25 02:06
tokenize recognizes '\n' and '\r\n' as newlines, but does not tolerate '\r':

>>> import tokenize
>>> s = "print 1\nprint 2\r\nprint 3\r"
>>> open('temp.py','w').write(s)
>>> exec(open('temp.py','r'))
1
2
3
>>> tokenize.tokenize(open('temp.py','r').readline)
1,0-1,5:	NAME	'print'
1,6-1,7:	NUMBER	'1'
1,7-1,8:	NEWLINE	'\n'
2,0-2,5:	NAME	'print'
2,6-2,7:	NUMBER	'2'
2,7-2,9:	NEWLINE	'\r\n'
3,0-3,5:	NAME	'print'
3,6-3,7:	NUMBER	'3'
3,7-3,8:	ERRORTOKEN	'\r'
4,0-4,0:	ENDMARKER	''
msg65131 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-08 00:39
I don't think this ought to be changed in exec().  It ought to be done
by opening the file using universal newlines.
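
A minimal sketch of that suggestion, written for Python 3 rather than the Python 2.5 discussed in this issue. In Python 3, text-mode open() with newline=None translates '\r' and '\r\n' to '\n' before tokenize ever sees a line. The sample source, the temporary file, and the helper name tokenize_with_universal_newlines are assumptions made for illustration.

```python
import os
import tempfile
import tokenize

# Hypothetical sample source mixing all three line endings: LF, CRLF, CR.
SOURCE = b"x = 1\ny = 2\r\nz = 3\r"


def tokenize_with_universal_newlines(path):
    """Tokenize a file opened with universal newlines (illustrative helper)."""
    # newline=None is the Python 3 default and enables newline translation.
    with open(path, "r", newline=None, encoding="utf-8") as f:
        return list(tokenize.generate_tokens(f.readline))


# Write the bytes unmodified so the line endings survive on every platform.
fd, path = tempfile.mkstemp(suffix=".py")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(SOURCE)
    tokens = tokenize_with_universal_newlines(path)
finally:
    os.remove(path)

newline_strings = [t.string for t in tokens if t.type == tokenize.NEWLINE]
error_tokens = [t for t in tokens if t.type == tokenize.ERRORTOKEN]
```

In Python 2.5 the corresponding approach would be to open the file with mode 'rU'.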
msg65133 - Author: Jared Grubb (jaredgrubb) Date: 2008-04-08 01:28
This is not a report on a bug in exec(), but rather a bug in the
tokenize module -- the behavior between the CPython tokenizer and the
tokenize module is not consistent. If you look in the tokenize.py
source, it contains code to recognize both \n and \r\n as newlines, but
it ignores the possibility that \r could be the line ending character
(as the Python reference says).
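
To make the gap concrete, here is an illustrative comparison of two newline patterns. This is not the actual tokenize.py source; both pattern names are hypothetical and only show the difference between accepting two line endings and accepting all three.

```python
import re

# Hypothetical pattern that accepts only LF and CRLF.
LF_OR_CRLF = re.compile(r"\r?\n")
# Hypothetical pattern that also accepts a bare CR.
ANY_NEWLINE = re.compile(r"\r\n|\r|\n")

endings = ["\n", "\r\n", "\r"]
narrow = [bool(LF_OR_CRLF.fullmatch(e)) for e in endings]
wide = [bool(ANY_NEWLINE.fullmatch(e)) for e in endings]
```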
msg65143 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-08 04:46
I still think it shouldn't be tokenize's business to handle this.  I'm
not quite sure how exec() manages to do this; I note that this gives a
syntax error:

exec('x = 1\rprint x\r')
msg65151 - Author: Jared Grubb (jaredgrubb) Date: 2008-04-08 06:07
Yes, but exec(string) also gives a syntax error for \r\n:

exec('x=1\r\nprint x') 

The only explanation I could find for permitting ONLY \n as a newline in
exec(string) comes from PEP 278: "There is no support for universal
newlines in strings passed to eval() or exec. It is envisioned that such
strings always have the standard \n line feed, if the strings come from
a file that file can be read with universal newlines." (This is why my
original example had to be exec(file) and not just a simple exec(string).)
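
Following the PEP 278 wording, one way to guarantee that a string passed to exec() contains only \n is to normalize it first. This is a minimal sketch in Python 3; the helper name normalize_newlines and the sample source are assumptions. Normalizing explicitly means the result does not depend on how a particular interpreter version treats \r inside source strings.

```python
def normalize_newlines(source):
    """Convert CRLF and bare CR line endings to LF (illustrative helper)."""
    # Replace CRLF first so its CR is not converted a second time.
    return source.replace("\r\n", "\n").replace("\r", "\n")


namespace = {}
exec(normalize_newlines("x = 1\r\ny = x + 1\rz = y + 1"), namespace)
```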

Of the three newline types, exec() accepts one (\n only, for a string) or
all three (for a file), while tokenize accepts exactly two (\n and \r\n).
I honestly am not sure what the "right" way is (or should be), but either
way, the tokenize module is not consistent with exec.
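
A small harness along these lines can check the two code paths side by side. It is a sketch for Python 3, so the results need not match the Python 2.5 behavior reported above; the function names, the sample source, and the report layout are assumptions. io.StringIO with its default settings splits lines on \n only and does no translation, which is similar to reading a file without universal newlines.

```python
import io
import tokenize

NEWLINES = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}


def tokenize_accepts(source):
    """Return True if tokenize yields no ERRORTOKEN and raises no error."""
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError):
        return False
    return all(t.type != tokenize.ERRORTOKEN for t in tokens)


def compile_accepts(source):
    """Return True if the built-in compiler accepts the source string."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


# Maps each newline style to (tokenize result, compile result).
report = {}
for name, nl in NEWLINES.items():
    src = "x = 1" + nl + "y = 2" + nl
    report[name] = (tokenize_accepts(src), compile_accepts(src))
```

Any entry whose two values differ marks a newline style on which the tokenize module and the compiler disagree in the interpreter being tested.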

(By the way, if you're curious why I filed this issue and Issue#2180,
I'm working on the PyPy project to help improve its current Python
lexer/parser. In order to ensure that it is correct and robust, I was
experimenting with corner cases in Python syntax and I found these cases
where tokenize disagrees with exec.)
msg65152 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-08 06:12
I recommend that you only care about \n and consider everything else
unspecified.
msg65153 - Author: Jared Grubb (jaredgrubb) Date: 2008-04-08 06:29
I actually hadn't thought of that. PyPy should actually use universal
newlines to its advantage; after all, it IS written in Python... Thanks
for the suggestion!

In any case, I wanted to get this bug about the standard library in your
record, in case you wanted to handle it. It is fairly innocuous, so I'll
let it go. Take care.
History
Date                 User        Action  Args
2022-04-11 14:56:31  admin       set     github: 46435
2008-04-08 06:29:41  jaredgrubb  set     messages: + msg65153
2008-04-08 06:12:04  gvanrossum  set     messages: + msg65152
2008-04-08 06:07:13  jaredgrubb  set     messages: + msg65151
2008-04-08 04:46:53  gvanrossum  set     messages: + msg65143
2008-04-08 01:28:51  jaredgrubb  set     messages: + msg65133
2008-04-08 00:39:44  gvanrossum  set     status: open -> closed
                                         resolution: rejected
                                         messages: + msg65131
2008-03-20 02:54:15  jafo        set     priority: normal
                                         assignee: gvanrossum
                                         components: + Library (Lib), - Extension Modules
                                         nosy: + gvanrossum
2008-02-25 02:06:47  jaredgrubb  create