classification
Title: bytes literals erroneously tokenized
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, ezio.melotti, flox, meador.inge, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2012-06-13 12:47 by flox, last changed 2012-06-17 02:54 by meador.inge. This issue is now closed.

Files
File name                   Uploaded                            Description
tokenize_bytes_py2.patch    serhiy.storchaka, 2012-06-16 13:50  Patch for Python 2
tokenize_bytes_py3.patch    serhiy.storchaka, 2012-06-16 14:14  Patch for Python 3
tokenize_bytes_py2-2.patch  serhiy.storchaka, 2012-06-16 20:30
tokenize_bytes_py3-2.patch  serhiy.storchaka, 2012-06-16 20:30
Messages (12)
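msg162705 - Author: Florent Xicluna (flox) * (Python committer) Date: 2012-06-13 12:47
With Python 2.7, both b'hello' and br'hello' are tokenized incorrectly: the prefix is emitted as a separate NAME token instead of being part of the STRING token.
With Python 3.3, b'hello' is tokenized incorrectly in the same way.
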
$ python2.7 -m tokenize <<<"'hello', u'hello', ur'hello', b'hello', br'hello'"
1,0-1,7:	STRING	"'hello'"
1,7-1,8:	OP	','
1,9-1,17:	STRING	"u'hello'"
1,17-1,18:	OP	','
1,19-1,28:	STRING	"ur'hello'"
1,28-1,29:	OP	','
1,30-1,31:	NAME	'b'
1,31-1,38:	STRING	"'hello'"
1,38-1,39:	OP	','
1,40-1,42:	NAME	'br'
1,42-1,49:	STRING	"'hello'"
1,49-1,50:	NEWLINE	'\n'
2,0-2,0:	ENDMARKER	''

$ python3.3 -m tokenize <<<"'hello', u'hello', ur'hello', b'hello', br'hello', rb'hello'"
1,0-1,7:            STRING         "'hello'"      
1,7-1,8:            OP             ','            
1,9-1,17:           STRING         "u'hello'"     
1,17-1,18:          OP             ','            
1,19-1,28:          STRING         "ur'hello'"    
1,28-1,29:          OP             ','            
1,30-1,31:          NAME           'b'            
1,31-1,38:          STRING         "'hello'"      
1,38-1,39:          OP             ','            
1,40-1,49:          STRING         "br'hello'"    
1,49-1,50:          OP             ','            
1,51-1,60:          STRING         "rb'hello'"    
1,60-1,61:          NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''
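
The same check can be scripted instead of read off the command-line output. The following is a minimal sketch, assuming a Python 3 interpreter; the helper name `string_tokens` is made up for illustration. On an interpreter that contains the fix, each literal should come back as one STRING token that includes its prefix, whereas an affected interpreter would return a bare `'hello'` where `b'hello'` was expected.

```python
import io
import tokenize


def string_tokens(source):
    """Return the text of every STRING token found in source."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]


# The line mirrors the report, minus the u/ur prefixes, which are not
# relevant to the bytes case and are not valid in every Python 3 release.
source = "'hello', b'hello', br'hello', rb'hello'\n"
tokens = string_tokens(source)
print(tokens)
```

`tokenize.generate_tokens` takes a `readline` callable that returns text lines, which is why the source is wrapped in `io.StringIO`.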
msg162968 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 13:38
Here is a patch (for Python 3.3).
msg162970 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 13:50
Here is a patch for Python 2.
msg162971 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 13:51
And here is a better patch for Python 3.
msg162975 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 14:14
And here is an even better patch for Python 3.
msg162988 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 20:30
Patches updated with tests.
msg162991 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-06-16 21:10
The Python 3 patch looks OK, except that several of the tests are duplicated. I am looking at the Python 2 patch now.
msg162992 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-06-16 21:12
Never mind, the tests are OK. I missed the swapped quotes.
msg162993 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-16 21:17
> The Python 3 patch looks OK, except that several of the tests are duplicated.

What tests?
msg162996 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-06-16 21:38
Python 2 patch looks OK too. I will commit these later today. Thanks for the patches!
msg162997 - Author: Florent Xicluna (flox) * (Python committer) Date: 2012-06-16 22:04
LGTM too. Thanks.
msg163007 - Author: Roundup Robot (python-dev) Date: 2012-06-17 02:50
New changeset 35d3a8ed7997 by Meador Inge in branch '2.7':
Issue #15054: Fix incorrect tokenization of 'b' and 'br' string literals.
http://hg.python.org/cpython/rev/35d3a8ed7997

New changeset 115b0cb52c6c by Meador Inge in branch 'default':
Issue #15054: Fix incorrect tokenization of 'b' string literals.
http://hg.python.org/cpython/rev/115b0cb52c6c
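
This page does not show the contents of the patches, so the following is only an illustrative sketch of how a regex-based tokenizer can split a prefixed literal in the way reported above. The patterns `STRING_WITHOUT_B` and `STRING_WITH_B` and the helper `scan` are hypothetical simplifications, not the expressions used in Lib/tokenize.py. The idea is that when the string-prefix pattern does not accept `b`, the prefix can only match as a NAME, and the quoted part is then matched as an unprefixed STRING.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT
# the actual expressions from Lib/tokenize.py.
BODY = r"'[^'\n]*'"                                   # single-quoted, no escapes
STRING_WITHOUT_B = r"[uU]?[rR]?" + BODY               # prefix class lacks b/B
STRING_WITH_B = r"(?:[uUbB]?[rR]?|[rR][bB])" + BODY   # prefix class accepts b/B


def scan(source, string_pattern):
    """Split source into (kind, text) pairs using the given STRING pattern."""
    token_re = re.compile(
        r"\s*(?:(?P<STRING>" + string_pattern + r")"
        r"|(?P<NAME>[A-Za-z_]\w*)"
        r"|(?P<OP>,))"
    )
    tokens, pos = [], 0
    while pos < len(source):
        match = token_re.match(source, pos)
        if match is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        # Exactly one of the named alternatives matched; record which one.
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


line = "b'hello', br'hello'"
print(scan(line, STRING_WITHOUT_B))  # prefixes come out as NAME tokens
print(scan(line, STRING_WITH_B))     # prefixes stay inside the STRING tokens
```

With the first pattern the toy scanner reproduces the NAME/STRING split from the Python 2.7 output in the report; with the second, each literal is a single STRING token.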
History
Date                 User              Action  Args
2012-06-17 02:54:46  meador.inge       set     status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: - Python 3.2
2012-06-17 02:50:58  python-dev        set     nosy: + python-dev; messages: + msg163007
2012-06-16 22:04:51  flox              set     messages: + msg162997
2012-06-16 21:38:09  meador.inge       set     messages: + msg162996
2012-06-16 21:17:01  serhiy.storchaka  set     messages: + msg162993
2012-06-16 21:12:00  meador.inge       set     messages: + msg162992
2012-06-16 21:10:33  meador.inge       set     messages: + msg162991
2012-06-16 20:47:47  pitrou            set     stage: test needed -> patch review
2012-06-16 20:30:42  serhiy.storchaka  set     files: + tokenize_bytes_py2-2.patch, tokenize_bytes_py3-2.patch; messages: + msg162988
2012-06-16 16:50:06  eric.araujo       set     nosy: + meador.inge; stage: needs patch -> test needed
2012-06-16 14:14:11  serhiy.storchaka  set     files: + tokenize_bytes_py3.patch; messages: + msg162975
2012-06-16 14:13:10  serhiy.storchaka  set     files: - tokenize_bytes_py3.patch
2012-06-16 13:51:44  serhiy.storchaka  set     files: - tokenize_bytes.patch
2012-06-16 13:51:23  serhiy.storchaka  set     files: + tokenize_bytes_py3.patch; messages: + msg162971
2012-06-16 13:50:13  serhiy.storchaka  set     files: + tokenize_bytes_py2.patch; messages: + msg162970
2012-06-16 13:38:48  serhiy.storchaka  set     files: + tokenize_bytes.patch; nosy: + serhiy.storchaka; messages: + msg162968; keywords: + patch
2012-06-16 12:12:22  ezio.melotti      set     nosy: + ezio.melotti; stage: needs patch
2012-06-13 12:54:19  pitrou            set     nosy: + benjamin.peterson
2012-06-13 12:47:54  flox              create