Message 328927 - Python tracker



Author	terry.reedy
Recipients	ammar2, gregory.p.smith, meador.inge, pablogsal, serhiy.storchaka, taleinat, terry.reedy
Date	2018-10-30.14:59:51
Message-id	<1540911591.25.0.788709270274.issue35107@psf.upfronthosting.co.za>

Content
It seems to me a bug that when the trailing '\n' is not present, tokenize adds both an NL and a NEWLINE token instead of just one of them.  Moreover, both tuples of this double correction look wrong.

If '\n' is present,
  TokenInfo(type=56 (NL), string='\n', start=(1, 1), end=(1, 2), line='#\n')
looks correct.

If NL represents a real character, the length 0 string='' in the generated
  TokenInfo(type=56 (NL), string='', start=(1, 1), end=(1, 1), line='#'),
seems wrong.  I suspect that the idea was to misrepresent NL so that untokenize would not add a '\n'.  In
  TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 2), line='')
the empty string='' does not match the span length, end - start = 2 - 1 = 1.  I am inclined to think that the following would be the correct added token, and that it should untokenize correctly:
  TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 1), line='')

ast.dump(ast.parse(s)) returns 'Module(body=[])' for both versions of 's' (with and without the trailing '\n'), so that is no help in deciding.
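
For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes that 's' is '#\n' and '#', as the line= fields above suggest. The helper names comment_tokens and span_matches_string are made up for illustration and are not part of tokenize. The tokens printed, and the exact ast.dump text, depend on the Python version, so no expected output is given here.

```python
import ast
import io
import tokenize


def comment_tokens(source):
    """Return the list of TokenInfo tuples that tokenize yields for source."""
    readline = io.StringIO(source).readline
    return list(tokenize.generate_tokens(readline))


def span_matches_string(tok):
    """True if a single-line token's column span equals len(tok.string).

    Hypothetical consistency check: multi-line tokens are skipped
    (reported as True) because column arithmetic does not apply to them.
    """
    (srow, scol), (erow, ecol) = tok.start, tok.end
    return srow != erow or ecol - scol == len(tok.string)


# Compare the source with and without the trailing newline.
for s in ('#\n', '#'):
    print(repr(s))
    for tok in comment_tokens(s):
        name = tokenize.tok_name[tok.type]
        flag = 'ok' if span_matches_string(tok) else 'MISMATCH'
        print('  ', name, repr(tok.string), tok.start, tok.end,
              repr(tok.line), flag)
    print('  ast:', ast.dump(ast.parse(s)))
```

Any token flagged MISMATCH has an end - start span that disagrees with the length of its string, which is the inconsistency described above for the added NEWLINE token.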

History
Date	User	Action	Args
2018-10-30 14:59:51	terry.reedy	set	recipients: + terry.reedy, gregory.p.smith, taleinat, meador.inge, serhiy.storchaka, ammar2, pablogsal
2018-10-30 14:59:51	terry.reedy	set	messageid: <1540911591.25.0.788709270274.issue35107@psf.upfronthosting.co.za>
2018-10-30 14:59:51	terry.reedy	link	issue35107 messages
2018-10-30 14:59:51	terry.reedy	create