Author esoma
Recipients esoma
Date 2020-12-19.15:46:27
Message-id <1608392787.93.0.861284496438.issue42687@roundup.psfhosted.org>
In-reply-to
Content
The tokenize module does not recognize '<>' as a single token; it emits two separate tokens instead.

```
$ python -c "import tokenize; import io; import pprint; pprint.pprint(list(tokenize.tokenize(io.BytesIO(b'<>').readline)))"
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<', start=(1, 0), end=(1, 1), line='<>'),
 TokenInfo(type=54 (OP), string='>', start=(1, 1), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```


I would expect:
```
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<>', start=(1, 0), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```

This is the behavior of the CPython tokenizer, which the tokenize module tries "to match the working of".
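
For checking this on other interpreter versions, the one-liner above can be written as a short script. This is only a sketch: `op_strings` is a helper name made up here, not part of tokenize, and the printed list may differ between Python versions, so no expected output is shown.

```python
import io
import tokenize


def op_strings(source):
    """Return the strings of all OP tokens that tokenize yields for source (bytes)."""
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.OP]


# One element means '<>' came back as a single token,
# two elements means it was split into '<' and '>'.
ops = op_strings(b"<>")
print(ops)
```

Whichever way the operator is split, the OP strings joined together should give back the original '<>'.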