Author Joshua.Landau
Recipients Joshua.Landau
Date 2015-05-14.13:00:27
Message-id <1431608427.75.0.966529056278.issue24194@psf.upfronthosting.co.za>
In-reply-to
Content
This is valid:

    ℘· = 1
    print(℘·)
    #>>> 1

But tokenizing the same identifier yields error tokens:

    from io import BytesIO
    from tokenize import tokenize

    stream = BytesIO("℘·".encode("utf-8"))
    print(*tokenize(stream.readline), sep="\n")
    #>>> TokenInfo(type=56 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
    #>>> TokenInfo(type=53 (ERRORTOKEN), string='℘', start=(1, 0), end=(1, 1), line='℘·')
    #>>> TokenInfo(type=53 (ERRORTOKEN), string='·', start=(1, 1), end=(1, 2), line='℘·')
    #>>> TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')


This is a continuation of http://bugs.python.org/issue9712. I'm not able to reopen the issue, so I thought I should report it anew.

The fault lies in tokenize, not in the compiler: ℘ (U+2118) has the Other_ID_Start property and · (U+00B7) has Other_ID_Continue, and both properties are documented as valid in identifiers:

https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers
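
A rough way to see where the mismatch comes from is sketched below. It assumes the pure-Python tokenize module matches names with the regular expression `\w+` (as the tokenize.py source did when this was reported); `NAME_PATTERN` is a hypothetical stand-in, not tokenize's own object. `str.isidentifier()` follows the identifier rules from the language reference, whereas `\w` only covers alphanumerics and the underscore, so characters that are identifier-legal solely through Other_ID_Start or Other_ID_Continue fall through.

```python
import re
import unicodedata

name = "℘·"

# str.isidentifier() applies the identifier rules from the language
# reference, which include Other_ID_Start and Other_ID_Continue.
is_identifier = name.isidentifier()

# Assumption: this pattern stands in for the name regex used by the
# pure-Python tokenize module; it is not imported from tokenize.
NAME_PATTERN = re.compile(r"\w+")
regex_matches = NAME_PATTERN.fullmatch(name) is not None

# General categories of the two characters, for reference.
categories = [unicodedata.category(ch) for ch in name]

print("isidentifier:", is_identifier)
print(r"matches \w+:", regex_matches)
print("categories:", categories)
```

If the two checks disagree, the name is accepted by the compiler but cannot be matched as a single NAME token by a `\w+`-based pattern.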
History
Date                 User           Action  Args
2015-05-14 13:00:27  Joshua.Landau  set     recipients: + Joshua.Landau
2015-05-14 13:00:27  Joshua.Landau  set     messageid: <1431608427.75.0.966529056278.issue24194@psf.upfronthosting.co.za>
2015-05-14 13:00:27  Joshua.Landau  link    issue24194 messages
2015-05-14 13:00:27  Joshua.Landau  create