
Author gdr@garethrees.org
Recipients benjamin.peterson, daniel.urban, eric.snow, ezio.melotti, gdr@garethrees.org, r.david.murray, terry.reedy, vladris
Date 2011-08-05.21:11:50
Message-id <1312578711.21.0.396413594125.issue12675@psf.upfronthosting.co.za>
In-reply-to
Content
Terry: agreed. Does anyone actually use this module? Does anyone know what the design goals are for tokenize? If someone can tell me, I'll do my best to make it meet them.

Meanwhile, here's another bug. Each character of trailing whitespace is tokenized as an ERRORTOKEN.

    Python 3.3.0a0 (default:c099ba0a278e, Aug  2 2011, 12:35:03) 
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import tokenize,untokenize
    >>> from io import BytesIO
    >>> list(tokenize(BytesIO('1 '.encode('utf8')).readline))
    [TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
     TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 '),
     TokenInfo(type=54 (ERRORTOKEN), string=' ', start=(1, 1), end=(1, 2), line='1 '),
     TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
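
To check for this on another build, something like the following sketch may help. It filters the token stream down to ERRORTOKEN entries for a given input. The helper name `error_tokens` is made up for illustration and is not part of the tokenize module. What it prints will depend on the interpreter version, so no expected output is given.

```python
from io import BytesIO
from tokenize import ERRORTOKEN, tokenize


def error_tokens(source):
    """Return the ERRORTOKEN entries produced when tokenizing `source`.

    Hypothetical helper for this report; not part of the tokenize module.
    """
    readline = BytesIO(source.encode("utf-8")).readline
    return [tok for tok in tokenize(readline) if tok.type == ERRORTOKEN]


# Input from the report: a number followed by one trailing space, no newline.
for tok in error_tokens("1 "):
    print(tok)
```

An empty result for `"1 "` would suggest that the build in question does not show the behaviour described above.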