msg252397 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-10-06 15:36 |
Importing the tokenize modifies the content of the token module.
>>> import token
>>> len(token.tok_name)
59
>>> import tokenize
>>> len(token.tok_name)
61
Such side effect looks as bad design to me. Either token.tok_name should contain elements needed for tokenize, or tokenize should not modify token.tok_name.
|
msg252398 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-10-06 15:39 |
I agree :-) Do you want to work on a patch?
|
msg252407 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-10-06 16:39 |
I left the decision for modules maintainers.
|
msg252439 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-10-06 23:30 |
I would fix this by making tokenize.tok_name a copy. It looks like this behaviour dates back to 1997 (see revision 1efc4273fdb7).
|
msg293786 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-16 21:02 |
> I would fix this by making tokenize.tok_name a copy. It looks like this behaviour dates back to 1997 (see revision 1efc4273fdb7).
token.tok_name is part of the Python public API:
https://docs.python.org/dev/library/token.html#token.tok_name
whereas tokenize.tok_name isn't documented. So I dislike having two disconnected mappings. I prefer to add tokenize tokens directly in Lib/token.py, and then get COMMENT, NL and ENCODING using tok_name.index().
|
msg293787 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-16 21:06 |
Extract of Lib/lib2to3/pgen2/driver.py:
if type in (tokenize.COMMENT, tokenize.NL):
...
if debug:
self.logger.debug("%s %r (prefix=%r)",
token.tok_name[type], value, prefix)
The code uses tokenize.COMMENT and look for this constant into token.tok_name. If token.tok_name doesn't contain COMMENT anymore, it breaks lib2to3, no? At least the debug mode which might not be covered by test_lib2to3.
|
msg293789 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-16 21:09 |
Another example from Tools/i18n/pygettext.py, TokenEater::
def __call__(self, ttype, tstring, stup, etup, line):
# dispatch
## import token
## print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
## 'tstring:', tstring
self.__state(ttype, tstring, stup[0])
...
eater.set_filename(filename)
try:
tokens = tokenize.tokenize(fp.readline)
for _token in tokens:
eater(*_token)
Another example using token.tok_name with token types coming from tokenize.
|
msg293793 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-05-16 21:24 |
> I prefer to add tokenize tokens directly in Lib/token.py, and then get COMMENT, NL and ENCODING using tok_name.index().
That would make more sense from a breaking change perspective, but we would step on the toes of anyone adding `COMMENT`, `NL`, or `ENCODING` to `token.h` because `token.py` is generated from that.
It would also make much more sense to have them as fields on `token` if they are in `tok_name` in `token`.
|
msg293796 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-05-16 21:29 |
lib2to3 appears to have it's own token.py as well with NL and COMMENT withtout ENCODING...
Lib/lib2to3/pgen2/token.py
|
msg293799 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-16 21:34 |
> lib2to3 appears to have it's own token.py as well with NL and COMMENT withtout ENCODING...
Oh you are right: Lib/lib2to3/pgen2/driver.py uses Lib/lib2to3/pgen2/token.py.
|
msg294345 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-05-24 11:16 |
I've updated the PR and added the tokenize tokens to token.h and their names to tokenizer.c. This way they'll show up when you run token.py. The names will always be in tok_name and tokenizer.py will use those. Not breaking the public api and no longer modifying token.py when you import tokenizer.py.
|
msg294751 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-30 11:54 |
https://github.com/python/cpython/pull/1608 doesn't include a Misc/NEWS entry, but I don't want to stress albertjan with Misc/NEWS since I was already annoying enough on the review :-) I'm not sure that the change deserves a NEWS entry. If you consider that yes, it does. It can be done in a separated change.
|
msg294755 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-30 13:32 |
Albert-Jan's patch should be merged prior my patch for issue30455, because it can be backported, by my patch shouldn't. But first we should compare them and find differences in results.
My patch adds COMMENT, NL and ENCODING in the documentation, Albert-Jan's patch doesn't. I don't know whether it is worth to add these names to the token module in bugfix releases, and whether it is worth to document them.
My patch wraps all names in _PyParser_TokenNames after <ERRORTOKEN> with angle parenthesis, Albert-Jan's patch inserts new names between <ERRORTOKEN> and <N_TOKENS> without angle parenthesis. I don't know what is more correct.
|
msg294761 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-30 14:26 |
I suggest to only modify the master branch. Nobody complained before. It's only a very minor surprising behaviour.
|
msg294774 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-05-30 17:49 |
Let me know if you want me to add/change anything about my PR :) I'm happy to do so.
|
msg294785 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-30 21:12 |
If merge this only in master, the new members of the token module should be documented (add a versionchanged directive in token.rst). This also needs a Misc/NEWS entry.
|
msg294823 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-05-31 08:12 |
I've updated token.rst and Misc/NEWS. Let me know if the wording is correct.
|
msg294832 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-31 11:04 |
If move COMMENT, NL and ENCODING to the token module, their documentation should be moved from tokenize.rst to token.rst.
I have asked on Python-Dev whether this is a right way. [1]
[1] https://mail.python.org/pipermail/python-dev/2017-May/148080.html
|
msg294842 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-31 14:00 |
New changeset fc354f07855a9197e71f851ad930cbf5652f9160 by Victor Stinner (Albert-Jan Nijburg) in branch 'master':
bpo-25324: copy tok_name before changing it (#1608)
https://github.com/python/cpython/commit/fc354f07855a9197e71f851ad930cbf5652f9160
|
msg294957 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-06-01 17:08 |
We got a bug report from Coverity:
*** CID 1411801: Incorrect expression (MISSING_COMMA)
/Parser/tokenizer.c: 111 in ()
105 "OP",
106 "AWAIT",
107 "ASYNC",
108 "<ERRORTOKEN>",
109 "COMMENT",
110 "NL",
>>> CID 1411801: Incorrect expression (MISSING_COMMA)
>>> In the initialization of "_PyParser_TokenNames", a suspicious concatenated string ""ENCODING<N_TOKENS>"" is produced.
111 "ENCODING"
112 "<N_TOKENS>"
113 };
114
115
116 /* Create and initialize a new tok_state structure */
I missed this typo :-p
|
msg294958 - (view) |
Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * |
Date: 2017-06-01 17:20 |
Aah! Oops I can fix later today.
On Thu, 1 Jun 2017 at 18:08, STINNER Victor <report@bugs.python.org> wrote:
>
> STINNER Victor added the comment:
>
> We got a bug report from Coverity:
>
> *** CID 1411801: Incorrect expression (MISSING_COMMA)
> /Parser/tokenizer.c: 111 in ()
> 105 "OP",
> 106 "AWAIT",
> 107 "ASYNC",
> 108 "<ERRORTOKEN>",
> 109 "COMMENT",
> 110 "NL",
> >>> CID 1411801: Incorrect expression (MISSING_COMMA)
> >>> In the initialization of "_PyParser_TokenNames", a suspicious
> concatenated string ""ENCODING<N_TOKENS>"" is produced.
> 111 "ENCODING"
> 112 "<N_TOKENS>"
> 113 };
> 114
> 115
> 116 /* Create and initialize a new tok_state structure */
>
> I missed this typo :-p
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue25324>
> _______________________________________
>
|
msg294966 - (view) |
Author: Mariatta (Mariatta) *  |
Date: 2017-06-01 20:51 |
New changeset c9ccacea3ff441b1ff519c7399602b7db16f9783 by Mariatta (Albert-Jan Nijburg) in branch 'master':
bpo-25324: add missing comma in Parser/tokenizer.c (GH-1910)
https://github.com/python/cpython/commit/c9ccacea3ff441b1ff519c7399602b7db16f9783
|
msg294970 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-06-01 21:12 |
Albert-Jan Nijburg added the comment:
> Aah! Oops I can fix later today.
Don't worry. Our test suite and all reviewers also missed it ;-)
|
msg295267 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-06-06 15:43 |
New changeset 5cefb6cfdd089d237ba6724bb5311ee4f04be59f by Serhiy Storchaka in branch 'master':
bpo-25324: Move the description of tokenize tokens to token.rst. (#1911)
https://github.com/python/cpython/commit/5cefb6cfdd089d237ba6724bb5311ee4f04be59f
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:22 | admin | set | github: 69511 |
2017-06-06 15:54:21 | serhiy.storchaka | set | status: open -> closed stage: resolved resolution: fixed versions:
+ Python 3.7, - Python 2.7, Python 3.4, Python 3.5, Python 3.6 |
2017-06-06 15:43:38 | serhiy.storchaka | set | messages:
+ msg295267 |
2017-06-01 22:33:44 | serhiy.storchaka | set | pull_requests:
+ pull_request1991 |
2017-06-01 21:12:33 | vstinner | set | messages:
+ msg294970 |
2017-06-01 20:51:29 | Mariatta | set | nosy:
+ Mariatta messages:
+ msg294966
|
2017-06-01 20:02:37 | Albert-Jan Nijburg | set | pull_requests:
+ pull_request1990 |
2017-06-01 17:20:21 | Albert-Jan Nijburg | set | messages:
+ msg294958 |
2017-06-01 17:08:45 | vstinner | set | messages:
+ msg294957 |
2017-05-31 14:00:23 | vstinner | set | messages:
+ msg294842 |
2017-05-31 11:04:10 | serhiy.storchaka | set | messages:
+ msg294832 |
2017-05-31 08:12:14 | Albert-Jan Nijburg | set | messages:
+ msg294823 |
2017-05-30 21:12:20 | serhiy.storchaka | set | messages:
+ msg294785 |
2017-05-30 17:49:55 | Albert-Jan Nijburg | set | messages:
+ msg294774 |
2017-05-30 14:26:58 | vstinner | set | messages:
+ msg294761 |
2017-05-30 13:32:25 | serhiy.storchaka | set | messages:
+ msg294755 |
2017-05-30 11:54:45 | vstinner | set | messages:
+ msg294751 |
2017-05-24 11:16:28 | Albert-Jan Nijburg | set | messages:
+ msg294345 |
2017-05-16 21:34:09 | vstinner | set | messages:
+ msg293799 |
2017-05-16 21:29:09 | Albert-Jan Nijburg | set | messages:
+ msg293796 |
2017-05-16 21:24:31 | Albert-Jan Nijburg | set | nosy:
+ Albert-Jan Nijburg messages:
+ msg293793
|
2017-05-16 21:09:26 | vstinner | set | messages:
+ msg293789 |
2017-05-16 21:06:23 | vstinner | set | messages:
+ msg293787 |
2017-05-16 21:02:54 | vstinner | set | messages:
+ msg293786 |
2017-05-16 15:44:07 | Albert-Jan Nijburg | set | pull_requests:
+ pull_request1699 |
2015-12-22 08:41:28 | josephgordon | set | nosy:
+ josephgordon
|
2015-10-06 23:30:46 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg252439
|
2015-10-06 16:39:18 | serhiy.storchaka | set | messages:
+ msg252407 |
2015-10-06 15:39:00 | vstinner | set | nosy:
+ vstinner messages:
+ msg252398
|
2015-10-06 15:36:38 | serhiy.storchaka | create | |