
classification
Title: Unicode literals in tokenize.py and tests.
Type: Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, georg.brandl, gvanrossum, loewis, ron_adam
Priority: normal Keywords: patch

Created on 2007-11-11 14:32 by ron_adam, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
tokenize_patch.diff ron_adam, 2007-11-11 14:32
Messages (8)
msg57366 - (view) Author: Ron Adam (ron_adam) * Date: 2007-11-11 14:32
Replaced Unicode literals in tokenize.py and its test files with byte
literals.

Added a compile step to the test to make sure the text file used in the
test is valid Python code.  This will catch changes that need to be
made to the text (gold file) for future Python versions.
msg57370 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-11-11 15:38
I don't think you can have raw bytes (rb"..." etc.) literals.
msg57371 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-11 16:19
Yes, raw byte strings are possible:

>>> br"\x"
b'\\x'
msg57392 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-11-12 05:05
I think this patch is wrong. Python source code is inherently text, so
generate_tokens should decode the input, rather than operating on bytes.
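
A minimal sketch of the "decode first, then tokenize text" behaviour described above, using the tokenize API as it exists in current Python 3 releases (not necessarily the code under review in 2007). The sample source and the variable names are made up for illustration.

```python
import io
import tokenize

# Hypothetical source file contents, stored on disk as Latin-1 bytes.
raw = "# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n".encode("latin-1")

# tokenize.tokenize() reads bytes, detects the encoding from the coding
# cookie, and yields tokens whose .string values are decoded text (str).
tokens = list(tokenize.tokenize(io.BytesIO(raw).readline))

# The first token reports the detected encoding.
first = tokens[0]
strings = [t.string for t in tokens if t.type == tokenize.STRING]
print(first.string, strings)
```

The point of the sketch is only that the token values are text even though the input was bytes.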
msg57402 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-12 17:00
I'm with Martin. Adam, why do you think tokenize should use bytes
instead of text strings?
msg57406 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-11-12 17:25
Martin, Guido: I think you misunderstand the patch description: it
doesn't make tokenize process bytes instead of text, but makes it
tokenize the new b"..." literals instead of the old u"..." literals.
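
To illustrate the distinction, here is a small sketch (written against current Python 3, with made-up sample source) in which the tokenizer input is still text, while the literals it recognizes carry the b and br prefixes.

```python
import io
import tokenize

# Text input (str) containing bytes literals; the input itself is not bytes.
source = 'a = b"abc"\nb = br"\\x"\n'

# generate_tokens() takes a readline callable that returns str.
string_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.STRING
]
print(string_tokens)
```

Each prefixed literal comes back as a single STRING token with its prefix included.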
msg57411 - (view) Author: Ron Adam (ron_adam) * Date: 2007-11-12 17:37
Georg is correct.  The changes are minimal.

The only addition is to run the tokenize_tests.txt file through compile()
as a way to force an exception if it needs updating in the future.  The
results of the compile are not used.
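
The compile step can be sketched roughly as follows. This is an illustration only, not the code from the patch: the helper name check_still_valid and the sample strings are hypothetical, and the real test reads tokenize_tests.txt instead.

```python
def check_still_valid(source, filename="<tokenize_tests>"):
    """Compile the test source and discard the result.

    compile() raises SyntaxError if the text is no longer valid for the
    running interpreter, which is the only effect we care about here.
    """
    compile(source, filename, "exec")


# Valid today: bytes and raw bytes literals.
check_still_valid('x = b"abc"\ny = br"\\x"\n')

# An outdated literal form (the ur prefix is not accepted by Python 3)
# makes the check fail loudly instead of passing unnoticed.
try:
    check_still_valid('x = ur"abc"\n')
except SyntaxError:
    print("test file needs updating")
```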
msg57413 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-12 17:40
Got it. Checked in as revision 58951.
History
Date User Action Args
2022-04-11 14:56:28  admin  set  github: 45761
2008-01-06 22:29:45  admin  set  keywords: - py3k; versions: Python 3.0
2007-11-12 17:40:51  gvanrossum  set  status: open -> closed; resolution: accepted; messages: + msg57413
2007-11-12 17:37:28  ron_adam  set  messages: + msg57411
2007-11-12 17:25:55  georg.brandl  set  messages: + msg57406
2007-11-12 17:00:54  gvanrossum  set  nosy: + gvanrossum; messages: + msg57402
2007-11-12 05:05:52  loewis  set  nosy: + loewis; messages: + msg57392
2007-11-11 16:19:17  christian.heimes  set  priority: normal; nosy: + christian.heimes; messages: + msg57371; keywords: + py3k, patch
2007-11-11 15:38:17  georg.brandl  set  nosy: + georg.brandl; messages: + msg57370
2007-11-11 14:32:56  ron_adam  create