Message 63628 - Python tracker



Author	ocean-city
Recipients	loewis, ocean-city
Date	2008-03-17.08:56:13
SpamBayes Score	0.09839203
Marked as misclassified	No
Message-id	<1205744175.37.0.470372180921.issue2301@psf.upfronthosting.co.za>
In-reply-to

Content

Hello. I traced through the source code and found where err->text is set.

Index: Parser/parsetok.c
===================================================================
--- Parser/parsetok.c	(revision 61411)
+++ Parser/parsetok.c	(working copy)
@@ -218,7 +218,7 @@
 			assert(tok->cur - tok->buf < INT_MAX);
 			err_ret->offset = (int)(tok->cur - tok->buf);
 			len = tok->inp - tok->buf;
-			text = PyTokenizer_RestoreEncoding(tok, len, &err_ret->offset);
+/*			text = PyTokenizer_RestoreEncoding(tok, len, &err_ret->offset); */
 			if (text == NULL) {
 				text = (char *) PyObject_MALLOC(len + 1);
 				if (text != NULL) {

It seems tok->buf is encoded in UTF-8, and
PyTokenizer_RestoreEncoding() restores it to the original encoding of
the source file. So I tried the above patch, and the output was as
expected on cp932/euc_jp source files.
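
To illustrate why restoring the original encoding could garble the error
text, here is a minimal sketch. It assumes that whatever displays the text
later decodes it as UTF-8 (my assumption, not verified against the py3k
sources). The names looks_like_utf8, a_utf8, a_cp932 and check_example are
hypothetical and are not part of CPython. The check is structural only: it
does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CPython.
   Return 1 if the first `len` bytes of `s` have well-formed UTF-8
   lead/continuation structure, else 0. */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;   /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return 0;   /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return 0;   /* sequence is truncated */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
        }
        i += extra + 1;
    }
    return 1;
}

/* U+3042 HIRAGANA LETTER A in the two encodings. */
const unsigned char a_utf8[]  = { 0xE3, 0x81, 0x82 };  /* UTF-8 */
const unsigned char a_cp932[] = { 0x82, 0xA0 };        /* cp932 */

/* The UTF-8 bytes pass the check; the cp932 bytes do not, because
   0x82 is a continuation byte and cannot start a UTF-8 sequence. */
void check_example(void)
{
    assert(looks_like_utf8(a_utf8, sizeof a_utf8));
    assert(!looks_like_utf8(a_cp932, sizeof a_cp932));
}
```

Under that assumption, text left as UTF-8 (as with the patch above) would
decode cleanly, while text converted back to cp932 would not.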

Maybe this function is not needed in py3k? I cannot find any other place
where this function is used.

# PyErr_ProgramText() will probably need more work to fix.

History
Date	User	Action	Args
2008-03-17 08:56:15	ocean-city	set	spambayes_score: 0.098392 -> 0.09839203 recipients: + ocean-city, loewis
2008-03-17 08:56:15	ocean-city	set	spambayes_score: 0.098392 -> 0.098392 messageid: <1205744175.37.0.470372180921.issue2301@psf.upfronthosting.co.za>
2008-03-17 08:56:14	ocean-city	link	issue2301 messages
2008-03-17 08:56:13	ocean-city	create