classification
Title: Use correct encoding for printing SyntaxErrors
Type: Stage:
Components: Interpreter Core Versions: Python 2.6, Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: gvanrossum, ishimoto, lemburg, loewis, nnorwitz
Priority: high Keywords: patch

Created on 2004-09-20 13:37 by ishimoto, last changed 2007-11-15 20:40 by gvanrossum. This issue is now closed.

Files
File name Uploaded Description
parsetok.patch ishimoto, 2004-09-20 13:37
1031213.patch ishimoto, 2006-03-18 07:06
display_exception.py ishimoto, 2007-10-11 07:44
Messages (20)
msg46912 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2004-09-20 13:37
When a SyntaxError occurs in a module that contains a source
encoding declaration, the current implementation prints the
offending line in UTF-8. This patch converts the line back to
its original encoding before printing.

This patch calls memory-allocating APIs such as
PyUnicode_DecodeUTF8. I'm not sure whether I can (or should)
call PyErr_Clear() here if an error occurs.
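
The recoding step described above can be sketched in Python rather than in the C of the actual patch. The helper name recode_error_line and the cp932 sample line are assumptions made purely for illustration; the patch itself works with C APIs such as PyUnicode_DecodeUTF8.

```python
def recode_error_line(utf8_line: bytes, source_encoding: str) -> bytes:
    """Convert a line held internally as UTF-8 back to the file's declared encoding.

    Hypothetical helper: it mirrors the idea of the patch, not its C code.
    """
    text = utf8_line.decode("utf-8")      # UTF-8 bytes -> Unicode text
    return text.encode(source_encoding)   # Unicode text -> original encoding


# A broken source line (unterminated string) as it might be held in UTF-8.
internal = 'print "日本語'.encode("utf-8")

# The same line, converted back to the encoding declared in the source file.
original = recode_error_line(internal, "cp932")
```

Printing `original` instead of `internal` on a cp932 terminal is what keeps the error line readable.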
msg46913 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-02 05:45
I'm hoping that someone more familiar with unicode could
take a look at this.  The patch looks ok to me, but I
don't know how to test that it works.  I'm inclined to accept
it, unless I hear otherwise.
msg46914 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-10-02 18:08
Please use the "replace" error handler when recoding the source line
to Unicode; this will reduce the probability of the conversion failing.

If you do get an error, it is most likely an unknown encoding or,
less likely, a memory problem. Please add some logic to deal with
these errors as well. Currently you don't call PyErr_Clear() or take
any other action, which may lead to confusing error reports (e.g. an
error popping up at some random point during program execution
because the exception is still set).
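
As a rough Python illustration of why the "replace" handler makes the conversion less likely to fail: the function name recode_with_replace and the latin-1 sample are assumptions for this sketch, not part of the patch.

```python
def recode_with_replace(utf8_line: bytes, source_encoding: str) -> bytes:
    """Recode a UTF-8 line, substituting anything that cannot be converted.

    Hypothetical helper used only to demonstrate the "replace" handler.
    """
    # Invalid UTF-8 bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = utf8_line.decode("utf-8", "replace")
    # Characters missing from the target encoding become "?" instead of
    # raising UnicodeEncodeError.
    return text.encode(source_encoding, "replace")


line = "x = 'é€'".encode("utf-8")

# The euro sign does not exist in latin-1, so a strict conversion fails.
try:
    line.decode("utf-8").encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With "replace", the conversion succeeds and the euro sign becomes "?".
lenient = recode_with_replace(line, "latin-1")
```

Note that "replace" does not help with an unknown encoding name; that case still raises LookupError and needs separate handling.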
msg46915 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2005-10-13 06:38
Thanks for your comments. I'll post a revised patch and test
case later.
msg46916 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2006-03-18 07:06
Sorry for my laziness. I have revised the patch for the current trunk:

- Uses "replace" when recoding the source line
- Reports errors with PyErr_Print()
- Adds a test case
msg46917 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-07-30 17:04
Note to self (or anyone interested): remember to look into this.
msg46918 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-07-16 20:35
I think Martin von Loewis knows more about this.
msg55641 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-09-04 14:23
Thanks for the patch. It wouldn't work as-is, because it broke PGEN. I
fixed that, and committed the change as r57961 and r57962.
msg55642 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-04 14:46
We should make sure this is *not* merged into Py3k; there, things remain
unicode until they're printed, at which point the only encoding that
matters is the output file's encoding.
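
A small sketch of the Py3k model described here, where the line stays text and is only encoded when written to the output stream. The helper name write_error_line and the choice of the "backslashreplace" handler are assumptions for illustration.

```python
import io


def write_error_line(line: str, encoding: str) -> bytes:
    """Write a text line to a stream with the given encoding; return the bytes.

    Hypothetical helper: the in-memory stream stands in for the output file.
    """
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable.
    out = io.TextIOWrapper(raw, encoding=encoding,
                           errors="backslashreplace", newline="")
    out.write(line)
    out.flush()
    return raw.getvalue()


# The same text produces different bytes depending only on the output encoding.
as_utf8 = write_error_line("日本語", "utf-8")
as_ascii = write_error_line("é", "ascii")
```

The encoding declared in the source file plays no part in this step, which is why the 2.x fix is not needed there.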
msg56319 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-10-10 18:58
ishimoto: in dec_utf8, there is a PyErr_Print call. What is the purpose
of this call?
msg56334 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2007-10-11 01:54
PyErr_Print() is called to report an exception raised by the codec.
It is called if PyUnicode_DecodeUTF8() or PyUnicode_AsEncodedString()
returns NULL.
msg56335 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-11 02:20
> PyErr_Print() is called to report an exception raised by the codec.
> It is called if PyUnicode_DecodeUTF8() or PyUnicode_AsEncodedString()
> returns NULL.

This comment is not very helpful; it describes what happens, but not
why, or whether that is a good idea. I believe that if this call is
ever reached, two tracebacks will be printed, confusing the user.
msg56338 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2007-10-11 06:04
Sorry for the insufficient comment.

When a codec raises an exception, I think the exception should be
reported. Otherwise, the user cannot know why Python prints a broken
line of code.

Should we silently clear the exception raised by codecs, or print a
message such as "Codec raised an exception while processing compile
error." ?
msg56339 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-10-11 06:09
> Should we silently clear the exception raised by codecs, or print a
> message such as "Codec raised an exception while processing compile
> error." ?

Can you create a test case that triggers that specific problem?

Regards,
Martin
msg56340 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2007-10-11 07:44
Codecs would hardly ever raise an exception here.
Usually, an exception raised here would be a MemoryError. The Unicode
string we are trying to encode has just been decoded by the same codec.
If the codec raises an exception other than MemoryError, the codec
itself most likely has a problem.

I attached a script that prints the exception raised by a codec. I wrote
a "buggy" encoder, which never returns a string but raises an exception
instead.
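
The kind of "buggy" codec described above might look roughly like the following. This is not the attached display_exception.py; the codec name "buggy" and all helper names are hypothetical, and the sketch only shows a codec that decodes normally but always fails when encoding.

```python
import codecs


def _buggy_encode(text, errors="strict"):
    # The encoder never returns a result; it always raises.
    raise RuntimeError("buggy encoder always fails")


def _utf8_decode(data, errors="strict"):
    # Decoding works normally, so text can be produced by this codec.
    return codecs.utf_8_decode(data, errors, True)


def _search(name):
    # Codec search function: only answers for the hypothetical name "buggy".
    if name == "buggy":
        return codecs.CodecInfo(encode=_buggy_encode,
                                decode=_utf8_decode,
                                name="buggy")
    return None


codecs.register(_search)

# Decoding succeeds, but encoding the very same text back fails.
text = b"print 'abc'".decode("buggy")
failure = None
try:
    text.encode("buggy")
except RuntimeError as exc:
    failure = exc
```

This reproduces the situation under discussion: a string that was just decoded by a codec cannot be encoded again by that same codec.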
msg56347 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-11 16:53
There are tons of situations where such an exception will be
suppressed, for better or for worse. I don't think this one deserves
such a radical approach.

msg56435 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2007-10-15 08:13
That's fine with me. Please replace PyErr_Print() with PyErr_Clear().
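
In Python terms, the agreed behaviour (clear the codec error instead of printing it) corresponds roughly to the sketch below. The helper name recode_or_fallback is hypothetical, and falling back to the unconverted UTF-8 line is an assumption of this sketch; the thread does not spell out what the C code prints in that case.

```python
def recode_or_fallback(utf8_line: bytes, source_encoding: str) -> bytes:
    """Recode the error line, silently ignoring any codec failure.

    Hypothetical helper: a Python analogue of calling PyErr_Clear().
    """
    try:
        text = utf8_line.decode("utf-8", "replace")
        return text.encode(source_encoding, "replace")
    except Exception:
        # Analogous to PyErr_Clear(): drop the codec's exception so that it
        # cannot surface later, and do not print a second traceback.
        # Assumption: fall back to the original UTF-8 bytes.
        return utf8_line


# A known encoding is converted as usual.
converted = recode_or_fallback("é".encode("utf-8"), "latin-1")

# An unknown encoding raises LookupError internally, which is suppressed.
fallback = recode_or_fallback(b"abc", "no-such-codec")
```

The user still sees the original SyntaxError; only the secondary codec failure is hidden.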
msg56445 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 15:54
> atsuo ishimoto added the comment:
>
> That's fine with me. Please replace PyErr_Print() with PyErr_Clear().

Done.

Committed revision 58471.
msg57519 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2007-11-15 05:08
In release25-maint, PyErr_Print() should be replaced with PyErr_Clear()
also.
msg57558 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-15 20:40
> In release25-maint, PyErr_Print() should be replaced with
> PyErr_Clear() also.

Committed revision 58991.
History
Date                 User          Action  Args
2007-11-15 20:40:02  gvanrossum    set     messages: + msg57558
2007-11-15 05:08:42  ishimoto      set     messages: + msg57519
2007-10-15 15:54:33  gvanrossum    set     messages: + msg56445
2007-10-15 08:13:01  ishimoto      set     messages: + msg56435
2007-10-11 16:53:28  gvanrossum    set     messages: + msg56347
2007-10-11 07:44:29  ishimoto      set     files: + display_exception.py
                                           messages: + msg56340
2007-10-11 06:09:59  loewis        set     messages: + msg56339
2007-10-11 06:04:27  ishimoto      set     messages: + msg56338
2007-10-11 02:20:08  gvanrossum    set     messages: + msg56335
2007-10-11 01:54:49  ishimoto      set     messages: + msg56334
2007-10-10 18:58:06  loewis        set     messages: + msg56319
2007-09-04 14:46:11  gvanrossum    set     messages: + msg55642
2007-09-04 14:23:45  loewis        set     status: open -> closed
                                           resolution: accepted
                                           messages: + msg55641
                                           versions: + Python 2.6, Python 2.5
2007-08-23 21:52:37  georg.brandl  set     title: Patch for bug #780725 -> Use correct encoding for printing SyntaxErrors
2007-08-23 21:52:01  georg.brandl  link    issue780725 superseder
2004-09-20 13:37:30  ishimoto      create