Use correct encoding for printing SyntaxErrors #40933

Closed
atsuoishimoto mannequin opened this issue Sep 20, 2004 · 20 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@atsuoishimoto
Mannequin

atsuoishimoto mannequin commented Sep 20, 2004

BPO 1031213
Nosy @malemburg, @gvanrossum, @loewis, @atsuoishimoto
Files
  • parsetok.patch
  • 1031213.patch
  • display_exception.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/loewis'
    closed_at = <Date 2007-09-04.14:23:45.464>
    created_at = <Date 2004-09-20.13:37:30.000>
    labels = ['interpreter-core']
    title = 'Use correct encoding for printing SyntaxErrors'
    updated_at = <Date 2007-11-15.20:40:02.938>
    user = 'https://github.com/atsuoishimoto'

    bugs.python.org fields:

    activity = <Date 2007-11-15.20:40:02.938>
    actor = 'gvanrossum'
    assignee = 'loewis'
    closed = True
    closed_date = <Date 2007-09-04.14:23:45.464>
    closer = 'loewis'
    components = ['Interpreter Core']
    creation = <Date 2004-09-20.13:37:30.000>
    creator = 'ishimoto'
    dependencies = []
    files = ['6255', '6256', '8508']
    hgrepos = []
    issue_num = 1031213
    keywords = ['patch']
    message_count = 20.0
    messages = ['46912', '46913', '46914', '46915', '46916', '46917', '46918', '55641', '55642', '56319', '56334', '56335', '56338', '56339', '56340', '56347', '56435', '56445', '57519', '57558']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'gvanrossum', 'loewis', 'nnorwitz', 'ishimoto']
    pr_nums = []
    priority = 'high'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1031213'
    versions = ['Python 2.6', 'Python 2.5']

    When a SyntaxError occurs in a module that contains a source
    encoding declaration, the current implementation prints the
    offending line in UTF-8. This patch converts the line back to
    its original encoding before printing.

    This patch calls some memory-allocating APIs such as
    PyUnicode_DecodeUTF8. I'm not sure whether I can (or should)
    call PyErr_Clear() here if an error occurs.
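
As an illustration only (this is not the C patch attached to the issue), the recoding step described above can be sketched in Python 3. The function name `recode_error_line` is hypothetical, and `bytes` objects stand in for the byte strings the tokenizer works with.

```python
def recode_error_line(utf8_line: bytes, source_encoding: str) -> bytes:
    """Convert a UTF-8 encoded source line back to the declared source encoding."""
    # Step 1: the tokenizer holds the line as UTF-8, so decode it to text.
    text = utf8_line.decode("utf-8")
    # Step 2: encode the text in the encoding named by the coding declaration.
    return text.encode(source_encoding)


# A line from a file declared as "# -*- coding: latin-1 -*-",
# held internally as UTF-8 bytes.
internal = "print 'caf\u00e9".encode("utf-8")
print(recode_error_line(internal, "latin-1"))
```

Both steps can fail (bad bytes, unknown codec, out of memory), which is the error-handling question raised in the comment above.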

    @atsuoishimoto atsuoishimoto mannequin assigned loewis Sep 20, 2004
    @atsuoishimoto atsuoishimoto mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 20, 2004
    @nnorwitz
    Mannequin

    nnorwitz mannequin commented Oct 2, 2005

    Logged In: YES
    user_id=33168

    I'm hoping that someone more familiar with unicode could
    take a look at this. The patch looks ok to me, but I
    don't know how to test that it works. I'm inclined to accept
    it, unless I hear otherwise.

    @malemburg
    Member

    Logged In: YES
    user_id=38388

    Please use the "replace" error handler when recoding the source line
    to Unicode - this will reduce the probability of the conversion
    failing.

    If you do get an error, it's likely going to be an unknown encoding
    or, less likely, a memory problem. Please add some logic to deal with
    these errors as well - currently you don't call PyErr_Clear() or take
    some other action, which may lead to confusing error reports (e.g. an
    error popping up randomly during program execution due to the set
    error).
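
A minimal Python 3 sketch of this suggestion, assuming a hypothetical helper `recode_with_replace` (the real change lives in the C tokenizer): the "replace" handler keeps undecodable or unencodable characters from raising, and an unknown encoding is handled explicitly rather than left pending.

```python
def recode_with_replace(utf8_line: bytes, source_encoding: str) -> bytes:
    """Recode a UTF-8 line, substituting characters that cannot be converted."""
    try:
        # "replace" turns invalid UTF-8 into U+FFFD instead of raising.
        text = utf8_line.decode("utf-8", "replace")
        # "replace" turns unencodable characters into "?" instead of raising.
        return text.encode(source_encoding, "replace")
    except LookupError:
        # Unknown encoding name: fall back to the UTF-8 bytes as they are.
        return utf8_line


print(recode_with_replace(b"caf\xc3\xa9", "ascii"))
print(recode_with_replace(b"caf\xc3\xa9", "no-such-codec"))
```

A memory error is deliberately not caught in this sketch; how to treat it is a separate design decision.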

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Oct 13, 2005

    Logged In: YES
    user_id=463672

    Thanks for your comments. I'll post a revised patch and test
    case later.

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Mar 18, 2006

    Logged In: YES
    user_id=463672

    Sorry for my laziness. I have revised the patch for the current trunk.

    • Use "replace" for recoding the source line
    • Report errors with PyErr_Print()
    • Add a test case

    @nnorwitz
    Mannequin

    nnorwitz mannequin commented Jul 30, 2006

    Logged In: YES
    user_id=33168

    Note to self (or anyone interested): remember to look into this.

    @gvanrossum
    Member

    I think Martin von Loewis knows more about this.

    @birkenfeld birkenfeld changed the title Patch for bug #780725 Use correct encoding for printing SyntaxErrors Aug 23, 2007
    @loewis
    Mannequin

    loewis mannequin commented Sep 4, 2007

    Thanks for the patch. It wouldn't work as-is, because it broke PGEN. I
    fixed that, and committed the change as r57961 and r57962.

    @loewis loewis mannequin closed this as completed Sep 4, 2007
    @gvanrossum
    Member

    We should make sure this is *not* merged into Py3k; there, things remain
    unicode until they're printed, at which point the only encoding that
    matters is the output file's encoding.
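
The Py3k model mentioned here can be sketched as follows. This is an illustration under stated assumptions, not interpreter code: `render` is a hypothetical helper, and the `backslashreplace` error handler is chosen only so the example never raises.

```python
import io


def render(text: str, output_encoding: str) -> bytes:
    """Return the bytes produced when `text` is written to a stream with the given encoding."""
    raw = io.BytesIO()
    # The text stays a str; only the stream's encoding decides the bytes.
    stream = io.TextIOWrapper(raw, encoding=output_encoding,
                              errors="backslashreplace")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave `raw` open; the wrapper no longer owns it
    return data


line = "x = 'caf\u00e9'"
for enc in ("utf-8", "latin-1", "ascii"):
    print(enc, render(line, enc))
```

The same string yields different bytes per output stream, so the source file's encoding no longer plays a role at print time.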

    @loewis
    Mannequin

    loewis mannequin commented Oct 10, 2007

    ishimoto: in dec_utf8, there is a PyErr_Print call. What is the purpose
    of this call?

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Oct 11, 2007

    PyErr_Print() is called to report exception raised by codec.
    If PyUnicode_DecodeUTF8() or PyUnicode_AsEncodedString() return NULL,
    PyErr_Print() is called.

    @gvanrossum
    Member

    > PyErr_Print() is called to report exception raised by codec.
    > If PyUnicode_DecodeUTF8() or PyUnicode_AsEncodedString() return NULL,
    > PyErr_Print() is called.

    This comment is not very helpful; it describes what happens, but not
    why, or whether that is a good idea. I believe that if this call is
    ever reached, two tracebacks will be printed, confusing the user.

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Oct 11, 2007

    Sorry for the insufficient comment.

    When a codec raises an exception, I think the exception should be
    reported. Otherwise, the user cannot know why Python prints a broken
    line of code.

    Should we silently clear the exception raised by codecs, or print a
    message such as "Codec raised an exception while processing compile
    error." ?

    @loewis
    Mannequin

    loewis mannequin commented Oct 11, 2007

    > Should we silently clear the exception raised by codecs, or print a
    > message such as "Codec raised an exception while processing compile
    > error." ?

    Can you create a test case that triggers that specific problem?

    Regards,
    Martin

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Oct 11, 2007

    Codecs would hardly ever raise an exception here.
    Usually, an exception raised here would be a MemoryError. The unicode
    string we are trying to encode was just decoded by the same codec. If the
    codec raises an exception other than MemoryError, the codec likely has a
    problem.

    I attached a script to print the exception raised by a codec. I wrote a
    "buggy" encoder, which never returns a string but raises an exception.
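
The attached display_exception.py is not reproduced in this thread. As a rough Python 3 sketch of the same idea, a codec whose decoder works but whose encoder always raises can be registered like this; the codec name `buggy_demo` and the exception class `BuggyCodecError` are made up for illustration.

```python
import codecs


class BuggyCodecError(Exception):
    """Raised by the deliberately broken encoder below."""


def _buggy_encode(text, errors="strict"):
    # Never returns a result: simulates a codec with a defective encoder.
    raise BuggyCodecError("encoder failed on purpose")


def _utf8_decode(data, errors="strict"):
    # Decoding works normally, so text can be decoded but not re-encoded.
    return codecs.utf_8_decode(data, errors, True)


def _search(name):
    # Codec search function: only answers for the hypothetical codec name.
    if name == "buggy_demo":
        return codecs.CodecInfo(name="buggy_demo",
                                encode=_buggy_encode,
                                decode=_utf8_decode)
    return None


codecs.register(_search)

text = b"x = 1".decode("buggy_demo")      # succeeds
try:
    text.encode("buggy_demo")             # raises inside the encoder
except BuggyCodecError as exc:
    print("codec raised:", type(exc).__name__)
```

This mirrors the situation discussed here: the decode step succeeds, and only the re-encoding step fails.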

    @gvanrossum
    Member

    There are tons of situations where such an exception will be
    suppressed, for better or for worse. I don't think this one deserves
    such a radical approach.

    On 10/11/07, atsuo ishimoto <report@bugs.python.org> wrote:

    atsuo ishimoto added the comment:

    Codecs would hardly ever raise an exception here.
    Usually, an exception raised here would be a MemoryError. The unicode
    string we are trying to encode was just decoded by the same codec. If the
    codec raises an exception other than MemoryError, the codec likely has a
    problem.

    I attached a script to print the exception raised by a codec. I wrote a
    "buggy" encoder, which never returns a string but raises an exception.


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1031213>


    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Oct 15, 2007

    That's fine with me. Please replace PyErr_Print() with PyErr_Clear().
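
The difference between the two choices can be sketched in Python 3 as an analogy: `traceback.print_exc()` stands in for PyErr_Print() and silently dropping the exception stands in for PyErr_Clear(). The helper names and the `report_codec_errors` flag are hypothetical.

```python
import contextlib
import io
import traceback


def recode_line(utf8_line, encoding, report_codec_errors):
    """Recode a line; on codec failure, fall back to the UTF-8 bytes."""
    try:
        return utf8_line.decode("utf-8").encode(encoding)
    except Exception:
        if report_codec_errors:
            # Analogue of PyErr_Print(): a second traceback reaches stderr.
            traceback.print_exc()
        # Analogue of PyErr_Clear(): the exception is dropped silently.
        return utf8_line


def stderr_output(report_codec_errors):
    """Capture what recode_line writes to stderr for a failing codec lookup."""
    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        recode_line(b"x = 'caf\xc3\xa9'", "no-such-codec", report_codec_errors)
    return captured.getvalue()


print(repr(stderr_output(False)))
print("LookupError" in stderr_output(True))
```

With reporting enabled, the user would see the codec's traceback in addition to the SyntaxError report, which is the confusion described earlier in the thread; with clearing, only the SyntaxError is shown.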

    @gvanrossum
    Member

    atsuo ishimoto added the comment:

    That's fine with me. Please replace PyErr_Print() with PyErr_Clear().

    Done.

    Committed revision 58471.

    @atsuoishimoto
    Mannequin Author

    atsuoishimoto mannequin commented Nov 15, 2007

    In release25-maint, PyErr_Print() should be replaced with PyErr_Clear()
    also.

    @gvanrossum
    Member

    > In release25-maint, PyErr_Print() should be replaced with
    > PyErr_Clear() also.

    Committed revision 58991.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022