[Py3k] line number is wrong after encoding declaration #46637

Closed
ocean-city mannequin opened this issue Mar 18, 2008 · 22 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@ocean-city
Mannequin

ocean-city mannequin commented Mar 18, 2008

BPO 2384
Nosy @warsaw, @amauryfa, @pitrou, @vstinner
Dependencies
  • bpo-3975: PyTraceBack_Print() doesn't respect # coding: xxx header
Files
  • test_traceback.patch
  • tokenizer-coding-4.patch: Fix this issue and add a testcase (for this issue and issue 3975) (version 4)
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-10-09.23:52:49.754>
    created_at = <Date 2008-03-18.07:28:38.119>
    labels = ['type-bug']
    title = '[Py3k] line number is wrong after encoding declaration'
    updated_at = <Date 2008-10-09.23:52:49.753>
    user = 'https://bugs.python.org/ocean-city'

    bugs.python.org fields:

    activity = <Date 2008-10-09.23:52:49.753>
    actor = 'amaury.forgeotdarc'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-10-09.23:52:49.754>
    closer = 'amaury.forgeotdarc'
    components = ['None']
    creation = <Date 2008-03-18.07:28:38.119>
    creator = 'ocean-city'
    dependencies = ['3975']
    files = ['9943', '11738']
    hgrepos = []
    issue_num = 2384
    keywords = ['patch']
    message_count = 22.0
    messages = ['63905', '64157', '64965', '67953', '71308', '71336', '72183', '73512', '73843', '73845', '73846', '73847', '73852', '73857', '73883', '73889', '73929', '74394', '74435', '74497', '74498', '74611']
    nosy_count = 7.0
    nosy_names = ['barry', 'amaury.forgeotdarc', 'pitrou', 'vstinner', 'ocean-city', 'jmfauth', 'dlitz']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2384'
    versions = ['Python 3.0']

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Mar 18, 2008

    # This issue inherits from bpo-2301.

    If "# coding: ????" appears in the source code and the
    coding is neither utf-8 nor iso-8859-1, the line number (tok->lineno)
    becomes wrong.

    Please look into Parser/tokenizer.c. In this case,
    tok->decoding_state becomes STATE_NORMAL, so fp_setreadl
    opens the file anew but *doesn't* seek to the current position.
    (Or could we perhaps reuse the already opened file?)

    So

    # coding: ascii
    # 1
    # 2
    # 3
    raise RuntimeError("a")
    # 4
    # 5
    # 6

    outputs

    C:\Documents and Settings\WhiteRabbit>py3k ascii.py

    Traceback (most recent call last):
      File "ascii.py", line 6, in <module>
        # 4
    RuntimeError: a
    [22821 refs]

    The traceback is shifted by one line because the line number is wrongly incremented by 1.

    And

    # dummy
    # coding: ascii
    # 1
    # 2
    # 3
    raise RuntimeError("a")
    # 4
    # 5
    # 6

    outputs

    C:\Documents and Settings\WhiteRabbit>py3k ascii.py

    Traceback (most recent call last):
      File "ascii.py", line 8, in <module>
        # 5
    RuntimeError: a
    [22821 refs]

    The traceback is shifted by two lines because the line number is wrongly incremented by 2.
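
The two shifts above can be modelled in a few lines of Python. This is a toy sketch, not the code in Parser/tokenizer.c: `first_statement_lineno`, `reset_on_reopen` and the simplified regular expression are illustrative assumptions, and the "reopen" is simulated by rewinding an in-memory stream. It only shows how a counter that keeps running after the file is reopened ends up ahead by the number of lines already read.

```python
import io
import re

# Simplified cookie pattern (assumption: close to, but not identical with, the C check).
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
# The two encodings the report says are unaffected.
NO_REOPEN = {"utf-8", "iso-8859-1"}


def first_statement_lineno(source, reset_on_reopen):
    """Return the counter value at the first non-comment, non-blank line."""
    stream = io.StringIO(source)
    lineno = 0
    reopened = False
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("no statement found")
        lineno += 1
        # Only the first two lines may carry the declaration.
        if not reopened and lineno <= 2:
            match = CODING_RE.match(line)
            if match and match.group(1).lower() not in NO_REOPEN:
                # Model of the reopen: reading restarts at the start of the file.
                stream = io.StringIO(source)
                reopened = True
                if reset_on_reopen:
                    lineno = 0
                continue
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return lineno


ONE = '# coding: ascii\n# 1\n# 2\n# 3\nraise RuntimeError("a")\n'
TWO = "# dummy\n" + ONE

print(first_statement_lineno(ONE, reset_on_reopen=False))  # 6 (real line: 5)
print(first_statement_lineno(ONE, reset_on_reopen=True))   # 5
print(first_statement_lineno(TWO, reset_on_reopen=False))  # 8 (real line: 6)
print(first_statement_lineno(TWO, reset_on_reopen=True))   # 6
```

The values without the reset match the line numbers shown in the two tracebacks of this report.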

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Mar 20, 2008

    The following dirty hack works around this bug. The comment on this
    function says that non-ASCII-compatible encodings (e.g. UTF-16) are not
    supported yet, so this probably works.

    Index: Parser/tokenizer.c
    ===================================================================

    --- Parser/tokenizer.c	(revision 61632)
    +++ Parser/tokenizer.c	(working copy)
    @@ -464,6 +464,7 @@
     	Py_XDECREF(tok->decoding_readline);
     	readline = PyObject_GetAttrString(stream, "readline");
     	tok->decoding_readline = readline;
    +	tok->lineno = -1; /* dirty hack */
     
       cleanup:
     	Py_XDECREF(stream);

    But if a line contains a multibyte character, as below, that line will
    not be printed.

    # coding: cp932
    # 1
    raise RuntimeError("あいうえお")
    # 2

    C:\Documents and Settings\WhiteRabbit>py3k cp932.py
    Traceback (most recent call last):
      File "cp932.py", line 3, in <module>
        [22819 refs]

    This is because tb_displayline() in Python/traceback.c assumes that the
    input line is encoded in UTF-8 (it simply uses a FILE structure plus
    Py_UniversalNewlineFgets).

    # http://mail.python.org/pipermail/python-3000/2008-March/012546.html
    # sounds nice, if we can replace every FILE structure with Python's own,
    # fast enough, codec-aware reader or something.
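
To illustrate why the line goes missing, here is a small sketch in Python rather than C. It does not reproduce tb_displayline(); it only shows that the cp932 bytes of such a line are not valid UTF-8, and that decoding succeeds once the declared encoding is used. `tokenize.detect_encoding()` is the standard-library helper for reading the declaration; the sample source and the variable names are made up for the example.

```python
import io
import tokenize

# Sample script; the escapes spell the five hiragana used in the report.
source = '# coding: cp932\n# 1\nraise RuntimeError("\u3042\u3044\u3046\u3048\u304a")\n# 2\n'
raw = source.encode("cp932")
raw_line = raw.splitlines()[2]  # bytes of the "raise" line as stored on disk

# Reading the bytes as UTF-8, as the report says the traceback code assumes, fails.
try:
    raw_line.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading the declaration first and decoding with it recovers the line.
encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
decoded = raw_line.decode(encoding)

print(utf8_ok, encoding, decoded == source.splitlines()[2])
```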

    @ocean-city ocean-city mannequin added the type-bug An unexpected behavior, bug, or error label Mar 20, 2008
    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Apr 5, 2008

    I've written a testcase for the lineno problem.

    @warsaw
    Member

    warsaw commented Jun 11, 2008

    This is a bug and not a new feature, so it could go in after beta. I'm
    knocking it down to a critical.

    @warsaw
    Member

    warsaw commented Aug 18, 2008

    While this is a bug, it's not serious enough to hold up the release.

    @jmfauth
    Mannequin

    jmfauth mannequin commented Aug 18, 2008

    Py3.0b2. This bug seems to be quite annoying. Especially when one works
    with a main module importing modules which are importing modules and so
    on, all modules having an encoding declaration. The Traceback (and the
    user) is (are) a little bit lost.

    ---------
    # -*- coding: cp1252 -*-
    # modb.py

    def fb():
        i = 1
        j = 0
        r =  i / j    

    # -*- coding: cp1252 -*-
    # moda.py

    import modb
    
    def fa():
        modb.fb()

    # -*- coding: cp1252 -*-
    # main.py

    import moda
    
    def main():
        moda.fa()
    
    if __name__ == '__main__':
        main()

    Running main.py leads to an

    >c:\python30\pythonw -u "main.py"
    (Traceback (most recent call last):
      File "main.py", line 11, in <module>
        
      File "main.py", line 8, in main
        
      File "C:\jm\jmpy3\moda.py", line 8, in fa
        
      File "C:\jm\jmpy3\modb.py", line 8, in fb
        
    ZeroDivisionError: int division or modulo by zero
    >Exit code: 1

    @DLitz
    Mannequin

    DLitz mannequin commented Aug 30, 2008

    Could "-*- coding: ascii -*-" and other equivalent encodings at least
    be fixed before the release?

    @jmfauth
    Mannequin

    jmfauth mannequin commented Sep 21, 2008

    Python 3.0rc1

    Although the lines are now displayed correctly, I think there is still a
    numbering issue: a +1 offset.

    Python 2.5.2

    # -*- coding: cp1252 -*- <<<< line 1, first line

    s = 'abc'
    import dummy
    s = 'def'

    >pythonw -u "testpy2.py"
    Traceback (most recent call last):
      File "testpy2.py", line 4, in <module>
        import dummy
    ImportError: No module named dummy
    >Exit code: 1

    Python 3.0rc1

    # -*- coding: cp1252 -*-

    s = 'abc'
    import dummy
    s = 'def'

    >c:\python30\pythonw -u "testpy3.py"
    Traceback (most recent call last):
      File "testpy3.py", line 5, in <module>
        s = 'def'
    ImportError: No module named dummy
    >Exit code: 1

    @pitrou
    Member

    pitrou commented Sep 26, 2008

    bpo-3973 is a duplicate.

    @vstinner
    Member

    By setting lineno to -1 (as proposed by ocean-city), the ASCII tests (test1
    and test2, see below) work correctly. This change doesn't affect the
    utf-8/iso-8859-1 charsets (they are a special case).

    --- test1 ---
    # coding: ASCII
    raise Exception("here")
    -------------

    --- test2 ---
    # useless at line 1
    # coding: ASCII
    raise Exception("here")
    -------------

    I don't know how to test a UTF-16 file. Can someone write a testcase?

    @vstinner
    Member

    ocean-city's testcase is invalid: it uses subprocess.call(), which
    returns the exit code, not the Python error line number! Here is a
    better testcase that uses subprocess.Popen() and checks the line number
    as well as the displayed line. It tests the ASCII, UTF-8 and GBK charsets.
    With the GBK charset, you get the bug described by ocean-city (the problem
    with multibyte charsets). My testcase also covers scripts with the
    # coding declaration on the second line.
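
A testcase along these lines could look like the sketch below. It is not the attached patch: `run_and_parse`, `make_source`, the file name script.py and the sample messages are hypothetical. It also assumes that setting the `PYTHONIOENCODING` environment variable to utf-8 is acceptable for the child process, so that the parent can decode the traceback reliably.

```python
import os
import re
import subprocess
import sys
import tempfile


def run_and_parse(source, encoding):
    """Run source as a script; return (line number, displayed line) from its traceback."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w", encoding=encoding, newline="\n") as handle:
            handle.write(source)
        # Force UTF-8 on the child's standard streams so stderr decodes reliably.
        env = dict(os.environ, PYTHONIOENCODING="utf-8")
        proc = subprocess.Popen([sys.executable, path], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, env=env)
        _, stderr = proc.communicate()
    lines = stderr.decode("utf-8", errors="replace").splitlines()
    for index, line in enumerate(lines):
        match = re.search(r'File ".*script\.py", line (\d+)', line)
        if match:
            # The source line is printed right below the "File ..." entry.
            return int(match.group(1)), lines[index + 1].strip()
    raise AssertionError("no traceback entry found")


def make_source(encoding, message, cookie_on_second_line=False):
    """Build a script whose raise statement sits two lines below the declaration."""
    head = "#!/usr/bin/env python\n" if cookie_on_second_line else ""
    return head + "# coding: %s\n# filler\nraise RuntimeError(%r)\n" % (encoding, message)


# (charset, message that the charset can encode)
CASES = [("ascii", "a"), ("utf-8", "\u00e9"), ("gbk", "\u4e2d\u6587")]

if __name__ == "__main__":
    for charset, message in CASES:
        for second in (False, True):
            lineno, shown = run_and_parse(make_source(charset, message, second), charset)
            print(charset, second, lineno, shown)
```

With the declaration on the first line the raise statement is on line 3, and with the extra first line it is on line 4, so those are the numbers a correct interpreter should report.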

    @vstinner
    Member

    Hum, about the empty line error using a multibyte charset, the issue
    is different. PyTraceBack_Print() calls _Py_DisplaySourceLine() which
    doesn't take care of the charset.

    @vstinner
    Member

    Here is a patch fixing this issue: it's nearly the same as ocean-city's
    patch, but I prefer to patch lineno only if set_readline() succeeds.

    About the truncated traceback for multibyte charset: see the new
    bpo-3975.

    @vstinner
    Member

    Oh! My patch breaks "python -m". Maybe the problem is not in the token
    parser but... somewhere else?
    --- test.py ---
    # coding: ASCII
    raise Exception("line 2")
    # try again!
    ---------------

    Python 3.0 trunk unpatched:
    ---

    $ ./python test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
    
    $ ./python -m test
    Traceback (most recent call last):
      File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
        exec(code, run_globals)
      File "/home/haypo/prog/py3k/test.py", line 2, in <module>
        raise Exception("line 2")
    Exception: line 2

    Python 3.0 trunk + tokenizer-coding.patch:
    ---

    marge$ ./python test.py
    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        raise Exception("line 2")
    Exception: line 2
    
    Traceback (most recent call last):
      File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
        exec(code, run_globals)
      File "/home/haypo/prog/py3k/test.py", line 1, in <module>
        # coding: ASCII
    Exception: line 2

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Sep 26, 2008

    Victor, this is fp_setreadl's problem, so if "tok->lineno = -1" is put
    anywhere, it should be in fp_setreadl(), I think.

    	r = set_readline(tok, cs);
    	if (r) {
    		/* 1 */
    		tok->encoding = cs;
    		tok->decoding_state = STATE_NORMAL;

    At /* 1 */, set_readline could be buf_setreadl(), and fp_setreadl is
    called elsewhere.

    @vstinner
    Member

    @Ocean-City: Oops, sorry. Using your patch (set lineno in
    fp_setreadl()), it works in both cases ("python test.py" and "python -m
    test").

    The new patch includes your fix for tokenizer.c and a new version of the
    testcase.

    @vstinner
    Member

    bpo-2832 is a duplicate.

    @vstinner
    Member

    vstinner commented Oct 6, 2008

    Benjamin was worried by the comment /* dirty hack */ in my previous
    comment. After reading tokenizer.c again and again, I can say that the
    fix is correct: the file is closed and then re-opened by fp_setreadl()
    (using io.open()), and so the file cursor goes back to the file start.

    @amauryfa
    Member

    amauryfa commented Oct 7, 2008

    Your patch does the correct thing, however an explanation of the -1
    value would be welcome. Something like:
    /* The file has been reopened; parsing will restart from
     * the beginning of the file, we have to reset the line number.
     * But this function has been called from inside tok_nextc() which
     * will increment lineno before it returns. So we set it -1 so that
     * the next call to tok_nextc() will start with tok->lineno == 0.
     */

    Or we could change the place of the tok->lineno++ in tok_nextc() so that
    it is called before the call to decoding_fgets(); other changes will be
    needed.
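
The reason for -1 rather than 0 can be checked with a toy counter. This is a Python sketch of the ordering described in the proposed comment, not a translation of tok_nextc(): `final_lineno`, `cookie_index` and `reset_value` are hypothetical names, and the only behaviour modelled is that the increment happens after the read that triggers the reopen.

```python
def final_lineno(line_count, cookie_index, reset_value):
    """Toy counter: return lineno once a file of line_count lines has been read.

    cookie_index is the 0-based index of the line holding the declaration;
    reset_value is what lineno is set to on reopen (None leaves it alone).
    """
    lineno = 0
    position = 0
    reopened = False
    while position < line_count:
        position += 1                  # stands in for reading one line
        if not reopened and position - 1 == cookie_index:
            position = 0               # file reopened: the next read is line 1 again
            reopened = True
            if reset_value is not None:
                lineno = reset_value
        lineno += 1                    # the increment happens after the read
    return lineno


print(final_lineno(4, 0, None))  # 5: no reset, one line too many
print(final_lineno(4, 1, None))  # 6: declaration on line 2, two lines too many
print(final_lineno(4, 0, 0))     # 5: resetting to 0 is still off by one
print(final_lineno(4, 0, -1))    # 4: -1 absorbs the pending increment
print(final_lineno(4, 1, -1))    # 4
```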

    Then, I think that your test is not correct: What is the meaning of the
    following line?
    sys.exit(traceback.tb_lineno(sys.exc_info()[2]))
    (the module "traceback" has no attribute "tb_lineno")
    I presume that you intended something like:
    traceback.print_exc()
    sys.exit(sys.exc_info()[2].tb_lineno)
    and test at some point that "process.returncode == lineno"
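
A minimal sketch of that idea, assuming the child script is written to a temporary file so that its encoding declaration is actually parsed. `CHILD_SOURCE`, `EXPECTED_LINENO` and `child_returncode` are hypothetical names, not taken from the patch. Exit statuses are small integers on most platforms, so this only works while the line number stays small.

```python
import os
import subprocess
import sys
import tempfile

CHILD_SOURCE = (
    "# coding: ascii\n"
    "import sys, traceback\n"
    "try:\n"
    "    raise RuntimeError('a')\n"   # line 4 of the child script
    "except RuntimeError:\n"
    "    traceback.print_exc()\n"
    "    sys.exit(sys.exc_info()[2].tb_lineno)\n"
)
EXPECTED_LINENO = 4


def child_returncode(source):
    """Write source to a file, run it, and return the child's exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "child.py")
        with open(path, "w", encoding="ascii", newline="\n") as handle:
            handle.write(source)
        process = subprocess.Popen([sys.executable, path], stderr=subprocess.PIPE)
        process.communicate()  # drain stderr and wait for the child to exit
        return process.returncode


if __name__ == "__main__":
    print(child_returncode(CHILD_SOURCE) == EXPECTED_LINENO)
```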

    @vstinner
    Member

    vstinner commented Oct 7, 2008

    @Amaury: Ok, I added your long comment in tokenizer.c. You're also
    right about the strange code in the test. I reused ocean-city's
    test. "sys.exc_info()[2].tb_lineno" raises an additional (useless)
    error. So I simplified the code to use only "raise RuntimeError(...)"
    with the try/except/else.

    Since tokenizer.c is hard to understand, I don't want to change the
    code of tok_nextc().

    @amauryfa
    Member

    amauryfa commented Oct 7, 2008

    This issue depends on bpo-3975 to properly display tracebacks from python
    files with encoding.

    @amauryfa
    Member

    amauryfa commented Oct 9, 2008

    Committed r66867.

    I had to considerably change the unit tests, because the subprocess
    output is not utf-8 encoded; it's not even the same as sys.stdout,
    because the spawned process uses a PIPE, not a terminal: on my winXP,
    the main interpreter uses cp437, but the subprocess says cp1252. So I
    first run a 'python -c "print(sys.stdout.encoding)"' in the same
    conditions just to retrieve the encoding. fun fun.
    I hope this still works on Unixes, will watch the buildbots.
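
The encoding probe described above might be sketched as follows. This is an illustration, not the committed test code: `piped_stdout_encoding` is a hypothetical helper, and the one-liner adds the `import sys` that the command needs in order to run.

```python
import codecs
import subprocess
import sys


def piped_stdout_encoding():
    """Ask a child interpreter, whose stdout is a pipe, which encoding it uses."""
    output = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    name = output.decode("ascii").strip()
    codecs.lookup(name)  # raises LookupError if the name is not a known codec
    return name


if __name__ == "__main__":
    print(piped_stdout_encoding())
```

The returned name can then be used to decode the output of further subprocesses started under the same conditions.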

    @amauryfa amauryfa closed this as completed Oct 9, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022