Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imp.find_module('test/badsyntax_pep3120') causes segfault #53565

Closed
ronadam mannequin opened this issue Jul 21, 2010 · 24 comments
Closed

imp.find_module('test/badsyntax_pep3120') causes segfault #53565

ronadam mannequin opened this issue Jul 21, 2010 · 24 comments
Labels
stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ronadam
Copy link
Mannequin

ronadam mannequin commented Jul 21, 2010

BPO 9319
Nosy @jcea, @pitrou, @vstinner, @benjaminp, @ezio-melotti, @bitdancer, @Trundle, @skrah, @ambv
Files
  • p3k_i9313.diff: Test and Fix
  • tokenizer_encoding_filename.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-04-22.22:44:08.242>
    created_at = <Date 2010-07-21.06:29:24.070>
    labels = ['library', 'type-crash']
    title = "imp.find_module('test/badsyntax_pep3120') causes segfault"
    updated_at = <Date 2012-07-15.12:17:17.615>
    user = 'https://bugs.python.org/ronadam'

    bugs.python.org fields:

    activity = <Date 2012-07-15.12:17:17.615>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-04-22.22:44:08.242>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2010-07-21.06:29:24.070>
    creator = 'ron_adam'
    dependencies = []
    files = ['18391', '20148']
    hgrepos = []
    issue_num = 9319
    keywords = ['patch']
    message_count = 24.0
    messages = ['111009', '111032', '111052', '111518', '111952', '112870', '112882', '112922', '119114', '124018', '124033', '124558', '124559', '130644', '132989', '132992', '133984', '134281', '134282', '134283', '134365', '134381', '144430', '165522']
    nosy_count = 11.0
    nosy_names = ['jcea', 'pitrou', 'vstinner', 'ron_adam', 'benjamin.peterson', 'ezio.melotti', 'r.david.murray', 'Trundle', 'skrah', 'lukasz.langa', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue9319'
    versions = ['Python 3.2']

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Jul 21, 2010

    help('modules spam') causes segfault.

    When pydoc tries goes though the files it does the following in the ModuleScanner class.

    (minimal example)

    >>> for importer, modname, ispkg in pkgutil.walk_packages():
    ...     if modname == 'test.badsyntax_pep3120':
    ...         loader = importer.find_module(modname)
    ... 
    Segmentation fault

    Adding:

       if modname=='test.badsyntax_pep3120':
          continue

    At the top of the for loop will suppress the segfault in pydoc by skipping that file.

    A bit further probing narrowed it down to this...

    >>> loader = pkgutil.get_loader('test.badsyntax_pep3120')
    Segmentation fault

    I'm not familiar with the pkgutil module so I didn't go further.

    @ronadam ronadam mannequin added stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 21, 2010
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 21, 2010

    Looks like tok->filename isn't set in PyTokenizer_FromFile:

    Index: Parser/tokenizer.c
    ===================================================================

    --- Parser/tokenizer.c  (revision 82984)
    +++ Parser/tokenizer.c  (working copy)
    @@ -818,6 +818,7 @@
         tok->cur = tok->inp = tok->buf;
         tok->end = tok->buf + BUFSIZ;
         tok->fp = fp;
    +    tok->filename = "XXX";
         tok->prompt = ps1;
         tok->nextprompt = ps2;
         if (enc != NULL) {

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 21, 2010

    To be a little clearer: Since tok->filename is used as a flag in other
    places, I'm not sure where to set it without breaking other things.
    This is the location of the segfault and the following would "fix" it
    (using a placeholder for the name):

    Index: Parser/tokenizer.c
    ===================================================================

    --- Parser/tokenizer.c  (revision 83019)
    +++ Parser/tokenizer.c  (working copy)
    @@ -582,6 +582,8 @@
         if (badchar) {
             /* Need to add 1 to the line number, since this line
                has not been counted, yet.  */
    +        if (tok->filename == NULL)
    +            tok->filename = "<file>";
             PyErr_Format(PyExc_SyntaxError,
                 "Non-UTF-8 code starting with '\\x%.2x' "
                 "in file %.200s on line %i, "

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Jul 25, 2010

    Trying to look at this a bit further...

    The PyErr_Format function then calls PyUnicode_FromFormatV in Objects/unicodeobject.c to do the actual formating.

    It looks like there have been a number of issues of this type. Search the tracker for unicodeobject.c and PyUnicode_FromFormatV.

    This may be related to http://bugs.python.org/issue7330.

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Jul 29, 2010

    The error happens when Null is passed to strlen in (unicodeobject.c, line 860)

    Passing NULL to a string format function is probably in the category of don't do that.

    Stefans solution of checking for NULL before calling PyErr_Format looks to me to be correct. As he notes, the same technique is used in another place.

    It might be good to add a note somewhere not to pass NULL to C string format functions.

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Aug 4, 2010

    This is by far the simplest fix for this. See patch file.

    This patch is what Stefan Krah suggested and I agree unless someone a lot more familiar with the import process can take a look at this and re-factor things so the filename is passed along with the file descriptor so the error can include it. I tried and each change required other changes in order to get the correct failure and error messages. I gave up when the tests for non existent file started failing with ']' on the first line.

    I can't commit this, so could someone who can take a look at the patch and move this along.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 4, 2010

    The diff which I posted was not really meant as the ultimate fix. It's a
    workaround. I think the real filename should be passed down to
    PyTokenizer_FromFile or
    PyTokenizer_FindEncoding.

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Aug 4, 2010

    I added you to this Victor because it looks like what your doing to rewrite the imports to work with Unicode (issue:9425) overlaps this.

    See the test in the patch.

    Your rewrite may fix this as the segfault has to do with getting the file encoding. My apologies if this isn't the case.

    Ron

    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Oct 19, 2010

    The test in the patch isn't quite right. The following still fails.

    Python 3.2a3+ (py3k:85719, Oct 18 2010, 22:32:47) 
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import imp
    >>> imp.find_module('test/badsyntax_pep3120')
    Segmentation fault

    @bitdancer
    Copy link
    Member

    help no longer segfaults, but the find_module call still does; updating title. The patch does cure the segfault, but as Stefan says it isn't the best fix since having '<file>' in the error message instead of the real file name isn't very useful.

    @bitdancer bitdancer changed the title segfault when searching modules with help() imp.find_module('test/badsyntax_pep3120') causes segfault Dec 15, 2010
    @ronadam
    Copy link
    Mannequin Author

    ronadam mannequin commented Dec 15, 2010

    Pydoc skips the badsysntax_pep3120 file for now. When this gets fixed that workaround should be removed. The work around is commented and refers to this issue #.

    @vstinner
    Copy link
    Member

    p3k_i9313.diff is just a workaround, not the correct fix. The problem is that PyTokenizer_FindEncoding() doesn't get the filename.

    I wrote tokenizer_encoding_filename.patch which add PyTokenizer_FindEncodingFilename() and patch import.c and traceback.c to pass the filename.

    Hum, I'm not sure that my patch works if the locale encoding is not UTF-8: import.c manipulates path in the filesystem encoding, whereas PyTokenizer_FindEncodingFilename() expects UTF-8 filename.

    See this patch as a draft, I will try to fix the encoding issue.

    @vstinner
    Copy link
    Member

    See also bpo-9738 (Document the encoding of functions bytes arguments of the C API) to check which encoding is expected :-p

    @vstinner
    Copy link
    Member

    I'm working on bpo-3080 which changes the import machinery to use Unicode instead of byte strings, and so it will be possible to fix this issue correctly.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 4, 2011

    New changeset 7b8d625eb6e4 by Victor Stinner in branch 'default':
    Issue bpo-9319: Include the filename in "Non-UTF8 code ..." syntax error.
    http://hg.python.org/cpython/rev/7b8d625eb6e4

    @vstinner
    Copy link
    Member

    vstinner commented Apr 4, 2011

    Hum, I'm not sure that my patch works if the locale encoding is not
    UTF-8: import.c manipulates path in the filesystem encoding, whereas
    PyTokenizer_FindEncodingFilename() expects UTF-8 filename.

    Thanks to my work on bpo-3080, the import machinery (and other functions using Python modules and filenames) manipulates filenames as Unicode, and so I don't have to care about the filename encoding anymore.

    6e9dc970ac0e (of issue bpo-10785) fixed the crash, 7b8d625eb6e4 added the filename into the SyntaxError.

    @vstinner vstinner closed this as completed Apr 4, 2011
    @Trundle
    Copy link
    Mannequin

    Trundle mannequin commented Apr 18, 2011

    How about applying the workaround patch to Python 3.2? An unprecise error message is way better than a segfault.

    @bitdancer bitdancer reopened this Apr 22, 2011
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 22, 2011

    New changeset fa5e348889c2 by Victor Stinner in branch '3.2':
    Issue bpo-9319: Fix a crash on parsing a Python source code without encoding
    http://hg.python.org/cpython/rev/fa5e348889c2

    @vstinner
    Copy link
    Member

    Fixed in 3.2 (fa5e348889c2) and 3.3 (7b8d625eb6e4). The bug is a regression introduced in Python 3.2, so Python 3.1 doesn't need to be fixed.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 22, 2011

    New changeset 701069f9888c by Victor Stinner in branch '3.2':
    Issue bpo-9319: Fix the unit test
    http://hg.python.org/cpython/rev/701069f9888c

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 25, 2011

    New changeset bb62908896fe by Jesus Cea in branch 'default':
    Correctly merging bpo-9319 into 3.3?
    http://hg.python.org/cpython/rev/bb62908896fe

    @vstinner
    Copy link
    Member

    Correctly merging bpo-9319 into 3.3?

    No, this issue was fixed differently in Python 3.3. I see that you reverted (bb62908896fe) this merge (except the test, which is a good idea).

    @ezio-melotti
    Copy link
    Member

    Can the workaround be removed from Lib/pydoc.py:2001 ?
    I tried to remove it from 3.2 and help('modules spam') seems to work fine.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 15, 2012

    New changeset ce0687a8383b by Nick Coghlan in branch 'default':
    Issue bpo-9319: Remove the workaround for this since fixed problem from pydoc
    http://hg.python.org/cpython/rev/ce0687a8383b

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants