Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modulefinder traceback regression starting on Windows #84441

Closed
barry-scott mannequin opened this issue Apr 12, 2020 · 30 comments
Closed

modulefinder traceback regression starting on Windows #84441

barry-scott mannequin opened this issue Apr 12, 2020 · 30 comments
Labels
3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@barry-scott
Copy link
Mannequin

barry-scott mannequin commented Apr 12, 2020

BPO 40260
Nosy @barry-scott, @pfmoore, @tjguk, @zware, @zooba, @asottile, @miss-islington, @brandtbucher
PRs
  • bpo-40260: Allow compile() to handle the module's source decoding #19488
  • [3.8] bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488) #19522
  • bpo-40260: fix API regression with file_info mode field #19549
  • bpo-40260: revert refactor of file_info tuple #19595
  • [3.8] bpo-40260: Revert breaking changes made in modulefinder (GH-19595) #19621
  • bpo-40260: Remove unnecessary newline in compile() call #19641
  • [3.8] bpo-40260: Remove unnecessary newline in compile() call (GH-19641) #19659
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-20.16:13:58.112>
    created_at = <Date 2020-04-12.08:37:00.963>
    labels = ['3.8', 'type-bug', 'library', '3.9']
    title = 'modulefinder traceback regression starting on Windows'
    updated_at = <Date 2020-04-22.19:05:17.162>
    user = 'https://github.com/barry-scott'

    bugs.python.org fields:

    activity = <Date 2020-04-22.19:05:17.162>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-20.16:13:58.112>
    closer = 'steve.dower'
    components = ['Library (Lib)']
    creation = <Date 2020-04-12.08:37:00.963>
    creator = 'barry-scott'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40260
    keywords = ['patch']
    message_count = 30.0
    messages = ['366227', '366295', '366414', '366427', '366432', '366434', '366544', '366549', '366550', '366552', '366555', '366556', '366557', '366563', '366564', '366584', '366769', '366827', '366836', '366842', '366967', '366968', '366969', '367001', '367020', '367026', '367030', '367035', '367040', '367042']
    nosy_count = 9.0
    nosy_names = ['barry-scott', 'paul.moore', 'tim.golden', 'python-dev', 'zach.ware', 'steve.dower', 'Anthony Sottile', 'miss-islington', 'brandtbucher']
    pr_nums = ['19488', '19522', '19549', '19595', '19621', '19641', '19659']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40260'
    versions = ['Python 3.8', 'Python 3.9']

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 12, 2020

    modulefinder.py does not open source files in "rb" which
    prevents compile() from applying the encoding rules.

    This first showed up for me on Windows with Python 3.8.

    Here is my test case and the results on Windows with 3.8.

    import modulefinder
    
    with open('import_functools.py', 'w') as f:
       f.write('import functools\n')
    
    mf = modulefinder.ModuleFinder()
    mf.run_script('import_functools.py')
    print('Test passed')
    py -3.8-32 modulefinder_test.py
    Traceback (most recent call last):
     File "modulefinder_test.py", line 7, in <module>
       mf.run_script('import_functools.py')
     File "C:\Python38.win32\lib\modulefinder.py", line 165, in run_script
       self.load_module('__main__', fp, pathname, stuff)
     File "C:\Python38.win32\lib\modulefinder.py", line 360, in load_module
       self.scan_code(co, m)
     File "C:\Python38.win32\lib\modulefinder.py", line 433, in scan_code
       self._safe_import_hook(name, m, fromlist, level=0)
     File "C:\Python38.win32\lib\modulefinder.py", line 378, in _safe_import_hook
       self.import_hook(name, caller, level=level)
     File "C:\Python38.win32\lib\modulefinder.py", line 177, in import_hook
       q, tail = self.find_head_package(parent, name)
     File "C:\Python38.win32\lib\modulefinder.py", line 233, in find_head_package
       q = self.import_module(head, qname, parent)
     File "C:\Python38.win32\lib\modulefinder.py", line 326, in import_module
       m = self.load_module(fqname, fp, pathname, stuff)
     File "C:\Python38.win32\lib\modulefinder.py", line 343, in load_module
       co = compile(fp.read()+'\n', pathname, 'exec')
     File "C:\Python38.win32\lib\encodings\cp1252.py", line 23, in decode
       return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 308: character maps to <undefined>

    Barry

    @barry-scott barry-scott mannequin added 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir labels Apr 12, 2020
    @SilentGhost SilentGhost mannequin added type-bug An unexpected behavior, bug, or error labels Apr 12, 2020
    @zooba
    Copy link
    Member

    zooba commented Apr 13, 2020

    (+Windows tag)

    These should use io.open_code() these days anyway, which always opens in 'rb'.

    Could you make that change?

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 14, 2020

    io.open_code() used in the patch.

    @barry-scott barry-scott mannequin removed OS-windows labels Apr 14, 2020
    @zooba
    Copy link
    Member

    zooba commented Apr 14, 2020

    Thanks, Barry!

    I tweaked your NEWS entry a little, so once CI completes I'll merge and backport.

    @zooba
    Copy link
    Member

    zooba commented Apr 14, 2020

    New changeset d42e582 by Barry in branch 'master':
    bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
    d42e582

    @miss-islington
    Copy link
    Contributor

    New changeset 59047fa by Miss Islington (bot) in branch '3.8':
    bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
    59047fa

    @zooba zooba closed this as completed Apr 14, 2020
    @zooba zooba closed this as completed Apr 14, 2020
    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 15, 2020

    This patch has broken debian's builds due to use of modulefinder -- notably the type of file_info changed as a side-effect of this patch and is now causing:

    ../debian/pymindeps.py:29: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    Traceback (most recent call last):
      File "../debian/pymindeps.py", line 185, in <module>
        main(sys.argv[1:])
      File "../debian/pymindeps.py", line 178, in main
        mf.run_script(arg)
      File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
        self.load_module('__main__', fp, pathname, stuff)
      File "../debian/pymindeps.py", line 91, in load_module
        suffix, mode, type = file_info
    ValueError: not enough values to unpack (expected 3, got 2)

    This is breaking python3.8 and python3.9 (should I open a separate issue or is it ok to continue discussion here?)

    @zooba
    Copy link
    Member

    zooba commented Apr 15, 2020

    Go ahead and fix it against this one.

    @zooba zooba reopened this Apr 15, 2020
    @zooba zooba reopened this Apr 15, 2020
    @zooba
    Copy link
    Member

    zooba commented Apr 15, 2020

    Would be ideal to add a test that hits this case as well.

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 15, 2020

    I need to see the code of pymindeps to understand what you are doing and how to fix this. Can you post a URL to the source please?

    Are you aware that load_module() changed in other ways that are required to fix the bug?

    You may have to change yout pymindeps code anyway even with the mode restored to the tuple.

    Steve: Do I need a new PR or just commit a fix to the same branch?

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 15, 2020

    Regarding test case. I will need to know what pymindeps is doing to be able to design a suitable test case.

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 15, 2020

    I'm admittedly a little unfamiliar with what it does as well -- I'm mostly repackaging debian sources for deadsnakes. If I'm correct in my assumption, it is attempting to find a minimal set of modules to package into python3-minimal and ensure that that set is correct

    here's my current version of the source: https://github.com/deadsnakes/python3.9-nightly/blob/3eb45671e2f1910a6f117580f1733e431cce8a6e/debian/pymindeps.py

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 15, 2020

    (additionally, I'm not sure this should be backported to python3.8, especially with changed behaviour)

    @zooba
    Copy link
    Member

    zooba commented Apr 15, 2020

    The 3.8 behaviour is clearly broken, so backporting is fine. Also, io.open_code() is a security feature.

    The core of the issue is that apparently load_module() is a public API, and the file_info parameter used to be a 3-tuple and is now a 2-tuple.

    AFAICT, the fix should restore the 3-tuple everywhere and just ignore the mode parameter. The test should pass in a 3-tuple and make sure that it doesn't crash like this.

    @zooba
    Copy link
    Member

    zooba commented Apr 15, 2020

    Sorry, misready. They're monkeypatching the API. It isn't documented, but it's also not clearly internal.

    We don't have a strong need to change the meaning of that argument though, so best to leave it as it was and not break anyone.

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 16, 2020

    I have the fix coded and tested.

    I run out of git knowledge to update the PR branch so that I can push the fix.

    I'll work on it more later in the day.

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 19, 2020

    I have pushed the fix onto #19595
    with an API test case and the changes to keep the debain subclassing
    working.

    I'm new the the work flow. Let me know if I need to change anything.

    @zooba
    Copy link
    Member

    zooba commented Apr 20, 2020

    Thanks! I deleted the NEWS file (we've already got one for this issue in this release) and skipped the check instead. Once CI passes, I'll merge it.

    @zooba
    Copy link
    Member

    zooba commented Apr 20, 2020

    New changeset 9b0b5d2 by Barry in branch 'master':
    bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
    9b0b5d2

    @miss-islington
    Copy link
    Contributor

    New changeset 81de3c2 by Miss Islington (bot) in branch '3.8':
    bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
    81de3c2

    @zooba zooba closed this as completed Apr 20, 2020
    @zooba zooba closed this as completed Apr 20, 2020
    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 22, 2020

    This is still failing, but now in a different way:

    Traceback (most recent call last):
      File "../debian/pymindeps.py", line 185, in <module>
        main(sys.argv[1:])
      File "../debian/pymindeps.py", line 178, in main
        mf.run_script(arg)
      File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
        self.load_module('__main__', fp, pathname, stuff)
      File "../debian/pymindeps.py", line 92, in load_module
        m = modulefinder.ModuleFinder.load_module(self, fqname,
      File "/tmp/code/Lib/modulefinder.py", line 359, in load_module
        self.scan_code(co, m)
      File "/tmp/code/Lib/modulefinder.py", line 432, in scan_code
        self._safe_import_hook(name, m, fromlist, level=0)
      File "/tmp/code/Lib/modulefinder.py", line 377, in _safe_import_hook
        self.import_hook(name, caller, level=level)
      File "../debian/pymindeps.py", line 42, in import_hook
        return modulefinder.ModuleFinder.import_hook(self, name, caller,
      File "/tmp/code/Lib/modulefinder.py", line 175, in import_hook
        q, tail = self.find_head_package(parent, name)
      File "/tmp/code/Lib/modulefinder.py", line 231, in find_head_package
        q = self.import_module(head, qname, parent)
      File "../debian/pymindeps.py", line 48, in import_module
        m = modulefinder.ModuleFinder.import_module(self,
      File "/tmp/code/Lib/modulefinder.py", line 325, in import_module
        m = self.load_module(fqname, fp, pathname, stuff)
      File "../debian/pymindeps.py", line 92, in load_module
        m = modulefinder.ModuleFinder.load_module(self, fqname,
      File "/tmp/code/Lib/modulefinder.py", line 342, in load_module
        co = compile(fp.read()+b'\n', pathname, 'exec')
    TypeError: can only concatenate str (not "bytes") to str

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 22, 2020

    The failure above is fp is now a text file but before it was a binary file. this causes fp.read() + b'\n' to fail

    I can send a patch -- I don't think the b'\n' is necessary any more (as far as I can tell it used to be there for an old version of python where compile(...) errored if the file did not end in a newline, that's now handled in the parser internally though)

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 22, 2020

    actually I said that backwards in the previous message -- it was a text file and now it's a binary file

    @barry-scott
    Copy link
    Mannequin Author

    barry-scott mannequin commented Apr 22, 2020

    Anthony,

    Now that everything is opened using open_code that returns bytes its
    not clear to me why this breaks for you.

    Further the data must be bytes for the codings to be figured out.

    Removing the b'\n' may be reasonable, but not for the reason given.
    I kept it as the original code did this and assumed it was needed.
    Maybe for Windows users that forget to put a trailing NL?

    How can I reproduce your user case?

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 22, 2020

    debian has its own implementation of find_modules which still returns a text file for PY_SOURCE

    changing the type of that io object is the "breaking change" and maybe shouldn't be backported to python3.8 in a patch release

    @zooba
    Copy link
    Member

    zooba commented Apr 22, 2020

    If we want to be strictly correct here, the fix should check the "mode" variable to determine the type. And given the fragility of the code being used here, there's no doubt some reliance on a trailing newline somewhere too.

    I haven't bumped the categorisation, but at some level this is a security fix. All executable code is expected to be loaded through io.open_code(), and the fact that this didn't was a breach of that guarantee. There's a limit to how much we will accommodate people using it outside of what's documented.

    Anthony, since you can apparently validate this, could you run through all the fixes necessary to keep that working? I want to stop this game of chasing, either by fixing everything in one go or by apologising for the breakage (the io.open_code() fix is staying).

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Apr 22, 2020

    The trailing newline was added in 1.5.1 78fc363 -- back then compile(...) could not take strings which did not end in newlines

    now it can, so it is safe to remove -- I did the same triage on another PR here: #17817

    @zooba
    Copy link
    Member

    zooba commented Apr 22, 2020

    Okay, no problems then.

    We'll have to close that other PR now. Too bad nobody noticed it (or
    knew where to send it - I doubt modulefinder has an obvious owner).

    @zooba
    Copy link
    Member

    zooba commented Apr 22, 2020

    New changeset 39652cd by Anthony Sottile in branch 'master':
    bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
    39652cd

    @miss-islington
    Copy link
    Contributor

    New changeset fc45cb4 by Miss Islington (bot) in branch '3.8':
    bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
    fc45cb4

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants