classification
Title: modulefinder traceback regression starting on Windows
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Anthony Sottile, barry-scott, brandtbucher, miss-islington, paul.moore, python-dev, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2020-04-12 08:37 by barry-scott, last changed 2020-04-22 19:05 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19488 merged python-dev, 2020-04-12 17:31
PR 19522 merged miss-islington, 2020-04-14 19:16
PR 19549 open barry-scott, 2020-04-16 08:30
PR 19595 merged barry-scott, 2020-04-19 09:10
PR 19621 merged miss-islington, 2020-04-20 14:59
PR 19641 merged Anthony Sottile, 2020-04-22 02:04
PR 19659 merged miss-islington, 2020-04-22 18:43
Messages (30)
msg366227 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-12 08:37
modulefinder.py does not open source files in "rb" which
prevents compile() from applying the encoding rules.

This first showed up for me on Windows with Python 3.8.

Here is my test case and the results on Windows with 3.8.

import modulefinder

with open('import_functools.py', 'w') as f:
   f.write('import functools\n')

mf = modulefinder.ModuleFinder()
mf.run_script('import_functools.py')
print('Test passed')

py -3.8-32 modulefinder_test.py
Traceback (most recent call last):
 File "modulefinder_test.py", line 7, in <module>
   mf.run_script('import_functools.py')
 File "C:\Python38.win32\lib\modulefinder.py", line 165, in run_script
   self.load_module('__main__', fp, pathname, stuff)
 File "C:\Python38.win32\lib\modulefinder.py", line 360, in load_module
   self.scan_code(co, m)
 File "C:\Python38.win32\lib\modulefinder.py", line 433, in scan_code
   self._safe_import_hook(name, m, fromlist, level=0)
 File "C:\Python38.win32\lib\modulefinder.py", line 378, in _safe_import_hook
   self.import_hook(name, caller, level=level)
 File "C:\Python38.win32\lib\modulefinder.py", line 177, in import_hook
   q, tail = self.find_head_package(parent, name)
 File "C:\Python38.win32\lib\modulefinder.py", line 233, in find_head_package
   q = self.import_module(head, qname, parent)
 File "C:\Python38.win32\lib\modulefinder.py", line 326, in import_module
   m = self.load_module(fqname, fp, pathname, stuff)
 File "C:\Python38.win32\lib\modulefinder.py", line 343, in load_module
   co = compile(fp.read()+'\n', pathname, 'exec')
 File "C:\Python38.win32\lib\encodings\cp1252.py", line 23, in decode
   return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 308: character maps to <undefined>

Barry
msg366295 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-13 10:15
(+Windows tag)

These should use io.open_code() these days anyway, which always opens in 'rb'.

Could you make that change?
msg366414 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-14 17:57
io.open_code() used in the patch.
msg366427 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-14 18:34
Thanks, Barry!

I tweaked your NEWS entry a little, so once CI completes I'll merge and backport.
msg366432 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-14 19:16
New changeset d42e5820631cd66ee1eab8f610d4b58f3dfdd81c by Barry in branch 'master':
bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
https://github.com/python/cpython/commit/d42e5820631cd66ee1eab8f610d4b58f3dfdd81c
msg366434 - (view) Author: miss-islington (miss-islington) Date: 2020-04-14 19:34
New changeset 59047fab0ef37f583c9e7c3a48d67792fd10ff91 by Miss Islington (bot) in branch '3.8':
bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
https://github.com/python/cpython/commit/59047fab0ef37f583c9e7c3a48d67792fd10ff91
msg366544 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-15 19:47
This patch has broken debian's builds due to use of modulefinder -- notably the type of `file_info` changed as a side-effect of this patch and is now causing:

../debian/pymindeps.py:29: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "../debian/pymindeps.py", line 185, in <module>
    main(sys.argv[1:])
  File "../debian/pymindeps.py", line 178, in main
    mf.run_script(arg)
  File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
    self.load_module('__main__', fp, pathname, stuff)
  File "../debian/pymindeps.py", line 91, in load_module
    suffix, mode, type = file_info
ValueError: not enough values to unpack (expected 3, got 2)


This is breaking python3.8 and python3.9 (should I open a separate issue or is it ok to continue discussion here?)
msg366549 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-15 20:27
Go ahead and fix it against this one.
msg366550 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-15 20:27
Would be ideal to add a test that hits this case as well.
msg366552 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-15 20:34
I need to see the code of pymindeps to understand what you are doing and how to fix this. Can you post a URL to the source please?

Are you aware that load_module() changed in other ways that are required to fix the bug?

You may have to change yout pymindeps code anyway even with the mode restored to the tuple.

Steve: Do I need a new PR or just commit a fix to the same branch?
msg366555 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-15 20:40
Regarding test case. I will need to know what pymindeps is doing to be able to design a suitable test case.
msg366556 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-15 20:45
I'm admittedly a little unfamiliar with what it does as well -- I'm mostly repackaging debian sources for deadsnakes.  If I'm correct in my assumption, it is attempting to find a minimal set of modules to package into `python3-minimal` and ensure that that set is correct

here's my current version of the source: https://github.com/deadsnakes/python3.9-nightly/blob/3eb45671e2f1910a6f117580f1733e431cce8a6e/debian/pymindeps.py
msg366557 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-15 20:45
(additionally, I'm not sure this should be backported to python3.8, especially with changed behaviour)
msg366563 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-15 21:58
The 3.8 behaviour is clearly broken, so backporting is fine. Also, io.open_code() is a security feature.

The core of the issue is that apparently load_module() is a public API, and the file_info parameter used to be a 3-tuple and is now a 2-tuple.

AFAICT, the fix should restore the 3-tuple everywhere and just ignore the mode parameter. The test should pass in a 3-tuple and make sure that it doesn't crash like this.
msg366564 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-15 22:02
Sorry, misready. They're monkeypatching the API. It isn't documented, but it's also not clearly internal.

We don't have a strong need to change the meaning of that argument though, so best to leave it as it was and not break anyone.
msg366584 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-16 08:47
I have the fix coded and tested.

I run out of git knowledge to update the PR branch so that I can push the fix.

I'll work on it more later in the day.
msg366769 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-19 09:17
I have pushed the fix onto https://github.com/python/cpython/pull/19595
with an API test case and the changes to keep the debain subclassing
working.

I'm new the the work flow. Let me know if I need to change anything.
msg366827 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-20 13:35
Thanks! I deleted the NEWS file (we've already got one for this issue in this release) and skipped the check instead. Once CI passes, I'll merge it.
msg366836 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-20 14:58
New changeset 9b0b5d2baebd0b6a545317200c313a6a7408731e by Barry in branch 'master':
bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
https://github.com/python/cpython/commit/9b0b5d2baebd0b6a545317200c313a6a7408731e
msg366842 - (view) Author: miss-islington (miss-islington) Date: 2020-04-20 15:18
New changeset 81de3c225774179cdc82a1733a64e4a876ff02b5 by Miss Islington (bot) in branch '3.8':
bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
https://github.com/python/cpython/commit/81de3c225774179cdc82a1733a64e4a876ff02b5
msg366967 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-22 01:43
This is still failing, but now in a different way:


Traceback (most recent call last):
  File "../debian/pymindeps.py", line 185, in <module>
    main(sys.argv[1:])
  File "../debian/pymindeps.py", line 178, in main
    mf.run_script(arg)
  File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
    self.load_module('__main__', fp, pathname, stuff)
  File "../debian/pymindeps.py", line 92, in load_module
    m = modulefinder.ModuleFinder.load_module(self, fqname,
  File "/tmp/code/Lib/modulefinder.py", line 359, in load_module
    self.scan_code(co, m)
  File "/tmp/code/Lib/modulefinder.py", line 432, in scan_code
    self._safe_import_hook(name, m, fromlist, level=0)
  File "/tmp/code/Lib/modulefinder.py", line 377, in _safe_import_hook
    self.import_hook(name, caller, level=level)
  File "../debian/pymindeps.py", line 42, in import_hook
    return modulefinder.ModuleFinder.import_hook(self, name, caller,
  File "/tmp/code/Lib/modulefinder.py", line 175, in import_hook
    q, tail = self.find_head_package(parent, name)
  File "/tmp/code/Lib/modulefinder.py", line 231, in find_head_package
    q = self.import_module(head, qname, parent)
  File "../debian/pymindeps.py", line 48, in import_module
    m = modulefinder.ModuleFinder.import_module(self,
  File "/tmp/code/Lib/modulefinder.py", line 325, in import_module
    m = self.load_module(fqname, fp, pathname, stuff)
  File "../debian/pymindeps.py", line 92, in load_module
    m = modulefinder.ModuleFinder.load_module(self, fqname,
  File "/tmp/code/Lib/modulefinder.py", line 342, in load_module
    co = compile(fp.read()+b'\n', pathname, 'exec')
TypeError: can only concatenate str (not "bytes") to str
msg366968 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-22 01:55
The failure above is `fp` is now a text file but before it was a binary file.  this causes `fp.read()` + `b'\n'` to fail

I can send a patch -- I don't think the `b'\n'` is necessary any more (as far as I can tell it used to be there for an old version of python where `compile(...)` errored if the file did not end in a newline, that's now handled in the parser internally though)
msg366969 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-22 02:05
actually I said that backwards in the previous message -- it was a text file and now it's a binary file
msg367001 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2020-04-22 13:18
Anthony,

Now that everything is opened using open_code that returns bytes its
not clear to me why this breaks for you.

Further the data must be bytes for the codings to be figured out.

Removing the b'\n' may be reasonable, but not for the reason given.
I kept it as the original code did this and assumed it was needed.
Maybe for Windows users that forget to put a trailing NL?

How can I reproduce your user case?
msg367020 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-22 16:51
debian has its own implementation of find_modules which still returns a text file for PY_SOURCE

changing the type of that io object is the "breaking change" and maybe shouldn't be backported to python3.8 in a patch release
msg367026 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-22 17:12
If we want to be strictly correct here, the fix should check the "mode" variable to determine the type. And given the fragility of the code being used here, there's no doubt some reliance on a trailing newline somewhere too.

I haven't bumped the categorisation, but at some level this is a security fix. All executable code is expected to be loaded through io.open_code(), and the fact that this didn't was a breach of that guarantee. There's a limit to how much we will accommodate people using it outside of what's documented.

Anthony, since you can apparently validate this, could you run through all the fixes necessary to keep that working? I want to stop this game of chasing, either by fixing everything in one go or by apologising for the breakage (the io.open_code() fix is staying).
msg367030 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2020-04-22 17:28
The trailing newline was added in 1.5.1 78fc3634cbfd65a6be8abfd1b7fc7cbd0ccbfb39 -- back then `compile(...)` could not take strings which did not end in newlines

now it can, so it is safe to remove -- I did the same triage on another PR here: https://github.com/python/cpython/pull/17817
msg367035 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-22 17:44
Okay, no problems then.

We'll have to close that other PR now. Too bad nobody noticed it (or 
knew where to send it - I doubt modulefinder has an obvious owner).
msg367040 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-04-22 18:43
New changeset 39652cd8bdf7c82b7c6055089a4ed90ee546a448 by Anthony Sottile in branch 'master':
bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
https://github.com/python/cpython/commit/39652cd8bdf7c82b7c6055089a4ed90ee546a448
msg367042 - (view) Author: miss-islington (miss-islington) Date: 2020-04-22 19:05
New changeset fc45cb4400409572f05c8b54f2c6f06cbefb1b4e by Miss Islington (bot) in branch '3.8':
bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
https://github.com/python/cpython/commit/fc45cb4400409572f05c8b54f2c6f06cbefb1b4e
History
Date User Action Args
2020-04-22 19:05:17miss-islingtonsetmessages: + msg367042
2020-04-22 18:43:11miss-islingtonsetpull_requests: + pull_request18985
2020-04-22 18:43:07steve.dowersetmessages: + msg367040
2020-04-22 17:44:56steve.dowersetmessages: + msg367035
2020-04-22 17:28:46Anthony Sottilesetmessages: + msg367030
2020-04-22 17:12:15steve.dowersetmessages: + msg367026
2020-04-22 16:51:03Anthony Sottilesetmessages: + msg367020
2020-04-22 13:18:25barry-scottsetmessages: + msg367001
2020-04-22 02:05:32Anthony Sottilesetmessages: + msg366969
2020-04-22 02:04:45Anthony Sottilesetpull_requests: + pull_request18965
2020-04-22 01:55:22Anthony Sottilesetmessages: + msg366968
2020-04-22 01:43:54Anthony Sottilesetmessages: + msg366967
2020-04-20 16:13:58steve.dowersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-04-20 15:18:18miss-islingtonsetmessages: + msg366842
2020-04-20 14:59:05miss-islingtonsetpull_requests: + pull_request18951
2020-04-20 14:58:50steve.dowersetmessages: + msg366836
2020-04-20 13:35:16steve.dowersetmessages: + msg366827
2020-04-19 09:17:52barry-scottsetmessages: + msg366769
2020-04-19 09:10:07barry-scottsetpull_requests: + pull_request18931
2020-04-16 08:47:10barry-scottsetmessages: + msg366584
2020-04-16 08:30:21barry-scottsetstage: needs patch -> patch review
pull_requests: + pull_request18895
2020-04-15 22:02:13steve.dowersetmessages: + msg366564
2020-04-15 21:58:17steve.dowersetmessages: + msg366563
2020-04-15 20:45:50Anthony Sottilesetmessages: + msg366557
2020-04-15 20:45:00Anthony Sottilesetmessages: + msg366556
2020-04-15 20:40:03barry-scottsetmessages: + msg366555
2020-04-15 20:34:50barry-scottsetmessages: + msg366552
2020-04-15 20:27:44steve.dowersetmessages: + msg366550
2020-04-15 20:27:22steve.dowersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg366549

stage: resolved -> needs patch
2020-04-15 19:47:04Anthony Sottilesetnosy: + Anthony Sottile
messages: + msg366544
2020-04-14 19:37:34steve.dowersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-04-14 19:34:51miss-islingtonsetmessages: + msg366434
2020-04-14 19:16:19miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request18872
2020-04-14 19:16:10steve.dowersetmessages: + msg366432
2020-04-14 18:34:38steve.dowersetmessages: + msg366427
2020-04-14 17:57:30barry-scottsetmessages: + msg366414
components: - Windows
2020-04-13 10:15:07steve.dowersetnosy: + steve.dower, paul.moore, tim.golden, zach.ware
messages: + msg366295
components: + Windows
2020-04-12 17:31:33python-devsetkeywords: + patch
nosy: + python-dev

pull_requests: + pull_request18841
stage: needs patch -> patch review
2020-04-12 10:19:28SilentGhostsetnosy: + brandtbucher

type: behavior
stage: needs patch
2020-04-12 08:37:00barry-scottcreate