msg366227 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-12 08:37 |
modulefinder.py does not open source files in "rb" which
prevents compile() from applying the encoding rules.
This first showed up for me on Windows with Python 3.8.
Here is my test case and the results on Windows with 3.8.
import modulefinder
with open('import_functools.py', 'w') as f:
f.write('import functools\n')
mf = modulefinder.ModuleFinder()
mf.run_script('import_functools.py')
print('Test passed')
py -3.8-32 modulefinder_test.py
Traceback (most recent call last):
File "modulefinder_test.py", line 7, in <module>
mf.run_script('import_functools.py')
File "C:\Python38.win32\lib\modulefinder.py", line 165, in run_script
self.load_module('__main__', fp, pathname, stuff)
File "C:\Python38.win32\lib\modulefinder.py", line 360, in load_module
self.scan_code(co, m)
File "C:\Python38.win32\lib\modulefinder.py", line 433, in scan_code
self._safe_import_hook(name, m, fromlist, level=0)
File "C:\Python38.win32\lib\modulefinder.py", line 378, in _safe_import_hook
self.import_hook(name, caller, level=level)
File "C:\Python38.win32\lib\modulefinder.py", line 177, in import_hook
q, tail = self.find_head_package(parent, name)
File "C:\Python38.win32\lib\modulefinder.py", line 233, in find_head_package
q = self.import_module(head, qname, parent)
File "C:\Python38.win32\lib\modulefinder.py", line 326, in import_module
m = self.load_module(fqname, fp, pathname, stuff)
File "C:\Python38.win32\lib\modulefinder.py", line 343, in load_module
co = compile(fp.read()+'\n', pathname, 'exec')
File "C:\Python38.win32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 308: character maps to <undefined>
Barry
|
msg366295 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-13 10:15 |
(+Windows tag)
These should use io.open_code() these days anyway, which always opens in 'rb'.
Could you make that change?
|
msg366414 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-14 17:57 |
io.open_code() used in the patch.
|
msg366427 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-14 18:34 |
Thanks, Barry!
I tweaked your NEWS entry a little, so once CI completes I'll merge and backport.
|
msg366432 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-14 19:16 |
New changeset d42e5820631cd66ee1eab8f610d4b58f3dfdd81c by Barry in branch 'master':
bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
https://github.com/python/cpython/commit/d42e5820631cd66ee1eab8f610d4b58f3dfdd81c
|
msg366434 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-04-14 19:34 |
New changeset 59047fab0ef37f583c9e7c3a48d67792fd10ff91 by Miss Islington (bot) in branch '3.8':
bpo-40260: Update modulefinder to use io.open_code() and respect coding comments (GH-19488)
https://github.com/python/cpython/commit/59047fab0ef37f583c9e7c3a48d67792fd10ff91
|
msg366544 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-15 19:47 |
This patch has broken debian's builds due to use of modulefinder -- notably the type of `file_info` changed as a side-effect of this patch and is now causing:
../debian/pymindeps.py:29: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "../debian/pymindeps.py", line 185, in <module>
main(sys.argv[1:])
File "../debian/pymindeps.py", line 178, in main
mf.run_script(arg)
File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
self.load_module('__main__', fp, pathname, stuff)
File "../debian/pymindeps.py", line 91, in load_module
suffix, mode, type = file_info
ValueError: not enough values to unpack (expected 3, got 2)
This is breaking python3.8 and python3.9 (should I open a separate issue or is it ok to continue discussion here?)
|
msg366549 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-15 20:27 |
Go ahead and fix it against this one.
|
msg366550 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-15 20:27 |
Would be ideal to add a test that hits this case as well.
|
msg366552 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-15 20:34 |
I need to see the code of pymindeps to understand what you are doing and how to fix this. Can you post a URL to the source please?
Are you aware that load_module() changed in other ways that are required to fix the bug?
You may have to change yout pymindeps code anyway even with the mode restored to the tuple.
Steve: Do I need a new PR or just commit a fix to the same branch?
|
msg366555 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-15 20:40 |
Regarding test case. I will need to know what pymindeps is doing to be able to design a suitable test case.
|
msg366556 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-15 20:45 |
I'm admittedly a little unfamiliar with what it does as well -- I'm mostly repackaging debian sources for deadsnakes. If I'm correct in my assumption, it is attempting to find a minimal set of modules to package into `python3-minimal` and ensure that that set is correct
here's my current version of the source: https://github.com/deadsnakes/python3.9-nightly/blob/3eb45671e2f1910a6f117580f1733e431cce8a6e/debian/pymindeps.py
|
msg366557 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-15 20:45 |
(additionally, I'm not sure this should be backported to python3.8, especially with changed behaviour)
|
msg366563 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-15 21:58 |
The 3.8 behaviour is clearly broken, so backporting is fine. Also, io.open_code() is a security feature.
The core of the issue is that apparently load_module() is a public API, and the file_info parameter used to be a 3-tuple and is now a 2-tuple.
AFAICT, the fix should restore the 3-tuple everywhere and just ignore the mode parameter. The test should pass in a 3-tuple and make sure that it doesn't crash like this.
|
msg366564 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-15 22:02 |
Sorry, misready. They're monkeypatching the API. It isn't documented, but it's also not clearly internal.
We don't have a strong need to change the meaning of that argument though, so best to leave it as it was and not break anyone.
|
msg366584 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-16 08:47 |
I have the fix coded and tested.
I run out of git knowledge to update the PR branch so that I can push the fix.
I'll work on it more later in the day.
|
msg366769 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-19 09:17 |
I have pushed the fix onto https://github.com/python/cpython/pull/19595
with an API test case and the changes to keep the debain subclassing
working.
I'm new the the work flow. Let me know if I need to change anything.
|
msg366827 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-20 13:35 |
Thanks! I deleted the NEWS file (we've already got one for this issue in this release) and skipped the check instead. Once CI passes, I'll merge it.
|
msg366836 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-20 14:58 |
New changeset 9b0b5d2baebd0b6a545317200c313a6a7408731e by Barry in branch 'master':
bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
https://github.com/python/cpython/commit/9b0b5d2baebd0b6a545317200c313a6a7408731e
|
msg366842 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-04-20 15:18 |
New changeset 81de3c225774179cdc82a1733a64e4a876ff02b5 by Miss Islington (bot) in branch '3.8':
bpo-40260: Revert breaking changes made in modulefinder (GH-19595)
https://github.com/python/cpython/commit/81de3c225774179cdc82a1733a64e4a876ff02b5
|
msg366967 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-22 01:43 |
This is still failing, but now in a different way:
Traceback (most recent call last):
File "../debian/pymindeps.py", line 185, in <module>
main(sys.argv[1:])
File "../debian/pymindeps.py", line 178, in main
mf.run_script(arg)
File "/tmp/code/Lib/modulefinder.py", line 163, in run_script
self.load_module('__main__', fp, pathname, stuff)
File "../debian/pymindeps.py", line 92, in load_module
m = modulefinder.ModuleFinder.load_module(self, fqname,
File "/tmp/code/Lib/modulefinder.py", line 359, in load_module
self.scan_code(co, m)
File "/tmp/code/Lib/modulefinder.py", line 432, in scan_code
self._safe_import_hook(name, m, fromlist, level=0)
File "/tmp/code/Lib/modulefinder.py", line 377, in _safe_import_hook
self.import_hook(name, caller, level=level)
File "../debian/pymindeps.py", line 42, in import_hook
return modulefinder.ModuleFinder.import_hook(self, name, caller,
File "/tmp/code/Lib/modulefinder.py", line 175, in import_hook
q, tail = self.find_head_package(parent, name)
File "/tmp/code/Lib/modulefinder.py", line 231, in find_head_package
q = self.import_module(head, qname, parent)
File "../debian/pymindeps.py", line 48, in import_module
m = modulefinder.ModuleFinder.import_module(self,
File "/tmp/code/Lib/modulefinder.py", line 325, in import_module
m = self.load_module(fqname, fp, pathname, stuff)
File "../debian/pymindeps.py", line 92, in load_module
m = modulefinder.ModuleFinder.load_module(self, fqname,
File "/tmp/code/Lib/modulefinder.py", line 342, in load_module
co = compile(fp.read()+b'\n', pathname, 'exec')
TypeError: can only concatenate str (not "bytes") to str
|
msg366968 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-22 01:55 |
The failure above is `fp` is now a text file but before it was a binary file. this causes `fp.read()` + `b'\n'` to fail
I can send a patch -- I don't think the `b'\n'` is necessary any more (as far as I can tell it used to be there for an old version of python where `compile(...)` errored if the file did not end in a newline, that's now handled in the parser internally though)
|
msg366969 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-22 02:05 |
actually I said that backwards in the previous message -- it was a text file and now it's a binary file
|
msg367001 - (view) |
Author: Barry Alan Scott (barry-scott) * |
Date: 2020-04-22 13:18 |
Anthony,
Now that everything is opened using open_code that returns bytes its
not clear to me why this breaks for you.
Further the data must be bytes for the codings to be figured out.
Removing the b'\n' may be reasonable, but not for the reason given.
I kept it as the original code did this and assumed it was needed.
Maybe for Windows users that forget to put a trailing NL?
How can I reproduce your user case?
|
msg367020 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-22 16:51 |
debian has its own implementation of find_modules which still returns a text file for PY_SOURCE
changing the type of that io object is the "breaking change" and maybe shouldn't be backported to python3.8 in a patch release
|
msg367026 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-22 17:12 |
If we want to be strictly correct here, the fix should check the "mode" variable to determine the type. And given the fragility of the code being used here, there's no doubt some reliance on a trailing newline somewhere too.
I haven't bumped the categorisation, but at some level this is a security fix. All executable code is expected to be loaded through io.open_code(), and the fact that this didn't was a breach of that guarantee. There's a limit to how much we will accommodate people using it outside of what's documented.
Anthony, since you can apparently validate this, could you run through all the fixes necessary to keep that working? I want to stop this game of chasing, either by fixing everything in one go or by apologising for the breakage (the io.open_code() fix is staying).
|
msg367030 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2020-04-22 17:28 |
The trailing newline was added in 1.5.1 78fc3634cbfd65a6be8abfd1b7fc7cbd0ccbfb39 -- back then `compile(...)` could not take strings which did not end in newlines
now it can, so it is safe to remove -- I did the same triage on another PR here: https://github.com/python/cpython/pull/17817
|
msg367035 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-22 17:44 |
Okay, no problems then.
We'll have to close that other PR now. Too bad nobody noticed it (or
knew where to send it - I doubt modulefinder has an obvious owner).
|
msg367040 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2020-04-22 18:43 |
New changeset 39652cd8bdf7c82b7c6055089a4ed90ee546a448 by Anthony Sottile in branch 'master':
bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
https://github.com/python/cpython/commit/39652cd8bdf7c82b7c6055089a4ed90ee546a448
|
msg367042 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-04-22 19:05 |
New changeset fc45cb4400409572f05c8b54f2c6f06cbefb1b4e by Miss Islington (bot) in branch '3.8':
bpo-40260: Remove unnecessary newline in compile() call (GH-19641)
https://github.com/python/cpython/commit/fc45cb4400409572f05c8b54f2c6f06cbefb1b4e
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:29 | admin | set | github: 84441 |
2020-04-22 19:05:17 | miss-islington | set | messages:
+ msg367042 |
2020-04-22 18:43:11 | miss-islington | set | pull_requests:
+ pull_request18985 |
2020-04-22 18:43:07 | steve.dower | set | messages:
+ msg367040 |
2020-04-22 17:44:56 | steve.dower | set | messages:
+ msg367035 |
2020-04-22 17:28:46 | Anthony Sottile | set | messages:
+ msg367030 |
2020-04-22 17:12:15 | steve.dower | set | messages:
+ msg367026 |
2020-04-22 16:51:03 | Anthony Sottile | set | messages:
+ msg367020 |
2020-04-22 13:18:25 | barry-scott | set | messages:
+ msg367001 |
2020-04-22 02:05:32 | Anthony Sottile | set | messages:
+ msg366969 |
2020-04-22 02:04:45 | Anthony Sottile | set | pull_requests:
+ pull_request18965 |
2020-04-22 01:55:22 | Anthony Sottile | set | messages:
+ msg366968 |
2020-04-22 01:43:54 | Anthony Sottile | set | messages:
+ msg366967 |
2020-04-20 16:13:58 | steve.dower | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2020-04-20 15:18:18 | miss-islington | set | messages:
+ msg366842 |
2020-04-20 14:59:05 | miss-islington | set | pull_requests:
+ pull_request18951 |
2020-04-20 14:58:50 | steve.dower | set | messages:
+ msg366836 |
2020-04-20 13:35:16 | steve.dower | set | messages:
+ msg366827 |
2020-04-19 09:17:52 | barry-scott | set | messages:
+ msg366769 |
2020-04-19 09:10:07 | barry-scott | set | pull_requests:
+ pull_request18931 |
2020-04-16 08:47:10 | barry-scott | set | messages:
+ msg366584 |
2020-04-16 08:30:21 | barry-scott | set | stage: needs patch -> patch review pull_requests:
+ pull_request18895 |
2020-04-15 22:02:13 | steve.dower | set | messages:
+ msg366564 |
2020-04-15 21:58:17 | steve.dower | set | messages:
+ msg366563 |
2020-04-15 20:45:50 | Anthony Sottile | set | messages:
+ msg366557 |
2020-04-15 20:45:00 | Anthony Sottile | set | messages:
+ msg366556 |
2020-04-15 20:40:03 | barry-scott | set | messages:
+ msg366555 |
2020-04-15 20:34:50 | barry-scott | set | messages:
+ msg366552 |
2020-04-15 20:27:44 | steve.dower | set | messages:
+ msg366550 |
2020-04-15 20:27:22 | steve.dower | set | status: closed -> open resolution: fixed -> (no value) messages:
+ msg366549
stage: resolved -> needs patch |
2020-04-15 19:47:04 | Anthony Sottile | set | nosy:
+ Anthony Sottile messages:
+ msg366544
|
2020-04-14 19:37:34 | steve.dower | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2020-04-14 19:34:51 | miss-islington | set | messages:
+ msg366434 |
2020-04-14 19:16:19 | miss-islington | set | nosy:
+ miss-islington pull_requests:
+ pull_request18872
|
2020-04-14 19:16:10 | steve.dower | set | messages:
+ msg366432 |
2020-04-14 18:34:38 | steve.dower | set | messages:
+ msg366427 |
2020-04-14 17:57:30 | barry-scott | set | messages:
+ msg366414 components:
- Windows |
2020-04-13 10:15:07 | steve.dower | set | nosy:
+ steve.dower, paul.moore, tim.golden, zach.ware messages:
+ msg366295 components:
+ Windows
|
2020-04-12 17:31:33 | python-dev | set | keywords:
+ patch nosy:
+ python-dev
pull_requests:
+ pull_request18841 stage: needs patch -> patch review |
2020-04-12 10:19:28 | SilentGhost | set | nosy:
+ brandtbucher
type: behavior stage: needs patch |
2020-04-12 08:37:00 | barry-scott | create | |