classification
Title: "pydoc -w <package>" writes out page with empty "Package Contents" section
Type: behavior Stage:
Components: Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: Arfrever, brett.cannon, chris.jerdonek, christopherthemagnificent, eric.araujo, georg.brandl, ncoghlan, python-dev
Priority: release blocker Keywords:

Created on 2012-07-13 04:31 by christopherthemagnificent, last changed 2012-07-17 17:12 by christopherthemagnificent. This issue is now closed.

Messages (16)
msg165352 - (view) Author: Christopher the Magnificent (christopherthemagnificent) Date: 2012-07-13 04:31
Let there be a folder "testpkg" contained in $SOME_DIR with three empty files: "__init__.py", "bob.py", and "sally.py".

If I run "pydoc3.2 -w testpkg" inside $SOME_DIR, it will output the file $SOME_DIR/testpkg.html

In testpkg.html there will be a section called "Package Contents" with two links named "bob" and "sally".

If I instead run "pydoc3.3 -w testpkg" inside $SOME_DIR, it will still output the file $SOME_DIR/testpkg.html

Only this time, in testpkg.html the section called "Package Contents" will be empty when there should be links named "bob" and "sally"
msg165446 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-14 12:58
Sounds like a pkgutil-related issue to me.
msg165453 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-14 14:54
Indeed, pydoc relies on pkgutil.walk_packages to work out what to document, and that's broken currently due to the reliance on a non-standard importer API that isn't in PEP 302 (not even as an optional extension, like the get_filename() used by runpy)

I'm not seeing any way out other than to add an API to the importlib importers that pkgutil can use. My preference is to call that method "_iter_modules" right now, and then we can look at making it (or something like it) public for 3.4.
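
For readers who want to check the pkgutil layer directly, without going through pydoc, the following is a minimal sketch. It rebuilds the reporter's "testpkg" layout in a temporary directory and asks pkgutil.walk_packages what it finds there. The helper names make_test_package and package_contents are made up for this illustration and are not part of pydoc or pkgutil.

```python
import os
import pkgutil
import tempfile


def make_test_package(parent_dir):
    """Create the reporter's layout: testpkg/ holding three empty files."""
    pkg_dir = os.path.join(parent_dir, "testpkg")
    os.mkdir(pkg_dir)
    for filename in ("__init__.py", "bob.py", "sally.py"):
        with open(os.path.join(pkg_dir, filename), "w"):
            pass  # the files only need to exist
    return pkg_dir


def package_contents(pkg_dir):
    """Return the sorted submodule names pkgutil.walk_packages reports for pkg_dir."""
    return sorted(
        name for _finder, name, _ispkg in pkgutil.walk_packages([pkg_dir])
    )


with tempfile.TemporaryDirectory() as some_dir:
    contents = package_contents(make_test_package(some_dir))

print(contents)
```

On an interpreter that includes the fix, the printed list should contain "bob" and "sally". An empty list would correspond to the empty "Package Contents" section described in the original report.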
msg165456 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-14 17:06
Ugh, I don't exactly love the idea of adding a method to any of importlib's finders simply because PJE didn't try to make this non-standard API part of PEP 302 or something. But basically pkgutil is worthless without doing something about this damn iter_modules() method that the module keeps expecting.

Nick's proposal of adding importlib._bootstrap.FileFinder._iter_modules() is probably the best we can do with the timeline we have. But if we do this then I want to deprecate pkgutil in Python 3.4 and we can then get a proper API that is documented for module discovery and can have whatever helper code is needed in importlib (since namespace packages take care of the need for extend_path() and the only other use for pkgutil). That's the only deal I'm willing to strike here if we are going to keep pkgutil working in Python 3.3 as people expect.
msg165458 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-07-14 17:31
Not great, but that sounds reasonable.
msg165460 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-14 17:42
I agree pkgutil is pretty much useless right now and deprecation is worth considering.  But isn't another option simply to change pkgutil's internals to provide its own iter_modules whenever it finds that method missing?  This seems to be what it has done in the past for some code paths when it made its own ImpImporter wrappers (in pkgutil.get_importer()):

http://hg.python.org/cpython/file/416cd57d38cf/Lib/pkgutil.py#l363

It seems this would at least work for FileFinders, though I haven't thought this through to know for sure one way or the other.
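
A rough sketch of what such a fallback could look like follows. It is an illustration only, assuming a plain directory on disk; it is not the code that was eventually committed. The name fallback_iter_modules is hypothetical. It lists a directory, uses inspect.getmodulename to recognise module files, and treats subdirectories containing "__init__.py" as packages.

```python
import inspect
import os
import tempfile


def fallback_iter_modules(directory, prefix=""):
    """Yield (name, ispkg) for modules and packages directly inside directory.

    Hypothetical helper: a directory-listing stand-in for an importer
    that lacks its own iter_modules() method.
    """
    seen = set()
    for entry in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, entry)
        if os.path.isdir(full_path):
            # Only directories with an __init__.py count as packages here.
            has_init = os.path.isfile(os.path.join(full_path, "__init__.py"))
            if has_init and entry not in seen:
                seen.add(entry)
                yield prefix + entry, True
            continue
        name = inspect.getmodulename(entry)  # None for non-module files
        if name is None or name == "__init__" or name in seen:
            continue
        seen.add(name)
        yield prefix + name, False


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for filename in ("bob.py", "sally.py", "notes.txt"):
        with open(os.path.join(demo_dir, filename), "w"):
            pass
    os.mkdir(os.path.join(demo_dir, "sub"))
    with open(os.path.join(demo_dir, "sub", "__init__.py"), "w"):
        pass
    found = list(fallback_iter_modules(demo_dir))

print(found)
```

This sketch ignores namespace packages and zip archives, which a real implementation would also need to consider.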
msg165462 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-14 18:16
In Python 2.7, I just did this test:

>>> import sys, pkgutil
>>> for path in sys.path:
...   print pkgutil.get_importer(path)

And got only pkgutil.ImpImporter instances and imp.NullImporter objects.

So even before, at least in the most common case, it looks like pkgutil may have been relying on its "wrapped" importers for access to an iter_modules() method.

When I do the same test in Python 3.3, I get only FileFinder instances.  So in Python 3.3, pkgutil just isn't getting to the lines that would otherwise create ImpImporter instances that would work for our purposes (presumably because sys.path_hooks is populated differently in Python 3.3).
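
For anyone repeating the check on Python 3, a sketch of the same loop follows. The helper name describe_importers is made up for this example; pkgutil.get_importer is the same call used in the Python 2.7 session above.

```python
import pkgutil
import sys


def describe_importers(paths):
    """Return (path, importer type name) pairs for each path entry.

    The type name is "None" when no path hook claims the entry.
    """
    described = []
    for path in paths:
        importer = pkgutil.get_importer(path)
        kind = "None" if importer is None else type(importer).__name__
        described.append((path, kind))
    return described


for path, kind in describe_importers(sys.path):
    print(kind, repr(path))
```

The exact output depends on the interpreter and on what is on sys.path, so no expected output is given here.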
msg165463 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-14 18:35
Right, you aren't getting ImpImporters because they are only used when a module doesn't define __loader__. But in Python 3.3, by default *all* modules get that attribute defined.

And yes, we could either tweak pkgutil to recognize FileFinder and special-case it or we can add the method and tweak the probably two or three places the method is called. It's a matter of effort.
msg165487 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-15 02:18
*sigh* And, of course, there's no meaningful regression test defined for pkgutil.walk_packages, which is why the test suite didn't pick this up :(

Oh well, at least I have a clear place to start.
msg165512 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-15 08:10
New changeset 3987667bf98f by Nick Coghlan in branch 'default':
Take the first step in resolving the messy pkgutil vs importlib edge cases by basing pkgutil explicitly on importlib, deprecating its internal import emulation and setting __main__.__loader__ correctly so that runpy still works (Affects #15343, #15314, #15357)
http://hg.python.org/cpython/rev/3987667bf98f
msg165518 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-15 11:19
New changeset 9101eab6178c by Nick Coghlan in branch 'default':
Issue #15343: Handle importlib.machinery.FileFinder instances in pkgutil.walk_packages (et al)
http://hg.python.org/cpython/rev/9101eab6178c
msg165520 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-15 11:53
Fixing this has uncovered another issue: the old import emulation in PEP 302 ignored encoding cookies, thus merrily decoding everything as utf-8 in "get_source()". importlib is smarter about this, which means the pydoc tests started failing as they tried to load the files with invalid encoding cookies.

I plan to tackle this in two parts:

- move get_source() and get_code() in importlib towards consistently raising ImportError, regardless of the reason the result couldn't be provided (chaining the exceptions to ensure details aren't lost)

- update pydoc to tolerate ImportError from get_source.
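
The first part of that plan, raising ImportError while chaining the original exception, can be sketched as below. This is only an illustration of the pattern and not importlib's actual code; the function decode_source and its parameters are hypothetical.

```python
def decode_source(source_bytes, module_name, encoding="utf-8"):
    """Decode module source, reporting any failure as a chained ImportError.

    Hypothetical helper: the original exception stays reachable through
    __cause__, so the details of the failure are not lost.
    """
    try:
        return source_bytes.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        message = "source for {!r} could not be decoded".format(module_name)
        raise ImportError(message, name=module_name) from exc


try:
    decode_source(b"\xff\xfe not valid utf-8", "bad_module")
except ImportError as err:
    print(type(err).__name__, "caused by", type(err.__cause__).__name__)
```

Callers then only need to handle ImportError, while the chained exception still shows whether the problem was a bad byte sequence or an unknown encoding name.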
msg165521 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-15 12:12
New changeset 7d202353a728 by Nick Coghlan in branch 'default':
Issue #15343: A lot more than just unicode decoding can go wrong when retrieving a source file
http://hg.python.org/cpython/rev/7d202353a728
msg165523 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-15 12:19
Main change in that last commit is really the one to make pydoc ignore just about *any* exception from get_source(). This should make it more robust against buggy loaders, too.
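
The defensive pattern described here can be sketched as follows. This is not the code from the changeset; safe_get_source and the two loader classes are hypothetical names used only to show a caller that treats any failure in get_source() as "no source available".

```python
def safe_get_source(loader, module_name):
    """Return the module's source text, or None if it cannot be obtained.

    Hypothetical helper: any exception raised by the loader is swallowed,
    so a buggy loader cannot break the caller.
    """
    get_source = getattr(loader, "get_source", None)
    if get_source is None:
        return None
    try:
        return get_source(module_name)
    except Exception:
        return None


class BrokenLoader:
    """Stand-in for a loader whose get_source() fails on a bad encoding."""

    def get_source(self, name):
        raise UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")


class WorkingLoader:
    """Stand-in for a loader that can return source text."""

    def get_source(self, name):
        return "x = 1\n"


print(safe_get_source(BrokenLoader(), "bad_module"))
print(repr(safe_get_source(WorkingLoader(), "good_module")))
```

Catching Exception this broadly is usually discouraged, but for a documentation tool the trade-off is arguably reasonable: a missing source listing is a smaller problem than a crash.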
msg165524 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-15 12:24
And a manual check confirms the higher level issue is also fixed. (I believe there's already a meta issue somewhere about the lack of automated tests for pydoc's emitted HTML)
msg165726 - (view) Author: Christopher the Magnificent (christopherthemagnificent) Date: 2012-07-17 17:12
ISSUE CONFIRMED FIXED ON MY END, AND MANY THANKS

I downloaded, compiled, and installed the latest Python 3.3 beta on my machine with Mercurial, and can confirm that the problem is no longer presenting itself.

Thank you and great job to you smart people that know the guts of Python and its libraries, who coordinated and coded to get this fixed, especially Mr. Nick Coghlan, who it appears took on the responsibility for resolving this.  My hat is off to you all!

--Christopher

P. S.  I'm fairly new to the bug forums.  If anyone is bothered that I wrote this after the issue closed, just let me know what the policy is so I can learn to adapt to it.  If no one complains, I will assume it was okay in this instance.  :-)
History
Date                 User                       Action  Args
2012-07-17 17:12:24  christopherthemagnificent  set     messages: + msg165726
2012-07-15 12:24:52  ncoghlan                   set     status: open -> closed
                                                        resolution: fixed
                                                        messages: + msg165524
2012-07-15 12:19:16  ncoghlan                   set     messages: + msg165523
2012-07-15 12:12:28  python-dev                 set     messages: + msg165521
2012-07-15 11:53:49  ncoghlan                   set     messages: + msg165520
2012-07-15 11:19:28  python-dev                 set     messages: + msg165518
2012-07-15 10:12:09  ncoghlan                   link    issue15358 dependencies
2012-07-15 08:10:05  python-dev                 set     nosy: + python-dev
                                                        messages: + msg165512
2012-07-15 03:49:07  Arfrever                   set     nosy: + Arfrever
2012-07-15 02:18:55  ncoghlan                   set     assignee: ncoghlan
                                                        messages: + msg165487
2012-07-14 18:35:21  brett.cannon               set     messages: + msg165463
2012-07-14 18:16:29  chris.jerdonek             set     messages: + msg165462
2012-07-14 17:42:15  chris.jerdonek             set     messages: + msg165460
2012-07-14 17:31:59  georg.brandl               set     messages: + msg165458
2012-07-14 17:06:18  brett.cannon               set     nosy: + chris.jerdonek
                                                        messages: + msg165456
2012-07-14 14:54:33  ncoghlan                   set     priority: normal -> release blocker
                                                        nosy: + georg.brandl
                                                        messages: + msg165453
2012-07-14 12:58:15  eric.araujo                set     nosy: + eric.araujo
                                                        messages: + msg165446
2012-07-14 12:57:39  eric.araujo                set     nosy: + brett.cannon, ncoghlan
2012-07-13 04:32:37  christopherthemagnificent  set     title: "pydoc -w <package>" writes out page with empty "package contents" section -> "pydoc -w <package>" writes out page with empty "Package Contents" section
2012-07-13 04:32:01  christopherthemagnificent  set     title: "pydoc -w <package>" write out page with empty "package contents" section -> "pydoc -w <package>" writes out page with empty "package contents" section
2012-07-13 04:31:43  christopherthemagnificent  create