This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Patch to add a get_data function to pkgutil
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 2.6
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: ncoghlan, paul.moore, pje
Priority: normal Keywords: patch

Created on 2008-03-20 20:58 by paul.moore, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name Uploaded Description Edit
pkgutil.patch paul.moore, 2008-03-21 21:09
Messages (21)
msg64208 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-20 20:58
This patch adds a new get_data function to the pkgutil module, allowing
access to data stored in the package directory. It wraps the PEP 302
loader "get_data" function, so that it works with such loaders (for
example, zipimport).

The patch includes documentation and tests (I created a new test_pkgutil
test module, but did not add tests for the existing pkgutil functionality).

All tests pass on Windows against svn rev 61679 (except test_tcl, which
was skipped as I didn't build TCL support).
msg64211 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-20 21:36
Hi Paul.  AFAICT, this doesn't look like it will actually work for
filesystem data.  get_loader() will return None for "regular",
filesystem-installed modules, at least in Python 2.5.  Perhaps you
should add a test case for that?
msg64219 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-20 22:51
I'm not sure I understand. It uses pkgutil.get_loader, which returns a
wrapper for filesystem modules. You pointed me to it. It seems to work,
that's what test_getdata_filesys is testing in

Can you supply a testcase that fails? (BTW, this is a patch for the
trunk - ie 2.6-to be. I can't see why 2.5 would be different, but maybe
that's the problem for you?)
msg64224 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-20 23:28
Oops, my bad.  I'm thinking of an entirely unrelated get_loader()
function.  Meanwhile, I misread your patch entirely, and thought you had
some dead code for os.path processing that is in fact live.  So there is
"another" problem (really the only one) that I spotted on this re-read.

Your patch is calling load_module() even if the module is already in
sys.modules.  This will *reload* the module, potentially causing
problems elsewhere in the system.  You can test this by adding an
assertion to your test's load_module(), that the module isn't already in
sys.modules.  Then call get_data for the same module twice.

Sorry again for the mixup.
msg64225 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-20 23:55
Is that true? loader.load_module(pkg) should return sys.modules[pkg]
without reloading if it already exists. See PEP 302 "Specification part
1: The Importer Protocol":

The load_module() method has a few responsibilities that it must
fulfil *before* it runs any code:

 - If there is an existing module object named 'fullname' in
   sys.modules, the loader must use that existing module.
   (Otherwise, the reload() builtin will not work correctly.)
   If a module named 'fullname' does not exist in sys.modules,
   the loader must create a new module object and add it to

I've added a test for this, but I'm not 100% sure it's right. It
certainly works with unchanged code. If you can make it fail, let me
know and I'll work up a fix. But I think repeated calls to load_module()
should be safe (assuming loaders work as required by PEP 302).
msg64226 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-21 00:04
reload() is implemented by calling the PEP 302 load_module() method 
on the existing module object.
msg64231 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-03-21 04:49
To clarify the intent of the section of PEP 302 Paul quoted: the
*module* object gets reused, but the contents of mod.__dict__ are
clobbered and the module code re-executed.

So it's the same module object, but with brand new contents (this is why
"from foo import bar" and "reload(foo)" do not play nicely with each
other, but doing "import foo" and then invoking "" picks up the
new version of "bar" after a reload).
msg64237 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 12:24
Nick, thanks I now see the issue. I'll work up a test to check for this
(and then correct it :-)).
msg64238 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-21 12:35
An easy way to test it: just change your load_module() to raise an 
AssertionError if the module is already in sys.modules, and the main 
body of the test to make two get_data() calls for the same module.
msg64240 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 13:07
But that's not a valid loader.

I'm still struggling here. I see what you're trying to get at, but I
can't see how I can write a valid loader that does this.

To test the problem you're suggesting (that the code calls load_module
when the module is already loaded) I need a valid loader which does
something detectably different of the module is already loaded when it runs.

Obviously, I can fix the get_data code - that's not even remotely hard.
But I'd rather create a failing test with the current code, so that I
can confirm that the "problem" is fixed. At the moment, I can't even
demonstrate a problem!
msg64241 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-21 13:51
Why does it need to be a valid loader?  It's a mock, not a real 
loader.  But if it really bothers you, have it increment a global in 
the module or something, and put the assertion in the test 
proper.  Heck, have it update a counter in the module it returns, 
making it into a loader that keeps track of how many times you reload 
the same module.  Then it's a useful example loader.  :)
msg64242 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 14:15
It has to be a valid loader, as I see no reason to support loaders that
aren't valid. In any case, I did try incrementing a counter and it
doesn't demonstrate the problem. If you try the currently attached
patch, you should see that. (I assume you've tried or at least read the
current patch - but the fact that you're suggesting the approach I have
implemented makes me wonder. I did re-upload the patch after you
reported the issue - msg 64225 - maybe you didn't notice this, as I
deleted the old patch?) If you do see what I mean, please tell me where
my code is wrong. I don't want to add a fix without a test showing why
the current behaviour is wrong. The test_alreadyloaded test is intended
to do that, but the current pkgutil code doesn't fail the test - so
either the test is wrong (and I'd appreciate help fixing the test) or
the "problem" isn't real, and I can leave the code as is.
msg64243 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2008-03-21 15:04
But I'm getting a failure on that test, and the other one, too:

$ ./python Lib/test/
test_alreadyloaded (__main__.PkgutilTests) ... FAIL
test_getdata_filesys (__main__.PkgutilTests) ... FAIL
test_getdata_pep302 (__main__.PkgutilTests) ... ok

FAIL: test_alreadyloaded (__main__.PkgutilTests)
Traceback (most recent call last):
   File "Lib/test/", line 45, in test_alreadyloaded
     self.assertEqual(foo.loads, 1)
AssertionError: 2 != 1

FAIL: test_getdata_filesys (__main__.PkgutilTests)
Traceback (most recent call last):
   File "Lib/test/", line 30, in test_getdata_filesys
     self.assert_('PkgutilTests' in this_file)

Ran 3 tests in 0.017s

FAILED (failures=2)

I'm rebuilding my entire 2.6 checkout to double-check there's not 
something else going on, but I know my Lib/ tree is up-to-date.
msg64260 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 19:30
Thanks for the update. Something's seriously screwy here. I am getting
no failures, so you can probably see why I was confused. Can you tell me
what platform, etc (anything that might be relevant) and I'll try to see
what's going on.
msg64265 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 19:49
By the way, for comparison, I'm running the test (under Windows) using
rt -q -v test_pkgutil from the PCBuild directory. I can't see how
test_getdata_filesys can fail, as long as you're running it from the
correct place - it reads the file relative to the test
package, so it won't run outside of there (maybe I should change this,
but that's a separate issue for now).

Here's my output:

>rt -q -v test_pkgutil
.\\python  -E -tt ../lib/test/ -v test_pkgutil
test_alreadyloaded (test.test_pkgutil.PkgutilTests) ... ok
test_getdata_filesys (test.test_pkgutil.PkgutilTests) ... ok
test_getdata_pep302 (test.test_pkgutil.PkgutilTests) ... ok

Ran 3 tests in 0.000s

1 test OK.
msg64266 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 19:52
Ah, no. I see your command line. I get the same failure as you in that
case. I can see why test_getdata_filesys fails in that case, I'll fix
that. I'm confused as to why test_alreadyloaded fails there but not via
rt.bat, but nevertheless I can fix that now I can see it. Thanks for
your patience. Leave it with me.
msg64282 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-03-21 21:09
OK, I found it. There were 2 problems, the double-loading one, and a
problem with adding my importer to sys.meta_path - if the test failed, I
wasn't removing it (so it was there for the next test, and interfering
with it).

Here's a fixed patch. Thanks, Phillip, for persevering!
msg65508 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-04-15 09:49
I should be able to review/commit this for the next alpha (looking into
it right now).
msg65509 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2008-04-15 10:19
msg65510 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-04-15 10:26
Commited as rev 62350
msg65511 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-04-15 10:28
Test file added in rev 62351 (forgot to svn add it the first time around)
Date User Action Args
2022-04-11 14:56:32adminsetgithub: 46691
2008-04-15 10:28:51ncoghlansetmessages: + msg65511
2008-04-15 10:26:34ncoghlansetstatus: open -> closed
resolution: accepted
messages: + msg65510
2008-04-15 10:19:59paul.mooresetmessages: + msg65509
2008-04-15 09:49:20ncoghlansetassignee: ncoghlan
messages: + msg65508
2008-03-21 21:09:32paul.mooresetfiles: - pkgutil.patch
2008-03-21 21:09:17paul.mooresetfiles: + pkgutil.patch
messages: + msg64282
2008-03-21 19:52:50paul.mooresetmessages: + msg64266
2008-03-21 19:49:38paul.mooresetmessages: + msg64265
2008-03-21 19:30:42paul.mooresetmessages: + msg64260
2008-03-21 15:04:55pjesetmessages: + msg64243
2008-03-21 14:15:07paul.mooresetmessages: + msg64242
2008-03-21 13:51:36pjesetmessages: + msg64241
2008-03-21 13:07:05paul.mooresetmessages: + msg64240
2008-03-21 12:35:08pjesetmessages: + msg64238
2008-03-21 12:24:04paul.mooresetmessages: + msg64237
2008-03-21 04:49:31ncoghlansetnosy: + ncoghlan
messages: + msg64231
2008-03-21 00:04:33pjesetmessages: + msg64226
2008-03-20 23:55:50paul.mooresetfiles: - pkgutil.patch
2008-03-20 23:55:32paul.mooresetfiles: + pkgutil.patch
messages: + msg64225
2008-03-20 23:28:51pjesetmessages: + msg64224
2008-03-20 22:51:52paul.mooresetmessages: + msg64219
2008-03-20 21:36:27pjesetnosy: + pje
messages: + msg64211
2008-03-20 20:58:32paul.moorecreate