This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Random behaviour when importing two modules with the same name but different source files
Type: (none)
Stage: resolved
Components: Interpreter Core
Versions: Python 3.8

process
Status: closed
Resolution: wont fix
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: George3d6, brett.cannon, eric.snow, ncoghlan, steven.daprano
Priority: normal
Keywords: (none)

Created on 2021-08-14 20:22 by George3d6, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg399596 - Author: George (George3d6) Date: 2021-08-14 20:22
Warning: There's a high probability this is "expected" undefined behaviour, or not even undefined, and I'm just a moron. In addition, I couldn't actually replicate it outside of the specific context in which it happened. But if it sounds plausible and it's something that shouldn't happen, I can spend more time trying to replicate it.

1. In two different Python processes I'm "dynamically" creating a module named `M` using a file `m1.py` that contains a class `C`. Then I create an object of type `C` and pickle it (let's call this object `c1`).
2. In a different thread I do the exact same thing, but the file is `m2.py`; then I create an object of type `C` and pickle it (call this one `c2`).
3. Then, in the same thread, I recreate the module named `M` from `m1.py` and unpickle `c1`; next I create a module named `M` from `m2.py` (this doesn't cause an error) and unpickle `c2`.
4. This (surprisingly?) seems to work fine in most cases, except for one (and I can't find why it's special) where for some reason `c2` starts calling the methods of a class that's not its own. In other words, `c1` usually maps to `M.C --> m1.py` and `c2` to `M.C --> m2.py`, but at random `c2` will start looking up methods in `M.C --> m1.py`, or at least that's what stack traces and debuggers seem to indicate.

The way I create the module `M` in all cases:

```
import importlib.util
import sys

# `code` is a str holding the module source (a class `C` in both cases)
with open('m1.py', 'wb') as fp:
    fp.write(code.encode('utf-8'))

# Load only after the `with` block has closed (and flushed) the file
spec = importlib.util.spec_from_file_location('M', 'm1.py')
temp_module = importlib.util.module_from_spec(spec)
sys.modules['M'] = temp_module
spec.loader.exec_module(temp_module)

# Note: Same for the other module but using `m2.py`, the code I use here contains a class `C` in both cases
```

This seems unexpected. I wouldn't expect the recreation to cause a crash, but I'd expect it either to override the previous `M` for all existing objects instantiated from that module in all cases, or in no cases... Currently it seems that both modules stay loaded and lookups are made at random.
msg399603 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2021-08-15 02:08
"Undefined behaviour" has a particular meaning to C programmers, which is relevant to Python because the interpreter is written in C. It's probably best not to use that term here.

Let me see if I understand your issue.

* You have two separate Python processes.

* Each process has a thread which dynamically writes a file called "m1.py", containing a class C.

* Each process has a second thread which dynamically writes a file called "m2.py", also containing a class C.

* Each thread then imports its file using the common name "M", and tries to pickle and unpickle objects of type C.

* And seemingly at random, each thread sometimes picks up its class M.C, but sometimes the class M.C from the other thread.

* Not sure if you get any cross-process contamination as well (that is, process 1 picks up the modules from process 2), but it wouldn't surprise me in the least.


My instinct here is to back away in horror *wink*

You have a lot of non-deterministic code here. I'm kinda impressed that it ever works at all :-)

1. If you have two processes writing to the same file "m1.py", it's a lottery which one actually ends up written to disk. It is at least theoretically possible that the data actually on the disk could be a hybrid of bits of process 1's m1.py and bits of process 2's m1.py.

2. Likewise for the file m2.py.

3. When you go to import the files, it is non-deterministic which file you will see, e.g.

- process 1 writes its m1.py
- process 2 writes its m1.py, overriding the previous m1.py
- process 1 goes to import m1.py, but ends up reading the m1.py
  created by process 2

So that's how you could get cross-process contamination.
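One way to sidestep the shared-file race is to give every generated module its own private file and its own module name. The following is a minimal sketch under those assumptions; the helper `load_generated_module` and the names `M_a`/`M_b` are hypothetical, not something from the original report. Each call writes into a fresh temporary directory, closes the file before importing it, and registers the result under a unique name:

```python
import importlib.util
import os
import sys
import tempfile


def load_generated_module(code, module_name):
    """Write `code` to a private temporary file and import it as `module_name`.

    Each call gets its own directory, so concurrent writers never overwrite
    each other's source, and the file is closed (flushed) before it is read.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, module_name + ".py")
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(code)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
    # The directory is deleted here; the module object stays usable because
    # its code has already been executed.
    return module


# Hypothetical unique names instead of a shared "M".
mod_a = load_generated_module("class C:\n    def who(self):\n        return 'a'\n", "M_a")
mod_b = load_generated_module("class C:\n    def who(self):\n        return 'b'\n", "M_b")
print(mod_a.C().who(), mod_b.C().who())  # a b
```

Because the class names are now `M_a.C` and `M_b.C`, pickle can also tell the two apart. The trade-off is that the source file no longer exists afterwards, so tracebacks cannot show source lines.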
msg399604 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2021-08-15 02:20
Importing from multiple threads is possibly also non-deterministic, but I'm not an expert on the importlib module.

It looks to me like another plausible source of random/arbitrary behaviour could be:


1. Within a single process, you have two threads running in non-deterministic order. So you could have, let's say:

- thread 1 imports M from file m1.py
- thread 2 tries to import M, and the import system sees that
  M is cached in sys.modules and uses that instead.


So even though the two threads are writing to different source files, they both call the module M, which means that you can have two threads stomping on each other's toes trying to import different classes C from different modules both called M.
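The caching step can be seen without any threads. Once a module object sits in `sys.modules` under `'M'`, a plain import of `M` returns that object and never looks for a file. A minimal sketch, using an empty in-memory module as a hypothetical stand-in for "M loaded from m1.py":

```python
import importlib
import sys
import types

# An empty in-memory module standing in for "M loaded from m1.py";
# the `origin` attribute is a hypothetical marker for this example.
first = types.ModuleType("M")
first.origin = "m1.py"
sys.modules["M"] = first

# A later import of "M" (as another thread might do) finds the cached entry
# and returns it; no file such as m2.py is searched for or executed.
second = importlib.import_module("M")
import M

print(second is first, M is first)  # True True
print(M.origin)                     # m1.py
```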

I don't think this is supported at all. I'm not really qualified to rule out a bug in the importlib functions, but to me it surely looks like a case of "don't do that".

Remember that sys.modules is cached globally per-process, so once you start pickling and unpickling your M.C instances, it's a lottery which one you will get; furthermore, if you have instances:

    x = M.C()
    # ... replace module M with a new module M ...
    y = M.C()

only y is using the new definition of C from the new module M, instance x is still using the original class with its original methods.
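The x/y case above can be reproduced in a few lines. This is a minimal sketch that builds modules in memory instead of from files; the helper `replace_module_M` and the `version` method are hypothetical. It shows that rebinding `M` to a new module creates a second, unrelated class that happens to have the same `M.C` name:

```python
import sys
import types


def replace_module_M(source):
    """Build a fresh module named "M" from source and register it in sys.modules."""
    module = types.ModuleType("M")
    exec(source, module.__dict__)
    sys.modules["M"] = module
    return module


M = replace_module_M("class C:\n    def version(self):\n        return 1\n")
x = M.C()

M = replace_module_M("class C:\n    def version(self):\n        return 2\n")
y = M.C()

print(x.version(), y.version())  # 1 2
print(type(x) is type(y))        # False: two distinct classes, both called M.C
print(type(x).__module__, type(x).__qualname__)  # M C
```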

I'll leave it to Brett, Nick or Eric to confirm that there's nothing to fix in importlib, but my advice is to avoid using dynamically created modules **all with the same name** that touch the file system from multiple threads in multiple processes at the same time. There are far too many opportunities for non-deterministic behaviour to mess up your expectations.
msg399659 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2021-08-16 16:35
So first, don't import from threads. It's non-deterministic as you have seen. You should do all imports **before** you start running multi-threaded code if multiple threads are going to access the 
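Doing all module creation before any threads start might look like the following. This is a minimal sketch, assuming in-memory sources and the hypothetical distinct names `M1` and `M2`. The helper `load_as` is also illustrative. Workers only use modules that already exist; they never import or replace one:

```python
import sys
import threading
import types


def load_as(name, source):
    """Create a module called `name` from source and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# All module creation happens up front, in the main thread, before any worker starts.
# Distinct names ("M1", "M2") are a hypothetical choice for this sketch.
modules = {
    "M1": load_as("M1", "class C:\n    def who(self):\n        return 'm1'\n"),
    "M2": load_as("M2", "class C:\n    def who(self):\n        return 'm2'\n"),
}
results = {}


def worker(key):
    # Workers only use modules that are already loaded; they never import or replace one.
    results[key] = modules[key].C().who()


threads = [threading.Thread(target=worker, args=(key,)) for key in modules]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [('M1', 'm1'), ('M2', 'm2')]
```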

Second, tossing in pickle is just asking for more trouble. 😉

The key thing to know is the master copy of a module is kept in `sys.modules`. But classes keep a reference to the module they were loaded from, not what `sys.modules` happens to have at that moment. So due to threading indeterminism it's quite possible to end up unpickling in such a way that the module that eventually ends up in `sys.modules` is not what your unpickled class is referencing.
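The pickle half of this can be seen without any threads. Below is a minimal sketch, assuming in-memory stand-ins for `m1.py` and `m2.py`; the sources and the helper `load_as_M` are hypothetical. A pickle records only the module and class names (`M` and `C`). As a result, the class an unpickled object ends up with depends on what `sys.modules['M']` holds at the moment `pickle.loads` runs:

```python
import pickle
import sys
import types

# In-memory stand-ins for m1.py and m2.py (hypothetical sources): both define
# a class C under the module name "M", with different behaviour.
M1_SOURCE = "class C:\n    def who(self):\n        return 'm1'\n"
M2_SOURCE = "class C:\n    def who(self):\n        return 'm2'\n"


def load_as_M(source):
    """Create a fresh module named "M" from source and make it sys.modules["M"]."""
    module = types.ModuleType("M")
    exec(source, module.__dict__)
    sys.modules["M"] = module
    return module


m1 = load_as_M(M1_SOURCE)
c1_bytes = pickle.dumps(m1.C())  # the pickle stores the name "M.C", not the class object

m2 = load_as_M(M2_SOURCE)        # sys.modules["M"] now holds the m2 version
c1_restored = pickle.loads(c1_bytes)

print(c1_restored.who())          # m2
print(type(c1_restored) is m2.C)  # True
```

With threads swapping `sys.modules['M']` back and forth, the same lookup simply lands on whichever version happens to be installed at that instant.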

As such, I'm closing as "wont fix".
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:48 | admin | set | github: 89079 |
| 2021-08-16 16:35:41 | brett.cannon | set | status: open -> closed; resolution: wont fix; messages: + msg399659; stage: resolved |
| 2021-08-15 02:20:58 | steven.daprano | set | nosy: + brett.cannon, ncoghlan, eric.snow; messages: + msg399604 |
| 2021-08-15 02:08:45 | steven.daprano | set | nosy: + steven.daprano; messages: + msg399603; title: Undefined/random behaviour when importing two modules with the same name but different source files -> Random behaviour when importing two modules with the same name but different source files |
| 2021-08-14 20:22:30 | George3d6 | create | |