New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os.environ raises RuntimeError: dictionary changed size during iteration #74626
Comments
We're facing ocasional RuntimeError exceptions in a multithreaded application when one of the threads creates new entries to the environment (os.environ). I'm not so sure if the attached patch fixes this issue the right way, so, feel free to propose another approach. Traceback Sample of the issue in our production environment: RuntimeError: dictionary changed size during iteration [ internal code ] File "newrelic/api/background_task.py", line 103, in wrapper [ internal code ] File "retry/api.py", line 74, in retry_decorator [ internal code ] File "requests/sessions.py", line 531, in get |
Even with the patch, I don't think it's safe to modify os.environ while it's being accessed concurrently in another thread. The other thread's modification could arrive while the dict() call in your patch is running (in CPython the GIL might protect you, but that's an implementation detail). I think the real solution is that your application uses a lock or some other concurrency mechanism to protect access to os.environ. |
I agree with Jelle about the fix's implementation. But I disagree of suggestion to implement lock at the application side. I disagree because some external libraries may want to access os.environ in a concurrent fashion without lock acquiring. That's the case reported in my previous comment: urllib/requests.py (https://github.com/python/cpython/blob/master/Lib/urllib/request.py#L2468) iterates over os.environ with no lock control. How could I handle this issue in this case? Would it be a good idea implement the lock mechanism in current os._Environ (https://github.com/python/cpython/blob/master/Lib/os.py#L666) mapping class? |
Previous report: bpo-25641. At least in Posix, the “putenv” function is not required to be thread safe. |
There are exceptions being raised in many applications (as reported here and in http://bugs.python.org/issue25641) and there are three paths to follow:
If we chose to handle this exception we need to decide where we'll do it. At os.py module? At urllib/request.py? At all http client libraries (eg. python-requests)? At our applications? If we chose to avoid this exception we, probably, need to implement some kind of lock/mutex in os.environ. I can implement any of these options. You just need to decide which one is best. If we chose the third option, we can just close this issue. |
The environ class could have its own lock and then the __iter__ method could be rewritten as follows: def __iter__(self):
with self._lock:
keys = list(self._data)
for key in keys:
yield self.decodekey(key) |
This patch implements a lock protection (as suggested by Antoine) using |
This is an alternative implementation using This is consequence of the fact that CPython interpreter uses |
@osantana: you'd need to be using the lock at least in |
I'm fine with a low-level lock, though as Mark says you could use _thread._RLock(). |
Isn't list(dict) atomic? At least in CPython. I suspect that protecting *all* accesses to _data with a lock will hit the performance and invalidate the reason of using _data for caching. |
I doubt it. There's no special code for list(dict), it uses regular iteration which could probably hit the GC and then release the GIL.
I'm not convinced os.environ is performance-critical. Perhaps getenv() is really expensive on some systems? |
While there are no special code for list(dict), regular iteration over exact dict don't hit the GC. |
Doesn't it? Creating a dict iterator creates a GC-tracked object. |
But it is destroyed only after iterating. |
Every time you create a GC-tracked object, you can invoke the GC. See _PyObject_GC_Alloc. |
Invoking GC before iterating or after iterating doesn't break the atomicity of iterating. |
Perhaps. But this discussion shows the matter is complicated and we shouldn't rely on fragile assumptions about implementation details. So we need a lock. |
New version of patch:
I've some questions pending:
|
See also bpo-24484. The solution depends on the assumption that list(dict) is atomic. |
Osvaldo, could you please create a pull request on GitHub based on your first patch? But use list() instead of dict(). |
Thank you for your contribution Osvaldo. |
Hi Serhiy Storchaka, I need to thank you for your valuable guidance. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: