classification
Title: os.environ raises RuntimeError: dictionary changed size during iteration
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Danilo Shiga, Jelle Zijlstra, mark.dickinson, martin.panter, osantana, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-05-23 14:35 by osantana, last changed 2017-07-05 03:46 by osantana. This issue is now closed.

Files
File name Uploaded Description Edit
fix_os_environ_iteration_issue.diff osantana, 2017-05-23 14:35 patch with fix and test
fix_os_environ_iter_issue_low_level_thread_lock.diff osantana, 2017-05-24 22:37 (use this) use _thread.allocate_lock() as lock provider
fix_os_environ_iter_issue_threading_lock.diff osantana, 2017-05-24 22:46
fix_os_environ_iter_issue_low_level_thread_lock_2.diff osantana, 2017-05-25 17:08
Pull Requests
URL Status Linked Edit
PR 2409 merged osantana, 2017-06-26 14:32
PR 2556 merged serhiy.storchaka, 2017-07-04 04:22
PR 2557 merged serhiy.storchaka, 2017-07-04 04:24
Messages (27)
msg294250 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-23 14:35
We're facing ocasional RuntimeError exceptions in a multithreaded application when one of the threads creates new entries to the environment (os.environ).

I'm not so sure if the attached patch fixes this issue the right way, so, feel free to propose another approach.

Traceback Sample of the issue in our production environment:


RuntimeError: dictionary changed size during iteration
  File "python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)


  [ internal code ]


  File "newrelic/api/background_task.py", line 103, in wrapper
    return wrapped(*args, **kwargs)
  File "python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "raven/base.py", line 851, in make_decorator
    yield
  File "python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  

  [ internal code ]


  File "retry/api.py", line 74, in retry_decorator
    logger)
  File "retry/api.py", line 33, in __retry_internal
    return f()


  [ internal code ]


  File "requests/sessions.py", line 531, in get
    return self.request('GET', url, **kwargs)
  File "requests/sessions.py", line 509, in request
    prep.url, proxies, stream, verify, cert
  File "requests/sessions.py", line 686, in merge_environment_settings
    env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
  File "requests/utils.py", line 696, in get_environ_proxies
    return getproxies()
  File "urllib/request.py", line 2393, in getproxies_environment
    for name, value in os.environ.items():
  File "_collections_abc.py", line 676, in __iter__
    for key in self._mapping:
  File "python3.5/os.py", line 702, in __iter__
    for key in self._data:
msg294257 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * Date: 2017-05-23 15:47
Even with the patch, I don't think it's safe to modify os.environ while it's being accessed concurrently in another thread. The other thread's modification could arrive while the dict() call in your patch is running (in CPython the GIL might protect you, but that's an implementation detail).

I think the real solution is that your application uses a lock or some other concurrency mechanism to protect access to os.environ.
msg294261 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-23 16:29
I agree with Jelle about the fix's implementation. But I disagree of suggestion to implement lock at the application side.

I disagree because some external libraries may want to access os.environ in a concurrent fashion without lock acquiring. That's the case reported in my previous comment: urllib/requests.py (https://github.com/python/cpython/blob/master/Lib/urllib/request.py#L2468) iterates over os.environ with no lock control.

How could I handle this issue in this case? Would it be a good idea  implement the lock mechanism in current os._Environ (https://github.com/python/cpython/blob/master/Lib/os.py#L666) mapping class?
msg294352 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-05-24 12:48
Previous report: Issue 25641. At least in Posix, the “putenv” function is not required to be thread safe.
msg294354 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-24 13:25
There are exceptions being raised in many applications (as reported here and in http://bugs.python.org/issue25641) and there are three paths to follow: 

1. We handle this exception somewhere;
2. We avoid raising it;
3. Just leave it. I don't care about this exception.

If we chose to handle this exception we need to decide where we'll do it. At os.py module? At urllib/request.py? At all http client libraries (eg. python-requests)? At our applications?

If we chose to avoid this exception we, probably, need to implement some kind of lock/mutex in os.environ.

I can implement any of these options. You just need to decide which one is best.

If we chose the third option, we can just close this issue.
msg294398 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-24 21:30
The environ class could have its own lock and then the __iter__ method could be rewritten as follows:

    def __iter__(self):
        with self._lock:
            keys = list(self._data)
        for key in keys:
            yield self.decodekey(key)
msg294406 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-24 22:37
This patch implements a lock protection (as suggested by Antoine) using `_thread.allocate_lock()` module (instead of `threading.Lock()`) due to the fact that CPython interpreter already imports `_thread` module during its bootstrap process.
msg294407 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-24 22:46
This is an alternative implementation using `threading.Lock()`. The main issue with this implementation relies on the fact that we've to import the `threading` module during CPython interpreter loading (currently it's not imported by default).

This is consequence of the fact that CPython interpreter uses `os` module when bootstrapping.
msg294444 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-05-25 07:32
@osantana: you'd need to be using the lock at least in `__setitem__` and `__delitem__` as well, else there's nothing preventing modifications to the dictionary during the `list` call. It may make sense to protect *all* accesses (read or write) to `_data`, though you'd need to either use a re-entrant lock in that case or be very careful about not acquiring the lock twice.
msg294445 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-25 07:46
I'm fine with a low-level lock, though as Mark says you could use _thread._RLock().
msg294452 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-25 10:10
Isn't list(dict) atomic? At least in CPython.

I suspect that protecting *all* accesses to _data with a lock will hit the performance and invalidate the reason of using _data for caching.
msg294453 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-25 10:13
> Isn't list(dict) atomic? At least in CPython.

I doubt it. There's no special code for list(dict), it uses regular iteration which could probably hit the GC and then release the GIL.

> I suspect that protecting *all* accesses to _data with a lock will hit the performance and invalidate the reason of using _data for caching.

I'm not convinced os.environ is performance-critical.  Perhaps getenv() is really expensive on some systems?
msg294456 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-25 10:30
> There's no special code for list(dict), it uses regular iteration which could probably hit the GC and then release the GIL.

While there are no special code for list(dict), regular iteration over exact dict don't hit the GC.
msg294457 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-25 10:33
> While there are no special code for list(dict), regular iteration over exact dict don't hit the GC.

Doesn't it? Creating a dict iterator creates a GC-tracked object.
msg294459 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-25 10:42
But it is destroyed only after iterating.
msg294460 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-25 10:45
Every time you create a GC-tracked object, you can invoke the GC. See _PyObject_GC_Alloc.
msg294467 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-25 11:15
Invoking GC before iterating or after iterating doesn't break the atomicity of iterating.
msg294468 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-25 11:16
Perhaps.  But this discussion shows the matter is complicated and we shouldn't rely on fragile assumptions about implementation details.  So we need a lock.
msg294498 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-05-25 17:08
New version of patch:

1. Use RLock instead of Lock (Mark & Antoine recomendation)
2. Add lock acquire at `__setitem__` and `__delitem__` as following (Mark & Antoine recomendation)
3. Add lock protection in `__repr__` implementation.

I've some questions pending:

1. Is it a good idea to wrap self._data access with lock in __getitem__? (I suspect it's not)
2. Is it a good idea to lock protect self.copy() and self.setdefault() (I suspect it's not because both uses __iter__ that is already protected)
msg296122 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-15 18:23
See also issue24484. The solution depends on the assumption that list(dict) is atomic.
msg296847 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-26 05:11
Osvaldo, could you please create a pull request on GitHub based on your first patch? But use list() instead of dict().
msg296904 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-06-26 14:32
Pull Request #2409 (https://github.com/python/cpython/pull/2409) opened.
msg297486 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-01 17:34
New changeset 8a8d28501fc8ce25926d168f1c657656c809fd4c by Serhiy Storchaka (Osvaldo Santana Neto) in branch 'master':
bpo-30441: Fix bug when modifying os.environ while iterating over it (#2409)
https://github.com/python/cpython/commit/8a8d28501fc8ce25926d168f1c657656c809fd4c
msg297625 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-04 04:55
New changeset bebd2cfa5f21811dd0ee4f3b1a1b85d379b83436 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-30441: Fix bug when modifying os.environ while iterating over it (GH-2409). (#2556)
https://github.com/python/cpython/commit/bebd2cfa5f21811dd0ee4f3b1a1b85d379b83436
msg297626 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-04 04:55
New changeset 1a3bc5546aa27f01426ad76618a9b2c3b698ae68 by Serhiy Storchaka in branch '3.5':
[3.5] bpo-30441: Fix bug when modifying os.environ while iterating over it (GH-2409). (#2557)
https://github.com/python/cpython/commit/1a3bc5546aa27f01426ad76618a9b2c3b698ae68
msg297627 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-04 05:19
Thank you for your contribution Osvaldo.
msg297695 - (view) Author: Osvaldo Santana Neto (osantana) * Date: 2017-07-05 03:46
Hi Serhiy Storchaka,

I need to thank you for your valuable guidance.
History
Date User Action Args
2017-07-05 03:46:48osantanasetmessages: + msg297695
2017-07-04 05:19:33serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg297627

stage: backport needed -> resolved
2017-07-04 04:55:47serhiy.storchakasetmessages: + msg297626
2017-07-04 04:55:34serhiy.storchakasetmessages: + msg297625
2017-07-04 04:24:29serhiy.storchakasetpull_requests: + pull_request2626
2017-07-04 04:22:15serhiy.storchakasetpull_requests: + pull_request2625
2017-07-01 17:35:39serhiy.storchakasetstage: backport needed
2017-07-01 17:34:47serhiy.storchakasetmessages: + msg297486
2017-06-26 14:32:16osantanasetmessages: + msg296904
pull_requests: + pull_request2457
2017-06-26 05:11:49serhiy.storchakasetmessages: + msg296847
2017-06-15 18:23:24serhiy.storchakasetmessages: + msg296122
2017-05-25 17:08:24osantanasetfiles: + fix_os_environ_iter_issue_low_level_thread_lock_2.diff

messages: + msg294498
2017-05-25 11:16:17pitrousetmessages: + msg294468
2017-05-25 11:15:10serhiy.storchakasetmessages: + msg294467
2017-05-25 10:45:20pitrousetmessages: + msg294460
2017-05-25 10:42:09serhiy.storchakasetmessages: + msg294459
2017-05-25 10:33:17pitrousetmessages: + msg294457
2017-05-25 10:30:51serhiy.storchakasetmessages: + msg294456
2017-05-25 10:13:54pitrousetmessages: + msg294453
2017-05-25 10:10:53serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg294452
2017-05-25 07:46:06pitrousetmessages: + msg294445
2017-05-25 07:32:25mark.dickinsonsetnosy: + mark.dickinson
messages: + msg294444
2017-05-24 22:46:12osantanasetfiles: + fix_os_environ_iter_issue_threading_lock.diff

messages: + msg294407
2017-05-24 22:37:28osantanasetfiles: + fix_os_environ_iter_issue_low_level_thread_lock.diff

messages: + msg294406
2017-05-24 21:30:23pitrousetnosy: + pitrou

messages: + msg294398
versions: + Python 3.7
2017-05-24 13:25:57osantanasetmessages: + msg294354
2017-05-24 12:48:08martin.pantersetnosy: + martin.panter
messages: + msg294352
2017-05-23 16:57:19Danilo Shigasetnosy: + Danilo Shiga
2017-05-23 16:29:06osantanasetmessages: + msg294261
2017-05-23 15:47:53Jelle Zijlstrasetnosy: + Jelle Zijlstra
messages: + msg294257
2017-05-23 14:35:50osantanacreate