classification
Title: Release GIL periodically in _pickle module
Type: enhancement Stage:
Components: Interpreter Core, Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Martin Bammer, pierreglaser, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2018-07-16 18:55 by Martin Bammer, last changed 2019-05-10 17:46 by pitrou.

Files
File name Uploaded Description
pickle_gil.patch pitrou, 2018-07-17 13:48
pickle_gil.py pitrou, 2018-07-17 13:49
Messages (12)
msg321755 - Author: Martin Bammer (Martin Bammer) Date: 2018-07-16 18:55
Hi,

The old, slow Python implementation of pickle didn't block background
threads, but the newer C implementation blocks other threads while dump/load
is running.
Wouldn't it be possible to let other threads run during this time?
In particular, couldn't load/loads release the GIL, since the objects being loaded are not available to Python code until these functions have finished?

Regards,
Martin
msg321764 - Author: (ppperry) Date: 2018-07-16 19:58
Um, something doesn't make sense about this. The Python implementation of pickle never released the GIL (it can't, by definition: it's written in Python). Releasing the GIL in the C implementation wouldn't make sense either, since the pickle API calls into Python code everywhere (for example, `__reduce__`).
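To illustrate the point about callbacks, here is a minimal sketch (the `Point` class and the `reduce_calls` list are hypothetical): the C pickler invokes the Python-level `__reduce__` method of each user-defined object, so Python bytecode runs in the middle of a `pickle.dumps` call.

```python
import pickle

reduce_calls = []  # records every __reduce__ invocation


class Point:
    """Hypothetical user-defined class with a custom __reduce__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # This Python code runs from inside the C pickler during dumps().
        reduce_calls.append((self.x, self.y))
        return (Point, (self.x, self.y))


points = [Point(1, 2), Point(3, 4)]
restored = pickle.loads(pickle.dumps(points))
print(reduce_calls)                    # [(1, 2), (3, 4)]
print([(p.x, p.y) for p in restored])  # [(1, 2), (3, 4)]
```

Unpickling only calls `Point(...)` with the saved arguments; `__reduce__` is not invoked again, which is why `reduce_calls` holds exactly two entries.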
msg321805 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 07:48
This is about releasing the GIL periodically to allow other threads to run, as Python already does in its main interpreter loop.
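One way to observe the behaviour under discussion, as a rough sketch (`max_stall` is a hypothetical helper, not the attached demo script): a background thread records timestamps while the main thread works, and the longest gap between ticks approximates how long the background thread was kept off the GIL. With CPython's default switch interval (`sys.getswitchinterval()`, 5 ms), a bytecode loop should let the ticker run every few milliseconds, whereas a single large `pickle.dumps` call on plain floats is expected to stall it for roughly the duration of the call.

```python
import pickle
import sys
import threading
import time


def max_stall(work, tick=0.001):
    """Run work() in the calling thread while a background thread ticks.

    Returns the longest gap in seconds between consecutive ticks, a rough
    measure of how long the background thread was kept off the GIL.
    """
    stamps = []
    stop = threading.Event()

    def ticker():
        while not stop.is_set():
            stamps.append(time.perf_counter())
            time.sleep(tick)

    thread = threading.Thread(target=ticker)
    thread.start()
    time.sleep(0.05)  # give the ticker time to start
    try:
        work()
    finally:
        stop.set()
        thread.join()
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return max(gaps, default=0.0)


if __name__ == "__main__":
    data = list(map(float, range(1_000_000)))
    print("switch interval:", sys.getswitchinterval())
    print("pickle.dumps:", max_stall(lambda: pickle.dumps(data, protocol=4)))
    print("Python loop:", max_stall(lambda: sum(x * 2.0 for x in data)))
```

The absolute numbers depend on the machine; the interesting part is the ratio between the two stalls.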
msg321806 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-17 08:00
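A workaround is to write Python wrappers for the I/O objects. Each call to a wrapper method executes Python bytecode, which gives the interpreter a chance to switch threads:

import pickle

class Writer:
    def __init__(self, file):
        self.file = file
    def write(self, data):
        return self.file.write(data)

class Reader:
    def __init__(self, file):
        self.file = file
    def read(self, size=-1):
        return self.file.read(size)
    def readline(self, size=-1):
        return self.file.readline(size)
    def peek(self, size=-1):
        # The C unpickler treats NotImplementedError from peek() as
        # "not supported"; files such as io.BytesIO have no peek().
        if not hasattr(self.file, "peek"):
            raise NotImplementedError
        return self.file.peek(size)

def mydump(obj, file, *args, **kwargs):
    return pickle.dump(obj, Writer(file), *args, **kwargs)

def myload(file, *args, **kwargs):
    return pickle.load(Reader(file), *args, **kwargs)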
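On the dump side, such a wrapper can only give other threads a chance to run when the pickler actually calls `write()`. A quick way to check how often that happens, as a sketch (`CountingWriter` and `count_writes` are hypothetical helpers): with protocol 4 framing, CPython's pickler flushes each roughly 64 KiB frame to the file as it goes, while without framing the C implementation generally buffers the whole pickle and writes it once at the end.

```python
import io
import pickle


class CountingWriter:
    """Wraps a binary file and counts how often the pickler calls write()."""

    def __init__(self, file):
        self.file = file
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return self.file.write(data)


def count_writes(obj, protocol):
    """Return how many write() calls pickle.dump() makes for obj."""
    writer = CountingWriter(io.BytesIO())
    pickle.dump(obj, writer, protocol=protocol)
    return writer.calls


if __name__ == "__main__":
    data = list(map(float, range(200_000)))
    for proto in (2, 4):
        print(f"protocol {proto}: {count_writes(data, proto)} write() calls")
```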
msg321821 - Author: Martin Bammer (Martin Bammer) Date: 2018-07-17 12:56
Maybe an optional parameter specifying the desired interval would be a good idea, so that the programmer can decide whether they want/need this feature and which interval suits their application.
Otherwise it is hard to define a specific interval that fits everyone.
msg321823 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 13:21
The right way to do this is not to pass a timeout parameter but to check for GIL interrupts as done in the main bytecode evaluation loop.
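For comparison with a per-call interval parameter: CPython already has a single, interpreter-wide knob for how often a running thread is asked to release the GIL, `sys.setswitchinterval()`. Assuming the pickler checked for GIL drop requests the same way the evaluation loop does, it would presumably honour this setting as well. A small sketch of adjusting it temporarily (the `switch_interval` helper is hypothetical):

```python
import contextlib
import sys


@contextlib.contextmanager
def switch_interval(seconds):
    """Temporarily change CPython's thread switch interval (in seconds)."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Restore the previous interpreter-wide setting.
        sys.setswitchinterval(old)


with switch_interval(0.001):
    print(sys.getswitchinterval())  # approximately 0.001
print(sys.getswitchinterval())      # back to the previous value
```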
msg321826 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 13:48
Attaching proof-of-concept patch.
msg321827 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 13:49
Attaching demonstration script.
msg321830 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 13:54
(as the demo script shows, there is no detectable slowdown)
msg321835 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-17 15:02
The demo script shows around an 8% slowdown for me with

    data = list(map(float, range(N)))
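A rough way to reproduce this kind of measurement, as a sketch only (the attached `pickle_gil.py` is not reproduced here; `N`, `time_pickle` and the repeat counts are assumptions): time `dumps`/`loads` of the same float list on builds with and without the patch, and compare the best-of-several timings.

```python
import pickle
import timeit


def time_pickle(data, protocol=4, number=5, repeat=5):
    """Return the best per-call times (seconds) for dumps() and loads()."""
    payload = pickle.dumps(data, protocol=protocol)
    dump_t = min(timeit.repeat(lambda: pickle.dumps(data, protocol=protocol),
                               number=number, repeat=repeat)) / number
    load_t = min(timeit.repeat(lambda: pickle.loads(payload),
                               number=number, repeat=repeat)) / number
    return dump_t, load_t


if __name__ == "__main__":
    N = 1_000_000  # hypothetical size; the thread does not state N
    data = list(map(float, range(N)))
    dump_t, load_t = time_pickle(data)
    print(f"dumps: {dump_t * 1e3:.1f} ms  loads: {load_t * 1e3:.1f} ms")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the difference being measured is only a few percent.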
msg321846 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-17 17:49
Interesting, which kind of computer / system / compiler are you on?
msg321847 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-17 18:14
CPU = Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Ubuntu 18.04
Linux 4.15.0 x86_64
gcc 7.3.0

Performing the check in save() can add non-negligible overhead (especially after implementing the issue34141 optimization). The overhead can be reduced by performing the check only when flushing a frame (in protocol 4) or the buffer to the file, or after a significant number of bytes has been written into the buffer.
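The amortization idea can be illustrated at the Python level, as an analogy only (a real fix would check inside `_pickle` itself; `ThrottledWriter`, `threshold` and `yield_hook` are hypothetical): instead of paying for a check on every write, count the bytes written and only offer other threads a chance to run once a threshold has been crossed. The default hook, `time.sleep(0)`, is a common way to give other threads a chance to run in CPython.

```python
import io
import pickle
import time


class ThrottledWriter:
    """Hypothetical file wrapper that amortizes a "let other threads run"
    check: the hook fires only after `threshold` bytes have been written
    since it last fired, not on every write() call."""

    def __init__(self, file, threshold=1 << 20, yield_hook=None):
        self.file = file
        self.threshold = threshold
        self.yield_hook = yield_hook or (lambda: time.sleep(0))
        self.pending = 0  # bytes written since the hook last fired
        self.writes = 0   # total write() calls (for inspection)
        self.checks = 0   # total hook invocations (for inspection)

    def write(self, data):
        n = self.file.write(data)
        self.writes += 1
        self.pending += memoryview(data).nbytes
        if self.pending >= self.threshold:
            self.pending = 0
            self.checks += 1
            self.yield_hook()
        return n


if __name__ == "__main__":
    data = list(map(float, range(200_000)))
    out = ThrottledWriter(io.BytesIO(), threshold=256 * 1024)
    pickle.dump(data, out, protocol=4)
    print(out.writes, "writes,", out.checks, "checks")
    assert pickle.loads(out.file.getvalue()) == data
```

A larger threshold means fewer checks and less overhead, at the cost of longer stalls for other threads between checks.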
History
Date User Action Args
2019-05-10 17:46:26 pitrou set nosy: + pierreglaser
2018-07-17 18:14:10 serhiy.storchaka set messages: + msg321847
2018-07-17 17:58:55 ppperry set nosy: - ppperry
2018-07-17 17:49:47 pitrou set messages: + msg321846
2018-07-17 15:02:49 serhiy.storchaka set messages: + msg321835
2018-07-17 13:54:50 pitrou set messages: + msg321830
2018-07-17 13:49:18 pitrou set files: + pickle_gil.py
    messages: + msg321827
2018-07-17 13:48:46 pitrou set files: + pickle_gil.patch
    keywords: + patch
    messages: + msg321826
2018-07-17 13:21:53 pitrou set messages: + msg321823
2018-07-17 12:56:29 Martin Bammer set messages: + msg321821
2018-07-17 12:42:51 ppperry set components: + Library (Lib)
    title: Do not block threads when pickle/unpickle -> Release GIL periodically in _pickle module
2018-07-17 08:00:53 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg321806
2018-07-17 07:48:53 pitrou set nosy: + pitrou
    messages: + msg321805
    versions: + Python 3.8, - Python 3.6
2018-07-16 19:58:11 ppperry set nosy: + ppperry
    messages: + msg321764
2018-07-16 18:55:37 Martin Bammer create