classification
Title: Add a new os.read_into() function to avoid memory copies
Type: Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: postponed
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, piotr.dobrogost, pitrou, vstinner
Priority: normal Keywords:

Created on 2015-03-23 21:16 by vstinner, last changed 2015-05-25 22:51 by vstinner. This issue is now closed.

Messages (5)
msg239069 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-23 21:16
Sockets have a recv_into() method and io.IOBase has a readinto() method, but there is no os.read_into() function. Such a function would avoid memory copies. It would benefit the pure Python implementation of FileIO (its readall() and readinto() methods); see issue #21859.
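
A minimal sketch of what the proposed function could look like. os.read_into() does not exist, so read_into() below is a hypothetical helper. It relies on os.readv() (Unix, Python 3.3+), which already reads into caller-supplied writable buffers, and falls back to io.FileIO.readinto() elsewhere. Like socket.recv_into(), it returns the number of bytes read, with 0 meaning end of file.

```python
import io
import os


def read_into(fd, buffer):
    """Hypothetical os.read_into(fd, buffer): read at most len(buffer) bytes
    from file descriptor fd directly into the writable buffer.

    Returns the number of bytes read; 0 means end of file.
    """
    if hasattr(os, "readv"):
        # os.readv() (Unix, Python 3.3+) fills the given buffers in place.
        return os.readv(fd, [buffer])
    # Fallback: FileIO.readinto() also writes in place.
    # closefd=False keeps the caller's descriptor open afterwards.
    with io.FileIO(fd, "r", closefd=False) as f:
        return f.readinto(buffer)


# Usage: read from a pipe into a preallocated buffer.
r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
buf = bytearray(16)
n = read_into(r, buf)
os.close(r)
print(n, buf[:n])  # 5 bytearray(b'hello')
```

No intermediate bytes object is created: the kernel writes straight into buf, which can be reused for the next read.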
msg239072 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-23 21:26
os.read_into() may be used by the following functions.

subprocess.Popen._execute_child():

    # Wait for exec to fail or succeed; possibly raising an
    # exception (limited in size)
    errpipe_data = bytearray()
    while True:
        part = os.read(errpipe_read, 50000)
        errpipe_data += part
        if not part or len(errpipe_data) > 50000:
            break
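
A sketch of the errpipe loop rewritten around a hypothetical read_into() helper (defined again so the example is self-contained; read_errpipe() is also a hypothetical name). Instead of growing a bytearray by concatenating one bytes object per os.read() call, it reads directly into a single preallocated buffer. Note one behavioural difference: the original loop can accumulate more than 50000 bytes before it stops, while this version caps the data at exactly `limit` bytes.

```python
import io
import os


def read_into(fd, buffer):
    """Hypothetical stand-in for os.read_into(): bytes read, 0 at EOF."""
    if hasattr(os, "readv"):
        return os.readv(fd, [buffer])  # Unix: fills buffer in place
    with io.FileIO(fd, "r", closefd=False) as f:
        return f.readinto(buffer)


def read_errpipe(errpipe_read, limit=50000):
    """Read up to `limit` bytes, stopping early at EOF, into one buffer."""
    errpipe_data = bytearray(limit)
    size = 0
    with memoryview(errpipe_data) as view:
        while size < limit:
            # Each read lands directly after the data already received.
            n = read_into(errpipe_read, view[size:])
            if not n:  # EOF: the writer closed its end of the pipe
                break
            size += n
    # The memoryview is released, so the bytearray may be resized again.
    del errpipe_data[size:]
    return errpipe_data


# Usage: sample bytes written by a simulated child process.
r, w = os.pipe()
os.write(w, b"OSError:2:example")
os.close(w)
print(read_errpipe(r))  # bytearray(b'OSError:2:example')
os.close(r)
```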

subprocess.Popen.communicate():

    self._fileobj2output = {}
    if self.stdout:
        self._fileobj2output[self.stdout] = []
    ...
    data = os.read(key.fd, 32768)
    if not data:
        ...
    self._fileobj2output[key.fileobj].append(data)
    ...
    stdout = b''.join(...)
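
communicate() keeps one bytes object per os.read() call and joins them at the end. A simplified, single-descriptor sketch of the alternative, using the same hypothetical read_into() helper and a hypothetical read_all() function: the data is read straight into one bytearray whose capacity grows geometrically. Growing a bytearray can still reallocate and copy, and communicate() would still need bytes(data) to return bytes, which is exactly the remaining copy this issue points out.

```python
import io
import os


def read_into(fd, buffer):
    """Hypothetical stand-in for os.read_into(): bytes read, 0 at EOF."""
    if hasattr(os, "readv"):
        return os.readv(fd, [buffer])  # Unix: fills buffer in place
    with io.FileIO(fd, "r", closefd=False) as f:
        return f.readinto(buffer)


def read_all(fd, chunk_size=32768):
    """Read fd until EOF into a single bytearray (hypothetical helper)."""
    data = bytearray()
    size = 0
    while True:
        if len(data) - size < chunk_size:
            # Grow geometrically so the total resize cost stays linear;
            # a resize may still copy the existing contents.
            data.extend(bytes(max(chunk_size, len(data))))
        # The view must be released before the next resize.
        with memoryview(data) as view:
            n = read_into(fd, view[size:])
        if not n:  # EOF
            break
        size += n
    del data[size:]  # drop the unused tail
    return data


# Usage: a small chunk size forces several reads and resizes.
r, w = os.pipe()
os.write(w, b"x" * 1000)
os.close(w)
out = read_all(r, chunk_size=64)
os.close(r)
print(len(out))  # 1000
```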

multiprocessing.Connection._recv():

    def _recv(self, size, read=_read):
        buf = io.BytesIO()
        handle = self._handle
        remaining = size
        while remaining > 0:
            chunk = read(handle, remaining)
            n = len(chunk)
            if n == 0:
                if remaining == size:
                    raise EOFError
                else:
                    raise OSError("got end of file during message")
            buf.write(chunk)
            remaining -= n
        return buf
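
Because _recv() knows the message size in advance, a version built on the hypothetical read_into() helper can allocate the buffer once and fill it in place instead of writing chunks into an io.BytesIO. recv_exactly() is a hypothetical name; it returns the bytearray itself rather than a BytesIO, and its EOF handling mirrors the original.

```python
import io
import os


def read_into(fd, buffer):
    """Hypothetical stand-in for os.read_into(): bytes read, 0 at EOF."""
    if hasattr(os, "readv"):
        return os.readv(fd, [buffer])  # Unix: fills buffer in place
    with io.FileIO(fd, "r", closefd=False) as f:
        return f.readinto(buffer)


def recv_exactly(handle, size):
    """Read exactly `size` bytes from handle into one preallocated buffer."""
    buf = bytearray(size)
    pos = 0
    with memoryview(buf) as view:
        while pos < size:
            n = read_into(handle, view[pos:])
            if n == 0:
                if pos == 0:
                    raise EOFError  # clean EOF before any message data
                raise OSError("got end of file during message")
            pos += n
    return buf


# Usage
r, w = os.pipe()
os.write(w, b"message")
os.close(w)
print(recv_exactly(r, 7))  # bytearray(b'message')
os.close(r)
```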

multiprocessing.read_unsigned():

    def read_unsigned(fd):
        data = b''
        length = UNSIGNED_STRUCT.size
        while len(data) < length:
            s = os.read(fd, length - len(data))
            if not s:
                raise EOFError('unexpected EOF')
            data += s
        return UNSIGNED_STRUCT.unpack(data)[0]
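
The same fill-in-place pattern fits read_unsigned(). Struct.unpack() accepts any bytes-like object, so the filled bytearray can be decoded without first converting it to bytes. The "Q" format given to UNSIGNED_STRUCT below is an assumption made for this sketch, and read_into() is again a hypothetical helper.

```python
import io
import os
import struct

UNSIGNED_STRUCT = struct.Struct("Q")  # assumed format for this sketch


def read_into(fd, buffer):
    """Hypothetical stand-in for os.read_into(): bytes read, 0 at EOF."""
    if hasattr(os, "readv"):
        return os.readv(fd, [buffer])  # Unix: fills buffer in place
    with io.FileIO(fd, "r", closefd=False) as f:
        return f.readinto(buffer)


def read_unsigned(fd):
    data = bytearray(UNSIGNED_STRUCT.size)
    pos = 0
    with memoryview(data) as view:
        while pos < len(data):
            n = read_into(fd, view[pos:])
            if not n:
                raise EOFError("unexpected EOF")
            pos += n
    # Struct.unpack() accepts any bytes-like object, so the bytearray is
    # decoded directly, without a bytes copy.
    return UNSIGNED_STRUCT.unpack(data)[0]


# Usage
r, w = os.pipe()
os.write(w, UNSIGNED_STRUCT.pack(12345))
os.close(w)
print(read_unsigned(r))  # 12345
os.close(r)
```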

The problem is that some functions are still required to return bytes, not a bytearray or another type, and converting a bytearray to bytes still requires a memory copy.
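
A short illustration of why that conversion matters: bytes(buf) creates an independent copy of a bytearray, while a memoryview shares the bytearray's memory without copying but is not a bytes object, so it cannot be returned where callers expect bytes.

```python
data = bytearray(b"abc")
frozen = bytes(data)     # copies: a new, independent immutable object
view = memoryview(data)  # no copy: shares data's memory
data[0] = ord("x")       # in-place change, allowed while the view exists
# view.tobytes() makes a copy too; it is used here only for display.
print(frozen, view.tobytes())  # b'abc' b'xbc'
view.release()
```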
msg239079 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-03-23 22:40
Why do you want to optimize the pure Python FileIO?
msg239550 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-30 01:25
> Why do you want to optimize the pure Python FileIO?

I gave examples other than FileIO in this issue.
msg244059 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-05-25 22:51
Without more interest, I chose to defer this issue. Feel free to reopen it if you need it for other use cases, or if you are interested in implementing it.
History
Date                 User             Action  Args
2015-05-25 22:51:43  vstinner         set     status: open -> closed
                                              resolution: postponed
                                              messages: + msg244059
2015-03-30 01:25:07  vstinner         set     messages: + msg239550
2015-03-23 23:47:06  martin.panter    set     nosy: + martin.panter
2015-03-23 22:55:05  piotr.dobrogost  set     nosy: + piotr.dobrogost
2015-03-23 22:40:14  pitrou           set     nosy: + pitrou
                                              messages: + msg239079
2015-03-23 21:26:50  vstinner         set     messages: + msg239072
2015-03-23 21:16:11  vstinner         create