Add a new os.read_into() function to avoid memory copies #67942

Closed
vstinner opened this issue Mar 23, 2015 · 5 comments
Labels
stdlib Python modules in the Lib dir

Comments

@vstinner
Member

BPO 23754
Nosy @pitrou, @vstinner, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2015-05-25.22:51:43.573>
created_at = <Date 2015-03-23.21:16:11.883>
labels = ['library']
title = 'Add a new os.read_into() function to avoid memory copies'
updated_at = <Date 2015-05-25.22:51:43.572>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2015-05-25.22:51:43.572>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2015-05-25.22:51:43.573>
closer = 'vstinner'
components = ['Library (Lib)']
creation = <Date 2015-03-23.21:16:11.883>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 23754
keywords = []
message_count = 5.0
messages = ['239069', '239072', '239079', '239550', '244059']
nosy_count = 4.0
nosy_names = ['pitrou', 'vstinner', 'martin.panter', 'piotr.dobrogost']
pr_nums = []
priority = 'normal'
resolution = 'postponed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue23754'
versions = ['Python 3.5']

@vstinner
Member Author

Sockets have a recv_into() method, and raw and buffered io streams (io.RawIOBase, io.BufferedIOBase) have a readinto() method, but there is no os.read_into() function. Such a function would avoid memory copies. It would benefit the pure-Python implementation of FileIO (its readall() and readinto() methods); see issue bpo-21859.
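
As a rough illustration of the idea, and not the proposed implementation, the sketch below emulates a `read_into(fd, buffer)` helper on top of the existing `io.FileIO.readinto()`. The helper name and signature are assumptions made for this example; the issue does not specify them.

```python
import io
import os


def read_into(fd, buffer):
    """Hypothetical helper: read from fd directly into a writable buffer.

    Returns the number of bytes read (0 at end of file).
    """
    # closefd=False: the wrapper must not close the caller's descriptor.
    with io.FileIO(fd, "r", closefd=False) as raw:
        return raw.readinto(buffer)


r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

buf = bytearray(16)                  # allocated once, reusable across reads
n = read_into(r, buf)                # data lands in buf, no intermediate bytes
data = bytes(memoryview(buf)[:n])    # only this final conversion copies
os.close(r)
```

The caller owns the buffer, so repeated reads can reuse the same memory instead of allocating a new bytes object per call, which is what `os.read()` does.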

@vstinner vstinner added the stdlib Python modules in the Lib dir label Mar 23, 2015
@vstinner
Member Author

os.read_into() may be used by the following functions.

subprocess.Popen._execute_child():

    # Wait for exec to fail or succeed; possibly raising an
    # exception (limited in size)
    errpipe_data = bytearray()
    while True:
        part = os.read(errpipe_read, 50000)
        errpipe_data += part
        if not part or len(errpipe_data) > 50000:
            break
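
A possible shape for this loop with a read-into primitive is sketched below. It is an assumption-laden rewrite, not the subprocess code: `read_limited` is a hypothetical name, `io.FileIO.readinto()` stands in for the missing `os.read_into()`, and unlike the original loop it never reads past the limit.

```python
import io
import os

MAX_SIZE = 50000  # same limit as in the excerpt above


def read_limited(fd, limit=MAX_SIZE):
    """Read at most `limit` bytes from fd into one preallocated buffer."""
    buf = bytearray(limit)
    total = 0
    with memoryview(buf) as view, io.FileIO(fd, "r", closefd=False) as raw:
        while total < limit:
            # Slicing a memoryview does not copy; the read fills buf in place.
            n = raw.readinto(view[total:])
            if not n:          # 0 means end of file
                break
            total += n
        # One copy at the end, instead of one per `+=` in the original loop.
        return bytes(view[:total])


r, w = os.pipe()
os.write(w, b"x" * 1000)
os.close(w)
errpipe_data = read_limited(r)
os.close(r)
```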

subprocess.Popen.communicate():

    self._fileobj2output = {}
    if self.stdout:
        self._fileobj2output[self.stdout] = []
    ...
    data = os.read(key.fd, 32768)
    if not data:
        ...
    self._fileobj2output[key.fileobj].append(data)
    ...
    stdout = b''.join(...)

multiprocessing.Connection._recv():

    def _recv(self, size, read=_read):
        buf = io.BytesIO()
        handle = self._handle
        remaining = size
        while remaining > 0:
            chunk = read(handle, remaining)
            n = len(chunk)
            if n == 0:
                if remaining == size:
                    raise EOFError
                else:
                    raise OSError("got end of file during message")
            buf.write(chunk)
            remaining -= n
        return buf
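
The same pattern applies to an exact-size read like `_recv()`. The sketch below is a hypothetical variant (`recv_exact` is not a multiprocessing function) that fills a single preallocated bytearray and returns it, rather than writing chunks into a `BytesIO`; `io.FileIO.readinto()` again stands in for `os.read_into()`.

```python
import io
import os


def recv_exact(fd, size):
    """Read exactly `size` bytes from fd into a single bytearray."""
    buf = bytearray(size)
    received = 0
    with memoryview(buf) as view, io.FileIO(fd, "r", closefd=False) as raw:
        while received < size:
            n = raw.readinto(view[received:])
            if not n:
                if received == 0:
                    raise EOFError
                raise OSError("got end of file during message")
            received += n
    return buf


r, w = os.pipe()
os.write(w, b"abcdef")
os.close(w)

first = recv_exact(r, 4)        # enough data: returns 4 bytes
try:
    recv_exact(r, 4)            # only 2 bytes left before end of file
except OSError as exc:
    error = str(exc)
os.close(r)
```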

multiprocessing.read_unsigned():

    def read_unsigned(fd):
        data = b''
        length = UNSIGNED_STRUCT.size
        while len(data) < length:
            s = os.read(fd, length - len(data))
            if not s:
                raise EOFError('unexpected EOF')
            data += s
        return UNSIGNED_STRUCT.unpack(data)[0]
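
For a fixed-size read such as this one, the result is unpacked rather than returned as bytes, so no final conversion is needed. A minimal sketch follows, assuming `UNSIGNED_STRUCT` is a `struct.Struct("Q")` (the excerpt does not show its definition) and using `io.FileIO.readinto()` in place of the missing `os.read_into()`.

```python
import io
import os
import struct

# Assumption for this example: a single unsigned 64-bit value.
UNSIGNED_STRUCT = struct.Struct("Q")


def read_unsigned(fd):
    buf = bytearray(UNSIGNED_STRUCT.size)
    pos = 0
    with memoryview(buf) as view, io.FileIO(fd, "r", closefd=False) as raw:
        while pos < len(buf):
            n = raw.readinto(view[pos:])
            if not n:
                raise EOFError("unexpected EOF")
            pos += n
    # unpack_from() accepts any bytes-like object, including a bytearray.
    return UNSIGNED_STRUCT.unpack_from(buf)[0]


r, w = os.pipe()
os.write(w, UNSIGNED_STRUCT.pack(12345))
os.close(w)
value = read_unsigned(r)
os.close(r)
```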

The problem is that some functions are still required to return a bytes object, not a bytearray or something else. Converting a bytearray to bytes still requires a memory copy...
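
The copy can be observed directly. In this small example, `bytes(buf)` produces an independent object, while a `memoryview` shares the bytearray's memory:

```python
buf = bytearray(b"payload")

as_bytes = bytes(buf)       # allocates a new object and copies the data
view = memoryview(buf)      # shares the bytearray's memory, no copy

buf[0] = ord("P")           # mutate the original buffer in place

copy_is_independent = as_bytes[0] == ord("p")   # the copy kept the old byte
view_sees_change = view[0] == ord("P")          # the view reflects the change
view.release()
```

So a function that must return bytes pays for one copy regardless of how the data was read; the saving from reading into a buffer comes from avoiding the per-chunk allocations and concatenations before that point.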

@pitrou
Member

pitrou commented Mar 23, 2015

Why do you want to optimize the pure Python FileIO?

@vstinner
Member Author

> Why do you want to optimize the pure Python FileIO?

I gave more examples than FileIO in this issue.

@vstinner
Member Author

Since nobody else has shown interest, I am deferring this issue. Feel free to reopen it if you need it for more use cases, or if you are interested in implementing it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022