Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sendfile where possible in httplib #57768

Open
benjaminp opened this issue Dec 8, 2011 · 13 comments
Open

Use sendfile where possible in httplib #57768

benjaminp opened this issue Dec 8, 2011 · 13 comments
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@benjaminp
Copy link
Contributor

BPO 13559
Nosy @orsenthil, @giampaolo, @tiran, @benjaminp, @merwok, @vadmium, @moreati
Dependencies
  • bpo-17552: Add a new socket.sendfile() method
  • bpo-23740: http.client request and send method have some datatype issues
  • Files
  • httplib-sendfile.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-12-08.20:47:25.389>
    labels = ['library', '3.10', 'performance']
    title = 'Use sendfile where possible in httplib'
    updated_at = <Date 2021-03-09.10:40:33.692>
    user = 'https://github.com/benjaminp'

    bugs.python.org fields:

    activity = <Date 2021-03-09.10:40:33.692>
    actor = 'christian.heimes'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2011-12-08.20:47:25.389>
    creator = 'benjamin.peterson'
    dependencies = ['17552', '23740']
    files = ['35569']
    hgrepos = []
    issue_num = 13559
    keywords = ['patch']
    message_count = 13.0
    messages = ['149052', '149073', '149075', '149077', '149078', '220262', '250435', '348634', '387699', '387729', '387751', '387754', '388346']
    nosy_count = 9.0
    nosy_names = ['orsenthil', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson', 'eric.araujo', 'rosslagerwall', 'kasun', 'martin.panter', 'Alex.Willmer']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13559'
    versions = ['Python 3.10']

    @benjaminp
    Copy link
    Contributor Author

    HTTPConnection.send() should use os.sendfile when possible to avoid copying data into userspace and back.

    @benjaminp benjaminp added type-feature A feature request or enhancement easy stdlib Python modules in the Lib dir labels Dec 8, 2011
    @giampaolo
    Copy link
    Contributor

    This is not possible for two reasons:

    • on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method

    • after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")):
      http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839
      ...whereas sendfile() can only be used to send a binary file "as-is"

    I think we can use sendfile() in ftplib.py though .
    I'll open a ticket for that.

    @benjaminp
    Copy link
    Contributor Author

    2011/12/8 Giampaolo Rodola' <report@bugs.python.org>:

    Giampaolo Rodola' <g.rodola@gmail.com> added the comment:

    This is not possible for two reasons:

    • on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method

    • after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")):
      http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839
      ...whereas sendfile() can only be used to send a binary file "as-is"

    I presume you could check for a binary mode, though? Also, you can
    catch EINVAl on invalid fds.

    @giampaolo
    Copy link
    Contributor

    ftplib's sendfile support is not tracked as bpo-13559.
    Considerations I made there should apply here as well.

    @giampaolo
    Copy link
    Contributor

    Ops! I meant bpo-13564.

    @giampaolo
    Copy link
    Contributor

    Patch in attachment uses the newly added socket.sendfile() method (bpo-17552).

    @giampaolo giampaolo added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Jun 11, 2014
    @vadmium
    Copy link
    Member

    vadmium commented Sep 11, 2015

    The multiple personalities of HTTPConnection.send() and friends is a bit of a can of worms. I suggest working on bpo-23740 to get an idea of what kinds of file objects are meant to be supported, and what things may work by accident and be used in the real world.

    For instance, is it possible to manually set Content-Length, and say supply a GzipFile reader, or file object positioned halfway through the file? How does this interact with the socket.sendfile() call?

    @vstinner
    Copy link
    Member

    This issue is no newcomer friendly, I remove the "easy" keyword.

    @vstinner vstinner removed the easy label Jul 29, 2019
    @moreati
    Copy link
    Mannequin

    moreati mannequin commented Feb 26, 2021

    I would like to take a stab at this. Giampaolo, would it be okay if I made a pull request updated from your patch? With the appropriate "Co-authored-by: Author Name <email_address>" line.

    @orsenthil
    Copy link
    Member

    Alex, https://bugs.python.org/issue23740 is identified as a dependency on this issue. We will have to resolve that first, and come back to this. And yes, if you contribute on other's patch, both the contributions will be included and appropriately credited.

    @moreati
    Copy link
    Mannequin

    moreati mannequin commented Feb 26, 2021

    To check my understanding

    Is the motivation for the closer to

    1. using sendfile() will break $X, and we know X
    2. there's high probability sendfile() will break something
    3. there's unknown probability sendfile() will break something
    4. there's low probability sendfile() will break something, but it is still too high
    5. any non-trivial change here is too risky, regardless of sendfile()
    6. something else?

    My guess is 5.

    @orsenthil
    Copy link
    Member

    Yes, the point number 5. We will have to evaluate if sendfile side-steps and avoids the issues noted in bpo-23740

    @tiran
    Copy link
    Member

    tiran commented Mar 9, 2021

    sendfile() only works for plain HTTP. For technical reasons it does not work for HTTPS (*). These days majority of services use HTTPS. Therefore the usefulness of sendfile() patch is minimal.

    (*) It is possible to use sendfile() for TLS connections, but the feature requires a Kernel module that provides kTLS offloading feature, https://www.kernel.org/doc/html/latest/networking/tls-offload.html . In user space it requires OpenSSL 3.0.0 with kTLS support. 3.0.0 is currently under development.

    @tiran tiran added the 3.10 only security fixes label Mar 9, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added 3.12 bugs and security fixes and removed 3.10 only security fixes labels Sep 11, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    Status: Todo
    Development

    No branches or pull requests

    7 participants