Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make http.client._tunnel send one byte string over the network #87498

Closed
zveinn mannequin opened this issue Feb 26, 2021 · 12 comments
Closed

Make http.client._tunnel send one byte string over the network #87498

zveinn mannequin opened this issue Feb 26, 2021 · 12 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes performance Performance or resource usage topic-C-API

Comments

@zveinn
Copy link
Mannequin

zveinn mannequin commented Feb 26, 2021

BPO 43332
Nosy @terryjreedy, @gpshead, @orsenthil, @berkerpeksag, @miss-islington, @zveinn
PRs
  • bpo-43332: Buffer proxy connection setup packets before sending. #24780
  • [3.9] bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780) #24783
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2021-03-08.21:48:46.248>
    created_at = <Date 2021-02-26.20:18:00.086>
    labels = ['expert-C-API', '3.9', '3.10', 'performance']
    title = 'Make http.client._tunnel send one byte string over the network'
    updated_at = <Date 2021-03-30.20:49:50.211>
    user = 'https://github.com/zveinn'

    bugs.python.org fields:

    activity = <Date 2021-03-30.20:49:50.211>
    actor = 'zveinn'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2021-03-08.21:48:46.248>
    closer = 'gregory.p.smith'
    components = ['C API']
    creation = <Date 2021-02-26.20:18:00.086>
    creator = 'zveinn'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43332
    keywords = ['patch']
    message_count = 12.0
    messages = ['387742', '387743', '387745', '387747', '388163', '388164', '388168', '388204', '388258', '388259', '388309', '389851']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'gregory.p.smith', 'orsenthil', 'berker.peksag', 'miss-islington', 'zveinn']
    pr_nums = ['24780', '24783']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue43332'
    versions = ['Python 3.9', 'Python 3.10']

    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Feb 26, 2021

    Hey, some time ago I ran into some code in the cpython code I thought might be possible to improve it a little bit.

    https://github.com/python/cpython/blob/master/Lib/http/client.py#L903

    This code specifically.

    Notice how the self.send() method is used multiple time to construct the CONNECT request. When the network load is high, these different parts actually get split into separate network frames.

    This causes additional meta data to be sent, effectively costing more bandwidth and causing the request to be split into multiple network frames.

    This has some interesting behavior on the receiving end as well.

    If you send everything as a single network frame, then the receiver can read the entire thing in a single read call. If you send multiple frames, the main reader pipe now needs a temporary buffer to encapsulate the multiple calls.

    Because of this, sending requests as many network frames actually causes a rise in processing complexity on the receiving end.

    Here is a github issue I made about this problem some time ago: psf/requests#5384
    In this issue you will find detailed information and screenshots.

    My recommendation would be to construct the query as a whole before using a single self.send() to send the whole payload in one network frame. Even if we ignore the added complexity on the receivers end, the gain in network performance is worth it.

    @zveinn zveinn mannequin added 3.10 only security fixes topic-C-API performance Performance or resource usage labels Feb 26, 2021
    @zveinn zveinn mannequin changed the title def _tunnel(self): - uses multiple network writes, possibly causing unnecessary implementation complexity on the receiving end def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Feb 26, 2021
    @zveinn zveinn mannequin changed the title def _tunnel(self): - uses multiple network writes, possibly causing unnecessary implementation complexity on the receiving end def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Feb 26, 2021
    @zveinn zveinn mannequin changed the title def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Feb 26, 2021
    @zveinn zveinn mannequin changed the title def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Feb 26, 2021
    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Feb 26, 2021

    P.s. Sorry for the formatting of the previous message, I´m new :S

    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Feb 26, 2021

    def _tunnel(self):
            connect_str = "CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
                self._tunnel_port)
            connect_bytes = connect_str.encode("ascii") <<!!-- ASCII
            self.send(connect_bytes)
            for header, value in self._tunnel_headers.items():
                header_str = "%s: %s\r\n" % (header, value)
                header_bytes = header_str.encode("latin-1") <<!!-- LATIN-1
                self.send(header_bytes)
            self.send(b'\r\n')

    Is it possible that the method was designed this way so that the header could be encoded with latin-1 while the connect part is encoded in ascii ? Is that needed ?

    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Feb 26, 2021

    also found this: https://dynatrace.github.io/OneAgent-SDK-for-Python/docs/encoding.html

    It might be relevant ?

    @terryjreedy
    Copy link
    Member

    I changed the title to what a PR/commit title should look like. Your justification is that "Multiple writes possibly cause excessive network usage and increased implementation complexity on the other end."

    I see no problem with the formatting of your first post.

    I presume the proposal is to make a list of bytes and then b''.join(the_list). This is now a standard idiom. Have you tested a patch locally? Can you make a PR?

    I don't know if there was a particular reason to not join before sending. Perhaps because successive sends effectively do the same thing, though with the possible downsides you note. I am not an expert on network usage.

    The _connect method was added by Senthil Kumaran in 2009 in bpo-1424152 and revised since. There is no current http maintainer, so I added as nosy Senthil and others who have worked on the module.

    @terryjreedy terryjreedy changed the title http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Make http.client._tunnel send one byte string over the network Mar 5, 2021
    @terryjreedy terryjreedy changed the title http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end Make http.client._tunnel send one byte string over the network Mar 5, 2021
    @gpshead
    Copy link
    Member

    gpshead commented Mar 5, 2021

    yep, that'd be a worthwhile improvement. note that the send method in that code also accepts BytesIO objects so rather than doing our own sequence of bytes and join we could just buffer in one of those.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 5, 2021

    FYI another common socket idiom for this, specifically added for use in old HTTP 1.x servers building up responses, is setting and clearing the TCP_CORK (Linux) or TCP_NOPUSH (FreeBSD, and maybe macos? [buggy?]) socket option before and after the set of send()s you want to be optimally buffered into packets.

    A more modern approach (often used with TCP_CORK) is not to build up a single buffer at all. Instead use writev(). os.writev() exists in Python these days. It limits userspace data shuffling in the normal case of everything getting written.

    Unfortunately... this old http client code and API isn't really setup for that. Everything is required to go through the HTTPConnection.send method. And the send method is not defined to take a list of bytes objects as writev() would require.

    So for this code, the patch we want is just the joined bytes or BytesIO. It is the simple better change that works everywhere without getting into platform specific idioms.

    For modern best practices I'd look to the async frameworks instead of this older library.

    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Mar 6, 2021

    Hey!

    First of all, thank you for not shitting all over me <3

    I have never really used python and the only reason I found this is because I was developing a tool that accepted a LOT of CONNECT requests for python and I just happened to stumble upon this little nugget.

    Regarding a PR or a local path, it would have been too much work for me since I'm already swamped and I don't know anything about python or it's ecosystem :S

    Anyways, I leave this in your hands now.

    Regards, Zveinn

    @orsenthil orsenthil added 3.8 only security fixes 3.9 only security fixes labels Mar 8, 2021
    @gpshead
    Copy link
    Member

    gpshead commented Mar 8, 2021

    New changeset c25910a by Gregory P. Smith in branch 'master':
    bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
    c25910a

    @miss-islington
    Copy link
    Contributor

    New changeset c6e7cf1 by Miss Islington (bot) in branch '3.9':
    bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
    c6e7cf1

    @gpshead
    Copy link
    Member

    gpshead commented Mar 8, 2021

    I only ported this back to 3.9 as it is a bit late in 3.8's release cycle for a pure performance fix of an issue that has been around for ages.

    Thanks for raising the issue. The main http code already did this, the tunnel proxy code path clearly hadn't gotten much love.

    @gpshead gpshead removed the 3.8 only security fixes label Mar 8, 2021
    @gpshead gpshead closed this as completed Mar 8, 2021
    @gpshead gpshead removed the 3.8 only security fixes label Mar 8, 2021
    @gpshead gpshead closed this as completed Mar 8, 2021
    @zveinn
    Copy link
    Mannequin Author

    zveinn mannequin commented Mar 30, 2021

    No problem,

    Hopefully this will improve the performance on some network devices and proxy services.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes performance Performance or resource usage topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants