classification
Title: Make http.client._tunnel send one byte string over the network
Type: performance Stage: resolved
Components: C API Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: berker.peksag, gregory.p.smith, miss-islington, orsenthil, terry.reedy, zveinn
Priority: normal Keywords: patch

Created on 2021-02-26 20:18 by zveinn, last changed 2021-03-30 20:49 by zveinn. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24780 merged gregory.p.smith, 2021-03-07 19:27
PR 24783 merged miss-islington, 2021-03-08 07:37
Messages (12)
msg387742 - (view) Author: Zveinn (zveinn) Date: 2021-02-26 20:17
Hey, some time ago I ran into some code in CPython that I thought could be improved a little.

https://github.com/python/cpython/blob/master/Lib/http/client.py#L903

This code specifically. 

Notice how the self.send() method is used multiple times to construct the CONNECT request. When the network load is high, these different parts actually get split into separate network frames.

Each extra frame carries its own header metadata, so splitting the request this way costs more bandwidth than sending it in one frame.

This has some interesting behavior on the receiving end as well. 

If you send everything as a single network frame, then the receiver can read the entire thing in a single read call. If you send multiple frames, the main reader pipe now needs a temporary buffer to encapsulate the multiple calls. 

Because of this, sending requests as many network frames actually causes a rise in processing complexity on the receiving end. 
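
A minimal sketch of the receiver-side buffering described above, not taken from any real proxy. `read_request_head` and `fake_recv` are hypothetical names, and `fake_recv` stands in for `socket.recv` so the example runs without a network. Note that TCP is a byte stream and never guarantees one read per send, so a robust receiver needs this loop in any case; a single send only makes the one-read case far more likely.

```python
def read_request_head(recv, bufsize=4096):
    """Collect bytes from recv() until the blank line that ends the headers.

    Returns the request head and the number of recv() calls it took.
    """
    buffer = bytearray()  # temporary buffer that spans multiple reads
    reads = 0
    while b"\r\n\r\n" not in buffer:
        chunk = recv(bufsize)
        if not chunk:  # peer closed before finishing the request head
            raise ConnectionError("connection closed mid-request")
        buffer += chunk
        reads += 1
    head, _, _rest = bytes(buffer).partition(b"\r\n\r\n")
    return head + b"\r\n\r\n", reads


def fake_recv(chunks):
    """Hypothetical stand-in for socket.recv that replays fixed chunks."""
    pending = list(chunks)
    return lambda bufsize: pending.pop(0) if pending else b""


# The same request arriving as one chunk versus three chunks.
one = read_request_head(fake_recv(
    [b"CONNECT example.com:443 HTTP/1.0\r\nHost: example.com\r\n\r\n"]))
many = read_request_head(fake_recv(
    [b"CONNECT example.com:443 HTTP/1.0\r\n", b"Host: example.com\r\n", b"\r\n"]))
```

Both calls return the same bytes; only the number of reads differs.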

Here is a github issue I made about this problem some time ago: https://github.com/psf/requests/issues/5384
In this issue you will find detailed information and screenshots.

My recommendation would be to construct the query as a whole before using a single self.send() to send the whole payload in one network frame. Even if we ignore the added complexity on the receiver's end, the gain in network performance is worth it.
msg387743 - (view) Author: Zveinn (zveinn) Date: 2021-02-26 20:25
P.S. Sorry for the formatting of the previous message, I'm new :S
msg387745 - (view) Author: Zveinn (zveinn) Date: 2021-02-26 20:38
def _tunnel(self):
    # Excerpt of the current code: every self.send() below is a separate write.
    connect_str = "CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
                                                  self._tunnel_port)
    connect_bytes = connect_str.encode("ascii")  # <-- ASCII
    self.send(connect_bytes)
    for header, value in self._tunnel_headers.items():
        header_str = "%s: %s\r\n" % (header, value)
        header_bytes = header_str.encode("latin-1")  # <-- LATIN-1
        self.send(header_bytes)
    self.send(b'\r\n')


Is it possible that the method was designed this way so that the headers could be encoded as latin-1 while the CONNECT line is encoded as ASCII? Is that needed?
msg387747 - (view) Author: Zveinn (zveinn) Date: 2021-02-26 20:45
also found this: https://dynatrace.github.io/OneAgent-SDK-for-Python/docs/encoding.html

It might be relevant ?
msg388163 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-05 20:57
I changed the title to what a PR/commit title should look like.  Your justification is that "Multiple writes possibly cause excessive network usage and increased implementation complexity on the other end."

I see no problem with the formatting of your first post.

I presume the proposal is to make a list of bytes and then b''.join(the_list).  This is now a standard idiom.  Have you tested a patch locally?  Can you make a PR?
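
A minimal sketch of that idiom, assuming a standalone helper rather than the actual method. `build_connect_request` is a hypothetical name and not part of http.client. Each piece is encoded on its own before joining, so the ASCII request line and the latin-1 headers can still share one buffer.

```python
def build_connect_request(host, port, headers):
    """Return the whole CONNECT request as a single bytes object."""
    # Request line: ASCII, as in the existing code.
    parts = [("CONNECT %s:%d HTTP/1.0\r\n" % (host, port)).encode("ascii")]
    # Header lines: latin-1, as in the existing code.
    for name, value in headers.items():
        parts.append(("%s: %s\r\n" % (name, value)).encode("latin-1"))
    parts.append(b"\r\n")  # blank line that terminates the headers
    return b"".join(parts)


request = build_connect_request(
    "example.com", 443, {"Proxy-Authorization": "Basic dXNlcjpwYXNz"})
```

The caller would then hand `request` to a single send call instead of one call per line.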

I don't know if there was a particular reason to not join before sending.  Perhaps because successive sends effectively do the same thing, though with the possible  downsides you note.  I am not an expert on network usage.

The _tunnel method was added by Senthil Kumaran in 2009 in #1424152 and revised since.  There is no current http maintainer, so I added as nosy Senthil and others who have worked on the module.
msg388164 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-05 21:04
yep, that'd be a worthwhile improvement.  note that the send method in that code also accepts BytesIO objects so rather than doing our own sequence of bytes and join we could just buffer in one of those.
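
A rough sketch of the BytesIO variant, under the assumption that the request is small enough to buffer in memory. `buffer_connect_request` is a hypothetical helper, not the code that was eventually committed.

```python
import io


def buffer_connect_request(host, port, headers):
    """Accumulate the CONNECT request in an in-memory buffer."""
    buf = io.BytesIO()
    buf.write(("CONNECT %s:%d HTTP/1.0\r\n" % (host, port)).encode("ascii"))
    for name, value in headers.items():
        buf.write(("%s: %s\r\n" % (name, value)).encode("latin-1"))
    buf.write(b"\r\n")
    buf.seek(0)  # rewind so a reader starts at the first byte
    return buf


# Reading the buffer back yields the full request in one piece.
payload = buffer_connect_request(
    "proxy-target.example", 8443, {"User-Agent": "demo"}).read()
```

Compared with a list and b"".join(), this avoids managing the list by hand; which of the two reads better is largely a matter of taste.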
msg388168 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-05 22:13
FYI another common socket idiom for this, specifically added for use in old HTTP 1.x servers building up responses, is setting and clearing the TCP_CORK (Linux) or TCP_NOPUSH (FreeBSD, and maybe macos? [buggy?]) socket option before and after the set of send()s you want to be optimally buffered into packets.
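
A hedged sketch of the cork/uncork pattern, for illustration only. `corked` and `RecordingSocket` are hypothetical names; `socket.TCP_CORK` is only defined on Linux, so the helper does nothing elsewhere, and the recording stand-in exists purely so the example runs without a real TCP connection.

```python
import contextlib
import socket


@contextlib.contextmanager
def corked(sock):
    """Hold back partial packets while the body runs (Linux TCP_CORK only)."""
    option = getattr(socket, "TCP_CORK", None)
    if option is None:  # platform without TCP_CORK: plain pass-through
        yield sock
        return
    sock.setsockopt(socket.IPPROTO_TCP, option, 1)  # cork
    try:
        yield sock
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, option, 0)  # uncork and flush


class RecordingSocket:
    """Hypothetical stand-in that records calls instead of using a network."""

    def __init__(self):
        self.options = []
        self.sent = []

    def setsockopt(self, level, option, value):
        self.options.append((level, option, value))

    def sendall(self, data):
        self.sent.append(data)


demo = RecordingSocket()
with corked(demo) as s:
    s.sendall(b"CONNECT example.com:443 HTTP/1.0\r\n")
    s.sendall(b"\r\n")
```

With a real TCP socket on Linux, the kernel would be free to coalesce the two sendall() calls into full packets until the option is cleared.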

A more modern approach (often used with TCP_CORK) is not to build up a single buffer at all.  Instead use writev().  os.writev() exists in Python these days.  It limits userspace data shuffling in the normal case of everything getting written.
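
A small sketch of the writev() approach, using a pipe in place of a socket so it runs anywhere. `send_parts` is a hypothetical helper; os.writev() is only available on Unix-like platforms, hence the fallback.

```python
import os


def send_parts(fd, parts):
    """Write several bytes objects with one system call where possible.

    Returns the number of bytes written. A real caller must handle a
    short write by retrying with the remaining data.
    """
    if hasattr(os, "writev"):
        return os.writev(fd, parts)  # no userspace join needed
    return os.write(fd, b"".join(parts))  # fallback: join, then one write


parts = [b"CONNECT example.com:443 HTTP/1.0\r\n", b"Host: example.com\r\n", b"\r\n"]
read_fd, write_fd = os.pipe()
try:
    written = send_parts(write_fd, parts)
    received = os.read(read_fd, 4096)
finally:
    os.close(read_fd)
    os.close(write_fd)
```

The pieces stay separate in Python and are gathered by the kernel during the write.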

Unfortunately... this old http client code and API isn't really set up for that.  Everything is required to go through the HTTPConnection.send method.  And the send method is not defined to take a list of bytes objects as writev() would require.

So for this code, the patch we want is just the joined bytes or BytesIO. It is the simple better change that works everywhere without getting into platform specific idioms.

For modern best practices I'd look to the async frameworks instead of this older library.
msg388204 - (view) Author: Zveinn (zveinn) Date: 2021-03-06 15:15
Hey! 

First of all, thank you for not shitting all over me <3 

I have never really used python and the only reason I found this is because I was developing a tool that accepted a LOT of CONNECT requests for python and I just happened to stumble upon this little nugget.

Regarding a PR or a local patch, it would have been too much work for me since I'm already swamped and I don't know anything about python or its ecosystem :S

Anyways, I leave this in your hands now. 

Regards, Zveinn
msg388258 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-08 07:35
New changeset c25910a135c2245accadb324b40dd6453015e056 by Gregory P. Smith in branch 'master':
bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
https://github.com/python/cpython/commit/c25910a135c2245accadb324b40dd6453015e056
msg388259 - (view) Author: miss-islington (miss-islington) Date: 2021-03-08 07:59
New changeset c6e7cf1ee09c88d35e6703c33a61eca7b9db54f3 by Miss Islington (bot) in branch '3.9':
bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
https://github.com/python/cpython/commit/c6e7cf1ee09c88d35e6703c33a61eca7b9db54f3
msg388309 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-08 21:48
I only ported this back to 3.9 as it is a bit late in 3.8's release cycle for a pure performance fix of an issue that has been around for ages.

Thanks for raising the issue.  The main http code already did this, the tunnel proxy code path clearly hadn't gotten much love.
msg389851 - (view) Author: Zveinn (zveinn) Date: 2021-03-30 20:49
No problem, 

Hopefully this will improve the performance on some network devices and proxy services.
History
Date User Action Args
2021-03-30 20:49:50 zveinn set messages: + msg389851
2021-03-08 22:27:37 orsenthil set stage: commit review -> resolved
2021-03-08 21:48:46 gregory.p.smith set status: open -> closed
versions: - Python 3.8
messages: + msg388309
resolution: fixed
stage: patch review -> commit review
2021-03-08 07:59:56 miss-islington set messages: + msg388259
2021-03-08 07:37:14 miss-islington set nosy: + miss-islington
pull_requests: + pull_request23549
2021-03-08 07:35:27 gregory.p.smith set messages: + msg388258
2021-03-08 03:38:04 orsenthil set versions: + Python 3.8, Python 3.9
2021-03-07 19:30:31 gregory.p.smith set assignee: gregory.p.smith
2021-03-07 19:27:44 gregory.p.smith set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request23545
2021-03-06 15:15:20 zveinn set messages: + msg388204
2021-03-05 22:13:24 gregory.p.smith set messages: + msg388168
2021-03-05 21:04:41 gregory.p.smith set stage: needs patch
2021-03-05 21:04:21 gregory.p.smith set messages: + msg388164
2021-03-05 20:57:28 terry.reedy set nosy: + gregory.p.smith, berker.peksag, terry.reedy, orsenthil
messages: + msg388163
title: http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end -> Make http.client._tunnel send one byte string over the network
2021-02-26 20:45:35 zveinn set messages: + msg387747
2021-02-26 20:38:30 zveinn set messages: + msg387745
2021-02-26 20:25:41 zveinn set messages: + msg387743
2021-02-26 20:23:19 zveinn set title: def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end -> http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end
2021-02-26 20:18:50 zveinn set title: def _tunnel(self): - uses multiple network writes, possibly causing unnecessary implementation complexity on the receiving end -> def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end
2021-02-26 20:18:00 zveinn create