msg387742 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-02-26 20:17 |
Hey, some time ago I ran into some code in CPython that I think could be improved a little.
https://github.com/python/cpython/blob/master/Lib/http/client.py#L903
This code specifically.
Notice how the self.send() method is used multiple times to construct the CONNECT request. When the network load is high, these different parts actually get split into separate network frames.
Each extra frame carries its own header metadata, so the same request costs more bandwidth.
This has some interesting behavior on the receiving end as well.
If you send everything as a single network frame, then the receiver can read the entire thing in a single read call. If you send multiple frames, the receiver needs a temporary buffer to accumulate the pieces across multiple read calls.
Because of this, sending requests as many network frames actually causes a rise in processing complexity on the receiving end.
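To make the receiver-side cost concrete, here is a minimal sketch of the buffering a proxy has to do when the request head arrives in pieces. The function name `read_request_head` and the `recv` callable are hypothetical stand-ins for illustration, not part of any real proxy. Since TCP is a byte stream, a robust receiver needs a loop like this either way; a single send just makes it more likely that the first read already returns the whole head.

```python
def read_request_head(recv, bufsize=4096):
    """Accumulate chunks from recv() until the blank line that ends the headers.

    `recv` is a hypothetical callable shaped like socket.recv: it takes a
    maximum size and returns bytes, or b"" once the peer has closed.
    """
    buffer = bytearray()
    while b"\r\n\r\n" not in buffer:
        chunk = recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before the request head was complete")
        buffer += chunk  # temporary buffer that spans several read calls
    head, _, rest = bytes(buffer).partition(b"\r\n\r\n")
    # Return the complete head and any bytes that arrived after it.
    return head + b"\r\n\r\n", rest
```

When the whole request arrives in one chunk, the loop body runs once; when it arrives in three chunks, it runs three times and the buffer is extended each time.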
Here is a github issue I made about this problem some time ago: https://github.com/psf/requests/issues/5384
In this issue you will find detailed information and screenshots.
My recommendation would be to construct the query as a whole before using a single self.send() to send the whole payload in one network frame. Even if we ignore the added complexity on the receiver's end, the gain in network performance is worth it.
|
msg387743 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-02-26 20:25 |
P.S. Sorry for the formatting of the previous message, I'm new :S
|
msg387745 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-02-26 20:38 |
def _tunnel(self):
    connect_str = "CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
                                                  self._tunnel_port)
    connect_bytes = connect_str.encode("ascii")  # <-- ASCII
    self.send(connect_bytes)
    for header, value in self._tunnel_headers.items():
        header_str = "%s: %s\r\n" % (header, value)
        header_bytes = header_str.encode("latin-1")  # <-- LATIN-1
        self.send(header_bytes)
    self.send(b'\r\n')
Is it possible that the method was designed this way so that the headers could be encoded with latin-1 while the CONNECT line is encoded in ASCII? Is that needed?
|
msg387747 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-02-26 20:45 |
Also found this: https://dynatrace.github.io/OneAgent-SDK-for-Python/docs/encoding.html
It might be relevant?
|
msg388163 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2021-03-05 20:57 |
I changed the title to what a PR/commit title should look like. Your justification is that "Multiple writes possibly cause excessive network usage and increased implementation complexity on the other end."
I see no problem with the formatting of your first post.
I presume the proposal is to make a list of bytes and then b''.join(the_list). This is now a standard idiom. Have you tested a patch locally? Can you make a PR?
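A minimal sketch of that idiom, assuming a standalone helper rather than the real method: `build_connect_request` and its parameters are hypothetical names for illustration, not the code that was eventually committed. Each piece is encoded on its own (ASCII for the CONNECT line, latin-1 for the headers), so the mixed encodings do not prevent joining everything into one bytes object.

```python
def build_connect_request(host, port, headers):
    """Return the whole CONNECT request as one bytes object.

    Hypothetical helper: `headers` is a plain dict of str -> str.
    """
    # The request line is encoded as ASCII, as in the original method.
    parts = [f"CONNECT {host}:{port} HTTP/1.0\r\n".encode("ascii")]
    for name, value in headers.items():
        # Header lines are encoded as latin-1, as in the original method.
        parts.append(f"{name}: {value}\r\n".encode("latin-1"))
    parts.append(b"\r\n")  # blank line that ends the header block
    # One join, so the caller can hand the result to a single send() call.
    return b"".join(parts)
```

A caller would then pass the result to one `send()` call instead of issuing one call per line.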
I don't know if there was a particular reason to not join before sending. Perhaps because successive sends effectively do the same thing, though with the possible downsides you note. I am not an expert on network usage.
The _tunnel method was added by Senthil Kumaran in 2009 in #1424152 and revised since. There is no current http maintainer, so I added as nosy Senthil and others who have worked on the module.
|
msg388164 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2021-03-05 21:04 |
yep, that'd be a worthwhile improvement. note that the send method in that code also accepts BytesIO objects so rather than doing our own sequence of bytes and join we could just buffer in one of those.
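The BytesIO variant could look roughly like the sketch below. The helper name `buffer_connect_request` is hypothetical and this is not the committed patch. It relies on `HTTPConnection.send` reading from objects that have a `read()` method, which starts at the current position, so the buffer is rewound before it is returned.

```python
import io


def buffer_connect_request(host, port, headers):
    """Collect the CONNECT request in an in-memory buffer (hypothetical helper)."""
    buf = io.BytesIO()
    buf.write(f"CONNECT {host}:{port} HTTP/1.0\r\n".encode("ascii"))
    for name, value in headers.items():
        buf.write(f"{name}: {value}\r\n".encode("latin-1"))
    buf.write(b"\r\n")
    # Rewind so that a consumer calling read() sees the data from the start.
    buf.seek(0)
    return buf
```

Compared with a list plus `b"".join(...)`, this avoids keeping a separate list of pieces, at the cost of an extra object that has to be rewound before use.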
|
msg388168 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2021-03-05 22:13 |
FYI another common socket idiom for this, specifically added for use in old HTTP 1.x servers building up responses, is setting and clearing the TCP_CORK (Linux) or TCP_NOPUSH (FreeBSD, and maybe macOS? [buggy?]) socket option before and after the set of send()s you want to be optimally buffered into packets.
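For reference, a rough sketch of the cork idiom on Linux. The context manager name `corked` is hypothetical. `socket.TCP_CORK` is only defined on platforms that support it, so the sketch falls back to doing nothing elsewhere; the TCP_NOPUSH variant is not covered here.

```python
import contextlib
import socket


@contextlib.contextmanager
def corked(sock):
    """Ask the kernel to hold back partial TCP segments while the block runs."""
    cork = getattr(socket, "TCP_CORK", None)  # not defined on every platform
    if cork is None:
        # Assumption: without TCP_CORK we simply send as usual.
        yield sock
        return
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        yield sock
    finally:
        # Clearing the option flushes whatever was held back.
        sock.setsockopt(socket.IPPROTO_TCP, cork, 0)
```

Inside a `with corked(sock):` block, several small `sendall()` calls can be coalesced by the kernel into fewer packets without building a joined buffer in userspace.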
A more modern approach (often used with TCP_CORK) is not to build up a single buffer at all. Instead use writev(). os.writev() exists in Python these days. It limits userspace data shuffling in the normal case of everything getting written.
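A small sketch of the writev() approach, using a Unix socket pair in the usage so it does not need a network. The helper name `writev_all` is hypothetical; `os.writev` itself is only available on Unix, and this sketch ignores the platform limit on how many buffers one call may take.

```python
import os


def writev_all(fd, parts):
    """Write every bytes object in `parts` to `fd`, retrying after partial writes."""
    pending = [memoryview(p) for p in parts if p]  # skip empty buffers
    total = 0
    while pending:
        written = os.writev(fd, pending)
        total += written
        # Drop buffers that were fully written and trim a partially written one.
        while written:
            head = pending[0]
            if written >= len(head):
                written -= len(head)
                pending.pop(0)
            else:
                pending[0] = head[written:]
                written = 0
    return total
```

With a connected socket `sock`, the call would be `writev_all(sock.fileno(), parts)`: the separate pieces go to the kernel in one system call without first being copied into a joined buffer.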
Unfortunately... this old http client code and API isn't really set up for that. Everything is required to go through the HTTPConnection.send method. And the send method is not defined to take a list of bytes objects as writev() would require.
So for this code, the patch we want is just the joined bytes or BytesIO. It is the simple, better change that works everywhere without getting into platform-specific idioms.
For modern best practices I'd look to the async frameworks instead of this older library.
|
msg388204 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-03-06 15:15 |
Hey!
First of all, thank you for not shitting all over me <3
I have never really used Python, and the only reason I found this is that I was developing a tool that accepted a LOT of CONNECT requests from Python clients and I just happened to stumble upon this little nugget.
Regarding a PR or a local patch, it would have been too much work for me since I'm already swamped and I don't know anything about Python or its ecosystem :S
Anyways, I leave this in your hands now.
Regards, Zveinn
|
msg388258 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2021-03-08 07:35 |
New changeset c25910a135c2245accadb324b40dd6453015e056 by Gregory P. Smith in branch 'master':
bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
https://github.com/python/cpython/commit/c25910a135c2245accadb324b40dd6453015e056
|
msg388259 - (view) |
Author: miss-islington (miss-islington) |
Date: 2021-03-08 07:59 |
New changeset c6e7cf1ee09c88d35e6703c33a61eca7b9db54f3 by Miss Islington (bot) in branch '3.9':
bpo-43332: Buffer proxy connection setup packets before sending. (GH-24780)
https://github.com/python/cpython/commit/c6e7cf1ee09c88d35e6703c33a61eca7b9db54f3
|
msg388309 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2021-03-08 21:48 |
I only ported this back to 3.9 as it is a bit late in 3.8's release cycle for a pure performance fix of an issue that has been around for ages.
Thanks for raising the issue. The main http code already did this; the tunnel proxy code path clearly hadn't gotten much love.
|
msg389851 - (view) |
Author: Zveinn (zveinn) |
Date: 2021-03-30 20:49 |
No problem,
Hopefully this will improve the performance on some network devices and proxy services.
|
Date | User | Action | Args |
2022-04-11 14:59:42 | admin | set | github: 87498 |
2021-03-30 20:49:50 | zveinn | set | messages: + msg389851 |
2021-03-08 22:27:37 | orsenthil | set | stage: commit review -> resolved |
2021-03-08 21:48:46 | gregory.p.smith | set | status: open -> closed; versions: - Python 3.8; messages: + msg388309; resolution: fixed; stage: patch review -> commit review |
2021-03-08 07:59:56 | miss-islington | set | messages: + msg388259 |
2021-03-08 07:37:14 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request23549 |
2021-03-08 07:35:27 | gregory.p.smith | set | messages: + msg388258 |
2021-03-08 03:38:04 | orsenthil | set | versions: + Python 3.8, Python 3.9 |
2021-03-07 19:30:31 | gregory.p.smith | set | assignee: gregory.p.smith |
2021-03-07 19:27:44 | gregory.p.smith | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request23545 |
2021-03-06 15:15:20 | zveinn | set | messages: + msg388204 |
2021-03-05 22:13:24 | gregory.p.smith | set | messages: + msg388168 |
2021-03-05 21:04:41 | gregory.p.smith | set | stage: needs patch |
2021-03-05 21:04:21 | gregory.p.smith | set | messages: + msg388164 |
2021-03-05 20:57:28 | terry.reedy | set | nosy: + gregory.p.smith, berker.peksag, terry.reedy, orsenthil; messages: + msg388163; title: http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end -> Make http.client._tunnel send one byte string over the network |
2021-02-26 20:45:35 | zveinn | set | messages: + msg387747 |
2021-02-26 20:38:30 | zveinn | set | messages: + msg387745 |
2021-02-26 20:25:41 | zveinn | set | messages: + msg387743 |
2021-02-26 20:23:19 | zveinn | set | title: def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end -> http/client.py: - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end |
2021-02-26 20:18:50 | zveinn | set | title: def _tunnel(self): - uses multiple network writes, possibly causing unnecessary implementation complexity on the receiving end -> def _tunnel(self): - uses multiple network writes, possibly causing excessive network usage and increased implementation complexity on the other end |
2021-02-26 20:18:00 | zveinn | create | |