Issue 41002: HTTPResponse.read with amt is slow

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/85174

classification

Title:	HTTPResponse.read with amt is slow
Type:	performance	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.10

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	bmerry, methane, miss-islington, op368
Priority:	normal	Keywords:	patch

Created on 2020-06-17 12:08 by bmerry, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
httpbench-simple.py	bmerry, 2020-06-17 12:08	Micro-benchmark to demonstrate the issue

Pull Requests
URL	Status	Linked	Edit
PR 20943	merged	bmerry, 2020-06-17 15:41

Messages (5)
msg371732 - (view)	Author: Bruce Merry (bmerry) *	Date: 2020-06-17 12:08
I've run into this on 3.8, but the code on Git master doesn't look significantly different so I assume it still applies. I'm happy to work on a PR for this. When http.client.HTTPResponse.read is called with a specific amount to read, it goes down this code path: ``` if amt is not None: # Amount is given, implement using readinto b = bytearray(amt) n = self.readinto(b) return memoryview(b)[:n].tobytes() ``` That's pretty inefficient, because - `bytearray(amt)` will first zero-fill some memory - `tobytes()` will make an extra copy of this memory - if amt is big enough, it'll cause the temporary memory to be allocated from the kernel, which will also zero-fill the pages for security. A better approach would be to use the read method of the underlying fp. I have a micro-benchmark (that I'll attach) showing that for a 1GB body and reading the whole body with or without the amount being explicit, performance is reduced from 3GB/s to 1GB/s. For some unknown reason the requests library likes to read the body in 10KB chunks even if the user has requested the entire body, so this will help here (although the gains probably won't be as big because 10KB is really too small to amortise all the accounting overhead). Output from my benchmark, run against a 1GB file on localhost: httpclient-read: 3019.0 ± 63.8 MB/s httpclient-read-length: 1050.3 ± 4.8 MB/s httpclient-read-raw: 3150.3 ± 5.3 MB/s socket-read: 3134.4 ± 7.9 MB/s
msg371821 - (view)	Author: Open Close (op368) *	Date: 2020-06-18 16:32
@bmerry check the test results again. (perhaps 'MB/s's are wrong). httpclient-read: 3019.0 ± 63.8 MB/s httpclient-read-length: 1050.3 ± 4.8 MB/s --> httpclient-read-raw: 3150.3 ± 5.3 MB/s --> socket-read: 3134.4 ± 7.9 MB/s
msg371822 - (view)	Author: Bruce Merry (bmerry) *	Date: 2020-06-18 16:39
> (perhaps 'MB/s's are wrong). Why, are you getting significantly different results? Just in case it's confusing, the results are reported as A ± B MB/s, where A is the mean and B is the standard deviation of the mean. So it's about 3GB/s when no length if passed, or 1GB/s when a length is passed.
msg371824 - (view)	Author: Open Close (op368) *	Date: 2020-06-18 16:46
@bmerry yah, sorry, don't bother. I have mistaken. (I thought somehow 'MB/s' was simple speed, not std). I confirmed your tests. httpclient-read: 2504.5 ± 10.6 MB/s httpclient-read-length: 871.5 ± 4.9 MB/s httpclient-read-raw: 2528.3 ± 3.6 MB/s socket-read: 2520.9 ± 3.6 MB/s
msg372304 - (view)	Author: miss-islington (miss-islington)	Date: 2020-06-25 06:30
New changeset 152f0b8beea12e6282d284100b600771b968927a by Bruce Merry in branch 'master': bpo-41002: Optimize HTTPResponse.read with a given amount (GH-20943) https://github.com/python/cpython/commit/152f0b8beea12e6282d284100b600771b968927a

History
Date	User	Action	Args
2022-04-11 14:59:32	admin	set	github: 85174
2020-06-25 06:31:52	methane	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2020-06-25 06:30:29	miss-islington	set	nosy: + miss-islington messages: + msg372304
2020-06-18 16:46:20	op368	set	messages: + msg371824
2020-06-18 16:39:48	bmerry	set	messages: + msg371822
2020-06-18 16:32:50	op368	set	nosy: + op368 messages: + msg371821
2020-06-18 02:02:26	methane	set	nosy: + methane versions: + Python 3.10, - Python 3.8
2020-06-17 15:41:04	bmerry	set	keywords: + patch stage: patch review pull_requests: + pull_request20124
2020-06-17 12:09:16	bmerry	set	type: performance
2020-06-17 12:08:16	bmerry	create