Author: bmerry
Recipients: bmerry
Date: 2019-02-20 12:55:43
Message-id: <1550667343.87.0.589252473583.issue36050@roundup.psfhosted.org>
In-reply-to:
Content:
While investigating poor HTTP read performance I discovered that reading all the data from a response with a content-length goes via _safe_read, which in turn reads in chunks of at most MAXAMOUNT (1 MB) before stitching them together with b"".join (the pattern is sketched below). This can really hurt performance for responses larger than MAXAMOUNT, because
(a) the data has to be copied an additional time; and
(b) the join operation doesn't drop the GIL, so this limits multi-threaded scaling.
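
For reference, here is a simplified standalone sketch of that chunk-and-join pattern (my paraphrase of _safe_read, not a verbatim copy; the names safe_read and fp are just illustrative):

    from http.client import IncompleteRead

    MAXAMOUNT = 1048576  # the 1 MB cap http.client uses per read

    def safe_read(fp, amt):
        # Read exactly amt bytes from fp, at most MAXAMOUNT per iteration,
        # collecting the pieces and joining them at the end.
        chunks = []
        while amt > 0:
            chunk = fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise IncompleteRead(b"".join(chunks), amt)
            chunks.append(chunk)
            amt -= len(chunk)
        # The join allocates and copies the whole body a second time,
        # and CPython holds the GIL for the duration of that copy.
        return b"".join(chunks)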

I'm struggling to see any advantage in doing this chunking - it's not saving memory either (in fact it wastes memory, since the 1 MB pieces and the joined copy are all alive at the moment the join completes).
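
What I would have expected instead is something closer to the following (a hypothetical sketch only, assuming fp is the buffered socket file object, whose read() already loops internally until it has the requested amount or hits EOF):

    from http.client import IncompleteRead

    def safe_read_simple(fp, amt):
        # One read call, no intermediate chunk list, no b"".join copy.
        data = fp.read(amt)
        if len(data) < amt:
            raise IncompleteRead(data, amt - len(data))
        return data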

To give an idea of the performance impact: changing MAXAMOUNT to a very large value took a multithreaded test of mine from 800 MB/s to 2.5 GB/s (at which point the network, not the copying, was the bottleneck).
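
That test is network-bound and multithreaded; to isolate just the extra copy locally, a rough single-threaded approximation (not my actual test, and the sizes are arbitrary) would be:

    import time

    MAXAMOUNT = 1024 * 1024    # 1 MB, as in http.client
    SIZE = 128 * 1024 * 1024   # assumed 128 MB body, purely for illustration

    payload = bytes(SIZE)

    def chunk_and_join(data):
        # Stand-in for _safe_read: slice into 1 MB pieces (mimicking the
        # per-read chunks), then join them back into one bytes object.
        pieces = [data[i:i + MAXAMOUNT] for i in range(0, len(data), MAXAMOUNT)]
        return b"".join(pieces)

    t0 = time.perf_counter()
    chunk_and_join(payload)
    print("chunk + join:", time.perf_counter() - t0, "s")

    t0 = time.perf_counter()
    bytes(memoryview(payload))  # one straight copy of the body, for comparison
    print("single copy: ", time.perf_counter() - t0, "s")
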
History
Date                 User    Action  Args
2019-02-20 12:55:43  bmerry  set     recipients: + bmerry
2019-02-20 12:55:43  bmerry  set     messageid: <1550667343.87.0.589252473583.issue36050@roundup.psfhosted.org>
2019-02-20 12:55:43  bmerry  link    issue36050 messages
2019-02-20 12:55:43  bmerry  create