
Author: skip.montanaro
Recipients: ezio.melotti, nadeem.vawda, neologix, pitrou, serhiy.storchaka, skip.montanaro, tiwilliam
Date: 2014-04-28.19:24:25
Message-id: <CANc-5UynFsAaFWz94xUMvz6vvFvWTGR6gDsG8MaP58U9pwtPMA@mail.gmail.com>
In-reply-to: <1398711580.2393.7.camel@fsol>
Content
On Mon, Apr 28, 2014 at 1:59 PM, Antoine Pitrou <report@bugs.python.org> wrote:
> Well, I think that compressed files in general would benefit from a
> larger buffer size than plain binary I/O, but that's just a hunch.

I agree. When writing my patch, my (perhaps specious) thinking went like this:

* We have a big-ass file, so we compress it.
* On average, when seeking to another point in that file, we probably
want to go a long way.
* It's possible that operating system read-ahead will make raw read
performance relatively high.
* That puts more of the burden for efficiency on the Python code.
* Larger buffer sizes will reduce the amount of Python bytecode that
must be executed (a rough way to measure this is sketched below).
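
For what it's worth, a quick way to test that hunch is to time
whole-file reads at a few chunk sizes. This is only a minimal sketch,
and "big.bz2" is a placeholder for any large compressed file:

    import bz2
    import time

    def time_read(path, chunk_size):
        # Read the whole compressed file in fixed-size chunks and
        # return the elapsed wall-clock time in seconds.
        start = time.perf_counter()
        with bz2.BZ2File(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return time.perf_counter() - start

    # "big.bz2" is a placeholder; point this at any large bz2 file.
    for size in (4096, 8192, 16384, 65536):
        print("%6d bytes: %.3fs" % (size, time_read("big.bz2", size)))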

So, while my filesystem's block size of 8192 bytes might represent
some sort of "optimal" chunk size on paper, in practice I think
operating system read-ahead and the post-read processing of the bytes
will tend to favor larger chunks. Hence my naive choice of 16k bytes
for _CHUNK_SIZE in my patch.
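
To make the intent concrete, here is a minimal sketch of the idea: a
hypothetical _skip_forward helper, not the code from the patch, that
discards decompressed data in fixed-size chunks when seeking forward:

    _CHUNK_SIZE = 16384  # 16k, as in the patch

    def _skip_forward(fileobj, nbytes, chunk_size=_CHUNK_SIZE):
        # Hypothetical helper, not the patch itself: discard nbytes of
        # decompressed data by reading fixed-size chunks. Fewer, larger
        # reads mean fewer trips through Python-level code per byte
        # skipped.
        while nbytes > 0:
            data = fileobj.read(min(chunk_size, nbytes))
            if not data:
                break  # hit EOF before the requested point
            nbytes -= len(data)

The chunk size caps how much memory each read consumes while keeping
the per-byte Python overhead low.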

Skip
History
Date                 User            Action  Args
2014-04-28 19:24:26  skip.montanaro  set     recipients: + skip.montanaro, pitrou, nadeem.vawda, ezio.melotti, neologix, serhiy.storchaka, tiwilliam
2014-04-28 19:24:26  skip.montanaro  link    issue20962 messages
2014-04-28 19:24:25  skip.montanaro  create