
Author dw
Recipients dw
Date 2014-07-17.22:25:35
Message-id <1405635937.15.0.406925083031.issue22003@psf.upfronthosting.co.za>
In-reply-to
Content
This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.
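One rough way to observe the copy under discussion is to measure how much memory is allocated when a BytesIO is constructed from a large bytes object. The sketch below uses tracemalloc for that. The helper name extra_bytes_for_bytesio is made up for illustration, and the number it reports depends on the interpreter version and build, so no particular output is claimed.

```python
import io
import tracemalloc


def extra_bytes_for_bytesio(size):
    """Return the traced memory growth caused by io.BytesIO(data).

    Hypothetical helper for illustration only; it is not part of the patch.
    """
    data = b"x" * size              # source object, allocated before tracing
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    stream = io.BytesIO(data)       # a copy of `data`, if any, happens here
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert stream.getvalue() == data  # contents are the same either way
    return after - before


if __name__ == "__main__":
    size = 16 * 1024 * 1024
    print("source size:", size)
    print("extra bytes allocated:", extra_bytes_for_bytesio(size))
```

If the reported growth is close to the source size, the constructor copied the data; a value near zero suggests the buffer was shared.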

The goal of posting the patch on list was to try and stimulate discussion around the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).

Ignoring this quick implementation, are there any general comments around the approach?

My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.

Also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts it into an IO_SHARED buffer."
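To make the sharing idea concrete, here is a toy pure-Python model, not the patch itself. The class name SharedBytesIO and its attributes are hypothetical. It shares the source bytes until the first write, and after getvalue() it goes back to the shared state. In pure Python the hand-off in getvalue() still costs a copy, whereas the C version described above would presumably return the private PyBytes directly.

```python
class SharedBytesIO:
    """Toy model of the approach (hypothetical, for illustration only)."""

    def __init__(self, initial=b""):
        # Read side: keep a reference to the caller's bytes, no copy.
        self._shared = initial if isinstance(initial, bytes) else bytes(initial)
        self._private = None        # private mutable buffer, created lazily
        self._pos = 0

    def _unshare(self):
        # Copy only when the stream is about to be modified.
        if self._private is None:
            self._private = bytearray(self._shared)
            self._shared = None

    def read(self, n=-1):
        buf = self._shared if self._private is None else self._private
        end = len(buf) if n < 0 else min(len(buf), self._pos + n)
        chunk = bytes(buf[self._pos:end])
        self._pos = end
        return chunk

    def write(self, data):
        self._unshare()
        end = self._pos + len(data)
        # Slice assignment overwrites in place and extends past the end.
        self._private[self._pos:end] = data
        self._pos = end
        return len(data)

    def getvalue(self):
        # Write side: expose the buffer and switch back to the shared state.
        # bytes(...) copies here; a C implementation could hand off instead.
        if self._private is not None:
            self._shared = bytes(self._private)
            self._private = None
        return self._shared


if __name__ == "__main__":
    source = b"hello world"
    stream = SharedBytesIO(source)
    print(stream.getvalue() is source)   # shared until the first write
    stream.read(5)
    stream.write(b"!")
    print(stream.getvalue())
```

The model also hints at the concern about keeping large objects alive: while the stream is in the shared state, it holds a reference to the caller's bytes object.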

There are quite a few interactions with making that work correctly, in particular:

* How BytesIO would implement the buffer interface without causing the under-construction Bytes to become readonly

* Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move. However, with careful measurement of allocator behavior there are various tradeoffs that could be made, e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex though.

Could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.
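As a rough illustration of the overallocation arithmetic, the sketch below computes a buffer capacity with bounded slack. It reads "up to 10% or 1-2kb" as "10% of the requested size, capped at 2048 bytes", which is an assumption. The function name and parameters are hypothetical and do not correspond to any existing CPython API.

```python
def overallocated_size(needed, percent=10, cap=2048):
    """Return a capacity with slack of at most `percent` of `needed`,
    never exceeding `cap` bytes (both limits are assumed values)."""
    slack = min(needed * percent // 100, cap)
    return needed + slack


if __name__ == "__main__":
    for needed in (100, 1000, 100000, 10000000):
        capacity = overallocated_size(needed)
        # The slack is what would be written off when getvalue() truncates
        # the object without telling the allocator.
        print(needed, capacity, capacity - needed)
```

With these limits the space written off never exceeds 2048 bytes per buffer, regardless of how large the buffer grows.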

Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 . It's not clear to me which is better.
History
Date                 User  Action  Args
2014-07-17 22:25:39  dw    set     recipients: + dw
2014-07-17 22:25:37  dw    set     messageid: <1405635937.15.0.406925083031.issue22003@psf.upfronthosting.co.za>
2014-07-17 22:25:37  dw    link    issue22003 messages
2014-07-17 22:25:36  dw    create