BytesIO copy-on-write #66202

dw · 2014-07-17T22:25:37Z

BPO	22003
Nosy	@pitrou, @scoder, @benjaminp, @skrah, @hynek, @dw, @serhiy-storchaka
Files	cow.patch: BytesIO patch against hg 4c2f3240ad65 cow2.patch: cow.patch version 2 against hg 4c2f3240ad65 cow3.patch: cow.patch version 3 against hg 4c2f3240ad65 cow4.patch: cow.patch version 4 against hg 2fc379ce5762 cow5.patch: cow.patch version 5 against hg 96ea15ee8525 cow6.patch: cow.patch version 6 against hg 8c1438c15ed0 whatsnew.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-07-29.23:46:30.037>
created_at = <Date 2014-07-17.22:25:37.100>
labels = ['library', 'performance']
title = 'BytesIO copy-on-write'
updated_at = <Date 2015-03-04.21:04:29.471>
user = 'https://github.com/dw'

bugs.python.org fields:

activity = <Date 2015-03-04.21:04:29.471>
actor = 'dw'
assignee = 'none'
closed = True
closed_date = <Date 2014-07-29.23:46:30.037>
closer = 'pitrou'
components = ['Library (Lib)']
creation = <Date 2014-07-17.22:25:37.100>
creator = 'dw'
dependencies = []
files = ['35988', '36004', '36005', '36016', '36078', '36137', '38058']
hgrepos = []
issue_num = 22003
keywords = ['patch']
message_count = 37.0
messages = ['223383', '223385', '223386', '223401', '223402', '223522', '223526', '223542', '223581', '223582', '223583', '223586', '223588', '223599', '223600', '223611', '223633', '223681', '223682', '223683', '223692', '223707', '223907', '223962', '224121', '224139', '224164', '224168', '224169', '224238', '224273', '224274', '235596', '235619', '236002', '237207', '237212']
nosy_count = 11.0
nosy_names = ['pitrou', 'scoder', 'benjamin.peterson', 'stutzbach', 'skrah', 'python-dev', 'hynek', 'piotr.dobrogost', 'dw', 'serhiy.storchaka', 'kmike']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue22003'
versions = ['Python 3.5']

dw · 2014-07-17T22:25:35Z

This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.

The goal of posting the patch on list was to try and stimulate discussion around the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).

Ignoring this quick implementation, are there any general comments around the approach?

My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.

Also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts it into an IO_SHARED buffer."

There are quite a few interactions with making that work correctly, in particular:

How BytesIO would implement the buffers interface without causing the under-construction Bytes to become readonly
Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move, however with careful measurement of allocator behavior there are various tradeoffs that could be made - e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex though.

Could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.

Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 .. it's not clear to me which is better.

dw · 2014-07-17T22:30:56Z

Submitted contributor agreement. Please consider the demo patch licensed under the Apache 2 licence.

pitrou · 2014-07-17T22:46:38Z

Be careful what happens when the original object is mutable:

>>> b = bytearray(b"abc")
>>> bio = io.BytesIO(b)
>>> b[:] = b"defghi"
>>> bio.getvalue()
b'abc'

I don't know what your patch does in this case.

dw · 2014-07-18T07:12:03Z

Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so perhaps it could just be a little more selective. I think the majority of use cases would still be covered if the sharing behaviour was restricted only to BytesType.

In that case "Py_buffer initialdata" could become a PyObject*, saving a small amount of memory, and allowing reuse of the struct member if BytesIO was also modified to directly write into a private BytesObject

scoder · 2014-07-18T08:42:32Z

Even if there is no way to explicitly request a RO buffer, the Py_buffer struct that you get back actually tells you if it's read-only or not. Shouldn't that be enough to enable this optimisation?

Whether or not implementors of the buffer protocol set this flag correctly is another question, but if not then they need fixing on their side anyway. (And in the vast majority of cases, the implementor will be either CPython or NumPy.)

Also, generally speaking, I think such an optimisation would be nice, even if it only catches some common cases (and doesn't break the others :). It could still copy data if necessary, but try to avoid it if possible.
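A minimal sketch of how the flag mentioned here can be inspected from Python, assuming that `memoryview.readonly` mirrors the `readonly` field of the underlying `Py_buffer`. The helper name `describe` is hypothetical and only serves the illustration.

```python
def describe(obj):
    """Report the read-only flag set by the exporter of obj's buffer."""
    with memoryview(obj) as view:
        return "read-only" if view.readonly else "writable"


print(describe(b"abc"))             # bytes exports a read-only buffer
print(describe(bytearray(b"abc")))  # bytearray exports a writable buffer
```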

dw · 2014-07-20T16:50:34Z

This version is tidied up enough that I think it could be reviewed.

Changes are:

Defer `buf' allocation until __init__, rather than __new__ as was previously done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, whereas previously a valid, empty, open BytesIO was returned. Is this interface change permissible?
Move __init__ guts into a "reinit()", for sharing with __setstate__, which also previously caused an unnecessary copy. Additionally gather up various methods for deallocating buffers into a single "reset()" function, called by reinit(), _dealloc(), and _close()
Per Stefan's suggested approach, reinit() now explicitly checks for a read-only buffer, falling back to silently performing a copy if the returned buffer is read-write. That seems vastly preferable to throwing an exception, which would probably be another interface change.
Call unshare() any place the buffer is about to be modified. If the buffer needs to be made private, it also accepts a size hint indicating how much less/more space the subsequent operation needs, to avoid a redundant reallocation after the unsharing.

Outstanding issues:

I don't understand why buf_size is a size_t, and I'm certain the casting in unshare() is incorrect somehow. Is it simply to avoid signed overflow?
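The unshare() step described in the list of changes can be modelled in pure Python. This is only a toy sketch under stated assumptions, not the C patch: the class `CowBuffer` and its methods are hypothetical names, it shares only exact `bytes` inputs, and it supports appending only.

```python
class CowBuffer:
    """Toy model: share an immutable source until the first mutation."""

    def __init__(self, initial=b""):
        self._size = len(initial)
        if type(initial) is bytes:
            # Immutable source: keep a reference, no copy yet.
            self._shared, self._private = initial, None
        else:
            # Anything else may change behind our back: copy immediately.
            self._shared, self._private = None, bytearray(initial)

    @property
    def is_shared(self):
        return self._private is None

    def _unshare(self, size_hint=0):
        """Make the data private, sized for size_hint additional bytes."""
        if self._private is None:
            # One allocation sized for the pending operation, then one copy.
            private = bytearray(self._size + max(size_hint, 0))
            private[:self._size] = self._shared
            self._shared, self._private = None, private

    def append(self, data):
        self._unshare(size_hint=len(data))
        end = self._size + len(data)
        if end > len(self._private):
            # Grow with zero bytes when the private buffer is too small.
            self._private.extend(bytes(end - len(self._private)))
        self._private[self._size:end] = data
        self._size = end

    def getvalue(self):
        if self._private is None:
            return self._shared
        return bytes(self._private[:self._size])
```

The size hint lets the first write allocate once for the combined size instead of copying and then immediately reallocating.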

dw · 2014-07-20T17:55:54Z

New patch also calls unshare() during getbuffer()

dw · 2014-07-20T22:07:21Z

I'm not sure the "read only buffer" test is strong enough: having a readonly view is not a guarantee that the data in the view cannot be changed through some other means, i.e. it is read-only, not immutable.
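The distinction between read-only and immutable can be shown with a short sketch. It assumes Python 3.8 or later, where `memoryview.toreadonly()` is available; the view is flagged read-only, yet its contents change when the owning bytearray is modified.

```python
data = bytearray(b"abc")
view = memoryview(data).toreadonly()  # read-only view over mutable memory

before = bytes(view)
data[0] = ord("x")  # mutate through the owner, not through the view
after = bytes(view)

print(view.readonly, before, after)
```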

Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?

pitrou · 2014-07-21T15:31:02Z

> Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?

That would certainly sound good enough, to optimize the common case.

Also, it would be nice if you could add some tests to the patch (e.g. to stress the bytearray case). Thank you!

pitrou · 2014-07-21T15:31:47Z

As for whether the "checking for a readonly view" approach is broken, I don't know: that part of the buffer API is still mysterious to me. Stefan, would you have some insight?

skrah · 2014-07-21T16:04:48Z

I think checking for a readonly view is fine. The protocol is this:

1) Use the PyBUF_WRITABLE flag in the request. Then the provider must
   either have a writable buffer or else deny the request entirely.

2) Omit the PyBUF_WRITABLE flag in the request. Then the provider can
   return a writable or a readonly buffer, but must set the readonly flag
   correctly AND export the same type of buffer to ALL consumers.

It is not possible to ask for a readonly buffer explicitly, but the
readonly flag in the Py_buffer struct should always be set correctly.

It is hard to guess the original intention of the PEP-3118 authors, but
in practice "readonly" means "immutable" here. IMO a buffer provider would
be seriously broken if a readonly buffer is mutated in any way.
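The request flags are not directly visible from Python, but the first rule can be observed indirectly. The sketch below assumes that `BytesIO.readinto()` asks its argument for a writable buffer, as it does in CPython; the helper `try_readinto` is a hypothetical name.

```python
import io


def try_readinto(target):
    """Return the number of bytes read, or None if the request was denied."""
    try:
        return io.BytesIO(b"abc").readinto(target)
    except TypeError:
        # bytes cannot satisfy a writable request, so it denies it entirely.
        return None


print(try_readinto(bytearray(3)))  # bytearray can provide a writable buffer
print(try_readinto(bytes(3)))      # bytes refuses the writable request
```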

skrah · 2014-07-21T16:23:01Z

The original wording in the PEP is this:

readonly
--------
an integer variable to hold whether or not the memory is readonly. 1
means the memory is readonly, zero means the memory is writable.

To me this means that a hypothetical compiler that could figure
out at compile time that the readonly flag is set would be allowed
to put the buffer contents into the read-only data section.

dw · 2014-07-21T17:02:09Z

Stefan,

Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.

For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will produce a read-only buffer, yet mutability is entirely at the whim of the operating system. In this case, "immutability" may be apparent for years, until some machine has memory pressure, causing the shared mapping to be flushed, and refreshed from (say, incoherent NFS storage) on next access.

I thought it would be worth auditing some of the most popular types of buffer just to check your interpretation, and this was the first, most obvious candidate.

Any thoughts? I'm leaning heavily toward the Bytes specialization approach

dw · 2014-07-21T18:24:41Z

I'm not sure how much work it would be, or even if it could be made sufficient to solve our problem, but what about extending the buffers interface to include an "int stable" flag, defaulting to 0?

It seems though, that it would just be making the already arcane buffers interface even more arcane simply for the benefit of our specific use case

skrah · 2014-07-21T19:09:33Z

I'm sure many exporters aren't setting the right flags; on the other hand
we already hash memoryviews based on readonly buffers, assuming they are
immutable.
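The hashing behaviour referred to here can be checked from Python. A small sketch, with `view_hash` as a hypothetical helper: a memoryview over a read-only buffer hashes like the equivalent bytes, while hashing a writable view is refused.

```python
def view_hash(obj):
    """Return hash(memoryview(obj)), or None when hashing is refused."""
    try:
        return hash(memoryview(obj))
    except ValueError:
        # Writable memoryviews are not hashable.
        return None


print(view_hash(b"abc") == hash(b"abc"))
print(view_hash(bytearray(b"abc")))
```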

dw · 2014-07-21T22:03:25Z

Hi Stefan,

How does this approach in reinit() look? We first ask for a writable buffer, and if the object obliges, immediately copy it. Otherwise if it refused, ask for a read-only buffer, and this time expect that it will never change.
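The decision described here can be approximated in Python. This is only a sketch under assumptions: Python code cannot pass PyBUF_WRITABLE itself, so the exporter's `readonly` flag stands in for "the writable request was refused", and `acquire` is a hypothetical name rather than anything in the patch.

```python
def acquire(initial):
    """Return (data, shared) for a bytes-like initial value."""
    with memoryview(initial) as view:
        if not view.readonly:
            # The exporter is writable: take a private copy right away.
            return view.tobytes(), False
    # The exporter only offers read-only memory: share it and trust
    # that it will never change.
    return memoryview(initial), True


copied, shared_a = acquire(bytearray(b"abc"))
kept, shared_b = acquire(b"abc")
print(shared_a, shared_b)
```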

This still does not catch the case of mmap.mmap. I am not sure how to deal with mmap.mmap. There is no way for it to export PROT_READ as a read-only buffer without permitted mutation, so the only options seem to either be a) remove buffer support from mmap, or b) blacklist it in bytesio(!).

Antoine, I have padded out the unit tests a little. test_memoryio.py seems the best place for them. Also modified test_sizeof(), although the way this test is designed seems inherently brittle to begin with. Now it is also sensitive to changes in the Py_buffer struct.

Various other changes:

__new__ once again returns a valid, open, empty BytesIO, since the alternative breaks pickling.
reinit() preserves existing BytesIO state until it knows it can succeed, which fixes another of the pickle tests.
setstate() had CHECK_CLOSED() re-added, again for the pickle tests.

Probably the patch guts could be rearranged again, since the definition of the functions is no longer as clear as it was in cow3.patch.

serhiy-storchaka · 2014-07-22T07:15:43Z

See also bpo-15381.

pitrou · 2014-07-22T18:50:02Z

There's also the following code in numpy's getbuffer method:

    /*
     * If a read-only buffer is requested on a read-write array, we return a
     * read-write buffer, which is dubious behavior. But that's why this call
     * is guarded by PyArray_ISWRITEABLE rather than (flags &
     * PyBUF_WRITEABLE).
     */
    if (PyArray_ISWRITEABLE(self)) {
        if (array_might_be_written(self) < 0) {
            goto fail;
        }
    }

... which seems to imply that mmap is not the only one with "dubious behaviour" (?).

skrah · 2014-07-22T19:12:25Z

Actually we have an extra safety net in memory_hash() apart from
the readonly check: We also check if the underlying object is
hashable.

This might be applicable here, too. Unfortunately mmap objects
*are* hashable, leading to some funny results:

>>> import mmap
>>> with open("hello.txt", "wb") as f:
...     f.write(b"xxxxx\n")
...
6
>>> f = open("hello.txt", "r+b")
>>> mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
>>> x = memoryview(mm)
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[120, 120, 120, 120, 120, 10]
>>>
>>> with open("hello.txt", "wb") as g:
...     g.write(b"yyy\n")
...
4
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[121, 121, 121, 10, 0, 0]

memoryview (rightfully) assumes that hashable objects are immutable
and caches the first hash.

I'm not sure why mmap objects are hashable, it looks like a bug
to me.

skrah · 2014-07-22T19:37:26Z

I think the mmap behavior is probably worse than the NumPy example.

I assume that in the example the exporter sets view.readonly=0.
mmap objects set view.readonly=1 and can still be mutated.

dw · 2014-07-22T20:31:39Z

Stefan, I like your new idea. If there isn't some backwards compatibility argument about mmap.mmap being hashable, then it could be considered a bug, and fixed in the same hypothetical future release that includes this BytesIO change. The only cost now is that to test for hashability, we must hash the object, which causes every byte in it to be touched (aka. almost 50% the cost of a copy)

If however we can't fix mmap.mmap due to the interface change (I think that's a silly idea -- Python has never been about letting the user shoot themselves in the foot), then the specialized-for-Bytes approach is almost as good (and perhaps even better, since the resulting concept and structure layout is more aligned with Serhiy's patch in bpo-15381).

tl;dr:

a) mmap.mmap can be fixed - use hashability as strong test for immutability (instead of ugly heuristic involving buffer flags)

undecided: is calling hash(obj) to check for immutability too costly?

b) mmap.mmap can't be fixed - use the Bytes specialization approach.

pitrou · 2014-07-22T23:19:56Z

I don't like the idea of trying to hash the object. It may be a time-consuming operation, while the result will be thrown away.

I think restricting the optimization to bytes objects is fine. We can whitelist other types, such as memoryview.

dw · 2014-07-24T22:56:14Z

This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.
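The observable contract of the Bytes specialization can be sketched from Python: whether or not a copy is deferred internally, the source bytes must never appear to change. The exact-type test in `can_share` is an assumption made for illustration, not a statement about what the patch checks.

```python
import io


def can_share(obj):
    """Hypothetical eligibility test: only exact bytes objects qualify."""
    return type(obj) is bytes


source = b"header:" + b"x" * 16
bio = io.BytesIO(source)
bio.seek(0, io.SEEK_END)
bio.write(b"!")  # first mutation: any sharing with source must end here

print(can_share(source), can_share(bytearray(source)))
print(bio.getvalue() == source + b"!")
```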

Anyone care to glance at least at the general structure?

Tests could probably use a little more work.

Microbenchmark seems fine, at least for construction. It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some example heavy BytesIO consumers (any suggestions? Some popular template engine?)

cpython] ./python.exe -m timeit -s 'import i' 'i.readlines()'
lines: 54471
100 loops, best of 3: 13.3 msec per loop

[23:52:55 eldil!58 cpython] ./python-nocow -m timeit -s 'import i' 'i.readlines()'
lines: 54471
10 loops, best of 3: 19.6 msec per loop

[23:52:59 eldil!59 cpython] cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
    return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))

pitrou · 2014-07-25T16:26:58Z

> It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some example heavy BytesIO consumers (any suggestions? Some popular template engine?)

I don't have any specific suggestions, but you could try the benchmark suite here:
http://hg.python.org/benchmarks

dw · 2014-07-27T12:05:27Z

Hey Antoine,

Thanks for the link. I'm having trouble getting reproducible results at present, and running out of ideas as to what might be causing it. Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.

I could not detect any significant change between runs of the old and new binary that could not be described as noise, given the isolation issues above.

pitrou · 2014-07-27T15:48:36Z

> Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.

That's expected. If the difference doesn't go above 5-10%, then IMO you can pretty much consider that your patch didn't have any impact on those benchmarks.

dw · 2014-07-28T12:30:27Z

Newest patch incorporates Antoine's review comments. The final benchmark results are below. Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge

Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64
Total CPU cores: 4

### call_method_slots ###
Min: 0.329869 -> 0.340487: 1.03x slower
Avg: 0.330512 -> 0.341786: 1.03x slower
Significant (t=-216.69)
Stddev: 0.00067 -> 0.00060: 1.1111x smaller

### call_method_unknown ###
Min: 0.351167 -> 0.343961: 1.02x faster
Avg: 0.351731 -> 0.344580: 1.02x faster
Significant (t=238.89)
Stddev: 0.00033 -> 0.00040: 1.2271x larger

### call_simple ###
Min: 0.257487 -> 0.277366: 1.08x slower
Avg: 0.257942 -> 0.277809: 1.08x slower
Significant (t=-845.64)
Stddev: 0.00029 -> 0.00029: 1.0126x smaller

### etree_generate ###
Min: 0.377985 -> 0.365952: 1.03x faster
Avg: 0.381797 -> 0.369452: 1.03x faster
Significant (t=31.15)
Stddev: 0.00314 -> 0.00241: 1.3017x smaller

### etree_iterparse ###
Min: 0.545668 -> 0.565437: 1.04x slower
Avg: 0.554188 -> 0.576807: 1.04x slower
Significant (t=-17.00)
Stddev: 0.00925 -> 0.00956: 1.0340x larger

### etree_process ###
Min: 0.294158 -> 0.286617: 1.03x faster
Avg: 0.296354 -> 0.288877: 1.03x faster
Significant (t=36.22)
Stddev: 0.00149 -> 0.00143: 1.0435x smaller

### fastpickle ###
Min: 0.458961 -> 0.475828: 1.04x slower
Avg: 0.460226 -> 0.481228: 1.05x slower
Significant (t=-109.38)
Stddev: 0.00082 -> 0.00173: 2.1051x larger

### nqueens ###
Min: 0.305883 -> 0.295858: 1.03x faster
Avg: 0.308085 -> 0.297755: 1.03x faster
Significant (t=90.22)
Stddev: 0.00077 -> 0.00085: 1.0942x larger

### silent_logging ###
Min: 0.074152 -> 0.075818: 1.02x slower
Avg: 0.074345 -> 0.076005: 1.02x slower
Significant (t=-96.29)
Stddev: 0.00013 -> 0.00012: 1.0975x smaller

### spectral_norm ###
Min: 0.355738 -> 0.364419: 1.02x slower
Avg: 0.356691 -> 0.365764: 1.03x slower
Significant (t=-126.23)
Stddev: 0.00054 -> 0.00047: 1.1533x smaller

### telco ###
Min: 0.012152 -> 0.013038: 1.07x slower
Avg: 0.012264 -> 0.013157: 1.07x slower
Significant (t=-83.98)
Stddev: 0.00008 -> 0.00007: 1.0653x smaller

The following not significant results are hidden, use -v to show them:
2to3, call_method, chaos, django_v2, etree_parse, fannkuch, fastunpickle, float, formatted_logging, go, hexiom2, iterative_count, json_dump, json_dump_v2, json_load, mako, mako_v2, meteor_contest, nbody, normal_startup, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, regex_effbot, regex_v8, richards, simple_logging, startup_nosite, threaded_count, tornado_http, unpack_sequence, unpickle_list.
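For reference, the "x slower" figures are the ratio of the new time to the old one. The sketch below recomputes two of them from the Min values in the report and compares them with the 5-10% rule of thumb quoted earlier in the thread; using the upper end (10%) as the cut-off is an assumption, and `slowdown` and `NOISE_BAND` are hypothetical names.

```python
NOISE_BAND = 0.10  # assumed cut-off: upper end of the 5-10% rule of thumb


def slowdown(old, new):
    """Ratio of new time to old time; values above 1.0 mean slower."""
    return new / old


# Min timings copied from the report above.
results = {
    "telco": (0.012152, 0.013038),
    "call_simple": (0.257487, 0.277366),
}

for name, (old, new) in results.items():
    ratio = slowdown(old, new)
    within = abs(ratio - 1.0) <= NOISE_BAND
    print(f"{name}: {ratio:.2f}x, within noise band: {within}")
```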

skrah · 2014-07-28T14:23:08Z

> Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge.

telco.py always varies a lot between runs (up to 10%), even in the
big version "telco.py full":

http://bytereef.org/mpdecimal/quickstart.html#telco-benchmark

Using the average of 10 runs, I can't really see a slowdown.

skrah · 2014-07-28T14:26:39Z

So I wonder why the benchmark suite says that the telco slowdown is significant. :)

dw · 2014-07-29T18:24:10Z

I suspect it's all covered now, but is there anything else I can help with to get this patch pushed along its merry way?

python-dev · 2014-07-29T23:45:47Z

New changeset 79a5fbe2c78f by Antoine Pitrou in branch 'default':
Issue bpo-22003: When initialized from a bytes object, io.BytesIO() now
http://hg.python.org/cpython/rev/79a5fbe2c78f

pitrou · 2014-07-29T23:46:30Z

The latest patch is good indeed. Thank you very much!

kmike · 2015-02-09T08:13:42Z

Shouldn't this fix be mentioned in https://docs.python.org/3.5/whatsnew/3.5.html#optimizations ?

dw · 2015-02-09T15:57:14Z

Attached trivial patch for whatsnew.rst.

python-dev · 2015-02-14T22:45:31Z

New changeset 7ae156f07a90 by Berker Peksag in branch 'default':
Add a whatsnew entry for issue bpo-22003.
https://hg.python.org/cpython/rev/7ae156f07a90

piotrdobrogost · 2015-03-04T20:12:27Z

> This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.

Why does it abandon buffer interface? Because of the following?

> Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.

Shouldn't existing buffer implementations be fixed then, and this feature made to use the buffer interface instead of specializing for Bytes? If so, is there at least any information on this in the comments, so that one wouldn't wonder why there is specialization instead of relying on the buffer interface?

dw · 2015-03-04T21:04:29Z

Hi Piotr,

There wasn't an obvious fix that didn't involve changing the buffer interface itself. There is presently ambiguity in the interface regarding the difference between a "read only" buffer and an "immutable" buffer, which is crucial for its use in this case.

Fixing the interface, followed by every buffer interface user, is a significantly more complicated task than simply optimizing for the most common case, as done here. FWIW I still think this work is worth doing, though I personally don't have time to approach it just now.

We could have approached (and possibly should approach) fixing e.g. mmap.mmap() hashability, possibly causing user code regressions, but even if such cases were fixed it still wouldn't be enough to rely on for the optimization implemented here.

dw mannequin added the stdlib (Python modules in the Lib dir) and performance (Performance or resource usage) labels Jul 17, 2014

pitrou closed this as completed Jul 29, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

YouJiacheng mentioned this issue Aug 13, 2023

memoryview support for torch._C.import_ir_module_from_buffer pytorch/pytorch#107099

Open
