BytesIO copy-on-write #66202

Closed
dw mannequin opened this issue Jul 17, 2014 · 37 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@dw
Mannequin

dw mannequin commented Jul 17, 2014

BPO 22003
Nosy @pitrou, @scoder, @benjaminp, @skrah, @hynek, @dw, @serhiy-storchaka
Files
  • cow.patch: BytesIO patch against hg 4c2f3240ad65
  • cow2.patch: cow.patch version 2 against hg 4c2f3240ad65
  • cow3.patch: cow.patch version 3 against hg 4c2f3240ad65
  • cow4.patch: cow.patch version 4 against hg 2fc379ce5762
  • cow5.patch: cow.patch version 5 against hg 96ea15ee8525
  • cow6.patch: cow.patch version 6 against hg 8c1438c15ed0
  • whatsnew.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-07-29.23:46:30.037>
    created_at = <Date 2014-07-17.22:25:37.100>
    labels = ['library', 'performance']
    title = 'BytesIO copy-on-write'
    updated_at = <Date 2015-03-04.21:04:29.471>
    user = 'https://github.com/dw'

    bugs.python.org fields:

    activity = <Date 2015-03-04.21:04:29.471>
    actor = 'dw'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-07-29.23:46:30.037>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2014-07-17.22:25:37.100>
    creator = 'dw'
    dependencies = []
    files = ['35988', '36004', '36005', '36016', '36078', '36137', '38058']
    hgrepos = []
    issue_num = 22003
    keywords = ['patch']
    message_count = 37.0
    messages = ['223383', '223385', '223386', '223401', '223402', '223522', '223526', '223542', '223581', '223582', '223583', '223586', '223588', '223599', '223600', '223611', '223633', '223681', '223682', '223683', '223692', '223707', '223907', '223962', '224121', '224139', '224164', '224168', '224169', '224238', '224273', '224274', '235596', '235619', '236002', '237207', '237212']
    nosy_count = 11.0
    nosy_names = ['pitrou', 'scoder', 'benjamin.peterson', 'stutzbach', 'skrah', 'python-dev', 'hynek', 'piotr.dobrogost', 'dw', 'serhiy.storchaka', 'kmike']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue22003'
    versions = ['Python 3.5']

    @dw
    Mannequin Author

    dw mannequin commented Jul 17, 2014

    This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.

    The goal of posting the patch on list was to try and stimulate discussion around the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).

    Ignoring this quick implementation, are there any general comments around the approach?

    My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.

    Also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts it into an IO_SHARED buffer."

    There are quite a few interactions with making that work correctly, in particular:

    • How BytesIO would implement the buffers interface without causing the under-construction Bytes to become readonly

    • Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move, however with careful measurement of allocator behavior there are various tradeoffs that could be made - e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex though.

    Could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.

    Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 .. it's not clear to me which is better.

    @dw dw mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Jul 17, 2014
    @dw
    Mannequin Author

    dw mannequin commented Jul 17, 2014

    Submitted contributor agreement. Please consider the demo patch licensed under the Apache 2 licence.

    @pitrou
    Member

    pitrou commented Jul 17, 2014

    Be careful what happens when the original object is mutable:

    >>> b = bytearray(b"abc")
    >>> bio = io.BytesIO(b)
    >>> b[:] = b"defghi"
    >>> bio.getvalue()
    b'abc'

    I don't know what your patch does in this case.

    @dw
    Mannequin Author

    dw mannequin commented Jul 18, 2014

    Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so perhaps it could just be a little more selective. I think the majority of use cases would still be covered if the sharing behaviour was restricted only to BytesType.

    In that case "Py_buffer initialdata" could become a PyObject*, saving a small amount of memory, and allowing reuse of the struct member if BytesIO was also modified to directly write into a private BytesObject

    @scoder
    Contributor

    scoder commented Jul 18, 2014

    Even if there is no way to explicitly request a RO buffer, the Py_buffer struct that you get back actually tells you if it's read-only or not. Shouldn't that be enough to enable this optimisation?

    Whether or not implementors of the buffer protocol set this flag correctly is another question, but if not then they need fixing on their side anyway. (And in the vast majority of cases, the implementor will be either CPython or NumPy.)

    Also, generally speaking, I think such an optimisation would be nice, even if it only catches some common cases (and doesn't break the others :). It could still copy data if necessary, but try to avoid it if possible.
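The check described above can be observed from Python through `memoryview`, which exposes the exporter's readonly flag. The following is a minimal sketch, not the patch itself; `is_readonly_buffer` and `snapshot` are hypothetical helper names introduced only for illustration.

```python
def is_readonly_buffer(obj):
    """Return True if obj exports a buffer whose readonly flag is set."""
    with memoryview(obj) as view:
        return view.readonly


def snapshot(obj):
    """Share obj when its buffer is read-only, otherwise copy it.

    Returns a (data, shared) pair. This mirrors the "avoid the copy if
    possible, copy if necessary" idea; it is an illustration only.
    """
    if is_readonly_buffer(obj):
        return obj, True
    return bytes(obj), False


print(is_readonly_buffer(b"abc"))             # bytes export a read-only buffer
print(is_readonly_buffer(bytearray(b"abc")))  # bytearray exports a writable one
```

Later comments in this thread question whether a set readonly flag is a strong enough guarantee for sharing.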

    @dw
    Mannequin Author

    dw mannequin commented Jul 20, 2014

    This version is tidied up enough that I think it could be reviewed.

    Changes are:

    • Defer `buf' allocation until __init__, rather than __new__ as was previously done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, whereas previously a valid, empty, open BytesIO was returned. Is this interface change permissible?

    • Move __init__ guts into a "reinit()", for sharing with __setstate__, which also previously caused an unnecessary copy. Additionally gather up various methods for deallocating buffers into a single "reset()" function, called by reinit(), _dealloc(), and _close()

    • Per Stefan's suggested approach, reinit() now explicitly checks for a read-only buffer, falling back to silently performing a copy if the returned buffer is read-write. That seems vastly preferable to throwing an exception, which would probably be another interface change.

    • Call unshare() any place the buffer is about to be modified. If the buffer needs to be made private, it also accepts a size hint indicating how much less/more space the subsequent operation needs, to avoid a redundant reallocation after the unsharing.

    Outstanding issues:

    • I don't understand why buf_size is a size_t, and I'm certain the casting in unshare() is incorrect somehow. Is it simply to avoid signed overflow?
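The share-then-unshare scheme listed above can be modelled in pure Python. This is only a toy sketch under stated assumptions: the class `CowBuffer` and its `_unshare()` method are hypothetical, the real patch is C code inside BytesIO, and the size hint mentioned for unshare() is omitted here.

```python
class CowBuffer:
    """Toy copy-on-write byte buffer (hypothetical, not the C patch)."""

    def __init__(self, initial=b""):
        self._pos = 0
        if type(initial) is bytes:
            # Immutable source: keep a reference instead of copying.
            self._shared = initial
            self._private = None
        else:
            # Mutable or unknown source: fall back to a private copy.
            self._shared = None
            self._private = bytearray(initial)

    def _unshare(self):
        """Make the data private before the first modification."""
        if self._private is None:
            self._private = bytearray(self._shared)
            self._shared = None

    def read(self, size=-1):
        data = self._shared if self._private is None else self._private
        end = len(data) if size < 0 else min(len(data), self._pos + size)
        chunk = bytes(data[self._pos:end])
        self._pos = end
        return chunk

    def write(self, payload):
        self._unshare()  # every mutating operation unshares first
        end = self._pos + len(payload)
        self._private[self._pos:end] = payload
        self._pos = end
        return len(payload)

    def getvalue(self):
        if self._private is None:
            return self._shared  # still shared: no copy was ever made
        return bytes(self._private)
```

Reads never trigger a copy in this sketch; only the first write pays for one, which is the property the patch is after.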

    @dw
    Mannequin Author

    dw mannequin commented Jul 20, 2014

    New patch also calls unshare() during getbuffer()

    @dw
    Mannequin Author

    dw mannequin commented Jul 20, 2014

    I'm not sure the "read only buffer" test is strong enough: having a readonly view is not a guarantee that the data in the view cannot be changed through some other means, i.e. it is read-only, not immutable.

    Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?
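The distinction between "read-only" and "immutable" can be shown with a small sketch. It relies on `memoryview.toreadonly()`, which was added in Python 3.8 and so postdates this discussion; it is used here only as a convenient way to obtain a read-only view over mutable storage.

```python
backing = bytearray(b"abc")

# A read-only view over storage that some other party can still modify.
view = memoryview(backing).toreadonly()
print(view.readonly)        # the view itself refuses writes

try:
    view[0] = ord("x")      # writing through the view is rejected
except TypeError as exc:
    print("rejected:", exc)

backing[0] = ord("x")       # but the owner can still change the bytes
print(bytes(view))          # the "read-only" view observes the change
```

A consumer that shared such a buffer instead of copying it would see its contents change underneath it.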

    @pitrou
    Member

    pitrou commented Jul 21, 2014

    Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?

    That would certainly sound good enough, to optimize the common case.

    Also, it would be nice if you could add some tests to the patch (e.g. to stress the bytearray case). Thank you!
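The kind of tests requested here might look like the sketch below. The function names are hypothetical and these are not the tests that were added to test_memoryio.py; they only check behaviour that must hold whether or not the source object is shared internally.

```python
import io


def check_bytearray_source_is_copied():
    # Mutating a bytearray after construction must not affect the BytesIO.
    source = bytearray(b"abc")
    bio = io.BytesIO(source)
    source[:] = b"defghi"
    assert bio.getvalue() == b"abc"


def check_bytes_source_is_not_modified():
    # Writing to the BytesIO must never change the original bytes object.
    source = b"abc"
    bio = io.BytesIO(source)
    bio.write(b"X")
    assert source == b"abc"
    assert bio.getvalue() == b"Xbc"


def check_getbuffer_does_not_alias_source():
    # The writable view from getbuffer() must not alias the original bytes.
    source = b"abc"
    bio = io.BytesIO(source)
    with bio.getbuffer() as view:
        view[0] = ord("X")
    assert source == b"abc"
    assert bio.getvalue() == b"Xbc"


for check in (
    check_bytearray_source_is_copied,
    check_bytes_source_is_not_modified,
    check_getbuffer_does_not_alias_source,
):
    check()
```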

    @pitrou
    Member

    pitrou commented Jul 21, 2014

    As for whether the "checking for a readonly view" approach is broken, I don't know: that part of the buffer API is still mysterious to me. Stefan, would you have some insight?

    @skrah
    Mannequin

    skrah mannequin commented Jul 21, 2014

    I think checking for a readonly view is fine. The protocol is this:

    1. Use the PyBUF_WRITABLE flag in the request. Then the provider must
      either have a writable buffer or else deny the request entirely.

    2. Omit the PyBUF_WRITABLE flag in the request. Then the provider can
      return a writable or a readonly buffer, but must set the readonly flag
      correctly AND export the same type of buffer to ALL consumers.

    It is not possible to ask for a readonly buffer explicitly, but the
    readonly flag in the Py_buffer struct should always be set correctly.

    It is hard to guess the original intention of the PEP-3118 authors, but
    in practice "readonly" means "immutable" here. IMO a buffer provider would
    be seriously broken if a readonly buffer is mutated in any way.
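Case 1 of the protocol above is visible from Python whenever an API needs a writable buffer. In the sketch below, `readinto()` stands in for a consumer that requires write access: a bytearray grants it, while a bytes object denies the request, which surfaces as a TypeError.

```python
import io

source = io.BytesIO(b"xyz")

# A bytearray can satisfy a request for a writable buffer.
target = bytearray(3)
filled = source.readinto(target)
print(filled, target)

# A bytes object cannot, so the request is denied outright.
source.seek(0)
try:
    source.readinto(b"abc")
except TypeError:
    denied = True
else:
    denied = False
print("writable request denied for bytes:", denied)
```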

    @skrah
    Mannequin

    skrah mannequin commented Jul 21, 2014

    The original wording in the PEP is this:

    readonly
    --------
    an integer variable to hold whether or not the memory is readonly. 1
    means the memory is readonly, zero means the memory is writable.

    To me this means that a hypothetical compiler that could figure
    out at compile time that the readonly flag is set would be allowed
    to put the buffer contents into the read-only data section.

    @dw
    Mannequin Author

    dw mannequin commented Jul 21, 2014

    Stefan,

    Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.

    For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will produce a read-only buffer, yet mutability is entirely at the whim of the operating system. In this case, "immutability" may be apparent for years, until some machine has memory pressure, causing the shared mapping to be flushed, and refreshed from (say, incoherent NFS storage) on next access.

    I thought it would be worth auditing some of the most popular types of buffer just to check your interpretation, and this was the first, most obvious candidate.

    Any thoughts? I'm leaning heavily toward the Bytes specialization approach

    @dw
    Mannequin Author

    dw mannequin commented Jul 21, 2014

    I'm not sure how much work it would be, or even if it could be made sufficient to solve our problem, but what about extending the buffers interface to include an "int stable" flag, defaulting to 0?

    It seems though, that it would just be making the already arcane buffers interface even more arcane simply for the benefit of our specific use case

    @skrah
    Mannequin

    skrah mannequin commented Jul 21, 2014

    I'm sure many exporters aren't setting the right flags; on the other hand
    we already hash memoryviews based on readonly buffers, assuming they are
    immutable.
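The hashing behaviour mentioned here is easy to observe: a memoryview over a read-only exporter such as bytes is hashable and hashes like the underlying bytes, while a memoryview over a writable exporter refuses to be hashed. A short sketch:

```python
data = b"abc"

# Read-only view: hashable, and consistent with hashing the bytes directly.
readonly_hash = hash(memoryview(data))
print(readonly_hash == hash(data))

# Writable view: hashing is refused because the contents may change.
try:
    hash(memoryview(bytearray(data)))
except ValueError:
    writable_hashable = False
else:
    writable_hashable = True
print("writable view hashable:", writable_hashable)
```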

    @dw
    Mannequin Author

    dw mannequin commented Jul 21, 2014

    Hi Stefan,

    How does this approach in reinit() look? We first ask for a writable buffer, and if the object obliges, immediately copy it. Otherwise if it refused, ask for a read-only buffer, and this time expect that it will never change.

    This still does not catch the case of mmap.mmap. I am not sure how to deal with mmap.mmap. There is no way for it to export PROT_READ as a read-only buffer without permitted mutation, so the only options seem to either be a) remove buffer support from mmap, or b) blacklist it in bytesio(!).

    Antoine, I have padded out the unit tests a little. test_memoryio.py seems the best place for them. Also modified test_sizeof(), although the way this test is designed seems inherently brittle to begin with. Now it is also sensitive to changes in Py_buffer struct.

    Various other changes:

    • __new__ once again returns a valid, open, empty BytesIO, since the alternative breaks pickling.

    • reinit() preserves existing BytesIO state until it knows it can succeed, which fixes another of the pickle tests.

    • setstate() had CHECK_CLOSED() re-added, again for the pickle tests.

    Probably the patch guts could be rearranged again, since the definition of the functions is no longer as clear as it was in cow3.patch.

    @serhiy-storchaka
    Member

    See also bpo-15381.

    @pitrou
    Member

    pitrou commented Jul 22, 2014

    There's also the following code in numpy's getbuffer method:

        /*
         * If a read-only buffer is requested on a read-write array, we return a
         * read-write buffer, which is dubious behavior. But that's why this call
         * is guarded by PyArray_ISWRITEABLE rather than (flags &
         * PyBUF_WRITEABLE).
         */
        if (PyArray_ISWRITEABLE(self)) {
            if (array_might_be_written(self) < 0) {
                goto fail;
            }
        }

    ... which seems to imply that mmap is not the only one with "dubious behaviour" (?).

    @skrah
    Mannequin

    skrah mannequin commented Jul 22, 2014

    Actually we have an extra safety net in memory_hash() apart from
    the readonly check: We also check if the underlying object is
    hashable.

    This might be applicable here, too. Unfortunately mmap objects
    *are* hashable, leading to some funny results:

    >>> import mmap
    >>> with open("hello.txt", "wb") as f:
    ...     f.write(b"xxxxx\n")
    ...
    6
    >>> f = open("hello.txt", "r+b")
    >>> mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    >>> x = memoryview(mm)
    >>> hash(mm)
    -9223363309538046107
    >>> hash(x)
    -3925142568057840789
    >>> x.tolist()
    [120, 120, 120, 120, 120, 10]
    >>>
    >>> with open("hello.txt", "wb") as g:
    ...     g.write(b"yyy\n")
    ...
    4
    >>> hash(mm)
    -9223363309538046107
    >>> hash(x)
    -3925142568057840789
    >>> x.tolist()
    [121, 121, 121, 10, 0, 0]

    memoryview (rightfully) assumes that hashable objects are immutable
    and caches the first hash.

    I'm not sure why mmap objects are hashable, it looks like a bug
    to me.

    @skrah
    Mannequin

    skrah mannequin commented Jul 22, 2014

    I think the mmap behavior is probably worse than the NumPy example.

    I assume that in the example the exporter sets view.readonly=0.
    mmap objects set view.readonly=1 and can still be mutated.

    @dw
    Mannequin Author

    dw mannequin commented Jul 22, 2014

    Stefan, I like your new idea. If there isn't some backwards compatibility argument about mmap.mmap being hashable, then it could be considered a bug, and fixed in the same hypothetical future release that includes this BytesIO change. The only cost now is that to test for hashability, we must hash the object, which causes every byte in it to be touched (aka. almost 50% the cost of a copy)

    If however we can't fix mmap.mmap due to the interface change (I think that's a silly idea -- Python has never been about letting the user shoot themselves in the foot), then the specialized-for-Bytes approach is almost as good (and perhaps even better, since the resulting concept and structure layout is more aligned with Serhiy's patch in bpo-15381).

    tl;dr:

    a) mmap.mmap can be fixed - use hashability as strong test for immutability (instead of ugly heuristic involving buffer flags)

    • undecided: is calling hash(obj) to check for immutability too costly?

    b) mmap.mmap can't be fixed - use the Bytes specialization approach.

    @pitrou
    Member

    pitrou commented Jul 22, 2014

    I don't like the idea of trying to hash the object. It may be a time-consuming operation, while the result will be thrown away.

    I think restricting the optimization to bytes objects is fine. We can whitelist other types, such as memoryview.

    @dw
    Mannequin Author

    dw mannequin commented Jul 24, 2014

    This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.

    Anyone care to glance at least at the general structure?

    Tests could probably use a little more work.

    Microbenchmark seems fine, at least for construction. It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some example heavy BytesIO consumers (any suggestions? Some popular template engine?)

    cpython] ./python.exe -m timeit -s 'import i' 'i.readlines()'
    lines: 54471
    100 loops, best of 3: 13.3 msec per loop

    [23:52:55 eldil!58 cpython] ./python-nocow -m timeit -s 'import i' 'i.readlines()'
    lines: 54471
    10 loops, best of 3: 19.6 msec per loop

    [23:52:59 eldil!59 cpython] cat i.py
    import io
    word = b'word'
    line = (word * int(79/len(word))) + b'\n'
    ar = line * int((4 * 1048576) / len(line))
    def readlines():
        return len(list(io.BytesIO(ar)))
    print('lines: %s' % (readlines(),))
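For anyone wanting to repeat the measurement without a separate `i.py` module, the microbenchmark can be written as one self-contained script. This is a sketch, not the script used above: the constant names and the `count_lines()` function are hypothetical, the payload is built the same way, and absolute timings will differ by machine and interpreter build.

```python
import io
import timeit

WORD = b"word"
# 19 repetitions of the word plus a newline: a 77-byte line.
LINE = WORD * (79 // len(WORD)) + b"\n"
# Roughly 4 MiB of identical lines, held in a single bytes object.
PAYLOAD = LINE * ((4 * 1048576) // len(LINE))


def count_lines():
    # Constructing BytesIO from bytes is the operation the patch speeds up.
    return len(list(io.BytesIO(PAYLOAD)))


if __name__ == "__main__":
    print("lines: %s" % (count_lines(),))
    runs = timeit.repeat(count_lines, number=10, repeat=3)
    print("best of 3: %.1f msec per loop" % (min(runs) / 10 * 1000))
```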

    @pitrou
    Member

    pitrou commented Jul 25, 2014

    It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some example heavy BytesIO consumers (any suggestions? Some popular template engine?)

    I don't have any specific suggestions, but you could try the benchmark suite here:
    http://hg.python.org/benchmarks

    @dw
    Mannequin Author

    dw mannequin commented Jul 27, 2014

    Hey Antoine,

    Thanks for the link. I'm having trouble getting reproducible results at present, and running out of ideas as to what might be causing it. Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.

    I could not detect any significant change between runs of the old and new binary that could not be described as noise, given the isolation issues above.

    @pitrou
    Member

    pitrou commented Jul 27, 2014

    Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.

    That's expected. If the difference doesn't go above 5-10%, then IMO you can pretty much consider that your patch didn't have any impact on those benchmarks.

    @dw
    Mannequin Author

    dw mannequin commented Jul 28, 2014

    Newest patch incorporates Antoine's review comments. The final benchmark results are below. Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge

    Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64
    Total CPU cores: 4

    ### call_method_slots ###
    Min: 0.329869 -> 0.340487: 1.03x slower
    Avg: 0.330512 -> 0.341786: 1.03x slower
    Significant (t=-216.69)
    Stddev: 0.00067 -> 0.00060: 1.1111x smaller

    ### call_method_unknown ###
    Min: 0.351167 -> 0.343961: 1.02x faster
    Avg: 0.351731 -> 0.344580: 1.02x faster
    Significant (t=238.89)
    Stddev: 0.00033 -> 0.00040: 1.2271x larger

    ### call_simple ###
    Min: 0.257487 -> 0.277366: 1.08x slower
    Avg: 0.257942 -> 0.277809: 1.08x slower
    Significant (t=-845.64)
    Stddev: 0.00029 -> 0.00029: 1.0126x smaller

    ### etree_generate ###
    Min: 0.377985 -> 0.365952: 1.03x faster
    Avg: 0.381797 -> 0.369452: 1.03x faster
    Significant (t=31.15)
    Stddev: 0.00314 -> 0.00241: 1.3017x smaller

    ### etree_iterparse ###
    Min: 0.545668 -> 0.565437: 1.04x slower
    Avg: 0.554188 -> 0.576807: 1.04x slower
    Significant (t=-17.00)
    Stddev: 0.00925 -> 0.00956: 1.0340x larger

    ### etree_process ###
    Min: 0.294158 -> 0.286617: 1.03x faster
    Avg: 0.296354 -> 0.288877: 1.03x faster
    Significant (t=36.22)
    Stddev: 0.00149 -> 0.00143: 1.0435x smaller

    ### fastpickle ###
    Min: 0.458961 -> 0.475828: 1.04x slower
    Avg: 0.460226 -> 0.481228: 1.05x slower
    Significant (t=-109.38)
    Stddev: 0.00082 -> 0.00173: 2.1051x larger

    ### nqueens ###
    Min: 0.305883 -> 0.295858: 1.03x faster
    Avg: 0.308085 -> 0.297755: 1.03x faster
    Significant (t=90.22)
    Stddev: 0.00077 -> 0.00085: 1.0942x larger

    ### silent_logging ###
    Min: 0.074152 -> 0.075818: 1.02x slower
    Avg: 0.074345 -> 0.076005: 1.02x slower
    Significant (t=-96.29)
    Stddev: 0.00013 -> 0.00012: 1.0975x smaller

    ### spectral_norm ###
    Min: 0.355738 -> 0.364419: 1.02x slower
    Avg: 0.356691 -> 0.365764: 1.03x slower
    Significant (t=-126.23)
    Stddev: 0.00054 -> 0.00047: 1.1533x smaller

    ### telco ###
    Min: 0.012152 -> 0.013038: 1.07x slower
    Avg: 0.012264 -> 0.013157: 1.07x slower
    Significant (t=-83.98)
    Stddev: 0.00008 -> 0.00007: 1.0653x smaller

    The following not significant results are hidden, use -v to show them:
    2to3, call_method, chaos, django_v2, etree_parse, fannkuch, fastunpickle, float, formatted_logging, go, hexiom2, iterative_count, json_dump, json_dump_v2, json_load, mako, mako_v2, meteor_contest, nbody, normal_startup, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, regex_effbot, regex_v8, richards, simple_logging, startup_nosite, threaded_count, tornado_http, unpack_sequence, unpickle_list.

    @skrah
    Mannequin

    skrah mannequin commented Jul 28, 2014

    Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge.

    telco.py always varies a lot between runs (up to 10%), even in the
    big version "telco.py full":

    http://bytereef.org/mpdecimal/quickstart.html#telco-benchmark

    Using the average of 10 runs, I can't really see a slowdown.

    @skrah
    Mannequin

    skrah mannequin commented Jul 28, 2014

    So I wonder why the benchmark suite says that the telco slowdown is significant. :)
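One plausible reading of the "Significant" label is that it reflects a two-sample t-score within a single comparison, which grows large whenever the per-run standard deviation is small relative to the difference in means, regardless of how much the benchmark drifts between separate invocations. The sketch below recomputes such a score from the telco summary figures in the report. The sample count of 100 is an assumption (the report does not state it), and the formula is a generic equal-sample-size t-score, not necessarily the exact one the benchmark suite uses.

```python
import math


def t_score(mean_a, mean_b, stddev_a, stddev_b, samples):
    """Two-sample t-score from summary statistics, equal sample sizes."""
    standard_error = math.sqrt((stddev_a ** 2 + stddev_b ** 2) / samples)
    return (mean_a - mean_b) / standard_error


# telco figures from the report; samples=100 is assumed, not reported.
telco_t = t_score(0.012264, 0.013157, 0.00008, 0.00007, samples=100)
print("telco t-score: %.1f" % telco_t)
```

Under that assumption the score lands close to the reported value, which would explain why a difference smaller than the run-to-run variation is still flagged.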

    @dw
    Mannequin Author

    dw mannequin commented Jul 29, 2014

    I suspect it's all covered now, but is there anything else I can help with to get this patch pushed along its merry way?

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 29, 2014

    New changeset 79a5fbe2c78f by Antoine Pitrou in branch 'default':
    Issue bpo-22003: When initialized from a bytes object, io.BytesIO() now
    http://hg.python.org/cpython/rev/79a5fbe2c78f

    @pitrou
    Member

    pitrou commented Jul 29, 2014

    The latest patch is good indeed. Thank you very much!

    @pitrou pitrou closed this as completed Jul 29, 2014
    @kmike
    Mannequin

    kmike mannequin commented Feb 9, 2015

    Shouldn't this fix be mentioned in https://docs.python.org/3.5/whatsnew/3.5.html#optimizations ?

    @dw
    Mannequin Author

    dw mannequin commented Feb 9, 2015

    Attached trivial patch for whatsnew.rst.

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 14, 2015

    New changeset 7ae156f07a90 by Berker Peksag in branch 'default':
    Add a whatsnew entry for issue bpo-22003.
    https://hg.python.org/cpython/rev/7ae156f07a90

    @piotrdobrogost
    Mannequin

    piotrdobrogost mannequin commented Mar 4, 2015

    This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.

    Why does it abandon buffer interface? Because of the following?

    Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.

    Shouldn't existing buffer implementations be fixed then, and this feature made to use the buffer interface instead of specializing for Bytes? If so, is there at least any information on this in the comments so that one wouldn't wonder why there is specialization instead of relying on the buffer interface?

    @dw
    Mannequin Author

    dw mannequin commented Mar 4, 2015

    Hi Piotr,

    There wasn't an obvious fix that didn't involve changing the buffer interface itself. There is presently ambiguity in the interface regarding the difference between a "read only" buffer and an "immutable" buffer, which is crucial for its use in this case.

    Fixing the interface, followed by every buffer interface user, is a significantly more complicated task than simply optimizing for the most common case, as done here. FWIW I still think this work is worth doing, though I personally don't have time to approach it just now.

    We could (and possibly should) approach fixing e.g. mmap.mmap() hashability, possibly causing user code regressions, but even if such cases were fixed it still wouldn't be enough to rely on for the optimization implemented here.
