
bytearray front-slicing not optimized #63287

Closed
pitrou opened this issue Sep 25, 2013 · 44 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@pitrou
Member

pitrou commented Sep 25, 2013

BPO 19087
Nosy @pitrou, @vstinner, @vadmium, @serhiy-storchaka
Files
  • bytea_slice.patch
  • bench_bytearray.py
  • bytea_slice2.patch
  • bench_bytearray2.py
  • bytea_slice3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-10-05.20:54:09.063>
    created_at = <Date 2013-09-25.14:03:50.061>
    labels = ['interpreter-core', 'performance']
    title = 'bytearray front-slicing not optimized'
    updated_at = <Date 2015-04-18.06:14:10.875>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2015-04-18.06:14:10.875>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-10-05.20:54:09.063>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2013-09-25.14:03:50.061>
    creator = 'pitrou'
    dependencies = []
    files = ['31873', '31916', '31926', '31929', '31954']
    hgrepos = []
    issue_num = 19087
    keywords = ['patch']
    message_count = 44.0
    messages = ['198385', '198390', '198391', '198415', '198423', '198424', '198429', '198430', '198440', '198442', '198652', '198654', '198655', '198656', '198657', '198658', '198659', '198660', '198661', '198662', '198663', '198664', '198665', '198669', '198670', '198671', '198718', '198722', '198724', '198728', '198730', '198731', '198736', '198743', '198744', '198911', '198915', '198935', '198937', '198998', '199000', '199001', '199002', '241401']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue19087'
    versions = ['Python 3.4']

    @pitrou
    Member Author

    pitrou commented Sep 25, 2013

    If you delete a slice at the end of a bytearray, it is naturally optimized (thanks to the resizing strategy). However, if you delete a slice at the front of a bytearray, it is not: a memmove() gets done every time.

    $ ./python -m timeit "b=bytearray(10000)" "while b: b[-1:] = b''"
    100 loops, best of 3: 5.67 msec per loop
    $ ./python -m timeit "b=bytearray(10000)" "while b: b[:1] = b''"
    100 loops, best of 3: 6.67 msec per loop
    
    $ ./python -m timeit "b=bytearray(50000)" "while b: b[-1:] = b''"
    10 loops, best of 3: 28.3 msec per loop
    $ ./python -m timeit "b=bytearray(50000)" "while b: b[:1] = b''"
    10 loops, best of 3: 61.1 msec per loop
    
    $ ./python -m timeit "b=bytearray(100000)" "while b: b[-1:] = b''"
    10 loops, best of 3: 59.4 msec per loop
    $ ./python -m timeit "b=bytearray(100000)" "while b: b[:1] = b''"
    10 loops, best of 3: 198 msec per loop

This makes implementing a FIFO using bytearray a bit suboptimal. It shouldn't be very hard to improve.
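The CLI runs above can be reproduced from Python with the `timeit` module. This is only a sketch: absolute numbers vary by machine and CPython version, and on interpreters that include this optimization the gap largely disappears.

```python
import timeit

# Both statements are re-run on every loop, matching the CLI invocations
# above: the bytearray is rebuilt, then drained one byte at a time.
stmt_back = "b = bytearray(10000)\nwhile b: b[-1:] = b''"
stmt_front = "b = bytearray(10000)\nwhile b: b[:1] = b''"

t_back = min(timeit.repeat(stmt_back, number=10, repeat=3))
t_front = min(timeit.repeat(stmt_front, number=10, repeat=3))
print("back-delete : %.4f s" % t_back)
print("front-delete: %.4f s" % t_front)
```

On an unpatched interpreter the front-delete statement is markedly slower, and the ratio grows with the buffer size because each deletion memmove()s the whole remaining buffer.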

    @pitrou pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Sep 25, 2013
    @serhiy-storchaka
    Member

The same is true for a list. List and bytearray are the wrong types for front deletion. I don't think we should increase the size of bytearray, or complicate and slow it down, for such a special purpose.

If you want to implement a FIFO using bytearray more optimally, defer the deletion until the used size is less than half of the allocated size. See for example XMLPullParser.read_events() in Lib/xml/etree/ElementTree.py.
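The deferred-deletion workaround can be sketched as follows. This is illustrative code, not the ElementTree implementation; the class name and the 50% compaction threshold are choices made for this sketch.

```python
class ByteFIFO:
    """Bytes FIFO that defers front deletion until half the buffer is dead."""

    def __init__(self):
        self._buf = bytearray()
        self._pos = 0          # index of the first unread byte

    def write(self, data):
        self._buf += data

    def read(self, n):
        data = bytes(self._buf[self._pos:self._pos + n])
        self._pos += len(data)
        # Compact only when the dead prefix dominates the buffer,
        # so the memmove cost is amortized over many reads.
        if self._pos > len(self._buf) // 2:
            del self._buf[:self._pos]
            self._pos = 0
        return data

    def __len__(self):
        return len(self._buf) - self._pos
```

Usage: `f = ByteFIFO(); f.write(b"abcdef"); f.read(2)` returns `b"ab"` and leaves four bytes buffered, without a memmove per read.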

    @pitrou
    Member Author

    pitrou commented Sep 25, 2013

The same is true for a list. List and bytearray are the wrong types
for front deletion.

    There is no bytedeque().

I don't think we should increase the size of bytearray, or complicate
and slow it down, for such a special purpose.

    I don't think it would really slow it down. It should be a simple
    optimization. And FIFO buffers are quite common when writing parsers
    for network applications.

If you want to implement a FIFO using bytearray more optimally, defer
the deletion until the used size is less than half of the allocated
size. See for example XMLPullParser.read_events() in
Lib/xml/etree/ElementTree.py.

    Of course, I wrote that code. Still, doing it manually is suboptimal
    and cumbersome when it could be done transparently.

    @pitrou
    Member Author

    pitrou commented Sep 25, 2013

Here is a patch. Benchmarks (under Linux, where realloc is fast; the gap may be wider under Windows):

    $ ./python -m timeit "b=bytearray(100000)" "while b: b[:1] = b''"
    -> before: 225 msec per loop
    -> after: 60.4 msec per loop
    
    $ ./python -m timeit "b=bytearray(100000)" "while b: b[:200] = b''"
    -> before: 1.17 msec per loop
    -> after: 350 usec per loop

    @serhiy-storchaka
    Member

    Could you please provide an example which uses this feature?

    @pitrou
    Member Author

    pitrou commented Sep 25, 2013

    Could you please provide an example which uses this feature?

    A generic example is to parse messages out of a TCP stream. Basically
    any protocol transported on TCP needs such a facility, or has to find
    workarounds (which are either suboptimal or complicated).

    Mercurial has another implementation strategy for a similar thing:
    http://selenic.com/repo/hg/file/50d721553198/mercurial/util.py#l935
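A hypothetical sketch of such a framing loop (the function name and the newline-delimited protocol are illustrative, not taken from Mercurial): received chunks are appended to a bytearray and complete messages are popped off the front.

```python
def pop_messages(buf, sep=b"\n"):
    """Yield complete messages from *buf*, deleting each from the front."""
    while True:
        idx = buf.find(sep)
        if idx < 0:
            return                      # incomplete tail stays buffered
        msg = bytes(buf[:idx])
        del buf[:idx + len(sep)]        # front slice deletion: the case
        yield msg                       # this patch makes cheap

buf = bytearray()
buf += b"PING\nPONG\nPAR"              # e.g. two recv() calls from a socket
assert list(pop_messages(buf)) == [b"PING", b"PONG"]
assert bytes(buf) == b"PAR"            # partial message awaits more data
```

Without the optimization, every `del buf[:n]` memmove()s the rest of the buffer, so the loop is quadratic in the amount of buffered data.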

    @vstinner
    Member

    Mercurial has another implementation strategy for a similar thing:
    http://selenic.com/repo/hg/file/50d721553198/mercurial/util.py#l935

    I found an interesting comment in the following issue:

    "I think the trouble we get into is chunkbuffer() creates new large strings by concatenation and causes memory fragmentation. Keeping a list of chunks might be more efficient."

    http://bz.selenic.com/show_bug.cgi?id=1842#c17

    @antoine: Do you know if your patch may reduce the memory fragmentation on "bytearray front-slicing"?

    @pitrou
    Member Author

    pitrou commented Sep 26, 2013

    @antoine: Do you know if your patch may reduce the memory
    fragmentation on "bytearray front-slicing"?

    It reduces the number of allocations so, yes, it can reduce memory
    fragmentation.
    We cannot really use a list of chunks for bytearray since it is
    supposed to be usable as a contiguous buffer (using the buffer API).

    @vstinner
    Member

Could you please add unit tests to check that ob_start is used instead of memmove()?

I didn't find a function for that in _testcapi. I tried to test it using sys.getsizeof(), but the size is not reliable (the bytearray buffer is not always shrunk; it depends on the new size).

The best option is probably to add a new function in _testcapi to get the private attributes: ob_exports, ob_alloc, ob_start, ob_bytes. Using these attributes, it becomes easy to check that the fast paths are taken correctly (e.g. ob_start is increased instead of a new ob_bytes buffer being allocated).

    @pitrou
    Member Author

    pitrou commented Sep 26, 2013

Could you please add unit tests to check that ob_start is used
instead of memmove()?

    How would I do that? Most of the time we don't unit-test performance
    improvements (i.e. there are no tests that list.append() is O(1), for
    example).

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

    Results under Windows:

    • before:

    PCbuild\amd64\python.exe -m timeit "b=bytearray(100000)" "while b: b[-1:] = b''"
    10 loops, best of 3: 74.8 msec per loop

    PCbuild\amd64\python.exe -m timeit "b=bytearray(100000)" "while b: b[:1] = b''"
    10 loops, best of 3: 330 msec per loop

    • after:

    PCbuild\amd64\python.exe -m timeit "b=bytearray(100000)" "while b: b[-1:] = b''"
    10 loops, best of 3: 73.9 msec per loop

    PCbuild\amd64\python.exe -m timeit "b=bytearray(100000)" "while b: b[:1] = b''"
    10 loops, best of 3: 73.8 msec per loop

    @serhiy-storchaka
    Member

    A generic example is to parse messages out of a TCP stream. Basically
    any protocol transported on TCP needs such a facility, or has to find
    workarounds (which are either suboptimal or complicated).

    Could you please show concrete code in which you are going to use this optimization?

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

    Could you please show concrete code in which you are going to use this
    optimization?

    There's no need to "use" this optimization. Any networking code that has to find message boundaries and split on them will benefit. If you don't understand that, I'm not willing to explain it for you.

    @serhiy-storchaka
    Member

Deleting a slice at the front of a bytearray has linear complexity in the size of the bytearray (in any case, del b[:1] is a little faster than b[:1] = b''). I doubt that any performance-critical code does this instead of increasing an index in constant time.

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

Deleting a slice at the front of a bytearray has linear complexity
in the size of the bytearray (in any case, del b[:1] is a little faster
than b[:1] = b''). I doubt that any performance-critical code does this
instead of increasing an index in constant time.

    Increasing an index requires that you compact the bytearray from time to
    time, lest it fills the whole memory.

    @serhiy-storchaka
    Member

    The same is true with your patch.

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

    The same is true with your patch.

    I don't understand. What is "true with my patch"?

    @serhiy-storchaka
    Member

You increase an internal index in the bytearray. A bytearray with a small visible length can consume a lot of hidden memory.

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

You increase an internal index in the bytearray. A bytearray with a
small visible length can consume a lot of hidden memory.

    No, because PyByteArray_Resize() is always called afterwards to ensure
    that the buffer is resized when it gets below 50% usage.

    @serhiy-storchaka
    Member

    No, because PyByteArray_Resize() is always called afterwards to ensure
    that the buffer is resized when it gets below 50% usage.

    I.e. the bytearray is compacted from time to time.

    @vstinner
    Member

@serhiy: "I doubt that any performance-critical code does this instead of increasing an index in constant time."

Sorry, I don't get your point. It's not because Python is inefficient that developers must develop workarounds. Antoine's patch is simple and elegant, and offers better performance for "free".

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

    > No, because PyByteArray_Resize() is always called afterwards to ensure
    > that the buffer is resized when it gets below 50% usage.

    I.e. the bytearray is compacted from time to time.

    Is there a problem with that?

    @serhiy-storchaka
    Member

    Is there a problem with that?

    No more than with msg198657.

Sorry, I don't get your point. It's not because Python is inefficient that developers must develop workarounds.

I'm not sure that "workarounds" are much worse than using this optimization. At least we still haven't seen real code which would benefit from it.

Antoine's patch is simple and elegant, and offers better performance for "free".

It offers better performance for "free" only for suboptimal code which currently has O(N) instead of O(1).

One of the most common use cases for bytearrays is accumulating, and the patch slows down this case.

    $ ./python -m timeit  "b = bytearray(); a = b'x'"  "for i in range(10000): b += a"  "bytes(b)"

    Without patch: 4.3 msec per loop
With patch: 4.62 msec per loop

    @pitrou
    Member Author

    pitrou commented Sep 29, 2013

One of the most common use cases for bytearrays is accumulating,
and the patch slows down this case.

    I see no difference here. You are seeing a 10% slowdown, which is possibly a measurement glitch. The bottom line is that the performance remains approximately the same.

It offers better performance for "free" only for suboptimal code
which currently has O(N) instead of O(1).

    The problem is the "suboptimal code" is also the natural way to write such code. If you know a simple and idiomatic way to write an optimal bytes FIFO, then please share it with us. Otherwise, I will happily ignore your line of argument here.

    @vstinner
    Member

One of the most common use cases for bytearrays is accumulating, and the patch slows down this case.

Please don't use the raw timeit command for micro-benchmarks; it is not reliable. For example, I'm unable to reproduce your "slowdown" (and 7% on a micro-benchmark is not that large).

    My micro-benchmark using my benchmark.py script:

    Common platform:
    Python version: 2.7.3 (default, Aug 9 2012, 17:23:57) [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)]
    Python unicode implementation: UCS-4
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Timer: time.time
    Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
    Timer precision: 954 ns
    CFLAGS: -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv
    Bits: int=32, long=64, long long=64, size_t=64, void*=64

    Platform of campaign original:
    SCM: hg revision=687dd81cee3b tag=tip branch=default date="2013-09-29 22:18 +0200"
    Date: 2013-09-30 00:59:42

    Platform of campaign patched:
    SCM: hg revision=687dd81cee3b+ tag=tip branch=default date="2013-09-29 22:18 +0200"
    Date: 2013-09-30 00:59:07

------------+-------------+------------
Tests       | original    | patched
------------+-------------+------------
10**1 bytes | 859 ns (*)  | 864 ns
10**3 bytes | 55.8 us (*) | 56.4 us
10**5 bytes | 5.42 ms     | 5.41 ms (*)
10**7 bytes | 578 ms      | 563 ms (*)
------------+-------------+------------
Total       | 583 ms      | 569 ms (*)
------------+-------------+------------

    So performances are the same with the patch.

    @vstinner
    Member

    Oh, by the way:

    $ ./python -m timeit "b = bytearray(); a = b'x'" "for i in range(10000): b += a" "bytes(b)"

I'm not sure that this is what you expected: the bytearray is only initialized once (in the "setup" of timeit). You probably want to reinitialize it at each loop.

    @serhiy-storchaka
    Member

I'm not sure that this is what you expected: the bytearray is only initialized once (in the "setup" of timeit). You probably want to reinitialize it at each loop.

There is no "setup" of timeit here. And you forgot bytes(b) after the accumulating loop. bench_bytearray.py shows me a 10% slowdown for the 10**3 and 10**5 bytes tests.

Of course it can be a measurement glitch. On the other hand, there are no measurements which show a positive effect of the patch on real code. Currently we are considering only hypothetical code and can't compare it with alternatives.

    The problem is the "suboptimal code" is also the natural way to write such code. If you know a simple and idiomatic way to write an optimal bytes FIFO, then please share it with us.

Please share this real code written in the "natural way" with us. I can't compare code which I don't see with alternatives.
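Serhiy's correction about timeit can be checked directly: statements passed positionally to `python -m timeit` are all joined into the timed statement, while only `-s` (or `setup=` in the module API) runs once. A sketch of the difference, with illustrative sizes:

```python
import timeit

# Equivalent of the disputed CLI line: all three statements form one
# timed statement, so b is re-created on every loop iteration.
per_iteration = (
    "b = bytearray(); a = b'x'\n"
    "for i in range(10000): b += a\n"
    "bytes(b)"
)
t_fresh = min(timeit.repeat(per_iteration, number=10, repeat=3))

# With setup= (the CLI's -s), the initialization runs only once and b
# keeps growing across iterations -- a different benchmark entirely.
t_growing = min(timeit.repeat("for i in range(10000): b += a\nbytes(b)",
                              setup="b = bytearray(); a = b'x'",
                              number=10, repeat=3))
print(t_fresh, t_growing)
```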

    @pitrou
    Member Author

    pitrou commented Sep 30, 2013

    > The problem is the "suboptimal code" is also the natural way to
    write such code. If you know a simple and idiomatic way to write an
    optimal bytes FIFO, then please share it with us.

Please share this real code written in the "natural way" with us. I
can't compare code which I don't see with alternatives.

    I'm sorry, I don't want to spend more time on such a minor issue. The
    patch is simple and yields good benefits, and Victor seems to have
    approved it, so I'm inclined to ignore your skepticism and commit.

    I would have liked more constructive criticism (perhaps the patch is
    inefficient or suboptimal, etc.), but it seems I'll have to do without
    it.

    @serhiy-storchaka
    Member

I don't understand why you avoid showing any examples which would benefit. Shouldn't optimization patches prove their efficiency?

    @pitrou
    Member Author

    pitrou commented Sep 30, 2013

I don't understand why you avoid showing any examples which would
benefit. Shouldn't optimization patches prove their efficiency?

    Many micro-optimizations get committed without proving themselves on a
    high-level benchmark suite, as long as they produce a big enough
    difference on micro-benchmarks. I think you have noticed that!

    A 4x improvement on a micro-benchmark is very likely to make a
    difference in at least some real-world code (while a 10% improvement
    wouldn't).

    @pitrou
    Member Author

    pitrou commented Sep 30, 2013

    However, the patch had a bug in the resizing logic. Here is a new patch fixing that (+ an additional test).

    @pitrou
    Member Author

    pitrou commented Sep 30, 2013

    Other benchmarks for the new patch (exercising FIFO-like behaviour: some data is appended at one end, and popped at the other):

    timeit -s "b=bytearray(100000);s=b'x'*100" "b[:100] = b''; b.extend(s)"
    -> before: 4.07 usec per loop
    -> after: 0.812 usec per loop

    For comparison, popping from the end (LIFO-like):

    timeit -s "b=bytearray(100000);s=b'x'*100" "b[-100:] = b''; b.extend(s)"
    -> before: 0.894 usec per loop
    -> after: 0.819 usec per loop

    @serhiy-storchaka
    Member

    A 4x improvement on a micro-benchmark is very likely to make a
    difference in at least some real-world code (while a 10% improvement
    wouldn't).

If there is code that deletes from the beginning of a bytearray, I just beg you to demonstrate this code. Perhaps you only intend to write code that will use it. Wonderful. I want to look at it and make sure that the same problem can't be solved just as effectively in another way.

    @vstinner
    Member

It took me some time, but Antoine explained the use case to me on IRC :-) The patch is useful when the bytearray is used as a FIFO: remove at the front, append at the tail. It can be seen as an implementation detail for BufferedReader. Consumer/producer is a common pattern, especially consuming at one end (the front) and producing at the other (the tail).

    @vstinner
    Member

I adapted my micro-benchmark to measure the speedup: bench_bytearray2.py. Results with bytea_slice2.patch:

    Common platform:
    CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
    Python unicode implementation: PEP-393
    Timer: time.perf_counter
    Bits: int=32, long=64, long long=64, size_t=64, void*=64
    Timer precision: 40 ns

    Platform of campaign original:
    Date: 2013-09-30 23:39:31
    Python version: 3.4.0a2+ (default:687dd81cee3b, Sep 30 2013, 23:39:27) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
    SCM: hg revision=687dd81cee3b tag=tip branch=default date="2013-09-29 22:18 +0200"

    Platform of campaign patched:
    Date: 2013-09-30 23:38:55
    Python version: 3.4.0a2+ (default:687dd81cee3b+, Sep 30 2013, 23:30:35) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
    SCM: hg revision=687dd81cee3b+ tag=tip branch=default date="2013-09-29 22:18 +0200"

------------------------+-------------+------------
non regression          | original    | patched
------------------------+-------------+------------
concatenate 10**1 bytes | 1.1 us (*)  | 1.14 us
concatenate 10**3 bytes | 46.9 us     | 46.8 us (*)
concatenate 10**5 bytes | 4.66 ms (*) | 4.71 ms
concatenate 10**7 bytes | 478 ms (*)  | 483 ms
------------------------+-------------+------------
Total                   | 482 ms (*)  | 488 ms
------------------------+-------------+------------

----------------------------+-------------------+-------------
deleting front, append tail | original          | patched
----------------------------+-------------------+-------------
buffer 10**1 bytes          | 639 ns (*)        | 689 ns (+8%)
buffer 10**3 bytes          | 682 ns (*)        | 723 ns (+6%)
buffer 10**5 bytes          | 3.54 us (+428%)   | 671 ns (*)
buffer 10**7 bytes          | 900 us (+107128%) | 840 ns (*)
----------------------------+-------------------+-------------
Total                       | 905 us (+30877%)  | 2.92 us (*)
----------------------------+-------------------+-------------

----------------------------+------------------+------------
Summary                     | original         | patched
----------------------------+------------------+------------
non regression              | 482 ms (*)       | 488 ms
deleting front, append tail | 905 us (+30877%) | 2.92 us (*)
----------------------------+------------------+------------
Total                       | 483 ms (*)       | 488 ms
----------------------------+------------------+------------

@serhiy: I see "zero" difference in the append loop micro-benchmark. I added the final cast to bytes().

    @antoine: Your patch rocks, 30x faster! (I don't care of the 8% slowdown in the nanosecond timing).

    @pitrou
    Member Author

    pitrou commented Oct 3, 2013

    Here is a slightly modified patch implementing Serhiy's suggestion.

    @vstinner
    Member

    vstinner commented Oct 3, 2013

    bytea_slice3.patch looks simpler than bytea_slice2.patch, I prefer it.

    @serhiy-storchaka
    Member

I don't see much sense in the differences between bytea_slice2.patch and bytea_slice3.patch, because bytea_slice3.patch is not smaller or simpler than bytea_slice2.patch.

I meant that you can continue to use self->ob_bytes instead of PyByteArray_AS_STRING(self) if self->ob_bytes points not to the start of the physical buffer, but to the start of the logical byte array. *This* will simplify the patch a lot.

    @pitrou
    Member Author

    pitrou commented Oct 4, 2013

I meant that you can continue to use self->ob_bytes instead of
PyByteArray_AS_STRING(self) if self->ob_bytes points not to the
start of the physical buffer, but to the start of the logical byte
array. *This* will simplify the patch a lot.

    It will make the diff smaller but it will not "simplify" the patch.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 5, 2013

    New changeset 499a96611baa by Antoine Pitrou in branch 'default':
    Issue bpo-19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).
    http://hg.python.org/cpython/rev/499a96611baa

    @pitrou
    Member Author

    pitrou commented Oct 5, 2013

The commit produced compile errors on Windows, but I've since fixed them.

    @pitrou pitrou closed this as completed Oct 5, 2013
    @serhiy-storchaka
    Member

A side effect of this change is that bytearray's data can now be non-aligned. We should examine all places which rely on this.

    @pitrou
    Member Author

    pitrou commented Oct 5, 2013

A side effect of this change is that bytearray's data can now be
non-aligned. We should examine all places which rely on this.

    The C API makes no guarantees as to alignment of private data areas, so
    any external code relying on it would be incorrect.

    The remaining question is whether the bytearray implementation relies on
    it, but I don't think that's the case.

    @vadmium
    Member

    vadmium commented Apr 18, 2015

    I think the changes for this issue are causing the crash and unexpected buffer expansion described in bpo-23985. Appending to a bytearray() can overstep the memory buffer because it doesn’t account for ob_start when checking for resizing. And “del” can expand the allocated memory due to an off-by-one error. Please have a look at my patches. Perhaps there are other operations that also need patching to account for ob_start.
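The class of bug Martin describes can be illustrated with simplified arithmetic (this is a sketch, not the actual C code): once front deletion leaves a dead prefix (ob_start advanced past ob_bytes), the spare tail capacity is `alloc - offset - len`, and a check that ignores the offset overestimates it.

```python
# Hypothetical model of the capacity check; field names mirror the
# bytearray struct members discussed above but the code is illustrative.
def can_append_ignoring_offset(alloc, offset, length, n):
    # Buggy: ignores the dead prefix created by front deletion.
    return length + n <= alloc

def can_append_with_offset(alloc, offset, length, n):
    # Correct: the dead prefix also occupies the physical buffer.
    return offset + length + n <= alloc

# A 16-byte buffer whose first 8 bytes were front-deleted, 8 bytes live:
assert can_append_ignoring_offset(16, 8, 8, 8)      # would overrun the buffer
assert not can_append_with_offset(16, 8, 8, 8)      # correctly refused
```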

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022