classification
Title: An in-place version of many bytearray methods is needed
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, ezio.melotti, gregory.p.smith, haypo, inada.naoki, irdb, martin.panter, n, serhiy.storchaka, terry.reedy
Priority: normal Keywords: easy, patch

Created on 2013-02-26 20:28 by gregory.p.smith, last changed 2017-01-04 12:27 by serhiy.storchaka.

Files
File name Uploaded Description Edit
issue17301.diff n, 2013-04-13 20:36 review
issue17301-2.diff serhiy.storchaka, 2017-01-04 10:40 review
Messages (13)
msg183082 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-02-26 20:28
bytearray has many methods that return a *new* bytearray object rather than applying the transformation to modify the bytearray in place.  Given that one use of bytearray's is to avoid making extra copies... There should be in-place variants of the following methods:

ljust
lower
lstrip
rjust
rstrip
strip
swapcase
title
translate
upper

especially so for the methods that don't change the length or move data around such as translate, lower, upper, swapcase and title.
msg183292 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-03-01 20:57
I think you need to make more of a case for 'should'. Bytearrays are, well, byte arrays, not text arrays. I could just as well claim that the ascii text operations should not even be there. They are certainly not needed for typical mixed binary-ascii protocol strings, which is what bytearrars were intended for. In any case, this would fatten the api considerably for not too much benefit. I think that this, like many or most enhancement proposals, should best be discussed on python-ideas list before any tracker action.

I consider translate an exception to the above comments. It is a byte operation, not a text operations. bytearry.translate could plausibly have been defined as 'in-place' when added. Most of the other operations are special cases of translate, and can therefore be done with translate, without the limitation to only ascii chars. If one wants to directly uppercase latin-1 encoded bytes without decoding to text, one would need .translate anyway.
msg183302 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-01 23:29
Translate isn't a text operation. (That's the api I wanted).  The others I
only lists for completeness because someone else would complain if I
hadn't. ;)
On Mar 1, 2013 12:57 PM, "Terry J. Reedy" <report@bugs.python.org> wrote:

>
> Terry J. Reedy added the comment:
>
> I think you need to make more of a case for 'should'. Bytearrays are,
> well, byte arrays, not text arrays. I could just as well claim that the
> ascii text operations should not even be there. They are certainly not
> needed for typical mixed binary-ascii protocol strings, which is what
> bytearrars were intended for. In any case, this would fatten the api
> considerably for not too much benefit. I think that this, like many or most
> enhancement proposals, should best be discussed on python-ideas list before
> any tracker action.
>
> I consider translate an exception to the above comments. It is a byte
> operation, not a text operations. bytearry.translate could plausibly have
> been defined as 'in-place' when added. Most of the other operations are
> special cases of translate, and can therefore be done with translate,
> without the limitation to only ascii chars. If one wants to directly
> uppercase latin-1 encoded bytes without decoding to text, one would need
> .translate anyway.
>
> ----------
> components: +Library (Lib) -Interpreter Core
> nosy: +terry.reedy
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17301>
> _______________________________________
>
msg183478 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-04 17:28
How *just and *strip can modify an argument in-place? All other methods except title (lower, swapcase, upper) are partial cases of translate. I.e. you need only in-place translate. However I doubt that difference will be more than several percents. Actually translate uses more tacts per byte than memcpy.
msg183502 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-03-04 22:30
Gregory, if an in-place version of .translate is all you want, only ask for that. Change the title and let us focus on the best case. Forget what others *might* want.

Serhiy, what is 'tacts per byte'?
msg183530 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-05 14:20
Sorry, Terry, I was meaning a CPU time.
msg183531 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-03-05 14:23
+1
msg183534 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-03-05 16:07
The important reasons for this are memory use and cache thrashing, not just
CPU time.
msg186833 - (view) Author: Ned Jackson Lovely (n) * Date: 2013-04-13 20:36
An mtranslate, or "mutating translate" method for bytearrays. My first C code in a long time, so be gentle. The name is bad, but I don't see a better suggestion below.
msg284618 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-01-04 07:40
I prefer ".inplace_translate" to ".mtranslate", but I'm not sure about it's worth enough to add it to bytearray.

In last year, I proposed adding `bytes.frombuffer()` constructor to avoid
unnecessary memory copy.
https://mail.python.org/pipermail/python-dev/2016-October/146668.html

Nick Coghlan againsted for adding methods to builtin type only for low level
programming, and suggested to implement it in 3rd party library first.
https://mail.python.org/pipermail/python-dev/2016-October/146746.html
https://mail.python.org/pipermail/python-dev/2016-October/146763.html
msg284625 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-04 10:40
> The important reasons for this are memory use and cache thrashing, not just
> CPU time.

Memory use is not an issue unless you translate hundreds of megabytes at a time. Cache trashing at the end is performance issue.

The original patch is no longer applied cleanly. Here is a patch synchronized with current sources and with fixed one error. I didn't look at it closely and don't know whether it has other bugs.

Here are results of microbenchmarking.

$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data = data.translate(table)"
Median +- std dev: 1.48 ms +- 0.02 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data[:] = data.translate(table)"
Median +- std dev: 1.60 ms +- 0.09 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data.mtranslate(table)"
Median +- std dev: 1.79 ms +- 0.07 ms

In-place translate don't have benefits. It is slower that translate with creating a new copy, and even is slower that translate with copying a new copy back.
msg284626 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-01-04 10:49
Not every machine has hundreds of MB of memory to waste. Some devices (IoT, MicroPython, Raspberry Pi, OLPC) have limited resources.
msg284634 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-04 12:27
Python interpreter itself takes 20-30 MB of memory. Using it on a machine that has no few tens of free memory doesn't make much sense. MicroPython can be an exception, but it has special modules for low-level programming.

If you need to proceed so large bytearray that creating yet one copy of it is not possible, it can be translated by separate elements or smaller chunks.

$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(len(data)): data[i] = table[data[i]]"
Median +- std dev: 205 ms +- 10 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 100): data[i:i+100] = data[i:i+100].translate(table)"
Median +- std dev: 12.9 ms +- 0.6 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 1000): data[i:i+1000] = data[i:i+1000].translate(table)"
Median +- std dev: 3.12 ms +- 0.22 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 10000): data[i:i+10000] = data[i:i+10000].translate(table)"
Median +- std dev: 1.79 ms +- 0.14 ms

Translating by chunks also works with data that can't be fit in memory at all. You can mmap large file or read and write it by blocks. In-place bytearray method wouldn't help in this case.
History
Date User Action Args
2017-01-04 12:27:45serhiy.storchakasetmessages: + msg284634
2017-01-04 10:55:42hayposetnosy: + haypo
2017-01-04 10:49:01christian.heimessetmessages: + msg284626
2017-01-04 10:40:40serhiy.storchakasetfiles: + issue17301-2.diff

messages: + msg284625
2017-01-04 07:40:39inada.naokisetnosy: + inada.naoki
messages: + msg284618
2017-01-04 05:19:48irdbsetnosy: + irdb
2015-03-24 09:03:52martin.pantersetnosy: + martin.panter
2013-04-13 20:36:20nsetfiles: + issue17301.diff

nosy: + n
messages: + msg186833

keywords: + patch
2013-03-05 16:07:38gregory.p.smithsetmessages: + msg183534
2013-03-05 14:23:35christian.heimessetnosy: + christian.heimes
messages: + msg183531
2013-03-05 14:20:15serhiy.storchakasetmessages: + msg183530
2013-03-04 22:36:05ezio.melottisetnosy: + ezio.melotti
2013-03-04 22:30:26terry.reedysetmessages: + msg183502
2013-03-04 17:28:49serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg183478
2013-03-01 23:29:00gregory.p.smithsetmessages: + msg183302
2013-03-01 20:57:35terry.reedysetnosy: + terry.reedy
messages: + msg183292
components: + Library (Lib), - Interpreter Core
2013-02-26 20:28:43gregory.p.smithcreate