Issue 17301: An in-place version of many bytearray methods is needed

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/61503

classification

Title:	An in-place version of many bytearray methods is needed
Type:	enhancement	Stage:	needs patch
Components:	Library (Lib)	Versions:	Python 3.4

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Philip Dye, THRlWiTi, christian.heimes, ezio.melotti, gregory.p.smith, martin.panter, methane, n, serhiy.storchaka, terry.reedy, vstinner
Priority:	normal	Keywords:	easy, patch

Created on 2013-02-26 20:28 by gregory.p.smith, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
issue17301.diff	n, 2013-04-13 20:36		review
issue17301-2.diff	serhiy.storchaka, 2017-01-04 10:40		review

Messages (13)
msg183082 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-02-26 20:28
bytearray has many methods that return a new bytearray object rather than applying the transformation to modify the bytearray in place. Given that one use of bytearray's is to avoid making extra copies... There should be in-place variants of the following methods: ljust lower lstrip rjust rstrip strip swapcase title translate upper especially so for the methods that don't change the length or move data around such as translate, lower, upper, swapcase and title.
msg183292 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2013-03-01 20:57
I think you need to make more of a case for 'should'. Bytearrays are, well, byte arrays, not text arrays. I could just as well claim that the ascii text operations should not even be there. They are certainly not needed for typical mixed binary-ascii protocol strings, which is what bytearrars were intended for. In any case, this would fatten the api considerably for not too much benefit. I think that this, like many or most enhancement proposals, should best be discussed on python-ideas list before any tracker action. I consider translate an exception to the above comments. It is a byte operation, not a text operations. bytearry.translate could plausibly have been defined as 'in-place' when added. Most of the other operations are special cases of translate, and can therefore be done with translate, without the limitation to only ascii chars. If one wants to directly uppercase latin-1 encoded bytes without decoding to text, one would need .translate anyway.
msg183302 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-03-01 23:29
Translate isn't a text operation. (That's the api I wanted). The others I only lists for completeness because someone else would complain if I hadn't. ;) On Mar 1, 2013 12:57 PM, "Terry J. Reedy" <report@bugs.python.org> wrote: > > Terry J. Reedy added the comment: > > I think you need to make more of a case for 'should'. Bytearrays are, > well, byte arrays, not text arrays. I could just as well claim that the > ascii text operations should not even be there. They are certainly not > needed for typical mixed binary-ascii protocol strings, which is what > bytearrars were intended for. In any case, this would fatten the api > considerably for not too much benefit. I think that this, like many or most > enhancement proposals, should best be discussed on python-ideas list before > any tracker action. > > I consider translate an exception to the above comments. It is a byte > operation, not a text operations. bytearry.translate could plausibly have > been defined as 'in-place' when added. Most of the other operations are > special cases of translate, and can therefore be done with translate, > without the limitation to only ascii chars. If one wants to directly > uppercase latin-1 encoded bytes without decoding to text, one would need > .translate anyway. > > ---------- > components: +Library (Lib) -Interpreter Core > nosy: +terry.reedy > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue17301> > _______________________________________ >
msg183478 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-03-04 17:28
How just and strip can modify an argument in-place? All other methods except title (lower, swapcase, upper) are partial cases of translate. I.e. you need only in-place translate. However I doubt that difference will be more than several percents. Actually translate uses more tacts per byte than memcpy.
msg183502 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2013-03-04 22:30
Gregory, if an in-place version of .translate is all you want, only ask for that. Change the title and let us focus on the best case. Forget what others might want. Serhiy, what is 'tacts per byte'?
msg183530 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-03-05 14:20
Sorry, Terry, I was meaning a CPU time.
msg183531 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-03-05 14:23
+1
msg183534 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-03-05 16:07
The important reasons for this are memory use and cache thrashing, not just CPU time.
msg186833 - (view)	Author: Ned Jackson Lovely (n) *	Date: 2013-04-13 20:36
An mtranslate, or "mutating translate" method for bytearrays. My first C code in a long time, so be gentle. The name is bad, but I don't see a better suggestion below.
msg284618 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-04 07:40
I prefer ".inplace_translate" to ".mtranslate", but I'm not sure about it's worth enough to add it to bytearray. In last year, I proposed adding `bytes.frombuffer()` constructor to avoid unnecessary memory copy. https://mail.python.org/pipermail/python-dev/2016-October/146668.html Nick Coghlan againsted for adding methods to builtin type only for low level programming, and suggested to implement it in 3rd party library first. https://mail.python.org/pipermail/python-dev/2016-October/146746.html https://mail.python.org/pipermail/python-dev/2016-October/146763.html
msg284625 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-04 10:40
> The important reasons for this are memory use and cache thrashing, not just > CPU time. Memory use is not an issue unless you translate hundreds of megabytes at a time. Cache trashing at the end is performance issue. The original patch is no longer applied cleanly. Here is a patch synchronized with current sources and with fixed one error. I didn't look at it closely and don't know whether it has other bugs. Here are results of microbenchmarking. $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "data = data.translate(table)" Median +- std dev: 1.48 ms +- 0.02 ms $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "data[:] = data.translate(table)" Median +- std dev: 1.60 ms +- 0.09 ms $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data.mtranslate(table)" Median +- std dev: 1.79 ms +- 0.07 ms In-place translate don't have benefits. It is slower that translate with creating a new copy, and even is slower that translate with copying a new copy back.
msg284626 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2017-01-04 10:49
Not every machine has hundreds of MB of memory to waste. Some devices (IoT, MicroPython, Raspberry Pi, OLPC) have limited resources.
msg284634 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-04 12:27
Python interpreter itself takes 20-30 MB of memory. Using it on a machine that has no few tens of free memory doesn't make much sense. MicroPython can be an exception, but it has special modules for low-level programming. If you need to proceed so large bytearray that creating yet one copy of it is not possible, it can be translated by separate elements or smaller chunks. $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "for i in range(len(data)): data[i] = table[data[i]]" Median +- std dev: 205 ms +- 10 ms $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "for i in range(0, len(data), 100): data[i:i+100] = data[i:i+100].translate(table)" Median +- std dev: 12.9 ms +- 0.6 ms $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "for i in range(0, len(data), 1000): data[i:i+1000] = data[i:i+1000].translate(table)" Median +- std dev: 3.12 ms +- 0.22 ms $ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))1000" -- "for i in range(0, len(data), 10000): data[i:i+10000] = data[i:i+10000].translate(table)" Median +- std dev: 1.79 ms +- 0.14 ms Translating by chunks also works with data that can't be fit in memory at all. You can mmap large file or read and write it by blocks. In-place bytearray method wouldn't help in this case.

History
Date	User	Action	Args
2022-04-11 14:57:42	admin	set	github: 61503
2019-08-09 02:38:57	Philip Dye	set	nosy: + Philip Dye
2017-01-04 12:27:45	serhiy.storchaka	set	messages: + msg284634
2017-01-04 10:55:42	vstinner	set	nosy: + vstinner
2017-01-04 10:49:01	christian.heimes	set	messages: + msg284626
2017-01-04 10:40:40	serhiy.storchaka	set	files: + issue17301-2.diff messages: + msg284625
2017-01-04 07:40:39	methane	set	nosy: + methane messages: + msg284618
2017-01-04 05:19:48	THRlWiTi	set	nosy: + THRlWiTi
2015-03-24 09:03:52	martin.panter	set	nosy: + martin.panter
2013-04-13 20:36:20	n	set	files: + issue17301.diff nosy: + n messages: + msg186833 keywords: + patch
2013-03-05 16:07:38	gregory.p.smith	set	messages: + msg183534
2013-03-05 14:23:35	christian.heimes	set	nosy: + christian.heimes messages: + msg183531
2013-03-05 14:20:15	serhiy.storchaka	set	messages: + msg183530
2013-03-04 22:36:05	ezio.melotti	set	nosy: + ezio.melotti
2013-03-04 22:30:26	terry.reedy	set	messages: + msg183502
2013-03-04 17:28:49	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg183478
2013-03-01 23:29:00	gregory.p.smith	set	messages: + msg183302
2013-03-01 20:57:35	terry.reedy	set	nosy: + terry.reedy messages: + msg183292 components: + Library (Lib), - Interpreter Core
2013-02-26 20:28:43	gregory.p.smith	create