msg183082 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-02-26 20:28 |
bytearray has many methods that return a *new* bytearray object rather than applying the transformation to modify the bytearray in place. Given that one use of bytearray's is to avoid making extra copies... There should be in-place variants of the following methods:
ljust
lower
lstrip
rjust
rstrip
strip
swapcase
title
translate
upper
especially so for the methods that don't change the length or move data around such as translate, lower, upper, swapcase and title.
|
msg183292 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2013-03-01 20:57 |
I think you need to make more of a case for 'should'. Bytearrays are, well, byte arrays, not text arrays. I could just as well claim that the ascii text operations should not even be there. They are certainly not needed for typical mixed binary-ascii protocol strings, which is what bytearrars were intended for. In any case, this would fatten the api considerably for not too much benefit. I think that this, like many or most enhancement proposals, should best be discussed on python-ideas list before any tracker action.
I consider translate an exception to the above comments. It is a byte operation, not a text operations. bytearry.translate could plausibly have been defined as 'in-place' when added. Most of the other operations are special cases of translate, and can therefore be done with translate, without the limitation to only ascii chars. If one wants to directly uppercase latin-1 encoded bytes without decoding to text, one would need .translate anyway.
|
msg183302 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-03-01 23:29 |
Translate isn't a text operation. (That's the api I wanted). The others I
only lists for completeness because someone else would complain if I
hadn't. ;)
On Mar 1, 2013 12:57 PM, "Terry J. Reedy" <report@bugs.python.org> wrote:
>
> Terry J. Reedy added the comment:
>
> I think you need to make more of a case for 'should'. Bytearrays are,
> well, byte arrays, not text arrays. I could just as well claim that the
> ascii text operations should not even be there. They are certainly not
> needed for typical mixed binary-ascii protocol strings, which is what
> bytearrars were intended for. In any case, this would fatten the api
> considerably for not too much benefit. I think that this, like many or most
> enhancement proposals, should best be discussed on python-ideas list before
> any tracker action.
>
> I consider translate an exception to the above comments. It is a byte
> operation, not a text operations. bytearry.translate could plausibly have
> been defined as 'in-place' when added. Most of the other operations are
> special cases of translate, and can therefore be done with translate,
> without the limitation to only ascii chars. If one wants to directly
> uppercase latin-1 encoded bytes without decoding to text, one would need
> .translate anyway.
>
> ----------
> components: +Library (Lib) -Interpreter Core
> nosy: +terry.reedy
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17301>
> _______________________________________
>
|
msg183478 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-03-04 17:28 |
How *just and *strip can modify an argument in-place? All other methods except title (lower, swapcase, upper) are partial cases of translate. I.e. you need only in-place translate. However I doubt that difference will be more than several percents. Actually translate uses more tacts per byte than memcpy.
|
msg183502 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2013-03-04 22:30 |
Gregory, if an in-place version of .translate is all you want, only ask for that. Change the title and let us focus on the best case. Forget what others *might* want.
Serhiy, what is 'tacts per byte'?
|
msg183530 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-03-05 14:20 |
Sorry, Terry, I was meaning a CPU time.
|
msg183531 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-03-05 14:23 |
+1
|
msg183534 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2013-03-05 16:07 |
The important reasons for this are memory use and cache thrashing, not just
CPU time.
|
msg186833 - (view) |
Author: Ned Jackson Lovely (n) * |
Date: 2013-04-13 20:36 |
An mtranslate, or "mutating translate" method for bytearrays. My first C code in a long time, so be gentle. The name is bad, but I don't see a better suggestion below.
|
msg284618 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-04 07:40 |
I prefer ".inplace_translate" to ".mtranslate", but I'm not sure about it's worth enough to add it to bytearray.
In last year, I proposed adding `bytes.frombuffer()` constructor to avoid
unnecessary memory copy.
https://mail.python.org/pipermail/python-dev/2016-October/146668.html
Nick Coghlan againsted for adding methods to builtin type only for low level
programming, and suggested to implement it in 3rd party library first.
https://mail.python.org/pipermail/python-dev/2016-October/146746.html
https://mail.python.org/pipermail/python-dev/2016-October/146763.html
|
msg284625 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-04 10:40 |
> The important reasons for this are memory use and cache thrashing, not just
> CPU time.
Memory use is not an issue unless you translate hundreds of megabytes at a time. Cache trashing at the end is performance issue.
The original patch is no longer applied cleanly. Here is a patch synchronized with current sources and with fixed one error. I didn't look at it closely and don't know whether it has other bugs.
Here are results of microbenchmarking.
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data = data.translate(table)"
Median +- std dev: 1.48 ms +- 0.02 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data[:] = data.translate(table)"
Median +- std dev: 1.60 ms +- 0.09 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "data.mtranslate(table)"
Median +- std dev: 1.79 ms +- 0.07 ms
In-place translate don't have benefits. It is slower that translate with creating a new copy, and even is slower that translate with copying a new copy back.
|
msg284626 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2017-01-04 10:49 |
Not every machine has hundreds of MB of memory to waste. Some devices (IoT, MicroPython, Raspberry Pi, OLPC) have limited resources.
|
msg284634 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-04 12:27 |
Python interpreter itself takes 20-30 MB of memory. Using it on a machine that has no few tens of free memory doesn't make much sense. MicroPython can be an exception, but it has special modules for low-level programming.
If you need to proceed so large bytearray that creating yet one copy of it is not possible, it can be translated by separate elements or smaller chunks.
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(len(data)): data[i] = table[data[i]]"
Median +- std dev: 205 ms +- 10 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 100): data[i:i+100] = data[i:i+100].translate(table)"
Median +- std dev: 12.9 ms +- 0.6 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 1000): data[i:i+1000] = data[i:i+1000].translate(table)"
Median +- std dev: 3.12 ms +- 0.22 ms
$ ./python -m perf timeit -s "table = bytes(range(256)).swapcase(); data = bytearray(range(256))*1000" -- "for i in range(0, len(data), 10000): data[i:i+10000] = data[i:i+10000].translate(table)"
Median +- std dev: 1.79 ms +- 0.14 ms
Translating by chunks also works with data that can't be fit in memory at all. You can mmap large file or read and write it by blocks. In-place bytearray method wouldn't help in this case.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:42 | admin | set | github: 61503 |
2019-08-09 02:38:57 | Philip Dye | set | nosy:
+ Philip Dye
|
2017-01-04 12:27:45 | serhiy.storchaka | set | messages:
+ msg284634 |
2017-01-04 10:55:42 | vstinner | set | nosy:
+ vstinner
|
2017-01-04 10:49:01 | christian.heimes | set | messages:
+ msg284626 |
2017-01-04 10:40:40 | serhiy.storchaka | set | files:
+ issue17301-2.diff
messages:
+ msg284625 |
2017-01-04 07:40:39 | methane | set | nosy:
+ methane messages:
+ msg284618
|
2017-01-04 05:19:48 | THRlWiTi | set | nosy:
+ THRlWiTi
|
2015-03-24 09:03:52 | martin.panter | set | nosy:
+ martin.panter
|
2013-04-13 20:36:20 | n | set | files:
+ issue17301.diff
nosy:
+ n messages:
+ msg186833
keywords:
+ patch |
2013-03-05 16:07:38 | gregory.p.smith | set | messages:
+ msg183534 |
2013-03-05 14:23:35 | christian.heimes | set | nosy:
+ christian.heimes messages:
+ msg183531
|
2013-03-05 14:20:15 | serhiy.storchaka | set | messages:
+ msg183530 |
2013-03-04 22:36:05 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2013-03-04 22:30:26 | terry.reedy | set | messages:
+ msg183502 |
2013-03-04 17:28:49 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg183478
|
2013-03-01 23:29:00 | gregory.p.smith | set | messages:
+ msg183302 |
2013-03-01 20:57:35 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg183292 components:
+ Library (Lib), - Interpreter Core
|
2013-02-26 20:28:43 | gregory.p.smith | create | |