This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: bitwise ops for bytes of equal length
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: MGilch, Ramchandra Apte, abarnert, cameron, christian.heimes, cowlicks, georg.brandl, gregory.p.smith, josh.r, martin.panter, ncoghlan, pitrou, rhettinger, scoder, serhiy.storchaka, socketpair, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2013-10-13 20:07 by christian.heimes, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
bitwise_bytes.diff cowlicks, 2016-01-10 14:48 Patch review
Messages (52)
msg199785 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-13 20:07
I'd like to propose a new feature for bytes and perhaps bytearray. None of the bytes types support bitwise operations like &, | and ^. I'd like to add the bitwise protocol between bytes objects of equal length:

>>> a, b = b"123", b"abc"
>>> bytes(x ^ y for x, y in zip(a, b)) 
b'PPP'
>>> a ^ b
b'PPP'
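
The requested semantics can be sketched in pure Python. The helper name `bytes_xor` is hypothetical and does not come from the proposal or any patch; the only assumption taken from the request is that operands of unequal length are rejected rather than truncated or padded.

```python
def bytes_xor(a, b):
    """XOR two bytes-like objects of equal length (hypothetical helper)."""
    if len(a) != len(b):
        # The proposal only covers operands of equal length.
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


print(bytes_xor(b"123", b"abc"))  # b'PPP'
```

The same shape would apply to & and |; only the per-byte operator changes.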
msg199786 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-13 20:09
bytearray should certainly get the same functionality as bytes.
msg199787 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-10-13 20:12
I assume you have a use case?
msg199792 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-13 20:21
Use int's.

(int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).to_bytes(len(a), 'big')

Adding & and | operations to bytes would be confusing, because bytes is a sequence and this would conflict with set operations.
msg199794 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-13 20:30
The int-based spelling isn't very pretty though.
And sequences are not sets :-)
I suppose the use case has something to do with cryptography?
msg199801 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-13 20:57
When you work with ints instead of bytes, the spelling is pretty enough.

Sets are sequences. So when a function uses both iteration and the | operation, we can't be sure what it means.

Please don't turn Python into APL or Perl.

Perhaps a separate mutable bitset (or bitlist?) type would be more useful.
msg199802 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-13 21:00
> Sets are sequences.

>>> from collections import abc
>>> issubclass(abc.Set, abc.Sequence)
False
>>> issubclass(abc.Sequence, abc.Set)
False

> Perhaps a separate mutable bitset (or bitlist?) type would be more useful.

Then I would prefer a view, using e.g. the buffer API, but without any copies.
msg199805 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-13 21:10
You got me, Antoine! I'm working on a Python-only implementation of PBKDF2_HMAC. It involves XOR of two bytes in one place.

Serhiy, I'm not going to turn Python into Perl. I'd merely like to provide a simple and well-defined feature with existing syntax. I propose bitwise ops only between bytes and bytearrays of equal size. Everything else, like unequal sizes or bitwise ops between integers and bytes, is out of the question.
msg199835 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-10-14 04:00
+1
msg200330 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-10-18 23:26
'XOR of two bytes in one place' strikes me as a thin excuse for a new feature that abbreviates a simple, short, one-liner. To me, bytes(x ^ y for x, y in zip(a, b)) looks fine. a and b can be any iterables of ints.
msg200359 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-10-19 03:44
Hm... I think you are right.
msg200401 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-19 10:56
bytes(x ^ y for x, y in zip(a, b)) is super-slow if you have to do XOR inside a hot loop a few tens of thousands of times. The int.from_bytes + int.to_bytes approach is about ten times faster. I expect bitwise ops on bytes to be even faster and more readable.

$ python3.3 -m timeit -n 100000 -s "a = b'a'*64; b = b'b'*64" "bytes(x ^ y for x, y in zip(a, b))"
100000 loops, best of 3: 7.5 usec per loop

$ python3.3 -m timeit -n 100000 -s "a = b'a'*64; b = b'b'*64" "i = int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little'); i.to_bytes(64, 'little')"
100000 loops, best of 3: 0.866 usec per loop
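
The two command lines above can also be reproduced from a single script, which makes it easier to rerun the comparison on another interpreter. This is only a sketch: the dictionary keys and loop counts are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit

setup = "a = b'a' * 64; b = b'b' * 64"
candidates = {
    "genexpr": "bytes(x ^ y for x, y in zip(a, b))",
    "int round trip": (
        "(int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little'))"
        ".to_bytes(64, 'little')"
    ),
}

results = {}
for name, stmt in candidates.items():
    # Best of 3 runs of 10,000 loops each, reported per loop.
    best = min(timeit.repeat(stmt, setup=setup, number=10_000, repeat=3))
    results[name] = best / 10_000
    print(f"{name}: {results[name] * 1e6:.3f} usec per loop")
```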
msg200696 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-21 04:56
Christian, we need multiple motivating use cases to warrant API expansion for fundamental types.

I'm concerned about starting to move numpy vector-op functionality into the core of Python just because it optimizes your one crypto use case.
msg200735 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-21 09:15
I see that the feature idea is more controversial than I initially expected. It's probably best to have a long bike-shedding discussion on Python-ideas... :) Right now the from_bytes/to_bytes trick is fast enough for my needs anyway. Therefore I'm deferring the proposal to 3.5.

By the way I'd also be happy with a set of vector ops in the operator module, e.g. vector_xor(a, b).
msg200739 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-10-21 09:53
+1, a generic vector_xor function looks like a better idea.
msg200740 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-21 10:01
> By the way I'd also be happy with a set of vector ops in the operator module, e.g. vector_xor(a, b).

This is one direction. Another direction is adding a bitarray or bitset container and a bitview adapter.
msg202120 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-04 11:07
"You got me, Antoine! I'm working on a Python-only implementation of PBKDF2_HMAC. It involves XOR of two bytes in one place."

If you want super-fast code, you should probably reimplement it in C. Python is not designed for performance...
msg257727 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-08 00:08
I'm very skeptical of this. I expect it would cause quite a few surprises for people who aren't used to bitmask operations on integers, let alone on (byte) strings.
msg257728 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-01-08 00:19
Two years ago I suggested this:

> Then I would prefer a view, using e.g. the buffer API, but without any copies.

... and of course Numpy already provides such an API:

>>> import numpy as np
>>> a, b = b"123", bytearray(b"abc")
>>> p = np.frombuffer(a, dtype=np.int8)
>>> q = np.frombuffer(b, dtype=np.int8)
>>> p ^ q
array([80, 80, 80], dtype=int8)
>>> b[0] = 64
>>> p ^ q
array([113,  80,  80], dtype=int8)
msg257729 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-01-08 00:19
Another possibility if you don't mind a copy is to use int.from_bytes():

>>> p = int.from_bytes(a, 'little')
>>> q = int.from_bytes(b, 'little')
>>> (p ^ q).to_bytes(len(a), 'little')
b'qPP'
msg257730 - (view) Author: Andrew Barnert (abarnert) * Date: 2016-01-08 00:42
For what it's worth, I looked at some old code (a clean re-implementation of a crypto algorithm in Python, used as a unit test for production code in C++) and found this:

class BytesBits(bytes):
    # from a quick test, 1.12us in NumPy, 1.39us in C, 2.55us this way, 46.1us with bytes(genexpr), so we don't need numpy
    def _bitwise(self, other, op):
        iself = int.from_bytes(self, 'little')
        iother = int.from_bytes(other, 'little')
        b = op(iself, iother)
        return b.to_bytes(len(self), 'little')
    def __or__(self, other):
        return self._bitwise(other, int.__or__)
    __ror__ = __or__
    def __and__(self, other):
        return self._bitwise(other, int.__and__)
    __rand__ = __and__
    def __xor__(self, other):
        return self._bitwise(other, int.__xor__)
    __rxor__ = __xor__
    
It doesn't do as much error checking as you want, but it was good enough for my purposes.

At any rate, the fact that it's trivial to wrap this up yourself (and even more so if you just write functions called band/bor/bxor instead of wrapping them up in a subclass) implies to me that, if it's not used all that often, it doesn't need to be on the builtin types.
msg257748 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-08 10:24
> I'm very skeptical of this. I expect it would cause quite a few surprises for people who aren't used to bitmask operations on integers, let alone on (byte) strings.

Maybe it makes more sense to implement such operation on the array.array type? But only for integer types (ex: not for 'd', float)?
msg257754 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-08 12:15
FWIW a long time ago I wanted fast XORing of 512-byte “sectors” of Rar files. Initially I think I used array.array with the largest word size available, or numpy if available. Later when I learnt more Python I discovered the int.from_bytes() trick, and used int(hexlify(...), 16) in Python 2. So I guess the array module was a relatively obvious place to look, and the long integer trick was unexpected (I’d never programmed with unlimited size integers before).

Serhiy’s bit array idea also seems interesting. I bet somebody has already written a package. Maybe it could also be useful for things like Huffman encoding, where you string various bit strings together into a sequence of bytes. But I do wonder if all these things are too specialized for Python’s standard library.
msg257755 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-01-08 12:57
Serhiy’s bitarray idea has another benefit. Bit masking is only half of the story. Since Python's int type has no length information, it is not possible to handle shifting with overflow and rotation. A bit array can provide bit shifting ops like rshift, lshift, rotr and rotl.
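
To illustrate why the length matters, here is a minimal sketch of a left rotation on a bytes object. The name `rotl_bytes` is hypothetical, and the big-endian bit order is an assumption made for the example; the point is only that the width has to come from the container, because the int alone does not carry it.

```python
def rotl_bytes(data, count):
    """Rotate a bytes object left by `count` bits (hypothetical helper).

    The bit width comes from len(data), which is exactly the information
    a bare int does not carry.
    """
    width = len(data) * 8
    if width == 0:
        return bytes(data)
    count %= width
    value = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    # Bits shifted out on the left re-enter on the right.
    rotated = ((value << count) | (value >> (width - count))) & mask
    return rotated.to_bytes(len(data), "big")


print(rotl_bytes(b"\x80\x01", 1))  # b'\x00\x03'
```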
msg257761 - (view) Author: Andrew Barnert (abarnert) * Date: 2016-01-08 14:35
There are a number of existing libraries on PyPI for bit arrays and bit sets. There are a lot more API choices than you'd think of in advance, and between them they explore most of the useful space. And I don't think any of them need to be in the stdlib.
msg257765 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-01-08 16:39
Just mention it here :)

https://github.com/KeepSafe/aiohttp/issues/686
msg257911 - (view) Author: cowlicks (cowlicks) Date: 2016-01-10 14:48
I've attached a diff that adds ^, |, and & to bytes and bytearray object on the master branch (err the analogous hg thing).

It also includes a test file which definitely is in the wrong place, but demonstrates what is working.

Personally this came up while I was playing with toy crypto problems. I expected this to already be part of the language, but it wasn't. I think this is a natural expectation. I don't think it is obvious to newer Python users that they need to cast bytes to ints to do bitwise operations on them.

And I do not understand how bitwise operations work on arbitrary precision integers. So perhaps it is not as simple of a concept as "bytes xor bytes".

Some folks have suggested using NumPy, but that is a very heavy dependency, and not useful outside of cpython.

I searched for code this would clean up in the wild. It was easy to find.

This would also catch bugs where bytes objects have different lengths but nothing checks for it, as in this code I found:

# XOR each byte of the roundKey with the state table
def addRoundKey(state, roundKey):
    for i in range(len(state)):
        state[i] = state[i] ^ roundKey[i]

p.s. this is my first cpython issue/patch please let me know if I'm doing something wrong.
msg257914 - (view) Author: Andrew Barnert (abarnert) * Date: 2016-01-10 15:20
On Jan 10, 2016, at 06:48, cowlicks <report@bugs.python.org> wrote:
> 
> 
> Personally this came up while I was playing with toy crypto problems. I expected this to already be part of the language, but it wasn't. I think this is a natural expectation.

Maybe if you're coming to Python from APL or R or something, but C, C#, Scala, Ruby, Haskell--almost any other language you pick, there's no bitwise operations on (byte) strings. (And if you _are_ coming to Python from something like R, you definitely should be using NumPy.)
> 
> And I do not understand how bitwise operations work on arbitrary precision integers.

It's obvious once you think of them as infinite-sized fixed-size ints: 0x27 is the same number as 0x0027, so 0x27 & 0x0134 is 0x0024. (Of course not and negation aren't quite as obvious, but once you think about it, there's only one sensible thing 2's complement could do, and only two sensible things 1's complement could do, and Python is sensible, so it's not surprising once you try it out.)
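
The behaviour described here can be checked directly in the interpreter. This is a small sketch with arbitrary variable names; the values are the ones from the paragraph above plus two negation examples.

```python
# Leading zeros do not change an int, so operands of different "widths"
# line up at the least significant bit.
masked = 0x27 & 0x0134
assert 0x27 == 0x0027
assert masked == 0x0024

# Negative ints behave as if they had infinitely many leading 1 bits
# (two's complement), so ~x is always -x - 1.
negated = ~0x27
assert negated == -0x28
assert (-1 & 0xFF) == 0xFF

# Masking brings the result back to a fixed width.
low_byte = negated & 0xFF
assert low_byte == 0xD8
```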

> Some folks have suggested using NumPy, but that is a very heavy dependency, and not useful outside of cpython.

It's useful outside of CPython. While NumPyPy isn't 100% yet, it's usable enough for many projects.

More to the point, if you're looking for arrays that have really fast and highly readable elementwise operations, that's exactly what NumPy is all about. Sure, you can get bits and pieces of similar functionality without it, but if you're thinking about your code in NumPy's terms (elementwise operations), you really do want to think about NumPy.

Meanwhile, have you looked around PyPI at the various bitarray, bitstring, etc. libraries? Are they too slow, too heavyweight, or too inconveniently-API'd? I know whenever I want to play with things like Huffman coding or the underlying bit representation of IEEE floats or anything else bitwise besides basic C integer stuff, I reach for one of those libraries, rather than trying to hack things up around bytes strings. (Except for that example I posted above--clearly for some reason I _did_ hack things up around bytes strings that time--but it only took me that simple class to make things convenient and efficient.)
msg257964 - (view) Author: cowlicks (cowlicks) Date: 2016-01-11 14:58
@Andrew Barnert
> Maybe if you're coming to Python from...
I'm not sure if you're trying to argue that my expectations are unusual? Python is my first programming language. To reiterate: I expected CPython to support bitwise operations on binary data. I don't think that is so strange.

No, I have not looked at PyPI. What I did was have an idea to do this, and there happened to be an open bug on it that needed a patch. So I wrote one.

And yes, I realize NumPy can do this, but it is still a very large dependency.

Anyway, here are some random projects which would look a lot nicer with this:

An implementation of the blake2 hash function in pure python. Consider this line:
https://github.com/buggywhip/blake2_py/blob/master/blake2.py#L234

self.h = [self.h[i] ^ v[i] ^ v[i+8] for i in range(8)]

Which would become something like:

self.h ^= v[:8] ^ v[8:]

Which is much easier to read and much faster.

Or consider this function from this aes implementation:
https://github.com/bozhu/AES-Python/blob/master/aes.py#L194-L201

    def __mix_single_column(self, a):
        # please see Sec 4.1.2 in The Design of Rijndael
        t = a[0] ^ a[1] ^ a[2] ^ a[3]
        u = a[0]
        a[0] ^= t ^ xtime(a[0] ^ a[1])
        a[1] ^= t ^ xtime(a[1] ^ a[2])
        a[2] ^= t ^ xtime(a[2] ^ a[3])
        a[3] ^= t ^ xtime(a[3] ^ u)

This would become something like:

def __mix_single_column(self, a):
    a ^= a ^ xtime(a ^ (a[1:] + a[0:1]))

Clearer and faster. 

Another piece of code this would improve:
https://github.com/mgoffin/keccak-python/blob/master/Keccak.py#L196-L209

These were easy to find so I'm sure there are more. I think these demonstrate that despite what people *should* be doing, they are doing things in a way that could be substantially improved with this patch.

This does resemble NumPy's vectorized functions, but it is much more limited in scope since there is no broadcasting involved.

Here is a quick benchmark:

$ ./python -m timeit -n 100000 -s "a=b'a'*64; b=b'b'*64" "(int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little')).to_bytes(64, 'little')"
100000 loops, best of 3: 0.942 usec per loop

$ ./python -m timeit -n 100000 -s "a=b'a'*64; b=b'b'*64" "a ^ b"
100000 loops, best of 3: 0.041 usec per loop

NumPy is the slowest, but I'm probably doing something wrong, and it's in IPython so I'm not timing the import:

In [13]: %timeit bytes(np.frombuffer(b'b'*64, dtype=np.int8) ^ np.frombuffer(b'a'*64, dtype=np.int8))
100000 loops, best of 3: 3.69 µs per loop

So the native a ^ b is about 20 times faster than the int.from_bytes round trip.
msg260470 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-02-18 16:52
To increase performance even more, use block operations on the bytes, i.e. XOR 8 bytes at a time (on a 64-bit system) while the remaining size is greater than or equal to 8, then 4 bytes at a time using the same loop, and then XOR the remaining bytes one at a time. This should increase performance by roughly 8 times on 64-bit systems and by 4 times on 32-bit systems.

See my PR https://github.com/KeepSafe/aiohttp/pull/687/files for details
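
The chunking logic can be sketched in Python with memoryview.cast, although the speedup described above applies to a C implementation; in pure Python this only illustrates the control flow. The name `xor_blocks` is hypothetical, the inputs are assumed to be bytes or bytearray, and the 4-byte step is left out for brevity.

```python
def xor_blocks(a, b):
    """XOR equal-length bytes/bytearray objects 8 bytes at a time (sketch)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n = len(a)
    head = n - n % 8  # largest multiple of 8 that fits
    out = bytearray(n)
    if head:
        # Reinterpret the first `head` bytes as unsigned 64-bit words.
        wa = memoryview(a)[:head].cast("Q")
        wb = memoryview(b)[:head].cast("Q")
        wo = memoryview(out)[:head].cast("Q")
        for i in range(len(wa)):
            wo[i] = wa[i] ^ wb[i]
    # Finish the remaining (at most 7) bytes one at a time.
    for i in range(head, n):
        out[i] = a[i] ^ b[i]
    return bytes(out)
```

Because the words are read and written with the same native byte order, the result does not depend on the platform's endianness.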
msg264184 - (view) Author: cowlicks (cowlicks) Date: 2016-04-25 16:17
To reiterate, this change would make a lot of code more readable, more secure, and faster.

The concerns about this being a numpy-like vector operation are confusing to me. The current implementation is already vector-like, but lacks size checking. Isn't "int ^ int" really just the xor of two arbitrarily long arrays of binary data? At least with "bytes ^ bytes" we can enforce the arrays be the same size.
msg264187 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-25 16:24
> I'd like to add the bitwise protocol between bytes objects of equal length

Would it make sense to add such operators in a new module like the existing (but deprecated) audioop module?

If yes, maybe we should start with a module on PyPI. Is there a volunteer to try this option?
msg264188 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-25 16:27
Can you point to some examples of existing code that would become more
readable and faster when this feature exists? Separately, how is it more
secure?

msg264190 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2016-04-25 17:29
I have wanted bytes/bytearray bit operations (be sure to include the in place operators for bytearray) when using micropython where it is normal to be operating on binary data.

That said, I'd need someone from MicroPython to chime in as to whether they can magically detect
 # Equivalent of:  c = b ^ a
 c = bytes(x ^ y for x, y in zip(a, b))
and make it run fast.

What is a similar expression for an in-place bytearray modification?
 # Equivalent of:  a ^= b
 assert len(a) == len(b)
 for i, b_i in enumerate(b): a[i] ^= b_i  ?

Why both of the above are slow is obvious: tons of work looping within python itself, creating and destroying small integers and/or tuples the entire time instead of deferring to the optimal loop in C.
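
For the in-place case, one possible spelling that keeps the loop in C is slice assignment from an int round trip. This is a sketch under assumptions: `ixor` is a hypothetical name, and it still allocates temporary ints and a temporary bytes object, so it is "in place" only in the sense that the same bytearray object is updated.

```python
def ixor(target, other):
    """XOR `other` into the bytearray `target` in place (hypothetical helper)."""
    if len(target) != len(other):
        raise ValueError("operands must have the same length")
    value = int.from_bytes(target, "big") ^ int.from_bytes(other, "big")
    # Slice assignment replaces the contents but keeps the same object,
    # so other references to `target` see the change.
    target[:] = value.to_bytes(len(target), "big")


buf = bytearray(b"123")
alias = buf
ixor(buf, b"abc")
print(alias)  # bytearray(b'PPP')
```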

Neither of the above "look as nice" as a simple operator would.
But they are at least both understandable and frankly about the same as what you would naively write in C for the task.

Security claims?  Nonsense. This has nothing to do with security.  It is *fundamentally impossible* to write constant time side channel attack resistant algorithms in a high level garbage collected language. Do not attempt it.  Leave that stuff to assembler or _very_ carefully crafted C/C++ that the compiler cannot undo constant time enforcing tricks in.  Where it belongs.  Python will never make such guarantees.

NumPy?  No.  That is a huge bloated dependency.  It is not relevant to this as we are not doing scientific computing.  It is not available everywhere.

The int.from_bytes(...) variant optimizations?  Those are hacks that might be useful to people in CPython today, but they are much less readable.  Do not write that in polite code, hide it behind a function with a comment explaining why it's so ugly to anyone who dares look inside please.
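
One way to follow that advice is sketched below. The function name `xor_via_int` is hypothetical; the length check is an addition that the bare one-liner does not perform.

```python
def xor_via_int(a, b):
    """Return the bytewise XOR of two equal-length bytes-like objects.

    Hypothetical helper. The round trip through int looks odd, but it moves
    the per-byte loop into C, which is why it is much faster in CPython than
    a generator expression over zip(a, b).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # Passing len(a) keeps leading zero bytes that the int would drop.
    return n.to_bytes(len(a), "big")
```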

So as much as I'd love this feature to exist on bytes & bytearray, it is not a big deal that it does not.

Starting with a PyPI module for fast bit operations on bytes & bytearray objects makes more sense (include both pure python and a C extension implementation).  Use of that will give an idea of how often anyone actually wants to do this.
msg264192 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-25 18:00
> If yes, maybe we should start with a module on PyPI. Is there a volunteer to try this option?

bitsets – ordered subsets over a predefined domain
bitarray – efficient boolean array implemented as C extension
bitstring – pure-Python bit string based on bytearray
BitVector – pure-Python bit array based on unsigned short array
Bitsets – Cython interface to fast bitsets in Sage
bitfield – Cython positive integer sets
intbitset – integer bit sets as C extension

Is it enough? Ah, and NumPy.
msg264323 - (view) Author: cowlicks (cowlicks) Date: 2016-04-26 18:35
@gvanrossum in this previous comment https://bugs.python.org/issue19251#msg257964

I pointed out code from the wild which would be more readable, and posted preliminary benchmarks. But there is a typo, I should have written:

def __mix_single_column(self, a):
    t = len(a) * bytes([reduce(xor, a)])
    a ^= t ^ xtime(a ^ (a[1:] + a[0:1]))


As @gregory.p.smith points out, my claim about security isn't very clear. This would be "more secure" for two reasons. Code would be easier to read and therefore verify, but this is the same as readability. The other reason, doing some binary bitwise op on two bytes objects enforces that the objects be the same length, so unexpected bugs in these code samples would be avoided.

bytes(x ^ y for x, y in zip(a, b))

(int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).to_bytes(len(a), 'big')

# XOR each byte of the roundKey with the state table
def addRoundKey(state, roundKey):
    for i in range(len(state)):
        state[i] = state[i] ^ roundKey[i]
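
What each of these three spellings does with operands of different lengths can be shown with a short sketch. The sample values are arbitrary and chosen only to make the differences visible.

```python
a, b = b"\x01\x02\x03\x04", b"\xff\xff"

# zip() stops at the shorter operand, so the result is silently truncated.
truncated = bytes(x ^ y for x, y in zip(a, b))
print(truncated)  # b'\xfe\xfd'

# The big-endian int spelling aligns the operands at the last byte instead,
# which silently pads the shorter one with leading zero bytes.
value = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
padded = value.to_bytes(len(a), "big")
print(padded)  # b'\x01\x02\xfc\xfb'

# The index loop fails only when the second operand is the shorter one,
# and only after part of the state has already been modified.
state = bytearray(a)
try:
    for i in range(len(state)):
        state[i] ^= b[i]
except IndexError as exc:
    print("IndexError:", exc)
```

None of the three reports the mismatch up front, which is the gap an equal-length check would close.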
msg264324 - (view) Author: cowlicks (cowlicks) Date: 2016-04-26 18:39
I'll look through the list @serhiy.storchaka posted and make sure this still seems sane to me.
msg264325 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-26 19:25
>> If yes, maybe we should start with a module on PyPI. Is there a volunteer to try this option?
>
> bitsets – ordered subsets over a predefined domain

This is an array of *bits*, not an array of bytes (or integers).

> bitarray – efficient boolean array implemented as C extension

This one is also an array of *bits*, but I see .frombytes() and
.tobytes() methods.

> bitstring – pure-Python bit string based on bytearray

Array of *bits*. I don't see how to use an array of integers (or a
byte string) with it.

> BitVector – pure-Python bit array based on unsigned short array
> Bitsets – Cython interface to fast bitsets in Sage
> bitfield – Cython positive integer sets
> intbitset – integer bit sets as C extension

I'm too lazy to check these ones.

I didn't check whether these modules support operations like x^y.

> Is it enough? Ah, and NumPy.

I'm quite sure that NumPy supports operations like x^y ;-) And NumPy
supports a wide choice of arrays.

Victor
msg265560 - (view) Author: Cameron Simpson (cameron) * Date: 2016-05-14 23:49
I'd like to voice my support for bitwise ops on bytes and bytearray (agreed, on equal lengths only).

I've got 2 arguments here:

- readability: a ^ b, a | b and so forth are clear and direct

- all the various incantations presented must be _understood_, not to mention invented anew by anyone wanting to do the same, with the same burden of getting it correct; of course one can say the same of any feature not already present in a language but that is trite; there are several ways to say this and all have varying degrees of speed, obtuseness and verbosity. And they're all SLOW.

Regarding some of the counter arguments in the discussion:

- gregory.p.smith in reply to cowlicks: "Security claims?  Nonsense. This has nothing to do with security.  It is *fundamentally impossible* to write constant time side channel attack resistant algorithms [...]"

Maybe cowlicks should have said "reliable", though to my naive eye a normal implementation would be constant time for a given size. I would argue that the clarity and directness of just writing "a^b" immediately makes for trivially correct code, which itself is a necessary prerequisite for secure code.

- gregory.p.smith again: "Neither of the above "look as nice" as a simple operator would. But they are at least both understandable and frankly about the same as what you would naively write in C for the task."

This is not an argument against the feature. That one has to perform similar activities in Python as in C merely reflects the present lack of these operators, not a preexisting gleaming sufficiency of operator richness.

- Terry J. Reedy: "'XOR of two bytes in one place' strikes me as a thin excuse for a new feature that abbreviates a simple, short, one-liner". Christian Heimes's code has this single example, but anyone wanting to work on chunks of bytes may find themselves here. Just because a lot of things can be written/constructed as one liners doesn't mean they should be operators when (a) the operator is available (==unused) for this type, (b) the meaning of the operator is straightforward and intuitive and (c) any pure Python construction is both wordier and much slower.

Anyway, I am for this feature, for the record.
msg265561 - (view) Author: Cameron Simpson (cameron) * Date: 2016-05-14 23:51
Amendment: I wrote above: "Just because a lot of things can be written/constructed as one liners doesn't mean they should be operators". Of course I meant to write "doesn't mean they should _not_ be operators".
msg316938 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-05-17 14:31
I'm back in the embedded software world now, and hence working with the combination of:

- low level serial formats (including fixed length CAN packets)
- C firmware developers that are quite capable of writing supporting C-in-Python code using the standard library, but aren't the least bit interested in graduating from writing standalone stdlib-only Python scripts that live in repositories otherwise full of C code to writing full Python applications with PyPI backed dependency management (etc)

It's the kind of environment where having the struct module in the standard library is incredibly valuable, and the main thing that better support for direct manipulation of binary data could potentially offer us is avoiding some "memory -> struct.unpack -> process -> struct.pack -> memory" round trips, as well as potentially reducing the overall amount of code we have to maintain.

So I'll keep an eye out for potential opportunities for code simplification - while crypto algorithms, file formats, network protocols, and hardware interfaces can all call for this kind of thing, I'm less sure how often we're encountering it in situations where having it available would have let us avoid invoking struct entirely.
msg316939 - (view) Author: Марк Коренберг (socketpair) * Date: 2018-05-17 14:34
@ncoghlan

Could you please create Pull-request on Github ?
msg316940 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-05-17 14:37
This issue isn't at the stage where a PR would help - the core question is still "Should we add better native support for multi-byte bitwise operations?", not the specifics of what the API might look like or how we would implement it.
msg316941 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-17 14:50
Nick, for your tasks you may be interested in PEP 3118 which still is not completely implemented (issue3132).
msg316956 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-05-17 16:20
Thanks for the link Serhiy (I'd forgotten about the struct changes proposed in PEP 3118), but the existing struct formatting codes are fine for my purposes.

The question is whether we might be able to avoid some bytes->Python-objects->bytes cycles if there were a few more contiguous-binary-data-centric operations on bytes and/or memoryview (similar to the way the ASCII-centric operations on bytes and bytearray help to avoid bytes->text->bytes cycles).
msg316957 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-17 16:22
On 17/05/2018 at 18:20, Nick Coghlan wrote:
> 
> The question is whether we might be able to avoid some bytes->Python-objects->bytes cycles if there were a few more contiguous-binary-data-centric operations on bytes and/or memoryview (similar to the way the ASCII-centric operations on bytes and bytearray help to avoid bytes->text->bytes cycles).

Can you elaborate on your question?
msg317038 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-05-18 17:22
I'd second the proposal of considering the "array.array" type for this, instead of the bytes/bytearray types. Why? Because it is somewhat of a niche case after all, and the bytes type seems too exposed for it. Also, array.array supports more diverse base item types than just plain bytes, which could potentially cover more use cases or at least simplify certain data processing needs.

Alternatively, implement a lightweight SIMD-like memoryview type that operates on arbitrary buffers (which covers bytes, bytearray and array.array). But that's either NumPy then, or something new that would best spend its early days on PyPI.
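
The "arbitrary buffers" part of this idea can be sketched with the existing memoryview type. This is not a SIMD implementation: the per-byte loop still runs in Python, and the name `xor_buffers` is hypothetical. It only shows that one function can accept bytes, bytearray and array.array through the buffer protocol.

```python
from array import array


def xor_buffers(a, b):
    """XOR any two C-contiguous buffers of equal byte length (sketch)."""
    # cast("B") views the raw bytes regardless of the exporter's item type.
    va = memoryview(a).cast("B")
    vb = memoryview(b).cast("B")
    if len(va) != len(vb):
        raise ValueError("buffers must have the same size in bytes")
    return bytes(x ^ y for x, y in zip(va, vb))


print(xor_buffers(array("B", [1, 2, 3]), bytearray(b"\x01\x02\x03")))
# b'\x00\x00\x00'
```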
msg317040 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-18 17:38
Please, let's have this API discussion outside of the bug tracker.  This deserves a PEP.  Also because I have an alternative API to suggest :-)
msg317095 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-05-19 04:47
I think Antoine's right that another venue (such as python-ideas) might be a better place for this discussion, but I'll still try to explain the potential analogy I see to bytes.upper()/.lower()/etc: those operations let you treat ASCII segments in otherwise binary data as ASCII text, *without* needing to convert them to str first. While doing the str conversion is more formally correct, being able to stay in the raw binary domain frequently offers significant practical benefits by reducing both the runtime performance overhead and the amount of code needed.
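A small sketch of that analogy, assuming a made-up record layout (a 4-byte ASCII tag followed by a 2-byte length field):

```python
# Hypothetical record: 4-byte ASCII tag, then a 2-byte big-endian length.
record = b"data\x00\x10"

# Uppercase the ASCII tag directly on bytes, with no str round trip.
tag = record[:4].upper()

# The more formal route converts to str and back again.
tag_via_str = record[:4].decode("ascii").upper().encode("ascii")
```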

Offering bitwise operations for bytes segments of equal length (perhaps via memoryview, or a memoryview subclass that only supports C-contiguous views) *might* turn out to offer a similar benefit when it comes to manipulating sections of a data buffer that represent integers (or anything else with a well-defined binary representation). With the right buffer exporter, you could even use it for direct bit-bashing of memory-mapped registers (which then gets quite interesting in the context of MicroPython applications).
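As a rough sketch of what this might look like with today's tools, the hypothetical helper below (xor_segments is not an existing API) XORs two equal-length buffers by going through int, which is exactly the round trip a native operation could avoid:

```python
def xor_segments(a, b):
    """Hypothetical helper: XOR two equal-length bytes-like objects."""
    va, vb = memoryview(a), memoryview(b)
    if va.nbytes != vb.nbytes:
        raise ValueError("segments must have equal length")
    # Round trip through int; a native bitwise operation would skip this step.
    x = int.from_bytes(va, "big") ^ int.from_bytes(vb, "big")
    return x.to_bytes(va.nbytes, "big")

# Update a three-byte segment of a larger buffer in place through a view.
buf = bytearray(b"\x00\x00123\xff")
view = memoryview(buf)
view[2:5] = xor_segments(view[2:5], b"abc")
```

The bytes outside the selected segment are left untouched, which is the property that matters when the buffer stands in for a packed structure or a register block.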
msg317107 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-19 07:41
The last three posts have convinced me that 'efficient bit operations', not tied to the int type, are worth exploring, without immediate restriction to a particular API.  I can see that micropython is a significant new use case.
msg317109 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-19 08:45
I don't understand what relation this issue has with Nick's use case. If you need to access separate fields in a binary packed structure, adding support for an extended PEP 3118-like syntax in memoryview can help you. If you want to alter separate bits, you may need something like a bit array view. If you need to change specific physical memory, you need a different view object (maybe something in ctypes). But what relation does all this have with b'123' ^ b'abc'?
msg317116 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-05-19 12:47
While it does match Christian's original suggestion, I'm not taking the "bytes" in the issue title as literally requiring that the feature be implemented as operator support on the bytes type (implementing it on memoryview or a new view type would likely meet the expressed need just as well).

I'm also not sure why you're continuing to bring up PEP 3118 - C contiguous data is already supported in memoryview, so this shouldn't require any new data shapes support, it's mainly a question of which manipulations we decide we want to offer on viewed data.
History
Date User Action Args
2022-04-11 14:57:51 admin set github: 63450
2018-05-23 08:37:30 MGilch set nosy: + MGilch
2018-05-19 15:47:06 gvanrossum set nosy: - gvanrossum
2018-05-19 12:47:04 ncoghlan set messages: + msg317116
2018-05-19 08:45:06 serhiy.storchaka set messages: + msg317109
2018-05-19 07:41:23 terry.reedy set messages: + msg317107
2018-05-19 04:47:10 ncoghlan set messages: + msg317095
2018-05-18 17:38:32 pitrou set messages: + msg317040
2018-05-18 17:22:04 scoder set nosy: + scoder
messages: + msg317038
2018-05-17 16:22:23 pitrou set messages: + msg316957
2018-05-17 16:20:57 ncoghlan set messages: + msg316956
2018-05-17 14:50:31 serhiy.storchaka set messages: + msg316941
2018-05-17 14:37:58 ncoghlan set messages: + msg316940
stage: needs patch ->
2018-05-17 14:34:39 socketpair set messages: + msg316939
2018-05-17 14:31:19 ncoghlan set nosy: + ncoghlan
messages: + msg316938
versions: + Python 3.8, - Python 3.5
2018-05-17 14:04:38 ncoghlan link issue31656 superseder
2016-05-14 23:51:45 cameron set messages: + msg265561
2016-05-14 23:49:39 cameron set messages: + msg265560
2016-05-14 23:28:06 cameron set nosy: + cameron
2016-04-26 19:25:58 vstinner set messages: + msg264325
2016-04-26 18:39:13 cowlicks set messages: + msg264324
2016-04-26 18:35:04 cowlicks set messages: + msg264323
2016-04-25 18:00:46 serhiy.storchaka set messages: + msg264192
2016-04-25 17:29:41 gregory.p.smith set messages: + msg264190
2016-04-25 16:27:38 gvanrossum set messages: + msg264188
2016-04-25 16:24:12 vstinner set messages: + msg264187
2016-04-25 16:17:34 cowlicks set messages: + msg264184
2016-02-18 16:52:52 socketpair set messages: + msg260470
2016-01-11 22:25:53 gregory.p.smith set nosy: + gregory.p.smith
2016-01-11 14:58:49 cowlicks set messages: + msg257964
2016-01-10 15:20:41 abarnert set messages: + msg257914
2016-01-10 14:48:47 cowlicks set files: + bitwise_bytes.diff
nosy: + cowlicks
messages: + msg257911
keywords: + patch
2016-01-08 16:39:06 socketpair set nosy: + socketpair
messages: + msg257765
2016-01-08 14:35:17 abarnert set messages: + msg257761
2016-01-08 12:57:09 christian.heimes set messages: + msg257755
2016-01-08 12:15:14 martin.panter set messages: + msg257754
2016-01-08 10:24:39 vstinner set messages: + msg257748
2016-01-08 00:42:11 abarnert set nosy: + abarnert
messages: + msg257730
2016-01-08 00:19:53 pitrou set messages: + msg257729
2016-01-08 00:19:13 pitrou set messages: + msg257728
2016-01-08 00:08:29 gvanrossum set nosy: + gvanrossum
messages: + msg257727
2014-03-06 22:29:04 josh.r set nosy: + josh.r
2013-11-08 04:27:30 martin.panter set nosy: + martin.panter
2013-11-04 11:07:34 vstinner set messages: + msg202120
2013-10-21 10:01:18 serhiy.storchaka set messages: + msg200740
2013-10-21 09:53:05 Ramchandra Apte set messages: + msg200739
2013-10-21 09:26:41 vstinner set nosy: + vstinner
2013-10-21 09:15:26 christian.heimes set messages: + msg200735
versions: + Python 3.5, - Python 3.4
2013-10-21 04:56:19 rhettinger set nosy: + rhettinger
messages: + msg200696
2013-10-19 10:56:52 christian.heimes set messages: + msg200401
2013-10-19 03:44:34 Ramchandra Apte set messages: + msg200359
2013-10-18 23:26:58 terry.reedy set nosy: + terry.reedy
messages: + msg200330
2013-10-14 04:00:57 Ramchandra Apte set nosy: + Ramchandra Apte
messages: + msg199835
2013-10-13 21:10:41 christian.heimes set messages: + msg199805
2013-10-13 21:00:36 pitrou set messages: + msg199802
2013-10-13 20:57:16 serhiy.storchaka set messages: + msg199801
2013-10-13 20:30:15 pitrou set messages: + msg199794
2013-10-13 20:21:16 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg199792
2013-10-13 20:12:07 georg.brandl set nosy: + georg.brandl
messages: + msg199787
2013-10-13 20:09:58 pitrou set nosy: + pitrou
messages: + msg199786
2013-10-13 20:07:27 christian.heimes create