msg271329 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-07-26 06:26 |
As you can see, these conversions are not consistent. What is use case to allow that?
===================
In [22]: (-1).to_bytes(0, 'big', signed=True)
Out[22]: b''
In [23]: (0).to_bytes(0, 'big', signed=True)
Out[23]: b''
As you can see, two different values serialized to same empty bytes sequence.
===================
In [28]: int.from_bytes(b'', 'big', signed=True)
Out[28]: 0
In [29]: int.from_bytes(b'', 'big', signed=False)
Out[29]: 0
Anyway, -1 can not be deserialized.
===================
|
msg271330 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-07-26 06:28 |
So, as I think, it must ValueError when bytes count is zero. This is like division by zero.
|
msg271476 - (view) |
Author: Lucas Morales (lucasem) |
Date: 2016-07-28 00:47 |
This is actually a problem in Objects/longobject.c, in the _PyLong_AsByteArray function. It should have given an overflow error, because -1 cannot be encoded in 0 bytes.
|
msg271479 - (view) |
Author: Anders Lorentsen (Phaqui) * |
Date: 2016-07-28 01:36 |
Isn't it possible to just add a small line of code that checks if length is less than or equal to 0, and if it is, call the necessary c functions to have python raise a valueerror...? Sorry if this is giving a solution without actually submitting the patch - but this is all very new to me. I have never contributed to anything yet (fourth year CS-student), and I am as fresh to the process as can be. I registered here just now. This seems like an issue I could handle.. I just need to take the time to learn about the process of how things are done around here.
|
msg271484 - (view) |
Author: Anilyka Barry (abarry) * |
Date: 2016-07-28 02:28 |
Here's a patch that fixes this.
|
msg271485 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-07-28 03:03 |
I think, we should deny:
* Passing `0` to `to_bytes` (even if integer is equal to zero)
* Passing empty string to `from_bytes`
I do not see any use cases for this to work. It was never guaranteed to work earlier. Everyone pass constant to `to_bytes` that is > 0. And passing empty buffer to `from_bytes` just indicate error in logic (i.e. something was not read correctly from file/socket e.t.c).
`struct.pack`/`unpack` does not support zero-byte types.
|
msg271487 - (view) |
Author: Anilyka Barry (abarry) * |
Date: 2016-07-28 03:13 |
I don't use this feature enough to have a clear opinion, however it's specifically accounted for in the code and has a test for it. It might be a good idea to bring this up on Python-ideas. It's very likely to break some code, but I wonder if said code wasn't already broken to begin with :)
|
msg271499 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-07-28 05:11 |
I agree that the signed conversion cases should be an error.
However the unsigned case would break working code that I have written for bijective numeration. See _bytes_to_int() and _int_to_bytes() in Issue 20132, inc-codecs.diff, for example. Since non-zero unsigned conversions work by converting
N bytes <-> 0 <= value < 2^N
For N = 0, there is only one possible value, 0.
|
msg271504 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-07-28 06:17 |
I agree with Martin. The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported (especially if there are tests for this).
|
msg271506 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-07-28 07:27 |
> The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported
+1. A signed representation *requires* 1 bit for the sign (regardless of whether the number being represented is negative or nonnegative), so it should be an error to encode into zero bytes. But there's nothing wrong with trying to encode 0 in zero bytes when using an unsigned representation.
|
msg271564 - (view) |
Author: Anders Lorentsen (Phaqui) * |
Date: 2016-07-28 16:17 |
So, am I to understand that the only corner case we should fix is that
>>> (-1).to_bytes(0, 'big', signed=True)
should raise an overflow error (currently it returns b'') ?
|
msg271565 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-07-28 16:29 |
I guess that would be the minimal change necessary to remove ambiguity. But I tend to think that
>>> (0).to_bytes(0, 'big', signed=True)
should also be an error. (And of course both these should still be errors with 'big' replaced with 'little'.)
|
msg271657 - (view) |
Author: Anders Lorentsen (Phaqui) * |
Date: 2016-07-30 00:01 |
I updated my patch to account for that second corner case. But ideally, shouldn't it rather be accounted for in the function that does the actual conversion, that is, in _PyLong_AsByteArray?
|
msg271660 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-07-30 01:21 |
I’m not familiar with the implementation, but it does seem like this should be fixed within _PyLong_AsByteArray().
Also, what about int.from_bytes(b"", ..., signed=True)? There are existing tests for this case, but it seems like it should be an error, rather than returning zero.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:34 | admin | set | github: 71810 |
2016-07-30 01:21:55 | martin.panter | set | messages:
+ msg271660 |
2016-07-30 00:01:18 | Phaqui | set | files:
+ int_to_bytes_overflow_cornercase2.patch
messages:
+ msg271657 |
2016-07-28 16:29:11 | mark.dickinson | set | messages:
+ msg271565 |
2016-07-28 16:17:27 | Phaqui | set | files:
+ int_to_bytes_overflow_cornercase.patch
messages:
+ msg271564 |
2016-07-28 07:27:09 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg271506
|
2016-07-28 06:17:34 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg271504
|
2016-07-28 05:11:25 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg271499
|
2016-07-28 03:13:12 | abarry | set | messages:
+ msg271487 |
2016-07-28 03:03:24 | socketpair | set | messages:
+ msg271485 |
2016-07-28 02:28:38 | abarry | set | files:
+ int_to_bytes_overflow_1.patch
components:
+ Interpreter Core, - Library (Lib)
keywords:
+ patch nosy:
+ abarry messages:
+ msg271484 stage: patch review |
2016-07-28 01:36:35 | Phaqui | set | messages:
+ msg271479 |
2016-07-28 01:20:04 | Phaqui | set | nosy:
+ Phaqui
|
2016-07-28 00:47:55 | lucasem | set | nosy:
+ lucasem messages:
+ msg271476
|
2016-07-26 06:28:39 | socketpair | set | messages:
+ msg271330 |
2016-07-26 06:26:56 | socketpair | set | type: behavior components:
+ Library (Lib) versions:
+ Python 3.6 |
2016-07-26 06:26:20 | socketpair | create | |