classification
Title: int.to_bytes() and int.from_bytes(): raise ValueError when bytes count is zero
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Phaqui, abarry, lucasem, mark.dickinson, martin.panter, serhiy.storchaka, socketpair
Priority: normal Keywords: patch

Created on 2016-07-26 06:26 by socketpair, last changed 2016-07-30 01:21 by martin.panter.

Files
File name Uploaded Description Edit
int_to_bytes_overflow_1.patch abarry, 2016-07-28 02:28 review
int_to_bytes_overflow_cornercase.patch Phaqui, 2016-07-28 16:17 Throws OverflowError on (-1).to_bytes(0, 'big', signed=True) review
int_to_bytes_overflow_cornercase2.patch Phaqui, 2016-07-30 00:01 (-x).to_bytes(0, 'big', signed=True), x=0 or -1 now throws OverflowError; also updated the tests. They run successfully. review
Messages (14)
msg271329 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-26 06:26
As you can see, these conversions are not consistent. What is use case to allow that?

===================

In [22]: (-1).to_bytes(0, 'big', signed=True)
Out[22]: b''

In [23]: (0).to_bytes(0, 'big', signed=True)
Out[23]: b''

As you can see, two different values serialized to same  empty bytes sequence.

===================

In [28]: int.from_bytes(b'', 'big', signed=True)
Out[28]: 0

In [29]: int.from_bytes(b'', 'big', signed=False)
Out[29]: 0

Anyway, -1 can not be deserialized.

===================
msg271330 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-26 06:28
So, as I think, it must ValueError when bytes count is zero. This is like division by zero.
msg271476 - (view) Author: Lucas Morales (lucasem) Date: 2016-07-28 00:47
This is actually a problem in Objects/longobject.c, in the _PyLong_AsByteArray function. It should have given an overflow error, because -1 cannot be encoded in 0 bytes.
msg271479 - (view) Author: Anders Lorentsen (Phaqui) * Date: 2016-07-28 01:36
Isn't it possible to just add a small line of code that checks if length is less than or equal to 0, and if it is, call the necessary c functions to have python raise a valueerror...? Sorry if this is giving a solution without actually submitting the patch - but this is all very new to me. I have never contributed to anything yet (fourth year CS-student), and I am as fresh to the process as can be. I registered here just now. This seems like an issue I could handle.. I just need to take the time to learn about the process of how things are done around here.
msg271484 - (view) Author: Anilyka Barry (abarry) * (Python triager) Date: 2016-07-28 02:28
Here's a patch that fixes this.
msg271485 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-28 03:03
I think, we should deny:

* Passing `0` to `to_bytes` (even if integer is equal to zero)
* Passing empty string to `from_bytes`

I do not see any use cases for this to work. It was never guaranteed to work earlier. Everyone pass constant to `to_bytes` that is > 0. And passing empty buffer to `from_bytes` just indicate error in logic (i.e. something was not read correctly from file/socket e.t.c).

`struct.pack`/`unpack` does not support zero-byte types.
msg271487 - (view) Author: Anilyka Barry (abarry) * (Python triager) Date: 2016-07-28 03:13
I don't use this feature enough to have a clear opinion, however it's specifically accounted for in the code and has a test for it. It might be a good idea to bring this up on Python-ideas. It's very likely to break some code, but I wonder if said code wasn't already broken to begin with :)
msg271499 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-28 05:11
I agree that the signed conversion cases should be an error.

However the unsigned case would break working code that I have written for bijective numeration. See _bytes_to_int() and _int_to_bytes() in Issue 20132, inc-codecs.diff, for example. Since non-zero unsigned conversions work by converting

N bytes  <->  0 <= value < 2^N

For N = 0, there is only one possible value, 0.
msg271504 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-28 06:17
I agree with Martin. The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported (especially if there are tests for this).
msg271506 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-07-28 07:27
> The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported

+1. A signed representation *requires* 1 bit for the sign (regardless of whether the number being represented is negative or nonnegative), so it should be an error to encode into zero bytes. But there's nothing wrong with trying to encode 0 in zero bytes when using an unsigned representation.
msg271564 - (view) Author: Anders Lorentsen (Phaqui) * Date: 2016-07-28 16:17
So, am I to understand that the only corner case we should fix is that
>>> (-1).to_bytes(0, 'big', signed=True)
should raise an overflow error (currently it returns  b'') ?
msg271565 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-07-28 16:29
I guess that would be the minimal change necessary to remove ambiguity. But I tend to think that

>>> (0).to_bytes(0, 'big', signed=True)

should also be an error. (And of course both these should still be errors with 'big' replaced with 'little'.)
msg271657 - (view) Author: Anders Lorentsen (Phaqui) * Date: 2016-07-30 00:01
I updated my patch to account for that second corner case. But ideally, shouldn't it rather be accounted for in the function that does the actual conversion, that is, in _PyLong_AsByteArray?
msg271660 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-30 01:21
I’m not familiar with the implementation, but it does seem like this should be fixed within _PyLong_AsByteArray().

Also, what about int.from_bytes(b"", ..., signed=True)? There are existing tests for this case, but it seems like it should be an error, rather than returning zero.
History
Date User Action Args
2016-07-30 01:21:55martin.pantersetmessages: + msg271660
2016-07-30 00:01:18Phaquisetfiles: + int_to_bytes_overflow_cornercase2.patch

messages: + msg271657
2016-07-28 16:29:11mark.dickinsonsetmessages: + msg271565
2016-07-28 16:17:27Phaquisetfiles: + int_to_bytes_overflow_cornercase.patch

messages: + msg271564
2016-07-28 07:27:09mark.dickinsonsetnosy: + mark.dickinson
messages: + msg271506
2016-07-28 06:17:34serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg271504
2016-07-28 05:11:25martin.pantersetnosy: + martin.panter
messages: + msg271499
2016-07-28 03:13:12abarrysetmessages: + msg271487
2016-07-28 03:03:24socketpairsetmessages: + msg271485
2016-07-28 02:28:38abarrysetfiles: + int_to_bytes_overflow_1.patch

components: + Interpreter Core, - Library (Lib)

keywords: + patch
nosy: + abarry
messages: + msg271484
stage: patch review
2016-07-28 01:36:35Phaquisetmessages: + msg271479
2016-07-28 01:20:04Phaquisetnosy: + Phaqui
2016-07-28 00:47:55lucasemsetnosy: + lucasem
messages: + msg271476
2016-07-26 06:28:39socketpairsetmessages: + msg271330
2016-07-26 06:26:56socketpairsettype: behavior
components: + Library (Lib)
versions: + Python 3.6
2016-07-26 06:26:20socketpaircreate