Issue 27623: int.to_bytes() and int.from_bytes(): raise ValueError when bytes count is zero

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71810

classification

Title:	int.to_bytes() and int.from_bytes(): raise ValueError when bytes count is zero
Type:	behavior	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Phaqui, abarry, lucasem, mark.dickinson, martin.panter, serhiy.storchaka, socketpair
Priority:	normal	Keywords:	patch

Created on 2016-07-26 06:26 by socketpair, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description	Edit
int_to_bytes_overflow_1.patch	abarry, 2016-07-28 02:28		review
int_to_bytes_overflow_cornercase.patch	Phaqui, 2016-07-28 16:17	Throws OverflowError on (-1).to_bytes(0, 'big', signed=True)	review
int_to_bytes_overflow_cornercase2.patch	Phaqui, 2016-07-30 00:01	(-x).to_bytes(0, 'big', signed=True), x=0 or -1 now throws OverflowError; also updated the tests. They run successfully.	review

Messages (14)
msg271329 - (view)	Author: Марк Коренберг (socketpair) *	Date: 2016-07-26 06:26
As you can see, these conversions are not consistent. What is use case to allow that? =================== In [22]: (-1).to_bytes(0, 'big', signed=True) Out[22]: b'' In [23]: (0).to_bytes(0, 'big', signed=True) Out[23]: b'' As you can see, two different values serialized to same empty bytes sequence. =================== In [28]: int.from_bytes(b'', 'big', signed=True) Out[28]: 0 In [29]: int.from_bytes(b'', 'big', signed=False) Out[29]: 0 Anyway, -1 can not be deserialized. ===================
msg271330 - (view)	Author: Марк Коренберг (socketpair) *	Date: 2016-07-26 06:28
So, as I think, it must ValueError when bytes count is zero. This is like division by zero.
msg271476 - (view)	Author: Lucas Morales (lucasem)	Date: 2016-07-28 00:47
This is actually a problem in Objects/longobject.c, in the _PyLong_AsByteArray function. It should have given an overflow error, because -1 cannot be encoded in 0 bytes.
msg271479 - (view)	Author: Anders Lorentsen (Phaqui) *	Date: 2016-07-28 01:36
Isn't it possible to just add a small line of code that checks if length is less than or equal to 0, and if it is, call the necessary c functions to have python raise a valueerror...? Sorry if this is giving a solution without actually submitting the patch - but this is all very new to me. I have never contributed to anything yet (fourth year CS-student), and I am as fresh to the process as can be. I registered here just now. This seems like an issue I could handle.. I just need to take the time to learn about the process of how things are done around here.
msg271484 - (view)	Author: Anilyka Barry (abarry) *	Date: 2016-07-28 02:28
Here's a patch that fixes this.
msg271485 - (view)	Author: Марк Коренберг (socketpair) *	Date: 2016-07-28 03:03
I think, we should deny: * Passing `0` to `to_bytes` (even if integer is equal to zero) * Passing empty string to `from_bytes` I do not see any use cases for this to work. It was never guaranteed to work earlier. Everyone pass constant to `to_bytes` that is > 0. And passing empty buffer to `from_bytes` just indicate error in logic (i.e. something was not read correctly from file/socket e.t.c). `struct.pack`/`unpack` does not support zero-byte types.
msg271487 - (view)	Author: Anilyka Barry (abarry) *	Date: 2016-07-28 03:13
I don't use this feature enough to have a clear opinion, however it's specifically accounted for in the code and has a test for it. It might be a good idea to bring this up on Python-ideas. It's very likely to break some code, but I wonder if said code wasn't already broken to begin with :)
msg271499 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-28 05:11
I agree that the signed conversion cases should be an error. However the unsigned case would break working code that I have written for bijective numeration. See _bytes_to_int() and _int_to_bytes() in Issue 20132, inc-codecs.diff, for example. Since non-zero unsigned conversions work by converting N bytes <-> 0 <= value < 2^N For N = 0, there is only one possible value, 0.
msg271504 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-07-28 06:17
I agree with Martin. The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported (especially if there are tests for this).
msg271506 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-07-28 07:27
> The ambiguous signed conversion cases should be an error, the unambiguous unsigned conversion case should be supported +1. A signed representation requires 1 bit for the sign (regardless of whether the number being represented is negative or nonnegative), so it should be an error to encode into zero bytes. But there's nothing wrong with trying to encode 0 in zero bytes when using an unsigned representation.
msg271564 - (view)	Author: Anders Lorentsen (Phaqui) *	Date: 2016-07-28 16:17
So, am I to understand that the only corner case we should fix is that >>> (-1).to_bytes(0, 'big', signed=True) should raise an overflow error (currently it returns b'') ?
msg271565 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-07-28 16:29
I guess that would be the minimal change necessary to remove ambiguity. But I tend to think that >>> (0).to_bytes(0, 'big', signed=True) should also be an error. (And of course both these should still be errors with 'big' replaced with 'little'.)
msg271657 - (view)	Author: Anders Lorentsen (Phaqui) *	Date: 2016-07-30 00:01
I updated my patch to account for that second corner case. But ideally, shouldn't it rather be accounted for in the function that does the actual conversion, that is, in _PyLong_AsByteArray?
msg271660 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-30 01:21
I’m not familiar with the implementation, but it does seem like this should be fixed within _PyLong_AsByteArray(). Also, what about int.from_bytes(b"", ..., signed=True)? There are existing tests for this case, but it seems like it should be an error, rather than returning zero.

History
Date	User	Action	Args
2022-04-11 14:58:34	admin	set	github: 71810
2016-07-30 01:21:55	martin.panter	set	messages: + msg271660
2016-07-30 00:01:18	Phaqui	set	files: + int_to_bytes_overflow_cornercase2.patch messages: + msg271657
2016-07-28 16:29:11	mark.dickinson	set	messages: + msg271565
2016-07-28 16:17:27	Phaqui	set	files: + int_to_bytes_overflow_cornercase.patch messages: + msg271564
2016-07-28 07:27:09	mark.dickinson	set	nosy: + mark.dickinson messages: + msg271506
2016-07-28 06:17:34	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg271504
2016-07-28 05:11:25	martin.panter	set	nosy: + martin.panter messages: + msg271499
2016-07-28 03:13:12	abarry	set	messages: + msg271487
2016-07-28 03:03:24	socketpair	set	messages: + msg271485
2016-07-28 02:28:38	abarry	set	files: + int_to_bytes_overflow_1.patch components: + Interpreter Core, - Library (Lib) keywords: + patch nosy: + abarry messages: + msg271484 stage: patch review
2016-07-28 01:36:35	Phaqui	set	messages: + msg271479
2016-07-28 01:20:04	Phaqui	set	nosy: + Phaqui
2016-07-28 00:47:55	lucasem	set	nosy: + lucasem messages: + msg271476
2016-07-26 06:28:39	socketpair	set	messages: + msg271330
2016-07-26 06:26:56	socketpair	set	type: behavior components: + Library (Lib) versions: + Python 3.6
2016-07-26 06:26:20	socketpair	create