classification
Title: int.to_bytes(-1, ...) should automatically choose required count of bytes
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, martin.panter, serhiy.storchaka, socketpair
Priority: normal Keywords:

Created on 2016-07-28 03:15 by socketpair, last changed 2016-07-28 12:47 by mark.dickinson.

Messages (6)
msg271488 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-28 03:15
It would be nice if `int.to_bytes` could automatically choose the number of bytes needed to serialize a value. With that, I could write serialisation code like this:

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = value.to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0 # see Issue27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
msg271489 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-28 03:16
Oops.

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = len(x).to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0 # see Issue27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
msg271498 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-28 04:59
I don’t like special values. A length of minus one makes no sense, so should trigger an exception, not some unexpected behaviour. A different data type like None would be a bit better.

But I’m not sure this would be widely used. If you really need it you could calculate the number of bytes needed via value.bit_length().
msg271502 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-28 06:14
This is rarely needed, mainly in general serializers like pickle. The code for determining the minimal number of bytes is not trivial, but it depends on the serializer. If you always serialize unsigned values and save the sign separately, or use one's complement representation, or if the serializer supports only a fixed set of integer sizes, the code is completely different.

I don't think that we need this feature in the stdlib.
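
To illustrate how the byte count depends on the serializer, here is a minimal sketch comparing three conventions. The function names and the width set (1, 2, 4, 8) are hypothetical, chosen only for this example; the two's complement formula is the one Mark Dickinson suggests later in this thread.

```python
def twos_complement_size(x: int) -> int:
    """Bytes needed for a signed two's complement encoding of x."""
    return (~x if x < 0 else x).bit_length() // 8 + 1


def sign_magnitude_size(x: int) -> int:
    """Bytes needed for abs(x) when the sign is stored separately."""
    return (abs(x).bit_length() + 7) // 8


def fixed_width_size(x: int, widths=(1, 2, 4, 8)) -> int:
    """Smallest allowed width (assumed set) that holds x in two's complement."""
    needed = twos_complement_size(x)
    for width in widths:
        if width >= needed:
            return width
    raise OverflowError("value does not fit in any supported width")


for value in (128, -70000):
    print(value,
          twos_complement_size(value),
          sign_magnitude_size(value),
          fixed_width_size(value))
```

For 128 the three conventions already disagree: two bytes in two's complement, one byte for the magnitude alone, and two bytes in the fixed-width scheme.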
msg271507 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-07-28 07:54
https://github.com/pyca/cryptography/issues/3064
msg271542 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-07-28 12:47
[Martin]

> I don’t like special values.

Agreed. If we wanted to add this, the obvious API would be to simply make the size optional (which would force passing the endianness by name or explicitly passing a default value of `None`, but that doesn't seem like a big deal to me).
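
A rough sketch of what such an API could look like, written as a free function because `int.to_bytes` itself has no optional length in Python 3.6. The name `int_to_bytes` is hypothetical, and the size calculation borrows the formulas given further down in this message.

```python
def int_to_bytes(value, length=None, byteorder=None, *, signed=False):
    """Hypothetical stand-in for an int.to_bytes() with an optional length."""
    if byteorder is None:
        # Mirrors the current API, where the byte order is mandatory.
        raise TypeError("byteorder is required")
    if length is None:
        # Pick the size automatically instead of using a magic value like -1.
        if signed:
            length = (~value if value < 0 else value).bit_length() // 8 + 1
        else:
            length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, byteorder, signed=signed)


def serialize(value, signed=True):
    """Length-prefixed encoding, following the example in the original post."""
    payload = int_to_bytes(value, byteorder='big', signed=signed)
    prefix = len(payload).to_bytes(4, 'big', signed=False)
    return prefix + payload
```

With the length omitted, the byte order has to be passed by name (`int_to_bytes(130, byteorder='big')`) or after an explicit `None` (`int_to_bytes(130, None, 'big')`). Note that under this sketch a signed 0 takes one byte, so `serialize(0)` is 4 + 1 bytes long rather than the 4 + 0 assumed in the original post.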

I'm -0 on the feature itself. On the plus side, the fact that it's not completely trivial to compute the size without errors is an argument for including that calculation within the Python code. I'd suggest formulas of:

   (x.bit_length() + 7) // 8

for the unsigned case, and

   (~x if x < 0 else x).bit_length() // 8 + 1

for the signed case, these giving the minimal number of bytes necessary for encoding x in each case.
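
One way to gain confidence in these formulas is to compare them against `int.to_bytes` directly. The sketch below is only an illustration: the helper names are made up for this example, and zero-length encodings are deliberately left out of the check (see Issue27623 mentioned above).

```python
def unsigned_size(x: int) -> int:
    return (x.bit_length() + 7) // 8


def signed_size(x: int) -> int:
    return (~x if x < 0 else x).bit_length() // 8 + 1


def fits(x: int, n: int, signed: bool) -> bool:
    """True if x can be encoded in exactly n bytes."""
    try:
        x.to_bytes(n, 'big', signed=signed)
    except OverflowError:
        return False
    return True


def is_minimal(x: int, n: int, signed: bool) -> bool:
    """True if n bytes suffice and n - 1 bytes do not (for n >= 2)."""
    return fits(x, n, signed) and (n <= 1 or not fits(x, n - 1, signed))


def mismatches(limit: int) -> list:
    """Values in [-limit, limit] where a formula is not minimal."""
    bad = []
    for x in range(-limit, limit + 1):
        if not is_minimal(x, signed_size(x), signed=True):
            bad.append((x, 'signed'))
        if x >= 1 and not is_minimal(x, unsigned_size(x), signed=False):
            bad.append((x, 'unsigned'))
    return bad


print(mismatches(70000))
```

An empty list means that, over the tested range, each formula returns a length that `to_bytes` accepts while one byte fewer overflows.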
History
Date                 User              Action  Args
2016-07-28 12:47:31  mark.dickinson    set     nosy: + mark.dickinson; messages: + msg271542
2016-07-28 07:54:38  socketpair        set     messages: + msg271507
2016-07-28 06:14:51  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg271502
2016-07-28 04:59:08  martin.panter     set     nosy: + martin.panter; messages: + msg271498
2016-07-28 03:16:34  socketpair        set     messages: + msg271489
2016-07-28 03:15:12  socketpair        create