This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: int.to_bytes(-1, ...) should automatically choose required count of bytes
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: lorenz_, mark.dickinson, martin.panter, serhiy.storchaka, socketpair
Priority: normal Keywords:

Created on 2016-07-28 03:15 by socketpair, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg271488 - Author: Марк Коренберг (socketpair) * Date: 2016-07-28 03:15
It would be nice if `int.to_bytes` could automatically choose the number of bytes needed to serialize the value. If so, I could write serialisation code like this:

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = value.to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0 # see Issue27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
msg271489 - Author: Марк Коренберг (socketpair) * Date: 2016-07-28 03:16
Oops.

def serialize(value: int, signed=True) -> bytes:
    x = value.to_bytes(-1, 'big', signed=signed)
    l = len(x).to_bytes(4, 'big', signed=False)
    return l + x

assert len(serialize(0)) == 4 + 0 # see Issue27623
assert len(serialize(120)) == 4 + 1
assert len(serialize(130)) == 4 + 2
assert len(serialize(130, False)) == 4 + 1
msg271498 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-28 04:59
I don’t like special values. A length of minus one makes no sense, so should trigger an exception, not some unexpected behaviour. A different data type like None would be a bit better.

But I’m not sure this would be widely used. If you really need it you could calculate the number of bytes needed via value.bit_length().
msg271502 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-28 06:14
This is rarely needed, mainly in general serializers like pickle. The code for determining the minimal number of bytes is not trivial, and it depends on the serializer. If you always serialize unsigned values and save the sign separately, or use a one's complement representation, or if the serializer supports only a fixed set of integer sizes, the code is completely different.

I don't think that we need this feature in the stdlib.
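
To illustrate how the size calculation changes with the serializer, here is a minimal sketch comparing three conventions. The function names and the sign-byte layout are hypothetical and chosen only for illustration; they are not taken from pickle or any other real format.

```python
def encode_twos_complement(x: int) -> bytes:
    """Minimal big-endian two's-complement encoding (always at least 1 byte)."""
    n = (~x if x < 0 else x).bit_length() // 8 + 1
    return x.to_bytes(n, "big", signed=True)


def encode_sign_magnitude(x: int) -> bytes:
    """Hypothetical format: one sign byte (0 or 1), then the unsigned magnitude."""
    magnitude = abs(x)
    n = (magnitude.bit_length() + 7) // 8  # 0 bytes when magnitude == 0
    sign = 1 if x < 0 else 0
    return bytes([sign]) + magnitude.to_bytes(n, "big", signed=False)


def fixed_width_length(x: int, sizes=(1, 2, 4, 8)) -> int:
    """Smallest allowed signed width that fits x (assumed set of sizes)."""
    needed = (~x if x < 0 else x).bit_length() // 8 + 1
    for size in sizes:
        if size >= needed:
            return size
    raise OverflowError("value does not fit in any supported width")
```

With these definitions, -128 takes one byte in two's complement but two bytes (sign plus magnitude) in the sign-and-magnitude layout, and a value that needs three bytes is rounded up to four when only fixed widths are allowed.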
msg271507 - Author: Марк Коренберг (socketpair) * Date: 2016-07-28 07:54
https://github.com/pyca/cryptography/issues/3064
msg271542 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-07-28 12:47
[Martin]

> I don’t like special values.

Agreed. If we wanted to add this, the obvious API would be to simply make the size optional (which would force passing the endianness by name or explicitly passing a default value of `None`, but that doesn't seem like a big deal to me).

I'm -0 on the feature itself. On the plus side, the fact that it's not completely trivial to compute the size without errors is an argument for including that calculation within the Python code. I'd suggest formulas of:

   (x.bit_length() + 7) // 8

for the unsigned case, and

   (~x if x < 0 else x).bit_length() // 8 + 1

for the signed case, these giving the minimal number of bytes necessary for encoding x in each case.
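
As a rough sketch of how these two formulas could be used today without any change to `int.to_bytes`, the helpers below wrap them and rebuild the `serialize` function from the original request. The helper names `min_length_unsigned` and `min_length_signed` are hypothetical, not part of the standard library.

```python
def min_length_unsigned(x: int) -> int:
    """Minimal byte count for a non-negative int (0 needs 0 bytes)."""
    if x < 0:
        raise ValueError("unsigned encoding needs a non-negative value")
    return (x.bit_length() + 7) // 8


def min_length_signed(x: int) -> int:
    """Minimal byte count for two's-complement encoding (always >= 1)."""
    return (~x if x < 0 else x).bit_length() // 8 + 1


def serialize(value: int, signed: bool = True) -> bytes:
    """4-byte big-endian length prefix followed by the minimal payload."""
    n = min_length_signed(value) if signed else min_length_unsigned(value)
    payload = value.to_bytes(n, "big", signed=signed)
    return len(payload).to_bytes(4, "big", signed=False) + payload
```

Note that with the signed formula zero takes one byte, so `serialize(0)` is 4 + 1 bytes long rather than the 4 + 0 asserted in the original request; only the unsigned formula yields an empty payload for zero.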
msg403194 - Author: Lorenz Panny (lorenz_) Date: 2021-10-05 05:05
I would like to express my support for making length=None automatically use the minimal possible length. It's true that this will rarely be needed in production-grade serialization code, but this functionality is worth its weight in gold for quickly written proof-of-concept code or when using Python as a "pocket calculator" in an interactive shell.

I'm sure I've personally typed the expression (n.bit_length()+7)//8 approximately a million times while quickly trying something. It'd be nice if Python could just do this simple computation for me instead. The code changes required are minimal and there shouldn't be any performance impact.

In fact, in my opinion this should even be the default behaviour, but 3.11 just made length=1 the default (see #45155) and changing this now would cause an (albeit very mild) API incompatibility.
History
Date                 User              Action  Args
2022-04-11 14:58:34  admin             set     github: 71824
2021-10-05 05:05:44  lorenz_           set     nosy: + lorenz_; messages: + msg403194
2016-07-28 12:47:31  mark.dickinson    set     nosy: + mark.dickinson; messages: + msg271542
2016-07-28 07:54:38  socketpair        set     messages: + msg271507
2016-07-28 06:14:51  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg271502
2016-07-28 04:59:08  martin.panter     set     nosy: + martin.panter; messages: + msg271498
2016-07-28 03:16:34  socketpair        set     messages: + msg271489
2016-07-28 03:15:12  socketpair        create