Issue45155
Created on 2021-09-09 22:00 by barry, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | | |
---|---|---|
URL | Status | Linked |
PR 28265 | merged | barry, 2021-09-09 22:28 |
PR 28465 | merged | rhettinger, 2021-09-20 17:47 |
Messages (41) | |||
---|---|---|---|
msg401524 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-09 22:00 | |
In the PEP 467 discussion, I proposed being able to write `(65).to_bytes()` and get `b'A'`. In other words, adding default values for the `length` and `byteorder` arguments of `int.to_bytes()`: https://mail.python.org/archives/list/python-dev@python.org/message/PUR7UCOITMMH6TZVVJA5LKRCBYS4RBMR/ It occurs to me that this is (1) useful on its own merits; (2) easy to do. So I've done it. Creating this bug so I can link a PR against it. |
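A minimal sketch of the proposed behavior that also runs on interpreters without the change. `to_bytes_with_defaults` is a hypothetical helper, not part of the PR; it assumes the initially proposed defaults (`length=1`, `byteorder=sys.byteorder`) that the follow-up comments discuss.

```python
import sys


def to_bytes_with_defaults(value, length=1, byteorder=sys.byteorder, *, signed=False):
    """Hypothetical helper emulating the initially proposed int.to_bytes() defaults."""
    return value.to_bytes(length, byteorder, signed=signed)


print(to_bytes_with_defaults(65))            # b'A'
print(to_bytes_with_defaults(65, 2, "big"))  # b'\x00A'
```

With all defaults the result is a single byte, so the byte order never comes into play; it only matters once `length` is passed explicitly.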
|||
msg401538 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-09-10 00:29 | |
Rather than defaulting to sys.byteorder, could the byte order default to None and only be optional when not needed? (input value fits in a single byte, output is a single byte) Otherwise the difference in defaults between this method and the struct module (network byte order rather than host byte order) could be very confusing. |
|||
msg401540 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2021-09-10 00:35 | |
Never mind, I've forced network byte order in struct strings for so long I had forgotten that native byte order was also the default there. Hence I withdraw that objection. |
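The point about `struct` defaults can be checked directly. This is an illustrative sketch only: a format string without a prefix uses native byte order, while the `!` prefix forces network (big-endian) order.

```python
import struct
import sys

# No prefix: native byte order (and native size/alignment).
native = struct.pack("I", 1)
# "!" prefix: network byte order, i.e. big-endian, standard size.
network = struct.pack("!I", 1)

print(sys.byteorder, native, network)
```

On a little-endian host the two results differ, which is why code that always writes `!` never notices the native default.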
|||
msg401541 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-10 00:57 | |
So (128).to_bytes() will raise an error, right? I'm also afraid that it will lead to some programs working correctly only on platforms with the most common byte order, just because their authors are not aware of byte ordering. Currently the interface forces programmers to read something about byte ordering. |
|||
msg401542 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-10 01:01 | |
Ah, signed=False by default, so (128).to_bytes() will work. But I still worry that it can provoke writing more error-prone code. |
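A short sketch of the distinction Serhiy is drawing, with `length` and `byteorder` spelled out so it does not depend on the new defaults: 128 fits in one unsigned byte but not in one signed byte.

```python
# Unsigned (the default): 128 fits in a single byte.
unsigned = (128).to_bytes(1, "big")
print(unsigned)  # b'\x80'

# Signed: one byte only covers -128..127, so 128 overflows.
try:
    (128).to_bytes(1, "big", signed=True)
    signed_failed = False
except OverflowError as exc:
    signed_failed = True
    print("OverflowError:", exc)
```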
|||
msg401543 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-10 01:06 | |
> Ah, signed=False by default, so (128).to_bytes() will work. But I still worry that it can provoke writing more error-prone code. Can you elaborate on that? Obviously no existing code will change behavior. I really don’t expect people to write `(128).to_bytes(signed=True)` by accident. |
|||
msg401544 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-10 01:12 | |
Perhaps instead of the system byte ordering, choose one for the default so that default encoding/decoding will work cross-platform. I think "little" is the most common (Intel and ARM). |
|||
msg401545 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-10 01:15 | |
For the common case where you’re using all defaults, it won’t matter. byteorder doesn’t matter when length=1. |
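The claim that byte order is irrelevant for a single byte is easy to verify exhaustively. A small sketch, using explicit arguments throughout:

```python
# Every one-byte value encodes identically in both byte orders.
same_for_one_byte = all(
    n.to_bytes(1, "big") == n.to_bytes(1, "little") for n in range(256)
)

# As soon as two bytes are involved, the order becomes visible.
differs_for_two_bytes = (258).to_bytes(2, "big") != (258).to_bytes(2, "little")

print(same_for_one_byte, differs_for_two_bytes)  # True True
```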
|||
msg401552 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-10 04:04 | |
Serhiy is likely thinking of the other cases. Prior to this discussion, the principal use for to_bytes and from_bytes was for integers larger than a single byte. If we're going to add a *byteorder* default, it needs to make sense for those cases as well. |
|||
msg401555 - (view) | Author: Vedran Čačić (veky) * | Date: 2021-09-10 05:35 | |
> choose one for the default so that default encoding/decoding will work cross platform. I think "little" is the most common (intel and arm). Raymond, please don't do this. We already have a "sensible default" in a network context, and it is big endian. Having another "sensible default" opposite to the previous one is really no way to ensure interoperability. (https://xkcd.com/927/ only becomes more ridiculous when the number in question is 2.:) I don't want to think about whether the way machines A and B exchange data can be called "a network" or not. Of course, having the byteorder optional when there's only one (unsigned) byte is good. |
|||
msg401556 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2021-09-10 07:03 | |
I'd also really like to avoid a system-dependent default. The danger is that code of the form some_externally_supplied_integer.to_bytes(length=4) can be written and thoroughly tested, only to fail unexpectedly some time later when that code happens to meet a big-endian machine. In most real-world cases with input length >= 1, it's unlikely that the system byteorder is the right thing, especially for from_bytes: what you need to know is what endianness the integer was encoded with, and that's not likely to be well correlated with the endianness that the machine you're running on right now happens to be using. (The choice of default obviously doesn't matter in cases where you're encoding and decoding on the same system, but there are going to be plenty of cases where that's not true.) This is essentially the same issue that PEP 597 starts to address with `open(filename)` with no encoding specified. That system-specific default encoding has caused us real issues in production code. |
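The failure mode Mark describes can be simulated on a single machine by encoding with one byte order and decoding with the other, which is what happens when a system-dependent default is used on two hosts of different endianness. The value below is arbitrary and chosen only for illustration.

```python
value = 0x12345678

# What a little-endian host would produce with a sys.byteorder default.
encoded_on_little_endian_host = value.to_bytes(4, "little")

# What a big-endian host would read back with the same "default".
decoded_on_big_endian_host = int.from_bytes(encoded_on_little_endian_host, "big")

print(hex(decoded_on_big_endian_host))  # 0x78563412
```

Nothing raises; the decoded integer is silently wrong, which is why this class of bug survives testing on a single platform.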
|||
msg401560 - (view) | Author: Petr Viktorin (petr.viktorin) * | Date: 2021-09-10 07:48 | |
Exactly, a platform-dependent default is a bad idea. A default allows using the function without the code author & reviewer even being *aware* that there is a choice, and that is dangerous. |
|||
msg401570 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-10 11:06 | |
I dislike the idea of adding a default length to int.to_bytes(). The length changes the meaning of the output: >>> (1).to_bytes(2, 'big') b'\x00\x01' >>> (1).to_bytes(1, 'big') b'\x01' If the intent is to "magically cast an integer to a byte string", having a fixed length of 1 doesn't help: >>> (1000).to_bytes(1, "big") OverflowError: int too big to convert If the intent is to create a bytes string of length 1, I'm not sure that "re-using" this existing API for that is a good idea. |
|||
msg401597 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-10 16:31 | |
Just to point out, struct module also uses “native” (i.e. system) byte order by default. Any choice other than that for to_bytes() seems both arbitrary and inconsistent. |
|||
msg401598 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-10 16:33 | |
> If the intent is to create a bytes string of length 1, I'm not sure that "re-using" this existing API for that is a good idea. Why not? It seems an obvious and simple convenience. |
|||
msg401599 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-10 16:37 | |
> Exactly, a platform-dependent default is a bad idea. A default allows using the function without the code author & reviewer even being *aware* that there is a choice, and that is dangerous. I’m not convinced. I’m more concerned with the obscurity of the API. If I saw its use in some code I was reviewing, I’d look it up, and then I’d know exactly what it was doing. |
|||
msg401622 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-11 00:49 | |
[Mark Dickinson] > I'd also really like to avoid a system-dependent default. [Petr Viktorin] > Exactly, a platform-dependent default is a bad idea. I concur with Petr and Mark. The principal use case for int.to_bytes is to convert an integer of arbitrary size to a binary format for storage or transmission. The corresponding file load or received data needs to symmetrically restore the int. The default (whether little or big) needs to be the same on both sides to prevent bugs. Otherwise, for portable code, we would have to recommend that people not use the default because the output isn't deterministic across systems. By way of comparison, we've had long-standing issues like this in other parts of the language. For example, this gives inconsistent encodings across systems: with open(filename, 'w') as f: f.write(text) Not long ago, Inada had to sweep through and add encoding="utf-8" to fix all the bugs caused by the default platform-dependent encoding. Arguably, most code that has ever been written without an explicit encoding is wrong if the files were intended to be shared outside the local file system. So if Veky wants the default to be "big", that's fine by me. The important thing is that a *consistent* default be used (not platform-dependent). I won't argue for a "sensible default" because apparently Veky has different sensibilities. |
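The symmetric store/restore use case Raymond describes can be sketched as a pair of helpers that share one fixed byte order. `dump_int` and `load_int` are hypothetical names, the sketch handles non-negative integers only, and "big" is an arbitrary choice; what matters is that both sides agree.

```python
def dump_int(n, byteorder="big"):
    """Encode a non-negative int of arbitrary size using the fewest bytes."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, byteorder)


def load_int(data, byteorder="big"):
    """Restore an int written by dump_int with the same byteorder."""
    return int.from_bytes(data, byteorder)


original = 2**100 + 7
assert load_int(dump_int(original)) == original
```

Because the byte order is a fixed constant rather than a property of the host, the round trip gives the same bytes and the same integer on every platform.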
|||
msg401652 - (view) | Author: Vedran Čačić (veky) * | Date: 2021-09-12 07:53 | |
My sensibilities are irrelevant here. I'm just saying we already have a standard byte order for data in transit, and it was introduced long before this thing called internet (it was with capital I back then:) started to interest me. |
|||
msg401653 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-12 08:21 | |
The struct module has 4 different modes. By default it uses not only native byte order, but native sizes and alignments which depend on OS and compiler. You need to know all these details just to understand the format codes. I think that the struct module user is more prepared to work with different word sizes and therefore with different byte ordering. |
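Serhiy's point about native sizes and alignment can be seen with `struct.calcsize`. An illustrative sketch: the native-mode result depends on the platform and compiler, so no specific value is claimed for it here.

```python
import struct

# "@" (the default): native byte order, native sizes, native alignment.
# Padding may be inserted between the char and the int.
native_size = struct.calcsize("@bi")

# "=", "<", ">", "!": standard sizes and no alignment padding (1 + 4 bytes).
standard_size = struct.calcsize("=bi")

print(native_size, standard_size)
```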
|||
msg401657 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-12 08:38 | |
In the stdlib, there is only one use of to_bytes() with sys.byteorder (2 in tests), 16 uses of to_bytes()/from_bytes() with 'little' (22 in tests) and 22 uses with 'big' (33 in tests). So making sys.byteorder the default will help almost nobody, and the advantage of 'big' over 'little' is so small that making either of them the default may only help in half of the cases and confuse in the other half. |
|||
msg401661 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-12 11:20 | |
Interestingly, "little" is faster than "big". $ python3.10 -m timeit -r11 -s 'x=3452452454524' 'x.to_bytes(10, "little")' 5000000 loops, best of 11: 82.7 nsec per loop $ python3.10 -m timeit -r11 -s 'x=3452452454524' 'x.to_bytes(10, "big")' 5000000 loops, best of 11: 90.6 nsec per loop |
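The same measurement can be reproduced from Python with the `timeit` module. A rough sketch: `best_time` is a hypothetical helper, and the absolute numbers will vary by machine and interpreter build, so none are quoted.

```python
import timeit


def best_time(byteorder, number=100_000, repeat=5):
    """Return the best per-call time in seconds for x.to_bytes(10, byteorder)."""
    timer = timeit.Timer(
        f'x.to_bytes(10, "{byteorder}")',
        setup="x = 3452452454524",
    )
    return min(timer.repeat(repeat=repeat, number=number)) / number


for order in ("little", "big"):
    print(order, f"{best_time(order) * 1e9:.1f} nsec per loop")
```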
|||
msg401664 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-12 11:44 | |
Perhaps it is because "little" is checked first. One call of _PyUnicode_EqualToASCIIId() for "little" and two for "big". |
|||
msg401693 - (view) | Author: Petr Viktorin (petr.viktorin) * | Date: 2021-09-13 09:14 | |
> I’m not convinced. I’m more concerned with the obscurity of the API. If I saw its use in some code I was reviewing, I’d look it up, and then I’d know exactly what it was doing. I know you would. But there are many others who just try things until they work. Also, if this does become *the* way to create bytes, it won't be obscure any more -- but you'd still need to remember to always specify byteorder for length > 1. That is, unless you *want* platform-specific behavior, which I don't think is all that often. Even in this case, you want to think about the issue, and omitting the argument is a bad way to encode that you thought about it. --- Hm, what happened to the idea of only requiring byteorder for `length > 1`? I recall it being discussed |
|||
msg401709 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-13 16:38 | |
That’s okay, Brandt’s improved sys.byteorder is fastest <wink>. % ./python.exe -m timeit -r11 -s 'x=3452452454524' 'x.to_bytes(10, "little")' 2000000 loops, best of 11: 94.6 nsec per loop % ./python.exe -m timeit -r11 -s 'x=3452452454524' 'x.to_bytes(10, "big")' 2000000 loops, best of 11: 97.8 nsec per loop % ./python.exe -m timeit -r11 -s 'x=3452452454524' 'x.to_bytes(10)' 5000000 loops, best of 11: 79.1 nsec per loop |
|||
msg401713 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-13 16:52 | |
I created a Discourse poll: https://discuss.python.org/t/what-should-be-the-default-value-for-int-to-bytes-byteorder/10616 |
|||
msg401720 - (view) | Author: Matthew Barnett (mrabarnett) * | Date: 2021-09-13 17:30 | |
I'd probably say "In the face of ambiguity, refuse the temptation to guess". As there's disagreement about the 'correct' default, make it None and require either "big" or "little" if length > 1 (the default). |
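A sketch of the alternative Matthew suggests, for comparison with what was eventually merged. `to_bytes_strict` is a hypothetical function, and raising `TypeError` is an assumption; the thread does not say which exception such a design would use.

```python
def to_bytes_strict(value, length=1, byteorder=None, *, signed=False):
    """Hypothetical variant: byteorder may be omitted only when it cannot matter."""
    if byteorder is None:
        if length > 1:
            raise TypeError("byteorder is required when length > 1")
        byteorder = "big"  # irrelevant for zero or one byte
    return value.to_bytes(length, byteorder, signed=signed)


print(to_bytes_strict(65))             # b'A'
print(to_bytes_strict(65, 2, "big"))   # b'\x00A'
```

Under this design the single-byte convenience still works, but multi-byte calls cannot silently pick an order.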
|||
msg401728 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-13 20:38 | |
>>> (65).to_bytes() b'A' It seems like your proposal is mostly guided by: convert an int to a byte (bytes string of length 1). IMO this case is special enough to justify the usage of a different function. What if people expect int.to_bytes() to always return a single byte, but then get two bytes by mistake? ch = 256 byte = ch.to_bytes() assert len(byte) == 2 # oops A function dedicated to creating a single byte is expected to raise a ValueError for values outside the range [0; 255]. Like: >>> struct.pack('B', 255) b'\xff' >>> struct.pack('B', 256) struct.error: ubyte format requires 0 <= number <= 255 |
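A dedicated single-byte function of the kind Victor describes could be sketched as follows. `single_byte` is a hypothetical name; it relies on the `bytes` constructor, which already raises `ValueError` for items outside `range(256)`.

```python
def single_byte(value):
    """Hypothetical helper: return a bytes object of length 1.

    Raises ValueError if value is outside range(256).
    """
    return bytes([value])


print(single_byte(65))   # b'A'
print(single_byte(255))  # b'\xff'

try:
    single_byte(256)
except ValueError as exc:
    print("ValueError:", exc)
```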
|||
msg401729 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-13 20:39 | |
> In the PEP 467 discussion, I proposed (...) Since this PEP is controversial, and this issue seems to be controversial as well, maybe this idea should be part of the PEP. |
|||
msg401730 - (view) | Author: Vedran Čačić (veky) * | Date: 2021-09-13 20:39 | |
The poll is invalid, since the option that most people want is deliberately not offered. |
|||
msg401737 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-13 21:50 | |
Just reread the thread. AFAICT not a single use case was presented for having system byte ordering as the default. However, multiple respondents have pointed out that a default to system byte ordering is a bug waiting to happen, almost ensuring that some users will encounter unexpected behaviors when crossing platforms. We've seen issues like that before and should avoid them. We don't really need a poll. What is needed is for the system byte ordering proponents to present valid reasons why it would be useful and to address the concerns that it is actually harmful. If the proposal goes through despite the concerns, we should ask folks writing lint tools to flag every use of the default as a potential bug and advise people to never use the default unless they know for sure that it is encoding only a single byte. Personally, I would never let system byte ordering pass a code review. |
|||
msg401746 - (view) | Author: Matthew Barnett (mrabarnett) * | Date: 2021-09-14 00:31 | |
I wonder whether there should be a couple of other endianness values, namely, "native" and "network", for those cases where you want to be explicit about it. If you use "big" it's not clear whether that's because you want network endianness or because the platform is big-endian. |
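The idea of named aliases can be illustrated with a thin wrapper. Everything here is hypothetical: `int.to_bytes()` itself accepts only "big" and "little", and the names `resolve_byteorder` and `to_bytes_named` exist only for this sketch.

```python
import sys

# Hypothetical aliases that record the caller's intent.
_BYTEORDER_ALIASES = {
    "native": sys.byteorder,  # whatever the current host uses
    "network": "big",         # network byte order is big-endian
}


def resolve_byteorder(name):
    """Map an alias to 'big' or 'little'; pass other values through unchanged."""
    return _BYTEORDER_ALIASES.get(name, name)


def to_bytes_named(value, length, byteorder):
    return value.to_bytes(length, resolve_byteorder(byteorder))


print(to_bytes_named(1, 2, "network"))  # b'\x00\x01'
```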
|||
msg401749 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-14 05:07 | |
> I'd probably say "In the face of ambiguity, refuse the temptation to guess". > > As there's disagreement about the 'correct' default, make it None and require either "big" or "little" if length > 1 (the default). Jelle suggested that over in Discourse, and I’m not opposed, but it does mean that there’s no natural default for byteorder in int.from_bytes(). |
|||
msg401750 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-14 05:08 | |
> It seems like your proposal is mostly guided by: convert an int to a byte (bytes string of length 1). IMO this case is special enough to justify the usage of a different function. Like bchr()? <wink> > What if people expect int.to_bytes() always return a single byte, but then get two bytes by mistake? > > ch = 256 > byte = ch.to_bytes() The OverflowError you’ll get seems reasonable. |
|||
msg401751 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-14 05:09 | |
> The poll is invalid, since the option that most people want is deliberately not offered. *Is* there an option that most people want? |
|||
msg401752 - (view) | Author: Vedran Čačić (veky) * | Date: 2021-09-14 05:12 | |
I'd say yes. Of course, one way to ascertain that would be to conduct a valid poll. ;-) |
|||
msg401753 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-14 05:15 | |
> I'd say yes. Of course, one way to ascertain that would be to conduct a valid poll. ;-) People can always comment otherwise in the Discourse thread. |
|||
msg401809 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-15 04:50 | |
"big" by default |
|||
msg401914 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-16 02:55 | |
New changeset 07e737d002cdbf0bfee53248a652a86c9f93f02b by Barry Warsaw in branch 'main': bpo-45155 : Default arguments for int.to_bytes(length=1, byteorder=sys.byteorder) (#28265) https://github.com/python/cpython/commit/07e737d002cdbf0bfee53248a652a86c9f93f02b |
|||
msg401924 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-16 07:36 | |
> bpo-45155 : Default arguments for int.to_bytes(length=1, byteorder=sys.byteorder) (#28265) The commit title is wrong, the default is "big", not sys.byteorder: int.to_bytes(length=1, byteorder='big', *, signed=False) int.from_bytes(bytes, byteorder='big', *, signed=False) |
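The merged signatures can be exercised as below. The thread does not name a release; this sketch assumes the defaults first shipped in Python 3.11 and falls back to explicit arguments on older interpreters. `encode_default` and `decode_default` are hypothetical wrappers used only to keep the example runnable everywhere.

```python
import sys

# Assumption: the new defaults are available from Python 3.11 onward.
HAS_DEFAULTS = sys.version_info >= (3, 11)


def encode_default(n):
    """Call int.to_bytes() with no arguments where that is supported."""
    return n.to_bytes() if HAS_DEFAULTS else n.to_bytes(1, "big")


def decode_default(data):
    """Call int.from_bytes() without byteorder where that is supported."""
    return int.from_bytes(data) if HAS_DEFAULTS else int.from_bytes(data, "big")


print(encode_default(65))             # b'A'
print(decode_default(b"\x01\x00"))    # 256
```

Because the default is a fixed "big" rather than `sys.byteorder`, the second call returns the same integer on every platform.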
|||
msg401947 - (view) | Author: Barry A. Warsaw (barry) * | Date: 2021-09-16 14:44 | |
> The commit title is wrong, the default is "big", not sys.byteorder: > > int.to_bytes(length=1, byteorder='big', *, signed=False) > int.from_bytes(bytes, byteorder='big', *, signed=False) Oops |
|||
msg402265 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-09-20 18:23 | |
New changeset 9510e6f3c797b4398aaf58abc1072b9db0a644f9 by Raymond Hettinger in branch 'main': bpo-45155: Apply new byteorder default values for int.to/from_bytes (GH-28465) https://github.com/python/cpython/commit/9510e6f3c797b4398aaf58abc1072b9db0a644f9 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:49 | admin | set | github: 89318 |
2021-09-20 18:23:05 | rhettinger | set | messages: + msg402265 |
2021-09-20 17:47:39 | rhettinger | set | pull_requests: + pull_request26879 |
2021-09-20 11:57:42 | 4-launchpad-kalvdans-no-ip-org | set | nosy: + 4-launchpad-kalvdans-no-ip-org |
2021-09-16 14:44:30 | barry | set | messages: + msg401947 |
2021-09-16 07:36:52 | vstinner | set | messages: + msg401924 |
2021-09-16 02:56:07 | barry | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2021-09-16 02:55:34 | barry | set | messages: + msg401914 |
2021-09-15 04:50:12 | barry | set | messages: + msg401809 |
2021-09-14 05:15:40 | barry | set | messages: + msg401753 |
2021-09-14 05:12:39 | veky | set | messages: + msg401752 |
2021-09-14 05:09:31 | barry | set | messages: + msg401751 |
2021-09-14 05:08:53 | barry | set | messages: + msg401750 |
2021-09-14 05:07:09 | barry | set | messages: + msg401749 |
2021-09-14 00:31:01 | mrabarnett | set | messages: + msg401746 |
2021-09-13 21:50:34 | rhettinger | set | messages: + msg401737 |
2021-09-13 20:39:37 | veky | set | messages: + msg401730 |
2021-09-13 20:39:14 | vstinner | set | messages: + msg401729 |
2021-09-13 20:38:06 | vstinner | set | messages: + msg401728 |
2021-09-13 17:30:19 | mrabarnett | set | nosy: + mrabarnett messages: + msg401720 |
2021-09-13 16:52:43 | barry | set | messages: + msg401713 |
2021-09-13 16:38:44 | barry | set | messages: + msg401709 |
2021-09-13 09:14:41 | petr.viktorin | set | messages: + msg401693 |
2021-09-12 11:44:16 | serhiy.storchaka | set | messages: + msg401664 |
2021-09-12 11:20:58 | rhettinger | set | messages: + msg401661 |
2021-09-12 08:38:59 | serhiy.storchaka | set | messages: + msg401657 |
2021-09-12 08:21:45 | serhiy.storchaka | set | messages: + msg401653 |
2021-09-12 07:53:14 | veky | set | messages: + msg401652 |
2021-09-11 00:49:58 | rhettinger | set | messages: + msg401622 |
2021-09-10 20:59:10 | brandtbucher | set | nosy: + brandtbucher |
2021-09-10 16:37:09 | barry | set | messages: + msg401599 |
2021-09-10 16:33:48 | barry | set | messages: + msg401598 |
2021-09-10 16:31:22 | barry | set | messages: + msg401597 |
2021-09-10 11:06:54 | vstinner | set | nosy: + vstinner messages: + msg401570 |
2021-09-10 07:48:20 | petr.viktorin | set | nosy: + petr.viktorin messages: + msg401560 |
2021-09-10 07:03:59 | mark.dickinson | set | nosy: + mark.dickinson messages: + msg401556 |
2021-09-10 05:35:02 | veky | set | nosy: + veky messages: + msg401555 |
2021-09-10 04:04:15 | rhettinger | set | messages: + msg401552 |
2021-09-10 01:15:17 | barry | set | messages: + msg401545 |
2021-09-10 01:12:01 | rhettinger | set | nosy: + rhettinger messages: + msg401544 |
2021-09-10 01:06:07 | barry | set | messages: + msg401543 |
2021-09-10 01:01:53 | serhiy.storchaka | set | messages: + msg401542 |
2021-09-10 00:57:42 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg401541 |
2021-09-10 00:35:40 | ncoghlan | set | messages: + msg401540 |
2021-09-10 00:29:47 | ncoghlan | set | nosy: + ncoghlan messages: + msg401538 |
2021-09-09 23:19:21 | ethan.furman | set | nosy: + ethan.furman |
2021-09-09 22:28:13 | barry | set | keywords: + patch pull_requests: + pull_request26685 |
2021-09-09 22:00:31 | barry | create |