classification
Title: Add bytes.empty_buffer and deprecate bytes(17) for the same purpose
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, ethan.furman, josh.r, ncoghlan, r.david.murray, serhiy.storchaka, terry.reedy, vadmium
Priority: normal
Keywords:

Created on 2014-03-12 10:29 by ethan.furman, last changed 2014-03-30 05:35 by ncoghlan.

Messages (14)
msg213242 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-03-12 10:29
`bytes` is a list of integers.  Passing a single integer to `bytes()`, as in:

   --> bytes(7)
   b'\x00\x00\x00\x00\x00\x00\x00'

results in a bytes object containing that many zeroes.

I propose that this behavior be deprecated for eventual removal, and a class method be created to take its place.
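
A minimal sketch of the two constructor forms, for illustration. The variable names are made up for this example:

```python
# An int argument sets the length; an iterable argument sets the contents.
zero_filled = bytes(7)      # seven NUL bytes
single_byte = bytes([7])    # one byte whose value is 7

print(zero_filled)
print(single_byte)
print(len(zero_filled), len(single_byte))
```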
msg213246 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-12 10:57
A class method is not needed. This is just b'\0' * 7.
msg213262 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-12 14:39
I don't have a strong opinion on this, but I think you are going to have to articulate a good use/usability case for the deprecation.  I'm sure this is used in the wild, and we don't just gratuitously break things :)
msg213592 - Author: Josh Rosenberg (josh.r) * Date: 2014-03-14 21:40
I would think the argument for deprecation is that usually, people type bytes(7) or bytes(somesmallintvalue) expecting to create a length one bytes object using that value (happens by accident if you iterate a bytes object and forget it's an iterable of ints, not an iterable of len 1 bytes). It's really easy to forget to make it bytes([7]) or bytes((7,)) or what have you. If you make the same mistake with str, list, tuple, etc., you get an error, because they only accept iterables. But bytes silently behaves in a way that is inconsistent with the other sequence types.

Given that b'\0' * 7 is usually faster in any event (by avoiding lookup costs to find the bytes constructor) and more intuitive to people familiar with the Python sequence idiom, I could definitely see this as a redundancy that does nothing but confuse.
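
The iteration pitfall described above can be sketched roughly as follows. The sample data and variable names are assumptions chosen for illustration:

```python
data = b"\x01\x02\x03"

# Iterating a bytes object yields ints, not length-1 bytes objects.
items = list(data)

# Mistake: bytes(n) with an int n builds n zero bytes, not the single byte n.
wrong = [bytes(b) for b in data]

# Intended: wrap the int in an iterable first.
right = [bytes([b]) for b in data]

# list() rejects a bare int outright, so the same slip is caught immediately.
try:
    list(7)
    list_accepts_int = True
except TypeError:
    list_accepts_int = False

print(items, wrong, right, list_accepts_int)
```

With bytes the mistake produces a valid but wrong object, which is why it tends to surface far from where it was made.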
msg213596 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-03-14 22:13
I agree with Serhiy that the method is not needed in any case.

I was about to post the same missing rationale: people misunderstand 'bytes(7)' and write it expecting to get bytes([7]) == b'\x07', so it would be better to make bytes(7) raise instead of silently accepting a buggy usage.  I was thinking that one rationale for bytes(n) might be that it is faster than b'\0' * n. Since Josh claimed the contrary, I tried to test with timeit.repeat (both console and Idle) and got this error message:
  TypeError: source code string cannot contain null bytes
Both eval and compile emit this message. So it seems that one justification for bytes(n) is to avoid putting null bytes in source strings.

I think this issue should be closed. Deprecation ideas should really be posted on python-ideas and ultimately python-dev for discussion and approval.

If Ethan wants to pursue the idea, he should research the design discussions for bytes() (probably on the py3k list) and whether Guido directly approved of bytes(n) or if someone else 'snuck' it in after the initial approval.
msg213597 - Author: Josh Rosenberg (josh.r) * Date: 2014-03-14 22:23
Terry: You forgot to use a raw string for your timeit.repeat check, which is why it blew up. It was evaluating the \0 when you defined the statement string itself, not the contents. If you use r'b"\0" * 7' it works just fine by deferring backslash escape processing until the string is actually eval-ed, rather than when you create the string.
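
A small sketch of the difference between the two statement strings. The "<stmt>" filename is an arbitrary label, and the tuple of exception types is an assumption meant to cover different Python versions, which do not all raise the same exception for this:

```python
import timeit

plain = 'b"\0" * 7'   # \0 is processed now: the statement holds a real NUL character
raw = r'b"\0" * 7'    # backslash and zero survive as two ordinary characters

try:
    compile(plain, "<stmt>", "eval")
    plain_compiles = True
except (TypeError, ValueError, SyntaxError):
    # Source text containing a NUL character is rejected by the compiler.
    plain_compiles = False

# The raw string compiles; the escape is handled when the code is evaluated.
result = eval(compile(raw, "<stmt>", "eval"))

# timeit accepts the raw statement without complaint.
elapsed = timeit.timeit(raw, number=1000)

print(plain_compiles, result, elapsed)
```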

For example, on my (admittedly underpowered) laptop (Win7 x64, Py 3.3.0 64-bit):

>>> import timeit
>>> min(timeit.repeat(r'b"\0" * 7'))
0.07514287752866267
>>> min(timeit.repeat(r'bytes(7)'))
0.7210309422814021
>>> min(timeit.repeat(r'b"\0" * 7000'))
0.8994351749659302
>>> min(timeit.repeat(r'bytes(7000)'))
2.06750710129117

For a short bytes, the difference is enormous (as I suspected, the lookup of bytes dominates the runtime). For much longer bytes, it's still winning by a lot, because the cost of having the short literal first, then multiplying it, is still trivial next to the lookup cost.

P.S. I made a mistake: str does accept an int argument (obviously), but it has a completely different meaning.
msg213598 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-03-14 22:26
I'm inclined to leave it open while I do the suggested research.

Thanks for the tips, Terry, and the numbers, Josh.
msg213641 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-15 06:01
AFAIK, bytes(int) is a remnant from the time when bytes was mutable. bytes was then split into immutable bytes and mutable bytearray, and this constructor was forgotten. I'm +0 for deprecation.
msg213656 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-03-15 15:56
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

--> bytes(5)
'5'

--> bytearray(5)
bytearray(b'\x00\x00\x00\x00\x00')
----------------------------------------------------------------------

Creating a buffer of null bytes makes sense for bytearray, which is mutable; it does not make sense, and IMHO only causes confusion, to have bytes return an /immutable/ sequence of zero bytes.
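
The mutability distinction can be illustrated with a short Python 3 sketch. The variable names are made up for the example:

```python
buf = bytearray(5)         # mutable zero-filled buffer
buf[0] = 0xFF              # in-place write succeeds
buf[1:3] = b"\x01\x02"     # slice assignment also succeeds

frozen = bytes(5)          # immutable zero-filled bytes
try:
    frozen[0] = 0xFF
    bytes_is_mutable = True
except TypeError:
    # bytes does not support item assignment
    bytes_is_mutable = False

print(buf, frozen, bytes_is_mutable)
```

A preallocated bytearray can be filled in afterwards; a preallocated bytes object can only ever be read or copied.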
msg215095 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-03-28 23:39
Bringing over Barry's suggestion from the current python-ideas thread [1]:

    @classmethod
    def fill(cls, length, value=0):
        # Creates a bytes of given length with given fill value

[1] https://mail.python.org/pipermail/python-ideas/2014-March/027305.html
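
As a rough sketch of what such a method might do, here is a hypothetical module-level helper. It is not an existing bytes or bytearray API; the free function taking cls as its first argument is an assumption made only so the example runs without modifying the built-in types:

```python
def fill(cls, length, value=0):
    """Hypothetical helper: build a cls of `length` items, each equal to `value`."""
    # cls is expected to be bytes or bytearray; value must be in range(256).
    return cls([value]) * length


zeros = fill(bytes, 4)
padding = fill(bytearray, 3, 0x20)

print(zeros, padding)
```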
msg215103 - Author: Josh Rosenberg (josh.r) * Date: 2014-03-29 00:35
Why would we need bytes.fill(length, value)? Is b'\xVV' * length (or, if value is a variable containing an int, bytes((value,)) * length) unreasonable? Similarly, bytearray(b'\xVV') * length or bytearray((value,)) * length is both Pythonic and performant. Most sequences support multiplication, so simple stuff like this can be done easily and consistently; why invent a new approach unique to bytes/bytearrays?
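
The repetition idiom referred to here looks roughly like this. The value 0xAB and length 4 are arbitrary choices for illustration:

```python
value = 0xAB
length = 4

# Fill value held in a variable: wrap it in a one-item tuple, then repeat.
as_bytes = bytes((value,)) * length
as_bytearray = bytearray((value,)) * length

# The same multiplication works on other built-in sequences.
as_list = [value] * length
as_str = "-" * length

print(as_bytes, as_bytearray, as_list, as_str)
```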
msg215106 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-29 02:02
Also, to me 'fill' implies something is being filled, not that something is being created.
msg215110 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-03-29 02:55
The fill() name makes more sense for the bytearray variant; it is just provided on bytes as well for consistency. As Serhiy notes above, the current behaviour is almost certainly just a holdover from the original "mutable bytes" design that didn't survive into the initial 3.0 release.
msg215165 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-03-30 05:35
Under the name "from_len", this is now part of a larger proposal to improve the consistency of the binary APIs: http://www.python.org/dev/peps/pep-0467/
History
Date                 User              Action  Args
2014-03-30 05:35:21  ncoghlan          set     messages: + msg215165
2014-03-29 02:55:30  ncoghlan          set     messages: + msg215110
2014-03-29 02:02:13  r.david.murray    set     messages: + msg215106
2014-03-29 00:35:36  josh.r            set     messages: + msg215103
2014-03-28 23:39:02  ncoghlan          set     nosy: + ncoghlan; messages: + msg215095
2014-03-28 15:12:20  barry             set     nosy: + barry
2014-03-19 01:04:53  vadmium           set     nosy: + vadmium
2014-03-15 15:56:05  ethan.furman      set     messages: + msg213656
2014-03-15 06:01:44  serhiy.storchaka  set     messages: + msg213641
2014-03-14 22:26:36  ethan.furman      set     messages: + msg213598
2014-03-14 22:23:54  josh.r            set     messages: + msg213597
2014-03-14 22:13:16  terry.reedy       set     nosy: + terry.reedy; messages: + msg213596
2014-03-14 21:40:16  josh.r            set     nosy: + josh.r; messages: + msg213592
2014-03-12 14:39:39  r.david.murray    set     nosy: + r.david.murray; messages: + msg213262
2014-03-12 10:57:12  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg213246
2014-03-12 10:29:57  ethan.furman      create