Add bytes.empty_buffer and deprecate bytes(17) for the same purpose #65094

Closed
ethanfurman opened this issue Mar 12, 2014 · 16 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@ethanfurman
Member

BPO 20895
Nosy @warsaw, @terryjreedy, @ncoghlan, @bitdancer, @ethanfurman, @vadmium, @serhiy-storchaka, @MojoVampire

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-12-08.00:07:09.213>
created_at = <Date 2014-03-12.10:29:57.095>
labels = ['interpreter-core', 'type-bug']
title = 'Add bytes.empty_buffer and deprecate bytes(17) for the same purpose'
updated_at = <Date 2015-05-17.22:41:48.341>
user = 'https://github.com/ethanfurman'

bugs.python.org fields:

activity = <Date 2015-05-17.22:41:48.341>
actor = 'terry.reedy'
assignee = 'none'
closed = True
closed_date = <Date 2014-12-08.00:07:09.213>
closer = 'ethan.furman'
components = ['Interpreter Core']
creation = <Date 2014-03-12.10:29:57.095>
creator = 'ethan.furman'
dependencies = []
files = []
hgrepos = []
issue_num = 20895
keywords = []
message_count = 16.0
messages = ['213242', '213246', '213262', '213592', '213596', '213597', '213598', '213641', '213656', '215095', '215103', '215106', '215110', '215165', '232292', '232294']
nosy_count = 8.0
nosy_names = ['barry', 'terry.reedy', 'ncoghlan', 'r.david.murray', 'ethan.furman', 'martin.panter', 'serhiy.storchaka', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20895'
versions = ['Python 3.5']

@ethanfurman
Member Author

bytes is a list of integers. Passing a single integer to bytes(), as in:

--> bytes(7)
b'\x00\x00\x00\x00\x00\x00\x00'

results in a bytes object containing that many zeroes.

I propose that this behavior be deprecated for eventual removal, and a class method be created to take its place.

@ethanfurman ethanfurman added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Mar 12, 2014
@serhiy-storchaka
Member

Class method is not needed. This is just b'\0' * 7.

@bitdancer
Member

I don't have a strong opinion on this, but I think you are going to have to articulate a good use/usability case for the deprecation. I'm sure this is used in the wild, and we don't just gratuitously break things :)

@MojoVampire
Mannequin

MojoVampire mannequin commented Mar 14, 2014

I would think the argument for deprecation is that usually, people type bytes(7) or bytes(somesmallintvalue) expecting to create a length one bytes object using that value (happens by accident if you iterate a bytes object and forget it's an iterable of ints, not an iterable of len 1 bytes). It's really easy to forget to make it bytes([7]) or bytes((7,)) or what have you. If you make the same mistake with str, list, tuple, etc., you get an error, because they only accept iterables. But bytes silently behaves in a way that is inconsistent with the other sequence types.

Given that b'\0' * 7 is usually faster in any event (by avoiding lookup costs to find the bytes constructor) and more intuitive to people familiar with the Python sequence idiom, I could definitely see this as a redundancy that does nothing but confuse.
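The mistake described above can be reproduced with a short Python 3 sketch. The variable names and the `rejects_int` helper are illustrative only and do not come from the thread.

```python
# Iterating over a bytes object yields ints, not length-1 bytes objects.
data = b"\x07\x08"
items = list(data)                # a list of ints

# Passing one of those ints back to bytes() is read as a length,
# so the result is zero-filled rather than a single byte of that value.
zero_filled = bytes(items[0])     # seven NUL bytes
single_byte = bytes([items[0]])   # one byte with value 7


def rejects_int(factory):
    """Return True if calling factory(7) raises TypeError."""
    try:
        factory(7)
    except TypeError:
        return True
    return False


# list and tuple only accept iterables, so a bare int is an error there.
list_rejects = rejects_int(list)
tuple_rejects = rejects_int(tuple)
```

Wrapping the int in a list or tuple, as in `bytes([7])`, is what produces the single-byte result.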

@terryjreedy
Member

I agree with Serhiy that the method is not needed in any case.

I was about to post the same missing rationale: people misunderstand 'bytes(7)' and write it expecting to get bytes([7]) == b'\x07', so it would be better to make bytes(7) raise instead of silently accepting a buggy usage. I was thinking that one rationale for bytes(n) might be that it is faster than b'\0' * n. Since Josh claimed the contrary, I tried to test with timeit.repeat (both console and Idle) and got this error message:
TypeError: source code string cannot contain null bytes
Both eval and compile emit this message. So it seems that one justification for bytes(n) is to avoid putting null bytes in source strings.

I think this issue should be closed. Deprecation ideas should really be posted on python-ideas and ultimately pydev for discussion and approval.

If Ethan wants to pursue the idea, he should research the design discussions for bytes() (probably on the py3k list) and whether Guido directly approved of bytes(n) or if someone else 'snuck' it in after the initial approval.

@MojoVampire
Mannequin

MojoVampire mannequin commented Mar 14, 2014

Terry: You forgot to use a raw string for your timeit.repeat check, which is why it blew up. It was evaluating the \0 when you defined the statement string itself, not the contents. If you use r'b"\0" * 7' it works just fine by deferring backslash escape processing until the string is actually eval-ed, rather than when you create the string.

For example, on my (admittedly underpowered) laptop (Win7 x64, Py 3.3.0 64-bit):

>>> min(timeit.repeat(r'b"\0" * 7'))
0.07514287752866267
>>> min(timeit.repeat(r'bytes(7)'))
0.7210309422814021
>>> min(timeit.repeat(r'b"\0" * 7000'))
0.8994351749659302
>>> min(timeit.repeat(r'bytes(7000)'))
2.06750710129117

For a short bytes, the difference is enormous (as I suspected, the lookup of bytes dominates the runtime). For much longer bytes, it's still winning by a lot, because the cost of having the short literal first, then multiplying it, is still trivial next to the lookup cost.

P.S. I made a mistake: str does accept an int argument (obviously), but it has completely different meaning.
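The raw-string point can be checked with a small sketch like the one below. The helper name `compiles` and the reduced `number`/`repeat` values are assumptions made to keep the example quick; absolute timings will differ by machine and Python version, so none are shown.

```python
import timeit

plain_stmt = 'b"\0" * 7'     # escape is processed now: the string holds a real NUL character
raw_stmt = r'b"\0" * 7'      # backslash and zero stay as two separate characters


def compiles(source):
    """Return True if source can be compiled as an expression."""
    try:
        compile(source, "<stmt>", "eval")
    except (TypeError, ValueError, SyntaxError):
        # The exception type for NUL characters in source has varied
        # between Python versions, so several are caught here.
        return False
    return True


plain_ok = compiles(plain_stmt)   # expected to fail because of the embedded NUL
raw_ok = compiles(raw_stmt)       # the escape is only interpreted at compile time

# Smaller number/repeat than the timeit defaults, purely to finish quickly.
raw_time = min(timeit.repeat(raw_stmt, number=10000, repeat=3))
ctor_time = min(timeit.repeat("bytes(7)", number=10000, repeat=3))
```

Comparing `raw_time` with `ctor_time` on a given interpreter is the same measurement as in the session above, just with fewer iterations.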

@ethanfurman
Member Author

I'm inclined to leave it open while I do the suggested research.

Thanks for the tips, Terry, and the numbers, Josh.

@serhiy-storchaka
Member

AFAIK, bytes(int) is a remnant from times when bytes was mutable. Then bytes was split to non-mutable bytes and mutable bytearray and this constructor was forgotten. I'm +0 for deprecation.

@ethanfurman
Member Author

Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

--> bytes(5)
'5'

--> bytearray(5)
bytearray(b'\x00\x00\x00\x00\x00')
----------------------------------------------------------------------

Creating a buffer of null bytes makes sense for bytearray, which is mutable; it does not make sense, and IMHO only causes confusion, to have bytes return an /immutable/ sequence of zero bytes.
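For comparison, here is a minimal Python 3 sketch of the distinction being drawn (the session above is Python 2.7, where bytes is an alias for str). The `try_assign` helper is illustrative only.

```python
buf = bytearray(5)            # mutable buffer of five zero bytes
buf[0] = 0xFF                 # item assignment works
buf[1:3] = b"\x01\x02"        # so does slice assignment

frozen = bytes(5)             # five zero bytes that cannot be modified


def try_assign(obj):
    """Return True if obj supports item assignment at index 0."""
    try:
        obj[0] = 0xFF
    except TypeError:
        return False
    return True


bytes_assignable = try_assign(frozen)
bytearray_assignable = try_assign(bytearray(5))
```

A zero-filled bytearray can be filled in afterwards, whereas a zero-filled bytes object stays all zeroes for its whole lifetime.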

@ncoghlan
Contributor

Bringing over Barry's suggestion from the current python-ideas thread [1]:

    @classmethod
    def fill(cls, length, value=0):
        # Creates a bytes of given length with given fill value

[1] https://mail.python.org/pipermail/python-ideas/2014-March/027305.html
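As a rough illustration of how the suggested signature could behave, the sketch below puts it on a hypothetical subclass. `FillBytes` is an invented name, and the body is an assumption about the intended semantics, not the implementation proposed on python-ideas.

```python
class FillBytes(bytes):
    """Hypothetical bytes subclass, used only to illustrate the suggested signature."""

    @classmethod
    def fill(cls, length, value=0):
        # Creates a bytes of given length with given fill value
        if length < 0:
            raise ValueError("length must be non-negative")
        # bytes([value]) raises ValueError unless 0 <= value <= 255
        return cls(bytes([value]) * length)


zeros = FillBytes.fill(3)
marks = FillBytes.fill(2, 0xAB)
```

Under these assumptions, `FillBytes.fill(3)` compares equal to `bytes(3)`, but the length and the fill value are both named at the call site.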

@MojoVampire
Mannequin

MojoVampire mannequin commented Mar 29, 2014

Why would we need bytes.fill(length, value)? Is b'\xVV' * length (or if value is a variable containing int, bytes((value,)) * length) unreasonable? Similarly, bytearray(b'\xVV') * length or bytearray((value,)) * length is both Pythonic and performant. Most sequences support multiplication so simple stuff like this can be done easily and consistently; why invent a new approach unique to bytes/bytearrays?
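The multiplication idioms mentioned here look like this in Python 3. The concrete `value` and `length` are arbitrary choices for the example.

```python
value = 0xAB
length = 4

# Fill value held in a variable: wrap it in a one-element tuple first.
filled_bytes = bytes((value,)) * length
filled_array = bytearray((value,)) * length

# Fill value known in advance: multiply a one-byte literal.
literal_fill = b"\xab" * length

# The same repetition operator works on other built-in sequences.
zeros_list = [0] * 3
zeros_tuple = (0,) * 3
dashes = "-" * 3
```

All three byte results compare equal, and the bytearray variant remains mutable afterwards.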

@bitdancer
Member

Also, to me 'fill' implies something is being filled, not that something is being created.

@ncoghlan
Contributor

The fill() name makes more sense for the bytearray variant, it is just provided on bytes as well for consistency. As Serhiy notes above, the current behaviour is almost certainly just a holdover from the original "mutable bytes" design that didn't survive into the initial 3.0 release.

@ncoghlan
Contributor

Under the name "from_len", this is now part of a larger proposal to improve the consistency of the binary APIs: http://www.python.org/dev/peps/pep-0467/

@terryjreedy
Member

May we close this as superseded by PEP-467?

@ethanfurman
Member Author

Superseded by PEP-467.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022