Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct.Struct.format is bytes, but should be str #65270

Closed
zbysz mannequin opened this issue Mar 26, 2014 · 24 comments
Closed

struct.Struct.format is bytes, but should be str #65270

zbysz mannequin opened this issue Mar 26, 2014 · 24 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@zbysz
Copy link
Mannequin

zbysz mannequin commented Mar 26, 2014

BPO 21071
Nosy @rhettinger, @vstinner, @benjaminp, @bitdancer, @vadmium, @serhiy-storchaka, @zhangyangyu
PRs
  • bpo-21071: struct.Struct.format type is now str #845
  • Files
  • format-bytes.patch
  • format-str.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-06-23.13:14:01.966>
    created_at = <Date 2014-03-26.17:48:53.473>
    labels = ['3.7', 'type-bug', 'library', 'docs']
    title = 'struct.Struct.format is bytes, but should be str'
    updated_at = <Date 2017-06-23.13:14:01.965>
    user = 'https://bugs.python.org/zbysz'

    bugs.python.org fields:

    activity = <Date 2017-06-23.13:14:01.965>
    actor = 'vstinner'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2017-06-23.13:14:01.966>
    closer = 'vstinner'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2014-03-26.17:48:53.473>
    creator = 'zbysz'
    dependencies = []
    files = ['37474', '37488']
    hgrepos = []
    issue_num = 21071
    keywords = ['patch']
    message_count = 24.0
    messages = ['214903', '214905', '214906', '214907', '216655', '232768', '232840', '232843', '232857', '232863', '232868', '232880', '232955', '290500', '290584', '290595', '290596', '290601', '292398', '292409', '292411', '292558', '296710', '296711']
    nosy_count = 10.0
    nosy_names = ['rhettinger', 'vstinner', 'benjamin.peterson', 'Arfrever', 'r.david.murray', 'zbysz', 'docs@python', 'martin.panter', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = ['845']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21071'
    versions = ['Python 3.7']

    @zbysz
    Copy link
    Mannequin Author

    zbysz mannequin commented Mar 26, 2014

    In Python 2, Struct.format used to be a str. In Python 3 it is bytes, which is unexpected.

    Why do I expect .format to be a string:

    • This format is pretty much the same as a "{}-format" - plain text
    • according to documentation it is composed of things like characters from a closed set '<.=@hi...', a subset of ASCII,
    • it is always called "format string" in the documentation

    Why is this a problem:

    • If I use a str format in constructor, I expect to get a str format,
    • Comparisons are broken:
    >>> struct.Struct('x').format == 'x'
    False
    >>> struct.Struct('x').format[0] == 'x'
    False
    
    - doctests are broken
    >>> struct.Struct('x').format
    'x' # in Python 2
    b'x' # in Python 3

    @zbysz zbysz mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 26, 2014
    @benjaminp
    Copy link
    Contributor

    I agree that's rather unfortunate. It would be backwards incompatible to change, though.

    @zbysz
    Copy link
    Mannequin Author

    zbysz mannequin commented Mar 26, 2014

    Maybe a flag param for the constructor?

    @bitdancer
    Copy link
    Member

    I agree that the implementation does not match the documentation in this case. Especially the part about "the format string used to create this Struct object". I don't see what having a flag would buy you: it doesn't help you in writing 2/3 shared code. I think the best we can do here is a doc change.

    @bitdancer bitdancer added the docs Documentation in the Doc dir label Mar 26, 2014
    @vadmium
    Copy link
    Member

    vadmium commented Apr 17, 2014

    This is closely related to bpo-16349. If format strings were explicitly allowed to be byte strings there would be less conflict, but documenting the data type of the “format” attribute is better than nothing.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 16, 2014

    It seems to me that the simplest fix is to document:

    1. Struct.format attribute is a byte string
    2. The input format strings for struct.pack(), Struct class, etc, are also allowed to be byte strings, for consistency (bpo-16349)

    Here is a patch that does that, and adds some simple test cases.

    @vstinner
    Copy link
    Member

    It would be backwards incompatible to change, though.

    I'm in favor of breaking the compatibility with Python 3.4 and return the format as an Unicode string.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 18, 2014

    I originally assumed it would be a text string from the existing documentation, so changing the behaviour to match also seems reasonable

    @vadmium
    Copy link
    Member

    vadmium commented Dec 18, 2014

    Here is a patch that changes over to a str() type.

    Is it safe to assume PyUnicode_AsUTF8() is null-terminated (like PyBytes_AS_STRING() is)? My documentation doesn’t say.

    @vstinner
    Copy link
    Member

    Is it safe to assume PyUnicode_AsUTF8() is null-terminated?

    Yes, Python ensures that the string is null terminated.

    (like PyBytes_AS_STRING() is)

    Yes, PyBytes_AS_STRING() also ends with a null byte.

    By the way, Unicode strings internally ends with a null character.

    @serhiy-storchaka
    Copy link
    Member

    I think breaking the compatibility should be discussed on Python-Dev. Similar issue (and even worse) is bpo-8934.

    @bitdancer
    Copy link
    Member

    A backward compatibility break would certainly need to be discussed, IMO.

    @rhettinger
    Copy link
    Contributor

    I would like to see this and bpo-8934 discussed as a usability bug. As far as I can tell, the current state of affairs an unintended by-product of a rushed effort to split the standard library to bytes apis and unicode apis. I don't see any reason that we should have to live with this forever.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 25, 2017

    A backwards-compatible way forward would be to preserve (and document) the “format” attribute as a byte string, and add a new attribute which is definitely a text string. Not sure of a good name; perhaps “Struct.text_format” or “format_str” is a start.

    @vstinner
    Copy link
    Member

    I created #845 to change struct.Struct.format type to str (Unicode).

    struct.Struct() accepts bytes and str format strings, so it's not really a backward incompatible change.

    It's just a minor enhancement to help development:

    $ ./python
    Python 3.7.0a0 (heads/master-dirty:b8a7daf, Mar 27 2017, 13:02:20) 
    >>> print(struct.Struct('hi').format)
    hi

    Without the patch:

    haypo@selma$ python3
    Python 3.5.2 (default, Sep 14 2016, 11:28:32) 
    [GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import struct
    >>> print(struct.Struct('hi').format)
    b'hi'
    
    haypo@selma$ python3 -bb
    Python 3.5.2 (default, Sep 14 2016, 11:28:32) 
    >>> import struct
    >>> print(struct.Struct('hi').format)
    Traceback (most recent call last):
      ...
    BytesWarning: str() on a bytes instance

    @serhiy-storchaka
    Copy link
    Member

    This should be discussed on Python-Dev first. I already raised this issue on Python-Dev, but don't remember what is the result.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 27, 2017

    Hi Victor, I’m not sure about changing the data type. As Python 3 grows older, there is potentially more code being written that you break by fixing a bug like this. It is incompatible if you used to write

    >>> print(struct.Struct('hi').format.decode())
    hi

    I have used this decode() trick in the past to build composite format strings; e.g.: <https://bugs.python.org/issue16349#msg174083\>. If you change the data type this code will raise AttributeError. At a minimum you should acknowledge it in the “porting” section of What’s New.

    Also, if you make this change, maybe update the module doc string. See the end of format-str.patch.

    @vstinner
    Copy link
    Member

    Ok, I opened a thread on python-dev:
    https://mail.python.org/pipermail/python-dev/2017-March/147688.html

    Martin: "At a minimum you should acknowledge it in the “porting”
    section of What’s New."

    I wasn't sure if the change was worth it to be mentionned in What's
    New in Python 3.7. Ok, will do for the next round (I'm now waiting for
    more feedback on my python-dev thread and this issue.)

    @zhangyangyu
    Copy link
    Member

    +1 for change bytes to str. But struct.Struct() accepts both bytes and str, maybe in future buffer objects. When it gets a bytes object, converting it to a str looks unnecessary to me, and as OP said, comparison (a theoretical use case) could still fail. Could we just leave what the user passes in? bytes(bytes-like) -> bytes, str -> str. This looks more natural to me.

    @serhiy-storchaka
    Copy link
    Member

    After changing the type of Struct.format to str we perhaps should deprecate accepting bytes as format. Currently this can lead to emitting a BytesWarning.

    $ ./python -Wa -b
    >>> import struct
    >>> struct.pack('I', 12345)
    b'90\x00\x00'
    >>> struct.pack(b'I', 12345)
    __main__:1: BytesWarning: Comparison between bytes and string
    __main__:1: BytesWarning: Comparison between bytes and string
    b'90\x00\x00'

    @zhangyangyu
    Copy link
    Member

    The warnings are possible to remove I think... but deprecate bytes arguments sounds good.

    @vadmium
    Copy link
    Member

    vadmium commented Apr 29, 2017

    I don’t think the API should be expanded to accept arbitrary bytes-like objects as format strings. Struct formats are strings of ASCII-compatible characters, but not arbitrary chunks of memory.

    I think the main question is whether it is okay to break compatibility (Victor’s pull request, or my format-str.patch), or whether there has to be a backwards-compatible deprecation of the existing bytes attribute. FWIW I am okay with breaking compatibility, since the main documentation already implies it should be a text string.

    @vstinner
    Copy link
    Member

    New changeset f87b85f by Victor Stinner in branch 'master':
    bpo-21071: struct.Struct.format type is now str (#845)
    f87b85f

    @vstinner
    Copy link
    Member

    Ok, I changed struct.Struct.format type to str (Unicode string).

    If someone wants to modify the C code to use a PyUnicodeObject rather than a char*, feel free to propose a further change.

    Since the initial issue is fixed, I now close the issue.

    Thank you all for your feedback and reviews ;-)

    @vstinner vstinner added the 3.7 (EOL) end of life label Jun 23, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants