pprint could use line continuation for long bytes literals #61732

Closed
pitrou opened this issue Mar 23, 2013 · 10 comments

@pitrou
Member

pitrou commented Mar 23, 2013

BPO 17530
Nosy @freddrake, @pitrou, @serhiy-storchaka
Dependencies
  • bpo-19104: pprint produces invalid output for long strings
  • bpo-19105: pprint doesn't use all width
Files
  • bytes_pprint.patch
  • bytes_pprint2.patch
  • pprint_bytes.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-03-24.17:23:48.673>
    created_at = <Date 2013-03-23.19:39:37.416>
    labels = ['type-feature', 'library']
    title = 'pprint could use line continuation for long bytes literals'
    updated_at = <Date 2015-03-24.17:23:48.672>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2015-03-24.17:23:48.672>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-03-24.17:23:48.673>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-03-23.19:39:37.416>
    creator = 'pitrou'
    dependencies = ['19104', '19105']
    files = ['29865', '29915', '38138']
    hgrepos = []
    issue_num = 17530
    keywords = ['patch']
    message_count = 10.0
    messages = ['185078', '186988', '186989', '186993', '186995', '186998', '187218', '187456', '235968', '239159']
    nosy_count = 5.0
    nosy_names = ['fdrake', 'pitrou', 'python-dev', 'Pam.McANulty', 'serhiy.storchaka']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17530'
    versions = ['Python 3.5']

    @pitrou
    Member Author

    pitrou commented Mar 23, 2013

    Same as issue bpo-17150:

    >>> pprint.pprint({"a": b"\x00\xff" * 20})
    {'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'}

    ... could be better formatted as:

    >>> pprint.pprint({"a": b"\x00\xff" * 20})
    {'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'
          b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
          b'\x00\xff\x00\xff\x00\xff'}

    @pitrou pitrou added the stdlib (Python modules in the Lib dir) and type-feature (a feature request or enhancement) labels Mar 23, 2013
    @PamMcANulty
    Mannequin

    PamMcANulty mannequin commented Apr 15, 2013

    Here's a patch. I needed to handle the fact that the repr of a single byte can be 1, 2 or 4 characters long and did not want to wrap in the middle of a byte representation. Note also that bytes literals require a continuation character. In the pathological case where the wrap size is smaller than the representation of a single byte, I chose to always print at least one byte per line.
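
The wrapping rule described above can be illustrated with a minimal sketch. This is not the attached patch: `escaped_len` and `wrap_per_byte` are hypothetical helper names, and the sketch ignores indentation and continuation characters. It only shows that a byte's escaped form is 1, 2 or 4 characters long, that breaks fall between bytes, and that every line carries at least one byte even when the width is too small.

```python
def escaped_len(value):
    """Number of characters one byte occupies inside a bytes repr."""
    # repr(b'a') is "b'a'": drop the b prefix and the two quotes.
    return len(repr(bytes([value]))) - 3


def wrap_per_byte(data, width):
    """Greedily split data into repr() strings no wider than width.

    Hypothetical helper: a break is only ever placed between two bytes,
    and a line always holds at least one byte, even if it overflows.
    """
    lines = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if current and len(repr(candidate)) > width:
            lines.append(repr(current))
            current = bytes([value])
        else:
            current = candidate
    if current:
        lines.append(repr(current))
    return lines


# The escaped form of a single byte is 1, 2 or 4 characters long.
lengths = {escaped_len(v) for v in range(256)}

# A width that fits exactly one escaped newline per line.
narrow = wrap_per_byte(b"\n\n\n\n\n\n", width=5)
for line in narrow:
    print(line)
```

Measuring `len(repr(candidate))` rather than summing per-byte lengths keeps the sketch correct even when `repr` switches quote style for data containing a single quote.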

    As an aside, I also replaced the str wrapping code's calls to len with the cached _len used in the rest of pprint.py.

    @PamMcANulty
    Mannequin

    PamMcANulty mannequin commented Apr 15, 2013

    oops, forgot to add some samples:
    >>> pprint.pprint(b"\n\n\n\n\n\n", width=5)
    b'\n'\
    b'\n'\
    b'\n'\
    b'\n'\
    b'\n'\
    b'\n'
    
    >>> pprint.pprint({"a": b"\x00\xff" * 20})
    {'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'\
          b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'\
          b'\x00\xff\x00\xff\x00\xff'}
    
    >>> pprint.pprint({"a": b"\x00\xff" * 20}, width=20)
    {'a': b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'\
          b'\x00\xff'}
    
    >>> pprint.pprint(b'a\x00\n\\x00', width=20)
    b'a\x00\n\\x00'

    @pitrou
    Member Author

    pitrou commented Apr 15, 2013

    I don't understand why you say that "bytes literals require a continuation character":

    >>> (b"x"
    ...  b"y")
    b'xy'
    >>> [b"x"
    ...  b"y"]
    [b'xy']

    I think the "len caching" is a misoptimization; it's useless here (most CPU time will be spent creating and wrapping the representation).
    Also perhaps it would be nice to refactor things a bit, since we have both _str_parts and _bytes_parts used in exactly the same way (but that can also be done later).

    As for the doc, the example would probably deserve to be a bit more "meaningful" :-)

    @PamMcANulty
    Mannequin

    PamMcANulty mannequin commented Apr 15, 2013

    • eval expects bytes to have a continuation character and test_str_wrap did an eval check so I figured test_bytes_wrap should as well:
    # repr some bytes:
    >>> b = b"\x00\xff" * 5
    >>> b
    b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
    >>> r = repr(b)
    >>> r
    "b'\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff'"
    >>> eval(r)
    b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
    
    # hand-wrap it without the continuation character and it fails to eval (stuck the 
    >>> s = "b'\\x00\\xff\\x00\\xff\\x00'\nb'\\xff\\x00\\xff\\x00\\xff'"
    >>> print(s)
    b'\x00\xff\x00\xff\x00'
    b'\xff\x00\xff\x00\xff'
    >>> eval(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 2
        b'\xff\x00\xff\x00\xff'
                              ^
    SyntaxError: invalid syntax
    
    # stick the continuation character in, and it evals properly
    >>> s = s.replace("\n", "\\\n")
    >>> print(s)
    b'\x00\xff\x00\xff\x00'\
    b'\xff\x00\xff\x00\xff'
    >>> eval(s)
    b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
    • I agree about the len v _len, but figured this wasn't all that foolish a consistency issue (i.e. the rest of pprint.py used _len)

    • I also wanted to refactor _str_parts and _bytes_parts, but couldn't decide on the best course. I was favoring a helper function to run the common loop since the two "if issubclass..." calls were so different and parameterizing the differences felt like it would obfuscate things too much.

    • I also agree on the doc. I figured I'd see if there weren't any hidden surprises with the patch before I worked on better doc.

    @pitrou
    Member Author

    pitrou commented Apr 15, 2013

    Well, but eval works if you put parentheses as required by the grammar:

    >>> s = "(b'xy'\nb'za')"
    >>> eval(s)
    b'xyza'

    Yes, _str_parts and _bytes_parts should probably remain separate. It's the higher-level routine that would deserve sharing.
    Also, perhaps the other wrapping routines (for dict, list...) could get the same treatment.

    @PamMcANulty
    Mannequin

    PamMcANulty mannequin commented Apr 18, 2013

    Here's a new version. It looks more like the str_parts patch and uses parentheses instead of continuation as suggested.

    Sample output:
    >>> pprint.pprint(b"\n\na\x00", width=1)
       (b'\n'
        b'\n'
        b'a'
        b'\x00')

    @pitrou
    Member Author

    pitrou commented Apr 20, 2013

    Ok, I hadn't noticed that pretty-printing a single string didn't add the parentheses as desired:

    >>> pprint.pprint("abcd " * 6, width=15)
    'abcd abcd '
    'abcd abcd '
    'abcd abcd '

    On the other hand, the added parentheses aren't needed when inside a container (line continuations will work without them).

    (and of course the same rules should apply to bytes objects :-))
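
The rule discussed here (parentheses at the top level, none inside a container) can be checked with a small sketch. The strings below are hand-built for illustration and are not pprint output; `ast.literal_eval` stands in for the `eval` round-trip check mentioned earlier in the thread.

```python
import ast

# Two adjacent bytes literals, as a wrapped repr would contain them.
parts = [repr(b"\x00\xff" * 4), repr(b"\x00\xff" * 4)]

# Top level, no brackets: the second line is not a continuation.
bare = "\n".join(parts)
try:
    ast.literal_eval(bare)
    bare_ok = True
except SyntaxError:
    bare_ok = False

# Top level, wrapped in parentheses: implicit line joining applies.
parenthesized = "(" + "\n ".join(parts) + ")"

# Inside a dict display: the braces already allow line joining.
in_dict = "{'a': " + "\n      ".join(parts) + "}"

print(bare_ok)
print(ast.literal_eval(parenthesized))
print(ast.literal_eval(in_dict))
```

Any enclosing bracket pair (parentheses, square brackets or braces) permits the line break, which is why the extra parentheses are only needed for a bare top-level literal.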

    @serhiy-storchaka
    Member

    Here is a patch based on current str formatting code (i.e. parentheses are added only if needed, the space at the right is used more efficiently). It adds pprint support for bytes and bytearrays. Bytes are broken only at positions divisible by 4, so packed 32-bit ints are never broken.

    Examples:

    >>> pprint.pprint(bytes(range(128)))
    (b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13'
     b'\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234567'
     b'89:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{'
     b'|}~\x7f')
    >>> pprint.pprint({'abcdefgh': bytes(range(128))})
    {'abcdefgh': b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
                 b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
                 b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ['
                 b'\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'}
    >>> pprint.pprint(bytearray(range(128)))
    bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
              b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
              b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_'
              b'`abcdefghijklmnopqrstuvwxyz{|}~\x7f')
    >>> pprint.pprint({'abcdefgh': bytearray(range(128))})
    {'abcdefgh': bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
                           b'\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17'
                           b'\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123'
                           b'456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefg'
                           b'hijklmnopqrstuvwxyz{|}~\x7f')}
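
The "break only at positions divisible by 4" rule can be sketched as follows. This is an illustrative approximation, not the code from the patch: `wrap_bytes_repr` is a hypothetical name, and the sketch ignores indentation and the allowance for closing brackets.

```python
import ast


def wrap_bytes_repr(data, width):
    """Yield repr() strings for consecutive chunks of data.

    Hypothetical helper: chunks grow four bytes at a time, so a break
    never falls inside an aligned 4-byte group.
    """
    current = b""
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        candidate = current + group
        if current and len(repr(candidate)) > width:
            yield repr(current)
            current = group
        else:
            current = candidate
    if current:
        yield repr(current)


data = bytes(range(128))
parts = list(wrap_bytes_repr(data, width=60))

# Join the chunks as a parenthesized, multi-line literal and read it back.
wrapped = "(" + "\n ".join(parts) + ")"
print(wrapped)
restored = ast.literal_eval(wrapped)
```

Growing the chunk by whole 4-byte groups trades a little unused space at the right margin for the guarantee that a packed 32-bit value stays on one line.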

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 24, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Mar 24, 2015

    New changeset 976de10bf731 by Serhiy Storchaka in branch 'default':
    Issue bpo-17530: pprint now wraps long bytes objects and bytearrays.
    https://hg.python.org/cpython/rev/976de10bf731
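
For readers who want to see the result, here is a short check that assumes Python 3.5 or later, where this change is available. It formats the dictionary from the original report and confirms that the wrapped text still evaluates to the same object. The exact break positions are not asserted, since they may differ from the layout proposed at the top of the thread.

```python
import ast
import pprint

obj = {"a": b"\x00\xff" * 20}

# pformat returns the same text that pprint would print (default width 80).
text = pprint.pformat(obj)
print(text)

# The wrapped output is still a valid literal for the original object.
round_tripped = ast.literal_eval(text)
```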

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022