Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch to implement PEP 461 (%-interpolation for bytes) #64483

Closed
nascheme opened this issue Jan 16, 2014 · 24 comments
Closed

patch to implement PEP 461 (%-interpolation for bytes) #64483

nascheme opened this issue Jan 16, 2014 · 24 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@nascheme
Copy link
Member

BPO 20284
Nosy @nascheme, @ncoghlan, @vstinner, @ericvsmith, @ethanfurman, @vadmium
Files
  • pep-draft.txt: draft PEP (alternative to 460 & 461)
  • 01-pep461.patch
  • 02-code-a.patch
  • 03-py2-flag.patch
  • 04-py2-eq.patch
  • issue20284.stoneleaf.01.patch
  • issue20284.stoneleaf.tests_only.01.patch
  • issue20284.stoneleaf.02.patch
  • issue20284.stoneleaf.03.patch
  • issue20284.stoneleaf.04.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2015-01-29.23:26:31.652>
    created_at = <Date 2014-01-16.21:33:09.311>
    labels = ['interpreter-core', 'type-feature']
    title = 'patch to implement PEP 461 (%-interpolation for bytes)'
    updated_at = <Date 2015-01-29.23:26:31.651>
    user = 'https://github.com/nascheme'

    bugs.python.org fields:

    activity = <Date 2015-01-29.23:26:31.651>
    actor = 'ethan.furman'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2015-01-29.23:26:31.652>
    closer = 'ethan.furman'
    components = ['Interpreter Core']
    creation = <Date 2014-01-16.21:33:09.311>
    creator = 'nascheme'
    dependencies = []
    files = ['33507', '33574', '33575', '33576', '33577', '37616', '37617', '37703', '37745', '37766']
    hgrepos = []
    issue_num = 20284
    keywords = ['patch', 'needs review']
    message_count = 24.0
    messages = ['208313', '208328', '208329', '208331', '208356', '208581', '215019', '224024', '233456', '233458', '233524', '233545', '233546', '233548', '233875', '234012', '234189', '234246', '234265', '234266', '234591', '234726', '234746', '234750']
    nosy_count = 8.0
    nosy_names = ['nascheme', 'ncoghlan', 'vstinner', 'eric.smith', 'Arfrever', 'ethan.furman', 'python-dev', 'martin.panter']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue20284'
    versions = ['Python 3.5']

    @nascheme
    Copy link
    Member Author

    This is a very rough, proof of concept patch that implements %-style formatting for bytes objects. Currently it calls __format__ with a bytes argument and expects a bytes result. I've only implemented
    support for bytes formatting for the 'long' object.

    Expected behavior:

    >>> b'%s' % b'hello'
    b'hello'
    >>> b'%s' % 'hello'
    TypeError is raised
    >>> b'%s' % 123
    b'123'
    >>> b'%d' % 123
    b'123'

    Some issues:

    • %s support is incomplete, needs to handle width and padding. I think it should be done in formatbytes().

    • PyBytes_Format function very likely has bugs, I copied it mostly from Python 2 and quickly made it run, I did very little testing.

    • long__format__ with a bytes argument is inefficient. It creates temporary Python objects that could be avoided. %-style formatting on bytes will be much less efficient than str in Python 2. We could inline the handling of certain types in PyBytes_Format, maybe longs only would be sufficent.

    • I'm not sure overloading __format__ is the best approach. Maybe we should introduce a new method, say __ascii__ instead and have PyBytes_Format call that.

    @nascheme nascheme added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 16, 2014
    @nascheme
    Copy link
    Member Author

    I'm attaching v2 of my proposed patch. This one is quite a bit better, IMHO.

    • Introduce __ascii__ as a special method, like __str__ but required to exist only if an ASCII-only format exists.

    • Extract PyString_Format from Python 2.7 and update it for PyBytes.

    • %c only accepts integers, not single character strs, maybe should
      accept length one byte objects.

    • add %a, should be useful for debugging

    • %s calls __bytes__ or __ascii__, otherwise gives a TypeError, should
      eventually support buffer API

    • number formats work as they do in Python 2.

    @vstinner
    Copy link
    Member

    I reviewed your second patch on Rietveld.

    @nascheme
    Copy link
    Member Author

    Uploading new patch with the following changes:

    • Allow length 1 bytes object as argument to %c.
    • Make %r an alias for %a.

    I will upload a draft PEP (proposed as a replacement for 461).

    Victor, thanks for the review. My reply is:

    • regarding duplicated code: almost all of the code I added came directly from Python 2.7. If it can but shared with unicode object, great. However, note that the Python 2.7 code is well tested and well optimized.

    • regarding the introduction of __ascii__, see my draft PEP. PEP-461 cannot be implemented, adding __bytes__ to number objects breaks backwards compatibility.

    • regarding the change to _datetime, I don't mind if it gets discarded but I think its useful behavior. Mostly I made it as a proof-of-concept.

    @nascheme
    Copy link
    Member Author

    Another revision of the patch, now quite close to PEP-461 as proposed. Changes from PEP-461:

    • include %a

    • add -2 command-line flag. When enabled have %s fallback to calling PyObject_Str() and encoding to ASCII and also enable %r as alias for %a.

    Changes from previous patch:

    • remove __ascii__ special method, %s will only accept objects that
      implement __bytes__ or the buffer API, unless -2 command line is used

    • use buffer API if available

    • add -2 command-line option

    • Add prototypes for PyBytes_Format and _PyUnicode_FormatLong

    • improve some exception messages

    Reference counting in PyBytes_Format is quite hairy, could use some review. The code is nearly the same as Python 2.x stringobject.c.

    @nascheme nascheme changed the title proof for concept patch for bytes formatting methods patch to implement %-interpolation for bytes (roughly PEP 461) Jan 17, 2014
    @nascheme
    Copy link
    Member Author

    I've updated my patch into a sequence, the first of which implements PEP-461.

    02-code-a.patch adds support for %a (ascii() on arg)

    03-py2-flag.patch makes %s and %r behave similar to Python 2 if a command
    line flag is provided to the interpreter

    04-py-eq.patch makes the command line flag also enable comparision between bytes() and str() (warning is generated).

    @nascheme nascheme changed the title patch to implement %-interpolation for bytes (roughly PEP 461) patch to implement PEP 461 (%-interpolation for bytes) Jan 20, 2014
    @ethanfurman
    Copy link
    Member

    PEP-461 has been accepted. I'll look over the code soon.

    @ethanfurman ethanfurman self-assigned this Mar 28, 2014
    @ncoghlan
    Copy link
    Contributor

    Just noting I'm working on some significant updates to the bytes and bytearray docs in bpo-21777. I'll try to get that ready for review and merged relatively soon, so the docs for this can build on top of those changes.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 5, 2015

    Hi. I proposed twice to Ethan to implement the PEP-461, but he replied that he wants to implement it. So, what's the status of the implementation?

    @vstinner
    Copy link
    Member

    vstinner commented Jan 5, 2015

    I would be nice to share as much code as possible with the Unicode implementation. My idea was to add a "_PyBytesWriter" API, very close to the "_PyUnicodeWriter", to share code. Old patch implementing the _PyBytesWriter API: issue bpo-17742 (rejected because it was less efficient, the compiler produces less efficient machine code).

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 6, 2015

    With the first alpha next month, unless we hear otherwise from Ethan in the next day or two, I'd suggest going ahead with the implementation. We can always tweak it during the alpha cycle if there are specific details he'd like to see changed.

    @ethanfurman
    Copy link
    Member

    Here is what I have so far:

    • complete tests for bytes and bytearry (bytearray currently commented out at line 71)
    • PEP-461 implemented for bytes

    This is basically an adaptation of the 2.7 code for str, adjusted appropriately.

    I was planning on having bytearray convert to bytes, then call the bytes code, then integrate the results back into the existing bytearray (for %=) or create and return a new bytearray (for %).

    I can easily believe this is not the most efficient way to do it. ;)

    I should have the bytearray portion done, if not this weekend, then by the following weekend.

    I have no objections if Victor wants to combine and optimize with the unicode implementation (and no need to wait for me to finish the bytearray portion).

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2015

    Ethan, do you have a public repository? If no, you can for example
    fork CPython: click on "Server-side clone" at
    https://hg.python.org/cpython/

    @ethanfurman
    Copy link
    Member

    Sorry, no. And time is scarce at the moment so figuring out server-side clones will have to wait as well.

    I uploaded the patch of what I have so far -- hopefully that will be helpful.

    Also attaching patch with just the tests.

    @ethanfurman
    Copy link
    Member

    I've been digging into this over the last week and come to the realization that I won't be able to finish this patch. My apologies.

    Victor, can you take over? I would appreciate it.

    The tests I have written are only for the Python side. The patch I was working on (inherited from Niel and the Python 2 code base) also added a couple C ABI functions -- do we want/need these? How do we write tests for them?

    @ethanfurman
    Copy link
    Member

    Removed the new ABI functions, all new functions are static.

    Duplicated bytes code in bytearray.

    in-place interpolation returns new bytearray at this point.

    I'll work on getting in-place working, but otherwise I'll commit this in a week so we have something in for the first alpha.

    @ethanfurman
    Copy link
    Member

    Better patch, along the lines of my original thought:

    • byarrayformat converts bytearray to bytes
    • calls bytesformat (now _PyBytes_Format) to do the heavy lifting
    • uses PyByteArray_FromObject to tranform back to bytearray

    Now working on in-place format.

    @vstinner
    Copy link
    Member

    I will not have to work on optimization before the alpha 1 (February 8, 2015).

    Ethan: just commit your patch when you consider that it's ready to be
    merged, we will have time to refactor and enhance the code later.

    IMO it's more important to have the feature in alpha 1 than having
    perfect code, because some developer are waiting for this feature and
    will have more time to provide feedback.

    @ethanfurman
    Copy link
    Member

    Here's the patch -- the code for % and %= is in place for bytes and bytearray; I still need to get the doc patch done. I'll commit Wednesday-ish barring problems.

    Big question
    ============

    Background
    ----------
    There is a Python C ABI function called PyBytes_FromFormat which is used to create bytes objects using the %-interpolation format (it takes a c string and none-or-many c args).

    Actual Question
    ---------------
    Should PyBytes_FromFormat also support the new codes of %a and %b ?

    My Thoughts
    -----------
    Writing things down is good! %a and %b are both for Python level arguments, not C-level arguments, so %a and %b would make no sense.

    @ethanfurman
    Copy link
    Member

    Thanks, Victor, for the feedback.

    I was able to figure out some more of the C side thanks to Georg, and I think the code is looking pretty good.

    There may be room for optimization by having the bytes code call the unicode implementation for more of the conversions (currently it's only using the unicode fromlong function), but the docs should happen before that. ;)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 24, 2015

    New changeset 8d802fb6ae32 by Ethan Furman in branch 'default':
    bpo-20284: Implement PEP-461
    https://hg.python.org/cpython/rev/8d802fb6ae32

    @vstinner
    Copy link
    Member

    It's strange that %s format raises an error about the %b format:

    >>> b's? %s' % 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'

    @ethanfurman
    Copy link
    Member

    it does seem a bit odd -- on the other hand, %s is an alias for %b, is deprecated for new 3-only code, and this might help serve as a reminder of that.

    Or we could fix it. ;)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 26, 2015

    New changeset db7ec64aac39 by Victor Stinner in branch 'default':
    Issue bpo-20284: Fix a compilation warning on Windows
    https://hg.python.org/cpython/rev/db7ec64aac39

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants