patch to implement PEP 461 (%-interpolation for bytes) #64483

nascheme · 2014-01-16T21:33:09Z

BPO	20284
Nosy	@nascheme, @ncoghlan, @vstinner, @ericvsmith, @ethanfurman, @vadmium
Files	pep-draft.txt: draft PEP (alternative to 460 & 461) 01-pep461.patch 02-code-a.patch 03-py2-flag.patch 04-py2-eq.patch issue20284.stoneleaf.01.patch issue20284.stoneleaf.tests_only.01.patch issue20284.stoneleaf.02.patch issue20284.stoneleaf.03.patch issue20284.stoneleaf.04.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ethanfurman'
closed_at = <Date 2015-01-29.23:26:31.652>
created_at = <Date 2014-01-16.21:33:09.311>
labels = ['interpreter-core', 'type-feature']
title = 'patch to implement PEP 461 (%-interpolation for bytes)'
updated_at = <Date 2015-01-29.23:26:31.651>
user = 'https://github.com/nascheme'

bugs.python.org fields:

activity = <Date 2015-01-29.23:26:31.651>
actor = 'ethan.furman'
assignee = 'ethan.furman'
closed = True
closed_date = <Date 2015-01-29.23:26:31.652>
closer = 'ethan.furman'
components = ['Interpreter Core']
creation = <Date 2014-01-16.21:33:09.311>
creator = 'nascheme'
dependencies = []
files = ['33507', '33574', '33575', '33576', '33577', '37616', '37617', '37703', '37745', '37766']
hgrepos = []
issue_num = 20284
keywords = ['patch', 'needs review']
message_count = 24.0
messages = ['208313', '208328', '208329', '208331', '208356', '208581', '215019', '224024', '233456', '233458', '233524', '233545', '233546', '233548', '233875', '234012', '234189', '234246', '234265', '234266', '234591', '234726', '234746', '234750']
nosy_count = 8.0
nosy_names = ['nascheme', 'ncoghlan', 'vstinner', 'eric.smith', 'Arfrever', 'ethan.furman', 'python-dev', 'martin.panter']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue20284'
versions = ['Python 3.5']

nascheme · 2014-01-16T21:33:08Z

This is a very rough, proof of concept patch that implements %-style formatting for bytes objects. Currently it calls __format__ with a bytes argument and expects a bytes result. I've only implemented
support for bytes formatting for the 'long' object.

Expected behavior:

>>> b'%s' % b'hello'
b'hello'
>>> b'%s' % 'hello'
TypeError is raised
>>> b'%s' % 123
b'123'
>>> b'%d' % 123
b'123'

Some issues:

%s support is incomplete, needs to handle width and padding. I think it should be done in formatbytes().
PyBytes_Format function very likely has bugs, I copied it mostly from Python 2 and quickly made it run, I did very little testing.
long__format__ with a bytes argument is inefficient. It creates temporary Python objects that could be avoided. %-style formatting on bytes will be much less efficient than str in Python 2. We could inline the handling of certain types in PyBytes_Format, maybe longs only would be sufficent.
I'm not sure overloading __format__ is the best approach. Maybe we should introduce a new method, say __ascii__ instead and have PyBytes_Format call that.

nascheme · 2014-01-17T08:40:25Z

I'm attaching v2 of my proposed patch. This one is quite a bit better, IMHO.

Introduce __ascii__ as a special method, like __str__ but required to exist only if an ASCII-only format exists.
Extract PyString_Format from Python 2.7 and update it for PyBytes.
%c only accepts integers, not single character strs, maybe should
accept length one byte objects.
add %a, should be useful for debugging
%s calls __bytes__ or __ascii__, otherwise gives a TypeError, should
eventually support buffer API
number formats work as they do in Python 2.

vstinner · 2014-01-17T09:06:36Z

I reviewed your second patch on Rietveld.

nascheme · 2014-01-17T09:40:05Z

Uploading new patch with the following changes:

Allow length 1 bytes object as argument to %c.
Make %r an alias for %a.

I will upload a draft PEP (proposed as a replacement for 461).

Victor, thanks for the review. My reply is:

regarding duplicated code: almost all of the code I added came directly from Python 2.7. If it can but shared with unicode object, great. However, note that the Python 2.7 code is well tested and well optimized.
regarding the introduction of __ascii__, see my draft PEP. PEP-461 cannot be implemented, adding __bytes__ to number objects breaks backwards compatibility.
regarding the change to _datetime, I don't mind if it gets discarded but I think its useful behavior. Mostly I made it as a proof-of-concept.

nascheme · 2014-01-17T19:33:06Z

Another revision of the patch, now quite close to PEP-461 as proposed. Changes from PEP-461:

include %a
add -2 command-line flag. When enabled have %s fallback to calling PyObject_Str() and encoding to ASCII and also enable %r as alias for %a.

Changes from previous patch:

remove __ascii__ special method, %s will only accept objects that
implement __bytes__ or the buffer API, unless -2 command line is used
use buffer API if available
add -2 command-line option
Add prototypes for PyBytes_Format and _PyUnicode_FormatLong
improve some exception messages

Reference counting in PyBytes_Format is quite hairy, could use some review. The code is nearly the same as Python 2.x stringobject.c.

nascheme · 2014-01-20T22:03:21Z

I've updated my patch into a sequence, the first of which implements PEP-461.

02-code-a.patch adds support for %a (ascii() on arg)

03-py2-flag.patch makes %s and %r behave similar to Python 2 if a command
line flag is provided to the interpreter

04-py-eq.patch makes the command line flag also enable comparision between bytes() and str() (warning is generated).

ethanfurman · 2014-03-28T05:26:11Z

PEP-461 has been accepted. I'll look over the code soon.

ncoghlan · 2014-07-26T07:03:11Z

Just noting I'm working on some significant updates to the bytes and bytearray docs in bpo-21777. I'll try to get that ready for review and merged relatively soon, so the docs for this can build on top of those changes.

vstinner · 2015-01-05T10:14:00Z

Hi. I proposed twice to Ethan to implement the PEP-461, but he replied that he wants to implement it. So, what's the status of the implementation?

vstinner · 2015-01-05T10:19:03Z

I would be nice to share as much code as possible with the Unicode implementation. My idea was to add a "_PyBytesWriter" API, very close to the "_PyUnicodeWriter", to share code. Old patch implementing the _PyBytesWriter API: issue bpo-17742 (rejected because it was less efficient, the compiler produces less efficient machine code).

ncoghlan · 2015-01-06T11:25:52Z

With the first alpha next month, unless we hear otherwise from Ethan in the next day or two, I'd suggest going ahead with the implementation. We can always tweak it during the alpha cycle if there are specific details he'd like to see changed.

ethanfurman · 2015-01-06T16:04:38Z

Here is what I have so far:

complete tests for bytes and bytearry (bytearray currently commented out at line 71)
PEP-461 implemented for bytes

This is basically an adaptation of the 2.7 code for str, adjusted appropriately.

I was planning on having bytearray convert to bytes, then call the bytes code, then integrate the results back into the existing bytearray (for %=) or create and return a new bytearray (for %).

I can easily believe this is not the most efficient way to do it. ;)

I should have the bytearray portion done, if not this weekend, then by the following weekend.

I have no objections if Victor wants to combine and optimize with the unicode implementation (and no need to wait for me to finish the bytearray portion).

vstinner · 2015-01-06T16:19:37Z

Ethan, do you have a public repository? If no, you can for example
fork CPython: click on "Server-side clone" at
https://hg.python.org/cpython/

ethanfurman · 2015-01-06T18:40:32Z

Sorry, no. And time is scarce at the moment so figuring out server-side clones will have to wait as well.

I uploaded the patch of what I have so far -- hopefully that will be helpful.

Also attaching patch with just the tests.

ethanfurman · 2015-01-12T08:39:26Z

I've been digging into this over the last week and come to the realization that I won't be able to finish this patch. My apologies.

Victor, can you take over? I would appreciate it.

The tests I have written are only for the Python side. The patch I was working on (inherited from Niel and the Python 2 code base) also added a couple C ABI functions -- do we want/need these? How do we write tests for them?

ethanfurman · 2015-01-14T07:28:46Z

Removed the new ABI functions, all new functions are static.

Duplicated bytes code in bytearray.

in-place interpolation returns new bytearray at this point.

I'll work on getting in-place working, but otherwise I'll commit this in a week so we have something in for the first alpha.

ethanfurman · 2015-01-17T18:37:08Z

Better patch, along the lines of my original thought:

byarrayformat converts bytearray to bytes
calls bytesformat (now _PyBytes_Format) to do the heavy lifting
uses PyByteArray_FromObject to tranform back to bytearray

Now working on in-place format.

vstinner · 2015-01-18T14:28:01Z

I will not have to work on optimization before the alpha 1 (February 8, 2015).

Ethan: just commit your patch when you consider that it's ready to be
merged, we will have time to refactor and enhance the code later.

IMO it's more important to have the feature in alpha 1 than having
perfect code, because some developer are waiting for this feature and
will have more time to provide feedback.

ethanfurman · 2015-01-18T18:37:22Z

Here's the patch -- the code for % and %= is in place for bytes and bytearray; I still need to get the doc patch done. I'll commit Wednesday-ish barring problems.

Big question
============

Background
----------
There is a Python C ABI function called PyBytes_FromFormat which is used to create bytes objects using the %-interpolation format (it takes a c string and none-or-many c args).

Actual Question
---------------
Should PyBytes_FromFormat also support the new codes of %a and %b ?

My Thoughts
-----------
Writing things down is good! %a and %b are both for Python level arguments, not C-level arguments, so %a and %b would make no sense.

ethanfurman · 2015-01-18T18:43:07Z

Thanks, Victor, for the feedback.

I was able to figure out some more of the C side thanks to Georg, and I think the code is looking pretty good.

There may be room for optimization by having the bytes code call the unicode implementation for more of the conversions (currently it's only using the unicode fromlong function), but the docs should happen before that. ;)

python-dev · 2015-01-24T04:06:04Z

New changeset 8d802fb6ae32 by Ethan Furman in branch 'default':
bpo-20284: Implement PEP-461
https://hg.python.org/cpython/rev/8d802fb6ae32

vstinner · 2015-01-26T08:44:13Z

It's strange that %s format raises an error about the %b format:

>>> b's? %s' % 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'

ethanfurman · 2015-01-26T14:09:42Z

it does seem a bit odd -- on the other hand, %s is an alias for %b, is deprecated for new 3-only code, and this might help serve as a reminder of that.

Or we could fix it. ;)

python-dev · 2015-01-26T15:45:05Z

New changeset db7ec64aac39 by Victor Stinner in branch 'default':
Issue bpo-20284: Fix a compilation warning on Windows
https://hg.python.org/cpython/rev/db7ec64aac39

nascheme added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 16, 2014

nascheme changed the title ~~proof for concept patch for bytes formatting methods~~ patch to implement %-interpolation for bytes (roughly PEP 461) Jan 17, 2014

nascheme changed the title ~~patch to implement %-interpolation for bytes (roughly PEP 461)~~ patch to implement PEP 461 (%-interpolation for bytes) Jan 20, 2014

ethanfurman self-assigned this Mar 28, 2014

ethanfurman closed this as completed Jan 29, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

patch to implement PEP 461 (%-interpolation for bytes) #64483

patch to implement PEP 461 (%-interpolation for bytes) #64483

nascheme commented Jan 16, 2014

nascheme commented Jan 16, 2014

nascheme commented Jan 17, 2014

vstinner commented Jan 17, 2014

nascheme commented Jan 17, 2014

nascheme commented Jan 17, 2014

nascheme commented Jan 20, 2014

ethanfurman commented Mar 28, 2014

ncoghlan commented Jul 26, 2014

vstinner commented Jan 5, 2015

vstinner commented Jan 5, 2015

ncoghlan commented Jan 6, 2015

ethanfurman commented Jan 6, 2015

vstinner commented Jan 6, 2015

ethanfurman commented Jan 6, 2015

ethanfurman commented Jan 12, 2015

ethanfurman commented Jan 14, 2015

ethanfurman commented Jan 17, 2015

vstinner commented Jan 18, 2015

ethanfurman commented Jan 18, 2015

ethanfurman commented Jan 18, 2015

python-dev mannequin commented Jan 24, 2015

vstinner commented Jan 26, 2015

ethanfurman commented Jan 26, 2015

python-dev mannequin commented Jan 26, 2015

patch to implement PEP 461 (%-interpolation for bytes) #64483

patch to implement PEP 461 (%-interpolation for bytes) #64483

Comments

nascheme commented Jan 16, 2014

nascheme commented Jan 16, 2014

nascheme commented Jan 17, 2014

vstinner commented Jan 17, 2014

nascheme commented Jan 17, 2014

nascheme commented Jan 17, 2014

nascheme commented Jan 20, 2014

ethanfurman commented Mar 28, 2014

ncoghlan commented Jul 26, 2014

vstinner commented Jan 5, 2015

vstinner commented Jan 5, 2015

ncoghlan commented Jan 6, 2015

ethanfurman commented Jan 6, 2015

vstinner commented Jan 6, 2015

ethanfurman commented Jan 6, 2015

ethanfurman commented Jan 12, 2015

ethanfurman commented Jan 14, 2015

ethanfurman commented Jan 17, 2015

vstinner commented Jan 18, 2015

ethanfurman commented Jan 18, 2015

ethanfurman commented Jan 18, 2015

python-dev mannequin commented Jan 24, 2015

vstinner commented Jan 26, 2015

ethanfurman commented Jan 26, 2015

python-dev mannequin commented Jan 26, 2015