classification
Title: patch to implement PEP 461 (%-interpolation for bytes)
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ethan.furman Nosy List: Arfrever, eric.smith, ethan.furman, martin.panter, nascheme, ncoghlan, python-dev, vstinner
Priority: normal Keywords: needs review, patch

Created on 2014-01-16 21:33 by nascheme, last changed 2015-01-29 23:26 by ethan.furman. This issue is now closed.

Files
File name Uploaded Description
pep-draft.txt nascheme, 2014-01-17 09:40 draft PEP (alternative to 460 & 461)
01-pep461.patch nascheme, 2014-01-20 22:03
02-code-a.patch nascheme, 2014-01-20 22:03
03-py2-flag.patch nascheme, 2014-01-20 22:03
04-py2-eq.patch nascheme, 2014-01-20 22:04
issue20284.stoneleaf.01.patch ethan.furman, 2015-01-06 16:04
issue20284.stoneleaf.tests_only.01.patch ethan.furman, 2015-01-06 18:40
issue20284.stoneleaf.02.patch ethan.furman, 2015-01-14 07:28
issue20284.stoneleaf.03.patch ethan.furman, 2015-01-17 18:37
issue20284.stoneleaf.04.patch ethan.furman, 2015-01-18 18:37
Messages (24)
msg208313 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2014-01-16 21:33
This is a very rough proof-of-concept patch that implements %-style formatting for bytes objects.  Currently it calls __format__ with a bytes argument and expects a bytes result.  I've only implemented support for bytes formatting for the 'long' object.

Expected behavior:

>>> b'%s' % b'hello'
b'hello'
>>> b'%s' % 'hello'
TypeError is raised
>>> b'%s' % 123
b'123'
>>> b'%d' % 123
b'123'
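For comparison, here is a small sketch (assuming Python 3.5 or later) of how these four expressions behave in the implementation that was eventually committed. Note that `%s` with an `int` raises `TypeError` there rather than producing `b'123'`; that is the behavior discussed near the end of this issue. The helper name `format_or_error` is illustrative only.

```python
def format_or_error(template: bytes, value):
    """Return template % value, or the exception name if formatting fails."""
    try:
        # Wrap the value in a tuple so it is always treated as one argument.
        return template % (value,)
    except TypeError as exc:
        return type(exc).__name__


# The four cases from the "Expected behavior" list above.
for template, value in [(b'%s', b'hello'), (b'%s', 'hello'),
                        (b'%s', 123), (b'%d', 123)]:
    print(template, repr(value), '->', format_or_error(template, value))
```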

Some issues:

- %s support is incomplete; it needs to handle width and padding.  I think that should be done in formatbytes().

- The PyBytes_Format function very likely has bugs: I copied it mostly from Python 2, quickly made it run, and did very little testing.

- long__format__ with a bytes argument is inefficient.  It creates temporary Python objects that could be avoided.  %-style formatting on bytes will be much less efficient than str formatting in Python 2.  We could inline the handling of certain types in PyBytes_Format; maybe handling longs only would be sufficient.

- I'm not sure overloading __format__ is the best approach.  Maybe we should introduce a new method instead, say __ascii__, and have PyBytes_Format call that.
msg208328 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2014-01-17 08:40
I'm attaching v2 of my proposed patch.  This one is quite a bit better, IMHO.

- Introduce __ascii__ as a special method, like __str__ but required to exist only if an ASCII-only format exists.

- Extract PyString_Format from Python 2.7 and update it for PyBytes.

- %c only accepts integers, not single-character strs; maybe it should
  also accept length-one bytes objects.

- add %a, which should be useful for debugging

- %s calls __bytes__ or __ascii__ and otherwise raises a TypeError; it should
  eventually support the buffer API

- number formats work as they do in Python 2.
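Several of these points carried over into the final design in Python 3.5: `%c` with an integer, `%a`, Python 2 style number formats, and `%s` calling `__bytes__` (the `__ascii__` method was later dropped). A minimal sketch, assuming Python 3.5 or later; the `Version` class is a hypothetical example type, not part of any API.

```python
class Version:
    """Hypothetical type that opts in to bytes formatting via __bytes__."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __bytes__(self) -> bytes:
        return b'%d.%d' % (self.major, self.minor)


# %c takes an integer in range(256) and emits that single byte.
letter = b'%c' % 65                # b'A'
# Number formats behave as they do for str.
hexed = b'%04x' % 255              # b'00ff'
# %s calls __bytes__ when the object defines it.
tagged = b'v%s' % Version(3, 5)    # b'v3.5'
# %a formats ascii(obj): repr() with non-ASCII characters escaped.
debug = b'%a' % 'caf\xe9'          # b"'caf\\xe9'"
```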
msg208329 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-17 09:06
I reviewed your second patch on Rietveld.
msg208331 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2014-01-17 09:40
Uploading new patch with the following changes:

- Allow length 1 bytes object as argument to %c.
- Make %r an alias for %a.
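Both changes are visible in Python 3.5 and later, where `%c` accepts an integer in range(256) or a length-one bytes object, and `%r` is kept as an alias for `%a` (documented as intended only for code bases that span Python 2 and 3). A short check, assuming Python 3.5 or later:

```python
# %c accepts a single byte as well as an integer.
single = b'[%c]' % b'A'                # b'[A]'

# A bytes object of any other length is rejected with TypeError.
try:
    b'%c' % b'AB'
    rejected = False
except TypeError:
    rejected = True

# %r produces the same output as %a: ascii(obj) encoded to bytes.
same = (b'%r' % 'x\xff') == (b'%a' % 'x\xff')
```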

I will upload a draft PEP (proposed as a replacement for 461).

Victor, thanks for the review.  My reply is:

- regarding duplicated code: almost all of the code I added came directly from Python 2.7.  If it can be shared with the unicode object, great.  However, note that the Python 2.7 code is well tested and well optimized.

- regarding the introduction of __ascii__, see my draft PEP.  PEP 461 cannot be implemented: adding __bytes__ to number objects breaks backwards compatibility.

- regarding the change to _datetime, I don't mind if it gets discarded, but I think it's useful behavior.  Mostly I made it as a proof of concept.
msg208356 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2014-01-17 19:33
Another revision of the patch, now quite close to PEP 461 as proposed.  Changes from PEP 461:

- include %a

- add a -2 command-line flag.  When it is enabled, %s falls back to calling PyObject_Str() and encoding the result to ASCII, and %r is also enabled as an alias for %a.

Changes from previous patch:

- remove the __ascii__ special method; %s will only accept objects that
  implement __bytes__ or the buffer API, unless the -2 command-line flag is used

- use buffer API if available

- add -2 command-line option

- Add prototypes for PyBytes_Format and _PyUnicode_FormatLong

- improve some exception messages

Reference counting in PyBytes_Format is quite hairy, could use some review.  The code is nearly the same as Python 2.x stringobject.c.
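The buffer-API path means that `bytearray` and `memoryview` arguments work with `%s` without defining `__bytes__` (this is how Python 3.5+ behaves). The `-2` flag itself never shipped; the `py2_style_s` helper below is a hypothetical pure-Python approximation of the fallback described above (try the bytes path first, then `str()` encoded to ASCII), sketched for illustration only.

```python
def py2_style_s(obj) -> bytes:
    """Hypothetical approximation of the proposed -2 fallback for %s."""
    try:
        # Normal PEP 461 path: bytes, buffer-API objects, or __bytes__.
        return b'%s' % (obj,)
    except TypeError:
        # Fallback described for -2: str() of the object, encoded to ASCII.
        # Non-ASCII text would raise UnicodeEncodeError here.
        return str(obj).encode('ascii')


# Objects supporting the buffer API are accepted directly by %s.
from_bytearray = b'[%s]' % (bytearray(b'xy'),)      # b'[xy]'
from_memoryview = b'[%s]' % (memoryview(b'abc'),)   # b'[abc]'
```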
msg208581 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2014-01-20 22:03
I've updated my patch into a sequence, the first of which implements PEP 461.

02-code-a.patch adds support for %a (ascii() on arg)

03-py2-flag.patch makes %s and %r behave similarly to Python 2 if a command-line flag is provided to the interpreter.

04-py2-eq.patch makes the command-line flag also enable comparison between bytes() and str() (a warning is generated).
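For reference, CPython already has a related switch: running the interpreter with `-b` makes comparisons between bytes and str emit a `BytesWarning` (and `-bb` turns it into an error). The sketch below assumes a CPython interpreter is available as `sys.executable`; it runs a child interpreter with `-b` and captures the warning. The helper name is illustrative.

```python
import subprocess
import sys


def compare_bytes_and_str(flag: str = '-b') -> subprocess.CompletedProcess:
    """Run a child interpreter that compares b'a' with 'a' under the given flag."""
    code = "x = b'a'; y = 'a'; print(x == y)"
    return subprocess.run([sys.executable, flag, '-c', code],
                          capture_output=True, text=True)


result = compare_bytes_and_str()
print(result.stdout.strip())   # the comparison still evaluates to False
print(result.stderr.strip())   # the BytesWarning message, written to stderr
```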
msg215019 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-03-28 05:26
PEP 461 has been accepted.  I'll look over the code soon.
msg224024 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-26 07:03
Just noting I'm working on some significant updates to the bytes and bytearray docs in issue 21777. I'll try to get that ready for review and merged relatively soon, so the docs for this can build on top of those changes.
msg233456 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-05 10:14
Hi. I twice offered to implement PEP 461, but Ethan replied that he wanted to implement it himself. So, what's the status of the implementation?
msg233458 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-05 10:19
It would be nice to share as much code as possible with the Unicode implementation. My idea was to add a "_PyBytesWriter" API, very close to the "_PyUnicodeWriter", to share code. An old patch implementing the _PyBytesWriter API is in issue #17742 (rejected because it was less efficient: the compiler produced less efficient machine code).
msg233524 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-01-06 11:25
With the first alpha next month, unless we hear otherwise from Ethan in the next day or two, I'd suggest going ahead with the implementation. We can always tweak it during the alpha cycle if there are specific details he'd like to see changed.
msg233545 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-06 16:04
Here is what I have so far:

  - complete tests for bytes and bytearray (bytearray currently commented out at line 71)
  - PEP 461 implemented for bytes

This is basically an adaptation of the 2.7 code for str, adjusted appropriately.

I was planning on having bytearray convert to bytes, then call the bytes code, then integrate the results back into the existing bytearray (for %=) or create and return a new bytearray (for %).

I can easily believe this is not the most efficient way to do it.  ;)
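The convert-format-convert strategy described above can be sketched in pure Python as follows. `bytearray_format` is a hypothetical name used only for illustration; in Python 3.5 and later the built-in `%` on a bytearray returns a new bytearray, so the sketch can be checked against it.

```python
def bytearray_format(template: bytearray, args) -> bytearray:
    """Hypothetical sketch: format via bytes, then wrap the result in a new bytearray."""
    as_bytes = bytes(template)        # 1. convert the bytearray to bytes
    formatted = as_bytes % args       # 2. let the bytes implementation do the work
    return bytearray(formatted)       # 3. convert the result back to a bytearray


template = bytearray(b'%d items, %s')
sketch = bytearray_format(template, (3, b'ok'))
builtin = template % (3, b'ok')
```

Because step 3 always allocates a new object, this approach naturally covers `%`; supporting `%=` in place would need extra work to copy the result back into the original buffer.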

I should have the bytearray portion done, if not this weekend, then by the following weekend.

I have no objections if Victor wants to combine and optimize with the unicode implementation (and no need to wait for me to finish the bytearray portion).
msg233546 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-06 16:19
Ethan, do you have a public repository? If no, you can for example
fork CPython: click on "Server-side clone" at
https://hg.python.org/cpython/
msg233548 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-06 18:40
Sorry, no.  And time is scarce at the moment so figuring out server-side clones will have to wait as well.

I uploaded the patch of what I have so far -- hopefully that will be helpful.

Also attaching patch with just the tests.
msg233875 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-12 08:39
I've been digging into this over the last week and come to the realization that I won't be able to finish this patch.  My apologies.

Victor, can you take over?  I would appreciate it.

The tests I have written are only for the Python side.  The patch I was working on (inherited from Neil and the Python 2 code base) also added a couple of C API functions -- do we want/need these?  How do we write tests for them?
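One way to exercise C API functions from Python-level tests is through `ctypes.pythonapi`, which exposes the running interpreter's C API. A sketch using the existing public function `PyBytes_FromFormat` (discussed later in this issue); it assumes CPython, and because the function is variadic, `argtypes` is left unset and each C argument is passed as an explicit ctypes value or as bytes (which ctypes passes as `char *`). Variadic calls through ctypes can be platform-sensitive, so treat this as a sketch rather than a portable test.

```python
import ctypes

# PyBytes_FromFormat(const char *format, ...) returns a new bytes object.
PyBytes_FromFormat = ctypes.pythonapi.PyBytes_FromFormat
PyBytes_FromFormat.restype = ctypes.py_object
# argtypes is left unset because the function is variadic.


def from_format(fmt: bytes, *c_args):
    """Call PyBytes_FromFormat with already-typed ctypes arguments."""
    return PyBytes_FromFormat(fmt, *c_args)


greeting = from_format(b'Hello %s!', b'world')            # %s takes a char *
numbered = from_format(b'%d-%d', ctypes.c_int(4), ctypes.c_int(2))
```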
msg234012 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-14 07:28
Removed the new C API functions; all new functions are static.

Duplicated bytes code in bytearray.

In-place interpolation returns a new bytearray at this point.

I'll work on getting in-place working, but otherwise I'll commit this in a week so we have something in for the first alpha.
msg234189 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-17 18:37
Better patch, along the lines of my original thought:

  - byarrayformat converts bytearray to bytes
  - calls bytesformat (now _PyBytes_Format) to do the heavy lifting
  - uses PyByteArray_FromObject to transform back to bytearray

Now working on in-place format.
msg234246 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-18 14:28
I will not have time to work on optimization before alpha 1 (February 8, 2015).

Ethan: just commit your patch when you consider that it's ready to be
merged, we will have time to refactor and enhance the code later.

IMO it's more important to have the feature in alpha 1 than to have
perfect code, because some developers are waiting for this feature and
will have more time to provide feedback.
msg234265 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-18 18:37
Here's the patch -- the code for % and %= is in place for bytes and bytearray;  I still need to get the doc patch done.  I'll commit Wednesday-ish barring problems.

Big question
============

Background
----------
There is a Python C API function called PyBytes_FromFormat which is used to create bytes objects using the %-interpolation format (it takes a C string and zero or more C arguments).

Actual Question
---------------
Should PyBytes_FromFormat also support the new codes of %a and %b ?

My Thoughts
-----------
Writing things down is good!  %a and %b are both for Python-level arguments, not C-level arguments, so %a and %b would make no sense there.
msg234266 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-18 18:43
Thanks, Victor, for the feedback.

I was able to figure out some more of the C side thanks to Georg, and I think the code is looking pretty good.

There may be room for optimization by having the bytes code call the unicode implementation for more of the conversions (currently it's only using the unicode fromlong function), but the docs should happen before that.  ;)
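A quick check of why delegating numeric conversions to the str (unicode) implementation is plausible: for numeric codes, bytes formatting is expected to give the same ASCII text as str formatting. A sketch, assuming Python 3.5 or later; `matches_str_formatting` is an illustrative helper, and the codes and values are a sample rather than an exhaustive list.

```python
def matches_str_formatting(fmt: str, value) -> bool:
    """True if bytes %-formatting equals str %-formatting encoded to ASCII."""
    return fmt.encode('ascii') % value == (fmt % value).encode('ascii')


codes = ['%d', '%5.2f', '%x', '%o', '%e', '%+08d']
values = [0, 42, -7, 255]          # integers, valid for every code above

mismatches = [(code, value) for code in codes for value in values
              if not matches_str_formatting(code, value)]
print(mismatches)
```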
msg234591 - Author: Roundup Robot (python-dev) Date: 2015-01-24 04:06
New changeset 8d802fb6ae32 by Ethan Furman in branch 'default':
Issue20284: Implement PEP461
https://hg.python.org/cpython/rev/8d802fb6ae32
msg234726 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-26 08:44
It's strange that %s format raises an error about the %b format:

>>> b's? %s' % 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'
msg234746 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-01-26 14:09
It does seem a bit odd -- on the other hand, %s is an alias for %b, is deprecated for new 3-only code, and this might help serve as a reminder of that.

Or we could fix it.  ;)
msg234750 - Author: Roundup Robot (python-dev) Date: 2015-01-26 15:45
New changeset db7ec64aac39 by Victor Stinner in branch 'default':
Issue #20284: Fix a compilation warning on Windows
https://hg.python.org/cpython/rev/db7ec64aac39
History
Date User Action Args
2015-01-29 23:26:31 ethan.furman set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2015-01-26 15:45:05 python-dev set messages: + msg234750
2015-01-26 14:09:42 ethan.furman set messages: + msg234746
2015-01-26 08:44:12 vstinner set messages: + msg234726
2015-01-24 04:06:04 python-dev set nosy: + python-dev
    messages: + msg234591
2015-01-18 18:43:07 ethan.furman set messages: + msg234266
2015-01-18 18:37:23 ethan.furman set files: + issue20284.stoneleaf.04.patch
    messages: + msg234265
2015-01-18 14:28:01 vstinner set messages: + msg234246
2015-01-17 18:37:11 ethan.furman set files: + issue20284.stoneleaf.03.patch
    messages: + msg234189
2015-01-14 07:28:53 ethan.furman set files: + issue20284.stoneleaf.02.patch
    messages: + msg234012
    stage: needs patch -> patch review
2015-01-12 08:39:26 ethan.furman set messages: + msg233875
    stage: patch review -> needs patch
2015-01-06 18:40:32 ethan.furman set files: + issue20284.stoneleaf.tests_only.01.patch
    messages: + msg233548
2015-01-06 16:19:36 vstinner set messages: + msg233546
2015-01-06 16:04:41 ethan.furman set files: + issue20284.stoneleaf.01.patch
    messages: + msg233545
2015-01-06 11:25:52 ncoghlan set messages: + msg233524
2015-01-05 10:19:03 vstinner set messages: + msg233458
2015-01-05 10:14:00 vstinner set messages: + msg233456
2014-10-05 04:09:50 ncoghlan link issue22555 dependencies
2014-07-26 07:03:11 ncoghlan set nosy: + ncoghlan
    messages: + msg224024
2014-03-28 05:26:11 ethan.furman set assignee: ethan.furman
    messages: + msg215019
2014-02-10 21:10:47 martin.panter set nosy: + martin.panter
2014-01-21 01:03:57 Arfrever set nosy: + Arfrever
2014-01-20 22:04:04 nascheme set files: + 04-py2-eq.patch
2014-01-20 22:03:51 nascheme set files: + 03-py2-flag.patch
2014-01-20 22:03:37 nascheme set files: + 02-code-a.patch
2014-01-20 22:03:22 nascheme set files: + 01-pep461.patch
    messages: + msg208581
    title: patch to implement %-interpolation for bytes (roughly PEP 461) -> patch to implement PEP 461 (%-interpolation for bytes)
2014-01-20 22:00:04 nascheme set files: - bytes_mod_v4.patch
2014-01-20 21:59:55 nascheme set files: - bytes_mod_v3.patch
2014-01-20 21:59:45 nascheme set files: - bytes_mod_v2.patch
2014-01-20 21:59:36 nascheme set files: - bytes_mod.patch
2014-01-18 01:16:28 ethan.furman set nosy: + ethan.furman
2014-01-17 19:33:06 nascheme set files: + bytes_mod_v4.patch
    messages: + msg208356
    title: proof for concept patch for bytes formatting methods -> patch to implement %-interpolation for bytes (roughly PEP 461)
2014-01-17 09:40:42 nascheme set files: + pep-draft.txt
2014-01-17 09:40:05 nascheme set files: + bytes_mod_v3.patch
    keywords: + patch
    messages: + msg208331
2014-01-17 09:06:35 vstinner set messages: + msg208329
2014-01-17 09:06:22 vstinner set nosy: + vstinner
2014-01-17 08:40:26 nascheme set keywords: + needs review, - patch
    files: + bytes_mod_v2.patch
    messages: + msg208328
2014-01-17 00:57:04 eric.smith set nosy: + eric.smith
2014-01-16 21:33:09 nascheme create