classification
Title: Separate out documentation of binary sequence methods
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: chris.jerdonek, ezio.melotti, gvanrossum, lemburg, martin.panter, ncoghlan, python-dev, terry.reedy, zach.ware
Priority: normal Keywords: patch

Created on 2014-06-16 11:36 by ncoghlan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                              Uploaded                    Description
separate_binary_sequence_docs.diff     ncoghlan, 2014-06-16 11:36  Work in progress patch to show proposed structure
separate_binary_sequence_docs_v2.diff  ncoghlan, 2014-07-13 22:30  Converted the "ASCII by default" category
separate_binary_sequence_docs_v3.diff  ncoghlan, 2014-07-23 12:11  Just splitlines() and zfill() to go in initial draft
separate_binary_sequence_docs_v4.diff  ncoghlan, 2014-07-26 07:44  End of initial pass through all methods
separate_binary_sequence_docs_v5.diff  ncoghlan, 2014-08-08 12:37  Review comments addressed
Messages (13)
msg220711 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-16 11:36
There are currently no dedicated docs for the bytes and bytearray methods - the relevant section just refers back to the str methods. This isn't sufficient, since the str methods cover a lot of stuff related to Unicode that isn't relevant to the binary sequence types, and the str docs don't cleanly cover the differences either (like the fact that several methods accept integers).
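As an illustration of the integer point, here is a minimal sketch, assuming Python 3.3 or later, where the search methods on bytes and bytearray accept an integer in range(256) as well as a bytes-like subsequence. The variable names are purely illustrative.

```python
data = b"abc"

# Indexing and iterating a bytes object yields integers, not length-1 bytes.
first = data[0]
values = list(data)

# Search methods accept either a bytes-like subsequence or an integer 0-255.
pos_by_bytes = data.find(b"b")
pos_by_int = data.find(98)          # 98 == ord("b")
count_by_int = data.count(ord("c"))
contains_int = 99 in data

# The str equivalent rejects integers outright.
try:
    "abc".find(98)
except TypeError:
    str_rejects_int = True
else:
    str_rejects_int = False
```

None of this can be inferred from the str method docs, which only ever describe substring arguments.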

I've started work on a patch that documents the binary APIs explicitly, although bytes and bytearray still share docs. The methods are grouped into three categories:

- work with arbitrary binary data
- assume ASCII compatibility by default, but can still be used with arbitrary binary data when given suitable arguments
- can only be used safely with data in an ASCII compatible format

I've worked through and updated the new entries for the first category, but the latter two categories are still just copy-and-paste from the str docs.
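The three categories can be sketched roughly as follows. The method chosen for each category reflects how the merged docs group them as far as I recall, and the sample data is made up for illustration.

```python
# 1. Methods that work with arbitrary binary data.
blob = b"\x00\x01\xff\x01\x00"
replaced = blob.replace(b"\x01", b"\x02")
ones = blob.count(b"\x01")

# 2. Methods that assume ASCII by default, but accept explicit arguments
#    that make them usable with arbitrary binary data.
stripped_default = b"  payload \t\n".strip()           # strips ASCII whitespace
stripped_nul = b"\x00\x00payload\x00".strip(b"\x00")   # strips NUL padding instead
fields = b"a\x00b\x00c".split(b"\x00")

# 3. Methods that are only safe on data in an ASCII compatible format.
header = b"Content-Type".lower()
utf8_word = "caf\u00e9".encode("utf-8")   # the final character becomes two non-ASCII bytes
shouted = utf8_word.upper()               # only the ASCII letters change
```

In the last case the result is still valid UTF-8, but it is not the uppercase form of the original text, which is why the third category should not be applied to arbitrary data.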
msg222978 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-13 22:30
v2 patch converts the second category of functions. This conversion highlighted the lack of good examples in the str.split() docs, as well as some over- and under-specification in the behaviour of the centering and justification methods (guarantees about object identity that don't hold for bytearray, failure to note that the default fill character is specifically an ASCII space - Unicode has more than one space type), so I also fixed those.
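For readers without the patch to hand, the kind of behaviour at issue looks roughly like this. These are illustrative examples written for this thread, not the ones from the patch itself.

```python
# str.split(): an explicit separator keeps empty fields...
explicit_sep = "1,2,,3,".split(",")
# ...while the default splits on runs of whitespace and drops empty strings.
default_sep = "  1   2  3 ".split()
limited = "1 2 3".split(maxsplit=1)

# Padding methods default to an ASCII space (U+0020) as the fill character.
centered_str = "x".center(5)
padded_bytes = b"x".rjust(3)

# Unicode has other space characters, but they are never used implicitly;
# they have to be passed explicitly, as does any non-default fill byte.
nbsp = "\u00a0"                      # NO-BREAK SPACE
nbsp_is_space = nbsp.isspace()
centered_nbsp = "x".center(5, nbsp)
centered_dash = b"x".center(5, b"-")
```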

Added Guido to the nosy list - Guido, if you could cast your eye over this and at least give a +1 to the general approach, that would be great. Otherwise I'll just go ahead and merge it some time after I finish converting the final category (which I expect will be no later than the PyCon AU sprints in early August, and potentially sooner).
msg222979 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-07-13 22:39
Why are you removing guarantees like these from the str docs:

"The original string is returned if *width* is less than or equal to ``len(s)``."

?

This doesn't seem to have anything to do with documenting bytes and bytearrays.
msg222991 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-14 02:00
On 13 Jul 2014 18:39, "Marc-Andre Lemburg" <report@bugs.python.org> wrote:
>
>
> Marc-Andre Lemburg added the comment:
>
> Why are you removing guarantees like these from the str docs:
>
> "The original string is returned if *width* is less than or equal to ``len(s)``."

Because it's untrue for bytearray, and possible object reuse is a general
characteristic of immutability for str and bytes. If another implementation
makes a copy for some reason, it would still be considered "Python".

Since the sentence thus conveys no useful information, I removed it from
both the text and binary variants rather than coming up with appropriate
wording to indicate that the behaviour of returning a new reference to the
existing object when no content changes are needed doesn't apply to the
mutable bytearray.
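A minimal sketch of the bytearray behaviour described above (the names are illustrative): even when no padding is needed, the result is a separate object, because the caller is entitled to mutate it without affecting the original.

```python
buf = bytearray(b"abc")
same_width = buf.ljust(2)      # width <= len(buf), so no padding is added

contents_equal = same_width == buf
is_new_object = same_width is not buf

# Mutating the result leaves the original untouched.
same_width[0] = ord("X")
original_after = bytes(buf)
result_after = bytes(same_width)
```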

>
> ?
>
> This doesn't seem to have anything to do with documenting bytes and bytearrays.
msg222992 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-07-14 02:07
That feels like overreacting. It *is* useful to know about this guarantee,
and it would be better if we could somehow require it rather than claim it
doesn't matter. And before you counter with examples of other CPython
behaviors that *shouldn't* be guaranteed across implementations, I am
talking about this specific case, and every case needs to be examined on
its merits separately. It is possible that in the end we'll decide this
particular guarantee is not worth having -- but I think that should not be
decided by a refactoring of the docs.
msg223734 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-23 12:11
3rd in-progress draft - I've now converted most of the "inherently assumes ASCII" docs. I think this set of changes makes it clear how non-trivial it is to infer the binary domain behaviour from the str docs, which have all sorts of Unicode complications. You can't easily infer the behaviour from the Python 2 docs either, since these operations were locale-dependent for Python 2 str objects.
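A few concrete cases where the str docs would mislead someone working with bytes, as a rough sketch on Python 3 (the sample values are chosen for illustration):

```python
# Case mapping: str applies Unicode rules, bytes only touches ASCII a-z.
text = "stra\u00dfe"                          # German "strasse" spelled with sharp s
upper_text = text.upper()                     # sharp s expands to "SS"
upper_bytes = text.encode("utf-8").upper()    # the two bytes encoding it are left alone

# Digit checks: str consults the Unicode database, bytes only accepts b"0"-b"9".
superscript_two = "\u00b2"
str_is_digit = superscript_two.isdigit()
bytes_is_digit = superscript_two.encode("latin-1").isdigit()

# Line splitting: str recognises extra boundaries such as vertical tab,
# bytes only splits on \n, \r and \r\n.
vt_text = "a\x0bb"
str_lines = vt_text.splitlines()
bytes_lines = vt_text.encode("ascii").splitlines()
```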
msg223735 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-23 12:12
Note I haven't added back the immutability guarantees yet - I'll do that before declaring this ready for final review.
msg224025 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-26 07:44
OK, I've completed the initial pass through all the methods. Remaining items:

* add back the guarantees where str will return the same object, add those guarantees for bytes where applicable
* address the review comments from Zach and Ezio

There are a couple of review comments about removing duplication that I'd like to skip addressing for now. I think they're reasonable ideas, but I also think it's a lot easier to go wrong with DRY in docs than it is in code. Indeed, this whole matter of not documenting the bytes behaviour in the first place was a matter of assuming folks could just infer the binary behaviour from the text behaviour.
msg225067 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-08 12:37
v5 has all the review comments I accepted as being in scope addressed, including the restoration/addition of the notes about returning the object unchanged for center(), ljust(), rjust() and zfill() when the field width is less than or equal to the length of the string.
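The restored notes can be checked with something like the following sketch. It relies on the documented guarantee for str and bytes and has only been reasoned through against CPython, so treat the identity checks as an assumption on other implementations.

```python
text = "abc"
data = b"abc"
methods = ("center", "ljust", "rjust", "zfill")
width = 2  # less than or equal to len("abc")

# For the immutable types, each method hands back the very same object
# when the requested width requires no padding.
str_unchanged = {name: getattr(text, name)(width) is text for name in methods}
bytes_unchanged = {name: getattr(data, name)(width) is data for name in methods}
```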
msg225068 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-08 12:39
I think this is done now - absent any major objections, I'll push it live in a couple of days' time.
msg225098 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-08-09 06:19
New changeset e750d2b44c1d by Nick Coghlan in branch '3.4':
Issue #21777: separate docs for binary sequence methods
http://hg.python.org/cpython/rev/e750d2b44c1d

New changeset e205bce4cc0a by Nick Coghlan in branch 'default':
Merge #21777 from 3.4
http://hg.python.org/cpython/rev/e205bce4cc0a
msg225099 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-09 06:24
Merged after reviews from Zach & Ezio.

Zach, Ezio - if there are any other refactorings from the reviews that you'd like to pursue, consider pulling them out to separate issues so we don't forget about them.
msg237037 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2015-03-02 09:29
> Zach, Ezio - if there are any other refactorings from the reviews that
> you'd like to pursue, consider pulling them out to separate issues so
> we don't forget about them.

See #23560.
History
Date                 User           Action  Args
2022-04-11 14:58:05  admin          set     github: 65976
2015-03-02 09:29:35  ezio.melotti   set     messages: + msg237037
2014-08-09 06:24:47  ncoghlan       set     status: open -> closed; resolution: fixed; messages: + msg225099; stage: commit review -> resolved
2014-08-09 06:19:59  python-dev     set     nosy: + python-dev; messages: + msg225098
2014-08-08 12:39:23  ncoghlan       set     messages: + msg225068; stage: patch review -> commit review
2014-08-08 12:37:18  ncoghlan       set     files: + separate_binary_sequence_docs_v5.diff; messages: + msg225067
2014-07-26 07:44:42  ncoghlan       set     files: + separate_binary_sequence_docs_v4.diff; messages: + msg224025
2014-07-23 12:14:51  ezio.melotti   set     nosy: + ezio.melotti, chris.jerdonek, zach.ware; stage: patch review
2014-07-23 12:12:20  ncoghlan       set     messages: + msg223735
2014-07-23 12:11:07  ncoghlan       set     files: + separate_binary_sequence_docs_v3.diff; messages: + msg223734
2014-07-14 02:07:10  gvanrossum     set     messages: + msg222992
2014-07-14 02:00:39  ncoghlan       set     messages: + msg222991
2014-07-13 22:39:38  lemburg        set     nosy: + lemburg; messages: + msg222979
2014-07-13 22:30:17  ncoghlan       set     files: + separate_binary_sequence_docs_v2.diff; nosy: + gvanrossum; messages: + msg222978
2014-06-20 21:06:43  terry.reedy    set     nosy: + terry.reedy
2014-06-17 21:42:18  martin.panter  set     nosy: + martin.panter
2014-06-16 11:36:45  ncoghlan       create