Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate out documentation of binary sequence methods #65976

Closed
ncoghlan opened this issue Jun 16, 2014 · 13 comments
Closed

Separate out documentation of binary sequence methods #65976

ncoghlan opened this issue Jun 16, 2014 · 13 comments
Assignees
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 21777
Nosy @malemburg, @gvanrossum, @terryjreedy, @ncoghlan, @ezio-melotti, @cjerdonek, @vadmium, @zware
Files
  • separate_binary_sequence_docs.diff: Work in progress patch to show proposed structure
  • separate_binary_sequence_docs_v2.diff: Converted the "ASCII by default" category
  • separate_binary_sequence_docs_v3.diff: Just splitlines() and zfill() to go in initial draft
  • separate_binary_sequence_docs_v4.diff: End of initial pass through all methods
  • separate_binary_sequence_docs_v5.diff: Review comments addressed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2014-08-09.06:24:47.376>
    created_at = <Date 2014-06-16.11:36:45.350>
    labels = ['type-feature', 'docs']
    title = 'Separate out documentation of binary sequence methods'
    updated_at = <Date 2015-03-02.09:29:35.041>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2015-03-02.09:29:35.041>
    actor = 'ezio.melotti'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2014-08-09.06:24:47.376>
    closer = 'ncoghlan'
    components = ['Documentation']
    creation = <Date 2014-06-16.11:36:45.350>
    creator = 'ncoghlan'
    dependencies = []
    files = ['35654', '35946', '36046', '36101', '36309']
    hgrepos = []
    issue_num = 21777
    keywords = ['patch']
    message_count = 13.0
    messages = ['220711', '222978', '222979', '222991', '222992', '223734', '223735', '224025', '225067', '225068', '225098', '225099', '237037']
    nosy_count = 9.0
    nosy_names = ['lemburg', 'gvanrossum', 'terry.reedy', 'ncoghlan', 'ezio.melotti', 'chris.jerdonek', 'python-dev', 'martin.panter', 'zach.ware']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue21777'
    versions = ['Python 3.4', 'Python 3.5']

    @ncoghlan
    Copy link
    Contributor Author

    There are currently no dedicated docs for the bytes and bytearray methods - the relevant section just refers back to the str methods. This isn't sufficient, since the str methods cover of lot of stuff related to Unicode that isn't relevant to the binary sequence types, and it doesn't cleanly cover the differences either (like the fact that several methods accept integers).

    I've started work on a patch that documents the binary APIs explicitly, although bytes and bytearray still share docs. The methods are grouped into three categories:

    • work with arbitrary binary data
    • assume ASCII compatibility by default, but can still be used with arbitrary binary data when given suitable arguments
    • can only be used safely with data in an ASCII compatible format

    I've worked through and updated the new entries for the first category, but the latter two categories are still just copy-and-paste from the str docs.

    @ncoghlan ncoghlan self-assigned this Jun 16, 2014
    @ncoghlan ncoghlan added docs Documentation in the Doc dir type-feature A feature request or enhancement labels Jun 16, 2014
    @ncoghlan
    Copy link
    Contributor Author

    v2 patch converts the second category of functions. This conversion highlighted the lack of good examples in the str.split() docs, as well as some over and underspecification in the behaviour of the centering and justification methods (guarantees about object identity that don't hold for bytearray, failure to note that the default fill character is specifically an ASCII space - Unicode has more than one space type), so I also fixed those.

    Added Guido to the nosy list - Guido, if you could cast your eye over this and at least give a +1 to the general approach, that would be great, otherwise I'll just go ahead and merge it some time after I finish converting the final category (which I expect will be no later than the PyCon AU sprints in early August, and potentially sooner)

    @malemburg
    Copy link
    Member

    Why are you removing guarantees like these from the str docs:

    "The original string is returned if width is less than or equal to len(s)."

    ?

    This doesn't seem to have anything to do with documenting bytes and bytearrays.

    @ncoghlan
    Copy link
    Contributor Author

    On 13 Jul 2014 18:39, "Marc-Andre Lemburg" report@bugs.python.org wrote:

    Marc-Andre Lemburg added the comment:

    Why are you removing guarantees like these from the str docs:

    "The original string is returned if width is less than or equal to
    len(s)."

    Because it's untrue for bytearray, and possible object reuse is a general
    characteristic of immutability for str and bytes. If another implementation
    makes a copy for some reason, it would still be considered "Python".

    Since the sentence thus conveys no useful information, I removed it from
    both the text and binary variants rather than coming up with appropriate
    wording to indicate that the behaviour of returning a new reference to the
    existing object when no content changes are needed doesn't apply to the
    mutable bytearray.

    ?

    This doesn't seem to have anything to do with documenting bytes and
    bytearrays.

    ----------
    nosy: +lemburg


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue21777\>


    @gvanrossum
    Copy link
    Member

    On Sun, Jul 13, 2014 at 7:00 PM, Nick Coghlan <report@bugs.python.org>
    wrote:

    Nick Coghlan added the comment:

    On 13 Jul 2014 18:39, "Marc-Andre Lemburg" report@bugs.python.org wrote:

    Marc-Andre Lemburg added the comment:

    Why are you removing guarantees like these from the str docs:

    "The original string is returned if width is less than or equal to
    len(s)."

    Because it's untrue for bytearray, and possible object reuse is a general
    characteristic of immutability for str and bytes. If another implementation
    makes a copy for some reason, it would still be considered "Python".

    Since the sentence thus conveys no useful information, I removed it from
    both the text and binary variants rather than coming up with appropriate
    wording to indicate that the behaviour of returning a new reference to the
    existing object when no content changes are needed doesn't apply to the
    mutable bytearray.

    That feels like overreacting. It *is* useful to know about this guarantee,
    and it would be better if we could somehow require it rather than claim it
    doesn't matter. And before you counter with examples of other CPython
    behaviors that *shouldn't* be guaranteed across implementations, I am
    talking about this specific case, and every case needs to be examined on
    its merits separately. It is possible that in the end we'll decide this
    particular guarantee is not worth having -- but I think that should not be
    decided by a refactoring of the docs.

    @ncoghlan
    Copy link
    Contributor Author

    3rd in progress draft - converted most of the "inherently assumes ASCII" docs now. I think this set of changes really makes it clear how non-trivial it actually is to infer the binary domain behaviour from the str docs, which have all sorts of Unicode complications. You can't easily infer the behaviour from the Python 2 docs either, since these operations were locale dependent for Python 2 str objects.

    @ncoghlan
    Copy link
    Contributor Author

    Note I haven't added back the immutability guarantees yet - I'll do that before declaring this ready for final review.

    @ncoghlan
    Copy link
    Contributor Author

    OK, I've completed the initial pass through all the methods. Remaining items:

    • add back the guarantees where str will return the same object, add those guarantees for bytes where applicable
    • address the review comments from Zach and Ezio

    There are a couple of review comments about removing duplication that I'd like to skip addressing for now. I think they're reasonable ideas, but I also think it's a lot easier to go wrong with DRY in docs than it is in code. Indeed, this whole matter of not documenting the bytes behaviour in the first place was a matter of assuming folks could just infer the binary behaviour from the text behaviour.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Aug 8, 2014

    v5 has all the review comments I accepted as being in scope addressed, including the restoration/addition of the notes about returning the object unchanged for center(), ljust(), rjust() and zfill() when the field width is less than or equal to the length of the string.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Aug 8, 2014

    I think this is done now - absent any major objections, I'll push it live in a couple of days time.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 9, 2014

    New changeset e750d2b44c1d by Nick Coghlan in branch '3.4':
    Issue bpo-21777: separate docs for binary sequence methods
    http://hg.python.org/cpython/rev/e750d2b44c1d

    New changeset e205bce4cc0a by Nick Coghlan in branch 'default':
    Merge bpo-21777 from 3.4
    http://hg.python.org/cpython/rev/e205bce4cc0a

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Aug 9, 2014

    Merged after reviews from Zach & Ezio.

    Zach, Ezio - if there are any other refactorings from the reviews that you'd like to pursue, consider pulling them out to separate issues so we don't forget about them.

    @ncoghlan ncoghlan closed this as completed Aug 9, 2014
    @ezio-melotti
    Copy link
    Member

    Zach, Ezio - if there are any other refactorings from the reviews that
    you'd like to pursue, consider pulling them out to separate issues so
    we don't forget about them.

    See bpo-23560.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants