Improving Lib Doc Sequence Types Section #49216

Closed
terryjreedy opened this issue Jan 16, 2009 · 30 comments

@terryjreedy
Member

BPO 4966
Nosy @birkenfeld, @rhettinger, @terryjreedy, @ncoghlan, @ezio-melotti, @merwok
Files
  • stdtypes.html: First cut - split into 3 sections, new Sequence Types section updated
  • 0a49f6382467.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2012-08-20.07:14:23.346>
    created_at = <Date 2009-01-16.23:43:37.613>
    labels = ['type-feature', 'docs']
    title = 'Improving Lib Doc Sequence Types Section'
    updated_at = <Date 2012-08-20.07:14:23.344>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2012-08-20.07:14:23.344>
    actor = 'python-dev'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2012-08-20.07:14:23.346>
    closer = 'python-dev'
    components = ['Documentation']
    creation = <Date 2009-01-16.23:43:37.613>
    creator = 'terry.reedy'
    dependencies = []
    files = ['24314', '24511']
    hgrepos = ['106']
    issue_num = 4966
    keywords = ['patch']
    message_count = 30.0
    messages = ['79988', '114849', '114859', '130531', '130534', '130535', '134953', '136102', '136108', '143293', '151774', '151777', '151802', '151805', '151893', '151905', '151910', '152097', '152098', '152100', '152143', '152153', '152160', '152240', '152277', '152307', '152308', '152312', '153187', '168630']
    nosy_count = 10.0
    nosy_names = ['georg.brandl', 'rhettinger', 'terry.reedy', 'dcbbcd', 'ncoghlan', 'ezio.melotti', 'eric.araujo', 'docs@python', 'python-dev', 'anasofiapaixao']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue4966'
    versions = ['Python 3.3', 'Python 3.4']

    @terryjreedy
    Member Author

    Issues and suggestions for Python Standard Library / Built-in Types /
    "Sequence Types — str, bytes, bytearray, list, tuple, range"

    1. Put subsections in the same order as in the title and main section.
      In particular, move bytes/bytearray subsection up to follow string
      subsection and move range subsection down to bottom of this grouping.

    2. String paragraph (the second) ends with the rather wordy sentence
      "In addition to the functionality described here, there are also
      string-specific methods described in the String Methods section."
      where 'String Methods' is a forward link to that subsection.
      Add similar possibly less wordy sentence-links for other types.

    In particular, end the next (bytes/bytearray) paragraph with something like
    "For specific methods, see String Methods and Bytes and Byte Array
    Methods. For bytearrays, also see Mutable Sequence Types."

    End the list/tuple paragraph after the Warning with
    "For list methods, see Mutable Sequence Types."

    After the following range paragraph, the following could be added:
    "For more, see Range Type."
    However, there is almost nothing more said (perhaps there was before
    range objects were stripped down), so I suggest deleting that subsection
    and adding anything more that is not duplication to the end of the
    beginning section's range paragraph. If tuples do not need their own
    section, range needs one even less.

    3. Bytes and Byte Array Method subsection correctly says that bytes and
      bytearrays do not have (senseless) .encode but neglects to document the
      corresponding inverse .decode method (while it does mention the
      specialized .fromhex decoding method).

    Also add .isdecimal, .isnumeric, .isprintable, and .maketrans to the
    list of exceptions in the first sentence. (Based on dir(str), dir(bytes)
    in 3.0)

    4. I see three problems with the current documentation of count and
      index methods.
      a) They are documented under both String Methods and Mutable Sequences.
      They do not really belong in the latter, which lists "additional
      operations that allow in-place modification of the object", because they
      do not mutate.
      b) Tuples do not have their own section, but (unlike range objects) do
      have a couple of methods: count and index. Being neither string-like
      nor mutable, their having methods is undocumented.
      c) Bytearrays, on the other hand, are both string-like and mutable. So
      they are (mis)documented as having two slightly different versions of
      these methods. (They actually use the string-like definition, of course.)

    Consequently, the definitions of count and index in the Mutable Sequence
    subsection are not mutable sequence definitions but are really
    list/tuple definitions. So I suggest one of two variations:
    A) In the main section, add the list/tuple version of .count() and
    .index() to the table of common sequence operations with a footnote
    either explaining the difference for the string group or referring to
    String Methods.
    B) In the main section, add both versions to the table with footnotes
    explaining which is which.
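
To make the distinction in point 4 concrete, here is a small sketch of the two flavours of count() and index(), as they behave in a current CPython 3 interpreter (details in 3.0 may have differed). The variable names are illustrative only.

```python
# Tuples are neither string-like nor mutable, yet they have both methods.
point = (1, 2, 1, 3)
tuple_count = point.count(1)   # number of elements equal to 1
tuple_index = point.index(3)   # position of the first element equal to 3

# Lists use the same element-based definition; a sub-list is just
# compared against each element, so it is not found here.
items = [1, 2, 1, 3]
list_count = items.count(1)
sublist_count = items.count([1, 2])

# bytearray is mutable, but uses the string-like definition:
# the argument may be a subsequence, not only a single element.
buf = bytearray(b"abcabc")
sub_count = buf.count(b"bc")   # non-overlapping occurrences of b"bc"
sub_index = buf.index(b"ca")   # offset where the subsequence starts

# str behaves like bytes/bytearray in this respect.
text_count = "abcabc".count("bc")
```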

    The count/index/tuple doc issue has come up more than once on c.l.p.
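
Returning to point 3, the dir()-based comparison can be reproduced with a few lines. This is only a sketch: the exact set of str-only methods depends on the Python version, so it is computed rather than hard-coded, and the names used here are illustrative.

```python
# Public method names present on str but missing from bytes.
str_only = sorted(
    name
    for name in set(dir(str)) - set(dir(bytes))
    if not name.startswith("_")
)

# bytes has no encode(), but it does have the inverse, decode().
has_encode = hasattr(bytes, "encode")
has_decode = hasattr(bytes, "decode")

# fromhex() is the specialised decoding constructor mentioned above.
raw = bytes.fromhex("68 69")
decoded = raw.decode("ascii")
```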

    @terryjreedy terryjreedy added the docs Documentation in the Doc dir label Jan 16, 2009
    @ezio-melotti ezio-melotti added the type-feature A feature request or enhancement label Jun 6, 2009
    @BreamoreBoy BreamoreBoy mannequin assigned docspython and unassigned birkenfeld Jul 10, 2010
    @merwok
    Member

    merwok commented Aug 24, 2010

    I’m interested in making a series of patches corresponding to your suggestions, unless you or someone else wants to do them.

    I’m assigning this to myself so that I don’t forget (I won’t have time for a couple of weeks). If someone wants to do it as an easy first patch (Terry did most of the work :), that’s okay; just remove the assignment from me.

    @merwok merwok assigned merwok and unassigned docspython Aug 24, 2010
    @terryjreedy
    Member Author

    Please go ahead. I will gladly review anything you do.

    @ezio-melotti
    Member

    This may be out of the scope of this issue, but I would like to see all the basic data types on a single page of their own. The current page has some sections about data types mixed with sections about operations, comparisons, and other things, followed by less-"used" types. The page also contains a lot of information and is not easy to browse (42 screens on a 24" monitor).

    Ideally the structure should be something like:

    1. True, False, None
    2. int, float(, long, complex)
    3. str, unicode, list, tuple(, bytearray, buffer, xrange)
    4. dict
    5. set(, frozenset)

    (where the types in () are considered less important -- so maybe described in detail later or in another page). The page can list common operations for each group and their methods, but leaving things like the string formatting operations to another page/section.

    @terryjreedy
    Member Author

    I have started learning .rst, so I hope to work on this in the not too distant future.

    Ezio -- I have also noticed that some chapters are too long to be easily scrolled around in (unittest is another), and need either an index at the top (like with built-in functions) or separate files (or both).

    @ezio-melotti
    Member

    The advantage of having one big page is that you can ctrl+f easily without having to go back and forth between different pages.
    On the other hand, the page is not easy to browse (especially on small screens, mobile devices, old/slow PCs).

    In this case I don't think that splitting the page is a problem, because the page contains information about different and fairly unrelated things.

    Pages like unittest or logging are not so easy to split, because while working with them you might need to use several different functions/methods/classes, and having their docs on two or more pages would be annoying. (FWIW I've been working a lot on the unittest doc to make it more "compact" and easier to browse, but there's still work to do. We have also been considering making one page for unittest "users" that explains how to write tests and use the assert methods, and another for unittest "developers" that explains how to write test runners, suites and more advanced stuff.)

    BTW .rst is really easy, and if you are not sure about something just try to build the doc with "make html" and see if it complains and if the resulting page looks OK. Also see http://docs.python.org/documenting/index.html.

    @ezio-melotti
    Member

    See also bpo-11975 and bpo-11976.

    @anasofiapaixao
    Mannequin

    anasofiapaixao mannequin commented May 16, 2011

    I was taking a look into the possibility of splitting this page into several pages, and wondered: could the contents of the Comparisons and the Boolean operations sections just be merged into Python Reference / Expressions, and then deleted from this page altogether? They are not even data types but operators, after all.

    @ezio-melotti
    Member

    I think it should be OK. The stdtypes page could then mention type-specific behavior in the types' sections (e.g. <, <=, >=, > for sets) and link to the language reference for the general behavior.
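
As an illustration of the kind of type-specific behavior meant here, the following sketch shows the comparison operators acting as subset/superset tests on sets. The set contents are made up for the example.

```python
small = {1, 2}
large = {1, 2, 3}
other = {4}

proper_subset = small < large       # every element of small is in large, and they differ
subset_or_equal = small <= small    # a set is a (non-proper) subset of itself
not_proper = small < small          # but never a proper subset of itself
superset = large >= small

# Sets are only partially ordered: for two unrelated sets,
# neither <, > nor == holds.
unrelated = (small < other, small > other, small == other)
```

This differs from the total ordering of numbers, which is why it seems worth a mention next to the set type itself.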

    @ncoghlan
    Contributor

    ncoghlan commented Sep 1, 2011

    Bringing a suggestion over from bpo-12874, I think it may be worth splitting the current "Sequence Types" section into 3 pieces that all appear in the top level table of contents for the library reference:

    4.6 Sequence Types - list, tuple, range
    4.7 Text Sequence Type - str
    4.8 Binary Data Sequence Types - bytes, bytearray, memoryview
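
A rough sketch of why the text and binary types sit comfortably in separate sections in Python 3: they hold different kinds of items even when they represent the same data. The sample string and variable names are only illustrative.

```python
text = "caf\u00e9"             # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: a sequence of integers in range(256)

text_len = len(text)           # counts code points
data_len = len(data)           # counts bytes; the accented letter needs two in UTF-8

first_char = text[0]           # indexing a str gives a length-1 str
first_byte = data[0]           # indexing bytes gives an int

# bytearray is the mutable counterpart of bytes, and memoryview
# gives access to the same buffer without copying it.
buf = bytearray(data)
view = memoryview(buf)
view[0] = ord("C")             # writes through to buf
round_trip = bytes(buf).decode("utf-8")
```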

    @ncoghlan
    Contributor

    Éric, are you still planning to work on this? Otherwise I'll make a first pass at doing the split into 3 sections (as per my earlier comment) and implementing some of Terry's suggestions.

    Linked Hg repo is a 2.7 based feature branch where I'll be publishing my changes as I make them.

    @ezio-melotti
    Member

    Éric is without Internet till the end of the month, so I think it's OK if you go ahead and start working on this.

    @birkenfeld
    Member

    +1 for splitting.

    @rhettinger
    Contributor

    +1 for Nick's suggested breakout:

    4.6 Sequence Types - list, tuple, range
    4.7 Text Sequence Type - str
    4.8 Binary Data Sequence Types - bytes, bytearray, memoryview

    @ncoghlan
    Contributor

    I realised that the lack of a clear binary/text distinction would make it messy to do the split docs in 2.7, so I made a new branch based on 3.2 instead (link to repo updated accordingly).

    @ncoghlan ncoghlan assigned ncoghlan and unassigned merwok Jan 24, 2012
    @ncoghlan
    Contributor

    Pushed an initial cut to my sandbox branch. Built HTML is attached so you can get a general idea of how it looks (links, etc, obviously won't work).

    So far, I have made the split into 3 sections and updated the new (shorter) Sequence Types section.

    That section now has 6 subsections:

    • Common Sequence Operations
    • Immutable Sequence Operations (very short, just mentions hash support)
    • Mutable Sequence Operations
    • Lists
    • Tuples
    • Ranges
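
For readers wondering what "hash support" amounts to for the immutable sequence types, here is a minimal sketch run against a current Python 3; the names are illustrative.

```python
point = (1, 2)
span = range(3)

# Immutable sequences can be hashed...
tuple_hash_ok = isinstance(hash(point), int)
range_hash_ok = isinstance(hash(span), int)

# ...so they can be used as dict keys or set members.
lookup = {point: "tuple key", span: "range key"}

# Mutable sequences such as list cannot.
try:
    hash([1, 2])
    list_hashable = True
except TypeError:
    list_hashable = False
```

Note that a tuple is only hashable if all of its items are, so a tuple containing a list still raises TypeError.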

    I haven't really touched the Text and Binary sections as yet - the only changes there are things that I copied down before removing them from the updated Sequence Types section.

    @ncoghlan
    Contributor

    Note: without the Python docs CSS to create the sidebar, the internal table of contents appears at the *bottom* of the rendered page.

    Really, reviewing this sensibly is probably going to require building the docs locally after using hg pull to retrieve the changes from my sandbox.

    @ncoghlan
    Contributor

    Branch status update:

    • Text Sequence Types section updated to reflect the new structure
    • changed the prose that describes the relationship between printf-style formatting and the str.format method (deliberately removing the implication that the former is in any real danger of disappearing - it's simply not practical for us to seriously contemplate killing it off)
    • in the top level index, I split the old "String Services" section into "Text Processing Services" and "Binary Data Services". The latter contains 'struct' and 'codecs', the former contains everything else that used to be in String Services. The index pages for the two sections do cross-reference modules in the other section a bit (Text Processing includes a pointer directly to the codecs module, Binary Data includes pointers to both re and difflib). The real driver for this change was that "struct" has no place in a "String Services" section in Py3k. Since "codecs" could really have gone in either section, I mainly moved it to the binary section so that 'struct' wasn't the only module in there.
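
Two of the points above lend themselves to a short sketch: the two formatting styles produce the same text, and struct and codecs deal in bytes rather than str. The values and format strings are arbitrary examples.

```python
import codecs
import struct

# printf-style formatting and str.format side by side.
printf_style = "%s has %d items" % ("list", 3)
format_style = "{} has {} items".format("list", 3)

# struct converts between Python values and packed binary data.
packed = struct.pack("<H", 258)            # little-endian unsigned 16-bit
unpacked = struct.unpack("<H", packed)[0]

# codecs sits on the boundary: text in, bytes out (and back).
encoded = codecs.encode("caf\u00e9", "utf-8")
decoded = codecs.decode(encoded, "utf-8")
```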

    The major remaining update is to the Binary Sequence Types section (since I haven't really reviewed that at all after rearranging things).

    @ncoghlan
    Contributor

    One other thing the branch doesn't currently sort out is the official signature of count() and index().

    In 3.2, for *all* of str, bytes, bytearray, tuple, list, range, the index() method takes the optional start:stop parameters.

    collections.Sequence.index(), OTOH, does not.

    count() splits the field more evenly: str, bytes, bytearray accept the extra parameters, but list, tuple, range and collections.Sequence only support counting values in the whole sequence.
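
The split can be checked interactively. The sketch below uses a current CPython 3 rather than 3.2, and covers only list, tuple, str, bytes and bytearray; range and the abstract base class are left out because their behaviour may differ between releases.

```python
values = [1, 2, 1]

# index() accepts optional start/stop on both families.
list_idx = values.index(1, 1)
tuple_idx = tuple(values).index(1, 1)
str_idx = "abca".index("a", 1)
bytes_idx = b"abca".index(b"a", 1, 4)

# count() accepts them only on the string-like types.
str_cnt = "abca".count("a", 1)
bytearray_cnt = bytearray(b"abca").count(b"a", 1)

# list.count() (and tuple.count()) take exactly one argument.
try:
    values.count(1, 1)
    list_count_accepts_range = True
except TypeError:
    list_count_accepts_range = False
```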

    @ezio-melotti
    Member

    Have you considered/planned reworking the beginning of the page a bit too?
    (Technically the issue is about the Sequence types section, but the whole page could be improved.)
    IMHO the sections about Truth value testing, Boolean operations, and Comparison are out of place there, and True/False/None should be described instead.
    The idea is that a developer new to Python should be able to come to this page, take a look at the headers and figure out what the main types are (what you did so far is already a good step in the right direction).

    @ncoghlan
    Contributor

    Yeah, the basic layout of this entire section has been in place for a *long* time (http://docs.python.org/release/1.4/lib/node4.html#SECTION00310000000000000000)

    Some aspects haven't really aged all that well, as people have made minimalist changes to document new features without necessarily stepping back to see if the overall structure still makes sense.

    However, rather than dumping one massive patch on python-checkins, I think it makes sense to try to tackle it by section (i.e. sequences + related changes for now, then look at mappings, sets, truth values, comparisons and numbers separately).

    One thing I do plan to do is a quick scan for places that reference into the sequence types section to see if they should be adjusted (e.g. see if there's some duplication in the language reference that could be reduced, or cross-references in the glossary to add or update).

    @ncoghlan
    Contributor

    One other point... the branch is actually now relative to default, not 3.2. While that was due to a merging mistake on my part, it also means I can legitimately ignore the narrow/wide build distinction in the section on strings.

    @ncoghlan
    Contributor

    I finished off the binary data section, so the first draft of the update is now complete in the bitbucket repo.

    @ezio-melotti
    Member

    > One other point... the branch is actually now relative to default, not
    > 3.2. While that was due to a merging mistake on my part, it also means
    > I can legitimately ignore the narrow/wide build distinction in the
    > section on strings.

    So will this go on 3.3 only or are you planning to push it on 3.2(/2.7) too?

    @ncoghlan
    Contributor

    Trying to make this change in 2.7 would actually be a bit of a nightmare - how do you cleanly split documentation of the binary data and text processing sequence types when "str" is used for both?

    The change would be *mostly* feasible in 3.2 (that's why I started my branch from there), but there are still some sharp edges that go away in 3.3 (mainly the narrow/wide Unicode split).

    So unless anyone is really keen to see the update in 3.2, my current plan is to leave the maintenance versions alone and only update it for 3.3. Going that way also provides better opportunities for post-checkin feedback from folks that aren't set up to build the docs themselves (rebuilding the docs is fairly straightforward on *nix, but Terry tells me that using Windows complicates that process quite a bit).
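
For context on the narrow/wide split mentioned above: on a narrow build of 3.2 or earlier, a code point above U+FFFF was stored as a surrogate pair and so had length 2. The sketch below shows the uniform behaviour on 3.3 and later; the chosen character is arbitrary.

```python
import sys

astral = "\U0001F40D"          # a code point above U+FFFF

# From 3.3 on, every code point counts as one item in a str.
astral_len = len(astral)

# sys.maxunicode no longer depends on how Python was built.
max_code_point = sys.maxunicode

# The same character still needs two 16-bit units in UTF-16,
# which is what a narrow build exposed through len().
utf16_units = len(astral.encode("utf-16-le")) // 2
```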

    @terryjreedy
    Member Author

    I agree with 3.3 only. This might not be ready for 3.2.3 anyway, depending on how soon the hash patch is ready, and if not, it becomes a somewhat moot point as new people should then download 3.3.0 instead of 3.2.4 next August.

    @birkenfeld
    Member

    ISTM that not doing this will make maintenance harder. For 2.7 I agree that there is no clear boundary to make, but 3.2 should be split up as well to ease merging of updates.

    @ncoghlan
    Contributor

    Good point, without doing the split in both, any doc merges in this section will be a nightmare. OK, with the caveat that the initial 3.2 version may gloss over some issues that no longer apply in 3.3 (specifically the narrow/wide split), I'll make a new branch in the sandbox so the changes will be once again based on 3.2.

    @ncoghlan
    Contributor

    Just noting that this has slipped a bit down my Python to-do list (there are other things I want to focus on before the first 3.3 alpha).

    I'll get back to it at some point, but if someone wants to take my branch and run with it in the meantime, please feel free.

    @ncoghlan ncoghlan removed their assignment Feb 12, 2012
    @ncoghlan ncoghlan self-assigned this Aug 20, 2012
    @python-dev
    Mannequin

    python-dev mannequin commented Aug 20, 2012

    New changeset 463f52d20314 by Nick Coghlan in branch 'default':
    Close bpo-4966: revamp the sequence docs in order to better explain the state of modern Python
    http://hg.python.org/cpython/rev/463f52d20314

    @python-dev python-dev mannequin closed this as completed Aug 20, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022