Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify flag case in re module docstring #84197

Closed
cool-RR mannequin opened this issue Mar 19, 2020 · 16 comments
Closed

Clarify flag case in re module docstring #84197

cool-RR mannequin opened this issue Mar 19, 2020 · 16 comments
Assignees
Labels
3.9 only security fixes docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@cool-RR
Copy link
Mannequin

cool-RR mannequin commented Mar 19, 2020

BPO 40016
Nosy @terryjreedy, @cool-RR, @serhiy-storchaka, @miss-islington
PRs
  • bpo-40016: re docstring: Clarify relationship of inline and argument flags #19078
  • [3.8] bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078) #19161
  • [3.7] bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078) #19162
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2020-03-25.21:59:45.363>
    created_at = <Date 2020-03-19.17:40:50.780>
    labels = ['type-feature', '3.9', 'docs']
    title = 'Clarify flag case in `re` module docstring'
    updated_at = <Date 2020-03-25.21:59:45.363>
    user = 'https://github.com/cool-RR'

    bugs.python.org fields:

    activity = <Date 2020-03-25.21:59:45.363>
    actor = 'terry.reedy'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2020-03-25.21:59:45.363>
    closer = 'terry.reedy'
    components = ['Documentation']
    creation = <Date 2020-03-19.17:40:50.780>
    creator = 'cool-RR'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40016
    keywords = ['patch']
    message_count = 16.0
    messages = ['364617', '364618', '364619', '364620', '364621', '364622', '364625', '364631', '364716', '364734', '364773', '364777', '364778', '365010', '365014', '365015']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'mrabarnett', 'cool-RR', 'docs@python', 'serhiy.storchaka', 'miss-islington']
    pr_nums = ['19078', '19161', '19162']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40016'
    versions = ['Python 3.9']

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 19, 2020

    Today I was tripped up by an inconsistency in the re docstring. I wanted to use DOTALL as a flag inside my regex, rather than as an argument to the compile function. Here are two lines from the docstring:

    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    ...
    S  DOTALL      "." matches any character at all, including the newline.
    

    The DOTALL flag appears as an uppercase S in 2 places, and as a lowercase s in one place. This is confusing, and I initially tried using the uppercase S only to get an error.

    I'm attaching a PR to this ticket.

    @cool-RR cool-RR mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir 3.9 only security fixes labels Mar 19, 2020
    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 19, 2020

    As you can see I left the old uppercase enums defined, to avoid breaking backward compatibility. We could make them trigger a DeprecationWarning.

    @serhiy-storchaka
    Copy link
    Member

    It is very inconvenient to use single-letter lowercase names for constants. It contradicts PEP-8:

    https://www.python.org/dev/peps/pep-0008/#constants

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 19, 2020

    Well, these aren't the textbook case of a constant, since they're enums, and not defined in the global namespace.

    @serhiy-storchaka
    Copy link
    Member

    They are.

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 19, 2020

    Oops, my mistake. Any other idea how to solve this discrepancy?

    @serhiy-storchaka
    Copy link
    Member

    I do not see any issue except that you was careless when read the documentation.

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 19, 2020

    I'm gonna look past the rudeness, and I'll just say that if I was tripped up by this, after 11 years of working with Python and the re module, then people in a beginner or intermediate level could be tripped up by this as well.

    Here's another, simpler suggestion for preventing confusion. Replace this line in the docstring:

    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    

    With this line:

    (?aiLmsux) Apply flags to the entire pattern, allowing 
               small tweaks to the matching logic (details below).
    

    There's no reason to mention the letters there because they're already mentioned. And it's helpful to add a short explanation, like the other entries in that list.

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Mar 20, 2020

    I updated my PR to match.

    @serhiy-storchaka
    Copy link
    Member

    I apologize if I was rude. It's only because of my bad English. There were many translation options for my words suggested by Google Translator and I obviously picked up the wrong one.

    Improving documentation is always a good thing. But I leave the final decision to someone who is fluent in English.

    @serhiy-storchaka serhiy-storchaka added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Mar 21, 2020
    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement docs Documentation in the Doc dir and removed type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Mar 21, 2020
    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 21, 2020
    @terryjreedy
    Copy link
    Member

    The root confusion is that re compilation has several variations with two sets of indicators, each with an unhelpful exception, and each combined and used in different ways.

    1. Module constants with uppercase English words or word pairs, also abbreviated by uppercase letters that are the first letter of the word -- except for S-DOTALL and X-VERBOSE. As arguments for the flags parameter of functions other than escape and purge, they are combined with '|'.

    2. Syntax letters within '(?...)', itself within an regex string, that are the single letter module constants lowercased b -- except that L is not lowercased because some fonts make l and 1 look nearly the same or even identical. Multiple syntax letters are combined by concatenation.

    The additional issue for docstrings is the extreme compression, including the omission (here) of 're.' prefixes. They are intended as reminders for those who have read and understood the full doc, but we try to make them as clear as possible. I am working on an alternate revision.

    @terryjreedy terryjreedy changed the title Clarify flag case in re module Clarify flag case in re module docstring Mar 21, 2020
    @terryjreedy terryjreedy changed the title Clarify flag case in re module Clarify flag case in re module docstring Mar 21, 2020
    @terryjreedy
    Copy link
    Member

    The docstring line in question is
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).

    This is exceptional in that other syntaxes in the special characters list use lower case only for syntax variables (m, n, name, id/name, yes, no). Here, each letter is a separate and literal special character. (Also exceptional is that the syntax given is illegal, as 'a', 'L', and 'u' are mutually exclusive.)

    The corresponding doc entry starts
    "(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.)
    ... the letters set the corresponding flags:" followed by 6 more lines.

    I suggest the following as the replacement here (followed by more 'below').
    (?aiLmsux) The letters set the corresponding flags defined below.

    I think 'letters' pretty clearly refers to 'a', 'i', ..., and 'x' as given, and that each 'corresponds' to and sets a flag that is a separate entity.

    The more complicated inline flags syntax, "(?aiLmsux-imsx:...)", is omitted from the docstring. Perhaps this is intentional.

    The flag constants are currently introduced by
    Some of the functions in this module takes flags as optional parameters:

    My suggested more accurate and expanded replacement:
    "Each function other than purge and escape can take an optional 'flags' argument consisting of one or more of the following module constants, joined by "|". A, L, and U are mutually exclusive."

    @terryjreedy
    Copy link
    Member

    The docstring is currently 103 lines. I intentionally replaced 1 line with 1 line that I believe to be more informative and kept the expansion of the other line to 3 lines.

    @terryjreedy
    Copy link
    Member

    New changeset 89a2209 by Ram Rachum in branch 'master':
    bpo-40016: re docstring: Clarify relationship of inline and argument flags (bpo-19078)
    89a2209

    @miss-islington
    Copy link
    Contributor

    New changeset 686d508 by Miss Islington (bot) in branch '3.8':
    bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078)
    686d508

    @miss-islington
    Copy link
    Contributor

    New changeset 0dad748 by Miss Islington (bot) in branch '3.7':
    bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078)
    0dad748

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes docs Documentation in the Doc dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants