classification
Title: Clarify flag case in `re` module docstring
Type: enhancement
Stage: resolved
Components: Documentation
Versions: Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: cool-RR, docs@python, miss-islington, mrabarnett, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2020-03-19 17:40 by cool-RR, last changed 2020-03-25 21:59 by terry.reedy. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19078 merged cool-RR, 2020-03-19 17:42
PR 19161 merged miss-islington, 2020-03-25 18:45
PR 19162 merged miss-islington, 2020-03-25 18:45
Messages (16)
msg364617 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-19 17:40
Today I was tripped up by an inconsistency in the `re` docstring. I wanted to use DOTALL as a flag inside my regex, rather than as an argument to the `compile` function. Here are two lines from the docstring:

    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    ...
    S  DOTALL      "." matches any character at all, including the newline.

The DOTALL flag appears as an uppercase S in 2 places, and as a lowercase s in one place. This is confusing, and I initially tried using the uppercase S only to get an error.
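
A minimal sketch of the behavior described above, assuming a current CPython `re` module; the sample text and pattern are made up for illustration. The flag argument takes the uppercase constant, while the inline form accepts only the lowercase letter:

```python
import re

text = "first line\nsecond line"  # made-up sample input
pattern = "line.second"           # "." must cross the newline to match

# As a flag argument, DOTALL is spelled re.DOTALL or its alias re.S.
by_argument = re.search(pattern, text, re.S)

# As an inline flag, the same option is a lowercase "s".
by_inline = re.search("(?s)" + pattern, text)

# An uppercase "S" is not a valid inline flag, so compiling fails.
try:
    re.compile("(?S)" + pattern)
    uppercase_accepted = True
except re.error:
    uppercase_accepted = False
```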

I'm attaching a PR to this ticket.
msg364618 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-19 17:44
As you can see I left the old uppercase enums defined, to avoid breaking backward compatibility. We could make them trigger a DeprecationWarning.
msg364619 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-19 18:17
It is very inconvenient to use single-letter lowercase names for constants. It contradicts PEP 8:

https://www.python.org/dev/peps/pep-0008/#constants
msg364620 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-19 18:22
Well, these aren't the textbook case of a constant, since they're enums, and not defined in the global namespace.
msg364621 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-19 18:23
They are.
msg364622 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-19 18:31
Oops, my mistake. Any other idea how to solve this discrepancy?
msg364625 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-19 18:42
I do not see any issue except that you were careless when reading the documentation.
msg364631 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-19 18:59
I'm gonna look past the rudeness, and I'll just say that if I was tripped up by this, after 11 years of working with Python and the re module, then people at a beginner or intermediate level could be tripped up by this as well.

Here's another, simpler suggestion for preventing confusion. Replace this line in the docstring:

    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).

With this line:

    (?aiLmsux) Apply flags to the entire pattern, allowing 
               small tweaks to the matching logic (details below).

There's no reason to mention the letters there because they're already mentioned. And it's helpful to add a short explanation, like the other entries in that list.
msg364716 - (view) Author: Ram Rachum (cool-RR) * Date: 2020-03-20 20:30
I updated my PR to match.
msg364734 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-21 09:12
I apologize if I was rude. It's only because of my bad English. There were many translation options for my words suggested by Google Translator and I obviously picked up the wrong one.

Improving documentation is always a good thing. But I leave the final decision to someone who is fluent in English.
msg364773 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-03-21 20:26
The root confusion is that re compilation has several variations with two sets of indicators, each with an unhelpful exception, and each combined and used in different ways.

1. Module constants with uppercase English words or word pairs, also abbreviated by uppercase letters that are the first letter of the word -- except for S-DOTALL and X-VERBOSE.  As arguments for the flags parameter of functions other than escape and purge, they are combined with '|'.

2. Syntax letters within '(?...)', itself within a regex string, that are the single letter module constants lowercased -- except that L is not lowercased because some fonts make l and 1 look nearly the same or even identical.  Multiple syntax letters are combined by concatenation.
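
The two sets of indicators can be compared side by side. This is only a sketch, assuming a current CPython `re` module; the sample text and patterns are invented for illustration:

```python
import re

text = "Header\nBODY"  # made-up sample input

# 1. Module constants, passed as the flags argument and joined with "|".
with_constants = re.compile("header.body", re.IGNORECASE | re.DOTALL)

# 2. Syntax letters inside "(?...)", concatenated within the pattern itself.
with_letters = re.compile("(?is)header.body")

# The single-letter constants are aliases for the long names, including the
# two whose letter is not the first letter of the word.
aliases_match = (re.S is re.DOTALL) and (re.X is re.VERBOSE)

# L is the one syntax letter that stays uppercase. It applies only to
# bytes patterns; a lowercase "l" is rejected.
locale_pattern = re.compile(b"(?L)abc")
try:
    re.compile(b"(?l)abc")
    lowercase_l_accepted = True
except re.error:
    lowercase_l_accepted = False
```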

The additional issue for docstrings is the extreme compression, including the omission (here) of 're.' prefixes.  They are intended as reminders for those who have read and understood the full doc, but we try to make them as clear as possible.  I am working on an alternate revision.
msg364777 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-03-21 23:49
The docstring line in question is
  (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).

This is exceptional in that other syntaxes in the special characters list use lower case only for syntax variables (m, n, name, id/name, yes, no).  Here, each letter is a separate and literal special character.  (Also exceptional is that the syntax given is illegal, as 'a', 'L', and 'u' are mutually exclusive.)
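
The mutual exclusion can be checked directly. The sketch below assumes a current CPython `re` module; `is_rejected` is a hypothetical helper written for this example, not part of `re`. It catches both `re.error` and `ValueError` because the inline form and the flags-argument form may report the conflict with different exception types:

```python
import re

def is_rejected(pattern, flags=0):
    """Hypothetical helper: True if compiling with these flags raises."""
    try:
        re.compile(pattern, flags)
    except (re.error, ValueError):
        return True
    return False

# 'a' and 'u' together as inline letters are refused...
inline_conflict = is_rejected("(?au)abc")

# ...and so are the corresponding constants joined with "|".
argument_conflict = is_rejected("abc", re.A | re.U)

# Either one alone is accepted for a str pattern.
single_inline_ok = not is_rejected("(?a)abc")
single_argument_ok = not is_rejected("abc", re.A)
```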

The corresponding doc entry starts
"(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.)
... the letters set the corresponding flags:" followed by 6 more lines.

I suggest the following as the replacement here (followed by more 'below').
  (?aiLmsux) The letters set the corresponding flags defined below.

I think 'letters' pretty clearly refers to 'a', 'i', ..., and 'x' as given, and that each 'corresponds' to and sets a flag that is a separate entity.

The more complicated inline flags syntax, "(?aiLmsux-imsx:...)", is omitted from the docstring.  Perhaps this is intentional.
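
For reference, a short sketch of how that scoped form behaves, assuming Python 3.6 or later; the patterns and strings are invented for illustration:

```python
import re

# "(?i:...)" applies IGNORECASE only inside the group.
scoped = re.compile(r"(?i:hello) World")
matches_group_case = scoped.fullmatch("HELLO World") is not None
matches_outside_case = scoped.fullmatch("HELLO world") is not None

# "(?-i:...)" removes a flag inside the group while it stays on elsewhere.
partly_off = re.compile(r"(?i)hello (?-i:World)")
off_inside = partly_off.fullmatch("HELLO World") is not None
off_violated = partly_off.fullmatch("hello world") is not None
```

Here only the text inside the group is affected by the letters before or after the "-", unlike the whole-pattern form "(?aiLmsux)".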

The flag constants are currently introduced by
Some of the functions in this module takes flags as optional parameters:

My suggested more accurate and expanded replacement:
"Each function other than purge and escape can take an optional 'flags'  argument consisting of one or more of the following module constants, joined by "|".  A, L, and U are mutually exclusive."
msg364778 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-03-22 00:18
The docstring is currently 103 lines. I intentionally replaced 1 line with 1 line that I believe to be more informative and kept the expansion of the other line to 3 lines.
msg365010 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-03-25 18:44
New changeset 89a2209ae6fc5f39868621799730e16f931eb497 by Ram Rachum in branch 'master':
bpo-40016: re docstring: Clarify relationship of inline and argument flags (#19078)
https://github.com/python/cpython/commit/89a2209ae6fc5f39868621799730e16f931eb497
msg365014 - (view) Author: miss-islington (miss-islington) Date: 2020-03-25 19:01
New changeset 686d508c26fafb57dfe463c4f55b20013dad1441 by Miss Islington (bot) in branch '3.8':
bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078)
https://github.com/python/cpython/commit/686d508c26fafb57dfe463c4f55b20013dad1441
msg365015 - (view) Author: miss-islington (miss-islington) Date: 2020-03-25 19:03
New changeset 0dad7486e7d7bc2e0f1b0a4f44d9c28064762be5 by Miss Islington (bot) in branch '3.7':
bpo-40016: re docstring: Clarify relationship of inline and argument flags (GH-19078)
https://github.com/python/cpython/commit/0dad7486e7d7bc2e0f1b0a4f44d9c28064762be5
History
Date User Action Args
2020-03-25 21:59:45 terry.reedy set status: open -> closed
assignee: docs@python -> terry.reedy
resolution: fixed
stage: patch review -> resolved
2020-03-25 19:03:38 miss-islington set messages: + msg365015
2020-03-25 19:01:38 miss-islington set messages: + msg365014
2020-03-25 18:45:09 miss-islington set pull_requests: + pull_request18522
2020-03-25 18:45:01 miss-islington set nosy: + miss-islington
pull_requests: + pull_request18521
2020-03-25 18:44:51 terry.reedy set messages: + msg365010
2020-03-22 00:18:30 terry.reedy set messages: + msg364778
2020-03-21 23:49:18 terry.reedy set messages: + msg364777
2020-03-21 20:26:32 terry.reedy set title: Clarify flag case in `re` module -> Clarify flag case in `re` module docstring
2020-03-21 20:26:13 terry.reedy set messages: + msg364773
2020-03-21 09:12:24 serhiy.storchaka set nosy: + terry.reedy, docs@python, mrabarnett
messages: + msg364734
assignee: docs@python
components: + Documentation, - Library (Lib)
type: behavior -> enhancement
2020-03-20 20:30:32 cool-RR set messages: + msg364716
2020-03-19 18:59:47 cool-RR set messages: + msg364631
2020-03-19 18:42:59 serhiy.storchaka set messages: + msg364625
2020-03-19 18:31:58 cool-RR set messages: + msg364622
2020-03-19 18:23:21 serhiy.storchaka set messages: + msg364621
2020-03-19 18:22:05 cool-RR set messages: + msg364620
2020-03-19 18:17:14 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg364619
2020-03-19 17:44:27 cool-RR set messages: + msg364618
2020-03-19 17:42:34 cool-RR set keywords: + patch
stage: patch review
pull_requests: + pull_request18434
2020-03-19 17:40:50 cool-RR create