Improve the __main__ module documentation #83633

Closed
maggyero mannequin opened this issue Jan 25, 2020 · 19 comments
Labels
3.10 (only security fixes), 3.11 (only security fixes), docs (Documentation in the Doc dir), type-feature (A feature request or enhancement)

Comments

@maggyero
Mannequin

maggyero mannequin commented Jan 25, 2020

BPO 39452
Nosy @gvanrossum, @terryjreedy, @ncoghlan, @cameron-simpson, @stevendaprano, @ambv, @maggyero, @andresdelfino, @miss-islington, @iritkatriel, @jdevries3133
PRs
  • bpo-39452: Improve the __main__ module documentation #14487
  • bpo-39452: rewrite and expand __main__.rst #26883
  • [3.10] bpo-39452: Rewrite and expand __main__.rst (GH-26883) #27932
  • bpo-39452: [doc] Change "must" to "can", on relative import style in __main__ modules #29379
  • [3.10] bpo-39452: [doc] Change "must" to "can" on relative import style in __main__ (GH-29379) #29449
  • Files
  • less_prescriptive.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-24.20:55:28.252>
    created_at = <Date 2020-01-25.14:00:07.649>
    labels = ['3.11', 'type-feature', '3.10', 'docs']
    title = 'Improve the __main__ module documentation'
    updated_at = <Date 2021-11-06.18:50:09.178>
    user = 'https://github.com/maggyero'

    bugs.python.org fields:

    activity = <Date 2021-11-06.18:50:09.178>
    actor = 'lukasz.langa'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2021-08-24.20:55:28.252>
    closer = 'lukasz.langa'
    components = ['Documentation']
    creation = <Date 2020-01-25.14:00:07.649>
    creator = 'maggyero'
    dependencies = []
    files = ['50249']
    hgrepos = []
    issue_num = 39452
    keywords = ['patch']
    message_count = 19.0
    messages = ['360682', '360695', '377026', '377035', '377050', '396157', '396443', '399348', '400219', '400238', '400239', '400671', '400781', '400782', '400791', '400793', '401441', '405875', '405877']
    nosy_count = 12.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'ncoghlan', 'cameron', 'steven.daprano', 'docs@python', 'lukasz.langa', 'maggyero', 'adelfino', 'miss-islington', 'iritkatriel', 'jack__d']
    pr_nums = ['14487', '26883', '27932', '29379', '29449']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue39452'
    versions = ['Python 3.10', 'Python 3.11']

    @maggyero
    Mannequin Author

    maggyero mannequin commented Jan 25, 2020

    This PR will apply the following changes on the __main__ module documentation:

    • replace the phrase "run as script" with "run from the file system" (as used in the runpy documentation), since "run as script" does not mean the intended python foo.py but python -m foo (cf. PEP-338);
    • replace the phrase "run with -m" with "run from the module namespace" (as used in the runpy documentation), since the module can equivalently be run with runpy.run_module('foo') instead of python -m foo;
    • make the block comment PEP-8-compliant (placed before the if block, starting with a capital letter, ending with a period);
    • add a missing case for which a package's __main__.py is executed (when the package is run from the file system: python foo/).
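The invocation styles the PR distinguishes can be approximated from Python with the `runpy` module. This is an illustrative sketch only: the module `foo.py` and the directory `foo_dir/` are hypothetical throwaway files, and `runpy.run_path(..., run_name="__main__")` and `runpy.run_module(..., run_name="__main__")` only approximate what the command line does (for example, `run_module` with its default `alter_sys=False` leaves `sys.argv` and `sys.modules` untouched). In each case the executed code sees `__name__ == "__main__"`.

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Throwaway code that records the module name it was executed under.
SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "foo.py").write_text(SOURCE)   # hypothetical module foo.py
    pkg_dir = tmp_path / "foo_dir"             # hypothetical directory with __main__.py
    pkg_dir.mkdir()
    (pkg_dir / "__main__.py").write_text(SOURCE)

    # Roughly `python foo.py`: the code is located by its file path.
    by_file = runpy.run_path(str(tmp_path / "foo.py"), run_name="__main__")

    # Roughly `python foo_dir/`: the directory's __main__.py is executed.
    by_dir = runpy.run_path(str(pkg_dir), run_name="__main__")

    # Roughly `python -m foo`: the module is located through sys.path.
    sys.path.insert(0, tmp)
    try:
        by_module = runpy.run_module("foo", run_name="__main__")
    finally:
        sys.path.remove(tmp)

for label, namespace in [("file", by_file), ("dir", by_dir), ("module", by_module)]:
    print(label, namespace["seen_name"])
```

Both functions return the resulting module globals as a dictionary, which is why the sketch can inspect `seen_name` after the temporary files are gone.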

    @maggyero maggyero mannequin added the 3.8 only security fixes label Jan 25, 2020
    @maggyero maggyero mannequin assigned docspython Jan 25, 2020
    @maggyero maggyero mannequin added docs Documentation in the Doc dir type-feature A feature request or enhancement labels Jan 25, 2020
    @stevendaprano
    Member

    There are some serious problems with the PR.

    You state that these two phrases are from the runpy documentation:

    • "run from the module namespace"
    • "run from the file system"

    but neither of those phrases appear in the runpy documentation here:

    https://docs.python.org/3/library/runpy.html

    You also say:

    "run as script" does not mean the intended python foo.py
    but python -m foo

    but this is incorrect, and I think based on a misunderstanding of PEP-338. The title of PEP-338, "Executing modules as scripts", is not exclusive: the PEP is about the -m mechanism for locating the module in order to run it as a script. It doesn't imply that python spam.py should no longer be considered to be running a script.

    In common parlance, "run as a script" certainly does include the case where you specify the module by filename python spam.py as well as the -m case where you specify it as a module name and let the interpreter locate the file. In other words, both

    python pathname/spam.py
    python -m spam
    

    are correctly described as "running spam.py as a script" (and other variations). They differ in how the script is specified, but both mechanisms treat the spam.py file as a script and run it.

    See for example https://duckduckgo.com/?q=how+to+run+a+python+script for examples of common usage.

    Consequently, it is simply wrong to say that the intended usage of "run a script" is the -m mechanism.

    The PR changes the term "scope" to "environment", but I think that is wrong. An environment is potentially greater than a scope. __main__ is a module namespace, hence a scope. The environment includes things outside of that scope, such as the builtins, environment variables, the current working directory, the python path, etc. We don't talk about modules being an environment, but as making up a scope.
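A small sketch of the scope/environment distinction: a module's global namespace holds only the names bound in that module, while builtins such as `len` live in the separate `builtins` module and are found there during name lookup.

```python
import builtins

# Names bound in this module's global namespace (its scope).
module_names = set(globals())

print("__name__" in module_names)  # a module-level name
print("len" in module_names)       # not bound here: len lives in builtins
print(hasattr(builtins, "len"))    # found in builtins when looked up
```

Running it as a script prints `True`, `False`, `True`: `len` is usable in the module without being part of the module's own namespace.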

    The PR introduces the phrase "when the module is run from the file system" to mean the case where a script is run using python spam.py, but it equally applies to the case of python -m spam. In both cases, spam is located somewhere in the file system.

    (It is conceivable that -m could locate and run a built-in module, but I don't know any cases where that actually works. Even if it does, we surely don't need to complicate the docs for this corner case. It's enough to know that -m will locate the module and run it.)

    The PR describes three cases: running from the file system, running from stdin, and running "from the module namespace" but that last one is a clumsy phrase which, it seems to me, is not correct. How do you run a module from its own namespace? Modules *are* a namespace, and we say code runs *in* a namespace, not "from" it.

    In any case, it doesn't matter whether the script is specified on the command line as a file name, or as a module name with -m, or double-clicked in a GUI, in all three cases the module's code is executed in the module's namespace.

    So it is wrong to distinguish "from the file system" and "from (in) the module namespace" as two distinct cases. They are the same case.

    The PR replaces the comment inside the if block:

    # execute only if run as a script
    
    with a comment above the `if` statement:
    # Execute only if the module is not imported.
    

    but the new comment is factually incorrect on two counts. Firstly, it is not correct that the if statement executes only if the module is not imported. There is no magic to the if statement. It always executes, regardless of whether the module is being run as a script or not. We can write code like this:

        if print("Hello, this always runs!") or __name__ == '__main__':
            # execute only if run as a script
            print('running as a script')
        else:
            # execute only if *not* run as a script
            print('not run as a script')

    Placing the comment above the if, where it will apply to the entire if statement, is incorrect.

    The second problem is that when running a module with -m it *is* imported. PEP-338 is clear about this:

    "if -m is used to execute a module the PEP-302 import mechanisms are used to locate the module and retrieve its compiled code, before executing the module"

    (in other words, it imports the module). We can test this: for example, create a package:

    spam/
    +-- __init__.py
    +-- eggs.py
    

    and then run python -m spam.eggs, not only __main__ (the eggs.py module) but also spam will be found in sys.modules. So the new comment is simply wrong.
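The `sys.modules` check described above can be scripted. This sketch recreates the `spam/` package in a temporary directory (its contents are the hypothetical example from the comment) and runs `python -m spam.eggs` in a subprocess; `eggs.py` prints its own `__name__` and whether the parent package `spam` has been imported.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the spam/ package from the example.
    pkg = Path(tmp) / "spam"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "eggs.py").write_text(
        "import sys\n"
        "print(__name__, 'spam' in sys.modules)\n"
    )
    # Run `python -m spam.eggs` with the temporary directory on sys.path.
    result = subprocess.run(
        [sys.executable, "-m", "spam.eggs"],
        cwd=tmp,
        env={**os.environ, "PYTHONPATH": tmp},
        capture_output=True,
        text=True,
        check=True,
    )

print(result.stdout.strip())  # __main__ True
```

The module runs as `__main__`, yet the parent package was imported first, which is the point being made about -m.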

    There may be other issues with the PR.

    @maggyero
    Mannequin Author

    maggyero mannequin commented Sep 16, 2020

    Thanks for your extended review Steven.

    You state that these two phrases are from the runpy documentation:

    • "run from the module namespace"
    • "run from the file system"

    but neither of those phrases appear in the runpy documentation here:

    https://docs.python.org/3/library/runpy.html

    I agree. Actually the first paragraph of the page uses the phrases:

    • "located using the module namespace";
    • "located using the file system",

    so instead of saying:

    • "run a module located using the module namespace" to mean "python -m <module>";
    • "run a module located using the file system" to mean "python <file>",

    I simplified to:

    • "run from the module namespace"
    • "run from the file system"

    But since the terminology is misleading I have used these phrases instead:

    • python: "module initialized from an interactive prompt";
    • python < <file>: "module initialized from standard input";
    • python <file>: "module initialized from a file argument";
    • python -c <code>: "module initialized from a -c argument";
    • python -m <module>: "module initialized from a -m argument";
    • import <module>: "module initialized from an import statement".

    What the documentation tries to explain is that in all of these cases except the last one, code is executed in the __main__ module.
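The cases listed above can be checked empirically. A minimal sketch, assuming a throwaway module `foo.py` (hypothetical) that prints `__name__`: every way of starting the interpreter on that code reports `__main__`, while a plain import reports the module's own name. The interactive-prompt case is left out because it needs a terminal.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def name_seen(args, cwd, stdin=None):
    """Run the interpreter with `args` and return what the code printed."""
    env = {**os.environ, "PYTHONPATH": cwd}  # make foo importable
    result = subprocess.run(
        [sys.executable, *args], cwd=cwd, env=env, input=stdin,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    source = "print(__name__)\n"
    (Path(tmp) / "foo.py").write_text(source)  # hypothetical module foo.py
    results = {
        "python < foo.py": name_seen([], tmp, stdin=source),
        "python foo.py": name_seen(["foo.py"], tmp),
        "python -c ...": name_seen(["-c", source], tmp),
        "python -m foo": name_seen(["-m", "foo"], tmp),
        "import foo": name_seen(["-c", "import foo"], tmp),
    }

for how, name in results.items():
    print(f"{how:16} -> {name}")
```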

    I have updated the PR.

    ----

    The PR changes the term "scope" to "environment", but I think that is wrong. An environment is potentially greater than a scope. __main__ is a module namespace, hence a scope. The environment includes things outside of that scope, such as the builtins, environment variables, the current working directory, the python path, etc. We don't talk about modules being an environment, but as making up a scope.

    I disagree. According to Wikipedia (https://en.wikipedia.org/wiki/Scope_(computer_science)), the term "scope" is the part of a program where a name binding is valid, while the term "environment" (synonym of "context") is the set of name bindings that are valid within a part of a program. Therefore "scope" is a property of a name binding (a name binding has a scope), and "environment" is a property of a part of a program (a part of a program has an environment).

    And the term "environment" is actually already used in the original title and synopsis of the document (and it is correct):

    :mod:`__main__` --- Top-level script environment

    .. module:: __main__
       :synopsis: The environment where the top-level script is run.

    So my change to the body fixes the inconsistent and incorrect usage of "scope":

    - '__main__' is the name of the scope in which top-level code executes.
    + '__main__' is the name of the environment where top-level code is run.
    - A module can discover whether or not it is running in the main scope
    + A module can discover whether or not it is running in the main environment

    ----

    Placing the comment above the if, where it will apply to the entire if statement, is incorrect.

    I agree. Sometimes you see comments before if statements but they usually don't start with "execute".

    I have updated the PR.

    ----

    The second problem is that when running a module with -m it *is* imported. PEP-338 is clear about this:

    I agree. I should have said "when the module is not initialized from an import statement".

    But note that even before my change the original document already used the phrase "not imported":

    - executing code in a module when it is run as a script or with ``python
    - -m`` but not when it is imported::
    + executing code in a module when it is not imported::

    - # execute only if run as a script
    + # Execute only if the module is not imported.

    I have updated the PR.

    @terryjreedy
    Member

    The main issue I have with the existing doc is its use of 'top-level' to mean the main, initial, startup module that first executes the user code for a python 'program'. We routinely use 'top-level' instead for the global scope of a module. Example: https://docs.python.org/3/glossary.html, 'qualified name' entry, line 2: "For top-level functions and classes, ..." Within '__main__', some code is top-level, but class and function bodies are not.
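Terry's sense of "top-level" is visible in the `__qualname__` attribute that the glossary's "qualified name" entry describes: for module-level (top-level) functions and classes the qualified name equals the plain name, while a function defined inside another function is not top-level. The names below are hypothetical.

```python
def top_level_function():
    def nested_function():
        pass
    return nested_function


class TopLevelClass:
    def method(self):
        pass


print(top_level_function.__qualname__)    # top_level_function
print(top_level_function().__qualname__)  # top_level_function.<locals>.nested_function
print(TopLevelClass.method.__qualname__)  # TopLevelClass.method
```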

    But this does not have to be part of this PR.

    @maggyero
    Mannequin Author

    maggyero mannequin commented Sep 17, 2020

    I agree with you Terry. Another thing that bothers me: in the current document, the __main__ module is reduced to its environment (aka context or dictionary), whereas a module object has other important attributes such as its code.

    So how about adding the following changes?

    - :mod:`__main__` --- Top-level code environment
    - ==============================================
    + :mod:`__main__` --- Startup module
    + ==================================

    - :synopsis: The environment where top-level code is run.
    + :synopsis: The first module from which the code is executed at startup.

    - '__main__' is the name of the environment where top-level code is run.
    + '__main__' is the name of the startup module.
    - A module can discover whether or not it is running in the main environment
    + A module can discover whether or not it is initialized as the :mod:`__main__` module

    @iritkatriel
    Copy link
    Member

    See also bpo-24632 and bpo-17359.

    @iritkatriel iritkatriel added 3.11 only security fixes and removed 3.8 only security fixes labels Jun 20, 2021
    @jdevries3133
    Mannequin

    jdevries3133 mannequin commented Jun 23, 2021

    Hi All,

    As I wrote on the PR::

    I am picking up the torch on 39452, continuing where @maggyero left 
    off, and also implementing my discourse proposal, which seemed to be 
    well-liked.
    

    Feel free to leave any feedback for me on the GitHub PR, I'm looking forward to continuing to develop this work based on community feedback.

    @jdevries3133 jdevries3133 mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Jun 23, 2021
    @zware zware removed 3.7 (EOL) end of life 3.8 only security fixes labels Jun 23, 2021
    @jdevries3133
    Mannequin

    jdevries3133 mannequin commented Aug 10, 2021

    Hi All,

    I'm pinging everyone here on the bpo because my GitHub PR has been through a lot of revision and review. Maybe it's close to being ready to merge (I hope)!

    Feel free to take a look if you are interested: #26883

    @ambv
    Contributor

    ambv commented Aug 24, 2021

    New changeset 7cba231 by Jack DeVries in branch 'main':
    bpo-39452: Rewrite and expand __main__.rst (GH-26883)
    7cba231

    @miss-islington
    Contributor

    New changeset ec5a031 by Miss Islington (bot) in branch '3.10':
    bpo-39452: Rewrite and expand __main__.rst (GH-26883)
    ec5a031

    @ambv
    Contributor

    ambv commented Aug 24, 2021

    Thanks a lot, Géry and Jack! ✨ 🍰 ✨

    @ambv ambv removed the 3.9 only security fixes label Aug 24, 2021
    @ambv ambv closed this as completed Aug 24, 2021
    @gvanrossum
    Member

    Thanks, the rewrite is great!

    I have one nit: did you consider which of these two idioms is better?

    if __name__ == "__main__":
        main()

    vs.

    if __name__ == "__main__":
        sys.exit(main())

    Your docs seem to promote the second, whereas I've usually preferred the former. Was this a considered choice on your part?
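The practical difference between the two idioms is the process exit status. This sketch runs a hypothetical script whose `main()` reports failure by returning 1: with a bare `main()` call the return value is discarded and the process exits with status 0, while `sys.exit(main())` turns the return value into the exit status.

```python
import subprocess
import sys
import textwrap

# Hypothetical script: main() signals failure by *returning* 1.
SCRIPT_TEMPLATE = textwrap.dedent("""
    import sys

    def main():
        print("something went wrong", file=sys.stderr)
        return 1

    if __name__ == "__main__":
        {call}
""")


def exit_status(call):
    """Run the script with `call` in its __main__ guard; return the exit status."""
    source = SCRIPT_TEMPLATE.format(call=call)
    result = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return result.returncode


print(exit_status("main()"))            # 0: the return value is discarded
print(exit_status("sys.exit(main())"))  # 1: the return value becomes the status
```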

    @maggyero
    Mannequin Author

    maggyero mannequin commented Aug 31, 2021

    @jack__d

    Thanks for the rewrite! This is a great expansion. Unfortunately I didn’t have the time to review it before the merge. If I find something to be improved I will let you know.

    @gvanrossum

    Your docs seem to promote the second, whereas I've usually preferred the former.

    Are you sure? In your 2003 blog post "Python main() functions", you promoted the opposite idiom, `if __name__ == "__main__": sys.exit(main())`, over the idiom `if __name__ == "__main__": main()`:

    Now the sys.exit() calls are annoying: when main() calls sys.exit(), your interactive Python interpreter will exit! The remedy is to let main()'s return value specify the exit status.

    I am interested in the rationale if you changed your mind.

    @gvanrossum
    Member

    You're right, I'm being inconsistent. :-( I withdraw my objection.

    There are cases where sys.exit() is easier than returning an exit code, e.g. when the error is discovered deep inside some other code. But it's probably better to raise a dedicated exception in that case and catch it in main(), rather than just calling sys.exit() deep inside the other code. It's probably too fine a point for a tutorial. Sorry!

    @jdevries3133
    Mannequin

    jdevries3133 mannequin commented Aug 31, 2021

    Your docs seem to promote the second, whereas I've usually preferred the
    former. Was this a considered choice on your part?

    First and foremost, stupid GitHub is not letting the permalink load for some
    reason, but yes; this was discussed in the conversation with @graingert on
    June 29th – it was his suggestion. Later, @pradyunsg from PyPA added some
    suggestions about how the document described console script entrypoints,
    and the documentation around this issue changed a bit again.

    From my perspective, I never use the sys.exit idiom myself either. After
    all, an uncaught exception already causes a non-zero exit code, and a
    traceback is always going to be more valuable than an exit code.

    I was, however, surprised to learn how pip treats console script entry points
    in the course of working on this document. Specifically, it generates an
    executable script that does wrap the function in sys.exit. I definitely think
    that the way the document communicates this fact while teaching the idiom is a
    good thing, so I think that whole "Idiomatic Usage" section is good.

    I do think we can tweak the document slightly to make it less prescriptive,
    though, because in reality a lot of people _don't_ use this idiom, so
    presenting it as a de-facto standard is misleading. Plus, it's not
    Pythonic to dole out prescriptive boilerplate.

    I attached a diff that steers in that direction. What do you all think? It is
    a pretty slight change, but I think it better strikes a balance.

    @maggyero
    Mannequin Author

    maggyero mannequin commented Aug 31, 2021

    No worries, it was almost twenty years ago.

    But it's probably better to raise a dedicated exception in that case and catch it in main(), rather than just calling sys.exit() deep inside the other code.

    Yes I agree, and I think you explained very clearly why it is better in the blog post:

    Another refinement is to define a Usage() exception, which we catch in an except clause at the end of main():
    […]
    This gives the main() function a single exit point, which is preferable over multiple return 2 statements.

    So I think you made two independent points:

    • raising a dedicated exception instead of calling sys.exit inside nested functions and catching it inside main allows a single exit point;
    • calling sys.exit outside of main instead of inside prevents exiting the Python interpreter in an interactive session.
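Both points can be combined in one sketch, loosely modelled on the `Usage` exception from the blog post quoted above but written in modern Python. `UsageError` and `parse_count()` are hypothetical names: the helper raises instead of calling `sys.exit()`, `main()` catches the exception at its single exit point and returns a status, and `sys.exit()` is only meant to be called from the `if __name__ == "__main__":` guard (shown as a comment so the module can be imported and tested safely).

```python
import sys


class UsageError(Exception):
    """Hypothetical dedicated exception for bad command-line usage."""


def parse_count(text):
    # Deep inside the program: raise instead of calling sys.exit().
    if not text.isdigit():
        raise UsageError(f"expected a number, got {text!r}")
    return int(text)


def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        if len(argv) != 2:
            raise UsageError("expected exactly one argument")
        count = parse_count(argv[1])
        print("count is", count)
    except UsageError as err:
        print(err, file=sys.stderr)
        print("for help use --help", file=sys.stderr)
        return 2  # single exit point for usage errors
    return 0


# In a script, keep sys.exit() outside main():
# if __name__ == "__main__":
#     sys.exit(main())
```

From an interactive session, `main(["prog", "x"])` simply returns 2 instead of terminating the interpreter.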

    @ncoghlan
    Contributor

    ncoghlan commented Sep 9, 2021

    These changes are excellent - thanks for the patch!

    Something even the updated version doesn't cover yet is directory and zipfile execution, so I filed bpo-45149 as a follow-up ticket for that (the info does exist elsewhere in the documentation, so it's mostly just a matter of adding it to the newly expanded page and deciding what new cross-references, if any, would be appropriate).
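For reference, directory and zipfile execution look like this in practice. This is a sketch with hypothetical file names: a directory containing `__main__.py`, and a zip archive with `__main__.py` at its root, can both be passed to the interpreter directly, and the file runs as `__main__` in both cases. The standard `zipapp` module can build such archives as well; plain `zipfile` is used here to keep the mechanism visible.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp) / "app"  # hypothetical application directory
    app_dir.mkdir()
    main_file = app_dir / "__main__.py"
    main_file.write_text("print('running as', __name__)\n")

    # Package the same __main__.py at the root of a zip archive.
    archive = Path(tmp) / "app.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(main_file, arcname="__main__.py")

    outputs = []
    for target in (app_dir, archive):
        # Equivalent to `python app` and `python app.zip` on the command line.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True, text=True, check=True,
        )
        outputs.append(result.stdout.strip())

print(outputs)  # ['running as __main__', 'running as __main__']
```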

    @ambv
    Contributor

    ambv commented Nov 6, 2021

    New changeset 57457a1 by Andre Delfino in branch 'main':
    bpo-39452: [doc] Change "must" to "can" on relative import style in __main__ (GH-29379)
    57457a1

    @ambv
    Contributor

    ambv commented Nov 6, 2021

    New changeset e53cb98 by Miss Islington (bot) in branch '3.10':
    bpo-39452: [doc] Change "must" to "can" on relative import style in __main__ (GH-29379) (GH-29449)
    e53cb98

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    8 participants