This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: argparse documentation contrasting nargs '*' vs. '+' is misleading
Type: Stage: resolved
Components: Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: paul.j3, rhettinger, vreuter
Priority: normal Keywords:

Created on 2021-04-16 20:35 by vreuter, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg391255 - (view) Author: Vince Reuter (vreuter) * Date: 2021-04-16 20:35
Standard library docs for argparse, at https://docs.python.org/3/library/argparse.html#nargs, suggest that setting nargs='+' differs from nargs='*' in that the former will raise a parsing error when the argument's omitted while the latter will not. In other words, nargs='+' sounds like it's distinguished from nargs='*' by virtue of implicitly making `required=True`. This implication isn't valid when the option name has a double-hyphen prefix, though: no argument parsing error is thrown for a double-hyphen-prefix-named option, even with `nargs='+'`. The docs neglect to mention this qualification that prevents `nargs='+'` from implicitly making the option required. Originally noticed while using a port of the library to R: https://github.com/trevorld/r-argparse/issues/31
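A minimal sketch of the behavior being reported, assuming nothing beyond the standard library. The helper name `parse_or_none`, the `prog` value, and the argument name `foo` are illustrative only; the helper just turns argparse's usage-error exit into a `None` so both cases can be compared side by side.

```python
import argparse


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse reports a usage error."""
    try:
        return parser.parse_args(argv)
    except (SystemExit, argparse.ArgumentError):
        return None


# Flagged option: nargs='+' only constrains the values that follow the flag.
flagged = argparse.ArgumentParser(prog="demo")
flagged.add_argument("--foo", nargs="+")

# Positional: nargs='+' means at least one string must be supplied.
positional = argparse.ArgumentParser(prog="demo")
positional.add_argument("foo", nargs="+")

omitted_flag = parse_or_none(flagged, [])            # accepted, foo keeps its default
omitted_positional = parse_or_none(positional, [])   # rejected as a usage error

print(omitted_flag)
print(omitted_positional)
```

The same `nargs='+'` therefore produces an error for the omitted positional but not for the omitted flag, which is the asymmetry described in this report.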
msg391256 - (view) Author: Vince Reuter (vreuter) * Date: 2021-04-16 20:39
Here's the docs excerpt that seems misleading:
"""
'+'. Just like '*', all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present.
"""
msg391319 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-04-18 04:06
For me, it is the opposite.  I would have been completely surprised if setting nargs caused an optional argument to become required.  

The "nargs" parameter is entirely about the number of data arguments, not about the option itself.   When nargs=1, then one datum is expected.  When nargs=2, two are expected.  The "+" means one or more.  The "*" means two or more.

Sorry you got confused, but I don't think this is a documentation problem.  We have multiple examples of using nargs as intended.
msg391320 - (view) Author: Vince Reuter (vreuter) * Date: 2021-04-18 04:25
Got it, I see. I guess I'd prefer to be able to control the expectation about argument number through the keyword, without changing the option name, but I understand if the other way (as implemented) is preferred. Can you clarify, though, or direct me in the docs, the distinction in the number expectation between "one or more" vs. "two or more?" What does it mean for "two or more" to be expected (for nargs='*') if there's no parse error thrown even when the option's entirely omitted?
msg391321 - (view) Author: Vince Reuter (vreuter) * Date: 2021-04-18 04:52
There are two small related issues, but I'm not sure how they relate and/or if they've been addressed previously, so I'm sorry for duplicate messaging if that's the case.

1. If it's the case that absent an explicit `required=<bool>` statement, the option name prefix (hyphen(s) or not) determines whether the option's required, then it seems contradictory to have nargs='*' make a positional arg behave as if it's optional (i.e., no parse error if omitted).

2. Prefixing the option name with hyphen(s) seems to remove any distinction between `nargs='*'` and `nargs='+'` (at least without passing anything explicit about required)
msg391322 - (view) Author: Vince Reuter (vreuter) * Date: 2021-04-18 05:26
Looking a bit more at the examples in the "nargs" section of the argparse docs, and on the rest of that page, I think I see the intended usage pattern emerging. nargs='*' is only ever used on that page with an optional (by prefix) option, or with the last positional defined. Conversely, nargs='+' (or "+") is only used with a positional or with an optional that's paired with action="extend". 

This makes sense given the 0-or-more vs. 1-or-more distinction, but could it be enforced by exception or by warning? Specifically, I can think of a couple dangerous (as far as unintended consequences) cases:

1. Define a sequence of positionals with a nargs='*' sandwiched somewhere in the middle. Then passing fewer arguments at the command-line than defined positionals will cause the nargs='*' one to be skipped, rather than populating truly in order. Example:

import argparse

def _parse_cmdl(cmdl):
    # "plotTimes" (nargs='*') sits between two single-string positionals.
    parser = argparse.ArgumentParser()
    parser.add_argument("outdata", help="Path to output data file")
    parser.add_argument("plotTimes", nargs='*', help="Times to plot")
    parser.add_argument("outplot", help="Path to output plot file")
    return parser.parse_args(cmdl)

would result in a parse of something like:
$ ./tinytest.py a b
outdata: a
plotTimes: []
outplot: b

2. Case initially presented, i.e. a nargs='+' with a hyphen-prefixed option name. If the semantics are no different than for nargs='*', could a warning or exception be thrown for defining something this way? It would feel safer if the meaning of a value like this for nargs were not conditional on the name of the option.
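The first case above (a nargs='*' positional between two single-string positionals) can be reproduced with a short self-contained sketch. The argument names come from the example in this message; the `prog` value and the variable names `two` and `four` are only illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="tinytest")
parser.add_argument("outdata", help="Path to output data file")
parser.add_argument("plotTimes", nargs="*", help="Times to plot")
parser.add_argument("outplot", help="Path to output plot file")

# With only two strings, the first and last positionals each take one
# and the nargs='*' positional in the middle is left with an empty list.
two = parser.parse_args(["a", "b"])

# With extra strings, the middle positional collects everything in between.
four = parser.parse_args(["a", "1", "2", "b"])

print(two)
print(four)
```

So the positionals are not filled strictly left to right; the single-string ones are satisfied first and the nargs='*' one receives whatever remains between them.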
msg391841 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2021-04-25 06:56
Let's see if I can clarify some things.  But first, I should say that I've worked with this code for so long that I may miss things that could confuse a beginner.

A basic distinction is between "optionals" and "positionals".  I put those in quotes because that's not the same as "required or not".  Talk about "options" in commandline arguments goes way back (see getopt, and optparse).

An "optional" is identified by a flag string, such as "-f" or "--foo", short or long, single or double dash.  I prefer to call these "flagged arguments".  "optionals" can occur in any order, and even repeatedly.

A "positional" is identified by position, without any sort of flag string.  In earlier parsers, these were the extras, the strings that weren't handled by "options".  These can only occur in the defined order.  

Conventionally, optionals come first, and positionals at the end.  That's how most "help/usage" messages display them.  But argparse tries to handle them in any order.
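As a rough illustration of the ordering rules just described, here is a small sketch. The names `-f/--foo`, `src`, and `dst` are hypothetical and chosen only for the demonstration.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-f", "--foo")   # flagged ("optional")
parser.add_argument("src")           # first positional
parser.add_argument("dst")           # second positional

# The flag may come first, in the middle, or last; the positionals are
# still assigned in the order in which they were defined.
first = parser.parse_args(["-f", "1", "a", "b"])
middle = parser.parse_args(["a", "-f", "1", "b"])
last = parser.parse_args(["a", "b", "-f", "1"])

# A repeated flag is accepted; with the default "store" action the
# later value replaces the earlier one.
repeated = parser.parse_args(["-f", "1", "a", "b", "--foo", "2"])

print(first, middle, last, repeated, sep="\n")
```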

Both kinds can take a "nargs" parameter.  This specifies how many arguments (or strings) are required, either following the flag string, or as part of position.  Obviously if you don't specify the flag, you don't have to provide its arguments either.  

There's another point of confusion.  "parser.add_argument" creates an "Action", also called an "argument".  And each "action" has a 'nargs' that specifies how many "arguments" go along with it.  Sorry.
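To make the "Action" terminology concrete, the sketch below inspects the objects that add_argument hands back. It relies on add_argument returning the Action it registers, which is how the current implementation behaves; the argument names are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")

# add_argument returns the Action object that it registers on the parser.
flagged = parser.add_argument("--foo", nargs="+")
positional = parser.add_argument("bar")

# Each Action records its flag strings (empty for a positional), its nargs,
# and whether the parser treats it as required.
for action in (flagged, positional):
    print(type(action).__name__, action.option_strings, action.nargs, action.required)
```

The flag strings, nargs, and required attributes are stored separately on each Action, which matches the point that these are separate specifications.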


The default "nargs" value is "None", which means 1 string.  "?" means zero or one (optional, but in a different sense than flagged).  "*" means any number.  "+" means one or more.  nargs could also be an integer, e.g. 2.  There isn't anything that specifies "2 or more" or "2 or 3" (though that has been requested).  "?+*" are used in basically the same sense as in regex/re patterns.
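The nargs values listed above can be compared in one parser. This is only a sketch: the option names (`--one`, `--maybe`, `--any`, `--some`, `--pair`) and the `const`/`default` strings are made up for the demonstration.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--one")                                          # nargs=None: one string
parser.add_argument("--maybe", nargs="?", const="CONST", default="DEFAULT")
parser.add_argument("--any", nargs="*")                               # any number, as a list
parser.add_argument("--some", nargs="+")                              # one or more, as a list
parser.add_argument("--pair", nargs=2)                                # exactly two, as a list

ns = parser.parse_args(
    ["--one", "a", "--maybe", "--any", "--some", "b", "c", "--pair", "d", "e"]
)
print(ns)

# When the flags are not given at all, each destination keeps its default.
empty = parser.parse_args([])
print(empty)
```

Note that nargs=None yields a bare string while every other form except "?" yields a list, and that a bare `--maybe` picks up `const` whereas an absent `--maybe` picks up `default`.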

There's another parameter, "action", which also controls the number of required strings.  With "store_true" no string is allowed after the flag, effectively "nargs=0" (this action makes no sense for positionals).  It's actually a subclass of "store_const", with a default "False" and const "True".
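A short sketch of the "store_true" behavior described here. The flag name `--verbose` is hypothetical, and parse_known_args is used only so that the stray string is returned rather than reported as an error.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
verbose = parser.add_argument("--verbose", action="store_true")

absent = parser.parse_args([])              # flag omitted: the default applies
present = parser.parse_args(["--verbose"])  # flag given: the const is stored

# The flag consumes no strings, so a string placed after it is not taken
# as its value; parse_known_args hands it back as a leftover.
known, extras = parser.parse_known_args(["--verbose", "yes"])

print(absent.verbose, present.verbose, verbose.nargs, extras)
```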

With a flagged argument, you may also specify a "required" parameter.  That's convenient, but does lead to confusing terminology.  "optionals" may be "required", and a "positional" with "?" is optional/not-required.  
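The two crossed cases mentioned here (a required "optional" and a non-required "positional") might look like the sketch below. The names `--token` and `target` and the helper `parse_or_none` are illustrative assumptions.

```python
import argparse


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse reports a usage error."""
    try:
        return parser.parse_args(argv)
    except (SystemExit, argparse.ArgumentError):
        return None


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--token", required=True)   # flagged, yet required
parser.add_argument("target", nargs="?")        # positional, yet not required

# The positional may be left out, but the flagged argument may not.
ok = parse_or_none(parser, ["--token", "abc"])
missing = parse_or_none(parser, ["somewhere"])

print(ok)
print(missing)
```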

Since "+" and "*" allow many strings, something has to define the end of that list.  That end is either the end of the input, or the next flag string. If you are just using flagged arguments this isn't a problem.  But with "positionals", it is hard to handle more than one open-end nargs.  Or to use such a "positional" after an open-ended "optional".  As with regex, these nargs are "greedy".  
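The difficulty of placing a "positional" after an open-ended "optional" can be seen in a small sketch. The names `--foo` and `bar` and the helper `parse_or_none` are illustrative, and the results described are what current CPython releases are expected to produce.

```python
import argparse


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse reports a usage error."""
    try:
        return parser.parse_args(argv)
    except (SystemExit, argparse.ArgumentError):
        return None


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--foo", nargs="*")   # open-ended flagged argument
parser.add_argument("bar")                # single-string positional

# Positional first: the flag's list simply runs to the end of the input.
flag_last = parse_or_none(parser, ["x", "--foo", "a", "b"])

# Flag first: the greedy list also takes "x", so nothing is left for "bar".
flag_first = parse_or_none(parser, ["--foo", "a", "b", "x"])

print(flag_last)
print(flag_first)
```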

In some ways, the documentation is more complicated than the code itself.  The code is well written, with different methods and classes handling different issues.  The code itself does not have a lot of complicated rules and conditions.  The complexity comes from how the different pieces interact.

"flagged vs positional", nargs, and "required" are separate specifications, though they do have significant interactions.


In your example:

    parser.add_argument("outdata", help="Path to output data file")
    parser.add_argument("plotTimes", nargs='*', help="Times to plot")
    parser.add_argument("outplot", help="Path to output plot file")

"outdata" takes one string.  "outplot" takes another.  "plotTimes" then gets anything left over in between.  An empty list of strings satisfies its "nargs".  The strings are actually allocated with a regex expression.

With `add_argument('--foo', nargs='+')`,

    --foo one
    --foo one two three

are both allowed.  With "*",

     --foo

is also allowed.  For a "positional" omit the "--foo".  That means that a "positional" with "*" is always seen (which can require some special edge case handling).
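The remaining difference between "+" and "*" on a flagged argument, and the "always seen" behavior of a "*" positional, can be checked with the sketch below. The parser and variable names and the helper `parse_or_none` are illustrative assumptions.

```python
import argparse


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse reports a usage error."""
    try:
        return parser.parse_args(argv)
    except (SystemExit, argparse.ArgumentError):
        return None


plus = argparse.ArgumentParser(prog="demo")
plus.add_argument("--foo", nargs="+")

star = argparse.ArgumentParser(prog="demo")
star.add_argument("--foo", nargs="*")

star_positional = argparse.ArgumentParser(prog="demo")
star_positional.add_argument("foo", nargs="*")

plus_values = parse_or_none(plus, ["--foo", "one", "two", "three"])
plus_bare = parse_or_none(plus, ["--foo"])    # flag given without any value
star_bare = parse_or_none(star, ["--foo"])    # same input, but '*' accepts it
positional_empty = parse_or_none(star_positional, [])

print(plus_values, plus_bare, star_bare, positional_empty, sep="\n")
```

With a flag, the two settings differ only when the flag is present without values; omitting the flag entirely is accepted by both.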
History
Date                 User        Action  Args
2022-04-11 14:59:44  admin       set     github: 88042
2021-04-25 06:56:02  paul.j3     set     nosy: + paul.j3; messages: + msg391841
2021-04-18 05:26:16  vreuter     set     messages: + msg391322
2021-04-18 04:52:00  vreuter     set     messages: + msg391321
2021-04-18 04:25:35  vreuter     set     messages: + msg391320
2021-04-18 04:06:26  rhettinger  set     status: open -> closed; nosy: + rhettinger; messages: + msg391319; resolution: not a bug; stage: resolved
2021-04-16 20:39:04  vreuter     set     messages: + msg391256
2021-04-16 20:35:56  vreuter     create