Title: argparse with many choices can generate absurdly long usage message
Type: enhancement Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Juraj.Variny, bethard, chris.jerdonek, docs@python, ezio.melotti, paul.j3, r.david.murray, terry.reedy
Priority: normal Keywords: patch

Created on 2012-11-05 23:11 by Juraj.Variny, last changed 2013-07-09 03:13 by paul.j3.

File name Uploaded Description Edit
issue-16418-1-default.patch chris.jerdonek, 2012-11-10 19:10 review
Messages (18)
msg174947 - (view) Author: Juraj Variny (Juraj.Variny) Date: 2012-11-05 23:11

parser.add_argument("-p","--port", help="Listening port", type=int, choices=range(1, 65535),default=8007,required = False)

generates this usage text:

usage: [-h]
                [-p {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30, <...snip...> ,65522,65523,65524,65525,65526,65527,65528,65529,65530,65531,65532,65533,65534}]

optional arguments:
  -h, --help            show this help message and exit
  -p {1,2,3,4,5,6,7,8,9,10,11,12,13,14, <...snip...> ,65525,65526,65527,65528,65529,65530,65531,65532,65533,65534}
                        Listening port
msg174951 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-11-06 02:07
I don't think choices is a good choice there (pardon the pun).  You really want a custom type.  I'm inclined to think that that applies to any 'choices' value that is 'too large' for the help display.  Part of the point of choices is that you *want* the help display to list all the choices.

I'm inclined to close this as invalid.
msg174956 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-06 06:38
I agree with David.  Another sign that using "choices" isn't the right approach is that it requires constructing a list of 66,000 elements.  There are better ways of checking if a provided argument is an integer between 1 and 65,535.
msg174957 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-06 06:50
The argparse documentation says in one part, "The choices keyword argument may be more convenient for type checkers that simply check against a range of values."

Thus, I wouldn't object to language clarifying that choices is meant for lists of choices that should be displayed in the command-line help (e.g. "smaller" lists and human-readable).
msg175293 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-10 19:10
Proposed documentation patch attached.
msg175377 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-11 17:53
Juraj: Is the example behavior from Py2 or Py3? The meaning of 'range' changed. In Py2, xrange would be the correct choice for 'choice'.

Does argparse actually convert (x)range objects to a list or set (the help indicates the latter) for internal use? That would be foolish as 'n in <range>' is an O(1) operation. (I don't remember is that works for xrange.) For instance, range(0, 1000, 2) is a nice way to say 'even count less than 1000'.

If it is not so converted, converting for display is also foolish.
'range(0, 1000, 2)' is clearer than an explicit sequence.
msg175386 - (view) Author: Juraj Variny (Juraj.Variny) Date: 2012-11-11 19:32
It was Python 2.7 . But if range shouldn't be used for large number of options, arguing whether it's O(1) is splitting hairs, no?

I'll remove the choices from my code. Adding new type for port is overkill, users should know what legal TCP port numbers are.
msg175388 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-11 19:55
I am arguing that (x)range *should* be usable for large numbers of options *because* the containment test is O(1). What happens is you *do* use xrange instead of range in 2.7 or 3.x instead of 2.7?

In 2.x, range(n) *is* a list so that is a bad choice for large n, regardless of the display issue, which could be fixed separately as is being done on other issues.
msg175499 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-13 14:03
> Does argparse actually convert (x)range objects to a list or set (the help indicates the latter) for internal use?

No, it leaves the provided choices argument as is.

Here is what the documentation says argparse accepts: "Any object that supports the *in* operator can be passed as the choices value, so dict objects, set objects, custom containers, etc. are all supported."  And here is the code for testing containment:

Terry, are you okay with the proposed documentation patch?

Special-casing the display of range values seems like an enhancement request to me rather than a bug.  I would suggest that be handled as an enhancement request targeted initially for Python 3.4.  I would be happy to create a new issue for that.  Alternatively, it could be considered as a second patch on this issue.
msg175520 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-13 19:32
I do not agree with the patch. A summary of my view: Range objects support the 'in' operator and they are an intended option for choices, and, as I said before, are exactly the right option for arithmetic sequences with more than a few items. The problem is that they are now, in effect, special-cased relative to other builtins by having their compact representation replaced by an expanded tuple display. Moreover, the iteration required to do this introduces a discrepancy relative to the doc. This bug might be better fixed by a code change.

(The OP did not pass a (x)range object but a list. That was unnecessary and easily fixed in itself. But such a fix leaves the issue above. Condensing long sequences is a somewhat separate issue.)

As to intent:

"The choices keyword argument may be more convenient for type checkers that simply check against a range of values:

>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', type=int, choices=range(5, 10))
>>> parser.parse_args('7'.split())
>>> parser.parse_args('11'.split())
usage: PROG [-h] {5,6,7,8,9}
PROG: error: argument foo: invalid choice: 11 (choose from 5, 6, 7, 8, 9)"

Note the expansion instead of the normal representation. It is not a big deal here, but obviously can be.

">>> parser.add_argument(
...     'integers', metavar='int', type=int, choices=range(10),
...  nargs='+', help='an integer in the range 0..9')"

As to tuple display expansion: the link points to

2284 def _check_value(self, action, value):
2285   # converted value must be one of the choices (if specified)
2286   if action.choices is not None and value not in action.choices:
2287     args = {'value': value,
2288     'choices': ', '.join(map(repr, action.choices))}
2289     msg = _('invalid choice: %(value)r (choose from %(choices)s)')
2290     raise ArgumentError(action, msg % args)

In 2288 "', '.join(map(repr, action.choices))" produces a tuple display without parentheses. It essentially reproduced str/repr for tuples, lists, frozensets, sets, dicts, dict views, etc., leaving off the irrelevant fence characters. In doing so, by iteration, it introduces a bug --see below.

For range objects, the tuple representation is a drastic change from the normal representation. In that sense, it special cases range among built-ins, and in a bad way when the range represents many values. (Help messages apparently do the same.) I consider this to be something of a bug. So the code as it is would have to special case range objects to avoid special-casing them in the sense above.

The same would apply to any custom objects that have a succinct description for a large, possible infinite set. Here are two examples: "a word containing no 'e's" and "a 'word' containing only the letters a, b, c, d, e". Objects representing such infinite sets of strings could easily have a __contains__ method but not an __iter__ method.

The code above requires the choices object to be iterable as well as supporting 'in'. This contradicts the doc statement "Any object that supports the in operator can be passed as the choices value,". That discrepancy is a bug. It should be fixed by either adding the restriction to the doc or removing it from the code. I recommend the latter.

The code could simply use the str or repr of the choice object without trying to be fancy with a custom, fence-stripped, representation that does not work correctly or at all for all possible choice objects. In other words

if action.choices is not None and value not in action.choices:
  msg = "invalid choice: %r (choose from %r)" % (value, action.choices)
  raise ArgumentError(action, msg)

If the custom representation for non-range builtins is desired, then they are the ones that should be special-cased to not use their default representation.
msg175552 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-14 08:51
To simplify the discussion and for issue resolution purposes, I propose that the discussion about "large" choices containers be divided into separate discussions for (1) changes that should be applied to all maintenance releases (i.e. bug fix changes), and (2) changes that should be applied only to the in-development branch (i.e. enhancements).

I propose that the current issue be used for the former.  3.4-only enhancements can be dealt with as part of a separate issue.

I also created issue 16468 for the bug that Terry observed above that ArgumentParser does not in general support "choices" values that support the "in" operator.  That issue exists and can be resolved independent of whether the choices collection is large.
msg175553 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-14 08:58
> (1) changes that should be applied to all maintenance releases (i.e. bug fix changes)

This should instead read, "(1) changes that should be applied to all maintenance releases (e.g. bug fix and/or documentation changes)."
msg175555 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-14 09:38
> The code could simply use the str or repr of the choice object

It seems to me that this would result in less user-friendly behavior in many cases.  It would also require the end-user to understand Python (e.g. xrange and dictionaries), which I don't think should be necessary for the user of a command-line script.

For example, in Python 2.7 the containers xrange(5, 10), xrange(2, 10, 2), and {1: "foo", 2: "bar"} currently yield the following user-friendly messages for choice 0:

invalid choice: 0 (choose from 5, 6, 7, 8, 9)
invalid choice: 0 (choose from 2, 4, 6, 8)
invalid choice: 0 (choose from 1, 2)

With the proposed change, these messages would be as follows, which seem unnecessarily obfuscated:

invalid choice: 0 (choose from xrange(5, 10))
invalid choice: 0 (choose from xrange(2, 10, 2))
invalid choice: 0 (choose from {1: 'foo', 2: 'bar'})

Thus, I think the proposal above would be a regression if applied.  I think any changes to maintenance releases should preserve the current user-friendly messages (when those messages are user-friendly, e.g. when the containers are small).
msg175564 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-11-14 12:23
I agree with Chris that using the repr in the general case would be a regression in usability for the end user (and certainly not suitable for a maintenance release).

Here is some brainstorming:

We could "special case" this via duck typing.  If the object that represents the choices has 'start' and 'stop' attributes, use those to generate a message.  ("from {start} up to but not including {stop}"). [*] If it doesn't, or it also has a 'step' that is not 1, check the len, generate the list if it is less than, say, 50, and if it is more give up and use the repr.

If there is no len, do the expansion (which is what happens now) and throw it away in favor of the repr if there are more than 50 elements.

If there is no iter, use the repr.

Then as an enhancement we can also look for a special attribute (values_description?) that gives the entire text to use in the parenthesis in the help phrase to provide a way to customize the help text in 3.4+.

[*] Or, at the risk of being too clever, if there is a 'step' use the message above and if there isn't a step attribute at all use "between {start} and {stop}.
msg175577 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-11-14 17:13
For maintenance releases, I think I would favor abbreviating the list only if it is "lossless" (e.g. for xrange objects).  I think I would also be okay with abbreviating for arbitrary xranges -- in particular for arbitrary steps.  For example, for xrange(0, 50, 3), I think something like the following might be better: "0, 3, 6, 9, ... 45, 48" (we could also include a difference hint).
msg175582 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-14 19:38
I agree that my extreme, strawman-ish, proposal, was, well, too extreme. Here is a more realistic proposal similar to David's.

if isinstance(choices, range) and len(choices) > 50:
  choice_txt = range_rep(choices)  # details to be determined
  choice_txt = <iterated version of choices as now>
except TypeError:  # because choices not iterable
  choice_txt = repr(choices)  # can be anything for custom class

Then display help or error message.

If someone passes a non-(x)range iterable with 1000s of choices, I am willing to say they deserve what they asked for, once that the docs make clearer that such collections will be displayed in full. Any iterable container too large to display in full can be wrapped as a non-iterable container. ('Container' means that 'in' works.)

class NonIterableContainer:
  def __init__(self, container, description):
    self.container = container
    self.description = description
  def __repr__(self):
    return self.description
  def __contains__(self, item):
    return item in self.container

The description could point one to a file or doc that has an explicit list.

If such a listing enumerates the items, the program arg could instead be an int in the appropriate range. Then the program would look up the actual arg. Ranges *are* special because items from any finite enumerated collection can be replaced, for input, by the corresponding int in the enumerated range. They are therefore worth special attention in help and error displays.
msg175585 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-14 19:50
I see that in #16468, Chris proposes that existing versions should let the TypeError propagate, possibly with an improved error message, and call the use of repr for non-iterables a new feature (partly on the basis that more fixes than this are needed to use them).
msg192718 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2013-07-09 03:13
In the patch I just posted to I address this long list issue in several ways:

In the Usage line, the metavar gives the user an alternative

In the expanded help line the user can just omit the '%(choices)s' 

In _check_value(), I implemented a numpy like summarize format for choice lists longer than 15   '{1,2,3,...,18,19}'.
Date User Action Args
2013-07-09 03:13:26paul.j3setmessages: + msg192718
2013-06-29 07:20:47paul.j3setnosy: + paul.j3
2012-11-14 19:50:35terry.reedysetmessages: + msg175585
2012-11-14 19:38:07terry.reedysetmessages: + msg175582
2012-11-14 17:13:59chris.jerdoneksetmessages: + msg175577
2012-11-14 12:23:08r.david.murraysetmessages: + msg175564
2012-11-14 09:38:27chris.jerdoneksetmessages: + msg175555
2012-11-14 08:58:32chris.jerdoneksetmessages: + msg175553
2012-11-14 08:51:31chris.jerdoneksetmessages: + msg175552
2012-11-13 19:32:45terry.reedysetkeywords: - easy
assignee: docs@python ->
messages: + msg175520
2012-11-13 14:03:28chris.jerdoneksetmessages: + msg175499
2012-11-11 19:55:05terry.reedysetmessages: + msg175388
2012-11-11 19:32:47Juraj.Varinysetmessages: + msg175386
2012-11-11 17:53:22terry.reedysetnosy: + terry.reedy, bethard
messages: + msg175377
2012-11-10 19:10:41chris.jerdoneksetfiles: + issue-16418-1-default.patch

assignee: docs@python
keywords: + easy, patch
stage: patch review
versions: + Python 3.2, Python 3.3, Python 3.4
nosy: + ezio.melotti, docs@python
messages: + msg175293
components: + Documentation
type: behavior -> enhancement
2012-11-06 06:50:59chris.jerdoneksetmessages: + msg174957
2012-11-06 06:38:36chris.jerdoneksetnosy: + chris.jerdonek
messages: + msg174956
2012-11-06 02:07:30r.david.murraysetnosy: + r.david.murray
messages: + msg174951
2012-11-05 23:11:51Juraj.Varinycreate