Issue 16418: argparse with many choices can generate absurdly long usage message

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60622

classification

Title:	argparse with many choices can generate absurdly long usage message
Type:	enhancement	Stage:	patch review
Components:	Documentation, Library (Lib)	Versions:	Python 3.2, Python 3.3, Python 3.4, Python 2.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Juraj.Variny, bethard, chris.jerdonek, docs@python, ezio.melotti, paul.j3, r.david.murray, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2012-11-05 23:11 by Juraj.Variny, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
issue-16418-1-default.patch	chris.jerdonek, 2012-11-10 19:10		review

Messages (18)
msg174947 - (view)	Author: Juraj Variny (Juraj.Variny)	Date: 2012-11-05 23:11
Example: parser.add_argument("-p","--port", help="Listening port", type=int, choices=range(1, 65535),default=8007,required = False) generates this usage text: usage: agent.py [-h] [-p {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30, <...snip...> ,65522,65523,65524,65525,65526,65527,65528,65529,65530,65531,65532,65533,65534}] optional arguments: -h, --help show this help message and exit -p {1,2,3,4,5,6,7,8,9,10,11,12,13,14, <...snip...> ,65525,65526,65527,65528,65529,65530,65531,65532,65533,65534} Listening port
msg174951 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-11-06 02:07
I don't think choices is a good choice there (pardon the pun). You really want a custom type. I'm inclined to think that that applies to any 'choices' value that is 'too large' for the help display. Part of the point of choices is that you want the help display to list all the choices. I'm inclined to close this as invalid.
msg174956 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-06 06:38
I agree with David. Another sign that using "choices" isn't the right approach is that it requires constructing a list of 66,000 elements. There are better ways of checking if a provided argument is an integer between 1 and 65,535.
msg174957 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-06 06:50
The argparse documentation says in one part, "The choices keyword argument may be more convenient for type checkers that simply check against a range of values." Thus, I wouldn't object to language clarifying that choices is meant for lists of choices that should be displayed in the command-line help (e.g. "smaller" lists and human-readable).
msg175293 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-10 19:10
Proposed documentation patch attached.
msg175377 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-11-11 17:53
Juraj: Is the example behavior from Py2 or Py3? The meaning of 'range' changed. In Py2, xrange would be the correct choice for 'choice'. Does argparse actually convert (x)range objects to a list or set (the help indicates the latter) for internal use? That would be foolish as 'n in <range>' is an O(1) operation. (I don't remember is that works for xrange.) For instance, range(0, 1000, 2) is a nice way to say 'even count less than 1000'. If it is not so converted, converting for display is also foolish. 'range(0, 1000, 2)' is clearer than an explicit sequence.
msg175386 - (view)	Author: Juraj Variny (Juraj.Variny)	Date: 2012-11-11 19:32
It was Python 2.7 . But if range shouldn't be used for large number of options, arguing whether it's O(1) is splitting hairs, no? I'll remove the choices from my code. Adding new type for port is overkill, users should know what legal TCP port numbers are.
msg175388 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-11-11 19:55
I am arguing that (x)range should be usable for large numbers of options because the containment test is O(1). What happens is you do use xrange instead of range in 2.7 or 3.x instead of 2.7? In 2.x, range(n) is a list so that is a bad choice for large n, regardless of the display issue, which could be fixed separately as is being done on other issues.
msg175499 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-13 14:03
> Does argparse actually convert (x)range objects to a list or set (the help indicates the latter) for internal use? No, it leaves the provided choices argument as is. Here is what the documentation says argparse accepts: "Any object that supports the in operator can be passed as the choices value, so dict objects, set objects, custom containers, etc. are all supported." And here is the code for testing containment: http://hg.python.org/cpython/file/ee7b713fec71/Lib/argparse.py#l2284 Terry, are you okay with the proposed documentation patch? Special-casing the display of range values seems like an enhancement request to me rather than a bug. I would suggest that be handled as an enhancement request targeted initially for Python 3.4. I would be happy to create a new issue for that. Alternatively, it could be considered as a second patch on this issue.
msg175520 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-11-13 19:32
I do not agree with the patch. A summary of my view: Range objects support the 'in' operator and they are an intended option for choices, and, as I said before, are exactly the right option for arithmetic sequences with more than a few items. The problem is that they are now, in effect, special-cased relative to other builtins by having their compact representation replaced by an expanded tuple display. Moreover, the iteration required to do this introduces a discrepancy relative to the doc. This bug might be better fixed by a code change. (The OP did not pass a (x)range object but a list. That was unnecessary and easily fixed in itself. But such a fix leaves the issue above. Condensing long sequences is a somewhat separate issue.) As to intent: "The choices keyword argument may be more convenient for type checkers that simply check against a range of values: >>> >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('foo', type=int, choices=range(5, 10)) >>> parser.parse_args('7'.split()) Namespace(foo=7) >>> parser.parse_args('11'.split()) usage: PROG [-h] {5,6,7,8,9} PROG: error: argument foo: invalid choice: 11 (choose from 5, 6, 7, 8, 9)" Note the expansion instead of the normal representation. It is not a big deal here, but obviously can be. ">>> parser.add_argument( ... 'integers', metavar='int', type=int, choices=range(10), ... nargs='+', help='an integer in the range 0..9')" As to tuple display expansion: the link points to 2284 def _check_value(self, action, value): 2285 # converted value must be one of the choices (if specified) 2286 if action.choices is not None and value not in action.choices: 2287 args = {'value': value, 2288 'choices': ', '.join(map(repr, action.choices))} 2289 msg = _('invalid choice: %(value)r (choose from %(choices)s)') 2290 raise ArgumentError(action, msg % args) In 2288 "', '.join(map(repr, action.choices))" produces a tuple display without parentheses. It essentially reproduced str/repr for tuples, lists, frozensets, sets, dicts, dict views, etc., leaving off the irrelevant fence characters. In doing so, by iteration, it introduces a bug --see below. For range objects, the tuple representation is a drastic change from the normal representation. In that sense, it special cases range among built-ins, and in a bad way when the range represents many values. (Help messages apparently do the same.) I consider this to be something of a bug. So the code as it is would have to special case range objects to avoid special-casing them in the sense above. The same would apply to any custom objects that have a succinct description for a large, possible infinite set. Here are two examples: "a word containing no 'e's" and "a 'word' containing only the letters a, b, c, d, e". Objects representing such infinite sets of strings could easily have a __contains__ method but not an __iter__ method. The code above requires the choices object to be iterable as well as supporting 'in'. This contradicts the doc statement "Any object that supports the in operator can be passed as the choices value,". That discrepancy is a bug. It should be fixed by either adding the restriction to the doc or removing it from the code. I recommend the latter. The code could simply use the str or repr of the choice object without trying to be fancy with a custom, fence-stripped, representation that does not work correctly or at all for all possible choice objects. In other words if action.choices is not None and value not in action.choices: msg = "invalid choice: %r (choose from %r)" % (value, action.choices) raise ArgumentError(action, msg) If the custom representation for non-range builtins is desired, then they are the ones that should be special-cased to not use their default representation.
msg175552 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-14 08:51
To simplify the discussion and for issue resolution purposes, I propose that the discussion about "large" choices containers be divided into separate discussions for (1) changes that should be applied to all maintenance releases (i.e. bug fix changes), and (2) changes that should be applied only to the in-development branch (i.e. enhancements). I propose that the current issue be used for the former. 3.4-only enhancements can be dealt with as part of a separate issue. I also created issue 16468 for the bug that Terry observed above that ArgumentParser does not in general support "choices" values that support the "in" operator. That issue exists and can be resolved independent of whether the choices collection is large.
msg175553 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-14 08:58
> (1) changes that should be applied to all maintenance releases (i.e. bug fix changes) This should instead read, "(1) changes that should be applied to all maintenance releases (e.g. bug fix and/or documentation changes)."
msg175555 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-14 09:38
> The code could simply use the str or repr of the choice object It seems to me that this would result in less user-friendly behavior in many cases. It would also require the end-user to understand Python (e.g. xrange and dictionaries), which I don't think should be necessary for the user of a command-line script. For example, in Python 2.7 the containers xrange(5, 10), xrange(2, 10, 2), and {1: "foo", 2: "bar"} currently yield the following user-friendly messages for choice 0: invalid choice: 0 (choose from 5, 6, 7, 8, 9) invalid choice: 0 (choose from 2, 4, 6, 8) invalid choice: 0 (choose from 1, 2) With the proposed change, these messages would be as follows, which seem unnecessarily obfuscated: invalid choice: 0 (choose from xrange(5, 10)) invalid choice: 0 (choose from xrange(2, 10, 2)) invalid choice: 0 (choose from {1: 'foo', 2: 'bar'}) Thus, I think the proposal above would be a regression if applied. I think any changes to maintenance releases should preserve the current user-friendly messages (when those messages are user-friendly, e.g. when the containers are small).
msg175564 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-11-14 12:23
I agree with Chris that using the repr in the general case would be a regression in usability for the end user (and certainly not suitable for a maintenance release). Here is some brainstorming: We could "special case" this via duck typing. If the object that represents the choices has 'start' and 'stop' attributes, use those to generate a message. ("from {start} up to but not including {stop}"). [] If it doesn't, or it also has a 'step' that is not 1, check the len, generate the list if it is less than, say, 50, and if it is more give up and use the repr. If there is no len, do the expansion (which is what happens now) and throw it away in favor of the repr if there are more than 50 elements. If there is no iter, use the repr. Then as an enhancement we can also look for a special attribute (values_description?) that gives the entire text to use in the parenthesis in the help phrase to provide a way to customize the help text in 3.4+. [] Or, at the risk of being too clever, if there is a 'step' use the message above and if there isn't a step attribute at all use "between {start} and {stop}.
msg175577 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-11-14 17:13
For maintenance releases, I think I would favor abbreviating the list only if it is "lossless" (e.g. for xrange objects). I think I would also be okay with abbreviating for arbitrary xranges -- in particular for arbitrary steps. For example, for xrange(0, 50, 3), I think something like the following might be better: "0, 3, 6, 9, ... 45, 48" (we could also include a difference hint).
msg175582 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-11-14 19:38
I agree that my extreme, strawman-ish, proposal, was, well, too extreme. Here is a more realistic proposal similar to David's. if isinstance(choices, range) and len(choices) > 50: choice_txt = range_rep(choices) # details to be determined try: choice_txt = <iterated version of choices as now> except TypeError: # because choices not iterable choice_txt = repr(choices) # can be anything for custom class Then display help or error message. If someone passes a non-(x)range iterable with 1000s of choices, I am willing to say they deserve what they asked for, once that the docs make clearer that such collections will be displayed in full. Any iterable container too large to display in full can be wrapped as a non-iterable container. ('Container' means that 'in' works.) class NonIterableContainer: def __init__(self, container, description): self.container = container self.description = description def __repr__(self): return self.description def __contains__(self, item): return item in self.container The description could point one to a file or doc that has an explicit list. If such a listing enumerates the items, the program arg could instead be an int in the appropriate range. Then the program would look up the actual arg. Ranges are special because items from any finite enumerated collection can be replaced, for input, by the corresponding int in the enumerated range. They are therefore worth special attention in help and error displays.
msg175585 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-11-14 19:50
I see that in #16468, Chris proposes that existing versions should let the TypeError propagate, possibly with an improved error message, and call the use of repr for non-iterables a new feature (partly on the basis that more fixes than this are needed to use them).
msg192718 - (view)	Author: paul j3 (paul.j3) *	Date: 2013-07-09 03:13
In the patch I just posted to http://bugs.python.org/issue16468 I address this long list issue in several ways: In the Usage line, the metavar gives the user an alternative In the expanded help line the user can just omit the '%(choices)s' In _check_value(), I implemented a numpy like summarize format for choice lists longer than 15 '{1,2,3,...,18,19}'.

History
Date	User	Action	Args
2022-04-11 14:57:38	admin	set	github: 60622
2013-07-09 03:13:26	paul.j3	set	messages: + msg192718
2013-06-29 07:20:47	paul.j3	set	nosy: + paul.j3
2012-11-14 19:50:35	terry.reedy	set	messages: + msg175585
2012-11-14 19:38:07	terry.reedy	set	messages: + msg175582
2012-11-14 17:13:59	chris.jerdonek	set	messages: + msg175577
2012-11-14 12:23:08	r.david.murray	set	messages: + msg175564
2012-11-14 09:38:27	chris.jerdonek	set	messages: + msg175555
2012-11-14 08:58:32	chris.jerdonek	set	messages: + msg175553
2012-11-14 08:51:31	chris.jerdonek	set	messages: + msg175552
2012-11-13 19:32:45	terry.reedy	set	keywords: - easy assignee: docs@python -> messages: + msg175520
2012-11-13 14:03:28	chris.jerdonek	set	messages: + msg175499
2012-11-11 19:55:05	terry.reedy	set	messages: + msg175388
2012-11-11 19:32:47	Juraj.Variny	set	messages: + msg175386
2012-11-11 17:53:22	terry.reedy	set	nosy: + terry.reedy, bethard messages: + msg175377
2012-11-10 19:10:41	chris.jerdonek	set	files: + issue-16418-1-default.patch assignee: docs@python keywords: + easy, patch stage: patch review versions: + Python 3.2, Python 3.3, Python 3.4 nosy: + ezio.melotti, docs@python messages: + msg175293 components: + Documentation type: behavior -> enhancement
2012-11-06 06:50:59	chris.jerdonek	set	messages: + msg174957
2012-11-06 06:38:36	chris.jerdonek	set	nosy: + chris.jerdonek messages: + msg174956
2012-11-06 02:07:30	r.david.murray	set	nosy: + r.david.murray messages: + msg174951
2012-11-05 23:11:51	Juraj.Variny	create