[From the old argparse tracker: http://code.google.com/p/argparse/issues/detail?id=20]
You can't follow a nargs='+' optional argument with a positional argument:
>>> import argparse
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--badger', nargs='+')
>>> parser.add_argument('spam')
>>> parser.parse_args('--badger A B C D'.split())
usage: PROG [-h] [--badger BADGER [BADGER ...]] spam
PROG: error: too few arguments
Ideally, this should produce:
>>> parser.parse_args('--badger A B C D'.split())
Namespace(badger=['A', 'B', 'C'], spam='D')
The problem is that the nargs='+' causes the optional to consume all the arguments following it, even though we should know that we need to save one for the final positional argument.
A workaround is to specify '--', e.g.:
>>> parser.parse_args('--badger A B C -- D'.split())
Namespace(badger=['A', 'B', 'C'], spam='D')
The problem arises from the fact that argparse uses regular-expression style matching for positional arguments, but it does that separately from what it does for optional arguments.
One solution might be to build a regular expression of the possible things a parser could match. So given a parser like::
parser = argparse.ArgumentParser()
parser.add_argument('-w')
parser.add_argument('-x', nargs='+')
parser.add_argument('y')
parser.add_argument('z', nargs='*')
the regular expression might look something like (where positionals have been replaced by the character A)::
(-w A)? (-x A+)? A (-w A)? (-x A+)? A* (-w A)? (-x A+)?
Note that the optionals can appear between any positionals, so I have to repeat their regular expressions multiple times. Because of this, I worry about how big the regular expression might grow to be for large parsers. But maybe this is the right way to solve the problem.
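To get a feel for how quickly such a pattern grows, here is a minimal sketch that assembles it from lists of per-argument fragments. The helper name build_regex and its inputs are hypothetical, made up for illustration; nothing like it exists in argparse.

```python
def build_regex(optionals, positionals):
    """Repeat the block of optional fragments around every positional.

    `optionals` and `positionals` are lists of pattern fragments such as
    '-x A+' or 'A*'. This is an illustration only, not argparse code.
    """
    opt_group = " ".join("(%s)?" % o for o in optionals)
    parts = [opt_group]
    for p in positionals:
        parts.append(p)
        parts.append(opt_group)
    return " ".join(parts)


pattern = build_regex(["-w A", "-x A+"], ["A", "A*"])
print(pattern)
# (-w A)? (-x A+)? A (-w A)? (-x A+)? A* (-w A)? (-x A+)?

# The optional block is repeated once more than there are positionals,
# so the length grows roughly as (positionals + 1) * optionals.
big = build_regex(["-o%d A" % i for i in range(20)], ["A"] * 5)
print(len(pattern), len(big))
```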
|
I've played a bit with the idea that bethard sketched. I don't have all the details worked out, but I believe this is what will happen:
With
parser = argparse.ArgumentParser()
parser.add_argument('-w')
parser.add_argument('-x', nargs='+')
parser.add_argument('y')
parser.add_argument('z', nargs='*')
some possible parses are
'-w 1 -x 2 3 4 5', # w:1, x:[2,3,4], y:5, z:[] -
# fail +
'-w 1 2 -x 3 4 5', # w:1, y:2, x:[3,4,5], z:[] +
'-w 1 -x 2 3', # w:1, x:[2], y:3, z:[] -
# fail +
'-x 1 2 -w 3 4 5 6', # w:3, x:[1,2], y:4, z:[5,6] +
# w:3, x:[1], y:2, z:[4,5,6] -
'-x 1 2 3 4 -w 5 6 7', # w:5, x:[1,2,3,4], y:6, z:[7] +
# w:5, x:[1,2,3], y:4, z:[6,7] -
'1 2 3 -x 4 5 -w 6', # w:6, x:[4,5], y:1, z:[2,3] +
'+' lines are those currently produced
'-' lines are ones that would be produced by these ideas
'-w 1 -x 2 3 4 5' is the prototypical problem case. The current parser allocates all of [2,3,4,5] to -x, leaving none for y, and so fails. The desired solution is to give 5 to y, leaving -x with the rest.
'-x 1 2 -w 3 4 5 6' is a potentially ambiguous case. The current parser lets -x grab [1,2]; y then gets 4, and z the remainder. But the alternative is to give 2 to y, leaving -x with just [1].
In this case
arg_strings_pattern = 'OAAOAAAA'
replacing the Os with the option flags: '-xAA-wAAAA'
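As a rough sketch of how these two strings relate, the following classifies each argument string as an option flag or a plain argument. The function arg_patterns is hypothetical, and the classification is a deliberate simplification: the real parser also has to deal with '--', negative numbers, abbreviations and attached values.

```python
def arg_patterns(arg_strings, flags):
    """Return (plain, flagged) pattern strings for a list of arguments.

    plain   uses 'O' for every known option flag and 'A' otherwise.
    flagged keeps the flag itself in place of the 'O'.
    Simplified for illustration; not how argparse classifies strings.
    """
    plain = []
    flagged = []
    for s in arg_strings:
        if s in flags:
            plain.append("O")
            flagged.append(s)
        else:
            plain.append("A")
            flagged.append("A")
    return "".join(plain), "".join(flagged)


plain, flagged = arg_patterns("-x 1 2 -w 3 4 5 6".split(), {"-w", "-x"})
print(plain)    # OAAOAAAA
print(flagged)  # -xAA-wAAAA
```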
I match this with a refined version of bethard's regex:
pat1='((?:-wA)|(?:-xA+)|(?:-wA-xA+)|(?:-xA+-wA))'
pat = _re.compile('%s?(?P<y>A)%s?(?P<z>A*)%s?'%(pat1,pat1,pat1))
groups (without the Nones) and groupdict are
['-xA', 'A', '-wA', 'AAA']
{'z': 'AAA', 'y': 'A'}
So this does effectively give y the 2nd argument, leaving -x with just the 1st.
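The match can be reproduced outside argparse with the standard re module (argparse imports it as _re). This is only a standalone check of the pattern quoted above, with the two optionals -w and -x hard-coded.

```python
import re

# One optional, or both optionals in either order.
pat1 = '((?:-wA)|(?:-xA+)|(?:-wA-xA+)|(?:-xA+-wA))'
pat = re.compile('%s?(?P<y>A)%s?(?P<z>A*)%s?' % (pat1, pat1, pat1))

m = pat.match('-xAA-wAAAA')
groups = [g for g in m.groups() if g is not None]
print(groups)         # ['-xA', 'A', '-wA', 'AAA']
print(m.groupdict())  # {'y': 'A', 'z': 'AAA'}
```

The greedy '-xA+' first grabs both As, but then y has nothing to match, so the regex engine backtracks and -x gives one A back. That backtracking is exactly the behaviour the current parser lacks.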
The current parser effectively groups the arguments as
['-xAA', '-wA', 'A', 'AA']
In the real world, generating and applying a global pattern like this could get complicated. For example, there are long option names ('--water') and combined argument strings ('-x1', '-x=1').
|
I need to make one correction to my last post:
'-x 1 2 -w 3 4 5 6', # w:3, x:[1,2], y:4, z:[5,6] +
# w:3, x:[1], y:2, z:[4,5,6] -
The second solution is only possible if 'z' is not consumed when 'y' is being processed. In the current version, if consume_positionals() is called with an 'AOAAAA' pattern, 'y' will match the first 'A', and 'z' will match ''. That means '4 5 6' will be left over.
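The empty match for 'z' is easy to see with a bare regex. The patterns '(A)' and '(A*)' below are simplified stand-ins that I am assuming for the nargs=None and nargs='*' positionals; the patterns argparse actually builds carry extra handling for '--'.

```python
import re

remaining = 'AOAAAA'  # pattern left once -x has taken only its first argument

# y needs exactly one A; z takes zero or more, and is matched in the same pass.
m = re.match('(A)(A*)', remaining)
y_part, z_part = m.groups()
print(repr(y_part), repr(z_part))  # 'A' ''

# z has already been matched (with nothing) when the 'O' is reached,
# so the As after the optional have no positional left to claim them.
print(remaining[m.end():])         # OAAAA
```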
It's only when I use the patch in http://bugs.python.org/issue14191#msg187051
(argparse doesn't allow optionals within positionals)
that the processing of 'z' is delayed, so it can get [4,5,6].
So at least with the 4 arguments in this example, bethard's idea only seems to make a difference in the case of '-w 1 -x 2 3 4 5', where 'y' lays claim to the last string, and '-x' gets the rest.
|
This patch implements, I think, the ideas bethard proposed. It is a test patch, not intended for production.
Most of the work is in ArgumentParser._get_alt_length() which
- generates a pattern along the lines bethard proposed
- generates a string like arg_strings_pattern, but with the option strings ('-x') instead of 'O'.
- runs a match
- from groups like '-xAAA', creates dict entries like:
alt_opt_length['x'] = 3
Later, in consume_optionals(), this alternative count replaces arg_count if it is lower. The next consume_positionals() then takes care of consuming the unconsumed arguments.
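A minimal sketch of the counting and capping steps, assuming single-letter flags and the matched groups from the earlier example. The helper alt_lengths and the variable greedy_count are hypothetical names for illustration; they are not taken from the patch.

```python
import re


def alt_lengths(groups):
    """Map each flag letter to the number of As matched after it.

    Groups that do not start with a flag (plain positionals) are skipped.
    Illustration only; assumes single-letter flags such as '-x'.
    """
    lengths = {}
    for g in groups:
        m = re.fullmatch(r'-(\w)(A*)', g)
        if m:
            lengths[m.group(1)] = len(m.group(2))
    return lengths


alt_opt_length = alt_lengths(['-xA', 'A', '-wA', 'AAA'])
print(alt_opt_length)  # {'x': 1, 'w': 1}

# The greedy count for -x in '-x 1 2 -w 3 4 5 6' is 2. The alternative
# count replaces it only when it is lower; a missing entry changes nothing.
greedy_count = 2
arg_count = min(greedy_count, alt_opt_length.get('x', greedy_count))
print(arg_count)       # 1
```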
If _get_alt_length() has any problems, it logs an error, and returns an otherwise empty dict. So it 'fails' quietly without affecting regular parsing.
Reasons for failing include (for now) the use of subparsers, optionals with explicit args, and special prefix_chars. With exclusions like this, test_argparse.py runs without errors or failures.
Since this is still a testing vehicle, it writes an issue9338.log file with debugging entries.
This version works, but is both not sufficiently general and too general. As bethard notes, the testing pattern could get very large if there are many optionals. Ideally the pattern will allow the optionals in any order and combination between positionals. The ambiguities that I discussed in the previous two posts disappear if the matching pattern is sufficiently general.
But I also suspect it is too general. It does not need to match every case, just those where an optional is consuming arguments that should go to a positional. But if we come up with something more specific, this could still be a useful testing tool.
|