Title: Argparse improperly handles "-_"
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Max Rothman, martin.panter, paul.j3, r.david.murray
Priority: normal Keywords:

Created on 2017-03-03 21:07 by Max Rothman, last changed 2017-03-15 06:49 by mbdevpl.

Messages (9)
msg288929 - (view) Author: Max Rothman (Max Rothman) Date: 2017-03-03 21:07
In the case detailed below, argparse.ArgumentParser improperly parses the argument string "-_":
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('first')
print(parser.parse_args(['-_']))

Expected behavior: prints Namespace(first='-_')
Actual behavior: prints usage message

The issue seems to be specific to the string "-_". Either character alone or both in the opposite order does not trigger the issue.
msg288939 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-03-03 22:24
Have you tried '-' plus any other character?  argparse treats '-' and '--' specially, and this is a known issue.
msg288946 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-03-04 00:32
This is actually the expected behaviour of “argparse”, as well as of Unix CLI programs in general. See the documentation <>. The general workaround is to use a double-dash separator:

>>> parser.parse_args(['--', '-_'])

Example with the GNU “rm” command:

$ echo "make a file" >-_
$ rm -_
rm: invalid option -- '_'
Try 'rm ./-_' to remove the file '-_'.
Try 'rm --help' for more information.
[Exit 1]
$ rm -- -_  # Double dash also works

Although I suppose the error message could be improved. Currently it looks like it ignores the argument:

>>> parser.parse_args(['-_'])
usage: [-h] first
: error: the following arguments are required: first
__main__.SystemExit: 2
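
A minimal sketch of the two cases above, assuming a parser with a single positional named 'first' as in the original report (the prog name "demo" is only illustrative). It shows the double-dash separator being accepted and the bare "-_" ending in SystemExit with code 2:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("first")

# After "--", every remaining string is taken as a positional argument.
ns = parser.parse_args(["--", "-_"])
print(ns)  # Namespace(first='-_')

# Without the separator, argparse prints a usage error to stderr
# and raises SystemExit instead of filling the positional.
try:
    parser.parse_args(["-_"])
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)  # 2
```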
msg288987 - (view) Author: Max Rothman (Max Rothman) Date: 2017-03-04 17:09
Martin: huh, I didn't notice that documentation. The error message definitely could be improved.

It still seems like an odd choice given that argparse knows about the expected spec, so it knows whether there are any options or not. Perhaps one could enable/disable this cautious behavior with a flag passed to ArgumentParser? It was rather surprising in my case, since I was parsing Morse code and the arguments were random combinations of "-", "_", and "*", so it wasn't immediately obvious what the issue was.
msg289504 - (view) Author: paul j3 (paul.j3) * Date: 2017-03-12 17:45
I think that this string falls through to the last case in 'parser._parse_optional' (the first parsing loop)

        # it was meant to be an optional but there is no such option
        # in this parser (though it might be a valid option in a subparser)
        return None, arg_string, None

It has the format of an optional flag, not a positional argument.  If preceded by '--' it gets classed as a positional argument.

(In the second, main, parsing loop) Since it doesn't match any defined Actions it gets put in the list of 'extras' (as returned by 'parse_known_args').  But the parser also runs a check on required arguments, and finds the positional, 'first', was not filled.  So that's the error that's raised.

For example if I provide another string that fills the positional:

    In [5]: parser.parse_known_args(['-_','other'])
    Out[5]: (Namespace(first='other'), ['-_'])

'parse_args' would instead fail with 'error: unrecognized arguments: -_'.

I don't see how the error message could be improved without some major changes in the testing and parsing.  It would either have to disallow unmatched optional flags (and maybe break subparsers) or deduce that this 'extra' was meant for the unfilled positional.  Bernard has argued that it is better to raise an error in ambiguous cases than to make too many assumptions about what the user intended.
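
To make the classification concrete, here is a rough sketch that feeds a few single strings to a parser with one positional and records which ones are accepted. The helper name try_parse and the chosen sample strings are illustrative only; the sketch relies on the public parse_args behaviour rather than on the private '_parse_optional' method:

```python
import argparse
import contextlib
import io


def try_parse(arg):
    """Return the parsed value of 'first', or None if argparse rejects arg."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("first")
    # argparse writes its error to stderr and raises SystemExit;
    # silence the message and catch the exit.
    with contextlib.redirect_stderr(io.StringIO()):
        try:
            return parser.parse_args([arg]).first
        except SystemExit:
            return None


# A lone "-", strings that do not start with "-", and negative numbers
# are taken as positionals; "-_" looks like an unknown flag.
results = {arg: try_parse(arg) for arg in ["-", "_", "_-", "-1", "-_"]}
for arg, value in results.items():
    print(repr(arg), "->", "rejected" if value is None else repr(value))
```

Only "-_" should come back as rejected here, which matches the observation that either character alone, or both in the opposite order, parses fine.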
msg289521 - (view) Author: Max Rothman (Max Rothman) Date: 2017-03-13 02:25
I think that makes sense, but there's still an open question: what should the correct way be to allow dashes to be present at the beginning of positional arguments?
msg289530 - (view) Author: paul j3 (paul.j3) * Date: 2017-03-13 04:11
Issue 9334, 'argparse does not accept options taking arguments beginning with dash (regression from optparse)', is an old discussion about strings that begin with a dash and don't match defined flags.

One proposal was to add a 'args_default_to_positional' parameter, and change the parsing that I described before to:

+        # behave more like optparse even if the argument looks like a option
+        if self.args_default_to_positional:
+            return None
         # instead of return None, arg_string, None

There's a long discussion but nothing was changed (not even the test for negative numbers).

Two workarounds still apply:

    -- -_       # use -- to signal positional values
    --first=-_  # = to attach any string to optional

(in my previous post I cited 'Bernard', I meant the module's original author, Steven Bethard.  He's no longer actively involved in these bug/issues.)
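
A small sketch of the '=' workaround, assuming 'first' is redefined as an optional ('--first') rather than a positional. Attaching the value with '=' is accepted, while passing it as a separate string runs into the same flag-like classification:

```python
import argparse
import contextlib
import io

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--first")

# The value is attached to the option string, so it is never
# examined as a possible flag of its own.
attached = parser.parse_args(["--first=-_"])
print(attached)  # Namespace(first='-_')

# As a separate string, "-_" looks like another flag, so --first
# is left without its argument and argparse exits with an error.
with contextlib.redirect_stderr(io.StringIO()):
    try:
        parser.parse_args(["--first", "-_"])
        separate_ok = True
    except SystemExit:
        separate_ok = False
print(separate_ok)  # False
```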
msg289536 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-03-13 10:42
Max, I’m not sure if you saw the double-dash (--) workaround. IMO that is the “correct” way to do this for Unix command lines, and for the current version of “argparse”. But I guess that may be too inconvenient for your Morse Code case. Perhaps you can write your own custom sys.argv parser, or find some other argument handling library out there that doesn’t follow the usual Unix conventions.
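
For a program whose arguments are all positional, one lighter alternative to a custom sys.argv parser might be to insert the separator in the program itself. The sketch below is only an illustration: the helper parse_all_positional and the 'symbols' argument are hypothetical names, and the approach disables every option, including -h:

```python
import argparse


def parse_all_positional(parser, argv):
    """Parse argv with every string forced to be a positional.

    Hypothetical helper: because "--" is prepended, no option
    (not even -h/--help) is recognised any more.
    """
    return parser.parse_args(["--"] + list(argv))


parser = argparse.ArgumentParser(prog="morse")
parser.add_argument("symbols", nargs="+")

ns = parse_all_positional(parser, ["-_", "*-", "_*"])
print(ns.symbols)  # ['-_', '*-', '_*']
```

In a real script the second argument would be sys.argv[1:].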

I don’t really like the proposal from Issue 9334 (classifying CLI arguments based on registered options). It seems hard to predict and specify (too complex) for only a minor use case. Although it does fix part of the other problem with option arguments, it is not a general solution.

Assuming “-h” and “--help” are registered by default, how would an invocation like “ -hi” be treated under the proposal (currently an error because -h does not accept an argument)? What about “ -help”? What about “ --h”, currently treated as an abbreviation of “--help”?
msg289555 - (view) Author: paul j3 (paul.j3) * Date: 2017-03-13 23:38
The change to `_parse_optional` that did go through is the ability to turn off abbreviations

Even that has had a few complaints,
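
The switch referred to here is presumably the 'allow_abbrev' parameter of ArgumentParser (available since Python 3.5). A brief sketch of its effect, using a made-up '--verbose' option:

```python
import argparse


def build_parser(allow_abbrev):
    parser = argparse.ArgumentParser(prog="demo", allow_abbrev=allow_abbrev)
    parser.add_argument("--verbose", action="store_true")
    return parser


# Default: an unambiguous prefix of a long option is accepted.
abbreviated = build_parser(True).parse_args(["--verb"])
print(abbreviated.verbose)  # True

# With abbreviations off, the same prefix matches nothing and is
# returned among the unrecognized extras by parse_known_args.
namespace, extras = build_parser(False).parse_known_args(["--verb"])
print(namespace.verbose, extras)  # False ['--verb']
```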
Date                 User            Action  Args
2017-03-15 06:49:57  mbdevpl         set     title: Arparse improperly handles "-_" -> Argparse improperly handles "-_"
2017-03-13 23:38:03  paul.j3         set     messages: + msg289555
2017-03-13 10:42:19  martin.panter   set     messages: + msg289536
2017-03-13 04:11:15  paul.j3         set     messages: + msg289530
2017-03-13 02:25:13  Max Rothman     set     messages: + msg289521
2017-03-12 17:45:04  paul.j3         set     messages: + msg289504
2017-03-12 16:54:22  paul.j3         set     nosy: + paul.j3
2017-03-04 17:09:36  Max Rothman     set     messages: + msg288987
2017-03-04 00:32:46  martin.panter   set     nosy: + martin.panter; messages: + msg288946
2017-03-03 22:24:55  r.david.murray  set     nosy: + r.david.murray; messages: + msg288939
2017-03-03 21:07:15  Max Rothman     create