classification
Title: Argparse considers unknown optional arguments with spaces as a known positional argument
Type: behavior Stage:
Components: Extension Modules Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: DenKoren, bethard, paul.j3
Priority: normal Keywords: patch

Created on 2014-09-17 15:57 by DenKoren, last changed 2014-09-24 01:00 by paul.j3.

Files
File name Uploaded Description Edit
argparse.py.patch DenKoren, 2014-09-17 15:57 Bugfix for argparse 1.1
argparse_bug_example.py DenKoren, 2014-09-17 16:12 Bug demonstration
example.py paul.j3, 2014-09-21 01:32
patch_1.diff paul.j3, 2014-09-24 01:00 review
Messages (6)
msg227007 - (view) Author: Денис Кореневский (DenKoren) * Date: 2014-09-17 15:57
Argparse version 1.1 consider ANY unknown argument string containig ' ' (space character) as positional argument. As a result it can use such unknown optional argument as a value of known positional argument.

Demonstration code:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--known-optional-arg", "-k", action="store_true")
parser.add_argument("known_positional", action="store", type=str)
parser.parse_known_args(["--known-optional-arg", "--unknown-optional-arg=with spaces", "known positional arg"])
parser.parse_known_args(["--known-optional-arg", "--unknown-optional-arg=without_spaces", "known positional arg"])

Bugfix is attached to issue and affects ArgumentParser._parse_optional() method body.

Sorry, if it is a better way to report (or, possibly, fix) a bug than this place. It is my first python bug report.

Thanks.
msg227009 - (view) Author: Денис Кореневский (DenKoren) * Date: 2014-09-17 16:12
I've signed 'Contributor's Agreement'.

Bug demonstration code listed in issue description is in file attached to comment.

It can be run with following command:
python argparse_bug_example.py
msg227193 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2014-09-20 22:58
Your patch fails:

    python3 -m unittest test_argparse.TestEmptyAndSpaceContainingArguments

specifically these 2 subcases:

       (['-a badger'], NS(x='-a badger', y=None)),
       (['-y', '-a badger'], NS(x=None, y='-a badger')),

--------------------

At issue is the ambiguity when the user gives you a string that looks like an optionals flag, but doesn't match one of the defined arguments.  When should it be treated as a positional, and when should it be treated as an unknown?

The code, and test says - if it has the space, treat it like a positional.  You are trying to reverse that choice - if it has the prefix, treat it like an unknown optional.  At the point where you apply the patch, we already know that the string has a prefixchar.  So you are, effectively, eliminating the 'space' test.

I've added a simpler example of where the presence of the space flips that choice.
msg227261 - (view) Author: Денис Кореневский (DenKoren) * Date: 2014-09-22 10:58
There is an standard way to solve this ambiguity.

There is a special marker '--' used to force argument parsing function treat all arguments given in command after this marker as positional arguments. It was invented specially for tasks where you need to force argument parsing engine ignore option indicator and treat argument as positional.

I know this argument as standard 'de facto', but can't find any documents about '--' interpretation.
But you can see, that Linux 'libc' standard C library function getopt() knows about this flag:
http://linux.die.net/man/3/getopt
So, any utility using this function knows about '--'.

For example: grep, tee, cat, tail, head, vim, less, rm, rmdir, common shells (sh, bash, csh, dash, ...) and so on. The list is large. Almost any utility except special ones like 'echo' which main function is to print any argument given to utility.

So, I think, that the true way here is to parse all arguments staring with '-'/'--' and listed before special '--' marker as optional, and treat all of arguments listed after '--' as positional ones and do not try to use any secondary indicators to make a decision: 'optional' or not.


For example:
"""
any parameter before '--' special marker starting with '--' is treated as optional
# python3 example.py --bar="one two" xxx
(Namespace(pos='xxx'), ['--bar=one two'])

all parameters after '--' special marker are treated as positional
# python3 example.py -- --bar="one two" xxx
(Namespace(pos='--bar=one two'), ['xxx'])
"""

Yes, my patch does not the solution and argparse should be patched another way.

Where and how can I get unittests for argparse module to check my code modifications before posting them here? I want to try to modify argparse and produce the patch changing argparse behaviour as described above.

Thanks.
msg227310 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2014-09-22 21:25
Proposed patches like this are supposed to be generated against the current development version (3.5...), especially if they are 'enhancements' (as opposed to bugs).  But there isn't much of a difference in argparse between 2.7+ and 3.4+ (except one nested yield expression).

Tests are in 'lib/test/...'

argparse does implement the '--' syntax.  That is, anything after it is understood to be positional, regardless of its prefix characters.  But before that, optionals and positionals can be freely mixed (within limits).

I agree that '--foo=one two' looks a lot more like an unknown optional than the test case 'a badger'.  Strings with '=' are tested earlier in _parse_optional.  

A fix that passes test_argparse.py and your example is:

        if '=' in arg_string:
            option_string, explicit_arg = arg_string.split('=', 1)
            if option_string in self._option_string_actions:
                action = self._option_string_actions[option_string]
                return action, option_string, explicit_arg
            else: # added for unrecognized
                return None, option_string, explicit_arg
                # or return None, arg_string, None

But the '=' case is also tested in the following line:

    option_tuples = self._get_option_tuples(arg_string)

The obvious difference is that _get_option_tuples handles abbreviations.

I wonder if the two can be refined to reduce the duplication, and handler your case as well.

There are other bug issues dealing with multiple '--', the mixing of optionals and positionals, and controlling whether abbreviations are allowed or not.  I don't recall any others dealing with strings that contain '=' or space.
msg227403 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2014-09-24 01:00
I've added a patch with tests that I think handles this case, without messing with existing test cases.  It adds a new 'space' test right before the existing one, one that explicitly handles an '=' arg_string.  

If the space is in the 1st part, it is marked as a positional (as it does in the existing code).  But if the space is in the argument portion has no such effect; the arg_string will be handled as an unknown optional.

        if '=' in arg_string:
            option_prefix, explicit_arg = arg_string.split('=', 1)
            if ' ' in option_prefix:
                return None
            else:
                return None, arg_string, None

The new testcase is in the TestParseKnownArgs class (near the end of test_argparse.py), and tests the different ways in which an arg_string can be allocated to a positional or unknown.

The underlying idea is that elsewhere in '_parse_optional()', an arg_string with '=' is handled just like a two part optional.  It first tries an exact match, and then tries an abbreviation match.  In that spirit, this space test also focuses on the flag part of the arg_string, not the argument part.

Much as I like this solution, I still worry about backward compatibility.  As discussed in 'http://bugs.python.org/issue9334', 'argparse does not accept options taking arguments beginning with dash (regression from optparse)', compatibility in this _parse_optional() function is a serious issue.  There the proposed patch adds a switch to the API.  But an API change is itself messy, and not to be taken lightly if it isn't needed.

This patch has some comments that should be stripped out before it is actually applied.  There is a quite a backlog argparse issues, so I don't expect quick action here.
History
Date User Action Args
2014-11-30 02:48:24berker.peksaglinkissue22909 superseder
2014-09-24 01:00:45paul.j3setfiles: + patch_1.diff

messages: + msg227403
2014-09-22 21:25:09paul.j3setmessages: + msg227310
2014-09-22 10:58:57DenKorensetmessages: + msg227261
2014-09-21 01:32:46paul.j3setfiles: + example.py
2014-09-21 01:32:31paul.j3setfiles: - example.py
2014-09-20 22:58:18paul.j3setfiles: + example.py
nosy: + paul.j3
messages: + msg227193

2014-09-19 18:48:23terry.reedysetnosy: + bethard
2014-09-17 16:12:20DenKorensetfiles: + argparse_bug_example.py

messages: + msg227009
2014-09-17 15:57:56DenKorencreate