classification
Title: argparse should accept json and yaml argument types
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: bethard, bob.ippolito, bradengroom, derelbenkoenig, mauvilsa, paul.j3, rhettinger
Priority: normal Keywords:

Created on 2018-10-16 23:59 by derelbenkoenig, last changed 2020-12-18 23:13 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23849 merged rhettinger, 2020-12-18 23:13
Messages (12)
msg327848 - (view) Author: Robert Benson (derelbenkoenig) Date: 2018-10-17 00:15
Using `argparse`, I wanted to create an argument that was a JSON dictionary. I found that, in combination with the `fromfile_prefix_chars` keyword argument, the parser assumes that each argument provided in the file must be on a single line. I want the module to be able to support JSON files that may be pretty-printed. If it is to accept JSON in this manner, it would not be much more effort to implement YAML parsing as well.
msg327867 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2018-10-17 06:47
The results of reading one of these @-prefix files are just spliced into the `argv` list before it is parsed.  This is done early in the parsing method, in the `_read_args_from_files` method.  The documentation illustrates how this file reading can be modified to take several strings from each line:

https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.convert_arg_line_to_args

That shouldn't be taken as the only possible modification.
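
As a rough illustration of the documented hook, here is a minimal sketch that overrides `convert_arg_line_to_args` so that each whitespace-separated word on a line becomes its own argument. The class name `WordSplittingParser`, the helper `demo_split_words`, and the options `--foo` and `--bar` are made up for this example.

```python
import argparse
import os
import tempfile


class WordSplittingParser(argparse.ArgumentParser):
    """Hypothetical subclass: every word in an args file is one argument."""

    def convert_arg_line_to_args(self, arg_line):
        # The default implementation returns [arg_line], i.e. one argument
        # per line. Splitting lets a single line supply several arguments.
        return arg_line.split()


def demo_split_words():
    parser = WordSplittingParser(fromfile_prefix_chars='@')
    parser.add_argument('--foo')
    parser.add_argument('--bar', type=int)

    # Throwaway args file; its name and contents are assumptions.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'args.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('--foo spam --bar 3\n')
        return parser.parse_args(['@' + path])
```

The words read from the file are still spliced into the argument list and parsed as usual, so this hook changes how lines are split, not what a single argument can hold.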

That said, I don't see how reading from a JSON or YAML file fits with this mechanism.  Presumably those would define some dictionary-like key:value pairs.  I assume you'd want to enter those directly into the Namespace, not into the argv list that will be parsed, or otherwise merge them with a dictionary produced by parsing the other command-line strings.

So what you want to do with a JSON file, and how that relates to argparse is not clear.  You need to elaborate before we can discuss this issue further.

You might want to search on Stackoverflow with the tags '[argparse] [json]' to see how others have tried to use the two together.
msg327898 - (view) Author: Robert Benson (derelbenkoenig) Date: 2018-10-17 13:54
What I'm talking about is reading a single arg (of a dictionary or collection type) that can be split across multiple lines, rather than a single line containing multiple args.

My motivation was that reading args from a file should behave in a manner similar to other command-line utilities, such as the `-d` option for `curl` and the `-e` option for `ansible`. These take the entire file you give them and store it as one dictionary or object, not by merging it with the rest of the namespace but by taking the dictionary as the value of just that arg. So:

argument_parser.add_argument("-d", "--data", type=argparse.JsonType)  # just for example

if I call the program with `--data @foo.json`
I want argument_parser.parse_args().data to be the dict that is in foo.json, whether foo.json is pretty-printed or not.

I haven't done an exhaustive search of StackOverflow, but a couple of top answers indicated that this was not readily available without the user at least having to call `json.loads` on a string argument themselves, when it seems logical that parsing the JSON into a dictionary would be built into the library.
msg327911 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2018-10-17 17:38
If I define a 'type' function like:

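import json

def readdata(astr):
    # A leading '@' means the rest of the string is a path to a JSON file.
    if astr.startswith('@'):
        with open(astr[1:], 'r') as f:
            return json.load(f)
    # Anything else is passed through unchanged.
    return astr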

I can use it to load a json file, or use the string directly:

In [59]: parser = argparse.ArgumentParser()
In [60]: parser.add_argument('-d','--data', type=readdata);
In [61]: parser.parse_args(['-d','@foo.json'])
Out[61]: Namespace(data={'foo': 12, 'bar': 'twelve'})
In [62]: parser.parse_args(['-d','xxx'])
Out[62]: Namespace(data='xxx')

This seems to behave as the 'curl' and 'ansible' examples you give, where the interpretation of the '@' is option specific.

A fully functional version of this type function needs to catch possible errors (not a file, not proper json, etc) and raise a ValueError or argparse.ArgumentTypeError.
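
One way the error handling described above might look is sketched here. This is only an assumption about what "fully functional" could mean: the names `json_or_string` and `demo_type_function` and the message wording are invented for illustration.

```python
import argparse
import json
import os
import tempfile


def json_or_string(astr):
    """Hypothetical type function: '@path' loads JSON, anything else is kept."""
    if not astr.startswith('@'):
        return astr
    path = astr[1:]
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except OSError as exc:
        # Missing file, directory, permission problem, ...
        raise argparse.ArgumentTypeError(f"can't read {path!r}: {exc}")
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"invalid JSON in {path!r}: {exc}")


def demo_type_function():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data', type=json_or_string)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'foo.json')
        # Pretty-printed on purpose: the file spans several lines.
        with open(path, 'w', encoding='utf-8') as f:
            json.dump({'foo': 12, 'bar': 'twelve'}, f, indent=2)
        from_file = parser.parse_args(['-d', '@' + path]).data
    literal = parser.parse_args(['-d', 'xxx']).data
    return from_file, literal
```

Because the failures are converted to `argparse.ArgumentTypeError`, a bad path or malformed file is reported through the parser's usual usage-and-exit path instead of a traceback.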

The fact that 'curl -d' uses the '@', and 'fromfile_prefix_chars' uses '@' in the documentation, should be seen as purely coincidental.  argparse would be just as happy using '#' or '%' as fromfile-prefix characters, just so long as the shell passes them unchanged to 'sys.argv'.  Conversely a type function can pay attention to special characters without needing to define them in the parser definition.
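
To make the point about prefix characters concrete, the following sketch uses '%' as the fromfile prefix, which leaves a leading '@' with no special meaning to the parser. The helper name `demo_percent_prefix`, the option names, and the file contents are assumptions for the example.

```python
import argparse
import os
import tempfile


def demo_percent_prefix():
    # '%' marks an args file here; '@' is just an ordinary character.
    parser = argparse.ArgumentParser(fromfile_prefix_chars='%')
    parser.add_argument('--foo')
    parser.add_argument('-d', '--data')

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'args.txt')
        # Default behavior: one argument per line.
        with open(path, 'w', encoding='utf-8') as f:
            f.write('--foo\nspam\n')
        return parser.parse_args(['%' + path, '-d', '@foo.json'])
```

The '@foo.json' string reaches the namespace untouched, so a type function or later application code remains free to interpret it.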

argparse doesn't define many custom 'type' functions.  Mostly it depends on people using stock Python functions like 'int' and 'float', or writing their own functions.  This keeps things simple for the common uses, while giving more advanced users a lot of flexibility.

This '@file' sensitivity could also be built into a custom Action subclass.  There too, argparse has defined a set of common cases, but lets the users customize to their heart's content.
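
A custom Action along those lines could be sketched as follows. The class name `JsonFileAction` and the helper `demo_action` are hypothetical, and the choice to report failures with `argparse.ArgumentError` is one option among several.

```python
import argparse
import json
import os
import tempfile


class JsonFileAction(argparse.Action):
    """Hypothetical Action: store parsed JSON when the value is '@path'."""

    def __call__(self, parser, namespace, values, option_string=None):
        if isinstance(values, str) and values.startswith('@'):
            try:
                with open(values[1:], 'r', encoding='utf-8') as f:
                    values = json.load(f)
            except (OSError, ValueError) as exc:
                # ArgumentError is reported with usage, like built-in errors.
                raise argparse.ArgumentError(self, str(exc))
        setattr(namespace, self.dest, values)


def demo_action():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data', action=JsonFileAction)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'foo.json')
        with open(path, 'w', encoding='utf-8') as f:
            json.dump({'foo': 12, 'bar': 'twelve'}, f, indent=2)
        from_file = parser.parse_args(['-d', '@' + path]).data
    literal = parser.parse_args(['-d', 'xxx']).data
    return from_file, literal
```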
msg327922 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-10-18 00:51
Bob, do you prefer the current arrangement where the user needs to write their own "type" function to handle JSON and YAML, or would you like to add this as a built-in option?
msg327981 - (view) Author: Braden Groom (bradengroom) * Date: 2018-10-18 15:43
I could try adding JSON and YAML argument types if that's what we want to do.
msg327987 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2018-10-18 16:57
Adding a new 'type' function or factory is certainly possible, and probably will be free of backward compatibility issues.  But so far no other custom type has been added to argparse.

https://bugs.python.org/issue23884 - rejects adding a DateTime class

https://bugs.python.org/issue22884 - there are several outstanding issues dealing with the FileType class.  That hasn't aged well, since file IO standards have changed over the years.
msg327995 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2018-10-18 18:37
I don't think that this has anything in particular to do with the json module, at least it certainly shouldn't need any additional functionality from there.

YAML parsing isn't available in the stdlib last I checked, so that is probably not really up for consideration for direct integration.

In any case, I think the best approach would be to first do some research (StackOverflow, GitHub, etc.) to see how other folks are doing this in the wild, to see if there's a common pattern that should be made available in the stdlib.
msg328000 - (view) Author: paul j3 (paul.j3) * (Python triager) Date: 2018-10-18 19:35
This kind of file read can be done just as easily after parsing.  For example using the function in my previous post:

In [127]: parser = argparse.ArgumentParser()
In [128]: parser.add_argument('-d','--data');
In [129]: args = parser.parse_args(['-d','@foo.json'])
In [130]: args
Out[130]: Namespace(data='@foo.json')
In [131]: args.data = readdata(args.data)
In [132]: args
Out[132]: Namespace(data={'foo': 12, 'bar': 'twelve'})

I've pointed out in various SO answers that using the 'type' parameter has just a few benefits.

- If you have many arguments that require this conversion, using type is a little more streamlined.  But it's not hard to iterate over a list of 'dest' names.

- Using type routes the errors through the standard argparse mechanism, including the display of usage and system exit.  For that, the type function has to raise TypeError, ValueError, or ArgumentTypeError.  You can also get the same behavior by calling 'parser.error(...)' in the post-parsing processing.
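
The two points above can be combined in a small post-parsing helper. This is a sketch under assumptions: `load_json_args`, `demo_post_parse`, and the option names are invented here, and the error message format is arbitrary.

```python
import argparse
import json
import os
import tempfile


def load_json_args(parser, args, dests):
    """Hypothetical helper: convert '@file' strings after parsing."""
    for dest in dests:
        value = getattr(args, dest)
        if isinstance(value, str) and value.startswith('@'):
            try:
                with open(value[1:], 'r', encoding='utf-8') as f:
                    setattr(args, dest, json.load(f))
            except (OSError, ValueError) as exc:
                # parser.error() prints usage plus the message, then exits.
                parser.error(f'argument {dest}: {exc}')
    return args


def demo_post_parse():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data')
    parser.add_argument('-e', '--extra')
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'foo.json')
        with open(path, 'w', encoding='utf-8') as f:
            json.dump({'foo': 12}, f, indent=2)
        args = parser.parse_args(['-d', '@' + path, '-e', 'xxx'])
        return load_json_args(parser, args, ['data', 'extra'])
```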

Error handling is the make-or-break-it issue.  What kinds of errors do we want to handle?  What if the file name is bad or not accessible?  What if the file is poorly formatted JSON?  How do other APIs handle these errors?

For example the function that I defined can raise a FileNotFoundError if the file isn't found, or a JSONDecodeError if the file is badly formed.  JSONDecodeError is a subclass of ValueError, but IO errors are not.
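
The class relationships mentioned here are easy to check directly. In this sketch the tuple `CAUGHT_BY_ARGPARSE` and the function `reported_as_usage_error` are made-up names; the tuple simply lists the three exception types named earlier in this message as the ones a type function has to raise.

```python
import argparse
import json

# The exception types a type function has to raise for argparse to report
# the failure as a usage error (as listed earlier in this message).
CAUGHT_BY_ARGPARSE = (argparse.ArgumentTypeError, TypeError, ValueError)


def reported_as_usage_error(exc_type):
    """Return True if argparse would turn this exception into a usage error."""
    return issubclass(exc_type, CAUGHT_BY_ARGPARSE)


# JSONDecodeError derives from ValueError, so argparse handles it.
json_is_handled = reported_as_usage_error(json.JSONDecodeError)

# FileNotFoundError derives from OSError, so it would escape as a traceback.
missing_file_is_handled = reported_as_usage_error(FileNotFoundError)
```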

If such a type function is added to argparse, the unittest file, test_argparse.py, will have to have a number of test cases.  It will have to create a valid json test file, and possibly an invalid one.  It will have to test working cases, and various error cases.  The amount of testing code is likely to be many times larger than the function code itself.  And we can't overlook the documentation additions.
msg328020 - (view) Author: Braden Groom (bradengroom) * Date: 2018-10-19 03:04
I agree with Paul. This is probably simple enough for applications to implement. It also doesn't make sense to add either of these if the precedent was set by rejecting DateTime previously.
msg354039 - (view) Author: Mauricio Villegas (mauvilsa) Date: 2019-10-06 15:39
FYI there is a new python package that extends argparse with the enhancements proposed here and more.

https://pypi.org/project/jsonargparse/
msg354048 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-10-06 20:54
I am going to decline this feature request on the principle of keeping our modules loosely coupled and orthogonal to one another.  

As paul.j3 pointed out, the "type" option seems to work best with simple conversions like "int".  As pointed out by another respondent, FileType hasn't aged well and it may have been a mistake to include it at all.  Likewise other converters like DateTime have been previously discussed and rejected.

There is a gray area between where argument parsing stops and application logic begins.   Since this is a standard library module, we should draw the line at simple converters/validators like "int", leaving anything more complex for downstream logic.  That will afford users greater flexibility and control than supported by the add_argument() API which intentionally only handles common cases.

[Braden Groom]
> I agree with Paul. This is probably simple enough for applications
> to implement. It also doesn't make sense to add either of these
> if the precedent was set by rejecting DateTime previously.

I concur as well.

[Mauricio Villegas]
> FYI there is a new python package that extends argparse 
> with the enhancements proposed here and more.

Thank you for the link. It will likely prove to be a valuable resource for people finding this issue in the future.  The referenced project shows the value of external projects being able to go in directions that are well beyond the scope of our standard library module: "Not exclusively intended for parsing command line arguments. The main focus is parsing yaml or jsonnet configuration files and not necessarily from a command line tool."
History
Date User Action Args
2020-12-18 23:13:40 rhettinger set pull_requests: + pull_request22713
2019-10-06 20:54:47 rhettinger set status: open -> closed; messages: + msg354048; assignee: bob.ippolito -> rhettinger; resolution: rejected; stage: resolved
2019-10-06 15:39:36 mauvilsa set nosy: + mauvilsa; messages: + msg354039
2018-10-19 03:04:28 bradengroom set messages: + msg328020
2018-10-18 19:35:31 paul.j3 set messages: + msg328000
2018-10-18 18:37:48 bob.ippolito set messages: + msg327995
2018-10-18 16:57:30 paul.j3 set messages: + msg327987
2018-10-18 15:43:57 bradengroom set nosy: + bradengroom; messages: + msg327981
2018-10-18 00:51:42 rhettinger set assignee: bob.ippolito; messages: + msg327922; nosy: + bob.ippolito, rhettinger
2018-10-17 17:38:36 paul.j3 set messages: + msg327911
2018-10-17 13:54:57 derelbenkoenig set messages: + msg327898
2018-10-17 06:47:21 paul.j3 set messages: + msg327867
2018-10-17 02:09:09 zach.ware set nosy: + bethard, paul.j3; versions: + Python 3.8
2018-10-17 00:15:38 derelbenkoenig set messages: + msg327848
2018-10-17 00:04:10 derelbenkoenig set versions: - Python 2.7
2018-10-17 00:00:41 derelbenkoenig set versions: + Python 2.7
2018-10-16 23:59:54 derelbenkoenig create