argparse.FileType for '-' doesn't work for a mode of 'rb' #58364

Closed
anacrolix mannequin opened this issue Feb 29, 2012 · 33 comments
Labels
3.9 (only security fixes) · 3.10 (only security fixes) · 3.11 (only security fixes) · stdlib (Python modules in the Lib dir) · type-bug (An unexpected behavior, bug, or error)

Comments


anacrolix mannequin commented Feb 29, 2012

BPO 14156
Nosy @naufraghi, @bitdancer, @serhiy-storchaka, @mgrandi, @MojoVampire, @Lekensteyn, @palaviv, @evanunderscore, @sedrubal, @miss-islington, @woodruffw
PRs
  • bpo-14156:Add argparse.FileType for a mode of 'rb' and 'wb' #9124
  • bpo-14156: Make argparse.FileType work correctly for binary file mode… #13165
  • [3.10] bpo-14156: Make argparse.FileType work correctly for binary file modes when argument is '-' (GH-13165) #31706
  • [3.9] bpo-14156: Make argparse.FileType work correctly for binary fil… #31979
Files
  • argparse-filetype-stdio-modes.patch
  • 14156.patch
  • 14156-2.patch: patch with tests working
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-03-18.15:03:40.764>
    created_at = <Date 2012-02-29.11:01:10.726>
    labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
    title = "argparse.FileType for '-' doesn't work for a mode of 'rb'"
    updated_at = <Date 2022-03-18.15:03:40.762>
    user = 'https://github.com/anacrolix'

    bugs.python.org fields:

    activity = <Date 2022-03-18.15:03:40.762>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-03-18.15:03:40.764>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2012-02-29.11:01:10.726>
    creator = 'anacrolix'
    dependencies = []
    files = ['24868', '26484', '42072']
    hgrepos = []
    issue_num = 14156
    keywords = ['patch', 'needs review']
    message_count = 33.0
    messages = ['154612', '154641', '155761', '155923', '162342', '166170', '215299', '215410', '221271', '221456', '221457', '261218', '282851', '282922', '282948', '283210', '283213', '283216', '283463', '293095', '294544', '297997', '298237', '324859', '324892', '325011', '341767', '341796', '373025', '414479', '414614', '414615', '415501']
    nosy_count = 17.0
    nosy_names = ['bethard', 'naufraghi', 'r.david.murray', 'eli.bendersky', 'paul.j3', 'serhiy.storchaka', 'moritz', 'markgrandi', 'josh.r', 'lekensteyn', 'palaviv', 'evan_', 'sedrubal', 'Marcel H2', 'miss-islington', 'Leo Singer', 'yossarian']
    pr_nums = ['9124', '13165', '31706', '31979']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue14156'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']


    anacrolix mannequin commented Feb 29, 2012

    If an argument of '-' is handled by argparse.FileType, it defaults to sys.stdin. However, a mode of 'rb' is ignored: the returned file object does not work with raw bytes.

    anacrolix added the type-bug and stdlib labels on Feb 29, 2012

    bethard mannequin commented Feb 29, 2012

    Yes, the problem is in FileType.__call__ - the handling of '-' is pretty simplistic. Patches welcome.


    anacrolix mannequin commented Mar 14, 2012

    Roger that. I'll start on a patch for this in a month or two if all goes well.


    anacrolix mannequin commented Mar 15, 2012

    Steven, patch attached. I lost steam in the unit tests with all the meta; suffice it to say that the names match the file descriptors of the stream sources, i.e. FileType('rb') would give a file with name=0, and so forth. My chosen method also allows other mode flags as well as custom bufsizes.
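For readers following the fileno-based idea, here is a minimal sketch of what it relies on, assuming a pipe from `os.pipe()` stands in for the descriptor behind stdin: wrapping an existing descriptor with `open(fd, 'rb', closefd=False)` gives a binary stream whose `name` is the integer descriptor, and closing that stream leaves the descriptor open. `reopen_descriptor` and `demo` are hypothetical names, not code from the attached patch.

```python
import os


def reopen_descriptor(fd: int, mode: str = "rb"):
    """Wrap an existing descriptor without taking ownership of it.

    closefd=False means closing the returned object leaves fd open, which is
    what you want when fd belongs to sys.stdin or sys.stdout.
    """
    return open(fd, mode, closefd=False)


def demo():
    read_fd, write_fd = os.pipe()  # the read end stands in for stdin's descriptor
    os.write(write_fd, b"\x00\x01binary")
    os.close(write_fd)  # signal EOF to the reader
    try:
        with reopen_descriptor(read_fd) as stream:
            name, data = stream.name, stream.read()
        os.fstat(read_fd)  # would raise OSError if the descriptor had been closed
        return name, read_fd, data
    finally:
        os.close(read_fd)
```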


    moritz mannequin commented Jun 5, 2012

    I don't know if this is the perfect solution, but it keeps the program from crashing.

    1154c1154,1157
    <             return _sys.stdin
    ---
    >             if 'b' in self._mode:
    >                 return _sys.stdin.buffer
    >             else:
    >                 return _sys.stdin
    1156c1159,1162
    <             return _sys.stdout
    ---
    >             if 'b' in self._mode:
    >                 return _sys.stdout.buffer
    >             else:
    >                 return _sys.stdout
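The same selection logic can be pulled out into a standalone function so it can be exercised without touching the real standard streams. This is a hypothetical sketch that mirrors the diff above (including rejecting modes that are neither read nor write); `stream_for_dash` and its optional `stdin`/`stdout` parameters are illustrative, not part of argparse.

```python
import io
import sys


def stream_for_dash(mode: str, stdin=None, stdout=None):
    """Return the stream FileType-style code would hand back for the argument '-'.

    stdin/stdout default to the real sys streams; tests can pass stand-ins.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if "r" in mode:
        stream = stdin
    elif "w" in mode:
        stream = stdout
    else:
        raise ValueError(f"argument '-' with mode {mode!r} is not supported")
    # A binary mode needs the byte stream underneath the text wrapper.
    return stream.buffer if "b" in mode else stream


if __name__ == "__main__":
    fake_in = io.TextIOWrapper(io.BytesIO(b"\x00\xff"), encoding="utf-8")
    fake_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    print(stream_for_dash("rb", fake_in, fake_out).read())  # b'\x00\xff'
```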
    

    moritz added the type-crash label and removed the type-bug label on Jun 5, 2012

    bethard mannequin commented Jul 22, 2012

    The fix looks right, but we definitely need a test. I tried to write one, but I'm not sure how to do this properly given how test_argparse redirects standard input and output (so that fileno() doesn't work anymore). I've attached my current (failing) attempt to test this.


    paulj3 mannequin commented Apr 1, 2014

    A related issue http://bugs.python.org/issue13824
    "argparse.FileType opens a file and never closes it"

    I proposed a FileContext class that returns a `partial(open, filename, ...)` context object. The issues of how to deal with stdin/out are similar.
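The FileContext class itself lives in issue 13824 and is not reproduced here. As a rough, hypothetical illustration of the same idea, the factory below returns a zero-argument opener built with `functools.partial`, so the file is only opened inside a `with` block; for '-' it wraps the standard stream in `contextlib.nullcontext`, assuming the goal is to avoid closing stdin/stdout.

```python
import argparse
import contextlib
import functools
import os
import sys
import tempfile


def file_context(mode: str = "r", **open_kwargs):
    """Hypothetical type= factory returning a zero-argument opener, not an open file."""

    def convert(name: str):
        if name == "-":
            std = sys.stdin if "r" in mode else sys.stdout
            if "b" in mode:
                std = std.buffer
            # nullcontext yields the stream and does not close it on exit.
            return functools.partial(contextlib.nullcontext, std)
        # Nothing is opened yet; calling the partial opens the file.
        return functools.partial(open, name, mode, **open_kwargs)

    return convert


def demo():
    """Round-trip a temporary file through the factory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "in.bin")
        with open(path, "wb") as fh:
            fh.write(b"\x00\xff")
        parser = argparse.ArgumentParser()
        parser.add_argument("infile", type=file_context("rb"))
        args = parser.parse_args([path])
        with args.infile() as f:  # the file is opened here and closed on exit
            data = f.read()
        return data, f.closed
```

One trade-off of deferring `open` is that a missing or unreadable file is no longer reported at parse time; it only surfaces when the opener is called.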


    paulj3 mannequin commented Apr 3, 2014

    There are a couple of complications to using 'fileno'.

    We probably don't want to close 'sys.stdin' or 'sys.stdout' (not even if they are redirected to other files?). That means using:

    open(sys.stdin.fileno(), ..., closefd=False)
    

    'closefd', on the other hand, has to be True for string file specifications.

    But in 'test_argparse.py', 'sys.stdout' is redirected to an 'io.StringIO'. This has many of the same features as an open file, but 'fileno' is not implemented. So FileType probably needs to make an exception for this case. I don't know how this will play with a 'BytesIO' for 'wb' cases.
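To see the complication concretely, this hypothetical helper probes a stream for the two things the competing approaches need: a working `fileno()` (for reopening the descriptor) and a `.buffer` attribute (for returning `sys.stdin.buffer`). An `io.StringIO` offers neither, while a text wrapper over `io.BytesIO` offers only `.buffer`.

```python
import io


def describe_stream(stream) -> dict:
    """Report which approach a stream could support.

    'fileno' -> the reopen-the-descriptor approach; 'buffer' -> the
    sys.stdin.buffer approach.
    """
    try:
        stream.fileno()
        has_fileno = True
    except (io.UnsupportedOperation, AttributeError):
        has_fileno = False
    return {"fileno": has_fileno, "buffer": hasattr(stream, "buffer")}


if __name__ == "__main__":
    print(describe_stream(io.StringIO()))
    print(describe_stream(io.TextIOWrapper(io.BytesIO(), encoding="utf-8")))
```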


    elibendersky mannequin commented Jun 22, 2014

    Nosy-ing myself since I just ran into it. Annoying issue that precludes using argparse's built-in '-' recognition for reading binary data.

    I'll try to carve some time later to look at the patches.


    elibendersky mannequin commented Jun 24, 2014

    The patch looks reasonable? Is the only remaining problem with crafting the test?


    elibendersky mannequin commented Jun 24, 2014

    [sorry, the first question mark shouldn't be - the patch indeed looks reasonable to me]

    Steven - how about launching a subprocess for stdin tests to avoid weird issues?

    serhiy-storchaka added the type-bug label and removed the type-crash label on Sep 30, 2014

    palaviv mannequin commented Mar 5, 2016

    I fixed the tests to work with Steven's patch. I also changed the patch to open sys.std{in,out} with closefd=False.

    I changed the 'io.StringIO' that we redirect stdout and stderr to. Now 'StdIOBuffer' returns the real stdout and stderr when '-' is passed to FileType. This was already done previously in the 'stderr_to_parser_error' function, after the call to 'parse_args'.


    palaviv mannequin commented Dec 10, 2016

    Pinging as mentioned in the devguide.


    anacrolix mannequin commented Dec 11, 2016

    This is why I stopped contributing to Python.


    paulj3 mannequin commented Dec 11, 2016

    The problem with the argparse backlog is that the original author of the module is too busy with other things, and no one has stepped into his shoes. There are people with experience in applying patches, and people who know the argparse code well, but few, if any, with both skills (and/or the time to invest in this module).

    In addition, the module has some serious backward compatibility issues. I know of several patches that were applied and then withdrawn because of unforeseen (or at least untested) compatibility problems.

    While I commented earlier, I don't recall testing it. I just tried it now and ran into problems, until I realized this isn't compatible with Python 2.7. Py3 is the development world, but there's still a lot of Py2 use (e.g. look at Stack Overflow argparse questions). On SO, if people have problems with FileType, I often recommend that they just accept the filename and take care of opening it themselves.

    To raise the attention to this patch I'd suggest

    • making the case that it is really needed

    • demonstrating that it has been field tested, and is free of backward compatibility issues.


    palaviv mannequin commented Dec 14, 2016

    Hi Paul, thanks for looking into this. First, are you sure this is a bug in Python 2? If so, I will happily port this patch once it is reviewed.
    As for use cases, you may look at issue bpo-26488. Although the patch was rejected, you can see that I first used argparse.FileType and moved away from it because of this issue. I can't prove this patch is free of backward compatibility issues, but there are tests now which should help to avoid problems.


    paulj3 mannequin commented Dec 14, 2016

    The problem with Python2.7 is that 'open' does not take 'closefd', or any of the other parameters that were added for Python3.

        open(name[, mode[, buffering]])

    'rb' may make a difference on Py2 on Windows, but I haven't done any work in that environment in a long time.

    I wasn't aware of that other issue. Some core Python developers have participated in that one. I suspect a lot of the discussion is beyond my level of expertise.

    I once wrote that I thought 'FileType' was included primarily as an example of a 'type' factory, something users could copy and extend for their own use. Bethard corrected me, saying that it was meant for quick-n-dirty script uses, ones with an input file, an output file and a few options. In bigger scripts, users are encouraged to open/close files in 'with' contexts.

    See http://bugs.python.org/issue22884 and the issues I reference there.


    palaviv mannequin commented Dec 14, 2016

    As we are talking here about stdin and stdout, which we don't want to close, I think this issue is irrelevant to Python 2. In any case, the patch in issue bpo-13824 covers that problem.
    I actually write a lot of small scripts, and if I need to open the file in binary mode I don't use FileType because of this issue.
    Anyway, I think this discussion is irrelevant because it does not seem like any core developer is currently working on argparse.


    evanunderscore mannequin commented Dec 17, 2016

    This issue is relevant to Python 2 on Windows since you need to disable the EOL conversion if you're trying to receive binary data on stdin.

    See: http://stackoverflow.com/questions/2850893/reading-binary-data-from-stdin


    naufraghi mannequin commented May 5, 2017

    Bumped into this bug yesterday; sadly, a script working (by chance) in Python 2 doesn't work in Python 3 because of this bug.


    MarcelH2 mannequin commented May 26, 2017

    I want to see this fixed in Python 3.x as well, please :) The patch should be the same.

    MarcelH2 added the 3.7 (EOL) label on May 26, 2017

    sedrubal mannequin commented Jul 9, 2017

    What is the problem with using the patch by moritz (https://bugs.python.org/issue14156#msg162342) and change the unit test to this:

    class TestFileTypeWB(TempDirMixin, ParserTestCase):
        ...
        successes = [
            ...,
            ('-x - -', NS(x=sys.stdout.buffer, spam=sys.stdout.buffer)),
        ]

    and respectively for stdin?

    @bitdancer (Member)

    The biggest problem, as paul.j3 says, is to get someone from core to review the argparse issues. I am currently planning to make argparse one of my foci in a sprint we are doing at the beginning of September, so there is some hope....

    Any reviews/testing people do on argparse patches between now and then will be helpful.


    LeoSinger mannequin commented Sep 8, 2018

    I just hit this bug. Would the proposed patch get any more attention if submitted as a pull request?


    paulj3 mannequin commented Sep 9, 2018

    It's been some time since I looked at this issue.

    The main sticking point is passing unittests, and ensuring that there are no backward compatibility issues.

    But, since FileType is a standalone class, anyone could put a corrected version in their own workspace without modifying their stock version. The 'type' parameter is designed for this kind of flexibility - it accepts any callable, whether a function, or a class with a __call__ method.
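As paul.j3 notes, `type=` accepts any callable, so a corrected version can live in user code. Here is a hypothetical sketch for interpreters that predate the fix: a `FileType` subclass that only intercepts '-' and otherwise defers to the stock behaviour. It keeps its own copy of the mode instead of reading FileType's private attributes, and its mode handling (including 'a' and 'x' for stdout) is an assumption modelled on the discussion in this thread.

```python
import argparse
import contextlib
import io
import sys


class BinaryAwareFileType(argparse.FileType):
    """Hypothetical FileType variant that honours 'b' when the argument is '-'."""

    def __init__(self, mode="r", bufsize=-1, encoding=None, errors=None):
        super().__init__(mode, bufsize, encoding, errors)
        self._requested_mode = mode  # own copy; avoids relying on FileType internals

    def __call__(self, string):
        if string != "-":
            return super().__call__(string)  # real paths: stock behaviour
        mode = self._requested_mode
        if "r" in mode:
            stream = sys.stdin
        elif any(flag in mode for flag in "wax"):
            stream = sys.stdout
        else:
            raise ValueError(f"argument '-' with mode {mode!r} is not supported")
        return stream.buffer if "b" in mode else stream


def demo() -> bool:
    """Parse '-' with mode 'wb' while stdout is a bytes-backed stand-in."""
    fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    parser = argparse.ArgumentParser()
    parser.add_argument("out", type=BinaryAwareFileType("wb"))
    with contextlib.redirect_stdout(fake_stdout):
        args = parser.parse_args(["-"])
    args.out.write(b"\x00\xff")
    return args.out is fake_stdout.buffer and fake_stdout.buffer.getvalue() == b"\x00\xff"
```

It is used exactly like the stock class, e.g. `parser.add_argument('out', type=BinaryAwareFileType('wb'))`.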

    serhiy-storchaka added the 3.8 label on Sep 11, 2018
    @serhiy-storchaka (Member)

    The solution with fileno() is clever, but as was mentioned before, it doesn't work if stdin or stdout are not real files but something like StringIO. In common use of argparse for parsing script arguments they are not redefined, but argparse can be used in uncommon environments, for example for emulating a command line in an environment like IDLE, which redefines the standard streams. And I'm sure this will break third-party tests that call main() with patched stdin/stdout to test a CLI. The initial solution proposed by Moritz is more reliable, although it doesn't fix the issue with closing stdin/stdout. But that is a different issue altogether.


    MojoVampire mannequin commented May 7, 2019

    Serhiy: To be clear, Moritz's patch won't work if stdin/stdout are io.StringIO or the like either, since StringIO lacks a buffer attribute.

    Personally, I'm content to:

    1. Ignore the closeability of the standard handles; that's not part of this bug, and just introduces more headaches to getting a patch approved (and conceivably breaks back compat if the user *wants* the handle closed even if it's a standard handle). Nothing is made worse by ignoring it after all, just not made better immediately.
    2. Ignore the possibility of stdin/stdout without a buffer attribute; it will raise an error, but that's a reasonable response to demanding a binary stream in a scenario where only a text stream is available
    3. Not use fileno, as it mostly supports the "anti-close" behavior I dismissed in point 1, and has a bunch of pitfalls (e.g. a standard handle rebound to a TextIOWrapper around GzipFile or the like has a fileno attribute, but using it bypasses the compression; basically, you can't consider fileno to be equivalent to the underlying binary stream, because binary streams can be wrapped as well).
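Point 3 can be demonstrated with the standard library alone. In this sketch, `gzip.open(..., 'rt')` plays the role of a standard handle rebound to a `TextIOWrapper` around a `GzipFile`: reading through `.buffer` yields decompressed bytes, while reopening the descriptor returned by `fileno()` yields the raw gzip header bytes (`1f 8b`). The file name and helper names are hypothetical.

```python
import gzip
import os
import tempfile


def read_two_ways(path: str) -> tuple:
    """Read the first two bytes behind a text stream via .buffer and via .fileno()."""
    # gzip.open(..., "rt") returns a TextIOWrapper around a GzipFile, the kind
    # of object a program might rebind sys.stdin to.
    with gzip.open(path, "rt", encoding="utf-8") as text_stream:
        via_buffer = text_stream.buffer.read(2)  # decompressed bytes
    with gzip.open(path, "rt", encoding="utf-8") as text_stream:
        # Reopening the descriptor skips the GzipFile layer entirely.
        with open(text_stream.fileno(), "rb", closefd=False) as raw:
            via_fileno = raw.read(2)  # still-compressed bytes
    return via_buffer, via_fileno


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write("hello\n")
        return read_two_ways(path)
```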

    I'm going to convert Moritz's proposal to a PR and hope I can get core developer approval for it.


    MojoVampire mannequin commented May 7, 2019

    I've created PR13165 to address this bug. It's 99% test updates; the actual code changes to argparse.py are trivial and should be equally trivial to review.

    The only functional change beyond Moritz's proposal is that I added support for accepting '-' when the mode string uses 'a' or 'x' instead of 'w'; for sys.stdout, they're all effectively equivalent ('x' is trying to prevent stomping an existing file, which borrowing sys.stdout won't do, and sys.stdout is already more closely equivalent to mode 'a' in any event). No working code should break as a result of that change (passing 'a' or 'x' previously just caused FileType to exit immediately with a ValueError, which in turn caused parse_args to kill the program, which I'm assuming isn't considered a valuable "feature").

    In addition to testing binary mode with argument '-' properly, I also added complete test cases for mode 'x' and 'xb' (for all arguments, both file names and '-') since we had no such tests, and ensuring exclusive creation mode behaves correctly is fairly important.
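A rough illustration of the exclusive-creation behaviour such tests check, using a throwaway temporary directory (an assumption; this is not the test code from the PR): `FileType('x')` opens a path that does not exist yet, and a second attempt on the same path makes `parse_args` report an error and exit with status 2. `parse_exclusive` and `demo` are hypothetical names.

```python
import argparse
import contextlib
import io
import os
import tempfile


def parse_exclusive(path: str):
    """Parse one positional with FileType('x'); return (namespace or None, exit code or None)."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("out", type=argparse.FileType("x"))
    captured = io.StringIO()  # swallow the usage/error text argparse prints
    try:
        with contextlib.redirect_stderr(captured):
            return parser.parse_args([path]), None
    except SystemExit as exc:
        return None, exc.code


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        first, first_code = parse_exclusive(path)    # path does not exist yet
        first.out.close()
        second, second_code = parse_exclusive(path)  # path exists now
        return first_code, second is None, second_code
```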


    Lekensteyn mannequin commented Jul 5, 2020

    I just ran into this issue on Linux when piping a binary file to stdin resulted in a UnicodeDecodeError while trying to read a byte from the stream. Passing /dev/stdin is a workaround that does not require modifications to an application.

    As for the proposed PR 13165, I'd suggest gracefully falling back to the normal stdout/stdin if the buffer is not available. That approach is also followed in the fileinput module, and takes care of the note for library developers in the documentation at https://docs.python.org/3/library/sys.html#sys.stdin

    Not feeling particularly strongly about the graceful handling, but I hope that the test code can be simplified in that case.
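A minimal sketch of the graceful fallback, assuming a `getattr(stream, 'buffer', stream)` pattern is what is meant by following the fileinput module: a stream without `.buffer`, such as `io.StringIO`, is returned unchanged instead of raising. The trade-off is that callers who asked for bytes may silently receive text. `binary_stdin` is a hypothetical helper name.

```python
import io
import sys


def binary_stdin(stream=None):
    """Return the byte stream behind stdin, or the stream itself if it has no .buffer."""
    stream = sys.stdin if stream is None else stream
    return getattr(stream, "buffer", stream)


def demo() -> tuple:
    text_only = io.StringIO("abc")  # like a test double or an IDLE-style shell stream
    wrapped = io.TextIOWrapper(io.BytesIO(b"abc"), encoding="utf-8")
    return binary_stdin(text_only) is text_only, binary_stdin(wrapped) is wrapped.buffer
```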


    woodruffw mannequin commented Mar 3, 2022

    Nosying myself; this affects 3.9 and 3.10 as well.

    woodruffw added the 3.9 and 3.10 labels on Mar 3, 2022
    @serhiy-storchaka (Member)

    New changeset eafec26 by MojoVampire in branch 'main':
    bpo-14156: Make argparse.FileType work correctly for binary file modes when argument is '-' (GH-13165)

    @miss-islington (Contributor)

    New changeset ee18df4 by Miss Islington (bot) in branch '3.10':
    bpo-14156: Make argparse.FileType work correctly for binary file modes when argument is '-' (GH-13165)

    @serhiy-storchaka (Member)

    New changeset 4d2099f by Serhiy Storchaka in branch '3.9':
    [3.9] bpo-14156: Make argparse.FileType work correctly for binary file modes when argument is '-' (GH-13165) (GH-31979)

    serhiy-storchaka added the 3.11 label and removed the 3.7 and 3.8 labels on Mar 18, 2022
    ezio-melotti transferred this issue from another repository on Apr 10, 2022