Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tempfile.NamedTemporaryFile can close too early if used as iterator #67888

Closed
bkabrda mannequin opened this issue Mar 18, 2015 · 35 comments
Closed

tempfile.NamedTemporaryFile can close too early if used as iterator #67888

bkabrda mannequin opened this issue Mar 18, 2015 · 35 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@bkabrda
Copy link
Mannequin

bkabrda mannequin commented Mar 18, 2015

BPO 23700
Nosy @birkenfeld, @pfmoore, @ncoghlan, @vstinner, @bitdancer, @ethanfurman, @serhiy-storchaka, @wm75
Files
  • python-3.4-tempfile-iter.patch
  • python-3.4-tempfile-iter-v2.patch
  • tempfile_iter_fix.patch: Works, but I don't know why
  • csv_iter_commemt.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-03-20.15:22:24.390>
    created_at = <Date 2015-03-18.15:17:06.581>
    labels = ['type-bug', 'library']
    title = 'tempfile.NamedTemporaryFile can close too early if used as iterator'
    updated_at = <Date 2015-03-22.19:28:57.514>
    user = 'https://bugs.python.org/bkabrda'

    bugs.python.org fields:

    activity = <Date 2015-03-22.19:28:57.514>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-03-20.15:22:24.390>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2015-03-18.15:17:06.581>
    creator = 'bkabrda'
    dependencies = []
    files = ['38543', '38556', '38586', '38635']
    hgrepos = []
    issue_num = 23700
    keywords = ['patch']
    message_count = 35.0
    messages = ['238451', '238505', '238506', '238510', '238511', '238512', '238513', '238518', '238575', '238606', '238611', '238613', '238614', '238615', '238617', '238619', '238621', '238622', '238623', '238664', '238677', '238678', '238683', '238691', '238697', '238698', '238699', '238700', '238757', '238895', '238911', '238917', '238918', '238919', '238929']
    nosy_count = 12.0
    nosy_names = ['georg.brandl', 'paul.moore', 'ncoghlan', 'vstinner', 'Arfrever', 'r.david.murray', 'ethan.furman', 'python-dev', 'serhiy.storchaka', 'bkabrda', 'sYnfo', 'wolma']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23700'
    versions = ['Python 3.4', 'Python 3.5']

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Mar 18, 2015

    This bug is very similar to bpo-18879, the only difference is that _TemporaryFileWrapper.__iter__ is the problem (in bpo-18879, __getattr__ was fixed, but __iter__ was not). The real world use case that helped me find this bug is at the bottom of this report, this is a simple reproducer:

    >>> import tempfile
    >>> for l in tempfile.NamedTemporaryFile(mode='a+b'):
    ...     print(l)
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: readline of closed file

    I'm attaching a patch that fixes this (+ testcase).

    Note: I actually discovered this while using

    >>> from urllib.request import urlopen
    >>> for l in urlopen('ftp://<some_ftp>'):
    ...     print(l)
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: readline of closed file

    Opening FTP uses urllib.response, which in turn uses tempfile._TemporaryFileWrapper, which makes this example fail.

    @bkabrda bkabrda mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 18, 2015
    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Mar 19, 2015

    I'm attaching second version of the patch. It now contains link to this bug and a more real test case according to suggestion from the review.

    One of the reviews of first patch version mentioned that there should be a better explanation of how this issue can be provoked. I think that the test shows it nice and there's no need to explain it in greater detail. Also, the comment references this bug, where anyone can find the explanation. (Fix for bpo-18879 fixes pretty much the same in a different method and has very similar comment, so I think this should be fine.)

    @serhiy-storchaka
    Copy link
    Member

    LGTM.

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 19, 2015
    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 19, 2015

    Agreed, the test is sufficient documentation. However, I can't make the test fail here (Windows 7, Python 3.4.3):

    py ti.py
    b'spam\n' b'spam\n'
    b'eggs\n' b'eggs\n'
    b'beans\n' b'beans\n'
    cat ti.py
    import tempfile

    def test_iter():
        # getting iterator from a temporary file should keep it alive
        # as long as it's being iterated over
        lines = [b'spam\n', b'eggs\n', b'beans\n']
        def make_file():
            f = tempfile.NamedTemporaryFile(mode='w+b')
            f.write(b''.join(lines))
            f.seek(0)
            return iter(f)
        for i, l in enumerate(make_file()):
            print(l, lines[i])
    
    test_iter()

    Is it somehow OS-specific?

    Regardless, the patch seems fine and I have no problem with it being applied.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 19, 2015

    New changeset 7fa741fe9425 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23700: Iterator of NamedTemporaryFile now keeps a reference to
    https://hg.python.org/cpython/rev/7fa741fe9425

    New changeset c84a0b35999a by Serhiy Storchaka in branch 'default':
    Issue bpo-23700: Iterator of NamedTemporaryFile now keeps a reference to
    https://hg.python.org/cpython/rev/c84a0b35999a

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your contribution Bohuslav.

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Mar 19, 2015

    Thank you!
    To answer Paul's question: I honestly have no idea why this can't be reproduced on Windows. I managed to reproduce this in 100 % cases on various RPM-flavour Linux distros (Fedora, CentOS, RHEL) as well as on Debian and Ubuntu.

    @pfmoore
    Copy link
    Member

    pfmoore commented Mar 19, 2015

    Cool, no problem.

    @vstinner
    Copy link
    Member

    test_csv now fails on Windows:

    http://buildbot.python.org/all/builders/x86 Windows7 3.x/builds/9421/

    ======================================================================
    ERROR: test_read_dict_fieldnames_from_file (test.test_csv.TestDictFields)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\test\test_csv.py", line 629, in test_read_dict_fieldnames_from_file
        self.assertEqual(next(reader), {"f1": '1', "f2": '2', "f3": 'abc'})
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\csv.py", line 110, in __next__
        row = next(self.reader)
      File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows7\build\lib\tempfile.py", line 431, in __iter__
        yield from iter(self.file)
    ValueError: I/O operation on closed file.

    @vstinner vstinner reopened this Mar 19, 2015
    @serhiy-storchaka
    Copy link
    Member

    Following patch fixes the issue, but I don't understand why.

    @vstinner
    Copy link
    Member

    Maybe we need to keep explicitly a reference to self.file in the method
    (file = self.file) to keep it alive in the frame?

    @serhiy-storchaka
    Copy link
    Member

    No, it doesn't help.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    I think this is a consequence of PEP-380 and its decision to finalize the subgenerator when the delegating generator is closed.
    Consider this simple example without tempfile:

    def yielder (fileobj):
        yield from fileobj
    
    with open('some_test_file', 'w') as f:
        f.write('line one\nline two\nline three')
    
    with open('some_test_file', 'r') as f:
        line = next(yielder(f))
        nline = next(f)

    ==>

    Traceback (most recent call last):
      File "<pyshell#11>", line 3, in <module>
        nline = next(f)
    ValueError: I/O operation on closed file.

    I think test_csv does the file-closing operation on lines 626/627 when it creates the temporary csv.reader(fileobj).

        def test_read_dict_fieldnames_from_file(self):
            with TemporaryFile("w+") as fileobj:
                fileobj.write("f1,f2,f3\r\n1,2,abc\r\n")
                fileobj.seek(0)
                reader = csv.DictReader(fileobj,
                                        fieldnames=next(csv.reader(fileobj)))
                self.assertEqual(reader.fieldnames, ["f1", "f2", "f3"])
                self.assertEqual(next(reader), {"f1": '1', "f2": '2', "f3": 'abc'})

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    Actually, its scary that use of yield from can have such a subtle side-effect. Maybe PEP-380 should have taken this more seriously?

    @vstinner
    Copy link
    Member

    Ah yes, correct: when a generator using "yield from obj" is destroyed while yield from is not done, obj.close() is called if the method exists.

    So "yield from file" *is* different than "for line in file: yield file" when we don't consume the whole generator.

    A workaround is to create a wrapper class and returns it in the _TemporaryFileWrapper.__iter__() method:
    ---

    class Iterator:
        def __init__(self, obj):
            self.obj = obj
    
        def __next__(self):
            if self.obj is None:
                raise StopIteration
            return next(self.obj)
    
        def __iter__(self):
            return self
    
        def close(self):
            self.obj = None

    Or simply:
    ---

    class Iterator:
        def __init__(self, obj):
            self.obj = obj
    
        def __next__(self):
            return next(self.obj)
    
        def __iter__(self):
            return self

    This solution looks more complex than tempfile_iter_fix.patch.

    @serhiy: Maybe add a short comment to explain why yield from is not used and must be used. (By the way, the current comment contains "yields from" which is confusing :-))

    @serhiy-storchaka
    Copy link
    Member

    Ah yes, correct: when a generator using "yield from obj" is destroyed while
    yield from is not done, obj.close() is called if the method exists.

    But why obj.close() is called? The reference to fileobj is live, it shouldn't
    be closed.

    This solution looks more complex than tempfile_iter_fix.patch.

    Why you prefer more complex solution to simple solution?

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    @serhiy

    in this line of code:

    reader = csv.DictReader(fileobj, fieldnames=next(csv.reader(fileobj)))

    csv.reader(fileobj) returns the generator created by fileobj.__iter__, but no reference to it is kept so the object gets destroyed right afterwards. This closes the generator and because it uses yield from also the contained subgenerator, which is the file itself.

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Mar 20, 2015

    @wolma any idea why this only happens on Windows? I can't reproduce the CSV failing test on Linux.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    @bkabrda
    not sure, but it may have to do with when exactly the object gets garbage collected

    @vstinner
    Copy link
    Member

    tempfile_iter_fix.patch looks good to me, can you commit it please?

    @serhiy-storchaka
    Copy link
    Member

    csv.reader(fileobj) returns the generator created by fileobj.__iter__, but no reference to it is kept so the object gets destroyed right afterwards. This closes the generator and because it uses yield from also the contained subgenerator, which is the file itself.

    Yes, there are no references to to the generator, created by fileobj.__iter__, but there are references to fileobj itself and to the file fileobj.file. I still don't understand why the file is closed. This looks as a bug.

    Committed existing fix only to make buildbots green.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 20, 2015

    New changeset a90ec6b96af2 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23700: NamedTemporaryFile iterator closed underlied file object in
    https://hg.python.org/cpython/rev/a90ec6b96af2

    New changeset e639750ecd92 by Serhiy Storchaka in branch 'default':
    Issue bpo-23700: NamedTemporaryFile iterator closed underlied file object in
    https://hg.python.org/cpython/rev/e639750ecd92

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    so let's look at this step-by-step (and I hope I fully understood this myself):

    • calling fileobj.__iter__ creates a generator because the method uses yield from

    • that generator does not get assigned to any reference so it will be garbage-collected

    • when the generator is garbage-collected, the subgenerator specified to the rigth of the yield from is finalized (that is PEP-380-mandated behavior) and, in this case, that is iter(self.file)

    • for an io module-based fileobject, iter(f) is f and finalizing it means that its close method will be called

    So this is not about the file object getting garbage-collected, it is about it getting closed.

    Since PEP-380 explicitly mentions that problem with yield from and a shared subiterator, I don't think you can call it a bug, but I think it is very problematic behavior as illustrated by this issue because client code is supposed to know whether a particular generator uses yield from or not.

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your explanation Wolfgang! Now it is clear to me. The issue is that the generator calls the close() method of the subgenerator, but if the subgenerator is a file, the close() method closes (surprise!) the file. Two different protocols use the same method.

    Interesting, how many similar bugs was introduced by blindly replacing "for/yield" with "yield from"?

    @bitdancer
    Copy link
    Member

    Isn't there some discussion somewhere that if iter(x) returns x you probably have buggy code? Maybe it is io that is broken, design-wise. I think there was another issue related to iter(file) recently...someone surprised by the fact that you can't iterate a file twice without reopening it...the source of that problem is similar. Not that it necessarily soluble, but it is certainly *interesting* :)

    @vstinner
    Copy link
    Member

    Isn't there some discussion somewhere that if iter(x) returns x you probably have buggy code?

    I agree that the issue comes from TextIOWrapper.__iter__(),
    BufferedReader.__iter__() and FileIO.__iter__() returns simply return
    self *and* have a close method. The issue is not "yield from".

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    You are probably right that the io classes are broken.

    From https://docs.python.org/3/library/stdtypes.html#iterator-types:

    Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

    One consequence of __iter__ returning self is that the above is not guaranteed:

    >>> with open('somefile', 'w') as f:
    	f.write('some text')
    
    	
    9
    >>> with open('somefile', 'r') as f:
    	i = iter(f)
    	assert f is i
    	for line in i:
    		print(line)
    	try:
    		next(i)
    	except StopIteration:
    		print('exhausted iterator')
    	f.seek(0)
    	print(next(i))

    some text
    exhausted iterator
    0
    some text

    So the io classes are *officially* broken.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Mar 20, 2015

    OTOH, it would be very hard to change the way fileobjects compared to designing yield from differently so I'd still blame it partly.
    Maybe it is unfortunate that generators have a close method instead of, say, __close__ ?

    @bitdancer
    Copy link
    Member

    There's actually an issue about exactly that broken-per-docs issue for io. IMO if the goal of calling close is to close only things that are generator objects or pretending to be one, the method should have been named close_generator or something.

    @Arfrever
    Copy link
    Mannequin

    Arfrever mannequin commented Mar 22, 2015

    Comment in Lib/tempfile.py mentions issue bpo-23000, but should mention issue bpo-23700.

    @serhiy-storchaka
    Copy link
    Member

    Indeed. And all the comment could be better.

    Does anyone want to write better comment in the light of recent estimations?

    @bitdancer
    Copy link
    Member

    How's this?

    @serhiy-storchaka
    Copy link
    Member

    LGTM

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 22, 2015

    New changeset e9f03315d66c by R David Murray in branch '3.4':
    bpo-23700: fix/improve comment
    https://hg.python.org/cpython/rev/e9f03315d66c

    New changeset 64f4dbac9d07 by R David Murray in branch 'default':
    Merge: bpo-23700: fix/improve comment
    https://hg.python.org/cpython/rev/64f4dbac9d07

    @vstinner
    Copy link
    Member

    Yeah, the new comment is better :-) Thanks.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants