tokenize.open() leaks an open binary file on TextIOWrapper error #68028

Closed
vstinner opened this issue Apr 1, 2015 · 18 comments
Labels
easy · performance (Performance or resource usage) · stdlib (Python modules in the Lib dir)

Comments

@vstinner
Member

vstinner commented Apr 1, 2015

BPO 23840
Nosy @vstinner
Files
  • tokenize_open_bug.py
  • tokenize.patch
  • issue23840.patch
  • tokenize.patch
  • tokenizeV2.patch
  • tokenize.patch: moved buffer outside of try block per storchaka
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-05-25.22:50:49.794>
    created_at = <Date 2015-04-01.11:52:33.230>
    labels = ['easy', 'library', 'performance']
    title = 'tokenize.open() leaks an open binary file on TextIOWrapper error'
    updated_at = <Date 2015-05-25.22:50:49.793>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2015-05-25.22:50:49.793>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-05-25.22:50:49.794>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2015-04-01.11:52:33.230>
    creator = 'vstinner'
    dependencies = []
    files = ['38781', '38792', '38793', '38795', '38796', '38799']
    hgrepos = []
    issue_num = 23840
    keywords = ['patch', 'easy']
    message_count = 18.0
    messages = ['239786', '239787', '239847', '239848', '239850', '239852', '239854', '239857', '239858', '239861', '239862', '239864', '240192', '241988', '241990', '243285', '244057', '244058']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'introom', 'itsmemattchung']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue23840'
    versions = ['Python 3.4', 'Python 3.5']

    @vstinner
    Member Author

    vstinner commented Apr 1, 2015

    If the TextIOWrapper constructor fails in tokenize.open(), the open binary file is not closed and a ResourceWarning is emitted.

    Try the attached script (tokenize_open_bug.py):

    $ python3 -Wd ~/tokenize_open_bug.py
    tokenize.open failed: unknown encoding for 'test.py': xxx
    /home/haypo/tokenize_open_bug.py:13: ResourceWarning: unclosed file <_io.BufferedReader name='test.py'>
      print("tokenize.open failed: %s" % err)

    The fix is quite simple: add "try: ... except: buffer.close(); raise". If someone wants to fix this issue, a unit test must be added, based on my script, which ensures that the buffer is closed (e.g. mock tokenize._builtin_open and check that close() was called).
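
    A minimal sketch of the fix described above, assuming the tokenize.open() of that era (detect_encoding, TextIOWrapper and _builtin_open are names the stdlib tokenize module already uses; the exact committed patch may differ in details):

        def open(filename):
            """Open a file in read-only mode using the encoding detected by
            detect_encoding()."""
            buffer = _builtin_open(filename, 'rb')
            try:
                encoding, lines = detect_encoding(buffer.readline)
                buffer.seek(0)
                text = TextIOWrapper(buffer, encoding, line_buffering=True)
                text.mode = 'r'
                return text
            except:
                # Anything above may fail: SyntaxError from detect_encoding(),
                # OSError or KeyboardInterrupt inside TextIOWrapper(), etc.
                # Close the binary file so it is not leaked, then re-raise.
                buffer.close()
                raise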

    @vstinner vstinner added the easy label Apr 1, 2015
    @vstinner
    Member Author

    vstinner commented Apr 1, 2015

    I sent an email to the python core mentorship mailing list to find a volunteer to fix this easy issue.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir performance Performance or resource usage labels Apr 1, 2015
    @introom
    Mannequin

    introom mannequin commented Apr 1, 2015

    ISTM it's not TextIOWrapper that fails, but detect_encoding() that raises the error.

    So I wrapped that in a try/except block.

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 1, 2015

    Compared to introom's approach: any reason not to wrap the entire body in a try? Attached patch.

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 1, 2015

    Curious: @Haypo, are you looking for a new unit test or a test method within an existing test class? Perhaps it should go under TestTokenize?

        def test_main():
            from test import test_tokenize
            support.run_doctest(test_tokenize, True)
            support.run_unittest(TestTokenizerAdheresToPep0263)
            support.run_unittest(Test_Tokenize)
            support.run_unittest(TestDetectEncoding)
            support.run_unittest(TestTokenize)
            support.run_unittest(UntokenizeTest)

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 1, 2015

    I believe it goes here under the TestTokenize class. Going to give it a shot now.

    @vstinner
    Member Author

    vstinner commented Apr 1, 2015

    ISTM it's not TextIOWrapper that fails, but detect_encoding() that raises the error.

    Oh, right. But TextIOWrapper can fail for different reasons. For example, CTRL+C may raise KeyboardInterrupt.

    Try for example:

        with unittest.mock.patch.object(tokenize, '_builtin_open') as mock_open:
            mock_file = mock_open.return_value
            mock_file.tell.side_effect = OSError
            mock_file.readline.return_value = b''
    
            tokenize.open(fn)

    This example raises an OSError in TextIOWrapper on file.tell(), and file.close() is not called.
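
    For reference, a sketch of how that mock could become a complete unit test (hypothetical test class and file names; it assumes the fixed open() closes the buffer and re-raises the error):

        import tokenize
        import unittest
        from unittest import mock

        class TestTokenizeOpen(unittest.TestCase):
            def test_open_error_closes_binary_file(self):
                # Force TextIOWrapper to fail on buffer.tell(), then verify
                # that tokenize.open() closed the underlying binary file.
                with mock.patch.object(tokenize, '_builtin_open') as mock_open:
                    mock_file = mock_open.return_value
                    mock_file.tell.side_effect = OSError
                    mock_file.readline.return_value = b''
                    with self.assertRaises(OSError):
                        tokenize.open('dummy.py')   # filename is arbitrary here
                    mock_file.close.assert_called_once_with()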

    @introom
    Mannequin

    introom mannequin commented Apr 1, 2015

    On Wed, Apr 1, 2015 at 6:18 PM, STINNER Victor <report@bugs.python.org> wrote:

    Oh, right. But TextIOWrapper can fail for different reasons. For example, CTRL+C may raise KeyboardInterrupt.

    Yes. Any comment on the test?

    I will upload a second patch.

    Regards.

    --

    I am a cat. My homepage is http://introo.me.

    @vstinner
    Member Author

    vstinner commented Apr 1, 2015

    Yes. Any comment on the test?

    Yes, as I wrote, I reviewed the patches. You may get a notification by email. Anyway, it's here:
    http://bugs.python.org/review/23840/diff/14417/Lib/test/test_tokenize.py
    (I also commented on tokenize.py)

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 1, 2015

    Updated the patch based on haypo's comment: changed "except Exception as err:" to a bare "except".

    @introom
    Mannequin

    introom mannequin commented Apr 1, 2015

    Based upon the previous review.

    Catch the TextIOWrapper error and move the test into a function.

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 2, 2015

    Based on storchaka's comment, moved the buffer assignment outside of the try block.

    @vstinner
    Member Author

    vstinner commented Apr 6, 2015

    tokenize.patch is wrong: you should not call buffer.close() if TextIOWrapper() was created successfully.
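
    For illustration (a hypothetical example, not any of the attached patches): if the close also runs on the success path, for example in a finally block, the returned TextIOWrapper wraps an already-closed buffer and the first read fails. Closing only in the except branch, as in the earlier sketch, keeps the buffer alive on success.

        buffer = _builtin_open(filename, 'rb')
        try:
            encoding, lines = detect_encoding(buffer.readline)
            buffer.seek(0)
            text = TextIOWrapper(buffer, encoding, line_buffering=True)
            text.mode = 'r'
        finally:
            buffer.close()   # wrong: also runs on success, closing the file
                             # that the returned TextIOWrapper still needs
        return text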

    @vstinner
    Member Author

    tokenizeV2.patch and tokenize.patch have issues; I reviewed them. Can someone please write a new patch taking my comments into account?

    @itsmemattchung
    Mannequin

    itsmemattchung mannequin commented Apr 24, 2015

    Hey Haypo,
    I'm working on submitting the new patch now. Still getting used to the workflow and tools here. Thanks for being patient.

    You should see the new file in the next 30 minutes.

    @vstinner
    Member Author

    You should see the new file in the next 30 minutes.

    I don't see the new file.

    @python-dev
    Mannequin

    python-dev mannequin commented May 25, 2015

    New changeset 623e07ea43df by Victor Stinner in branch '3.4':
    Issue bpo-23840: tokenize.open() now closes the temporary binary file on error to
    https://hg.python.org/cpython/rev/623e07ea43df

    New changeset a640d268ba97 by Victor Stinner in branch '3.5':
    (Merge 3.5) Issue bpo-23840: tokenize.open() now closes the temporary binary file
    https://hg.python.org/cpython/rev/a640d268ba97

    New changeset c5cfd6353b4b by Victor Stinner in branch 'default':
    (Merge 3.6) Issue bpo-23840: tokenize.open() now closes the temporary binary file
    https://hg.python.org/cpython/rev/c5cfd6353b4b

    @vstinner
    Member Author

    Sorry for not being more available for feedback on patches. I chose to write the final patch myself because the patches were not updated to take my latest comments into account.

    I hope that this issue at least helped you learn the process of reviewing patches ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022