Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: path_converter() leaks memory for Unicode filenames #72387

Closed
vstinner opened this issue Sep 18, 2016 · 10 comments
Closed

Windows: path_converter() leaks memory for Unicode filenames #72387

vstinner opened this issue Sep 18, 2016 · 10 comments
Labels
3.7 (EOL) end of life

Comments

@vstinner
Copy link
Member

BPO 28200
Nosy @vstinner, @serhiy-storchaka, @zooba
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • path_converter.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-19.09:59:57.988>
    created_at = <Date 2016-09-18.22:56:38.039>
    labels = ['3.7']
    title = 'Windows: path_converter() leaks memory for Unicode filenames'
    updated_at = <Date 2017-03-31.16:36:39.240>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:39.240>
    actor = 'dstufft'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-19.09:59:57.988>
    closer = 'vstinner'
    components = []
    creation = <Date 2016-09-18.22:56:38.039>
    creator = 'vstinner'
    dependencies = []
    files = ['44744']
    hgrepos = []
    issue_num = 28200
    keywords = ['patch']
    message_count = 10.0
    messages = ['276924', '276926', '276935', '276939', '276957', '276959', '276962', '276965', '276966', '276967']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'steve.dower']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue28200'
    versions = ['Python 3.6', 'Python 3.7']

    @vstinner
    Copy link
    Member Author

    Memory leak spotted by the issue bpo-28195: path_converter() calls PyUnicode_AsWideCharString() which allocates a new buffer at each call, but this buffer is never released.

    On Python 3.5, PyUnicode_AsWideCharString() was used which handles internally the memory buffer and so release the memory later.

    Attached patch fixes the regression introduced in Python 3.6 beta 1 by the change e20c7d8a8187 ("Issue bpo-27781: Change file system encoding on Windows to UTF-8 (PEP-529)").

    @vstinner vstinner added the 3.7 (EOL) end of life label Sep 18, 2016
    @vstinner
    Copy link
    Member Author

    I didn't push the fix myself, because Steve Dower maybe made the change for a specific reason?

    @zooba
    Copy link
    Member

    zooba commented Sep 19, 2016

    It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead. Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here?

    @serhiy-storchaka
    Copy link
    Member

    Deprecated functions and types of the C API
    -------------------------------------------

    The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be
    removed in Python 4. All functions using this type are deprecated:

    Unicode functions and methods using :c:type:`Py_UNICODE` and
    :c:type:`Py_UNICODE*` types:

    • :c:macro:`PyUnicode_AS_UNICODE`, :c:func:`PyUnicode_AsUnicode`,
      :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString`

    (From Doc/whatsnew/3.3.rst)

    @vstinner
    Copy link
    Member Author

    Steve Dower added the comment:

    It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead.

    Right now, it's the case and path_converter() leaks memory, so I
    proposed a simple and obvious fix :-)

    On Windows, it makes sense to continue to use the UTF-16 encoded
    string cached in Unicode objects, because this conversion is common,
    and it's convenient to not have to release the memory explicitly. The
    side effect is that we may waste memory in some cases.

    Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here?

    Maybe, but I'm not interested to write a more complex patch for
    Windows :-) Try to just call PyMem_Free(path->wide) (do nothing it's
    NULL).

    The advantage of the old code (and my patch) is to only convert a
    filename once to UTF-16 and then use the cached result.

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka added the comment:

    The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be
    removed in Python 4. All functions using this type are deprecated:

    Right... I tried to deprecate and remove all functions using
    Py_UNICODE but it's hard to change all this code. I gave up :-)

    :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString`

    Sadly, it's not exactly the same: PyUnicode_AsWideCharString returns a
    new fresh buffer at each call. I'm not sure that it caches the result
    neither.

    Victor

    @serhiy-storchaka
    Copy link
    Member

    path_converter.patch LGTM for 3.6 (and 3.7, 3.8), but we should find better solution for future versions. Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now?

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka added the comment:

    Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now?

    See old issues bpo-22271 and bpo-22324.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 19, 2016

    New changeset 6232e610e310 by Victor Stinner in branch '3.6':
    Fix memory leak in path_converter()
    https://hg.python.org/cpython/rev/6232e610e310

    @vstinner
    Copy link
    Member Author

    path_converter.patch LGTM for 3.6 (and 3.7, 3.8),

    Ok, I pushed my simple fix.

    Feel free to modify the code if you see a better way to encode paths on Windows ;-) But it's a much larger project, and I'm not really interested to jump again in this silly deprecated APIs :-) It seems like the status quo is that Py_UNICODE is going to stay longer than expected and since it "just works", nobody really cares :-)

    I close this issue since the initial bug (memory leak) is now fixed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants