classification
Title: Windows: path_converter() leaks memory for Unicode filenames
Type: Stage:
Components: Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: python-dev, serhiy.storchaka, steve.dower, vstinner
Priority: normal Keywords: patch

Created on 2016-09-18 22:56 by vstinner, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
path_converter.patch vstinner, 2016-09-18 22:56 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (10)
msg276924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-18 22:56
Memory leak spotted by the issue #28195: path_converter() calls PyUnicode_AsWideCharString() which allocates a new buffer at each call, but this buffer is never released.

On Python 3.5, PyUnicode_AsWideCharString() was used which handles internally the memory buffer and so release the memory later.

Attached patch fixes the regression introduced in Python 3.6 beta 1 by the change e20c7d8a8187 ("Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)").
msg276926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-18 22:58
I didn't push the fix myself, because Steve Dower maybe made the change for a specific reason?
msg276935 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-19 04:01
It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead. Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here?
msg276939 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-19 04:46
Deprecated functions and types of the C API
-------------------------------------------

The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be
removed in Python 4. All functions using this type are deprecated:

Unicode functions and methods using :c:type:`Py_UNICODE` and
:c:type:`Py_UNICODE*` types:

* :c:macro:`PyUnicode_AS_UNICODE`, :c:func:`PyUnicode_AsUnicode`,
  :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString`

(From Doc/whatsnew/3.3.rst)
msg276957 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 08:44
Steve Dower added the comment:
> It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead.

Right now, it's the case and path_converter() leaks memory, so I
proposed a simple and obvious fix :-)

On Windows, it makes sense to continue to use the UTF-16 encoded
string cached in Unicode objects, because this conversion is common,
and it's convenient to not have to release the memory explicitly. The
side effect is that we may waste memory in some cases.

> Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here?

Maybe, but I'm not interested to write a more complex patch for
Windows :-) Try to just call PyMem_Free(path->wide) (do nothing it's
NULL).

The advantage of the old code (and my patch) is to only convert a
filename once to UTF-16 and then use the cached result.
msg276959 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 08:54
Serhiy Storchaka added the comment:
> The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be
> removed in Python 4. All functions using this type are deprecated:

Right... I tried to deprecate and remove all functions using
Py_UNICODE but it's hard to change all this code. I gave up :-)

>   :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString`

Sadly, it's not exactly the same: PyUnicode_AsWideCharString returns a
new fresh buffer at each call. I'm not sure that it caches the result
neither.

Victor
msg276962 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-19 09:13
path_converter.patch LGTM for 3.6 (and 3.7, 3.8), but we should find better solution for future versions. Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now?
msg276965 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 09:54
Serhiy Storchaka added the comment:
> Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now?

See old issues #22271 and #22324.
msg276966 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-19 09:56
New changeset 6232e610e310 by Victor Stinner in branch '3.6':
Fix memory leak in path_converter()
https://hg.python.org/cpython/rev/6232e610e310
msg276967 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 09:59
> path_converter.patch LGTM for 3.6 (and 3.7, 3.8),

Ok, I pushed my simple fix.

Feel free to modify the code if you see a better way to encode paths on Windows ;-) But it's a much larger project, and I'm not really interested to jump again in this silly deprecated APIs :-) It seems like the status quo is that Py_UNICODE is going to stay longer than expected and since it "just works", nobody really cares :-)

I close this issue since the initial bug (memory leak) is now fixed.
History
Date User Action Args
2017-03-31 16:36:39dstufftsetpull_requests: + pull_request1111
2016-09-19 09:59:57vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg276967
2016-09-19 09:56:23python-devsetnosy: + python-dev
messages: + msg276966
2016-09-19 09:54:17vstinnersetmessages: + msg276965
2016-09-19 09:13:43serhiy.storchakasetmessages: + msg276962
2016-09-19 08:54:42vstinnersetmessages: + msg276959
2016-09-19 08:44:12vstinnersetmessages: + msg276957
2016-09-19 04:46:34serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg276939
2016-09-19 04:01:39steve.dowersetmessages: + msg276935
2016-09-18 22:58:24vstinnersetmessages: + msg276926
2016-09-18 22:56:38vstinnercreate