Issue 33231: Potential memory leak in normalizestring()

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77412

classification

Title:	Potential memory leak in normalizestring()
Type:	resource usage	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.8, Python 3.7, Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	josh.r, methane, miss-islington
Priority:	normal	Keywords:	patch

Created on 2018-04-05 12:34 by methane, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 6386	merged	methane, 2018-04-05 13:21
PR 6393	merged	miss-islington, 2018-04-06 06:51
PR 6394	merged	miss-islington, 2018-04-06 06:52

Messages (5)
msg314986 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2018-04-05 14:15
Patch is good, but while we're at it, is there any reason why this multi-allocation design was even used? It PyMem_Mallocs a buffer, makes a C-style string in it, then uses PyUnicode_FromString to convert C-style string to Python str. Seems like the correct approach would be to just use PyUnicode_New to preallocate the final string buffer up front, then pull out the internal buffer with PyUnicode_1BYTE_DATA and populate that directly, saving a pointless allocation/deallocation, which also means the failure case means no cleanup needed at all, while barely changing the code (aside from removing the need to explicitly NUL terminate). Only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lower case ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away.
msg315010 - (view)	Author: Inada Naoki (methane) *	Date: 2018-04-06 06:46
> Only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lower case ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away. Maybe, we can add "encoding name must be ascii" restriction in future version (3.8+). But for now, I want to avoid any potential backward incompatibility.
msg315011 - (view)	Author: Inada Naoki (methane) *	Date: 2018-04-06 06:51
New changeset 0c1c4563a65ac451021d927058e4f25013934eb2 by INADA Naoki in branch 'master': bpo-33231: Fix potential leak in normalizestring() (GH-6386) https://github.com/python/cpython/commit/0c1c4563a65ac451021d927058e4f25013934eb2
msg315012 - (view)	Author: miss-islington (miss-islington)	Date: 2018-04-06 07:12
New changeset 64421d9237e33725e3c2916cdf2b6d6da1751c2a by Miss Islington (bot) in branch '3.7': bpo-33231: Fix potential leak in normalizestring() (GH-6386) https://github.com/python/cpython/commit/64421d9237e33725e3c2916cdf2b6d6da1751c2a
msg315013 - (view)	Author: miss-islington (miss-islington)	Date: 2018-04-06 07:37
New changeset 2350a4765265158072bf7ad9f04402406d3d1ada by Miss Islington (bot) in branch '3.6': bpo-33231: Fix potential leak in normalizestring() (GH-6386) https://github.com/python/cpython/commit/2350a4765265158072bf7ad9f04402406d3d1ada

History
Date	User	Action	Args
2022-04-11 14:58:59	admin	set	github: 77412
2018-04-06 07:37:49	methane	set	status: open -> closed stage: patch review -> resolved
2018-04-06 07:37:29	methane	set	resolution: fixed
2018-04-06 07:37:06	miss-islington	set	messages: + msg315013
2018-04-06 07:12:40	miss-islington	set	nosy: + miss-islington messages: + msg315012
2018-04-06 06:52:34	miss-islington	set	pull_requests: + pull_request6102
2018-04-06 06:51:48	miss-islington	set	pull_requests: + pull_request6101
2018-04-06 06:51:29	methane	set	messages: + msg315011
2018-04-06 06:46:10	methane	set	messages: + msg315010
2018-04-05 14:15:19	josh.r	set	nosy: + josh.r messages: + msg314986
2018-04-05 13:21:58	methane	set	keywords: + patch stage: patch review pull_requests: + pull_request6095
2018-04-05 12:34:19	methane	create