Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential memory leak in normalizestring() #77412

Closed
methane opened this issue Apr 5, 2018 · 5 comments
Closed

Potential memory leak in normalizestring() #77412

methane opened this issue Apr 5, 2018 · 5 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane
Copy link
Member

methane commented Apr 5, 2018

BPO 33231
Nosy @methane, @MojoVampire, @miss-islington
PRs
  • bpo-33231: Fix potential leak in normalizestring() #6386
  • [3.7] bpo-33231: Fix potential leak in normalizestring() (GH-6386) #6393
  • [3.6] bpo-33231: Fix potential leak in normalizestring() (GH-6386) #6394
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-04-06.07:37:49.382>
    created_at = <Date 2018-04-05.12:34:19.177>
    labels = ['interpreter-core', '3.7', '3.8', 'performance']
    title = 'Potential memory leak in normalizestring()'
    updated_at = <Date 2018-04-06.07:37:49.381>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2018-04-06.07:37:49.381>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-04-06.07:37:49.382>
    closer = 'methane'
    components = ['Interpreter Core']
    creation = <Date 2018-04-05.12:34:19.177>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33231
    keywords = ['patch']
    message_count = 5.0
    messages = ['314986', '315010', '315011', '315012', '315013']
    nosy_count = 3.0
    nosy_names = ['methane', 'josh.r', 'miss-islington']
    pr_nums = ['6386', '6393', '6394']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue33231'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @methane methane added 3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 5, 2018
    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 5, 2018

    Patch is good, but while we're at it, is there any reason why this multi-allocation design was even used? It PyMem_Mallocs a buffer, makes a C-style string in it, then uses PyUnicode_FromString to convert C-style string to Python str.

    Seems like the correct approach would be to just use PyUnicode_New to preallocate the final string buffer up front, then pull out the internal buffer with PyUnicode_1BYTE_DATA and populate that directly, saving a pointless allocation/deallocation, which also means the failure case means no cleanup needed at all, while barely changing the code (aside from removing the need to explicitly NUL terminate).

    Only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lower case ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away.

    @methane
    Copy link
    Member Author

    methane commented Apr 6, 2018

    Only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lower case ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away.

    Maybe, we can add "encoding name must be ascii" restriction in future version (3.8+).
    But for now, I want to avoid any potential backward incompatibility.

    @methane
    Copy link
    Member Author

    methane commented Apr 6, 2018

    New changeset 0c1c456 by INADA Naoki in branch 'master':
    bpo-33231: Fix potential leak in normalizestring() (GH-6386)
    0c1c456

    @miss-islington
    Copy link
    Contributor

    New changeset 64421d9 by Miss Islington (bot) in branch '3.7':
    bpo-33231: Fix potential leak in normalizestring() (GH-6386)
    64421d9

    @miss-islington
    Copy link
    Contributor

    New changeset 2350a47 by Miss Islington (bot) in branch '3.6':
    bpo-33231: Fix potential leak in normalizestring() (GH-6386)
    2350a47

    @methane methane closed this as completed Apr 6, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants