Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryError with custom error handlers and multibyte codecs #67404

Closed
alexer mannequin opened this issue Jan 10, 2015 · 5 comments
Closed

MemoryError with custom error handlers and multibyte codecs #67404

alexer mannequin opened this issue Jan 10, 2015 · 5 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@alexer
Copy link
Mannequin

alexer mannequin commented Jan 10, 2015

BPO 23215
Nosy @malemburg, @loewis, @vstinner, @bitdancer, @alexer, @serhiy-storchaka
Files
  • python_codec_crasher.py
  • python_codec_crash_fix.patch
  • python_codec_crash_fix_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-02-20.23:28:25.263>
    created_at = <Date 2015-01-10.03:33:03.633>
    labels = ['interpreter-core', 'performance']
    title = 'MemoryError with custom error handlers and multibyte codecs'
    updated_at = <Date 2015-02-20.23:28:25.262>
    user = 'https://github.com/alexer'

    bugs.python.org fields:

    activity = <Date 2015-02-20.23:28:25.262>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-02-20.23:28:25.263>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2015-01-10.03:33:03.633>
    creator = 'alexer'
    dependencies = []
    files = ['37659', '37660', '38150']
    hgrepos = []
    issue_num = 23215
    keywords = ['patch']
    message_count = 5.0
    messages = ['233800', '234687', '234689', '236054', '236343']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'loewis', 'vstinner', 'r.david.murray', 'alexer', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue23215'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @alexer
    Copy link
    Mannequin Author

    alexer mannequin commented Jan 10, 2015

    Using a multibyte codec and a custom error handler that ignores errors to encode a string that contains characters not representable in said encoding causes exponential growth of the output buffer, raising MemoryError.

    The problem is in multibytecodec_encerror() and REQUIRE_ENCODEBUFFER() in Modules/cjkcodecs/multibytecodec.c. multibytecodec_encerror() always uses REQUIRE_ENCODEBUFFER() to ensure there's enough space for the replacement string, and if more space is needed, REQUIRE_ENCODEBUFFER() calls expand_encodebuffer(), which in turn always grows the buffer by at least 50%. However, if size < 1, REQUIRE_ENCODEBUFFER() doesn't check if more space is actually needed. (It's used with negative values in other places)

    I have no idea why the condition was originally size < 1 instead of size < 0, but changing it seems to fix this. The replacement string case is also the only use of the macro that may use 0 as the argument.

    In the patch, I've instead wrapped the REQUIRE_ENCODEBUFFER() (and memcpy) in a if(size > 0), since that's what the corresponding part in multibytecodec_decerror() did in the past:
    https://hg.python.org/cpython/file/1c3f8d044589/Modules/cjkcodecs/multibytecodec.c#l438

    Not sure which one makes more sense.

    As for the tests, I'm not sure if 1) all of the affected encodings should be tested or only one (or even all encodings, affected or not?) and 2) whether it should be a new test or if I should just add it to test_longstrings in Lib/test/test_codeccallbacks.py. (Structurally it's a perfect fit, but it really isn't a "long string" test as it can happen with <50 characters) At the moment, the patch is testing affected encodings in a separate test.

    Is the test philosophy "as thorough as possible" or "as fast as possible"?

    @alexer alexer mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jan 10, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 25, 2015

    New changeset 3a9b1e5fe179 by R David Murray in branch '3.4':
    bpo-23215: reflow paragraph.
    https://hg.python.org/cpython/rev/3a9b1e5fe179

    New changeset 52a06812d5da by R David Murray in branch 'default':
    Merge: bpo-23215: note that time.sleep affects the current thread only.
    https://hg.python.org/cpython/rev/52a06812d5da

    @bitdancer
    Copy link
    Member

    Oops, typoed the issue number. That should have been 23251.

    @serhiy-storchaka serhiy-storchaka self-assigned this Feb 15, 2015
    @serhiy-storchaka
    Copy link
    Member

    Thank you for your patch Aleksi. It LGTM in general. Updated patch just moves the test to Lib/test/test_multibytecodec.py where it can reuse ALL_CJKENCODINGS and fixes few other minor bugs in multibyte codecs.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 20, 2015

    New changeset af8089217cc6 by Serhiy Storchaka in branch '2.7':
    Issue bpo-23215: Multibyte codecs with custom error handlers that ignores errors
    https://hg.python.org/cpython/rev/af8089217cc6

    New changeset 4dc8b7ed8973 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23215: Multibyte codecs with custom error handlers that ignores errors
    https://hg.python.org/cpython/rev/4dc8b7ed8973

    New changeset 5620691ce26b by Serhiy Storchaka in branch 'default':
    Issue bpo-23215: Multibyte codecs with custom error handlers that ignores errors
    https://hg.python.org/cpython/rev/5620691ce26b

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants