Crash in str.decode() with special error handler #67510

Closed
serhiy-storchaka opened this issue Jan 25, 2015 · 7 comments

@serhiy-storchaka
Member

BPO 23321
Nosy @vstinner, @serhiy-storchaka
Files
  • unicode_decode_call_errorhandler_writer.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-02-02.11:25:17.715>
    created_at = <Date 2015-01-25.23:16:13.748>
    labels = ['interpreter-core', 'type-crash']
    title = 'Crash in str.decode() with special error handler'
    updated_at = <Date 2015-02-02.11:25:23.701>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2015-02-02.11:25:23.701>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-02-02.11:25:17.715>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2015-01-25.23:16:13.748>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['37861']
    hgrepos = []
    issue_num = 23321
    keywords = ['patch']
    message_count = 7.0
    messages = ['234705', '234707', '234725', '234731', '234783', '235160', '235242']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'Arfrever', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue23321'
    versions = ['Python 3.4', 'Python 3.5']

    @serhiy-storchaka
    Member Author

    A debug build crashes in some circumstances in str.decode() when the error handler produces a replacement string that is longer than the malformed data. For example, the backslashreplace error handler produces a 4-character string for every illegal byte. All other standard error handlers produce no more than 1 character for every illegal unit.

    Here is a patch that fixes this issue. I'll commit it without review because the buildbots are broken without it. This issue stays open for reference and post-commit review.
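
To make the triggering condition concrete, here is a minimal sketch of an error handler whose replacement is longer than the bytes it replaces. The handler name `long_replacement_demo` and the `<bad:..>` marker format are hypothetical, chosen only for illustration; `codecs.register_error` and the `UnicodeDecodeError` attributes are the standard API. On an interpreter that includes the fix, the call simply returns the expanded string.

```python
import codecs


def long_replacement(exc):
    """Hypothetical handler: swap each malformed byte run for a longer marker."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # One bad byte becomes an 8-character marker, so the output
        # outgrows the one-character-per-byte estimate.
        return ("<bad:%s>" % bad.hex(), exc.end)
    raise exc


# The name is an assumption for this example, not a standard handler.
codecs.register_error("long_replacement_demo", long_replacement)

result = b"\x80abc".decode("utf-8", "long_replacement_demo")
print(result)
```

Any handler of this shape exercises the same code path as backslashreplace, since what matters is only that the replacement is longer than the malformed input.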

    @serhiy-storchaka serhiy-storchaka self-assigned this Jan 25, 2015
    @serhiy-storchaka serhiy-storchaka added the interpreter-core and type-crash labels Jan 25, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Jan 25, 2015

    New changeset 2de90090e486 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23321: Fixed a crash in str.decode() when error handler returned
    https://hg.python.org/cpython/rev/2de90090e486

    New changeset 1cd68b3c46aa by Serhiy Storchaka in branch 'default':
    Issue bpo-23321: Fixed a crash in str.decode() when error handler returned
    https://hg.python.org/cpython/rev/1cd68b3c46aa

    @vstinner
    Member

    Debugging build crashes in some circumstances in str.decode() (...) buildbots are broken without it

    Is it a regression? Would it be possible to identify the changeset
    responsible for the regression?

    @serhiy-storchaka
    Member Author

    I think the changeset that made the decoders use _PyUnicodeWriter (bpo-16311)
    is responsible for the regression.

    For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace').

    The writer reserves a string buffer of size 4 (every byte produces at most 1
    character). The first byte is invalid and is replaced by the 4-character
    string '\\x80'. The writer increases min_length but doesn't resize the buffer,
    because the buffer is still large enough to hold the replacement string. The
    subsequent writes of the ASCII characters then overflow the buffer.
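
The arithmetic above can be sketched with a toy model. This is not the real _PyUnicodeWriter logic; `simulate_writes` is a hypothetical helper that only tracks how many characters have been written against a fixed reservation. The final line runs the actual decode call, which assumes Python 3.5 or later, where backslashreplace supports decoding and the fix is present.

```python
def simulate_writes(reserved, chunks):
    """Return the index of the first chunk that would not fit, or None.

    Toy model only: `reserved` is the number of characters allocated up
    front, and the buffer is never resized.
    """
    pos = 0
    for index, chunk in enumerate(chunks):
        if pos + len(chunk) > reserved:
            return index
        pos += len(chunk)
    return None


data = b"\x80abc"
reserved = len(data)               # 4: one character per input byte
chunks = ["\\x80", "a", "b", "c"]  # 4-character replacement, then ASCII
overflow_at = simulate_writes(reserved, chunks)
print(overflow_at)

# On a fixed interpreter the real call returns the 7-character result.
decoded = data.decode("utf-8", "backslashreplace")
print(decoded)
```

The replacement alone fills the 4 reserved slots exactly, which is why it fits without a resize; the overflow only shows up at the next character.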

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 26, 2015

    New changeset 1e8937861ee3 by Victor Stinner in branch 'default':
    Issue bpo-22286, bpo-23321: Fix failing test on Windows code page 932
    https://hg.python.org/cpython/rev/1e8937861ee3

    @serhiy-storchaka
    Member Author

    If you have no enhancements to my quick fix, Victor, maybe this issue can be closed.

    @vstinner vstinner closed this as completed Feb 2, 2015
    @vstinner
    Member

    vstinner commented Feb 2, 2015

    I closed the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022