Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in the reset() method of StreamWriter of CJK codecs #67436

Closed
vadmium opened this issue Jan 16, 2015 · 13 comments
Closed

Crash in the reset() method of StreamWriter of CJK codecs #67436

vadmium opened this issue Jan 16, 2015 · 13 comments
Assignees
Labels
topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@vadmium
Copy link
Member

vadmium commented Jan 16, 2015

BPO 23247
Nosy @vstinner, @ezio-melotti, @vadmium, @serhiy-storchaka
Files
  • cjk_codecs_reset.patch
  • fix-multibytecodec-segfault.patch
  • fix-multibytecodec-segfault-with-test.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2015-07-16.20:22:58.244>
    created_at = <Date 2015-01-16.06:17:50.827>
    labels = ['expert-unicode', 'type-crash']
    title = 'Crash in the reset() method of StreamWriter of CJK codecs'
    updated_at = <Date 2015-07-16.20:25:47.154>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2015-07-16.20:25:47.154>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2015-07-16.20:22:58.244>
    closer = 'python-dev'
    components = ['Unicode']
    creation = <Date 2015-01-16.06:17:50.827>
    creator = 'martin.panter'
    dependencies = []
    files = ['39932', '39933', '39934']
    hgrepos = []
    issue_num = 23247
    keywords = ['patch']
    message_count = 13.0
    messages = ['234112', '246781', '246782', '246783', '246784', '246785', '246786', '246787', '246801', '246802', '246810', '246818', '246819']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'ezio.melotti', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'Aaron1011']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue23247'
    versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 16, 2015

    $ python3 -c 'import codecs; from io import BytesIO; codecs.getwriter("big5")(BytesIO()).reset()'
    Segmentation fault (core dumped)
    [Exit 139]

    Happens for all the multibyte codecs:
    broken_stream_codecs = {
    "big5", "big5hkscs", "cp932", "cp949", "cp950",
    "euc_jp", "euc_jis_2004", "euc_jisx0213", "euc_kr",
    "gb2312", "gbk", "gb18030", "hz",
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
    "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr",
    "johab", "shift_jis", "shift_jis_2004", "shift_jisx0213",
    }

    These codecs also share the property that their StreamReader.read() methods do not accept the second “chars” parameter:

    >>> codecs.getreader("big5")(BytesIO()).read(1, 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: read expected at most 1 arguments, got 2

    @vadmium vadmium added topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump labels Jan 16, 2015
    @serhiy-storchaka serhiy-storchaka self-assigned this Feb 28, 2015
    @Aaron1011
    Copy link
    Mannequin

    Aaron1011 mannequin commented Jul 15, 2015

    This is also present in the latest Python 3.6.

    I'm going to work on providing a patch for this, unless someone else already is

    @vstinner
    Copy link
    Member

    Happens for all the multibyte codecs: (...)

    All these codecs share the same C implementation: the "CJK" codecs.

    The bug was introduced in Python 3.4 by my huge changeset d621bdaed7c3: Issue "bpo-17693: CJK encoders now use the new Unicode API (PEP-393)".

    It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.

    Attached patch should fix the crash.

    @Aaron1011: Can you please try to write a patch adding a unit test to test_codecs reproducing the crash?

    @vstinner vstinner changed the title Multibyte codec StreamWriter.reset() crashes Crash in the reset() method of StreamWriter of CJK codecs Jul 15, 2015
    @vstinner
    Copy link
    Member

    It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.

    Oops: StreamWriter, not StreamReader.

    @vadmium
    Copy link
    Member Author

    vadmium commented Jul 16, 2015

    Perhaps this test case proposed in my patch for bpo-13881 could be useful:

    +def test_writer_reuse(self):
    + """StreamWriter should be reusable after reset"""

    Looks like that is where my “broken_stream_codecs” list from the original post came from.

    @Aaron1011
    Copy link
    Mannequin

    Aaron1011 mannequin commented Jul 16, 2015

    The included patch fixes the issue, and modifies the existing unittest to prevent a future regression.

    The patch corrects an issue where the 'pending' struct field was NULL, but was used as the input to multibytecodec_encode anyay.

    @serhiy-storchaka
    Copy link
    Member

    cjk_codecs_reset.patch LGTM.

    @Aaron1011
    Copy link
    Mannequin

    Aaron1011 mannequin commented Jul 16, 2015

    The patch didn't get attached for some reason. It's attached now.

    @vadmium
    Copy link
    Member Author

    vadmium commented Jul 16, 2015

    Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). I imagine this shortcut could be slightly more efficient, but is it always correct?

    In any case, Aaron’s test looks okay to me.

    @vstinner
    Copy link
    Member

    Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly).

    Aaron patch is better. I misunderstood the code. reset() always return None, the empty string is unused in my patch.

    fix-multibytecodec-segfault.patch: please write a _new_ test mentionning this issue to check for non regression.

    @Aaron1011
    Copy link
    Mannequin

    Aaron1011 mannequin commented Jul 16, 2015

    I've added a test case to exercise reset()

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 16, 2015

    New changeset 35a6fe0e2b27 by Victor Stinner in branch '3.4':
    Closes bpo-23247: Fix a crash in the StreamWriter.reset() of CJK codecs
    https://hg.python.org/cpython/rev/35a6fe0e2b27

    @python-dev python-dev mannequin closed this as completed Jul 16, 2015
    @vstinner
    Copy link
    Member

    @aaron Hill: Thanks for your patch! I only kept the minimal test of your patch. If you want to enhance the test suite, you may write new test to test the behaviour of reset(). I prefer to only commit the minimum patch.

    @martin: Thanks for your report. The issue is now fixed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants