New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash in the reset() method of StreamWriter of CJK codecs #67436
Comments
$ python3 -c 'import codecs; from io import BytesIO; codecs.getwriter("big5")(BytesIO()).reset()'
Segmentation fault (core dumped)
[Exit 139] Happens for all the multibyte codecs: These codecs also share the property that their StreamReader.read() methods do not accept the second “chars” parameter: >>> codecs.getreader("big5")(BytesIO()).read(1, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: read expected at most 1 arguments, got 2 |
This is also present in the latest Python 3.6. I'm going to work on providing a patch for this, unless someone else already is |
All these codecs share the same C implementation: the "CJK" codecs. The bug was introduced in Python 3.4 by my huge changeset d621bdaed7c3: Issue "bpo-17693: CJK encoders now use the new Unicode API (PEP-393)". It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object. Attached patch should fix the crash. @Aaron1011: Can you please try to write a patch adding a unit test to test_codecs reproducing the crash? |
Oops: StreamWriter, not StreamReader. |
Perhaps this test case proposed in my patch for bpo-13881 could be useful: +def test_writer_reuse(self): Looks like that is where my “broken_stream_codecs” list from the original post came from. |
The included patch fixes the issue, and modifies the existing unittest to prevent a future regression. The patch corrects an issue where the 'pending' struct field was NULL, but was used as the input to multibytecodec_encode anyay. |
cjk_codecs_reset.patch LGTM. |
The patch didn't get attached for some reason. It's attached now. |
Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). I imagine this shortcut could be slightly more efficient, but is it always correct? In any case, Aaron’s test looks okay to me. |
Aaron patch is better. I misunderstood the code. reset() always return None, the empty string is unused in my patch. fix-multibytecodec-segfault.patch: please write a _new_ test mentionning this issue to check for non regression. |
I've added a test case to exercise reset() |
New changeset 35a6fe0e2b27 by Victor Stinner in branch '3.4': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: