CVE-2012-2135: Vulnerability in the utf-16 decoder after error handling #58784
Comments
In the utf-16 decoder, aligned_end is not updated after calling unicode_decode_call_errorhandler. This can cause data leaks, memory corruption, and crashes. The bug was introduced by the implementation of issue bpo-4868. In the similar situation in the utf-8 decoder, aligned_end is updated. |
Here is a crasher and leaker. When Python does not crash, there is garbage (i.e. leaked data) at the end of the decoded string. Indeed, I see English text in some versions of Python. There are many other errors in the utf-16 decoder (see, for example, b'\xD8\x00\xDC'.decode('utf-16be')). I'm now finishing work on a new decoder, and after that I will take on the bug fix for 3.2. |
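For reference, a minimal sketch of how a decoder without these bugs is expected to treat the malformed input mentioned above. It assumes a current CPython 3 build where this issue is already fixed; the helper name `decodes_strictly` is hypothetical and exists only for illustration.

```python
# Assumption: run on a CPython build where this issue is already fixed.
# `decodes_strictly` is a hypothetical helper, not part of any patch here.

def decodes_strictly(data: bytes, encoding: str = "utf-16-be") -> bool:
    """Return True if `data` decodes without error under the strict handler."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

# A lone high surrogate (D800) followed by a single stray byte is malformed.
malformed = b"\xD8\x00\xDC"
# A well-formed surrogate pair (D83D DE00) encodes one non-BMP code point.
well_formed = b"\xD8\x3D\xDE\x00"

print(decodes_strictly(malformed))
print(decodes_strictly(well_formed))
```

The strict handler is used here on purpose: it shows only whether the input is accepted, without depending on how a particular version replaces the bad bytes.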
Here are the bugs in the utf-16 decoder:
|
The proposed patch fixes only the first of these bugs. The patch in issue bpo-14624 fixes all of them for Python 3.3. I will make a patch for Python 3.2 soon. |
[moving from Rietveld back to Roundup] On 2012/04/20 11:15:48, storchaka wrote:
How so? The aligned_end *never* points into the unicode object:
    q = (unsigned char *)s;
    e = q + size - 1;
    aligned_end = (const unsigned char *) ((size_t) e & ~LONG_PTR_MASK);
So aligned_end points into s, not into the unicode object. Why this is relevant to this issue is unclear to me, though: the ignore handler |
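The pointer arithmetic in the snippet above can be modelled with plain integers. This is only an illustrative sketch: it assumes LONG_PTR_MASK equals the size of a C long minus one, and `WORD_SIZE`, `align_down`, and `compute_aligned_end` are hypothetical names, not CPython identifiers.

```python
# Illustrative model only: addresses are plain integers, and WORD_SIZE stands
# in for sizeof(long). Both are assumptions made for this sketch.
WORD_SIZE = 8
LONG_PTR_MASK = WORD_SIZE - 1

def align_down(address: int) -> int:
    """Clear the low bits, mirroring `(size_t) e & ~LONG_PTR_MASK` in C."""
    return address & ~LONG_PTR_MASK

def compute_aligned_end(start: int, size: int) -> int:
    """Model of: e = q + size - 1; aligned_end = align_down(e)."""
    e = start + size - 1
    return align_down(e)

# aligned_end is derived from the input buffer's address, so if decoding
# continues in a different buffer it has to be derived again.
old_buffer, new_buffer, size = 0x1000, 0x2F40, 21
print(hex(compute_aligned_end(old_buffer, size)))
print(hex(compute_aligned_end(new_buffer, size)))
```

Since the result depends only on the input buffer's start address and size, a value computed for one buffer says nothing about another buffer of the same length.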
You're right, I overlooked it. Now I see it. What wording would you suggest for the comment? If someone would correct a comment
I first got the crash using a custom handler, and then I saw that |
"might have changed the input object"
I agree that the change is necessary. It just does not explain why it |
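A custom error handler that changes the input object can be sketched as follows. This is a hedged illustration rather than the reproducer attached to this issue: the handler name `swap_input` and the registered name "example.swap_input" are made up, and it assumes a CPython build where this issue is already fixed.

```python
import codecs

def swap_input(exc):
    """Hypothetical handler: replaces the input object, then resumes at 0."""
    if not isinstance(exc, UnicodeDecodeError):
        raise TypeError("don't know how to handle %r" % exc)
    exc.object = b""          # the decoder must pick up this new buffer
    return ("\u4242", 0)      # (replacement text, position to resume from)

codecs.register_error("example.swap_input", swap_input)

# A single byte is truncated UTF-16 input, so the handler is called once.
result = b"\xff".decode("utf-16", "example.swap_input")
print(ascii(result))
```

After the handler returns, the decoder continues in the replaced (empty) buffer, so every pointer derived from the old buffer has to be refreshed before the loop resumes.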
Here is a minimal patch that corrects all bugs for 3.2. As a side effect, decoding is accelerated by 4-8%. |
Now I see the problem: make_decode_exception creates a new bytes object in any case, regardless of whether the error handler will update it or not. Therefore, decoding will continue in this new bytes object. I think the same issue also applies to the ASCII decoder in 3.3. |
No, the ASCII decoder is not affected by this vulnerability. In a loop, |
Here is a patch that takes Martin's suggestions into account. |
Please use CVE-2012-2135 for this issue as per http://www.openwall.com/lists/oss-security/2012/04/25/3 |
I have not tried the patch yet, but modifying the reproducer yields a different crash. This one seems to be a heap-based buffer overflow, which is slightly more serious. In the reproducer, you just need to replace ascii() with str(). Again, this works on Python 3 only. |
I am now writing tests and I have a question. Should b'\xd8\x00\x41'.decode('utf-16be', 'replace') give '\ufffd' or '\ufffd\ufffd'? |
Debian bug-report: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=670389 |
I tested versions 3.1.1, 3.1.2, 3.1.3, 3.1.4 and 3.1.5 and only 3.1.3 crashed with Segmentation fault: Program received signal SIGSEGV, Segmentation fault. (gdb) bt |
I thought it was one error, not two. The updated patch adds tests and fixes a minor mistake. 2.7 is not |
I ran tests of utf16_error_handling-3.2_4.patch on Python 3.1. Two tests are failing:
I don't think that the test is correct: UTF-16 should resynchronize as early as possible (ignore the first invalid byte and restart at the following byte), so '\ufffd\ufffd' is the correct answer. Other examples:
|
UTF-16 units are 16-bit words, not bytes, so '\ufffd' sounds correct to
That's because UTF-8 operates on bytes: the invalid byte is skipped. |
I agree. The only odd case is when the number of bytes is not even |
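The byte-based versus unit-based resynchronization discussed above can be compared directly. This is a small sketch, assuming a CPython build where this issue is already fixed; the variable names are illustrative only, and the odd-length result is printed rather than asserted in a comment because that case is exactly what the thread is debating.

```python
# Assumption: run on a CPython build where this issue is already fixed.
REPLACEMENT = "\ufffd"

# UTF-8 works on bytes: only the invalid byte 0xFF is replaced.
utf8_result = b"\xffA".decode("utf-8", "replace")

# UTF-16 works on 16-bit units: the lone high surrogate D800 is replaced as
# one unit, and decoding resumes at the next unit (0041, "A").
utf16_result = b"\xd8\x00\x00A".decode("utf-16-be", "replace")

# The odd case: a high surrogate followed by a single trailing byte.
odd_result = b"\xd8\x00\x41".decode("utf-16-be", "replace")

for name, value in [("utf-8", utf8_result),
                    ("utf-16", utf16_result),
                    ("odd length", odd_result)]:
    print(name, ascii(value))
```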
Please, can anyone do a final review and commit? Here are three patches for three Python versions:
2.7: utf16_error_handling-2.7.patch. Fix for one minor bug (overreading) and tests.
3.2: utf16_error_handling-3.2_4.patch. Fix for one critical security bug (CVE-2012-2135) and several minor bugs, tests.
3.3: utf16_error_handling-3.3.patch. Only tests. |
There are spurious print() calls in the 2.7 patch. |
New changeset 034ff986019d by Antoine Pitrou in branch '3.2': New changeset 118fe0ee6921 by Antoine Pitrou in branch 'default': |
New changeset 4cadf91aaddd by Antoine Pitrou in branch '2.7': |
Thanks for the patches, Serhiy! They're now pushed. |
Oh, my inattentiveness. Thank you for pushing, Antoine. And thank Martin for |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.